From left to right, we want to focus on the very first chart. The bars represent CPU usage, the average in green and the 95th percentile in blue on top. The axis goes from 0 to 100% and is normalized, meaning that even with 8 CPU cores it will read 100% usage, not 800%. The line graph represents the transaction duration, with the average in red and the 95th percentile in purple. Last, the orange area at the bottom is the average memory usage on that host.
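To make the normalization concrete, here is a minimal sketch (assuming the psutil package is installed) showing the difference between summed per-core usage and the normalized host-level figure the chart plots:

```python
import psutil

# Per-core usage sampled over one second; on an 8-core host the sum can reach 800%.
per_core = psutil.cpu_percent(interval=1, percpu=True)

summed = sum(per_core)               # e.g. ~96.0 if every core sits at ~12%
normalized = summed / len(per_core)  # what the chart plots: 0-100% regardless of core count

print(f"per-core sum: {summed:.1f}%  normalized: {normalized:.1f}%")
```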
We immediately realize that our calculator does not need a lot of memory. Hovering over the graph reveals 2.89% memory usage, and the e2-standard-8 machine we are using has 32 GB of memory. We occasionally spike to 100% CPU in the 95th percentile, and when this happens, the average transaction duration spikes to 2.5 milliseconds. However, this machine costs us roughly 30 cents per hour. Using this information, we can now downsize to a better fit. The average CPU usage hovers around 11-13%, and the 95th percentile is not far above that.
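The sizing math behind that conclusion is simple enough to write down. A quick sketch using the approximate values read off the chart (the prices and percentages are the article's rounded figures, not authoritative numbers):

```python
# Back-of-the-envelope sizing using the values read off the chart.
total_memory_gb = 32      # e2-standard-8
memory_used_pct = 2.89    # from hovering over the memory area
avg_cpu_pct = 12.5        # average CPU hovers around 11-13%
hourly_cost_usd = 0.30    # roughly 30 cents per hour (approximate, region-dependent)

memory_used_gb = total_memory_gb * memory_used_pct / 100
monthly_cost_usd = hourly_cost_usd * 730  # ~730 hours in a month

print(f"memory in use: ~{memory_used_gb:.2f} GB of {total_memory_gb} GB")
print(f"monthly cost: ~${monthly_cost_usd:.0f}")
```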
Because we are using 8 CPUs, one could say that 12.5% represents one full core, but that is just a back-of-the-envelope assumption. Nonetheless, we know there is a lot of headroom, and we can downscale quite a bit. In this case, I decided to go to 2 CPUs and 2 GB of RAM, the e2-highcpu-2 machine type. This should fit my calculator application better. We barely touched the RAM: 2.89% of 32 GB is roughly 1 GB in use. After the change and a reboot of the calculator machine, I started the same Locust test to identify my CPU usage and, more importantly, whether my transactions get slower, and if so, by how much. Ultimately, I want to decide whether 1 millisecond more latency is worth 10 more cents per hour. I added the change as an annotation in Lens.
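For reference, the Locust test is nothing fancy. A minimal sketch along these lines would do; the /add endpoint and its parameters are hypothetical, since the calculator's actual API is not spelled out here:

```python
from locust import HttpUser, task, between

class CalculatorUser(HttpUser):
    # Small pause between requests so the load resembles steady traffic.
    wait_time = between(0.1, 0.5)

    @task
    def add(self):
        # Hypothetical endpoint; adjust to whatever the calculator actually exposes.
        self.client.get("/add", params={"a": 2, "b": 3})
```

Running it with `locust -f locustfile.py --host http://<calculator-host>` before and after the resize gives comparable load on both machine types.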
After letting it run for a bit, we can now identify the smaller host's impact. In this case, we can see that the average did not change. However, the 95th percentile (meaning 95% of all transactions are below this value) did spike up. Again, it looks bad at first, but checking the numbers, it went from ~1.5 milliseconds to ~2.10 milliseconds, an increase of ~0.6 milliseconds. Now, you can decide whether that 0.6 millisecond increase is worth paying ~$180 more per month or if the current latency is good enough.
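The monthly figure falls out of the hourly price difference. A quick sketch, using the article's ~30 cents per hour for the e2-standard-8 and an assumed ~5 cents per hour for the e2-highcpu-2 (check your region's actual pricing):

```python
# Rough cost/latency trade-off from the numbers above.
price_big_hr = 0.30    # e2-standard-8, ~30 cents/hour per the chart
price_small_hr = 0.05  # e2-highcpu-2, assumed on-demand price; verify for your region

hours_per_month = 730
monthly_delta = (price_big_hr - price_small_hr) * hours_per_month  # ~$182

latency_delta_ms = 2.10 - 1.5  # 95th percentile increase observed after the resize

print(f"staying on the bigger machine costs ~${monthly_delta:.0f} more per month")
print(f"the smaller machine adds ~{latency_delta_ms:.1f} ms at the 95th percentile")
```

Plugging in your actual regional prices makes the trade-off explicit for your own workload.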