Friday, February 15, 2019

7 techniques that help system administrators find bottlenecks

Your web application runs slowly and your clients are complaining about long response times.

You need to figure out where the bottleneck in your program is.

  • is it CPU bound?
  • is it memory bound?
  • is it disk space bound?
  • is it network I/O bound?
Here are 7 techniques to help you find out.

1. shell command "uptime"


demo>uptime
18:30  up 24 days, 12:18, 3 users, load averages: 1.66 2.31 7.74


In the output, the most important information is the load averages: the 1-minute, 5-minute and 15-minute load averages respectively. These numbers only make sense when you read them together with your number of CPU cores.

On Linux, you can find the number of cores with this command:
grep -c '^processor' /proc/cpuinfo

For example, say your 15-minute load average is 7.74 and you have 2 cores. Then each core has roughly 4 processes competing for it, which is badly overloaded. However, if you have 8 cores, each process gets roughly one core and your CPU is running at the sweet spot. If you have 16 cores, about 8 cores sit idle and you are not using your CPU resources cost-effectively.
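
The arithmetic above can be sketched as a small shell function. This is only an illustration: load_per_core is a made-up helper name, and it simply divides a load average by a core count.

```shell
# load_per_core LOAD CORES
# Hypothetical helper: prints LOAD divided by CORES, rounded to two decimals.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# The three scenarios from the text: a 15-minute load of 7.74 on 2, 8 and 16 cores.
for cores in 2 8 16; do
    echo "$cores cores: $(load_per_core 7.74 "$cores") processes per core"
done
```

On a Linux host you could feed it live numbers, for example load_per_core "$(cut -d' ' -f3 /proc/loadavg)" "$(grep -c '^processor' /proc/cpuinfo)", where the third field of /proc/loadavg is the 15-minute average.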

The load average can also be found with the command
top

2. shell command "free -m" 

free -m
              total        used        free      shared  buff/cache   available
Mem:           1695         517         249           0         927        1010
Swap:             0           0           0

The output gives you an estimate of memory usage in MiB. In the above example, you have 1010 MiB of RAM available, which is about twice the amount used. Your server is not memory bound.
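
If you want a single number to watch, a rough sketch like the one below turns that output into "percent of RAM still available". mem_available_pct is a hypothetical helper, and it assumes the column layout shown above, where "available" is the 7th field of the Mem: row; older versions of free do not print that column.

```shell
# mem_available_pct (hypothetical helper): reads `free -m` output on stdin and
# prints the "available" column of the Mem: row as a whole percentage of "total".
# Assumes "available" is the 7th field, as in the sample output above.
mem_available_pct() {
    awk '$1 == "Mem:" { printf "%d\n", $7 * 100 / $2 }'
}

# Sample rows copied from the output above, so the sketch runs anywhere.
printf '%s\n' \
    '              total        used        free      shared  buff/cache   available' \
    'Mem:           1695         517         249           0         927        1010' \
    'Swap:             0           0           0' | mem_available_pct
```

On a live host the equivalent call would be free -m | mem_available_pct.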

Again, this information can be found from the command
top

3. shell command "df -h"
df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          36G   25G   11G  71% /
tmpfs            64M     0   64M   0% /dev
tmpfs           848M     0  848M   0% /sys/fs/cgroup
/dev/sdb1       4.8G   11M  4.6G   1% /home
/dev/sda1        36G   25G   11G  71% /root
overlayfs       1.0M  124K  900K  13% /etc/ssh/keys
tmpfs           848M  736K  847M   1% /run/metrics
shm              64M     0   64M   0% /dev/shm
overlayfs       1.0M  124K  900K  13% /etc/ssh/ssh_host_rsa_key
tmpfs           848M     0  848M   0% /run/google/devshell

The output of df tells you how your disk space is used.
In the above example, the first row tells you this: you have 36G of storage on a file system named "overlay", mounted on the directory /. 71% of the disk space on that file system has been used. If you want to free up some space there, you can
cd /
then find the largest files and directories occupying the space and decide which ones you want to rm. Note that sort needs -h rather than -n to order human-readable sizes such as 900K and 2G correctly; -a makes du list files as well as directories, and -x keeps it on this one file system.
du -xah . | sort -hr | head

Similarly, the 5th row tells you: you have another 36G of storage on a file system named /dev/sda1, mounted on the directory /root. 71% of the space on that file system has been used. If you want to free up some space there, you can

cd /root
then find the largest files and directories occupying the space and decide which ones you want to rm (sort -h understands the K/M/G suffixes that du -h prints).
du -xah . | sort -hr | head
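
Reading the Use% column by eye gets tedious on a host with many mounts. Here is a minimal sketch that lists only the file systems at or above a given usage percentage. fs_over_threshold is a hypothetical helper, the threshold of 70 is an arbitrary choice, and the parsing assumes mount points without spaces.

```shell
# fs_over_threshold PCT (hypothetical helper): reads `df -h` output on stdin
# and prints "mountpoint use%" for every file system whose Use% is >= PCT.
# Assumes the 6-column layout shown above and mount points without spaces.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# A few rows copied from the df output above.
printf '%s\n' \
    'Filesystem      Size  Used Avail Use% Mounted on' \
    'overlay          36G   25G   11G  71% /' \
    '/dev/sdb1       4.8G   11M  4.6G   1% /home' \
    '/dev/sda1        36G   25G   11G  71% /root' | fs_over_threshold 70
```

With the sample rows this prints the / and /root mounts, the two that the text above suggests cleaning up. On a live host you would run df -h | fs_over_threshold 70.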

4. shell command "sar -u" and "iostat -x 3"
These commands give you historical and real-time information about hardware usage, which helps you look further into resource-related bottlenecks: sar -u reports CPU utilization history, iostat -x 3 prints extended disk statistics every 3 seconds, and sar -r covers memory.

For example, if your web application is running in a VM, that VM is hosted on a hypervisor, and you have to take the hypervisor into account when you analyze performance. The hypervisor could be overloaded if the administrator packed too many VMs onto one machine, or someone may simply have put the host in power-save mode to save money. Look for telltale signs such as abnormal %iowait or %steal in the "sar -u" output. Once you find something suspicious, ask the VM administrators to check VM stats such as READY and Co-Stop for signs of overloading.
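
To pull just the wait-related numbers out of a long sar report, something like the following sketch could work. sar_wait_summary is a hypothetical helper; it assumes the default "sar -u" layout, in which the last three columns are %iowait, %steal and %idle, and an English locale so that the summary row starts with "Average:". The sample row uses invented numbers, not a real measurement.

```shell
# sar_wait_summary (hypothetical helper): reads `sar -u` output on stdin and
# prints %iowait and %steal from the "Average:" row.
# Assumes the default layout: ... %iowait %steal %idle as the last 3 columns.
sar_wait_summary() {
    awk '$1 == "Average:" { printf "iowait=%s steal=%s\n", $(NF-2), $(NF-1) }'
}

# Invented sample row for illustration only.
printf '%s\n' \
    'Average:        all      5.10      0.00      2.30     18.40      6.20     68.00' \
    | sar_wait_summary
```

On a live host you would run LC_ALL=C sar -u | sar_wait_summary. What counts as "abnormal" depends on your workload, so compare against a period when the application was healthy.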

5. shell command "ps -ef | grep java"
The output shows the CPU time used by your Java web application and, even better, the JVM parameters of your Java program. Look for parameters such as
  • -server         optimized for server applications
  • -Xms           initial (minimum) heap size
  • -Xmx           maximum heap size
  • -XX:NewRatio                     ratio of old generation size to young generation size
  • -XX:+UseG1GC                  garbage collection strategy
  • -XX:+DisableExplicitGC     ignore explicit System.gc() calls, which would otherwise trigger a stop-the-world full garbage collection
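
A java command line can be very long, so it helps to print one flag per line. The sketch below does that for heap and -XX: options. jvm_heap_flags is a hypothetical helper, and the sample command line (including app.jar and the sizes) is made up for illustration.

```shell
# jvm_heap_flags (hypothetical helper): reads process command lines on stdin
# and prints every -Xms, -Xmx and -XX: argument, one per line.
jvm_heap_flags() {
    tr ' ' '\n' | grep -E '^-(Xms|Xmx|XX:)'
}

# Made-up command line; on a real host use: ps -ef | grep java | jvm_heap_flags
echo 'java -server -Xms512m -Xmx2g -XX:+UseG1GC -jar app.jar' | jvm_heap_flags
```

If nothing is printed, the JVM is running with its default heap and collector settings, which is worth knowing before you start tuning.
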
6. check the web container resource usage
For example, you might have Tomcat as your web container. Your web container's thread pool might be too large or too small. An oversized thread pool wastes memory, while an undersized one leaves requests queued up.

7. check the web application resource usage
Have your application log performance metrics such as GC stats, CPU stats, thread pool size, database connection pool size, queue size, etc., then look for optimization opportunities.
For example:
  • Thread pool size should be decided according to the requests per second and the response time. For example, if your host gets 10 requests per second and each request needs 2 seconds on average to serve, you should have at least 20 threads if you don't want arriving requests to pile up waiting for service. This is a rule of thumb: 2 threads will be too few, 200 threads will be too many. Benchmark and test to find the right number between 2 and 200.
  • If your GC happens too often or takes too long, you need a bigger heap or a different garbage collection strategy.
  • If your threads spend most of their time waiting for I/O, increasing the thread pool size or the heap size won't help.
  • If your database queries take too long, you need a bigger database connection pool, more parallelization so that fetches from the database start earlier, or a local cache for some of the content.
  • If your threads spend most of their time idle, you want to shrink the thread pool or increase the supply of work, for example by giving the host a bigger job queue so that there is always a job available to fetch even when the request arrival rate fluctuates. Note that an oversized job queue can boost performance on a single host but harm overall performance in a host farm. Smaller job queues encourage fairer traffic distribution among the hosts, while a larger job queue can cause one host to be overloaded, by chance, when there are spikes in the traffic.
  • If your threads spend time doing tasks serially, such as calling another network service one request at a time, look for an opportunity to parallelize. Create a dedicated thread pool just to turn that serial computation into a parallel one.
  • If two computation steps don't have to be coupled, decouple them with a concurrent queue, with a producer thread pool feeding the queue and a consumer thread pool draining it.
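
The thread pool sizing rule in the first bullet is just arrival rate multiplied by service time. As a rough sketch, with min_threads being a hypothetical helper name:

```shell
# min_threads RATE SECONDS
# Hypothetical helper: prints RATE * SECONDS rounded up to a whole number,
# i.e. how many requests are in flight at once on average.
min_threads() {
    awk -v rate="$1" -v secs="$2" 'BEGIN {
        n = rate * secs
        t = int(n)
        if (t < n) t++        # round up: a fraction of a thread is still a thread
        print t
    }'
}

min_threads 10 2    # the example from the text: 10 requests/s, 2 s each
```

Treat the result as a starting point for benchmarking rather than a final setting, since real traffic is bursty and response times vary.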
