Friday, May 8, 2020

Application performance management (APM) tools

In a nutshell, APM tools expose machine internals so that humans can analyze them.
There are two sides to the story:
First, we want as much information as possible, so the tools need to cover many layers of abstraction.
Second, we want information rather than raw machine 1s and 0s, so presentation matters.

At the system level, we need to know the user experience and the overall performance metrics, such as overall response time, throughput, binding rate, cache rate, and error rate. We want to group them by IP range, endpoint, service group, product, and so on. The Network Operations Center needs to receive alerts for emergency events. We want to build an inventory of host clusters, a network topology graph, and a service dependency map, so that we can write reports, locate hotspots, and detect bottlenecks. Many tools work at this layer, for example New Relic, SFx, Consul, Istio, and Splunk.
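As a rough illustration of grouping system-level metrics by endpoint, here is a minimal Python sketch. The `Request` record, the `summarize` function, and the rule that a status of 500 or above counts as an error are all assumptions made for this example; they are not taken from any of the tools named above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    endpoint: str
    duration_ms: float
    status: int


def summarize(requests, window_seconds):
    """Group requests by endpoint and compute basic system-level metrics."""
    groups = defaultdict(list)
    for r in requests:
        groups[r.endpoint].append(r)

    summary = {}
    for endpoint, items in groups.items():
        # Assumption: only 5xx responses are counted as errors.
        errors = sum(1 for r in items if r.status >= 500)
        summary[endpoint] = {
            "avg_response_ms": sum(r.duration_ms for r in items) / len(items),
            "throughput_rps": len(items) / window_seconds,
            "error_rate": errors / len(items),
        }
    return summary


# Four sample requests observed in a 2-second window.
sample = [
    Request("/cart", 120.0, 200),
    Request("/cart", 80.0, 500),
    Request("/search", 40.0, 200),
    Request("/search", 60.0, 200),
]
report = summarize(sample, window_seconds=2)
for endpoint, metrics in sorted(report.items()):
    print(endpoint, metrics)
```

The same grouping key could just as well be an IP range, a service group, or a product; only the field used to build `groups` would change.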

At the service level, we would like information about host and service metrics: CPU, memory, processes, HTTP transactions, throughput, request rate, request latency, request latency distribution, error rates, JVMs, databases, and so on. Most APM tools cover this layer, and some focus on it exclusively, for example Zabbix.
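To show why the latency distribution matters and not just the average, here is a small Python sketch that computes percentiles with the nearest-rank method. The `percentile` helper and the sample latencies are made up for this example; real APM tools may use other estimators, such as histograms or sketches.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile of samples, with p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 13, 14]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(f"mean={mean_ms} p50={p50} p99={p99}")
```

In this sample the mean (37 ms) is far above the median (13 ms) because of a single 250 ms request, which is why latency is usually reported as percentiles.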

At the application level, we care about things like thread pools, database connection pools, transactions (GET/PUT...), application exceptions, log events (error, warning, info, trace), and business metrics such as sales amount, customer retention rate, and number of customer inquiries. Example tools working at this layer include Splunk, New Relic, SFx, Kafka, and Tableau.
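As one possible sketch of an application-level metric, the Python class below tracks how many slots of a pool (thread pool or database connection pool) are in use. `PoolGauge` and its methods are hypothetical names for this example; it only counts slots and does not manage real connections.

```python
import threading


class PoolGauge:
    """Hypothetical gauge that tracks usage of a fixed-size pool."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.in_use = 0
        self.peak = 0
        self._lock = threading.Lock()

    def acquire(self):
        """Claim one slot; return False if the pool is exhausted."""
        with self._lock:
            if self.in_use >= self.capacity:
                return False
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            return True

    def release(self):
        """Return one slot to the pool."""
        with self._lock:
            if self.in_use > 0:
                self.in_use -= 1

    def utilization(self):
        """Fraction of the pool currently in use, between 0.0 and 1.0."""
        with self._lock:
            return self.in_use / self.capacity


gauge = PoolGauge(capacity=4)
granted = [gauge.acquire() for _ in range(5)]  # the fifth request exceeds capacity
gauge.release()
print(granted, gauge.in_use, gauge.peak, gauge.utilization())
```

An agent could periodically export `in_use`, `peak`, and `utilization()` so that a dashboard shows when the pool is close to exhaustion.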

