Tuesday, February 5, 2019

7 tools that monitor your servers and web applications

Your web application is a distributed software system that runs 24/7 on a fleet of servers. It has many functional components, such as DNS servers, load balancers, routers, web servers, authentication services, provisioning services, Redis servers, and database servers. These components constantly generate data: request count, response time, CPU load, RAM usage, disk usage, web container thread pool size, application thread pool size, database connection pool size, garbage collection time, and so on. For each data source you can compute statistics such as average, sum, min, max, and the 95th and 99th percentiles. You can then group and sort them by site, hostname, IP, date, tag, source, etc. Sometimes different data types are correlated. For example, a high request count often goes along with high CPU load, high RAM usage, a larger thread pool, and longer database response times.
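To make the statistics above concrete, here is a minimal sketch in Python that summarizes one metric. The sample response times and the function names are made up for illustration, and the percentile uses the simple nearest-rank method, which may differ slightly from what a given monitoring tool reports.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(values):
    """Return the usual summary statistics for one metric."""
    return {
        "count": len(values),
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "p95": percentile(values, 95),
        "p99": percentile(values, 99),
    }


# Hypothetical response times in milliseconds, with one slow outlier.
response_times_ms = [12, 15, 11, 240, 18, 14, 13, 16, 17, 19]
summary = summarize(response_times_ms)
print(summary)
```

Note how the single slow request barely moves the average but dominates the high percentiles, which is why percentiles are usually tracked alongside the average.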

The following 7 tools help you collect data from your application and organize it in smart ways, so that your machines stay under control.

1. Event queues

Your web application is really a group of applications. Each application is deployed on a fleet of servers behind a load balancer, and each copy of the artifact deployed on a server is a multithreaded program. As you can tell, there is a high chance of race conditions. We don't want the collection of application metadata to break the main functionality or create a synchronization bottleneck. A thread-safe queue helps a lot in this situation. It decouples producer and consumer, so that a log event generated by one thread in one VM doesn't interfere with another thread in the same VM or in another VM. Kafka and RabbitMQ are two examples of this kind. Your application can send events as objects to a topic or stream of the queue. After processing, the events on these topics can be sent to the APIs of consumers such as another app, a database, Zabbix, Splunk, Tableau, etc.
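The decoupling idea can be sketched inside a single process with Python's standard thread-safe queue. This is only an in-process stand-in, not the Kafka or RabbitMQ API; the worker and consumer functions and the event fields are hypothetical names chosen for the example.

```python
import queue
import threading

event_queue = queue.Queue()   # thread-safe FIFO shared by all threads
SENTINEL = None               # special item that tells the consumer to stop
collected = []                # only the consumer thread writes to this list


def worker(worker_id, n_events):
    """Producer: emits monitoring events without waiting for the consumer."""
    for seq in range(n_events):
        event_queue.put({"worker": worker_id, "seq": seq,
                         "metric": "request_count", "value": 1})


def consumer():
    """Consumer: drains the queue until it sees the sentinel."""
    while True:
        event = event_queue.get()
        if event is SENTINEL:
            break
        collected.append(event)


consumer_thread = threading.Thread(target=consumer)
consumer_thread.start()

producers = [threading.Thread(target=worker, args=(w, 100)) for w in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

event_queue.put(SENTINEL)     # all producers are done, so stop the consumer
consumer_thread.join()

total = sum(e["value"] for e in collected)
print(len(collected), "events collected, total request count:", total)
```

The producers never touch the shared result list directly, so they need no locks of their own; the queue is the only point of synchronization.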

2. Data analytics tools

Your Kafka stream can be sent to a data analytics tool such as Splunk. Once the data gets there, it is stored in a way that makes fast retrieval and sorting possible. Splunk can analyze huge amounts of data and use it to create statistics, time charts, and dashboards. Splunk also allows events to trigger alerts. The alert criteria can range from a count larger than some number to a duration longer than some value. Once an alert is triggered, actions can be taken: for example, sending a message to a Slack channel, an email, an SMS, or a call to a ticketing API.
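The threshold idea can be illustrated with a few lines of Python. This is a rough sketch of the logic only, not how Splunk defines or evaluates alerts; the rule names, metric names, thresholds, and hosts are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical rule: fire when a metric exceeds a threshold."""
    name: str
    metric: str
    threshold: float

    def matches(self, event):
        # A missing metric is treated as 0, so it never fires the rule.
        return event.get(self.metric, 0) > self.threshold


def triggered_alerts(events, rules):
    """Return (rule name, host) for every event that breaks a rule."""
    fired = []
    for event in events:
        for rule in rules:
            if rule.matches(event):
                fired.append((rule.name, event["host"]))
    return fired


rules = [
    AlertRule("too many errors", "error_count", 100),   # count larger than a number
    AlertRule("slow response", "response_ms", 2000),    # time longer than a value
]

events = [
    {"host": "web-1", "error_count": 3, "response_ms": 150},
    {"host": "web-2", "error_count": 250, "response_ms": 180},
    {"host": "web-3", "error_count": 5, "response_ms": 4200},
]

alerts = triggered_alerts(events, rules)
print(alerts)
```

In a real setup, each fired alert would be handed to an action such as a chat message, an email, or a ticketing API call.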

3. Incident response platforms

There are many ticketing platforms, such as BMC Remedy, Salesforce, and PagerDuty. PagerDuty, for example, lets you install an app in Splunk, and one of the Splunk alert actions is then to send the alert to a PagerDuty service. Once the PagerDuty service receives a call, it acts according to its escalation policy. For example, it calls the primary contacts; if nobody acknowledges within 5 minutes, it calls the secondary contacts, and so on, until someone takes action on the event. This could be annoying during night hours.
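The escalation policy described above boils down to a simple loop. The sketch below simulates it in Python; it does not call PagerDuty, and the contact names, the function name, and the way acknowledgements are modeled are assumptions made for illustration.

```python
def escalate(contacts, acknowledged_by, timeout_minutes=5):
    """Simulate an escalation chain.

    contacts: people to page, in escalation order.
    acknowledged_by: set of contacts who would acknowledge when paged.
    Returns the paging log as (minute, contact) pairs and the responder,
    or None as responder if nobody in the chain acknowledges.
    """
    log = []
    for level, contact in enumerate(contacts):
        # Each level is paged only after the previous one timed out.
        log.append((level * timeout_minutes, contact))
        if contact in acknowledged_by:
            return log, contact
    return log, None


chain = ["primary", "secondary", "manager"]

# The primary contact sleeps through the page; the secondary picks it up.
log, responder = escalate(chain, acknowledged_by={"secondary"})
print(log, responder)
```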

4. Host inventory tools

Your server warehouse needs management. Hardware as well as VMs can be managed with tools such as Zabbix. Once the Zabbix agent is installed on a Unix server, it starts to collect infrastructure information such as CPU, memory, and disk I/O and bandwidth (hdc io, hdc bw, sda io, sda bw, etc.). The Zabbix server lets you monitor, inventory, and report on your infrastructure hosts. You can also draw graphs of your hardware resource usage history.

5. VM and cloud inventory tools

Virtual machines and cloud VM instances are special resources. They are special because they are elastic and volatile. Tools such as VMware vCenter/vSphere, AWS, GCP, and Azure allow you to quickly create and destroy VMs, create a cloud virtual company, reconfigure routing rules, set up ACLs, configure scale-up/scale-down policies, etc. For debugging a JVM, Oracle has its own tool, which can be invoked with the command jvisualvm.

6. Data warehouse mining and visualization tools

Tools such as Tableau can connect to almost any data warehouse application on the market: Oracle, MySQL, Amazon Redshift, cubes, Teradata, Cassandra, data lakes, Redis, Microsoft SQL Server, MongoDB, Hyperion, etc. With the wealth of data already stored in the data warehouse, Tableau can generate reports, statistics, and graphs across different data sources and give the user the power to analyze them further.

7. Network inventory tools

Your network resources, such as IPs, nodes, and servers, need to be organized and grouped. Tools such as NGINX and BIG-IP can help you out. At any time, you can inventory your network resources by location, OS, liveness, cell, etc. You can also take a group of servers out of service or put them back into service.
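As a toy illustration of grouping an inventory and draining a group, here is a short Python sketch. It is not tied to NGINX or BIG-IP; the host names, attributes, and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory records.
servers = [
    {"host": "web-1", "location": "us-east", "os": "linux", "in_service": True},
    {"host": "web-2", "location": "us-east", "os": "linux", "in_service": True},
    {"host": "web-3", "location": "eu-west", "os": "linux", "in_service": True},
    {"host": "db-1", "location": "eu-west", "os": "windows", "in_service": True},
]


def group_by(inventory, key):
    """Group host names by one attribute, e.g. location or OS."""
    groups = defaultdict(list)
    for server in inventory:
        groups[server[key]].append(server["host"])
    return dict(groups)


def set_service_state(inventory, key, value, in_service):
    """Put every server whose attribute matches into or out of service.
    Returns the names of the affected hosts."""
    changed = []
    for server in inventory:
        if server[key] == value:
            server["in_service"] = in_service
            changed.append(server["host"])
    return changed


by_location = group_by(servers, "location")
drained = set_service_state(servers, "location", "eu-west", False)
live_hosts = [s["host"] for s in servers if s["in_service"]]
print(by_location, drained, live_hosts)
```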
