VMware ESXi performance monitoring with Cacti

VMware ESXi does not include the SNMP query interface. If you want to monitor your VMware hosts, you either have to buy ESX, or you have to build your solution around the VMware CLI tools.

Overview

I felt that buying ESX licenses just to monitor the machines was overkill. Nevertheless, I want to know what's happening on my servers. So I built a solution based purely on VMware's free tools, open-source software and some glue scripts.

Currently, I monitor these performance values of my VMware ESXi hosts:

  • CPU Load
  • Memory utilization
  • I/O latency
  • Network utilization

[Figure: VMware ESXi monitoring in Cacti]

Additionally, I monitor the virtual machines themselves via SNMP, but that is outside the scope of this blog post.

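Here's a high-level overview of what I did:

  • Install the VMware RCLI and the op5 check_esx3 Nagios plugin on the Cacti server
  • Write small wrapper scripts that convert the plugin's output into Cacti's input format
  • Create Cacti templates for the collected data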

I won't go into the details of installing RCLI or the op5 script; both are well documented elsewhere on the web.

Writing a wrapper for Cacti

Cacti expects input from external scripts in a slightly different format than Nagios: it wants all values of one data source on a single line, as space-separated name:value pairs. For example, the output I generate for I/O stats looks like this:

read:0 write:0 device:0 queue:0 kernel:0

To produce this output, I wrote a very small Perl wrapper script:

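#!/usr/bin/perl
use strict;
use warnings;

# Exactly one argument: hostname or IP address of the ESXi host
my $host = shift or die "usage: $0 <esxi-host>\n";

# Run the op5 plugin directly (no shell involved), using the RCLI authfile
open(my $fh, '-|', '/usr/local/bin/check_esx3.nagios',
     '-H', $host, '-f', '/etc/vmware-rcli/authfile', '-l', 'io')
  or die "cannot run check_esx3.nagios: $!\n";
my $response = do { local $/; <$fh> } // '';
close($fh);

# Pull each I/O latency (ms) out of the performance data;
# emit "U" (Cacti's "unknown") when a value is missing
my @fields = qw(read write device queue kernel);
my %value = map { $_ => ($response =~ /io_$_=(\d+)ms/ ? $1 : 'U') } @fields;

print join(' ', map { "$_:$value{$_}" } @fields), "\n";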

For all you Perl hackers out there: yes, there's definitely room for improvement. ;-) The script expects exactly one parameter on the command line: the IP address or hostname of the ESXi host. It also expects an RCLI-compatible authfile at /etc/vmware-rcli/authfile. Similar scripts can be written for CPU, memory and network utilization.
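Instead of copying the I/O wrapper once per metric, one option is a single generic wrapper that converts whatever performance data the plugin prints into Cacti's format. This is a minimal sketch that assumes the plugin follows the standard Nagios perfdata convention ('label'=value[UOM];warn;crit;min;max, after the first "|"). The check names cpu, mem and net are assumptions; verify them against the plugin's own usage text. Note that the Cacti field names then come straight from the plugin's labels (for example io_read rather than read).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert the performance data of a Nagios plugin's output (everything after
# the first "|") into Cacti's single-line "name:value name:value" format.
# Each perfdata item looks like  'label'=value[UOM];[warn];[crit];[min];[max]
sub perfdata_to_cacti {
    my ($output) = @_;
    my (undef, $perfdata) = split /\|/, $output, 2;
    return '' unless defined $perfdata;

    my @pairs;
    # A label is either single-quoted (may contain spaces) or a bare word;
    # the value is numeric or "U", any unit of measure after it is dropped
    while ($perfdata =~ /('[^']+'|[^\s=]+)=([-\d.]+|U)/g) {
        my ($label, $value) = ($1, $2);
        $label =~ s/^'|'$//g;            # strip optional quotes
        $label =~ s/[^A-Za-z0-9_]/_/g;   # keep names safe for Cacti/RRD
        push @pairs, "$label:$value";
    }
    return join ' ', @pairs;
}

# Hypothetical invocation: wrapper.pl <esxi-host> <check>, e.g. "io".
# The check names cpu, mem and net are assumptions about check_esx3.
if (@ARGV == 2) {
    my ($host, $check) = @ARGV;
    open(my $fh, '-|', '/usr/local/bin/check_esx3.nagios',
         '-H', $host, '-f', '/etc/vmware-rcli/authfile', '-l', $check)
      or die "cannot run check_esx3.nagios: $!\n";
    my $output = do { local $/; <$fh> } // '';
    close($fh);
    print perfdata_to_cacti($output), "\n";
}
```

With this approach the Cacti data input method passes two arguments (host and check name), and each data template declares the plugin's perfdata labels as its output fields.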

Generating a Cacti template

Configuring Cacti for a moderately complex data source is a topic of its own. I will describe it in a separate blog post in the near future. For now, you'll have to make do with an outline of the required steps and figure out the details for yourself.

  • Configure a data input method for each script
  • Add a data template that stores the output of each script in an RRD file
  • Add a graph template that displays the data of each RRD file
  • Add a host template containing all graph templates
  • Add a host for each ESXi server, using this host template
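Before wiring a wrapper into a data input method, it can save some debugging to check its output mechanically. This sketch assumes you give the data input method's output fields the same names as the data template's internal data source names; RRDtool restricts data source names to 1 to 19 characters from [A-Za-z0-9_]. The script and its command-line interface are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Check that a wrapper's output line is usable by Cacti: every field must be
# name:value, the name a valid RRDtool data source name (1-19 characters of
# [A-Za-z0-9_]), the value numeric or "U", and every expected name present.
sub check_cacti_line {
    my ($line, @expected) = @_;
    my (@problems, %seen);
    for my $field (split ' ', $line) {
        my ($name, $value) = split /:/, $field, 2;
        if (!defined $value) {
            push @problems, "'$field' is not name:value";
            next;
        }
        push @problems, "'$name' is not a valid RRD data source name"
            unless $name =~ /^[A-Za-z0-9_]{1,19}$/;
        push @problems, "'$name' has non-numeric value '$value'"
            unless $value =~ /^-?\d+(?:\.\d+)?$/ || $value eq 'U';
        $seen{$name} = 1;
    }
    for my $name (@expected) {
        push @problems, "missing field '$name'" unless $seen{$name};
    }
    return @problems;
}

# Hypothetical usage: read one line from STDIN, expected names as arguments
if (@ARGV) {
    my $line = <STDIN> // '';
    chomp $line;
    my @problems = check_cacti_line($line, @ARGV);
    print @problems ? join("\n", @problems) . "\n" : "OK\n";
    exit(@problems ? 1 : 0);
}
```

For example, piping the I/O wrapper's output into this script with the arguments read write device queue kernel should print OK when the line matches the output fields you plan to configure.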