I use Cacti to monitor a lot of Dell servers, primarily 1850s and 2850s but also the newer models of same (1950s and 2950s). One itch that I’ve meant to scratch for a while is graphing some of the information available through the servers’ IPMI interface, specifically the servers’ various temperatures and fan speeds.
IPMI Details
There are patches available for the Linux kernel to allow the IPMI information to be read via the lm_sensors project but I chose to avoid this (at least for now) as I’d have to schedule downtime to reboot the servers for a new kernel. It’d also ruin their uptime – most of the servers (serving many thousands of users daily) have almost two years of uptime. (The kernels are monolithic.)
Instead, I went with the Linux IPMI driver that was already compiled into my kernels (see kernel source: Documentation/IPMI.txt); it is found in the ‘Character Devices’ menu of the kernel configuration. I specifically needed the following options for the Dells:
- drivers/char/ipmi/ipmi_msghandler
- drivers/char/ipmi/ipmi_devintf
- drivers/char/ipmi/ipmi_si
To read information from the IPMI, you need the ipmitool utility, which is available on most recent Linux distributions or from here.
Lastly, I needed to create a character special file to interface with the IPMI:
mknod /dev/ipmi0 c 254 0
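The major number (254 above) is assigned dynamically by default, so it may differ from one kernel to the next. A minimal sketch for looking it up, assuming the device interface registers itself as "ipmidev" in /proc/devices; the function name ipmi_major is purely illustrative:

```sh
#!/bin/sh
# Print the major number registered for the IPMI device interface.
# Assumption: the driver appears as "ipmidev" in the devices table
# (default /proc/devices); pass another file as $1 to test the parsing.
ipmi_major() {
    awk '$2 == "ipmidev" { print $1; exit }' "${1:-/proc/devices}"
}

# Example (as root), only if the node does not already exist:
#   major=$(ipmi_major)
#   [ -n "$major" ] && [ ! -e /dev/ipmi0 ] && mknod /dev/ipmi0 c "$major" 0
```

If the function prints nothing, the ipmi_devintf option is probably not active in the running kernel.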
The sensor information was then available via:
# ipmitool sensor
Temp | 30.000 | degrees C | ok | na | na | na | 85.000 | 90.000 | na
Temp | 34.000 | degrees C | ok | na | na | na | 85.000 | 90.000 | na
Ambient Temp | 16.000 | degrees C | ok | na | 3.000 | 8.000 | 42.000 | 47.000 | na
...
Making IPMI Sensor Information Available via SNMP
I make the IPMI sensor information available over SNMP by adding the following to the snmpd.conf file:
# Monitor IPMI Temperature and Fan stats
exec .1.3.6.1.4.1.X.1000 ipmitemp /usr/local/sbin/ipmi-temp-stats
exec .1.3.6.1.4.1.X.1001 ipmifan /usr/local/sbin/ipmi-fan-stats
(Replace X above as appropriate.)
The scripts referenced are /usr/local/sbin/ipmi-temp-stats:
#!/bin/sh
# Print every temperature reading (second column) from the cached sensor output.
PATH=/usr/bin:/bin
STATS=/tmp/ipmisensor-snmp
printf "%f\n" $(grep Temp "$STATS" | cut -s -d "|" -f 2)
And /usr/local/sbin/ipmi-fan-stats:
#!/bin/sh
# Print every fan speed reading (second column) from the cached sensor output.
PATH=/usr/bin:/bin
STATS=/tmp/ipmisensor-snmp
printf "%f\n" $(grep FAN "$STATS" | cut -s -d "|" -f 2)
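One caveat with the grep/cut approach: a sensor that reports "na" instead of a number would reach printf as a non-numeric argument. A hedged alternative is sketched below using awk; the function name sensor_values is hypothetical, and it assumes the same pipe-delimited layout that ipmitool sensor prints above.

```sh
#!/bin/sh
# Print the reading (second column) of every sensor whose name matches
# the pattern in $1, skipping rows that have no numeric reading.
# $2 is the file holding the cached "ipmitool sensor" output.
sensor_values() {
    awk -F'|' -v pat="$1" '
        $1 ~ pat {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)      # trim column padding
            if ($2 ~ /^-?[0-9]+(\.[0-9]+)?$/)    # ignore "na" and similar
                printf "%f\n", $2
        }' "$2"
}

# Example:
#   sensor_values Temp /tmp/ipmisensor-snmp
#   sensor_values FAN  /tmp/ipmisensor-snmp
```

Note that skipping a row shifts the position of the readings after it, so whether to skip or to print a placeholder depends on how the graphs index the values.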
The file they reference is generated every five minutes (the Cacti polling interval) via a cron entry in the file /etc/cron.d/ipmitool:
*/5 * * * * root /usr/bin/ipmitool sensor >/tmp/ipmisensor-snmp
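Because the redirect truncates the file before ipmitool has finished, an SNMP query that arrives during the run could see an empty or partial file. A possible refinement, sketched here with a hypothetical helper named atomic_write, is to write to a temporary file and rename it into place only on success:

```sh
#!/bin/sh
# Write the output of a command to a file atomically: the command writes
# to a temporary file next to the target, which is renamed over the
# target only if the command succeeds.
# Usage: atomic_write TARGET COMMAND [ARGS...]
atomic_write() {
    target=$1; shift
    tmp=$(mktemp "$target.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        chmod 644 "$tmp"          # mktemp creates the file with mode 600
        mv -f "$tmp" "$target"
    else
        rm -f "$tmp"              # keep the previous readings on failure
        return 1
    fi
}

# Example:
#   atomic_write /tmp/ipmisensor-snmp /usr/bin/ipmitool sensor
```

With the function saved in a wrapper script (say, a hypothetical /usr/local/sbin/ipmi-sensor-cache), the cron entry would call that wrapper instead of redirecting ipmitool directly.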
After restarting snmpd and allowing the cron job to execute at least once, you can test the results via:
# snmpwalk -c <community> -v <version> <ip/hostname> .1.3.6.1.4.1.X.1000
SNMPv2-SMI::enterprises.X.1000.1.1 = INTEGER: 1
SNMPv2-SMI::enterprises.X.1000.2.1 = STRING: "ipmitemp"
SNMPv2-SMI::enterprises.X.1000.3.1 = STRING: "/usr/local/sbin/ipmi-temp-stats"
SNMPv2-SMI::enterprises.X.1000.100.1 = INTEGER: 0
SNMPv2-SMI::enterprises.X.1000.101.1 = STRING: "37.000000"
SNMPv2-SMI::enterprises.X.1000.101.2 = STRING: "39.000000"
SNMPv2-SMI::enterprises.X.1000.101.3 = STRING: "23.000000"
SNMPv2-SMI::enterprises.X.1000.101.4 = STRING: "36.000000"
...
SNMPv2-SMI::enterprises.X.1000.102.1 = INTEGER: 0
SNMPv2-SMI::enterprises.X.1000.103.1 = ""
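In the output above, the .100.1 row appears to carry the script's exit status and each .101.N row one line of its output. For a quick check from the command line, the readings can be pulled out of the walk with a small filter; this is only a sketch, and the function name exec_output_values is made up for illustration:

```sh
#!/bin/sh
# Read snmpwalk output on stdin and print the quoted value of every
# .101.N row, i.e. one line of the exec'd script's output per row.
exec_output_values() {
    sed -n 's/.*\.101\.[0-9][0-9]* = STRING: "\(.*\)"$/\1/p'
}

# Example:
#   snmpwalk -c <community> -v <version> <ip/hostname> .1.3.6.1.4.1.X.1000 \
#       | exec_output_values
```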
Graphing This Information in Cacti
Finally, I graph this information in Cacti (see end of post for examples).
I am making six templates available here which can be imported into Cacti (these were generated using version 0.8.6j) for graphing the above:
- Cacti graph template for Dell 1850 temperatures (see first image below);
- Cacti graph template for Dell 2850 temperatures (see second image below);
- Cacti graph template for Dell 1850 fan speeds (see third image below);
- Cacti graph template for Dell 2850 fan speeds (see fourth image below);
- Cacti host template for Dell 1850; and
- Cacti host template for Dell 2850.
The last two templates available are host templates for Dell 1850s and 2850s (I’m sure they’ll work fine with 1950s and 2950s also). These templates include:
- Host MIB – Logged in Users;
- Host MIB – Processes;
- IPMI Fan Speeds (Dell x850) (from above);
- IPMI Temperatures (Cel) (Dell x850) (from above);
- ucd/net – CPU Usage;
- ucd/net – Load Average;
- ucd/net – Memory Usage;
- SNMP – Get Mounted Partitions (data query); and
- SNMP – Interface Statistics (data query).
Example graphs are shown below; they’re not the cleanest given the amount of information they contain but they serve my purposes.
© 2007 Barry O’Donovan. All text is licensed under a Creative Commons Attribution 3.0 License. All scripts and Cacti templates are licensed under the MIT License.