Enabling External Commands in Nagios / Ubuntu

I get caught by the following quite often (too many Nagios installations!):

Error: Could not stat() command file ‘/var/lib/nagios3/rw/nagios.cmd’!

The external command file may be missing, Nagios may not be running, and/or Nagios may not be checking external commands. An error occurred while attempting to commit your command for processing.

The correct way to fix this in Ubuntu is:

service nagios3 stop
dpkg-statoverride --update --add nagios www-data 2710 /var/lib/nagios3/rw
dpkg-statoverride --update --add nagios nagios 751 /var/lib/nagios3
service nagios3 start

Translating SNMP OIDs Using MIB Files

I get caught trying to remember this a lot and there’s a really useful tutorial on this at the Net-SNMP website: Using and loading MIBS.

If you’re using Ubuntu, also consider checking the comments in /etc/snmp/snmp.conf which (in 13.04) contains:

As the snmp packages come without MIB files due to license reasons, loading of MIBs is disabled by default. If you added the MIBs you can reenable loading them by commenting out the following line.

Also, run the following:

apt-get install snmp-mibs-downloader

which will download some basic MIBs as part of the installation.

Nagios / Icinga Alerts via Pushover

I came across Pushover recently which makes it easy to send real-time notifications to your Android and iOS devices. And easy it is. It also allows you to set up applications with logos so that you can have multiple Nagios installations shunting alerts to you via Pushover with each one easily identifiable. After just a day playing with this, it’s much nicer than SMS’.

So, to set up Pushover with Nagios, first register for a free Pushover account. Then create a new application for your Nagios instance. I set the type to Script and also upload a logo. After this, you will be armed with two crucial pieces of information: your application API tokan/key ($APP_KEY) and your user key ($USER_KEY).

To get the notification script, clone this GitHub repository or just down this file – notify-by-pushover.php.

You can test this immediately with:

echo "Test message" | \
    ./notify-by-pushover.php HOST $APP_KEY $USER_KEY RECOVERY OK

The parameters are:

USAGE: notify-by-pushover.php  <$APP_KEY> \
    <$USER_KEY> <NOTIFICATIONTYPE>

Now, set up the new notifications in Nagios / Icinga:

# 'notify-by-pushover-service' command definition
define command{
    command_name notify-by-pushover-service
    command_line /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: \
        $SERVICEDESC$@$HOSTNAME$: $SERVICESTATE$           \
        ($SERVICEOUTPUT$)" |                               \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        SERVICE $APP_KEY $CONTACTADDRESS1$                 \
        $NOTIFICATIONTYPE$ $SERVICESTATE$
}

# 'notify-by-pushover-host' command definition
define command{
  command_name notify-by-pushover-host
  command_line /usr/bin/printf "%b" "Host '$HOSTALIAS$'    \
        is $HOSTSTATE$: $HOSTOUTPUT$" |                    \
      /usr/local/nagios-plugins/notify-by-pushover.php     \
        HOST $APP_KEY $CONTACTADDRESS1$ $NOTIFICATIONTYPE$ \
        $HOSTSTATE$
}

Then, in your contact definition(s) add / update as follows:

define contact{
  contact_name ...
  ...
  service_notification_commands ...,notify-by-pushover-service
  host_notification_commands ...,notify-by-pushover-host
  address1 $USER_KEY
}

Make sure you break something to test that this works!

Recovering MySQL Master-Master Replication

MySQL Master-Master replication is a common practice and is implemented by having the auto-increment on primary keys increase by n where n is the number of master servers. For example (in my.conf):

auto-increment-increment = 2
auto-increment-offset = 1

This article is not about implementing this but rather about recovering from it when it fails. A word of caution – this form of master-master replication is little more than a useful hack that tends to work. It is typically used to implement hot stand-by master servers along with a VRRP-like protocol on the database IP. If you implement this with a high volume of writes; or with the expectation to write to both without application knowledge of this you can expect a world of pain!

It’s also essential that you use Nagios (or another tool) to monitor the replication on all masters so you know when an issue crops up.

So, let’s assume we have two master servers and one has failed. We’ll call these the Good Server (GS) and the Bad Server (BS). It may be the case that replication has failed on both and then you’ll have the nightmare of deciding which to choose as the GS!

1) You will need the BS to not process any queries from here on in. This may already be the case in a FHRP (e.g. VRRP) environment; but if not, use combinations of stopping services, firewalls, etc. to stop / block access to the BS. It is essential that the BS does not process any queries besides our own during this process.

2) On the BS, execute STOP REPLICA to prevent it replicating from the GS during the process.

3) On the GS, execute:

STOP SLAVE; (to stop it taking replication information from BS);
FLUSH TABLES WITH READ LOCK; (to stop it updating for a moment);
SHOW MASTER STATUS; (and record the output of this);

4) Switch to the BS and import all the data from the GS via something like:

$ mysqldump -h GS -u root -psoopersecret --all-database --quick \
--lock-all-tables | mysql -h BS -u root -psoopersecret;

Note that I am assuming that you are  replicating all databases here. Change as appropriate if not.

5) You can now switch back to the GS and execute UNLOCK TABLES to allow it to process queries again.

6) On the BS, set the master status with the information your recorded from the GS via:

CHANGE REPLICATION SOURCE TO SOURCE_LOG_FILE='mysql-bin.xxxxxx', SOURCE_LOG_POS=yy;

7) Then, again on the BS, execute START REPLICA. The BS should now be replication from the GS again and you can verify this via SHOW REPLICA STATUS.

8) We now need to have the GS replicate from the BS again. On the BS, execute SHOW MASTER STATUS and record the information. Remember that we have stopped the execution of queries on the BS in step 1 above. This is essential.

9) On the GS, using the information just gathered from the BS, execute: 

CHANGE REPLICATION SOURCE TO SOURCE_LOG_FILE='mysql-bin.xxxxxx', SOURCE_LOG_POS=yy;

10) Then, on the GS, execute START REPLICA. You should now have two way replication again and you can verify this via SHOW REPLICA STATUS on the GS.

11) If necessary, undo anything from step 1 above to put the BS back into production.

There is a --master-data switch for mysqldump which would remove the requirement to lock the GS server above but in our practical experience, there are various failure modes for the BS and the --master-data method does not work for them all.

DNS Blacklists, DNS Monitoring and Related Links

Random Links on Writing (Technical) Books

I stumbled upon a blog post by Baron Schwartz, the author of High Performance MySQL, today apropos of nothing. He wrote about his experiences writing the book in What is it like to write a technical book? His editor followed up and Baron finished with some final comments. I’ve always liked writing and found this long post very informative – and, as a geek, I especially liked his use of regular expressions to catch some bad habits he had.

This is turn led me to What I’ve Earned (And Learned) From Writing “Beginning Ruby” by Peter Cooper. Although, I must confess, I’m somewhat uncomfortable with his position on pirating his own book.

The conclusion I draw from these and other posts is that unless you have an extremely popular technical book, don’t give up your day job. Looks like I’ll be shepherding packets and wrangling servers for some time to come!

If these posts were of interest, then these may also be:

Dovecot – checkpassword – bash

The data from Dovecot’s checkpassword authentication mechanism can be read from a bash script via:

read -d $'\0' -r -u 3 USER
read -d $'\0' -r -u 3 PASS

NOCtools and OSS_SNMP Get Support for Multiple Spanning Tree (MST) Protocol

NOCtools (a mixed bag collection of tools and utilities for NOC engineers) and OSS_SNMP (a PHP SNMP Library for People Who HATE SNMP, MIBs and OIDs) have just gotten support for Multiple Spanning Tree.

Specifically, OSS_SNMP has two new MIBS (Cisco’s original MST tree which has a lot of deprecated nodes – MIBS\Cisco\MST; and the newer IEEE tree – MIBS\Cisco\SMST). With these, we can, for example, get an array of [instanceID] => instanceName values from a switch by just coding:

$ciscosw = new \OSS_SNMP\SNMP( $ip, $community );
print_r( $ciscosw->useCisco_SMST()->instances() );

NOCtools has the more impressive use cases of these new features. Specifically (and just likes its RSTP/pvrspt functionality), it can:

  • Show MST port roles (root, designated, alternate, etc) for a given (or all) MST instance(s) – this is equivalent to the RSTP version;
  • From a given device, it can crawl all CDP neighbours and create a graph of all devices, their connecting ports and the MST roles of those ports. This is a really useful feature as it means you don’t need to log into multiple switches to get a handle on what links are blocking. See documentation and a sample diagram here.

Irish Radio Stations on Linux (2013)

This is updating an older article from October 2010. While Linux has come a long way since then for playing back various types of media (and new services such as tunein make it easier again), I still like to just play the radio from the command line.

The following are updating working aliases:

alias newstalk='cvlc http://newstalk.fmstreams.com:8008/listen.pls'
alias rteradio1='cvlc http://av.rasset.ie/av/live/radio/radio1.m3u'
alias rteradio1extra='cvlc http://av.rasset.ie/av/live/radio/radio1extra.m3u'
alias 2fm='cvlc http://av.rasset.ie/av/live/radio/2fm.m3u'
alias todayfm='cvlc http://audiostore.todayfm.com/audio/todayfmIRL_64K.asx'

Git Tips From the Pros

I just came across some useful Git tips on a net tuts+ article  Git Tips From the Pros which includes:

  • Staging your commits in hunks;
  • Grabbing a file from another branch without changing branches;
  • Ignoring changes in a tracked file;
  • Zero-ing out a branches history; and
  • Some very useful aliases.