perl: warning: Falling back to the standard locale (“C”)

Every time I debootstrap a new Debian server for a XenU domain, I get lots of verbose output from Perl scripts:

perl: warning: Setting locale failed.                                      
perl: warning: Please check that your locale settings:                     
        LANGUAGE = (unset),                                                
        LC_ALL = (unset),                                                  
        LANG = "en_IE.UTF-8"                                               
    are supported and installed on your system.                            
perl: warning: Falling back to the standard locale ("C").    

The fix is simple:

# apt-get install locales
# dpkg-reconfigure locales

and select your locales.
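
To see which locale a program will ask for before it falls back to "C", the usual POSIX precedence (LC_ALL, then the category variable, then LANG) can be sketched in Python. This is only an illustration: the function names are made up, only one category is considered, and the list of installed locales is assumed to come from something like `locale -a`.

```python
def requested_locale(env, category="LC_CTYPE"):
    """Return the locale name requested for one category.

    POSIX precedence: LC_ALL, then the category variable, then LANG.
    Unset or empty variables are skipped; the default is "C".
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def normalise(locale_name):
    """Lower-case the codeset and drop hyphens so that
    "en_IE.UTF-8" and "en_IE.utf8" compare equal."""
    base, dot, codeset = locale_name.partition(".")
    return base + dot + codeset.lower().replace("-", "")


def is_installed(locale_name, installed):
    """Check a requested locale against a list of installed names."""
    wanted = normalise(locale_name)
    return any(normalise(name) == wanted for name in installed)


env = {"LANG": "en_IE.UTF-8"}   # the settings from the warning above
installed = ["C", "POSIX"]      # hypothetical `locale -a` output on a fresh system
wanted = requested_locale(env)
print(wanted, is_installed(wanted, installed))
```

If the second value is False, the requested locale has not been generated yet, which is the situation dpkg-reconfigure locales resolves.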

Kubuntu 8.10 and Mobile Broadband (and KDE 4.1)

Kubuntu 8.10 and mobile broadband – the KNetworkManager has come a long way!

I updated my laptop from Kubuntu 8.04 to 8.10 (just released) yesterday. I do 90% of my work on my desktop which needs to just work and, as such, it’s running Kubuntu 7.10. My laptop, however, I play around with.

Most people’s first impression of 8.10 will be based on the upgrade process and post install issues. To date, I’ve always had to fix a lot of problems with the system after an upgrade to make it work. Not this time – it was absolutely seamless.

I was also apprehensive about KDE 4.1 and, to be honest, I was really worried that in a crunch I’d have to fall back to Gnome before downgrading to 8.04. I don’t have the time these days to follow KDE development as closely as I used to; I briefly installed KDE 4 a few months ago and thought it was far from finished.

I’m delighted to report KDE 4.1 is very slick and very polished. I’ve only had it for just over 24 hours but I have no complaints yet.

However, my main motivation for the upgrade was mobile broadband. Like most people, I use my laptop when on the move and my desktop when in the office. My laptop has an Ethernet port and a wi-fi card, both of which worked great with KNetworkManager, but mobile broadband did not. I got O2’s broadband dongle (the small USB stick) about four months ago and rely on it heavily.

I’ve been using Vodafone’s Mobile Connect Client to great effect but there were some issues:

  • setting up the connection was a manual process (change X window access control; su to root; export the DISPLAY setting; and start the application);
  • if I suspended the laptop then I needed to reboot the system to use the dongle again.

While both of the above could be solved, it’s just not plug and play. 8.10 is. With the dongle plugged into the USB port, KNetworkManager discovered the tty port. Configuring it was as easy as right-clicking on the KNetworkManager icon and selecting the New Connection… option for the tty port.

The next step requires knowledge of the O2 / provider settings but this is readily available online. For O2:

KNetworkManager - Settings for O2 Ireland

After the above, I just accepted the defaults for the rest of the options. And – to my delight – it just worked. And it worked after suspending the laptop. And after popping the USB dongle in and out for the heck of it. By clicking the Auto Connect option as part of the process, it also just works when I pop the dongle in.

chan_ss7, pcap files and 64bit machines

UPDATE: April 29th 2008

Anders Baekgaard of Dicea ApS (current chan_ss7 maintainers) recommends the following alternative patch. Please note that mtp3d.c will also need to be patched in the same way:

--- chan_ss7.c~ 2008-04-03 09:23:56.000000000 +0200
+++ chan_ss7.c  2008-04-29 08:29:20.000000000 +0200
@@ -249,11 +249,12 @@

 static void dump_pcap(FILE *f, struct mtp_event *event)
 {
+  unsigned int sec  = event->dump.stamp.tv_sec;
   unsigned int usec  = event->dump.stamp.tv_usec - 
     (event->dump.stamp.tv_usec % 1000) +
     event->dump.slinkno*2 + /* encode link number in usecs */
     event->dump.out /* encode direction in/out */;

-  fwrite(&event->dump.stamp.tv_sec, sizeof(event->dump.stamp.tv_sec), 1, f);
+  fwrite(&sec, sizeof(sec), 1, f);
   fwrite(&usec, sizeof(usec), 1, f);
   fwrite(&event->len, sizeof(event->len), 1, f); /* number of bytes of packet in file */
   fwrite(&event->len, sizeof(event->len), 1, f); /* actual length of packet */

END UPDATE: April 29th 2008

A quickie for the Google trolls:

While trying to debug some SS7 Nature of Address Indicator (NAI) issues, I needed to use chan_ss7’s ‘dump’ feature from the Asterisk CLI. It worked fine but the resultant pcap files always failed with messages like:

# tshark -r /tmp/now
tshark: "/tmp/now" appears to be damaged or corrupt.
(pcap: File has 409000-byte packet, bigger than maximum of 65535)

After much digging about and head-against-wall banging, I discovered the issue is with the packet header in the pcap file. It’s defined by its spec to be:

typedef struct pcaprec_hdr_s {
        guint32 ts_sec;         /* timestamp seconds */
        guint32 ts_usec;        /* timestamp microseconds */
        guint32 incl_len;       /* number of octets of packet saved in file */
        guint32 orig_len;       /* actual length of packet */
} pcaprec_hdr_t;

chan_ss7 uses the timeval struct defined by system headers to represent ts_sec and ts_usec. But, on 64bit machines (certainly mine), its members are defined as long (64 bits wide) rather than as 32-bit ints (presumably as a step to get over the ‘year 2038 bug’), so anything written at its native size is wider than the spec allows. Hence the packet header is all wrong.
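
The effect can be reproduced with Python's struct module. This is a sketch under assumptions: a little-endian machine, the field order suggested by the dump_pcap() code in the update above (tv_sec written at its native 8-byte width, the remaining three fields as 4-byte ints), and made-up sample values.

```python
import struct

# Hypothetical sample values; 409000 is chosen to echo the tshark error above.
sec, usec, length = 1209456000, 409000, 20

# What a 64-bit build writes: an 8-byte tv_sec followed by three 4-byte ints.
written = struct.pack("<qIII", sec, usec, length, length)

# What a pcap reader expects: four 4-byte unsigned ints per record header.
ts_sec, ts_usec, incl_len, orig_len = struct.unpack("<IIII", written[:16])
print(ts_sec, ts_usec, incl_len, orig_len)   # 1209456000 0 409000 20

# Writing the seconds as a 4-byte value restores the expected layout.
fixed = struct.pack("<IIII", sec, usec, length, length)
print(struct.unpack("<IIII", fixed))
```

Every field after the seconds shifts by four bytes, so the reader takes the microseconds value as the packet length, which would explain a "409000-byte packet".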

An easy solution is the following patch in mtp.h:

77a78,90
> /*
>  * The packet header in the pcap file (used for the CLI command 'dump') is defined so has to
>  * have the two time components as unsigned ints. However, on 64bit machines, the system
>  * timeval struct may use unsigned long. As such, we use a custom version here:
>  */
> struct _32bit_timeval
> {
>   unsigned int tv_sec;            /* Seconds.  */
>   unsigned int tv_usec;      /* Microseconds.  */
> };
>
>
>
125c138
<       struct timeval stamp;        /* Timestamp */
---
>       struct _32bit_timeval stamp;        /* Timestamp */

There may be a better way – but this works.

This relates to chan_ss7-1.0.0 from http://www.dicea.dk/company/downloads and I have let them know also. It’s also a known issue for the Wireshark developers (although I did not investigate in detail to see what their resolution was for the future). See the following thread from 1999:

Amazon AWS Keeps Getting Better

Amazon’s Web Services have just launched a health dashboard which should prove very useful and can be found at http://status.aws.amazon.com.

They’ve also announced paid support services.

An eagerly awaited feature for me is persistent EC2 storage which they are trialling right now and hopefully I’ll get into the beta program. Fingers crossed!

Stargate SG1 – Finishing the Story



Following up from a discussion on how the series finale disappointed a few people over on Donncha’s blog, Holy Shmoly!, I thought I might point out that a straight-to-DVD movie which actually ends the two-year story arc has just been released.

You can buy a copy (at a great price thanks to the dollar rate) from here on Amazon.

A spoiler-free review and a discussion on the decision to end the series in this manner can be found here on GateWorld. And just to whet your appetite, a trailer follows below.

I’m in yr datacentur…

The last thing I want to do is point and laugh at anyone else’s problems – God knows we’ve all been in the trenches – but this is just too funny a pun and deserves a link:

http://talideon.com/weblog/2008/02/cooling-problems.cfm

Linux on a Dell Vostro 200

Following a recent post to ILUG asking about setting Linux up on a Dell Vostro 200, I followed up with my notes from the time I had to do it a few months back.

This is just a copy of my notes rather than a how-to but any competent Linux user should have no problem. Apologies in advance for the brevity; with luck you’ll be using a later version of Linux which will already have solved the network issue…

The two main issues and fixes were:

  • The SATA CD-ROM was not recognised initially and the fix was to set the following parameter in the BIOS:

    BIOS -> Integrated Peripherals -> SATA Mode -> RAID

  • The network card is not recognised during a Linux install. Allow install to complete without network and then download Intel’s e1000 driver from http://downloadcenter.intel.com/ or specifically by clicking here. The one I used then was e1000-7.6.9.tar.gz but the current version appears to be e1000-7.6.15.tar.gz (where the above link heads to – check for later versions).

    My only notes of the install just say “essentially follow instructions in README” so I assume they were good enough! Obviously you’ll need Linux kernel headers at least as well as gcc and related tools.

    Once built and installed, a:

    modprobe e1000

    should have you working. Use dmesg to confirm.

Recovering an LVM Physical Volume

Yesterday disaster struck – during a CentOS/RedHat installation, the installer warned (not verbatim): “Cannot read partition information for /dev/sda. The drive must be initialized before continuing.”

Now on this particular server, sda and sdb were/are a RAID1 (containing the OS) and a RAID5 partition respectively and sdc was/is a 4TB RAID5 partition from an externally attached disk chassis. This was a server re-installation and all data from sda and sdb had multiple snapshots off site. sdc had no backups of its 4TBs of data.

The installer discovered the drives in a different order and sda became the externally attached drive. I, believing it to be the internal RAID1 array, allowed the installer to initialise it. Oh shit…

Now this wouldn’t be the end of the world. It wasn’t backed up because a copy of the data exists on removable drives in the UK. It would mean someone flying in with the drives, handing them off to me at the airport, bringing them to the data center and copying all the data back. Then returning the drives to the UK again. A major inconvenience. And it’s also an embarrassment as I should have ensured that sda was what I thought it was via the installer’s other screens.

Anyway – from what I could make out, the installer initialised the drive with a single partition spanning the entire drive.

Once I got the operating system reinstalled, I needed to try and recover the LVM partitions. There’s not a whole lot of obvious information on the Internet for this, which is why I’m writing this post.

The first thing I needed to do was recreate the physical volume. Now, as I said above, I had backups of the original operating system. LVM creates a file containing the metadata of each volume group in /etc/lvm/backup in a file named the same as the volume group name. In this file, there is a section listing the physical volumes and their ids that make up the volume group. For example (the id is fabricated):

physical_volumes {
        pv0 {
                id = "fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn"
                device = "/dev/sdc"     # Hint only

                status = ["ALLOCATABLE"]
                pe_start = 384
                pe_count = 1072319      # 4.09057 Terabytes
        }
}
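
Pulling the ids out of such a backup file by hand is easy enough, but it can also be sketched in a few lines of Python. This is a rough regex-based illustration, not a real parser for LVM's metadata format; the function name is made up and it assumes each pvN block contains no nested braces, as in the example above.

```python
import re


def physical_volumes(backup_text):
    """Map each pvN name to its id and device hint."""
    volumes = {}
    # Each pvN block has no nested braces, so a non-greedy match is enough.
    for name, body in re.findall(r"\b(pv\d+)\s*\{(.*?)\}", backup_text, re.S):
        pv_id = re.search(r'\bid\s*=\s*"([^"]+)"', body)
        device = re.search(r'\bdevice\s*=\s*"([^"]+)"', body)
        volumes[name] = {
            "id": pv_id.group(1) if pv_id else None,
            "device": device.group(1) if device else None,
        }
    return volumes


sample = '''
physical_volumes {
        pv0 {
                id = "fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn"
                device = "/dev/sdc"     # Hint only

                status = ["ALLOCATABLE"]
                pe_start = 384
                pe_count = 1072319      # 4.09057 Terabytes
        }
}
'''
print(physical_volumes(sample))
```

The device entry is only a hint, so the id is the value that matters when recreating the physical volume.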

Note that after I realised my mistake, I installed the OS on the correct partition and after booting, the external drive became /dev/sdc* again. Now, to recreate the physical volume with the same id, I tried:

# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
  Device /dev/sdc not found (or ignored by filtering).

Eh? By turning on verbosity, you find the reason among a few hundred lines of debugging:

# pvcreate -vvvv -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
...
#filters/filter.c:121         /dev/sdc: Skipping: Partition table signature found
#device/dev-io.c:486         Closed /dev/sdc
#pvcreate.c:84   Device /dev/sdc not found (or ignored by filtering).

So pvcreate will not create a physical volume using the entire disk unless I remove partition(s) first. I do this with fdisk and try again:

# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
  Physical volume "/dev/sdc" successfully created
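
For the curious, the kind of check behind "Partition table signature found" can be approximated in Python. This is my reading of why deleting the partitions with fdisk was enough, not LVM's actual filter code: it assumes a classic MBR layout (boot signature 0x55 0xAA in bytes 510-511, four 16-byte partition entries from byte 446) and treats a disk as partitioned only when at least one entry has a non-zero type. All names are made up.

```python
SECTOR_SIZE = 512
TABLE_OFFSET = 446        # start of the four 16-byte MBR partition entries
SIGNATURE = b"\x55\xaa"   # boot signature stored in bytes 510-511


def looks_partitioned(sector):
    """Guess whether a first sector holds an MBR partition table in use."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != SIGNATURE:
        return False
    # Byte 4 of each entry is the partition type; 0 means the slot is unused.
    types = [sector[TABLE_OFFSET + 16 * i + 4] for i in range(4)]
    return any(types)


def first_sector(device_path):
    """Read the first sector of a device or image (needs read permission)."""
    with open(device_path, "rb") as dev:
        return dev.read(SECTOR_SIZE)
```

Something like looks_partitioned(first_sector("/dev/sdc")), run as root, would then give a quick answer before reaching for pvcreate -vvvv.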

Great. Now to recreate the volume group on this physical volume:

# vgcreate -v md1000 /dev/sdc
    Wiping cache of LVM-capable devices
    Adding physical volume '/dev/sdc' to volume group 'md1000'
    Archiving volume group "md1000" metadata (seqno 0).
    Creating volume group backup "/etc/lvm/backup/md1000" (seqno 1).
  Volume group "md1000" successfully created

Now I have an “empty” volume group with no logical volumes. I know all the data is there as the initialization didn’t format or wipe the drive. I’ve retrieved the LVM backup file called md1000 and placed it in /tmp/lvm-md1000. When I try to restore it to the new volume group I get:

# vgcfgrestore -f /tmp/lvm-md1000 md1000
  /tmp/lvm-md1000: stat failed: Permission denied
  Couldn't read volume group metadata.
  Restore failed.

After a lot of messing, I copied it to /etc/lvm/backup/md1000 and tried again:

# vgcfgrestore -f /etc/lvm/backup/md1000 md1000
  Restored volume group md1000

I don’t know if it was the location, the renaming or both but it worked.

Now the last hurdle is that on a lvdisplay, the logical volumes show up but are marked as:

  LV Status              NOT available

This is easily fixed by marking the logical volumes as available:

# vgchange -ay
  2 logical volume(s) in volume group "md1000" now active

Agus sin é. My logical volumes are recovered with all data intact.

* how these are assigned is not particularly relevant to this story.

Amazon Web Service’s ec2-bundle-image on Ubuntu

This is really a post for Google’s crawlers on getting AWS’s EC2 AMI tools working under Ubuntu (I’m currently on Gutsy 7.10). Despite any bitching I may do below, EC2 and S3 are cool services.

The first problem is that AWS only distribute the tools as an RPM (really guys? I mean FFS). Convert and install with alien.

# apt-get install alien
# alien -k ec2-ami-tools.noarch.rpm
# dpkg -i ec2-ami-tools_1.3-15283_all.deb

Make sure you also install libopenssl-ruby.

Set your Ruby path as the RPM places them where RedHat expects to find them:

# export RUBYLIB="/usr/lib/site_ruby"

Now when you run the utility, you’ll probably get:

$ ec2-bundle-image -r ... -i ... -k ... -c ... -u ...
sh: Syntax error: Bad substitution

Apparently Ubuntu switched from invoking bash to dash for sh somewhere along the line. Just relink it (temporarily or permanently as suits):

# rm /bin/sh
# ln -s /bin/bash /bin/sh

And you should be good to go.
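
Before or after relinking, it may be worth confirming what sh actually resolves to. A minimal Python sketch (the function name is made up, and it assumes /bin/sh is a symlink, possibly through several hops):

```python
import os


def shell_behind(path="/bin/sh"):
    """Return the basename of whatever `path` finally resolves to."""
    return os.path.basename(os.path.realpath(path))


print(shell_behind())
```

If this prints dash, scripts that start with #!/bin/sh but rely on bash-only substitutions are likely to fail in the way shown above.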

One other issue I encountered was that the permissions of the directories were for root only (i.e. /usr/local/aes, /usr/lib/site_ruby/ and /etc/aes). A very sloppy chmod a+rX on each of these will resolve that. Although I suspect it’s more to do with the fact that I used rpm2cpio and cpio rather than alien the first time around.

Nagios Plugin to Check the Status of PRI Lines in Asterisk

I have a number of Asterisk implementations that I keep an eye on that have multiple PRI connections. Knowing if and when they ever go down has the obvious benefit of alerting me to a problem in near real time. But besides that, it allows my customers and me to verify SLAs, track and log issues, etc.

To this end, I have written a Nagios plugin which queries Asterisk’s manager interface and executes the pri show spans CLI command (this is Asterisk 1.4 by the way). The script then parses the output to ascertain whether a PRI is up or not.

The actual code to connect to the manager interface and execute the query is simply:

if( ( $astsock = fsockopen( $host, $port, $errno, $errstr, $timeout ) ) === false )
{
    echo "Could not connect to Asterisk manager: $errstr";
    exit( STATUS_CRITICAL );
}

fputs( $astsock, "Action: Login\r\n");
fputs( $astsock, "UserName: $username\r\n");
fputs( $astsock, "Secret: $password\r\n\r\n"); 

fputs( $astsock, "Action: Command\r\n");
fputs( $astsock, "Command: pri show spans\r\n\r\n");

fputs( $astsock, "Action: Logoff\r\n\r\n");

// Collect the whole response; Asterisk closes the socket after the Logoff.
$asttext = '';

while( !feof( $astsock ) )
{
    $asttext .= fread( $astsock, 8192 );
}

fclose( $astsock );

if( strpos( $asttext, "Authentication failed" ) !== false )
{
    echo "Asterisk manager authentication failed.";
    exit( STATUS_CRITICAL );
}
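
The framing used in the fputs() calls above is simple: each action is a block of "Key: value" lines separated by \r\n and ended by a blank line. As an illustration only (the helper name is made up and this is not part of the plugin), the same requests could be built in Python like this:

```python
def ami_action(action, **headers):
    """Format one manager action: "Key: value" lines ended by a blank line."""
    lines = ["Action: %s" % action]
    lines += ["%s: %s" % (key, value) for key, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


# Placeholder credentials; the three actions mirror the PHP above.
request = (
    ami_action("Login", UserName="user", Secret="pass")
    + ami_action("Command", Command="pri show spans")
    + ami_action("Logoff")
)
print(repr(request))
```

Sending all three actions up front and then reading until the socket closes keeps the client trivial, at the cost of not checking the login response before issuing the command.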

This plugin is hard coded to English and expects to find Provisioned, Up, Active for a good PRI. For example, the Asterisk implementations that support the pri show spans command that I have access to return one of:

  • PRI span 1/0: Provisioned, In Alarm, Down, Active
  • PRI span 3/0: Provisioned, Up, Active
  • PRI span 2/0: Up, Active
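
The parsing step can be sketched as follows. This is a Python illustration of the idea rather than the plugin's PHP code; the function name is made up, and it assumes a span is good when its status list contains both Up and Active, whether or not Provisioned is present.

```python
import re

SPAN_LINE = re.compile(r"PRI span (\d+)/(\d+):\s*(.*)")


def span_states(output):
    """Map each span (e.g. "1/0") to True when it is Up and Active."""
    states = {}
    for line in output.splitlines():
        match = SPAN_LINE.search(line)
        if not match:
            continue  # ignore manager headers and anything else
        flags = [flag.strip() for flag in match.group(3).split(",")]
        span = "%s/%s" % (match.group(1), match.group(2))
        states[span] = "Up" in flags and "Active" in flags
    return states


sample = """PRI span 1/0: Provisioned, In Alarm, Down, Active
PRI span 3/0: Provisioned, Up, Active
PRI span 2/0: Up, Active"""
print(span_states(sample))
```

Counting the True values gives the number of available PRIs to compare against the thresholds.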

I’m actually running a slightly older version of Nagios at the moment, version 1.3. To integrate the plugin, first add the following command definition to an appropriate existing or new file under /etc/nagios-plugins/config/:

define command{
        command_name    check_asterisk_pri
        command_line    /usr/lib/nagios/plugins/check_asterisk_pri.php \
             -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -w $ARG3$ \
             -c $ARG4$ -n $ARG5$
}

where $ARG1$ is the Asterisk manager username and $ARG2$ is the password. $ARG3$ and $ARG4$ are the warning and critical thresholds respectively: if the number of available PRIs reaches one of these values, the appropriate error condition will be set. Lastly, $ARG5$ is the number of PRIs the plugin should expect to find.

NB: the command_line line above is split for readability but it should all be on the one line.
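
The threshold logic can be sketched like this, again in Python rather than the plugin's PHP. It assumes "reaches" means "at or below", uses the standard Nagios plugin exit codes, and leaves out the check against the expected number of PRIs; the function name is made up.

```python
OK, WARNING, CRITICAL = 0, 1, 2   # standard Nagios plugin exit codes


def pri_status(up_count, warning, critical):
    """Turn the number of available PRIs into a Nagios status code."""
    if up_count <= critical:
        return CRITICAL
    if up_count <= warning:
        return WARNING
    return OK


# With a warning threshold of 2 and a critical threshold of 1:
for up in (4, 2, 1):
    print(up, pri_status(up, warning=2, critical=1))
```

Checking the critical threshold first matters, since a count that is at or below the critical value is also at or below the warning value.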

Now create a test for a host in an appropriate file in /etc/nagios/config/:

define service{
        use                             core-service
        host_name                       hostname.domain.ie
        service_description             Asterisk PRIs
        check_command                   check_asterisk_pri!user!pass!2!1!4
}

Ensure that your Nagios server has permissions to access the Asterisk server via TCP on the Asterisk manager port (5038 by default). If on a public network, this should be done via stunnel or a VPN for security reasons.

Lastly, you’ll need a user with the appropriate permissions and host allow statements in your Asterisk configuration (/etc/asterisk/manager.conf):

[username]
secret = password
deny=0.0.0.0/0.0.0.0
permit=1.2.3.4/255.255.255.255
read = command
write = command

The next version may include support for BRI and Zap FXO ports also. I also plan on a Cacti plugin to show the channels on each PRI (up – on a call, down, etc). In any case, updates will be posted here.

The plugin can be downloaded from: http://www.opensolutions.ie/misc/check_asterisk_pri.php.txt

UPDATED 20/03/2012: Asterisk 1.8.9 takes out the word “Provisioned” in “pri show spans”. Thanks to Shane O’Cain.