Yesterday disaster struck – during a CentOS/RedHat installation, the installer asked (not verbatim): “Cannot read partition information for /dev/sda. The drive must be initialized before continuing.”
Now on this particular server, sda and sdb were/are a RAID1 array (containing the OS) and a RAID5 partition respectively, and sdc was/is a 4TB RAID5 partition from an externally attached disk chassis. This was a server re-installation and all of the data on sda and sdb had multiple off-site snapshots. sdc had no backups of its 4TB of data.
The installer discovered the drives in a different order and sda became the externally attached drive. I, believing it to be the internal RAID1 array, allowed the installer to initialise it. Oh shit…
Now this wouldn’t be the end of the world. It wasn’t backed up because a copy of the data already exists on removable drives in the UK. It would mean someone flying in with the drives, handing them off to me at the airport, me bringing them to the data center, copying all the data back and then returning the drives to the UK again. A major inconvenience. It’s also an embarrassment, as I should have confirmed that sda was what I thought it was via the installer’s other screens.
Anyway – from what I could make out, the installer initialised the drive with a single partition spanning the entire drive.
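For what it’s worth, a quick look at the new partition table (against whatever device name the array ends up with after reboot – /dev/sdc in my case, see below) shows what the installer left behind:
# fdisk -l /dev/sdc
If the installer did what I think it did, that will list a single partition covering the whole disk.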
Once I got the operating system reinstalled, I needed to try and recover the LVM partitions. There’s not a whole lot of obvious information on the Internet for this, which is why I’m writing this post.
The first thing I needed to do was recreate the physical volume. Now, as I said above, I had backups of the original operating system. LVM keeps a backup of each volume group’s metadata in /etc/lvm/backup, in a file named after the volume group. In this file there is a section listing the physical volumes that make up the volume group, along with their ids. For example (the id is fabricated):
physical_volumes {
    pv0 {
        id = "fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn"
        device = "/dev/sdc"    # Hint only
        status = ["ALLOCATABLE"]
        pe_start = 384
        pe_count = 1072319    # 4.09057 Terabytes
    }
}
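The pv0 id above is the crucial piece of information – it’s the UUID the recreated physical volume will need to have for the restored metadata to match up. One quick way of fishing it out of a retrieved copy of the backup file (I put mine in /tmp/lvm-md1000, as you’ll see below) is something like:
# grep -A 2 'pv0 {' /tmp/lvm-md1000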
Note that after I realised my mistake, I installed the OS on the correct partition and after booting, the external drive became /dev/sdc* again. Now, to recreate the physical volume with the same id, I tried:
# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
Device /dev/sdc not found (or ignored by filtering).
Eh? By turning on verbosity, you find the reason among a few hundred lines of debugging:
# pvcreate -vvvv -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
...
#filters/filter.c:121 /dev/sdc: Skipping: Partition table signature found
#device/dev-io.c:486 Closed /dev/sdc
#pvcreate.c:84 Device /dev/sdc not found (or ignored by filtering).
So pvcreate will not create a physical volume using the entire disk unless I remove the partition(s) first. I do this with fdisk.
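From memory, the fdisk session is nothing more than deleting that single partition and writing the empty table back out – something like this (not a verbatim transcript):
# fdisk /dev/sdc
Command (m for help): d
Partition number (1-4): 1
Command (m for help): w
With the partition table cleared out, I try pvcreate again: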
# pvcreate -u fvrw-GHKde-hgbf43-JKBdew-rvKLJc-cewbn /dev/sdc
Physical volume "/dev/sdc" successfully created
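Before going any further, it’s worth a quick pvdisplay to confirm the new physical volume really did pick up the UUID from the backup file:
# pvdisplay /dev/sdc
The PV UUID line should match the id from the pv0 section above.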
Great. Now to recreate the volume group on this physical volume:
# vgcreate -v md1000 /dev/sdc
Wiping cache of LVM-capable devices
Adding physical volume '/dev/sdc' to volume group 'md1000'
Archiving volume group "md1000" metadata (seqno 0).
Creating volume group backup "/etc/lvm/backup/md1000" (seqno 1).
Volume group "md1000" successfully created
Now I have an “empty” volume group – one with no logical volumes. I know all the data is still there as the initialization didn’t format or wipe the drive. I’ve retrieved the LVM backup file called md1000 and placed it in /tmp/lvm-md1000. When I try to restore it to the new volume group I get:
# vgcfgrestore -f /tmp/lvm-md1000 md1000
/tmp/lvm-md1000: stat failed: Permission denied
Couldn't read volume group metadata.
Restore failed.
After a lot of messing, I copied it to /etc/lvm/backup/md1000 and tried again:
# vgcfgrestore -f /etc/lvm/backup/md1000 md1000
Restored volume group md1000
I don’t know if it was the location, the renaming, or both, but it worked.
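I never did pin down the root cause, but if you hit the same “stat failed: Permission denied” error it’s worth checking the ownership and mode of the copied file (and, on a box with SELinux enforcing, its security context) before blaming the location:
# ls -l /tmp/lvm-md1000
# ls -Z /tmp/lvm-md1000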
Now the last hurdle is that, in the output of lvdisplay, the logical volumes show up but are marked as:
LV Status NOT available
This is easily fixed by marking the logical volumes as available:
# vgchange -ay
2 logical volume(s) in volume group "md1000" now active
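Before declaring victory, a couple of sanity checks are in order – lvscan should now list the logical volumes as ACTIVE, and each filesystem can be checked and then mounted read-only first (the LV name data below is just a stand-in for whatever your logical volumes are actually called):
# lvscan
# fsck -n /dev/md1000/data
# mount -o ro /dev/md1000/data /mnt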
Agus sin é – and that’s it. My logical volumes are recovered with all data intact.
* How these are assigned is not particularly relevant to this story.