The main data volume on the system at work ran out of PEs: the volume group was created with the old default PE size of 4 MB, and there’s a hard limit of 64K PEs in a volume group. There’s no way to change the PE size without recreating the volume group and blowing away everything on it. So I have to move the 250 gigs of crap somewhere else, delete the old volume group, recreate it, then move everything back. Moving that much data takes several hours, and hence downtime for users, which is not good.
So, the idea is to move the data around live. This requires making a RAID 1 mirror on top of lvm (not how it’s usually done). It also requires learning enough about mdadm to create a mirror without destroying the good data (ouch). Initially creating the mirror requires unmounting the regular lvm device and mounting the RAID (md) device in its place, but after that is done, the syncing of the mirror can happen live. Hence downtime is kept to an absolute minimum.
There’s a real dearth of information out there on doing this type of thing, and maybe for good reason. I’m also disappointed that it’s quite a bit of hassle compared to how I used to do this sort of thing on a DG/UX (Data General) system back in the mid 90s.
- Create a new phys volume on temporary disk
- Create a new volume group on new disk with larger extent size (32 megs)
- Mirror the old data and new data together. NOTE: This requires RAID on top of lvm, not lvm on top of RAID
- Break the mirror after it syncs, leaving data on the temporary disk
- Create a new PV on the original disk with 32 meg PEs
- Extend the volume group onto the new PV
- Move the PEs from the temporary disk to the original disk
- Shrink the volume group so it lives just on the original disk (a rough sketch of these last four steps follows this list)
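For the record, the LVM side of those last four steps should look roughly like this. This is a sketch only; it assumes the demo names used below (new volume group san2, temporary disk /dev/sdc, original disk /dev/sdd) and that pvmove behaves itself on your LVM version, so test it first:

pvcreate /dev/sdd      # original disk, after its old volume group is destroyed
vgextend san2 /dev/sdd # the new PV inherits the volume group's larger extent size
pvmove /dev/sdc /dev/sdd # migrate all extents off the temporary disk
vgreduce san2 /dev/sdc # volume group now lives only on the original disk
vgrename san2 san0     # optional, if you want the original name back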
Of course, before doing any of this, testing and documentation are required, hence this post. Also, this procedure was used and tested on RHEL 3. For the love of your job, data, sanity, and all that is holy, do not blindly trust what I am saying here. Use it as a guide alongside other docs and test it on a non-production box with data you can afford to lose. Then, before doing it on a production box, make sure you have safe backups, preferably multiple sets.
Some commands to use to get a feel for what is already on your disk:
vgdisplay -v vg_group_name
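pvdisplay and lvdisplay give the same sort of rundown from the physical volume and logical volume sides, respectively (device names here are from the demo below):

pvdisplay /dev/sdd
lvdisplay /dev/san0/home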
For the purposes of the demo, the good data is in volume group san0 on physical device /dev/sdd. The temporary disk will be in a volume group named san2 on /dev/sdc.
Initial test set up
vgcreate -s 16m san2 /dev/sdc # NOTE: Using 16m as extent size just for testing
lvcreate -l 60 --name newhome san2
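If you’re building the whole sandbox from scratch, the “good data” side can be faked the same way. This part is purely my assumption for a disposable test box (it presumes /dev/sdd is free there):

vgcreate -s 16m san0 /dev/sdd
lvcreate -l 60 --name home san0
mke2fs -j /dev/san0/home # ext3, same as the real volume
mount /dev/san0/home /mnt/home
cp -a /etc /mnt/home/    # junk data to verify against later
umount /mnt/home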
RAID 1 (mirror) background
The idea is to create a RAID 1 mirror consisting of the main data disk and a “missing” disk, which starts the array off in degraded mode. After that we can hot add the temporary (new) disk to it, and it will rebuild the mirror onto that disk. Once that is done, we mark the original disk as faulty and are left with the data on the new disk. Then we kill the mirror and go back to naked lvm.
Finding info on whether you could create a mirror and preserve the old data, let alone do it on top of lvm, was difficult; my googling could not find anyone who had done it.
Creating the mirror
THE ORDER OF THE DEVICES BELOW IS CRITICAL. The first device will be the master and will copy to the second device.
mdadm --create /dev/md0 -l 1 -n 2 /dev/san0/home /dev/san2/newhome
mdadm --detail /dev/md0
mount /dev/md0 /mnt/home
If that bothers you (and it really should), you can create the array with just the original disk, mount it, then hot add the second disk into the array once you know the data is there. Also, if you’re paranoid after reading the various notes below and intend to fsck the md device before mounting, doing it this way will save you time, since the fsck won’t be beating on the disk at the same time as the mirror sync.
mdadm --create /dev/md0 -l 1 -n 2 /dev/san0/home missing
mdadm --detail /dev/md0
fsck -f /dev/md0
mount /dev/md0 /mnt/home
mdadm /dev/md0 -a /dev/san2/newhome
At this point the mirror should start rebuilding. While rebuilding, it will say it has three devices and the missing one will still be listed, but once the rebuild is done only the two real devices show up (though the total number of devices still says three). Maybe there’s a better way to do this, but it works.
Run the --detail option again to monitor when the drive is done rebuilding. At this point we should be able to break off the original disk and be left with our data on the new disk.
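You can also watch the rebuild via /proc/mdstat, which shows a progress bar and percentage:

watch cat /proc/mdstat # shows rebuild progress, refreshing every 2 seconds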
But before doing that, just in case, we should create an mdadm.conf file so if the array has to be stopped and restarted, we don’t have to scan for it.
echo -e 'DEVICE\t/dev/san0/home /dev/san2/newhome' >> /etc/mdadm.conf
mdadm --detail --scan >> /etc/mdadm.conf
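For reference, the resulting /etc/mdadm.conf should end up looking something like this (UUID shown as a placeholder; yours will differ):

DEVICE /dev/san0/home /dev/san2/newhome
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx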
Stopping / Restarting the array
mdadm --stop --scan # will stop the array
mdadm --assemble --scan # will start the array iff the mdadm.conf file above was appended with the correct info
mdadm -Ac partitions -m 0 /dev/md0 # will start it back up (if there is no config file)
Breaking the mirror
After, and only after, verifying that the mirror sync is done (e.g. with mdadm --detail /dev/md0), the mirror can be broken and we’ll have two identical mountable ext3 lvm partitions: one on the original disk, one on the temporary (new) disk. Before killing the mirror, be sure to unmount any file systems using it. When checking the rebuild status with mdadm, look for a line that says “Rebuild Status :”; if that line is not there, the rebuild is done.
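In the demo layout, unmounting means:

umount /mnt/home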
mdadm --manage --set-faulty /dev/md0 /dev/san0/home
mdadm --manage --set-faulty /dev/md0 /dev/san2/newhome
mdadm --stop --scan
Once this is done we can actually mount and use either partition, but the RAID “superblock” will still be associated with each one, and that could lead to the array accidentally being restarted. To really destroy the mirror, the RAID superblock needs to be zeroed out.
mdadm --misc --zero-superblock /dev/san2/newhome
mdadm --misc --zero-superblock /dev/san0/home
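At this point, on the demo layout, you’d verify the copy on the temporary disk and put it back into service (per the fsck notes below, the check should pass once the mirror is broken):

fsck -f /dev/san2/newhome
mount /dev/san2/newhome /mnt/home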
NOTE: This procedure goes for minimal downtime at the risk of file system safety. Before starting, you really should fsck the disk you are going to mirror, offline. Also, read the notes below.
NOTE: The set-faulty commands above aren’t really needed if you’re just going to zero the superblocks right away. But if you don’t zero them, you could technically mount both partitions, make changes, then unmount them and restart the mirror, with probably really bad effects.
NOTE: fsck will fail on the mirror device because the size of the device shrinks a wee bit. However, once the mirror is broken, an fsck should work and pass OK.
NOTE: When you import a disk with an ext3 file system into a mirror like this, the usable size of the partition shrinks a bit, because the md superblock is stored at the end of the partition. Hence, while it will mount, it will fail an fsck, and there’s the risk (I think) that real data might overwrite the RAID superblock, especially if the partition is filled. To be very safe, one should run ‘fsck -f /dev/md0 ; resize2fs /dev/md0’ against the new RAID partition before remounting. However, once you do this, when you destroy the mirror the filesystem should probably be resized out to its original size again.
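That resize back would look roughly like this, a sketch assuming your e2fsprogs supports offline growing (resize2fs with no size argument grows the file system to fill the device, and it wants a clean fsck -f first):

fsck -f /dev/san2/newhome
resize2fs /dev/san2/newhome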