vSphere vDR 1.2 LVM limitation and workaround


One of our users at virtualDCS was recently experiencing problems recovering data with the ‘File Level Recovery’ (FLR) tool from vSphere ‘VMware Data Recovery’ (vDR) release 1.2 within their CentOS Linux VM.

I won’t go into detail describing the tool, as it has already been expertly described and documented elsewhere.

Although the vDR appliance was reporting successful backups, the FLR utility was not mounting all of the partitions when accessing a selected restore point. The virtual machine in question was running CentOS 5.4 32-bit with just a single vmdk, but its partition layout caused issues with vDR.

The disk was configured with a small /boot partition and two larger LVM partitions as follows:

[root@localhost ~]#fdisk -l

Disk /dev/sda: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM
/dev/sda3            1306       10443    73400985   8e  Linux LVM

It was these two type 8e LVM partitions that vDR had issues with.

The standard vDR FLR mount looked OK, but only recovered the non-LVM partition on /dev/sda1, as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -a 172.16.1.10

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
Created "/root/2011-02-07-00.09.09/Mount1"

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

To see the details of the problem, you need to run the FLR tool in verbose mode with the -v switch, as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -a 172.16.1.10 -v

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
findRestorePointNdx: searching for 98
Restore Point 98 has been found...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
"/SCSI-0:2/"
"Mon Feb  7 00:09:09 2011"

Initializing vix...
VixDiskLib: config options: libdir '/opt/vdr/VMwareRestoreClient/disklibpluginvcdr', tmpDir '/tmp/vmware-root'.
VixDiskLib: Could not load default plugins from /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: Cannot open library: /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: cannot open shared object file: No such file or directory.
DISKLIB-PLUGIN : Not loading plugin /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libvdrplugin.so.1.0: Not a shared library.
VMware VixDiskLib (1.2) Release build-254294
Using system libcrypto, version 9080CF
VixDiskLib: Failed to load libvixDiskLibVim.so : Error = libvixDiskLibVim.so: cannot open shared object file: No such file or directory.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/etc/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /etc/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/settings": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/settings. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /root/.vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/preferences": No such file or directory.
----------------------------------------
PREF Failed to load user preferences.
DISKLIB-LINK  : Opened 'vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499' (0x1e): plugin, 167772160 sectors / 80 GB.
DISKLIB-LIB   : Opened "vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499" (flags 0x1e, type plugin).
DISKLIB-LIB   : CREATE CHILD: "/tmp/flr-4499-w4U6cS" -- twoGbMaxExtentSparse grainSize=128
DISKLIB-DSCPTR: "/tmp/flr-4499-w4U6cS" : creation successful.
PREF early PreferenceGet(filePosix.coalesce.enable), using default
PREF early PreferenceGet(filePosix.coalesce.aligned), using default
PREF early PreferenceGet(filePosix.coalesce.count), using default
PREF early PreferenceGet(filePosix.coalesce.size), using default
PREF early PreferenceGet(aioCusClone1r.numThreads), using default
--- Mounting Virtual Disk: /tmp/flr-4499-w4U6cS ---
SNAPSHOT: IsDiskModifySafe: Scanning directory of file /tmp/flr-4499-w4U6cS for vmx files.
Disk flat file mounted under /var/run/vmware/fuse/2848693010656666867
VixMntapi_OpenDisks: Mounted disk /tmp/flr-4499-w4U6cS at /var/run/vmware/fuse/2848693010656666867/flat.
Mounting Partition 1 from disk /tmp/flr-4499-w4U6cS
Created "/root/2011-02-07-00.09.09/Mount1"
MountsDone: LVM volume detected, start: 106928640, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
MountsDone: LVM volume detected, start: 10733990400, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm version 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
LVM version:     2.02.46-RHEL5 (2009-09-15)
Library version: 1.02.32 (2009-05-21)
Driver version:  4.11.5
System: end results...
System: command "lvm version 2>&1" completed successfully
LoopMountSetup: Setup loop device for "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: Setup loop device for "/dev/loop2" (offset: 2144055808) : "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm vgdisplay 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
--- Volume group ---
VG Name               VolGroup00
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  24
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                4
Open LV               4
Max PV                0
Cur PV                2
Act PV                1
VG Size               79.88 GB
PE Size               32.00 MB
Total PE              2556
Alloc PE / Size       2556 / 79.88 GB
Free  PE / Size       0 / 0
VG UUID               KTg9lK-J48t-P6sw-03lC-TjAX-d5n6-8qcAEx

System: end results...
System: command "lvm vgdisplay 2>&1" completed successfully
LVMFindInfo: found "VG Name" -> "VolGroup00"
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
PV /dev/loop1       VG VolGroup00   lvm2 [9.88 GB / 0    free]
PV unknown device   VG VolGroup00   lvm2 [70.00 GB / 0    free]
Total: 2 [79.88 GB] / in use: 2 [79.88 GB] / in no VG: 0 [0   ]
System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1" completed successfully
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
No physical volume label read from /dev/loop2
Failed to read physical volume "/dev/loop2"
--- Physical volume ---
PV Name               /dev/loop1
VG Name               VolGroup00
PV Size               9.90 GB / not usable 22.76 MB
Allocatable           yes (but full)
PE Size (KByte)       32768
Total PE              316
Free PE               0
Allocated PE          316
PV UUID               Qqk2st-jiXP-k281-A1Ug-nCtM-rn0a-I8eXlX

System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1" failed with error 1280
LoopDestroy: Removed loop device "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopDestroy: Removed loop device "/dev/loop2" (offset: 10733990400) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: LVM mounts terminating due to fatal error
VdrVixMountDone: Failed 1

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

Once again, only the non-LVM /boot partition held on /dev/sda1 was mounted, and you can see from the results above that the LVM mounts failed due to a fatal error. Interestingly, the loop device for the second LVM partition was set up at offset 2144055808 even though the volume had been detected at offset 10733990400; 2144055808 is exactly 10733990400 modulo 2^32, which suggests the offset was being truncated to 32 bits somewhere, leaving /dev/loop2 pointing at the wrong part of the disk and the second physical volume unreadable.
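
A quick way to sanity-check that arithmetic, from any shell with 64-bit integer support (bash included):

echo $((10733990400 % 4294967296))
2144055808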

I wasn’t sure whether there was an undocumented incompatibility with my LVM version or fuse version, so I took the easy route and logged a call with VMware Support (SR# 1589684961). After eliminating the obvious, the ticket was escalated to a Research Engineer, who was excellent (aren’t they all?). He told me that VMware was aware of an issue with multiple LVM partitions and was expecting to include a fix in an upcoming release of vDR.

That was great, but my customer needed to ensure his backup process allowed FLR restores. I had to find a workaround that could be implemented without requiring a reboot, as the virtual machine in question was aiming for 100% uptime.

My plan was to add a new vmdk to the VM, migrate the data off the two existing LVM partitions, remove them both, then create a single LVM partition on the original disk and migrate the data back, before finally removing the temporary disk.
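
Before starting, it’s worth capturing the current LVM layout so that each step can be verified as you go:

pvs
vgs
lvs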

This is the procedure I used:

***hot add new 80GB thin SCSI disk as SCSI0:1

echo "scsi add-single-device" 0 0 1 0 > /proc/scsi/scsi

***partition the new disk

fdisk /dev/sdb
n
p
1

Accept first and last cylinders to use all space

***format partition as LVM type 8e

t
1
8e
w
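
As an aside, if you would rather not drive fdisk interactively, sfdisk can create the same single type 8e partition spanning the whole disk in one line (a sketch, assuming /dev/sdb is the new, empty disk):

echo ',,8e' | sfdisk /dev/sdb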

***prepare the new partition for LVM

pvcreate /dev/sdb1

***add the partition to the existing LVM Volume Group

vgextend VolGroup00 /dev/sdb1
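
At this point the volume group should show one more physical volume than before, with enough free extents to hold the data being moved:

vgdisplay VolGroup00
pvs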

***move the data off /dev/sda2 and /dev/sda3

pvmove /dev/sda2 /dev/sdb1
pvmove /dev/sda3 /dev/sdb1
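
Note that pvmove can take a long time on large volumes. The -i switch sets the progress reporting interval in seconds, and a move can be cancelled with --abort if needed (both are standard LVM2 options, but check the man page for your version), for example:

pvmove -i 10 /dev/sda2 /dev/sdb1
pvmove --abort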

***remove /dev/sda2 and /dev/sda3 from the VolGroup

vgreduce VolGroup00 /dev/sda2
vgreduce VolGroup00 /dev/sda3

***unprepare the original partitions

pvremove /dev/sda2
pvremove /dev/sda3
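
With the original partitions out of the picture, pvs should now show only /dev/sdb1 in VolGroup00:

pvs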

***delete the original partitions and create a single new bigger one

fdisk /dev/sda
d
2
d
3
n
p
2

Accept first and last cylinders to use all space

t
2
8e
w

***instead of rebooting to recognise the partition you can just run

partprobe

I didn’t have parted installed, so before I could probe the partitions, I had to run

yum install parted
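
Once partprobe has run, you can confirm the kernel sees the new /dev/sda2 without a reboot:

grep sda /proc/partitions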

***prepare the new partition for LVM:

pvcreate /dev/sda2

***add the partition to the existing Vol Group

vgextend VolGroup00 /dev/sda2

***next move the data back off /dev/sdb1

pvmove /dev/sdb1 /dev/sda2

***remove the temp disk from the LVM Volume Group

vgreduce VolGroup00 /dev/sdb1

***unprepare the partition

pvremove /dev/sdb1

***delete the partition

fdisk /dev/sdb
d
1
w

***remove the temporary disk from the virtual machine using vCenter, then finally:

echo "scsi remove-single-device" 0 0 1 0 > /proc/scsi/scsi

I have no idea if the above will be of use to anyone, so please let me know if you find it helpful in any way. The next release of vDR will include a new version of the FLR tool anyway, so let’s hope the issue is resolved in that.
