I was recently asked to look into a problem a client was having with his vSphere vDR backup routines. All guest machines were being successfully backed up apart from one.
The virtual machine that was failing was the proxy server, running MS Forefront TMG on a Windows 2008 R2 guest. vDR reported that the snapshot had failed with the error:
"Failed to create snapshot for proxy, error -3960 ( cannot quiesce virtual machine)"
I started by taking a manual quiesced snapshot to test it outside of vDR.
Sure enough, this produced a similar error:
"Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine."
Naturally, I began scouring the forum posts at http://communities.vmware.com, but found only a wide range of posts about quiesced snapshots, all of which I found rather confusing. All the virtual machines on this system were built from the same template, so they all had the same VMware Tools installed in the same way, yet only this one was having the issue.
In the end, I decided to investigate the TMG services to see whether they were causing VSS to fail. With all TMG services stopped, the snapshots worked with no errors! I then began stopping them selectively to see which one was causing the problem, and found my culprit: the ISASTGCTRL service. This service, described here by Marc Grote, stores the TMG configuration in AD LDS (Active Directory Lightweight Directory Services). When the service is running, snapshots fail; when it is stopped, they succeed.
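The elimination process described above can be sketched in a few lines of Python. This is only an illustration of the logic, not a tool that was used on the system: the function name `find_blocking_service` and the three callables `stop`, `start` and `try_snapshot` are hypothetical placeholders, which in practice would be backed by something like `net stop`, `net start` and a manual quiesced snapshot. The two "Example" service names in the demo are made up.

```python
from typing import Callable, Iterable, Optional


def find_blocking_service(
    services: Iterable[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
    try_snapshot: Callable[[], bool],
) -> Optional[str]:
    """Stop one service at a time and return the first one whose absence
    lets a quiesced snapshot succeed, or None if no single service does."""
    for name in services:
        stop(name)
        try:
            if try_snapshot():
                return name
        finally:
            # Always restart the service so the next test starts from
            # the same baseline.
            start(name)
    return None


if __name__ == "__main__":
    # Fake environment for demonstration only: snapshots fail while
    # ISASTGCTRL is running. The other two service names are invented.
    running = {"ExampleServiceA", "ISASTGCTRL", "ExampleServiceB"}
    culprit = find_blocking_service(
        sorted(running),
        stop=running.discard,
        start=running.add,
        try_snapshot=lambda: "ISASTGCTRL" not in running,
    )
    print(culprit)
```

Restarting each service in the `finally` block matters on a production proxy: every test then runs with exactly one service stopped, and nothing is left down if a snapshot attempt raises an error.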
To allow quiesced snapshots to be taken, I had to set up a freeze and thaw script as follows:
Within the guest operating system, I created the folder C:\Program Files\VMware\VMware Tools\BackupScripts.d\. This folder is not created by default when VMware Tools is installed, but it is required if you want to add pre-snapshot and post-snapshot scripts as I did. Within this folder I created a text file called vss.bat with the following contents:
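@echo off
rem VMware Tools passes freeze, thaw or freezeFail as the first argument.
if "%1" == "freeze" goto freeze
if "%1" == "thaw" goto thaw
if "%1" == "freezeFail" goto freezeFail
rem Unknown or missing argument: do nothing rather than fall through to :freeze.
goto :eof

:freeze
net stop "ISASTGCTRL" /Y
exit

:thaw
net start "ISASTGCTRL" /Y
exit

:freezeFail
net start "ISASTGCTRL" /Y
exit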
When VMware Tools is asked to take a quiesced snapshot, it first checks this folder for any scripts to run. Before the snapshot is taken, it passes the argument ‘freeze’ to the script, and once the snapshot has been taken, it passes the argument ‘thaw’. If quiescing fails, it passes ‘freezeFail’ instead, which is why the script restarts the service in that case too. In this way, the script stops my problematic service and then restarts it.
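To make the call sequence concrete, here is a minimal Python model of which commands vss.bat ends up running for one snapshot attempt. It is a simplified sketch, not a description of VMware Tools internals: the names `ACTIONS`, `script_arguments` and `commands_run` are hypothetical, and the model assumes the script is called exactly twice, first with `freeze` and then with either `thaw` or `freezeFail`.

```python
from typing import Dict, List

# Argument -> command, mirroring the commands in vss.bat.
ACTIONS: Dict[str, str] = {
    "freeze": 'net stop "ISASTGCTRL" /Y',
    "thaw": 'net start "ISASTGCTRL" /Y',
    "freezeFail": 'net start "ISASTGCTRL" /Y',
}


def script_arguments(quiesce_succeeded: bool) -> List[str]:
    """Arguments the script receives, in order, for one snapshot attempt
    (simplified model: 'freeze' first, then 'thaw' or 'freezeFail')."""
    return ["freeze", "thaw" if quiesce_succeeded else "freezeFail"]


def commands_run(quiesce_succeeded: bool) -> List[str]:
    """Commands the script would execute for one snapshot attempt."""
    return [ACTIONS[arg] for arg in script_arguments(quiesce_succeeded)]


if __name__ == "__main__":
    for outcome in (True, False):
        print(outcome, commands_run(outcome))
```

Whichever way the attempt goes, the last command is a `net start`, so the ISASTGCTRL service is only down for the duration of the quiesce itself.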
Now when the scheduled VMware vDR process runs, the appliance is able to take a quiesced snapshot successfully. Happy days.