vCloud Director 101

I decided to write this blog (read it in reference to my slideshow) to give you some guidance on the complex terminology of VMware vCloud Director. I will refer to it as vCD from now on to save my poor fingers.

If you’re a vSphere admin, vCD terminology is very different: it uses new terms to label layers. One way to picture this is an onion; as you peel away the layers you get to the core, or centre, of the onion. vCD is an abstraction layer above your infrastructure. It hides all the bits and pieces your users don’t need to see, and that you don’t want them to mess around with!

Massimo wraps it all up nicely in the quote below. Check out his blog for all things vCD.

“Think about how difficult it is to implement something that allows an end-user to create, in self-service mode, separate layer 2 network segments, define custom layer 3 IP policies, configure services such as DHCP, NAT and Firewall… all without having to ask the vSphere / cloud administrator to do all that for you, all without messing up with the cloud-wide setup, all without causing conflicts with the other tenants on the cloud. This is a titanic effort, believe me.”

This blog will not go through the install of vCD, as that is beyond the scope of this article, but have a look over at Kendrick Coleman’s blog, as he has a fantastic walkthrough of a vCD install. Now let’s tackle the terminology you need to understand; these are the terms the wizard prompts you for once installation has completed and you’re ready to create your first tenant.

So, as my vCD slide outlines, what is vCloud Director? It’s the wrapper around your vSphere infrastructure: it hides the complex bits and automates the creation of VMs and networks without admin intervention.
What is a vCD Cell?

• An instance of vCloud Director

• Can be scaled by adding multiple cells behind a load balancer

• Scales up to 10,000 VMs and 25 vCenter Servers

• Creates virtual datacentres by pooling resources into new units of consumption

• Secures and isolates tenants with vShield, LDAP and RBAC policies

Components of a vCD Deployment

• Minimum of two ESXi hosts with vSphere Enterprise or Enterprise Plus licences

• No Enterprise Plus licence means no vCDNI networking

• Shared storage, so DRS can move VMs between hosts

• vCenter

• vCloud Director (VM)

• Embedded or remote DB

• AD / LDAP directory

• vShield Manager VM

• vShield Edge VMs (automatically deployed on ESXi hosts)

• vApps, deployed on ESXi hosts

Optional Components

VMware Chargeback

Meters the consumption of VMs, networks, etc., and bills for them.

vCloud Connector

Connects private clouds to public clouds, making the interchange of VMs across clouds seamless.

vCD Logical Terminology:

Provider Virtual Data Centre (PvDC): a logical grouping of vSphere compute and storage resources in which all resources are equal (some clouds may offer tiers such as platinum/gold/silver).

Organisation: A unit of administration with its own users, groups, policies, and catalogues. An Org has its own security boundary. These are ‘tenants’.

Organisation vDC: A logical grouping of resources from one or more provider vDCs, enabling different performance, SLA, and cost options to be made available within the same organisation.

Recommendations

Allocate at least one vCloud Director (Cell) for each vCenter server

Configure an external vCloud Director database; the VMware appliance is for testing purposes only, as it uses an embedded Oracle DB and is not intended for production (16 GB RAM, 100 GB storage, 4 vCPUs).

Read the vCAT documentation to see how VMware recommends building vCloud Director.

Recommended Configuration

Create two clusters, one for management and one for resources. You don’t want your new cloud consuming resources before you have even installed any tenants onto it, do you?

Create all the VMs needed for management in the management cluster.

Layers of Networking

Customer/Tenant/Organisation Network Layer (completely dynamic; no configuration by the customer)

vCloud Director Network Layer (maps to components of the vSphere layer and the physical layer)

vSphere Network Layer (vSwitches, port groups, etc.; must be stable and static)

Physical Network Layer (switches, routers, IPs, etc.; must be stable and static)

vCD Networking Terms:

External Network

The vCD networking component that provides outside connectivity is called an External Network. If you want your Organization (and in turn your vApps) to have connectivity to the external world, you need to have External Networks. As the name implies, these are networks that are managed by someone typically external to the vCD environment, and they are identified by a vSphere Port Group. That’s in fact what you do when you create a vCD External Network: you point to an existing vSphere Port Group. Essentially you are telling vCloud Director that there is a Port Group able to provide external connectivity to your cloud environment. The typical example is a Port Group with VLAN 233 (for instance) which can carry native Internet traffic. As a naming convention you might call this External Network something like Internet or Ext-Net-Internet; I usually suggest naming the vCD External Network after the vSphere Port Group for ease of tracking. A quick command-line sketch of creating such a port group follows below the note.

• Connects vCD to the outside world

• Based on a vSphere port group
NOTE

When you create the port group on a dvSwitch, it is recommended to edit its settings and set the port binding to Ephemeral, so there is no limit on the number of ports.
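As a rough sketch only: on a classic ESX host you could create such a port group on a standard vSwitch from the service console with the commands below (vSwitch0, the port group name and VLAN 233 are just example values; on a dvSwitch you would create the port group, and set the ephemeral binding, through the vSphere Client):

esxcfg-vswitch -A "Ext-Net-Internet" vSwitch0        # add the port group
esxcfg-vswitch -v 233 -p "Ext-Net-Internet" vSwitch0 # tag it with VLAN 233
esxcfg-vswitch -l                                    # confirm the port group and VLAN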

Organisational Network

External Networks are easy. With Organization Networks things start to become more “interesting”. In the previous section we created cloud-wide external connectivity (i.e. External Networks); now we are zooming inside an Organization. An Organization (or Org) is a logical construct within vCD that describes a tenant or a customer, and cloud end-users are defined inside each Organization.

• A virtual network for tenants / customers

• Allow VMs within the organisation to communicate with each other and access the internet

• Require an External network, network pool or both

The 3 Types of Org network a tenant can have are:

• External Organisational network: Direct

• External Organisational network: NAT-routed

• Internal Organisational network (private)

The 3 types of network pool you can allocate to tenants are:

• VLAN-backed (flexible, no special MTU settings, but requires a lot of VLAN management)

• Network isolation-backed (vCDNI: no VLAN ranges to track, but the MTU must be changed for the MAC-in-MAC encapsulation)

• vSphere port group-backed (standard and distributed switches; no automatic network deployment, so the most work involved)

Ideally you would use vCDNI so that everything is automated, but you will need an Enterprise Plus licence for this feature, and you must also make sure the MTU is set higher than 1500 at the physical switch level, the ESX host level and the vCenter (dvSwitch) level. You can go as high as 9000 without causing problems. A quick way to check this from an ESX host is sketched below.
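For what it’s worth, this is how you could sanity-check the MTU from the service console of a classic ESX host (a sketch only; vSwitch1 is an example name, a dvSwitch MTU is changed in its properties via the vSphere Client, and the physical switch ports have to be configured separately):

esxcfg-vswitch -l                  # the MTU column shows the current value
esxcfg-vswitch -m 9000 vSwitch1    # raise the MTU on a standard vSwitch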

Network Pools

At this point you should have an overall understanding of what a Network Pool is and why it is used. In summary, it is a small CMDB that contains the layer 2 segments available to vCD administrators and end-users. Note that Network Pools need to be created before we start deploying the actual networks described above (with the exception of External Networks, because they don’t use Network Pools).

So far we have kept referring to a “layer 2 segment” as a Port Group with an associated VLAN ID. This is correct, but it doesn’t tell the whole story. There are really three different types of Network Pools one can create.

VLAN-backed Network Pools: this is the easiest to grasp. You can, for example, create a Network Pool and give it a range of VLAN IDs from 100 to 199. Whenever vCD grabs one of these IDs because it needs to deploy a new layer 2 segment, it will tell vCenter “please create a Port Group on the fly, and give it VLAN ID 100”. The next time there is a need for another layer 2 segment, vCD will tell vCenter “please create a Port Group on the fly, and give it VLAN ID 101”. And so on. Of course, if one of these networks is destroyed during the lifecycle of the cloud, the corresponding VLAN ID is put back into the pool of available networks to be deployed.

Port Group-backed Network Pools: this is similar to the VLAN-backed pool. The difference is that the Port Groups need to be pre-provisioned on the vSphere infrastructure and then imported into vCloud Director, so vCD won’t tell vCenter to create them on the fly; they are already there. Why use this? Well, there are some circumstances where vCenter cannot easily (programmatically) create Port Groups on the fly. This is the case when you use vSphere Standard Switches (as opposed to Distributed Switches) or when you use the Nexus 1000V (at the moment vCD cannot programmatically manipulate Port Profiles).

vCloud Director Network Isolation-backed Network Pools: this is when things start to get interesting (again). vCD uses a technique called MAC-in-MAC encapsulation to create separate layer 2 networks without using VLANs. Yes, that’s right. This is extremely useful for big environments where VLAN management is problematic, either because a limited number of VLANs is available or because keeping track of VLANs is a big management overhead (especially if you use an Excel spreadsheet to do that).

When you create such a Network Pool you only specify how many of these layer 2 networks you want this Network Pool to have and you are done. When vCD starts to deploy Port Groups from this Network Pool you won’t see any VLAN associated to them but they are indeed different layer 2 segments.

Now the acronym VCD-NI, and the difference between pre-provisioned and created-on-the-fly port groups, should make more sense to you.

Virtual Machine IP Management

First of all, note that you cannot connect a vNIC to an External Network directly; you can, however, connect it to either an Organization Network or a vApp Network.

Now the question is: what happens when you connect a vNIC to an Organization Network or a vApp Network, and how do you control the layer 3 behaviour? As we said, each vNIC of a VM can be connected to an Organization Network, a vApp Network, or left disconnected.
Reference URLs

Massimo – vCloud Director Networking for Dummies
http://it20.info/2010/09/vcloud-director-networking-for-dummies/

Duncan Epping – Creating a vCD Lab on your Mac/Laptop
http://www.yellow-bricks.com/2010/09/13/creating-a-vcd-lab-on-your-maclaptop/

Chris Colotti – VMware vCloud “In a Box” for your Home Lab
http://www.chriscolotti.us/vmware/vsphere/vmware-vcloud-in-a-box-for-your-home-lab/

vCloud networking explained in 1 slide and 52 animations
http://www.ntpro.nl/blog/archives/2024-vCloud-networking-explained-in-1-slide-and-52-animations.html

vSphere vDR 1.2 LVM limitation and workaround

One of our users at virtualDCS was recently experiencing problems recovering data from their CentOS Linux VM with the ‘File Level Recovery’ (FLR) tool in vSphere ‘VMware Data Recovery’ (vDR) release 1.2.

I won’t go into detail describing the tool itself, as it has already been expertly described and documented elsewhere.

Although the vDR appliance was reporting successful backups, the FLR utility was not mounting all the partitions when accessing a selected restore point. The virtual machine in question was running CentOS 5.4 32-bit and had just a single VMDK, but with a specific partition layout that caused issues with vDR.

The disk was configured with a small /boot partition and two larger LVM partitions as follows:

[root@localhost ~]#fdisk -l

Disk /dev/sda: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM
/dev/sda3            1306       10443    73400985   8e  Linux LVM

It was these two type 8e LVM partitions that vDR had issues with.

The standard vDR FLR mount looked ok, but only recovered the non-LVM partition on /dev/sda1 as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -a 172.16.1.10

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
Created "/root/2011-02-07-00.09.09/Mount1"

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

In order to see the details of the problem you need to run the FLR tool in verbose mode with the -v switch as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -v -a 172.16.1.10

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
findRestorePointNdx: searching for 98
Restore Point 98 has been found...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
"/SCSI-0:2/"
"Mon Feb  7 00:09:09 2011"

Initializing vix...
VixDiskLib: config options: libdir '/opt/vdr/VMwareRestoreClient/disklibpluginvcdr', tmpDir '/tmp/vmware-root'.
VixDiskLib: Could not load default plugins from /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: Cannot open library: /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: cannot open shared object file: No such file or directory.
DISKLIB-PLUGIN : Not loading plugin /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libvdrplugin.so.1.0: Not a shared library.
VMware VixDiskLib (1.2) Release build-254294
Using system libcrypto, version 9080CF
VixDiskLib: Failed to load libvixDiskLibVim.so : Error = libvixDiskLibVim.so: cannot open shared object file: No such file or directory.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/etc/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /etc/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/settings": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/settings. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /root/.vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/preferences": No such file or directory.
----------------------------------------
PREF Failed to load user preferences.
DISKLIB-LINK  : Opened 'vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499' (0x1e): plugin, 167772160 sectors / 80 GB.
DISKLIB-LIB   : Opened "vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499" (flags 0x1e, type plugin).
DISKLIB-LIB   : CREATE CHILD: "/tmp/flr-4499-w4U6cS" -- twoGbMaxExtentSparse grainSize=128
DISKLIB-DSCPTR: "/tmp/flr-4499-w4U6cS" : creation successful.
PREF early PreferenceGet(filePosix.coalesce.enable), using default
PREF early PreferenceGet(filePosix.coalesce.aligned), using default
PREF early PreferenceGet(filePosix.coalesce.count), using default
PREF early PreferenceGet(filePosix.coalesce.size), using default
PREF early PreferenceGet(aioCusClone1r.numThreads), using default
--- Mounting Virtual Disk: /tmp/flr-4499-w4U6cS ---
SNAPSHOT: IsDiskModifySafe: Scanning directory of file /tmp/flr-4499-w4U6cS for vmx files.
Disk flat file mounted under /var/run/vmware/fuse/2848693010656666867
VixMntapi_OpenDisks: Mounted disk /tmp/flr-4499-w4U6cS at /var/run/vmware/fuse/2848693010656666867/flat.
Mounting Partition 1 from disk /tmp/flr-4499-w4U6cS
Created "/root/2011-02-07-00.09.09/Mount1"
MountsDone: LVM volume detected, start: 106928640, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
MountsDone: LVM volume detected, start: 10733990400, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm version 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
LVM version:     2.02.46-RHEL5 (2009-09-15)
Library version: 1.02.32 (2009-05-21)
Driver version:  4.11.5
System: end results...
System: command "lvm version 2>&1" completed successfully
LoopMountSetup: Setup loop device for "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: Setup loop device for "/dev/loop2" (offset: 2144055808) : "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm vgdisplay 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
--- Volume group ---
VG Name               VolGroup00
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  24
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                4
Open LV               4
Max PV                0
Cur PV                2
Act PV                1
VG Size               79.88 GB
PE Size               32.00 MB
Total PE              2556
Alloc PE / Size       2556 / 79.88 GB
Free  PE / Size       0 / 0
VG UUID               KTg9lK-J48t-P6sw-03lC-TjAX-d5n6-8qcAEx

System: end results...
System: command "lvm vgdisplay 2>&1" completed successfully
LVMFindInfo: found "VG Name" -> "VolGroup00"
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
PV /dev/loop1       VG VolGroup00   lvm2 [9.88 GB / 0    free]
PV unknown device   VG VolGroup00   lvm2 [70.00 GB / 0    free]
Total: 2 [79.88 GB] / in use: 2 [79.88 GB] / in no VG: 0 [0   ]
System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1" completed successfully
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
No physical volume label read from /dev/loop2
Failed to read physical volume "/dev/loop2"
--- Physical volume ---
PV Name               /dev/loop1
VG Name               VolGroup00
PV Size               9.90 GB / not usable 22.76 MB
Allocatable           yes (but full)
PE Size (KByte)       32768
Total PE              316
Free PE               0
Allocated PE          316
PV UUID               Qqk2st-jiXP-k281-A1Ug-nCtM-rn0a-I8eXlX

System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1" failed with error 1280
LoopDestroy: Removed loop device "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopDestroy: Removed loop device "/dev/loop2" (offset: 10733990400) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: LVM mounts terminating due to fatal error
VdrVixMountDone: Failed 1

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

Once again, only the /boot non-LVM partition held on /dev/sda1 was mounted. You can see from the results above that the LVM mounts failed due to a fatal error.

I wasn’t sure whether there was an undocumented incompatibility with my LVM version or FUSE version, so I took the easy route and logged a call with VMware Support (SR# 1589684961). After eliminating the obvious, the ticket was escalated to a Research Engineer who was excellent (aren’t they all?). He told me that VMware was aware of an issue with multiple LVM partitions and was expecting to include a fix in an upcoming release of vDR.

That was great, but my customer needed to ensure his backup process allowed FLR restores. I had to find a workaround that could be implemented without requiring a reboot as the virtual machine in question was aiming for 100% uptime.

My plan was to add a new VMDK to the VM, migrate the data off the two existing LVM partitions, remove them both, then create a single LVM partition on the original disk and migrate the data back, before removing the temporary disk.

This is the procedure I used:

***hot add new 80GB thin SCSI disk as SCSI0:1

echo "scsi add-single-device" 0 0 1 0 > /proc/scsi/scsi

***partition the new disk

fdisk /dev/sdb
n
p
1

Accept first and last cylinders to use all space

***format partition as LVM type 8e

t
1
8e
w

***prepare the new partition for LVM

pvcreate /dev/sdb1

***add the partition to the existing LVM Volume Group

vgextend VolGroup00 /dev/sdb1

***move the data off /dev/sda2 and /dev/sda3

pvmove /dev/sda2 /dev/sdb1
pvmove /dev/sda3 /dev/sdb1

***remove /dev/sda2 and /dev/sda3 from the VolGroup

vgreduce VolGroup00 /dev/sda2
vgreduce VolGroup00 /dev/sda3

***unprepare the original partitions

pvremove /dev/sda2
pvremove /dev/sda3

***delete the original partitions and create a single new bigger one

fdisk /dev/sda
d
2
d
3
n
p
2

Accept first and last cylinders to use all space

t
2
8e
w

***instead of rebooting to recognise the partition you can just run

partprobe

I didn’t have parted installed, so before I could probe the partitions, I had to run

yum install parted

***prepare the new partition for LVM:

pvcreate /dev/sda2

***add the partition to the existing Vol Group

vgextend VolGroup00 /dev/sda2

***next move the data back off /dev/sdb1

pvmove /dev/sdb1 /dev/sda2

***remove the temp disk from the LVM Volume Group

vgreduce VolGroup00 /dev/sdb1

***unprepare the partition

pvremove /dev/sdb1

***delete the partition

fdisk /dev/sdb
d
1

***remove the temporary disk from the virtual machine using vCenter, then finally:

echo "scsi remove-single-device" 0 0 1 0 > /proc/scsi/scsi

I have no idea if the above will be of use to anyone, so please let me know if you find it helpful in any way. The new version of vDR will include a new version of the FLR tool anyway, so let’s hope the issue is resolved in that.

HowTo: Quiesced snapshots of Forefront TMG virtual machines

I was recently asked to look into a problem a client was having with his vSphere vDR backup routines. All guest machines were being successfully backed up apart from one.

The virtual machine that was failing was the proxy server running MS Forefront TMG on a Windows 2008 R2 guest. The error reported that a snapshot had failed with error:

 "Failed to create snapshot for proxy, error -3960 ( cannot quiesce virtual machine)"

I started by taking a manual quiesced snapshot to test it outside of vDR. Sure enough, this produced a similar error:

"Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine."

Obviously I began scouring the forum posts at http://communities.vmware.com, but found a whole range of posts regarding quiesced snapshots, all of which were rather confusing. All the virtual machines on this system were built from the same template; all therefore had the same VMware Tools installed in the same way, yet only one was having this issue.

Some of the knowledge base articles looked interesting, but didn’t get me any closer to the solution either – http://kb.vmware.com/kb/1009073 and http://kb.vmware.com/kb/1007696

In the end, I decided to investigate the TMG services to see if they were causing the VSS failure themselves. When all TMG services were stopped, the snapshots worked with no errors! I then began selectively stopping them to see which was causing the problem, and found my culprit: the ISASTGCTRL service. This service, described by Marc Grote, is used to store the TMG configuration in AD LDS (Active Directory Lightweight Directory Services). When the service is running, snapshots fail; when it is stopped, they succeed.

In order to allow quiesced snapshots to be taken, I had to create a freeze and thaw script procedure as follows:

Within the guest operating system, I created the folder C:\Program Files\VMware\VMware Tools\BackupScripts.d\. This folder is not created by default when VMware Tools installs, but is required if you want to add pre-snapshot and post-snapshot scripts as I did. Within this folder I created a batch file called vss.bat with the following contents:

@echo off
rem Called by VMware Tools with freeze, thaw or freezeFail as the first argument
if %1 == freeze goto freeze
if %1 == thaw goto thaw
if %1 == freezeFail goto freezeFail

:freeze
rem Stop the TMG configuration storage service before the quiesced snapshot is taken
net stop "ISASTGCTRL" /Y
exit
:thaw
rem Restart the service once the snapshot has completed
net start "ISASTGCTRL" /Y
exit
:freezeFail
rem Restart the service if the snapshot attempt fails
net start "ISASTGCTRL" /Y
exit

Hence, when a snapshot is requested, VMware Tools first checks this folder for any scripts to run. When the snapshot is taken it passes the argument ‘freeze’ to the script, and when the snapshot is finished it passes the argument ‘thaw’. In doing so, the script successfully stops and then restarts my problematic service.

Now when the scheduled VMware vDR process runs, the appliance is able to take a quiesced snapshot successfully.  Happy days.

vSphere ESX4: Hot Add CPUs for Linux Guests

Some time ago, within my post entitled vSphere ESX4: Hot Add Memory for Linux Guests, I promised to blog about hot-add CPU support for the same VM. As you may have guessed, I didn’t get around to writing it, and have subsequently found the area well covered elsewhere.

wila has knocked together a very clear procedure here http://communities.vmware.com/docs/DOC-10493 which kind of makes my post unnecessary.

My earlier post has been referenced as a resource by lamw for his helpful guide here: http://communities.vmware.com/docs/DOC-10492


vSphere ESX4: Hot Add Memory for Linux Guests

I was asked recently if I could hot-add some RAM to a client’s virtual machine over at virtualDCS. Most of the information I’d found online related to Windows Server versions, but I needed to hot add resources to a Linux VM. VMware.com was lacking in detail about the hot-add compatibility with client operating systems, so I realised I’d better lab it up and see how it works for myself.

The first problem I had was that the virtual machine I’d cloned from my client’s live VM was originally built using ESX 3.5. Hence it was VM hardware version 4, and hot-add hardware is not supported unless the VM hardware is upgraded to version 7. In order to enable the hot-add features, I had to first upgrade VMware Tools, and then shut down the VM again to upgrade the virtual hardware to version 7.

Once this had been done, I made sure the VM General Options (VM > Edit Settings > Options > General Options) were set to the correct OS type. This is important, as the interface will only display the Memory/CPU Hotplug options for supported OSes. In my case I was running CentOS 5.3 x86_64, so I selected Other Linux 2.6.

(Screenshot: General Options)

Next I enabled the Hot Add CPU and Memory as below, but was unable to check the radio button for Hot Remove CPU, which is interesting in relation to what I found when playing with Hot Add CPUs (discussed in an upcoming post).

(Screenshot: Hot Add/Remove settings)

I found that the CentOS build I was using (2.6.18-128.el5) recognises hot-added memory automatically. A colleague (thanks Stu) recommended I read the Linux hotplug memory docs, which made the rest fairly obvious.

My VM was running with 512 MB RAM, so I added some more via the vCenter console, bringing it up to 1 GB. (BTW: even though vCenter appears to let you do this for the 32-bit guest version, it doesn’t actually work; the task is reported as successful, but when you check the VM properties again you’ll see the RAM was not added.)

When memory is hot-plugged, the kernel recognises the new memory, creates new memory management tables, and creates sysfs files for the new memory’s operation.
If the firmware supports notifying the OS when new memory is connected (ACPI can raise this event), this phase is triggered automatically. If not, the system administrator has to trigger it with the “probe” operation instead.

Now comes the interesting part. Within

/sys/devices/system/memory

there are a number of folders all named ‘memoryX’ where X represents a unique ‘section’ of memory. How big each section is, and hence how many folders you have is dependent on your environment, but you can check the file

/sys/devices/system/memory/block_size_bytes

to view the size of sections in bytes. Basically, the whole memory has been divided up into equal sized chunks as per the SPARSEMEM memory model.
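On the kernels I have looked at, block_size_bytes holds a hexadecimal value, so a quick sketch to turn it into something human-readable (plain bash arithmetic, nothing VMware-specific) is:

#echo $(( 0x$(cat /sys/devices/system/memory/block_size_bytes) / 1024 / 1024 ))MB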

In each section’s folder there is a file called ‘state’, and in each file is one of two words; online or offline.
Locate the memoryX folder(s) which account for the hot-added memory by working out the section sizes above, or (like me) just check the contents of the state files:

#cat /sys/devices/system/memory/memoryX/state
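If there are a lot of sections, a one-liner sketch to list only the offline ones (relying on the same sysfs layout described above) is:

#grep -l offline /sys/devices/system/memory/memory*/state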

Once you locate the offline sections, you can bring them online as follows:

#echo online > /sys/devices/system/memory/memoryX/state

Validate the memory change is seen, using:

#free

That’s it! Quite simple really.

UPDATE: I noticed that William Lam (lamw on the VMware communities) created a nice script to automate the discovery and online process.  It’s very neat and can be downloaded here:

#!/bin/bash
# William Lam
# http://engineering.ucsb.edu/~duonglt/vmware/
# hot-add memory to LINUX system using vSphere ESX(i) 4.0
# 08/09/2009

if [ "$UID" -ne "0" ]; then
    echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
    exit 1
fi

for MEMORY in $(ls /sys/devices/system/memory/ | grep memory)
do
    SPARSEMEM_DIR="/sys/devices/system/memory/${MEMORY}"
    echo "Found sparsemem: \"${SPARSEMEM_DIR}\" ..."
    SPARSEMEM_STATE_FILE="${SPARSEMEM_DIR}/state"
    STATE=$(cat "${SPARSEMEM_STATE_FILE}" | grep -i online)
    if [ "${STATE}" == "online" ]; then
        echo -e "\t${MEMORY} already online"
    else
        echo -e "\t${MEMORY} is new memory, onlining memory ..."
        echo online > "${SPARSEMEM_STATE_FILE}"
    fi
done


VMware vShield – was it worth it?

I just spent a couple of hours happily deploying VMware vShield Zones, less happily poring over the manuals, and then unhappily thinking I’d wasted my time.

I think our ESX platform is fairly typical. We have multiple ESX servers running guest VMs for multiple customers (or departments), many of which are tagged to isolated VLANs, and most of which ultimately communicate with the outside world via our firewall clusters. Achieving security in this scenario means understanding your VLANs, dropping the right vNIC on the right VM, and managing a typical firewall appliance (Cisco in my environment).

VMware vShield Zones has been introduced (actually bought from Blue Lane Technologies) supposedly to simplify network security by implementing a firewall within your ESX farm. Sounds cool, right? It would be too, if it were done right.

I won’t go into the details of how it works and how to configure it, as you can read up on that by following the links on Rodos’ blog.
There are loads of gotchas and strange concepts at first, but they’re all well documented in the manual. The install process was flawless too, so what’s not to like?

Well:

  • It requires a vShield agent VM per vSwitch with a physical NIC attached. That means lots of additional VMs for us.
  • It does not offer anywhere near enough reporting detail. No real-time bandwidth monitors, just per-hour statistics.
  • It does not offer any bandwidth controls such as rate limiting or QoS.
  • But mostly, IT DOES NOT SIMPLIFY ANYTHING.

On the contrary, as I doubt anybody will be throwing out their perimeter firewalls just yet, vShield adds a further layer to manage. Perhaps I’m missing something.


VMware and iSCSI – explained

A colleague alerted me to a great post regarding iSCSI performance with specific reference to VMware ESX hosts.

I know many organisations operating VMware farms with iSCSI storage systems, and I expect many will fall foul of some of these excellent gotchas, the most important of which is that you should really have multiple iSCSI targets if you want to maximise your performance. Hence, make sure your iSCSI storage hardware supports presenting LUNs as individual targets.
