Cisco VPN Client – Decrypted: 0 woes

For a long time I have used Cisco VPN client on my Windows 7 computers.  I use it to provide IPSec VPN tunnels to Cisco ASA firewalls and it works well enough for me to not resort to ShrewSoft.

Until today.

I wasted about an hour trying to work out why my VPN session would establish but not decrypt any packets.  Sending encrypted packets was fine, but I got nothing back.  It didn’t matter which ASA I was connecting to, so I figured this was a client issue.

Long story short: the Cisco VPN client will do this if you have more than one IP address assigned to your local LAN interface.  I had added a second address earlier in the week to configure an access point, and left it in place without considering that it could affect the VPN client.  After removing the second IP address, session traffic traversed the tunnel as normal.
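If you want to check for a stray secondary address on Windows 7, something like the following from an elevated prompt should do it (the interface name and address here are examples, not values from my setup):

```
netsh interface ipv4 show addresses "Local Area Connection"
netsh interface ipv4 delete address "Local Area Connection" address=192.168.1.50
```

After removing the extra address, reconnect the VPN session and the decrypted packet counter should start climbing again.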

vCheck syslog plugin update

I regularly use Alan Renouf’s excellent vCheck PowerShell utility to help me manage and maintain some sort of order with my ESXi hosts.

Unfortunately, the good people at VMware are charging ahead advancing the features of vSphere, which means that some useful PowerCLI cmdlets are deprecated from time to time.  This can break some vCheck plugins, and hence the authors are often pestered for updates to support the newer versions of ESXi.

I am in the process of validating the broken plugins and adapting them to support new releases while retaining backwards compatibility.  Of course I am sharing this info with the original authors, who no doubt can code a little prettier than me, but at least I have an interim solution.  Anyway, here is my first one, addressing the new way in which the syslog server details are queried on ESXi 5.x, based upon the good work of Jonathan Medd’s original plugin:

# Start of Settings
# The Syslog server which should be set on your hosts
$SyslogServer = "syslog.domain.local"
# End of Settings

$ESXiSyslog = @()
$ESXiSyslog += $VMH | Where { $_.Version -lt "5.0.0" } | Where { $_.ConnectionState -eq "Connected" -or $_.ConnectionState -eq "Maintenance" } | Select Name, @{Name="SyslogServer";Expression={($_ | Get-VMHostSysLogServer).Host}}
$ESXiSyslog += $VMH | Where { $_.Version -ge "5.0.0" } | Where { $_.ConnectionState -eq "Connected" -or $_.ConnectionState -eq "Maintenance" } | Where { $_.ExtensionData.Summary.Config.Product.Name -match "i" } | Select Name, @{Name="SyslogServer";Expression={(Get-VMHost $_.Name | Get-VMHostAdvancedConfiguration -Name Syslog.global.logHost).Values}}

$Result = @($ESXiSyslog | Where { $_.SyslogServer -ne $SyslogServer })
$Result

$Title = "Hosts with incorrect or empty Syslog Server defined"
$Header = "Hosts with incorrect or empty Syslog Server defined : $(@($Result).count)"
$Comments = "The following hosts do not have the correct Syslog settings, which may cause problems if an ESXi host fails and its logs need to be investigated"
$Display = "Table"
$Author = "John Murray based on original scripts from Alan Renouf & Jonathan Medd"
$PluginVersion = 1.2
$PluginCategory = "vSphere"

vSphere vDR 1.2 LVM limitation and workaround

One of our users at virtualDCS was recently having problems recovering data from a CentOS Linux VM using the ‘File Level Recovery’ (FLR) tool in vSphere ‘VMware Data Recovery’ (vDR) release 1.2.

I won’t go into detail with regards to describing the tool, as it has been expertly described here, and documented here already.

Although the vDR appliance was reporting successful backups, the FLR utility was not mounting all the partitions when accessing a selected restore point.  The virtual machine in question was running CentOS 5.4 32-bit with a single vmdk, but its partition layout caused issues with vDR.

The disk was configured with a small /boot partition and two larger LVM partitions as follows:

[root@localhost ~]#fdisk -l

Disk /dev/sda: 85.8 GB, 85899345920 bytes
255 heads, 63 sectors/track, 10443 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

 Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14        1305    10377990   8e  Linux LVM
/dev/sda3            1306       10443    73400985   8e  Linux LVM

It was these two 8e LVM partitions that vDR had issues with.

The standard vDR FLR mount looked ok, but only recovered the non-LVM partition on /dev/sda1 as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -a 172.16.1.10

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
Created "/root/2011-02-07-00.09.09/Mount1"

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

In order to see the details of the problem you need to run the FLR tool in verbose mode with the -v switch as follows:

[root@localhost /opt/vdr/VMwareRestoreClient]#./VdrFileRestore -a 172.16.1.10 -v

(98) "Mon Feb  7 00:09:09 2011"
(99) "Tue Feb  8 02:53:03 2011"

Please input restore point to mount from list above
98
findRestorePointNdx: searching for 98
Restore Point 98 has been found...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
"/SCSI-0:2/"
"Mon Feb  7 00:09:09 2011"

Initializing vix...
VixDiskLib: config options: libdir '/opt/vdr/VMwareRestoreClient/disklibpluginvcdr', tmpDir '/tmp/vmware-root'.
VixDiskLib: Could not load default plugins from /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: Cannot open library: /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libdiskLibPlugin.so: cannot open shared object file: No such file or directory.
DISKLIB-PLUGIN : Not loading plugin /opt/vdr/VMwareRestoreClient/disklibpluginvcdr/plugins32/libvdrplugin.so.1.0: Not a shared library.
VMware VixDiskLib (1.2) Release build-254294
Using system libcrypto, version 9080CF
VixDiskLib: Failed to load libvixDiskLibVim.so : Error = libvixDiskLibVim.so: cannot open shared object file: No such file or directory.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/etc/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /etc/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/settings": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/settings. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/usr/lib/vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /usr/lib/vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/config": No such file or directory.
----------------------------------------
PREF Optional preferences file not found at /root/.vmware/config. Using default values.
Msg_Reset:
[msg.dictionary.load.openFailed] Cannot open file "/root/.vmware/preferences": No such file or directory.
----------------------------------------
PREF Failed to load user preferences.
DISKLIB-LINK  : Opened 'vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499' (0x1e): plugin, 167772160 sectors / 80 GB.
DISKLIB-LIB   : Opened "vdr://vdr://vdrip:1.1.1.40<>vcuser:<>vcpass:<>vcsrvr:<>vmuuid:<>destid:39<>sessdate:129415109490000000<>datastore:P4500-DS05<>vmdk_name:CusClone1-WEB-1.vmdk<>oppid:4499" (flags 0x1e, type plugin).
DISKLIB-LIB   : CREATE CHILD: "/tmp/flr-4499-w4U6cS" -- twoGbMaxExtentSparse grainSize=128
DISKLIB-DSCPTR: "/tmp/flr-4499-w4U6cS" : creation successful.
PREF early PreferenceGet(filePosix.coalesce.enable), using default
PREF early PreferenceGet(filePosix.coalesce.aligned), using default
PREF early PreferenceGet(filePosix.coalesce.count), using default
PREF early PreferenceGet(filePosix.coalesce.size), using default
PREF early PreferenceGet(aioCusClone1r.numThreads), using default
--- Mounting Virtual Disk: /tmp/flr-4499-w4U6cS ---
SNAPSHOT: IsDiskModifySafe: Scanning directory of file /tmp/flr-4499-w4U6cS for vmx files.
Disk flat file mounted under /var/run/vmware/fuse/2848693010656666867
VixMntapi_OpenDisks: Mounted disk /tmp/flr-4499-w4U6cS at /var/run/vmware/fuse/2848693010656666867/flat.
Mounting Partition 1 from disk /tmp/flr-4499-w4U6cS
Created "/root/2011-02-07-00.09.09/Mount1"
MountsDone: LVM volume detected, start: 106928640, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
MountsDone: LVM volume detected, start: 10733990400, flat file: "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm version 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4701: sh
LVM version:     2.02.46-RHEL5 (2009-09-15)
Library version: 1.02.32 (2009-05-21)
Driver version:  4.11.5
System: end results...
System: command "lvm version 2>&1" completed successfully
LoopMountSetup: Setup loop device for "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: Setup loop device for "/dev/loop2" (offset: 2144055808) : "/var/run/vmware/fuse/2848693010656666867/flat"
System: running "lvm vgdisplay 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4706: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
--- Volume group ---
VG Name               VolGroup00
System ID
Format                lvm2
Metadata Areas        1
Metadata Sequence No  24
VG Access             read/write
VG Status             resizable
MAX LV                0
Cur LV                4
Open LV               4
Max PV                0
Cur PV                2
Act PV                1
VG Size               79.88 GB
PE Size               32.00 MB
Total PE              2556
Alloc PE / Size       2556 / 79.88 GB
Free  PE / Size       0 / 0
VG UUID               KTg9lK-J48t-P6sw-03lC-TjAX-d5n6-8qcAEx

System: end results...
System: command "lvm vgdisplay 2>&1" completed successfully
LVMFindInfo: found "VG Name" -> "VolGroup00"
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4710: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
PV /dev/loop1       VG VolGroup00   lvm2 [9.88 GB / 0    free]
PV unknown device   VG VolGroup00   lvm2 [70.00 GB / 0    free]
Total: 2 [79.88 GB] / in use: 2 [79.88 GB] / in no VG: 0 [0   ]
System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvscan /dev/loop1 /dev/loop2 2>&1" completed successfully
System: running "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1"
System: start results...
File descriptor 3 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
File descriptor 4 (pipe:[1356862]) leaked on lvm invocation. Parent PID 4714: sh
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
Couldn't find device with uuid '1KPTt2-2Kya-Wk4H-MDz7-0tgJ-a82T-N6OsIX'.
No physical volume label read from /dev/loop2
Failed to read physical volume "/dev/loop2"
--- Physical volume ---
PV Name               /dev/loop1
VG Name               VolGroup00
PV Size               9.90 GB / not usable 22.76 MB
Allocatable           yes (but full)
PE Size (KByte)       32768
Total PE              316
Free PE               0
Allocated PE          316
PV UUID               Qqk2st-jiXP-k281-A1Ug-nCtM-rn0a-I8eXlX

System: end results...
System: command "env LVM_SYSTEM_DIR=/tmp/flr-4499-2kqxja/ lvm pvdisplay /dev/loop1 /dev/loop2 2>&1" failed with error 1280
LoopDestroy: Removed loop device "/dev/loop1" (offset: 106928640) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopDestroy: Removed loop device "/dev/loop2" (offset: 10733990400) : "/var/run/vmware/fuse/2848693010656666867/flat"
LoopMountSetup: LVM mounts terminating due to fatal error
VdrVixMountDone: Failed 1

Restore point has been mounted...
"/vcenter.domain.homelab/Datacentre One/host/Clus1/Resources/CustomerClone/CusClone1/CusClone1-WEB-1"
root mount point -> "/root/2011-02-07-00.09.09"

Please input "unmount" to terminate application and remove mount point

Once again, only the non-LVM /boot partition on /dev/sda1 was mounted. You can see from the output above that the LVM mounts were terminated by a fatal error.
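As an aside, the LVM byte offsets that the verbose log reports when setting up the loop devices can be cross-checked against the fdisk geometry shown earlier. A quick sanity check in shell arithmetic (the cylinder numbers come from my fdisk output above):

```shell
# fdisk reported: 255 heads * 63 sectors/track * 512 bytes = 8225280 bytes/cylinder
cyl=8225280

# Partitions start at the beginning of their first cylinder (cylinders are
# numbered from 1), so the byte offset is (start_cylinder - 1) * cylinder_size.
echo $(( (14   - 1) * cyl ))   # /dev/sda2 -> 106928640, matches the log
echo $(( (1306 - 1) * cyl ))   # /dev/sda3 -> 10733990400, matches the log
```

Both values match the "LVM volume detected" lines, so vDR was at least finding the partitions in the right place; it was the second physical volume it could not use.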

I wasn’t sure whether there was an undocumented incompatibility with my LVM or fuse version, so I took the easy route and logged a call with VMware Support (SR# 1589684961).  After eliminating the obvious, the ticket was escalated to a Research Engineer, who was excellent (aren’t they all?).  He told me that VMware was aware of an issue with multiple LVM partitions and expected to include a fix in an upcoming release of vDR.

That was great, but my customer needed to ensure his backup process allowed FLR restores. I had to find a workaround that could be implemented without requiring a reboot as the virtual machine in question was aiming for 100% uptime.

My plan was to add a new vmdk to the VM, migrate the data off the two existing LVM partitions, remove them both, then create a single LVM partition on the original disk and migrate the data back, before removing the temporary disk.

This is the procedure I used:

***hot add new 80GB thin SCSI disk as SCSI0:1

echo "scsi add-single-device" 0 0 1 0 > /proc/scsi/scsi
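The /proc/scsi/scsi interface is the legacy route; on kernels that expose sysfs the same rescan can be triggered like this (host0 is an assumption, substitute the SCSI host that owns the new disk):

```
echo "- - -" > /sys/class/scsi_host/host0/scan
```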

***partition the new disk

fdisk /dev/sdb
n
p
1

Accept first and last cylinders to use all space

***format partition as LVM type 8e

t
1
8e
w

***prepare the new partition for LVM

pvcreate /dev/sdb1

***add the partition to the existing LVM Volume Group

vgextend VolGroup00 /dev/sdb1

***move the data off /dev/sda2 and /dev/sda3

pvmove /dev/sda2 /dev/sdb1
pvmove /dev/sda3 /dev/sdb1

***remove /dev/sda2 and /dev/sda3 from the VolGroup

vgreduce VolGroup00 /dev/sda2
vgreduce VolGroup00 /dev/sda3

***unprepare the original partitions

pvremove /dev/sda2
pvremove /dev/sda3

***delete the original partitions and create a single new bigger one

fdisk /dev/sda
d
2
d
3
n
p
2

Accept first and last cylinders to use all space

t
2
8e
w

***instead of rebooting to recognise the partition you can just run

partprobe

I didn’t have parted installed, so before I could probe the partitions, I had to run

yum install parted

***prepare the new partition for LVM:

pvcreate /dev/sda2

***add the partition to the existing Vol Group

vgextend VolGroup00 /dev/sda2

***next move the data back off /dev/sdb1

pvmove /dev/sdb1 /dev/sda2

***remove the temp disk from the LVM Volume Group

vgreduce VolGroup00 /dev/sdb1

***unprepare the partition

pvremove /dev/sdb1
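At this point, before deleting the partition, it's worth confirming that LVM now sees only /dev/sda2 behind the volume group (a quick check; your output will obviously vary):

```
pvs -o pv_name,vg_name,pv_size
vgs VolGroup00
```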

***delete the partition

fdisk /dev/sdb
d
1

***remove the temporary disk from the virtual machine using vcenter then finally:

echo "scsi remove-single-device" 0 0 1 0 > /proc/scsi/scsi

I have no idea if the above will be of use to anyone, so please let me know if you find it helpful in any way. The new version of vDR will include a new version of the FLR tool anyway so let’s hope the issue is resolved in that.

SMTP, ESMTP, and the BDAT baddie

I recently had to troubleshoot a problem with an external SMTP service which was having difficulty delivering mail to our corporate mail server.  The sending server was running Windows 2003 Standard, using the built-in Simple Mail Transfer Protocol (SMTP) service from IIS 6.0.  The receiving server was running Windows Server 2008 with MS Exchange Server 2007 SP2.

Basically messages were not being received reliably.  Some came through and some didn’t.  The Message Tracking logs on Exchange 2007 didn’t yield much useful information, but before I turned up the logging level for the transport role, I took a look at the sending mail system.

Within C:\Windows\System32\LogFiles\SMTPSVC1 I found the most recent log file which recorded the following basic data around the failed email transmission:

22:15:27 172.16.1.10 - - 0
22:15:27 172.16.1.10 EHLO - 0
22:15:27 172.16.1.10 - - 0
22:15:27 172.16.1.10 MAIL - 0
22:15:27 172.16.1.10 - - 0
22:15:27 172.16.1.10 RCPT - 0
22:15:27 172.16.1.10 - - 0
22:15:27 172.16.1.10 BDAT - 0

I already knew that many security appliances do not like the ESMTP BDAT command, so I Googled around and found this JoeKiller article, which shed a little light on the subject and showed that it was possible to force the session not to use the BDAT command at all.

By telnetting to the service (‘telnet localhost 25’) and typing ‘ehlo’, the SMTP service will list the ESMTP verbs it supports:
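The original screenshot is gone, but from memory a typical EHLO exchange with the IIS 6.0 SMTP service looks roughly like this (illustrative only; the hostname is a placeholder and the exact verb list and order will vary with configuration):

```
220 mailhost Microsoft ESMTP MAIL Service ready
ehlo
250-mailhost Hello [172.16.1.10]
250-SIZE
250-PIPELINING
250-8bitmime
250-BINARYMIME
250-CHUNKING
250-ENHANCEDSTATUSCODES
250 OK
```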

I knew I needed to remove BINARYMIME and CHUNKING, but little was mentioned regarding the exact steps to take, which in turn prompted this post.

Fortunately, I already had the IIS 6.0 Resource Kit installed, so it was quick to find the SmtpInboundCommandSupportOptions value by opening IIS Metabase Explorer and navigating to LM\SmtpSvc\1.

Here the default value was 7697601.  I wanted to disable the BINARYMIME and CHUNKING verbs, so using the table here I subtracted 2097152 (BINARYMIME) and 1048576 (CHUNKING) from 7697601:

7697601-2097152-1048576 = 4551873
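The same result can be reached by clearing the two bits directly, which avoids any risk of subtracting a value whose bit isn't actually set in your starting number. A quick sketch in shell arithmetic:

```shell
# SmtpInboundCommandSupportOptions is a bitmask:
# 2097152 = bit 21 (BINARYMIME), 1048576 = bit 20 (CHUNKING)
echo $(( 7697601 & ~(2097152 | 1048576) ))   # 4551873
```

Subtraction only gives the right answer because both bits happen to be set in 7697601; the bitwise form is safe either way.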

I then set the SmtpInboundCommandSupportOptions value to 4551873, closed IIS Metabase Explorer and restarted the IIS Admin Service (which in turn restarts the Simple Mail Transfer Protocol (SMTP) service).  Now the server only advertises and uses the remaining verbs; the EHLO response no longer lists BINARYMIME or CHUNKING.

Next I needed to restrict outbound SMTP mail from using the BDAT command too.  Back in IIS Metabase Explorer, I changed the value of SmtpOutboundCommandSupportOptions from 7 to 5.

Job done. Now I have a more firewall friendly mail host.

HowTo: Quiesced snapshots of Forefront TMG virtual machines

I was recently asked to look into a problem a client was having with his vSphere vDR backup routines. All guest machines were being successfully backed up apart from one.

The virtual machine that was failing was the proxy server, running MS Forefront TMG on a Windows 2008 R2 guest.  The report showed that a snapshot had failed with the error:

 "Failed to create snapshot for proxy, error -3960 ( cannot quiesce virtual machine)"

I started by taking a manual quiesced snapshot to test it outside of vDR:
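As an alternative to the vSphere Client, a quiesced snapshot can also be requested from the ESXi host's command line; something like the following, where 42 is a placeholder VM id taken from the getallvms listing (syntax from memory, so verify against your build):

```
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.create 42 "quiesce-test" "manual quiesce test" 0 1
```

The final two arguments are includeMemory (0) and quiesced (1).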

Sure enough, this produced a similar error:

"Cannot create a quiesced snapshot because the create snapshot operation exceeded the time limit for holding off I/O in the frozen virtual machine."

Obviously I began scouring the forum posts at http://communities.vmware.com, but found a whole range of posts about quiesced snapshots, all of which I found rather confusing.  All the virtual machines on this system were built from the same template; all therefore had the same VMware Tools installed, in the same way, and yet only one was having this issue.

Some of the knowledge base articles looked interesting, but didn’t get me any closer to the solution either – http://kb.vmware.com/kb/1009073 and http://kb.vmware.com/kb/1007696

In the end, I decided to investigate the TMG services to see if they themselves were causing the VSS failure.  When all TMG services were stopped, the snapshots worked with no errors!  I then began selectively stopping them to see which was causing the problem, and found my culprit: the ISASTGCTRL service.  This service, described here by Marc Grote, is used to store the TMG configuration in AD LDS (Active Directory Lightweight Directory Services).  When the service is running, snapshots fail; when it is stopped, they succeed.

In order to allow quiesced snapshots to be taken, I had to create a freeze and thaw script procedure as follows:

Within the guest operating system, I created the folder C:\Program Files\VMware\VMware Tools\BackupScripts.d\.  This folder is not created by default when VMware Tools installs, but is required if you want to add pre-snapshot and post-snapshot scripts as I did.  Within this folder I created a batch file called vss.bat with the following contents:

@echo off
if "%1" == "freeze" goto freeze
if "%1" == "thaw" goto thaw
if "%1" == "freezeFail" goto freezeFail
goto :EOF

:freeze
net stop "ISASTGCTRL" /Y
exit
:thaw
net start "ISASTGCTRL"
exit
:freezeFail
net start "ISASTGCTRL"
exit

Hence, when a snapshot is requested, VMware Tools first checks this folder for any scripts to run.  When the snapshot is taken it passes the argument ‘freeze’ to the script, and when the snapshot is finished it passes the argument ‘thaw’.  In doing so, the script successfully stops and then restarts my problematic service.
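For what it's worth, the same freeze/thaw dispatch pattern for a Linux guest would look something like the sketch below; the echo lines are placeholders for whatever service stop/start commands apply to your workload (the script itself is hypothetical, not something VMware ships):

```shell
#!/bin/sh
# Sketch of a freeze/thaw dispatcher for a Linux guest.
# Replace the echo placeholders with real service stop/start calls.
dispatch() {
    case "$1" in
        freeze)          echo "stop service"  ;;  # quiesce before the snapshot
        thaw|freezeFail) echo "start service" ;;  # resume after (or on failure)
        *)               echo "usage: freeze|thaw|freezeFail" >&2; return 1 ;;
    esac
}
dispatch "${1:-freeze}"
```

The freezeFail branch matters: if the snapshot cannot complete, the tools still call the script so the service is not left stopped.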

Now when the scheduled VMware vDR process runs, the appliance is able to take a quiesced snapshot successfully.  Happy days.

vSphere ESX4: Hot Add CPUs for Linux Guests

Some time ago, within my post entitled vSphere ESX4: Hot Add Memory for Linux Guests, I promised to blog about hot-add CPU support for the same VM.  As you may have guessed, I didn’t get around to writing it, and have since found the area well covered elsewhere.

wila has knocked together a very clear procedure here http://communities.vmware.com/docs/DOC-10493 which kind of makes my post unnecessary.

My earlier post has been referenced as a resource by lamw for his helpful guide here: http://communities.vmware.com/docs/DOC-10492


vSphere ESX4: Hot Add Memory for Linux Guests

I was asked recently if I could hot-add some RAM to a client’s virtual machine over at virtualDCS. Most of the information I’d found online related to Windows Server versions, but I needed to hot-add resources to a Linux VM. VMware.com was lacking in detail about hot-add compatibility with guest operating systems, so I realised I’d better lab it up and see how it works for myself.

The first problem I had was that the virtual machine I’d cloned from my client’s live VM was originally built using ESX 3.5. Hence it was VM hardware version 4, and hot-add hardware is not supported unless the VM hardware is upgraded to version 7. In order to enable the hot-add features, I had to first upgrade VMware Tools, and then shut down the VM again to upgrade the virtual hardware to version 7.

Once this had been done, I made sure the VM General Options (VM > Edit Settings > Options > General Options) were set to the correct OS type. This is important, as the interface will only display the Memory/CPU Hotplug options for supported OSes. In my case I was running CentOS 5.3 x86_64, so I selected Other Linux 2.6.

General Options

Next I enabled Hot Add CPU and Memory as below, but was unable to select the option for Hot Remove CPU, which is interesting in relation to what I found when playing with hot-add CPUs (discussed in an upcoming post).

HotAdd-Remove

I found that the CentOS build I was using (2.6.18-128.el5) recognises hot added memory automatically. A colleague (thanks Stu) recommended I read the Linux Hotplug Memory docs which made the rest fairly obvious.

My VM was running with 512MB RAM, so I added some more via the vCenter console, giving the VM 1GB of allocated RAM. (BTW: even though vCenter appears to let you do this for a 32-bit guest, it doesn’t actually work. The task is reported as successful, but when you check the VM properties again, you’ll see the RAM was not added.)

When memory is hotplugged, the kernel recognizes new memory, makes new memory management tables, and makes sysfs files for new memory’s operation.
If firmware supports notification of connection of new memory to OS, this phase is triggered automatically. ACPI can notify this event. If not, “probe” operation by system administration is used instead.

Now comes the interesting part. Within

/sys/devices/system/memory

there are a number of folders all named ‘memoryX’, where X represents a unique ‘section’ of memory. How big each section is, and hence how many folders you have, depends on your environment, but you can check the file

/sys/devices/system/memory/block_size_bytes

to view the size of sections in bytes. Basically, the whole memory has been divided up into equal sized chunks as per the SPARSEMEM memory model.
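Note that block_size_bytes is reported in hex. On the x86_64 kernels of this era the section size is commonly 128 MiB, though that is an assumption to check against your own file. A quick conversion:

```shell
# Contents of /sys/devices/system/memory/block_size_bytes, e.g. "8000000" (hex)
block_hex=8000000
block=$(( 16#$block_hex ))          # bash base-16 conversion
echo "$block"                        # 134217728 bytes = 128 MiB

# A 512 MiB hot-add would therefore appear as this many new sections:
echo $(( 512 * 1024 * 1024 / block ))   # 4
```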

In each section’s folder there is a file called ‘state’, and each file contains one of two words: online or offline.
Locate the memoryX folder(s) which account for the hot-added memory by working out the section sizes as above, or (like me) just check the contents of the state files:

#cat /sys/devices/system/memory/memoryX/state

Once you locate the offline sections, you can bring them online as follows:

#echo online > /sys/devices/system/memory/memoryX/state

Validate the memory change is seen, using:

#free

That’s it! Quite simple really.

UPDATE: I noticed that William Lam (lamw on the VMware communities) created a nice script to automate the discovery and online process.  It’s very neat and can be downloaded here:

#!/bin/bash
# William Lam
# http://engineering.ucsb.edu/~duonglt/vmware/
# hot-add memory to LINUX system using vSphere ESX(i) 4.0
# 08/09/2009

if [ "$UID" -ne "0" ]
 then
 echo -e "You must be root to run this script.\nYou can 'sudo' to get root access"
 exit 1
fi

for MEMORY in $(ls /sys/devices/system/memory/ | grep memory)
do
 SPARSEMEM_DIR="/sys/devices/system/memory/${MEMORY}"
 echo "Found sparsemem: \"${SPARSEMEM_DIR}\" ..."
 SPARSEMEM_STATE_FILE="${SPARSEMEM_DIR}/state"
 STATE=$(cat "${SPARSEMEM_STATE_FILE}" | grep -i online)
 if [ "${STATE}" == "online" ]; then
 echo -e "\t${MEMORY} already online"
 else
 echo -e "\t${MEMORY} is new memory, onlining memory ..."
 echo online > "${SPARSEMEM_STATE_FILE}"
 fi
done
