Ghetto Fabulous

Most environments running VMware want some way to back up, protect, and revision VMs. A number of commercial products do a good job of protecting VMs, such as Veeam Backup and Replication, Quest Software (formerly Vizioncore) vRanger, and PHD Virtual Backup, to name a few. This post focuses on a much lower cost (free) implementation of a backup and recovery solution for VMware. As with any free or open source software there is no single right or wrong implementation model, so this post will walk through how ghettoVCB was implemented with Data Domain to enhance the protection of VMs.

Why?…

What was the driver behind the requirement for image-level protection of VMs in this particular instance? In the environment I am referencing, the customer has a fairly large ESX farm at their production site. Most of the production infrastructure is replicated to a DR location, with the exception of some of the “less critical” systems. The DR site also has some running VMs of its own, such as domain controllers, that are likewise deemed “less critical,” so these are not replicated either. You may ask why not; the short answer is that the customer uses EMC RecoverPoint to replicate data from Site A to Site B in conjunction with VMware SRM to facilitate failover, and until recently (VNX) RecoverPoint had a capacity-based license, so dollars were saved by replicating only critical systems. Backups are taken of all systems, but this does not provide the ability to restore an older VM image. A storage migration was being done from an older SAN infrastructure to a new SAN infrastructure. The migration was deemed complete, but one VMFS volume had been missed and never migrated, and the OEM was contracted to do a data erasure on the old SAN prior to removing it from the data center. It was at that point that the “less critical” systems were lost, and everyone realized that they were not really “less critical.” VMs needed to be rebuilt; this was labor intensive and could have been avoided had a good VM backup strategy been in place.

Discussions started around how to protect against this in the future. The interesting thing was that, as part of the new infrastructure, Data Domain had been implemented as a backup-to-disk target, but there was no money left in the budget for a commercial VMware image-level backup product. vGhetto ghettoVCB to the rescue! With a little bit of design, ghettoVCB was implemented on all the ESX servers and has been running successfully for over a year.

How to get started…

Download the appropriate ghettoVCB code from the vGhetto Script Repository; there are multiple versions, and you should use the latest (the implementation discussed in this post uses ghettoVCBg2). All of the prerequisites and usage are well documented on the vGhetto site. Take your time and read; don’t jump into this without reading the documentation.

Note: You will have to edit configuration files for ghettoVCB to set up alerts, retention, backup locations, etc., so be sure to read the documentation carefully.

The Implementation details…

High-level Topology

Note: Site A and Site B backups target a share on each respective DD670 (e.g., \\siteADD670\siteAvmbackup for daily backups at Site A), and these shares are replicated to the peer DD670. Replicated data is accessible on the target side via the backup share name (e.g., replicated data from \\siteADD670\siteAvmbackup would be accessible at \\siteBDD670\backup\siteA_vm_backup_replica).

In the environment where this deployment was done, all of the ESX servers run ESX 4.1 full (not ESXi), so the service console was leveraged. Deployment models can differ, from using the remote support console to using the vMA (vSphere Management Assistant); this is why it is critical that you read the ghettoVCB documentation.

Step-by-Step…

  • Develop and document an architecture/design; a little planning up front will make deployment as easy as possible.
  • Create a CIFS or NFS share on the Data Domain or other CIFS/NFS target.
    • If you want to keep the cost at nearly zero, I recommend Opendedup.
    • In this case Data Domain 670s already existed in both locations.
    • I created two shares in each location, one for daily backups and one for monthly backups (see High-level Topology).

The reason for two shares is retention: only one (1) monthly backup is retained on the monthly share, while fourteen (14) daily backups are maintained on the daily share. A monthly tape backup job vaults the VM image backups from the monthly share.
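For reference, retention and alerting are controlled by variables in the ghettoVCB global configuration. The following is a minimal sketch for the daily share; verify the variable names against the version you download, and note the mail server and address values are placeholders:

VM_BACKUP_VOLUME=/mnt/backup          # daily backup share
DISK_BACKUP_FORMAT=thin               # thin-provisioned copies keep the share small
VM_BACKUP_ROTATION_COUNT=14           # fourteen (14) dailies retained
POWER_VM_DOWN_BEFORE_BACKUP=0         # snapshot-based backup, no VM downtime
EMAIL_LOG=1                           # send the backup log for alerting
EMAIL_SERVER=mail.example.com
EMAIL_TO=vmadmin@example.com

The monthly configuration is identical except that VM_BACKUP_VOLUME points at /mnt/monthly_backup and VM_BACKUP_ROTATION_COUNT=1.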

  • There are basically three tasks that need to be performed on every ESX server in the environment:
    • Mount the target backup share(s):
      • Create mountpoint: mkdir /mnt/backup
      • For NFS: mount servername|IP:/sharename /mnt/backup
      • For CIFS: mount -t cifs //servername|IP/sharename /mnt/backup -o username=USERNAME,password=PASSWORD
  • Add the target backup share(s) to /etc/fstab to make them persistent:
    • For CIFS: echo "//servername|IP/sharename /mnt/backup cifs credentials=/root/.smbcreds" >> /etc/fstab
Note: For CIFS, create a .smbcreds file that contains the CIFS share login credentials (see the sketch after the notes below). This file should contain the following two lines:
username=cifs_user_name
password=cifs_user_password

    • For NFS: echo "servername|IP:/sharename /mnt/backup nfs [any NFS mount options]" >> /etc/fstab
  • Create cron job(s):
    • Daily job (runs Monday through Friday at midnight): 0 0 * * 1-5 root /mnt/backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/backup/.files/logs/hostname_ghettoVCB.log 2>&1
    • Monthly job (runs Saturday at midnight): 0 0 * * 6 root /mnt/monthly_backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/monthly_backup/.files/logs/hostname_ghettoVCB.log 2>&1
Note: You will notice that the path to ghettoVCB.sh is under .files on the CIFS/NFS share. This is so I can make modifications post-deployment, and since all the ESX servers use a shared location it is easy to maintain; more on this when I walk through my deployment methodology.

Note: crontab entries need to go in /etc/crontab. If you place them in the user crontab using crontab -e or vi /var/spool/cron/root it will NOT work.
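As a quick sketch, the .smbcreds file referenced above can be created and locked down like this (the chmod is my own habit, to keep the password out of world-readable space):

cat > /root/.smbcreds <<EOF
username=cifs_user_name
password=cifs_user_password
EOF
chmod 600 /root/.smbcreds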

Deployment…

Once you complete the above steps and test on a single server, you are ready to roll out to all the servers in your environment. To simplify this, I recommend storing the config files, scripts, etc. in a hidden directory on the CIFS or NFS share.

In my case I have a .files directory in both the daily backup and monthly backup directories. These contain the ghettoVCB code, the .smbcreds file, and the deployment scripts.

Deployment Scripts:
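What follows is a minimal sketch of the deployment script, assuming the CIFS layout described above; the monthly share name (siteAvmbackup_monthly) and the script name (deploy_ghettoVCB.sh) are illustrative, and the daily share is assumed to be already mounted at /mnt/backup:

#!/bin/bash
# deploy_ghettoVCB.sh - sketch only; adjust share names and paths to your site
HOST=$(hostname -s)

# Pull the CIFS credentials file from the share and lock it down
cp /mnt/backup/.files/.smbcreds /root/.smbcreds
chmod 600 /root/.smbcreds

# Make both mounts persistent across reboots
mkdir -p /mnt/backup /mnt/monthly_backup
cat >> /etc/fstab <<EOF
//siteADD670/siteAvmbackup /mnt/backup cifs credentials=/root/.smbcreds 0 0
//siteADD670/siteAvmbackup_monthly /mnt/monthly_backup cifs credentials=/root/.smbcreds 0 0
EOF
mount /mnt/monthly_backup

# Schedule the daily and monthly ghettoVCB jobs in the system crontab
cat >> /etc/crontab <<EOF
0 0 * * 1-5 root /mnt/backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/backup/.files/logs/${HOST}_ghettoVCB.log 2>&1
0 0 * * 6 root /mnt/monthly_backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/monthly_backup/.files/logs/${HOST}_ghettoVCB.log 2>&1
EOF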

 Note: The above script assumes a CIFS target; modify accordingly for an NFS target.

Deployment is easy: as new ESX servers come online, I use plink to remotely execute a mount of the appropriate share, copy the deployment script to /tmp, and execute it.
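A sketch of that one-liner (the host name, credentials, and deployment script name are illustrative assumptions):

plink -ssh root@new-esx-host -pw 'PASSWORD' \
  "mkdir -p /mnt/backup && \
   mount -t cifs //siteADD670/siteAvmbackup /mnt/backup -o username=USERNAME,password=PASSWORD && \
   cp /mnt/backup/.files/deploy_ghettoVCB.sh /tmp && sh /tmp/deploy_ghettoVCB.sh"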

All the changes are made to fstab, cron, etc., and VM image backups will now run on a regular basis.

Accessing backed up data…

You will now be able to browse //servername|IP/sharename from any host and see your backups organized by date:
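The layout looks something like this (the VM names are illustrative; ghettoVCB creates a date-stamped directory per backup under each VM's folder):

\\siteADD670\siteAvmbackup
    VM1\
        VM1-2012-01-23_00-00-01\
        VM1-2012-01-24_00-00-01\
    VM2\
        VM2-2012-01-24_00-00-03\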

I use vmware-mount.exe, which is part of the VMware Virtual Disk Development Kit, on the Virtual Center server to mount the backed-up vmdk files for individual file restores. For a full restore I just copy the vmdk back to the production datastore.

The following are the key steps to mount a backed-up vmdk:

  • Mount the CIFS share (if using NFS you can usually share the volume via CIFS/SMB as well and gain access from Windows to use the process I am outlining here)
    • net use v: \\servername|IP\sharename
    • net use

net use should list the new v: drive mapping. Then:

  • v:
  • dir (you should see all your VM backup dirs)
  • cd to the VM you want to perform a recovery from
  • cd to the proper backup image
  • dir


  • Now mount the vmdk
    • vmware-mount.exe z: "2003 SP2 Template.vmdk"
    • You can verify a successful mount by just typing vmware-mount.exe
  • z:
  • dir

You are now looking at the c: drive from the “2003 SP2 Template” VM from January 24, 2012.

You can navigate and copy files just like any normal drive.

Oracle Storage Guy: Direct NFS on EMC NAS

I have been chomping at the bit to test VMware on dNFS on EMC NAS for a couple of reasons.  A number of my customers who are looking at EMC NAS, in particular the NS20, would like to consolidate storage, servers, file services, etc. onto a unified platform and leverage a single replication technology like Celerra Replicator.  dNFS may offer this possibility: .vmdks can now reside on an NFS volume, CIFS shares can be consolidated onto the NS20, and all of it can be replicated with Celerra Replicator.  The only downside to this solution that I can see right now is that the replicated volumes will be crash-consistent copies, but I think with some VMware scripting even this concern can be addressed.  I hope to stand this configuration up in the lab in the next couple of weeks, so I should have more detail and a better idea of its viability shortly.  You may be wondering why this post is entitled Oracle Storage Guy. The answer is that I was searching the blogosphere for an unbiased opinion and some performance metrics for VMware and dNFS, and this was the blog that I stumbled upon.

The performance numbers I have seen for VMware on dNFS come very close to the numbers I have seen for iSCSI. Both technologies offer benefits, but for the use case I mention above dNFS may become very compelling.  I recommend reading the post Oracle Storage Guy: Direct NFS on EMC NAS; it offers some great commentary on the performance characteristics and benefits of dNFS.

Open source virtualization

As I mentioned in my previous post, I spent this weekend rebuilding my home desktop.  You will be happy to note that my VPN connection is now working.  Following the install I decided to test out VirtualBox, an OSS (open-source software) desktop virtualization product that holds some promise to compete with VMware.  All I can say is WOW!  I am a VirtualBox convert, as I am sure most OSS junkies will be.  The product is a snap to install, and while I have no idea how it will run on Windows, the install on Linux was simpler than my previous VMware install and the footprint is lighter.  VirtualBox only loads a single daemon vs. VMware’s three daemons. This is important to me since I only start my VM when the use of Windows is absolutely necessary (aka a dire situation).  Of course I did some poking around and found this benchmark of VMware vs. VirtualBox:

                    native       VirtualBox    VMware
  make              64:03 min    107:29 min    101:40 min
  grep (100 MByte)  6.7 s        20.2 s        18.1 s

Courtesy of a German review of VirtualBox – (http://www.heise.de/open/artikel/83678)

Right now I am very happy with VirtualBox, and I think we will see it become more pervasive in the OSS community. Now that QEMU is open sourced, other desktop OSS virtualization products based on QEMU may pop up and erode some of the market for commercial x86 products.  I think desktop products like VirtualBox will begin to take a grassroots hold long before products like Xen compete in the corporate data center.  I remember back when I first loaded VMware Workstation 3 on my RedHat 7.3 machine and the delight of being able to run pesky Windows apps without dual booting; I felt a twinge of that same feeling when VirtualBox installed and worked in less than 2 minutes.  In my opinion, Linux users looking to run Windows apps were instrumental in putting VMware on the map. Is it possible that there is now an OSS alternative knocking on the door of the VMware desktop community?  When will Workstation go the way of VMware GSX and become FREE?  How long can VMware hold on to the desktop without becoming free?  Eventually all hypervisors will need to be free; it should be interesting to watch.

VMware Virtual Machine shutdown, startup, etc….

So I did some additional testing on the claim that vmware-cmd will not shut down a VM that is locked at the console, and I have not experienced that problem. What I did find was that the bash scripts that worked on my ESX 2.x server did not work on my VMware Server machine; it appears that spaces in the directory structure and/or VM file name cause the scripts to fail.

These are examples of the original scripts:

getstate.sh
#!/bin/bash
echo "Getting running state of VM Guests..."
for vm in `vmware-cmd -l`
do
vmware-cmd "$vm" getstate
done

stopall.sh
#!/bin/bash
echo "Stopping all running VM Guests..."
for vm in `vmware-cmd -l`
do
vmware-cmd "$vm" stop trysoft
done
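The root cause is the unquoted command substitution in the for loop, which word-splits any config path containing spaces. A space-safe bash variant would read the list line by line instead; this is a sketch of my own, not part of the original scripts:

#!/bin/bash
# Read vmware-cmd -l output one line at a time so that VM config
# paths containing spaces are passed through intact.
vmware-cmd -l | while IFS= read -r vm
do
    vmware-cmd "$vm" getstate
done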

The output of getstate.sh on my ESX 2.x server is the following:
getstate(/home/bocchrj/vmware/rh62_1/linux.vmx) = off
getstate(/home/bocchrj/vmware/rh62_2/linux.vmx) = off


On my VMware Server box this is the output of getstate.sh:
Getting running state of VM Guests…
/usr/bin/vmware-cmd: Could not connect to VM /Virtual_Machines/Windows
  (VMControl error -11: No such virtual machine: The config file /Virtual_Machines/Windows is not registered.
Please register the config file on the server.  For example:
vmware-cmd -s register "/Virtual_Machines/Windows")
/usr/bin/vmware-cmd: Could not connect to VM XP
  (VMControl error -14: Unexpected response from vmware-authd: Invalid pathname: XP)
/usr/bin/vmware-cmd: Could not connect to VM Professional
  (VMControl error -14: Unexpected response from vmware-authd: Invalid pathname: Professional)
/usr/bin/vmware-cmd: Could not connect to VM SNMP
  (VMControl error -14: Unexpected response from vmware-authd: Invalid pathname: SNMP)
/usr/bin/vmware-cmd: Could not connect to VM Tools/Windows
  (VMControl error -14: Unexpected response from vmware-authd: Invalid pathname: Tools/Windows)
/usr/bin/vmware-cmd: Could not connect to VM XP
  (VMControl error -14: Unexpected response from vmware-authd: Invalid pathname: XP)
/usr/bin/vmware-cmd: Could not connect to VM Professional.vmx
etc…..

I took the time to write a slightly more robust perl script to stop and start VMs that appears to work well on both ESX 2.x and VMware Server (I only tested this on VMware Server running on a Linux host, but it should work on a Windows host with perl installed).  If you would like an executable (exe) version for Windows, email me and I can provide it to you.

#!/usr/bin/perl -w
#vmpower.pl
#RJB - 1/23/2007

use strict;
my $command;
my $switch;

if (!@ARGV || $ARGV[0] eq "help") {
    &usage;
    }
elsif ($ARGV[0] eq "getstate" || $ARGV[0] eq "stop" || $ARGV[0] eq "start" || $ARGV[0] eq "reset") {
    &power;
    }
else {
    &error;
    }

sub power {
# List the registered VMs and capture the output in a temp file
$command = "vmware-cmd -l";
print "==> $command\n";
if (system("$command > .vmtmpfile") == 0) {
  print " success, exit status = $?\n";
} else {
  print " failure, exit status = $?\n";
}

# Run the requested power operation against each VM, quoting the
# config path so spaces in directory or file names are handled
open (VM, '.vmtmpfile');
while (<VM>) {
chomp;
$command = "vmware-cmd";
$switch = $ARGV[0];
print "==> $command \"$_\" $switch\n";
system("$command \"$_\" $switch");
    if ( $? == 0 ) {
      print " success, exit status = $?\n";
    } else {
      print " failure, exit status = $?\n";
    }
}
close (VM);
system ("rm -f .vmtmpfile");
}

sub usage {
system "clear";
print "VM startup and shutdown script for ESX 2.x and VMware Server\n";
print "vmpower.pl\n\n";
print "Usage:  vmpower.pl [getstate|start|stop|reset]\n\n";
exit;
}

sub error {
    print " error\n";
    print " \"vmpower.pl help\" - for usage instructions\n\n";
    exit;
}

Shutting down a Windows VM when the console is locked

It was brought to my attention that vmware-cmd cannot gracefully shut down a Windows VM when the console is locked.  Yesterday I spent some time researching the issue, and the guys at Sysinternals have once again coded a quality utility that appears to solve the problem.  I only hope they can maintain the same level of quality now that Microsoft has acquired them.

Download psshutdown here

To shut down a single host, the syntax looks like this:
psshutdown \\host -t 5 -f -m "Shutting down in 5 seconds"

To shut down multiple hosts, psshutdown takes input from a file (e.g., hosts.txt) where the host names or IPs are listed one per line:
psshutdown @hosts.txt -t 5 -f -m "Shutting down in 5 seconds"
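A hosts.txt along those lines (the names are illustrative):

server1
server2
192.168.1.50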

To shut down a single host when not logged in as a privileged user, you will need to pass credentials to the psshutdown command:
psshutdown \\host -t 5 -f -m "Shutting down in 5 seconds" -u administrator -p password

The standard Windows CLI shutdown command also works if you use the force option (-f).  The things I really like about psshutdown are the ability to pass a file with a list of hosts and the ability to pass user credentials.