ScaleIO – Chapter I: Frenemies? The story of a scale-out frenemietecture.

So this post is a slightly modified version of some internal documentation that I shared with my management, the folks at Dell who graciously donated the compute, PCIe SSDs and 10 Gig network for this project, and the folks at EMC who of course donated the ScaleIO licensing (and hopefully soon the ViPR 2.0 licensing).  Due to the genesis of this post and my all-around lack of time for editing, some of the writing and tense in this post may not always flow logically.

Just about everyone knows that Dell and EMC aren't exactly best friends these days, but could there be a better match for this architecture?  Cough, cough, Supermicro, cough, cough, Quanta… but seriously, the roll-your-own Supermicro, Linux, Ceph, Swift, etc. type architecture isn't for everyone; some people still want reasonably supported hardware and software at pricing that rivals the likes of Supermicro and OSS (open-source software).  BTW, there is a cost to OSS: it's called your time.  Think: I need to build a private scale-out architecture; I want it to be lower cost, high performance, and to support both physical and virtual environments; I want elasticity and the ability to scale to the public cloud; and oh yeah, I want an enterprise-class support mechanism for both the hardware and software that I deploy as part of this solution.

Most have heard the proverb "the enemy of my enemy is my friend".  The reality is that Dell and EMC are frenemies whether they know it or not, or are willing to admit it or not, because I am currently implementing Chapter III in this series and trust me, the enemy (competition) is a formidable one, known as the elastic public cloud!  Take your pick: AWS, Google, Azure, ElasticHosts, Bitnami, GoGrid, Rackspace, etc.  Will they replace the private cloud?  Probably not (at least not in the foreseeable future), as there are a number of reasons the private cloud needs to exist and will continue to exist: regulations, economics, control, etc.

We are in a rapidly changing landscape where the hardware market is infected with the equivalent of the Ebola virus, hemorrhaging interest, value and margin.  The sooner we accept this fact and begin to adapt (really adapt), the better our chances of avoiding extinction.  Let's face it, there are many OEMs, VARs, individuals, etc. who are more focused on containment than on a cure.  All of us who have sold, architected, installed, maintained, etc. traditional IT infrastructure face a very real challenge from a very real threat.  The opposing force possesses the will and tactics of the Spartans and the might of the Persians; if we (you and I) don't adapt, and think we can continue with business as usual, more focused on containment than on curing our own outdated business models, we will face a very real problem in the not so distant future.  Having said that, there is no doubt that EMC is hyperfocused on software, much of it new (e.g. ViPR, ScaleIO, Pivotal, etc.) and many tried and true platforms already instantiated in software or planned to be (e.g. RecoverPoint, Isilon, etc.).  As compute costs continue to plummet, more functionality can be supported at the application and OS layers, which changes the intelligence needed from vendors.  In the IT plumbing space (specifically storage) the dawn of technologies like MS Exchange DAGs and SQL AlwaysOn Availability Groups has been a significant catalyst for the start of a major shift: the focus has begun to move to features like automation rather than array-based replication.

The market is changing fast and we are all scrambling to adapt and figure out how we will add value tomorrow.  I am no different than anyone else, spending my time and money on AWS.

Anyway, there is too much to learn and not enough time; I read more than ever on my handheld device (maybe the 5" screen handheld device is a good idea, I always thought it was too large).  While working on Chapter II of this series I found myself at dinner the other night with my kids, reading the Fabric documentation, trying to decide if I should use Fabric to automate my deployment or just good old shell scripts and the AWS CLI.  Then my mind started wandering to what I do after Chapter III; maybe there is a Chapter IV and V with different instance types, or maybe I should try Google Compute or Azure.  So many choices, so little time.

Update:  Chapter II and Chapter III of this series are already completed and I have actually begun working on Chapter IV.

For sure there will be a ScaleIO and ViPR chapter but I need to wait for ViPR 2.0

This exercise is just my humble effort to become competent in the technologies that will drive the future of enterprise architecture and hopefully stay somewhat relevant.

High-level device component list for the demo configuration build:

  • Server Hardware (Qty 4):
    • Dell PowerEdge R620, Intel Xeon E5-2630v2 2.6GHz Processors, 64 GB of RAM
    • PERC H710P Integrated RAID Controller
    • 2 x  250GB 7.2K RPM SATA 3Gbps 2.5in Hot-plug Hard Drive
    • 175GB Dell PowerEdge Express Flash PCIeSSD Hot-plug
  • Networking Hardware:
    • Dell Force10 S4810 (10 GigE Production Server SAN Switch)
    • TRENDnet TEG-S16DG (1 GigE Management Switch)

High-level software list:

  • VMware ESX 5.5.0 build 1331820
  • VMware vCenter Server 5.5.0.10000 Build 1624811
  • EMC ScaleIO:  ecs-sdc-1.21-0.20, ecs-sds-1.21-0.20, ecs-scsi_target-1.21-0.20, ecs-tb-1.21-0.20, ecs-mdm-1.21-0.20, ecs-callhome-1.21-0.20
  • Zabbix 2.2
  • EMC ViPR Controller 1.1.0.2.16
  • EMC ViPR SRM Suite
  • IOzone 3.424
  • Ubuntu 14.04 LTS 64 bit (Benchmark Testing VM)

What the configuration physically looks like:

Topology and Layer 1 connections:

Below are the logical configuration details for the ScaleIO lab environment (less login credentials of course):

Dell Force10 S4810 Config: http://nycstorm.com/nycfiles/repository/rbocchinfuso/ScaleIO_Demo/s4810_show_run.txt

Base ScaleIO Config File: http://nycstorm.com/nycfiles/repository/rbocchinfuso/ScaleIO_Demo/scaleio_config_blog.txt

ScaleIO Commands: http://nycstorm.com/nycfiles/repository/rbocchinfuso/ScaleIO_Demo/scalio_install_cmds_blog.txt

The ScaleIO environment is up and running and able to be demoed (by someone who knows the config and ScaleIO, because most of the configuration is done via CLI and requires some familiarity given the level of documentation at this point).

ScaleIO Console

Below you can see that there is 136 GB of aggregate capacity available across all the ScaleIO nodes (servers).

This is not intended to be a ScaleIO internals deep dive but here is some detail on how the ScaleIO usable capacity is calculated:

Total aggregate capacity across SDS nodes:

  • 100 / (number of SDS servers) = % of aggregate capacity reserved as spare capacity
  • 1/2 of the remaining capacity is consumed by mirroring

For example, in a ScaleIO cluster with 4 nodes and 10 GB per node the math works out as follows (a quick script version follows the example):

    • 40 GB of aggregate capacity
    • 100/4 = 25% (or 10 GB) for spare capacity
    • .5 * 30 GB (remaining capacity) = 15 GB of available/usable capacity
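
As a quick sanity check, the same arithmetic can be scripted.  This is just a back-of-the-napkin sketch of the formula above (integer math, in GB), not anything ScaleIO reports itself:

#!/bin/bash
# Quick check of the ScaleIO usable capacity math above
NODES=4          # number of SDS nodes
PER_NODE_GB=10   # raw capacity contributed by each node

RAW=$(( NODES * PER_NODE_GB ))      # 40 GB aggregate
SPARE=$(( RAW / NODES ))            # 100/NODES percent reserved as spare = 10 GB
USABLE=$(( (RAW - SPARE) / 2 ))     # half of the remainder goes to mirroring = 15 GB
echo "raw=${RAW}GB spare=${SPARE}GB usable=${USABLE}GB"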

Configured VMware datastores:

  • svrsan201#_SSD – This is the local PCIe SSD on each ESX server (svrsan201#)
  • svrsan201#_local – This is the local HDDs on each ESX server (svrsan201#)
  • ScaleIO_Local_SSD_Datastore01:  The federated ScaleIO SSD volume presented from all four ESX servers (svrsan2011 – 2014)
  • ScaleIO_Local_HDD_Datastore01:  The federated ScaleIO HDD volume presented from all four ESX servers (svrsan2011 – 2014)

Detailed VMware Configuration Output: http://nycstorm.com/nycfiles/repository/rbocchinfuso/ScaleIO_Demo/ScaleIO_VMware_Env_Details_blog.html

To correlate the above back to the ScaleIO backend configuration the mapping looks like this:

Two (2) configured Storage Pools both in the same Protection Domain

  • pool01 is an aggregate of SSD storage from each ScaleIO node (ScaleIO_VM1, ScaleIO_VM2, ScaleIO_VM3 and ScaleIO_VM4)
  • pool02 is an aggregate of HDD storage from each ScaleIO node (ScaleIO_VM1, ScaleIO_VM2, ScaleIO_VM3 and ScaleIO_VM4)

Note:  Each of the ScaleIO nodes (ScaleIO_VM1, ScaleIO_VM2, ScaleIO_VM3 and ScaleIO_VM4) is tied to an ESX node (ScaleIO_VM1 -> svrsan2011, ScaleIO_VM2 -> svrsan2012, ScaleIO_VM3 -> svrsan2013, ScaleIO_VM4 -> svrsan2014)

Each Storage Pool has configured volumes:

  • pool01 has one (1) configured volume of ~ 56 GB. This volume is presented to the ESX servers (svrsan2011, svrsan2012, svrsan2013 & svrsan2014) as ScaleIO_Local_SSD_Datastore01
  • pool02 has two (2) configured volumes totaling ~ 80 GB:  ScaleIO_Local_HDD_Datastore01 = ~ 60 GB plus a second volume of ~ 16 GB.  These two logical volumes share the same physical HDDs across the ScaleIO nodes.

Some Additional ScaleIO implementation Tweaks

The ScaleIO GUI console seen above is a jar file that needs to be SCPed from the MDM host to your local machine to be run (it lives in /opt/scaleio/ecs/mdm/bin/dashboard.jar).  I found this to be a bit arcane so I installed thttpd (http://www.acme.com/software/thttpd/) on the MDM server to make it easy to grab the dashboard.jar file.

On the MDM server do the following (a consolidated script follows the list):

  1. zypper install thttpd
  2. cd /srv/www/htdocs
  3. mkdir scaleio
  4. cd ./scaleio
  5. cp /opt/scaleio/ecs/mdm/bin/dashboard.jar .
  6. vi /etc/thttpd.conf
  7. change www root dir to “/srv/www/htdocs/scaleio”
  8. restart the thttpd server “/etc/init.d/thttpd restart”
  9. Now the .jar file can be downloaded using http://10.10.0.22/
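
The same steps consolidated into a single script (a sketch; it assumes the SUSE paths above and that /etc/thttpd.conf uses a dir= directive for the document root):

#!/bin/bash
# Publish dashboard.jar over HTTP from the MDM node (sketch)
zypper -n install thttpd
mkdir -p /srv/www/htdocs/scaleio
cp /opt/scaleio/ecs/mdm/bin/dashboard.jar /srv/www/htdocs/scaleio/
# point the thttpd document root at the scaleio directory (assumes a dir= directive exists in thttpd.conf)
sed -i 's|^dir=.*|dir=/srv/www/htdocs/scaleio|' /etc/thttpd.conf
/etc/init.d/thttpd restart
# dashboard.jar should now be available at http://<mdm-ip>/dashboard.jar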

I wanted a way to monitor the health and performance (CPU, memory, link utilization, etc.) of the ScaleIO environment, including the ESX servers, ScaleIO nodes, benchmark test machines, switches, links, etc.

  1. Deployed Zabbix (http://www.zabbix.com/) to monitor the ScaleIO environment
  2. Built demo environment topology with active elements
  3. Health and performance of all ScaleIO nodes, ESX nodes, VMs and infrastructure components (e.g. switches) can be centrally monitored

Preliminary Performance Testing

Testing performed using a single Linux VM with the following devices mounted:

Performance testing was done using IOzone (http://www.iozone.org/) and the results were parsed, aggregated and analyzed using python (http://www.python.org/), R (http://www.r-project.org/), SciPy (http://www.scipy.org/) and Jinja2 (http://jinja.pocoo.org/)

Due to limited time and the desire to capture some quick statistics, a single IOzone run was made against each device, using the local HDD and SSD devices for the baseline sample data and the ScaleIO volumes as the comparative data set.

Test 1:  Local HDD device vs ScaleIO HDD distributed volume (test performed against /mnt/Local_HDD and /mnt/ScaleIO_HDD, see table above)

Test 2:  Local SSD device vs ScaleIO SSD distributed volume (test performed against /mnt/Local_SSD and /mnt/ScaleIO_SSD, see table above)

Note:  Local (HDD | SSD) = a single device in a single ESX server.  ScaleIO (HDD | SSD) makes use of the same HDD and SSD device in the server used in the local test, but also all the other HDD | SSD devices in the other nodes, to provide aggregate capacity, performance and protection.
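
For reference, each pass looked something like the following (a sketch; the record and file sizes shown are illustrative, not the exact values from my runs):

# Single IOzone pass against one mount point (repeated for each device under test)
# -i 0/1/2 = write/rewrite, read/reread, random read/write; -R -b dumps an Excel-style report
iozone -i 0 -i 1 -i 2 -r 4k -s 1g -f /mnt/Local_HDD/iozone.tmp -R -b local_hdd_results.xls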

ViPR Installed and Configured

  • ViPR is deployed but version 1.1.0.2.16 does not support ScaleIO.
  • Note:  ScaleIO support will be added in ViPR version 2.0 which is scheduled for release in Q2.

EMC ViPR SRM is deployed but I haven't really done anything with it to date.

ScaleIO SDS nodes in AWS

  1. Four (4) AWS RHEL t1.micro instances provisioned and ScaleIO SDS nodes deployed and configured.
  2. Working with EMC Advanced Software Division to get an unlimited perpetual ScaleIO license so I can add the AWS SDS nodes to the existing ScaleIO configuration as a new pool (pool03).
  3. Do some testing against the AWS SDS nodes.  Scale the number of nodes in AWS to see what type of performance I can drive with t1.micro instances.

Todo list (in no particular order)

  1. Complete AWS ScaleIO build out and federation with private ScaleIO implementation
    1. Performance of private cloud compute doing I/O to AWS ScaleIO pool
    2. Using ScaleIO to migrate between the public and private cloud
    3. Linear scale in the public and private cloud leveraging ScaleIO
  2. Complete ViPR SRM configuration
  3. Comparative benchmarking and implementation comparisons
    1. ScaleIO EFD pool vs ScaleIO disk pool
    2. ScaleIO EFD vs SAN EFD
    3. ScaleIO vs VMware VSAN
    4. ScaleIO vs Ceph, GlusterFS, FhGFS/BeeGFS and whatever other clustered file systems I can make time to play with.
    5. ScaleIO & ViPR vs Ceph & Swift (ViPR 2.0 Required)
  4. Detailed implementation documentation
    1. Install and configure
    2. Management

Progress on all of the above was slower than I had hoped, squeezing in as much as possible late at night and on weekends because 120% of my time is consumed by revenue producing activity.

Quickly Gather RecoverPoint Replication Stats

It's been a while since I posted; I think I got too caught up in writing lengthy posts (which I often never completed) rather than just publishing content as I have it and as my personal time allows.  This post is the start of a new philosophy.

Last week I had a need to quickly grab some replication stats from RecoverPoint and I thought I would share the process and code I used to do this.

Prerequisites:  plink, sed, awk, head, tail and egrep

Note:  Because this is not a tutorial I am not going to talk about how to get the requirements configured on your platform.  With that said, you should have no issues getting the prerequisites working on Windows or Linux (for Windows, Cygwin may be a good option).
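
As a rough illustration of the approach (not the exact script I used), the pipeline looks something like this; treat the RecoverPoint CLI command name and the grep/sed patterns as assumptions to adjust for your RPA version:

#!/bin/bash
# Sketch: pull replication stats from a RecoverPoint appliance over SSH and flatten them to CSV
RPA=10.0.0.50        # RPA management IP (placeholder)
RPUSER=admin
RPPASS='password'

plink -ssh -batch -pw "$RPPASS" "${RPUSER}@${RPA}" "get_group_statistics" \
  | egrep -i 'group|lag|throughput' \
  | sed -e 's/^[ \t]*//' -e 's/: */,/' > rp_stats.csv

head rp_stats.csv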

The resulting output is a CSV which can be opened in Excel (or whatever) to produce a table similar to the following:

My Iomega ix2 and my new 3 TB USB drive

Purchased 3 TB Seagate USB 3.0 drive from Amazon (http://amzn.to/TpduBU)

Waited… Very excited to connect to my ix2….

A few days later my 3 TB USB expansion drive arrived.  I hurried to unpack it and connect it to my ix2 expecting plug-and-play.  I plugged, but no play.

An overwhelming feeling of sadness consumed me, followed by WTF then the joy of knowing I could and would hack this to make it work.

Knowing that this Iomega thing had to be running Linux, I began to scour the web for how to enable SSH with Firmware Version 3.3.2.29823.

Found plenty of how-to information for Firmware Version 2.x, but 3.x (Cloud Enabled Firmware) is a bit more sparse.

Finally to enable SSH:  http://ip/diagnostics.html

SSH now enabled, opened PuTTY and SSH to device.

Username:  root
Password:  soho

Boom!  In….

A quick “df -h” shows my currently configured capacity:

A quick “cat /proc/scsi/usb-storage/4” followed by a “fdisk -l” reveals the drive is being seen by the ix2.

Created partition on /dev/sdc, “fdisk /dev/sdc”

Now what?

Hmmmmm…. Maybe I can create a mount point on /mnt/pools/B/B0, seems logical.

Whoops forgot to mkfs.

Run “mkfs /dev/sdc1”

“mount /dev/sdc1 /mnt/pools/B/B0/”

Hmmmm…..

“umount /dev/sdc1”

Tried to partition with parted (core dumps; the ix2 is running ver 1.8, and I'm pretty sure GPT partition support was not ready for primetime in ver 1.8).

Let's see if I can get a newer version of parted.

Enabled apt-get (required a little work)

cd /mnt/pools/A/A0
mkdir .system
cd .system

mkdir -p ./var/lib/apt/lists/partial ./var/cache/apt/archives/partial ./var/lib/aptitude

(I think that is all the required dirs, you will know soon enough)

cd /var/lib
ln -s /mnt/pools/A/A0/.system/var/lib/apt/ apt
ln -s /mnt/pools/A/A0/.system/var/lib/aptitude/ aptitude
cd /var/cache
ln -s /mnt/pools/A/A0/.system/var/cache/apt/ apt

run “apt-get update”

Should run without issue.

run “aptitude update”
Note:  Should run without issue.

Jettison that idea, not enough space on root and /mnt/apps to install new version of parted and required dependencies.

New approach:

run "dd if=/dev/zero of=/dev/sdc bs=1M"

Let it run for a minute or so to clear all partition info, then ctrl-c to stop.

Download EASEUS Partition Master 9.2.1 from filehippo (http://www.filehippo.com/download_easeus_partition_master_home/)

Install EASEUS Partition Master 9.2.1 on a Windows 7 desktop
Connect the 3 TB Seagate USB drive to the Windows 7 desktop
Partition and format the partition ext3 using EASEUS Partition Master 9.2.1
Note:  This takes a little while.

Once complete I connected the drive to my Iomega ix2

Voila!

Cleaned up the "/mnt/pools/B" directory I created earlier ("rm -rf /mnt/pools/B")

Rebooted my ix2 (to make sure I didn't jack anything up) and enjoyed my added capacity.

Pretty sick footprint for ~ 4.5 TB of storage (1.8 TB of it R1 protected).

DNS and Disaster Recovery

I've been conducting DR tests and site failovers for years using a myriad of host based and array based replication technologies.  By now the tasks of failing hosts over from site A to site B and gaining access to replicated data are highly predictable and controllable events.  What I often find is that little issues like time being out-of-sync due to an NTP server issue, a host needing to be rejoined to the domain, or the dreaded missing or fat fingered DNS entry tend to slow you down.

I recently ran a DR test where, in the prior test, a DNS entry was fat fingered; the bad DNS entry impacted the failback and extended the test time by about 5 hours.  Prior to this year's test I decided to safeguard the DNS component of the test.  I crafted a small shell script to record and check the DNS entries (forward and reverse).  The plan would be as follows:

  1. Capture DNS entries prior to the DR test and save as the production gold copy (known working production DNS records)
  2. Capture DNS entries following the failover to the DR location and DNS updates.  Ensure that the DNS entries match the documented DR site IP schema.
  3. Finally capture the DNS entries post failback to the production site.  Diff the pre-failover production site DNS entries (gold copy) with the post-failback production site DNS entries.

The fail-safe DNS checks proved to be very valuable, uncovering a few issues on failover and failback.  Below is my script.  I ran the shell script from a Linux host; if you need to run it on Windows and don't want to rewrite it you could try Cygwin (I don't believe the "host" command is natively packaged with Cygwin but it could probably be compiled, I haven't looked around much) or you could download VirtualBox and run a Linux VM.  Hopefully you find this useful.

Note:  you will need two input files:  “hosts_prod.in” and “hosts_dr.in”. These input files should contain your lookups for each site.

.in file example (syntax for .in files is “hostname | IP [space] record type”):
host1 a
host2 a
192.168.100.1 a
192.168.100.2 a

Syntax to execute the script is as follows “./checkdns.sh [prod | dr]”
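
Here is a minimal sketch of what checkdns.sh does (the input file names follow the .in convention above; the output file naming is just my illustration):

#!/bin/bash
# checkdns.sh - capture forward/reverse lookups for a site so successive runs can be diffed
# Usage: ./checkdns.sh [prod | dr]
SITE=$1
INFILE="hosts_${SITE}.in"
OUTFILE="dns_${SITE}_$(date +%Y%m%d_%H%M%S).out"

[ -f "$INFILE" ] || { echo "Missing input file $INFILE"; exit 1; }

# Each input line is "hostname_or_IP record_type"; host resolves names forward and IPs in reverse
while read -r NAME TYPE; do
  [ -z "$NAME" ] && continue
  echo "### $NAME ($TYPE)" >> "$OUTFILE"
  host "$NAME" >> "$OUTFILE" 2>&1
done < "$INFILE"

echo "Results written to $OUTFILE (diff against your gold copy)"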

Mapping RDM Devices

This post was driven by a requirement to map RDM volumes on the target side in preparation for a disaster recovery test.  I thought I would share some of my automation and process with regards to mapping RDM devices that will be used to present RecoverPoint replicated devices to VMs as part of a DR test.

Step 1:  Install VMware vCLI and PowerCLI
Step 2:  Open PowerCLI command prompt
Step 3:  Execute addvcli.ps1 (. .\addvcli.ps1)

Step 4:  Execute getluns.ps1 (. .\getluns.ps1)

Step 5:  Execute mpath.ps1 (. .\mpath.ps1)

Step 6:  Get SP collect from EMC CLARiiON / VNX array
At this point you should have all the data required to map the RDM volumes on the DR side.  I simply import the two CSVs generated by the scripts into excel (scsiluns.csv, mpath.csv) as well as the LUNs tab from the SP Collect (cap report).
Using Excel and some simple vlookups with the data gathered above you can create a table that looks like the following:
I could probably combine these three scripts into one, but I was under a time crunch and just needed the data; maybe I will work on that at a later date, or maybe someone can do it and share it with me.

Repurposing old EMC Centera nodes

This is the first in a multi-part series on repurposing old EMC equipment.  I recently acquired six EMC Centera nodes, two of the nodes with 4x1TB SATA drives and four of the nodes with 4x500GB SATA drives, so I started thinking: what can I do with these Pentium based machines with 1 GB of RAM and a boat load of storage?  An idea hit me to create a NAS share leveraging a global file system to aggregate the capacity and performance across all the Centera nodes.  Seemingly simple, but there was a challenge here: most modern day global file systems like GlusterFS or GFS2 require a 64 bit processor architecture, and the Centera nodes use 32 bit Pentium processors.  After spending a vacation day researching I identified two possible global file systems as potential options, XtreemFS and FraunhoferFS (fhgfs).

I discovered fhgfs first and it looked pretty interesting, a fairly traditional global file system consisting of metadata nodes and storage nodes (I came across this presentation which provides a good overview of the FraunhoferFS).  While fhgfs provided the basics of what I was looking for, the missing link was how I was going to protect the data; fhgfs for the most part relies on hardware RAID for node survivability, and because the Centera nodes are built to run EMC's CentraStar, an OS which leverages RAIN (Redundant Array of Independent Nodes), no redundancy is built in at the node level.  EMC acquired Centera and CentraStar from a Belgian company named FilePool in 2002.  As I thought through some possible workarounds I stumbled across XtreemFS, an interesting object based global file system; what was most interesting was the ability to replicate objects for redundancy.  At this point I decided to attempt to move forward with XtreemFS.  My single node install went well, no issues to really speak of, but as I moved towards the multi node configuration I began to get core dumps when starting the daemons.  It was at this point that I decided to give fhgfs a try, thinking that in phase 2 of the project I could layer on either rsync or drbd to protect the data (not there yet so not sure how well this theory will play out).  The fhgfs install went fairly easily and it is up and running; the rest of this blog will walk you through the steps I took to prepare the old Centera nodes, install and configure Ubuntu server, and install and configure fhgfs.

Because the Centera nodes came out of a production environment they were wiped prior to leaving the production data center (as a side note, DBAN booted from a USB key was used to perform the wipe of each node).  So with no data on the four Centera node internal drives, the first step was to install a base OS on each node.  Rather than use a USB CD-ROM (I only had one) I decided to build an unattended PXE boot install.

Phase 1:  Basic Environment Prep:

Step 1:  Build a PXE server (because this is not a blog on how to build a PXE server I suggest doing some reading).  The following two links should be very helpful:  https://help.ubuntu.com/community/PXEInstallServer, https://help.ubuntu.com/community/PXEInstallMultiDistro. I built my PXE boot server on Ubuntu 12.04 server and the process is pretty much as documented in the above two links.  You can also Google "ubuntu pxe boot server".

Note:  One key is to be sure to install Apache and copy your Ubuntu distro to an http accessible path.  This is important when creating your kickstart configuration file (ks.cfg) so you can perform a completely automated install.  My ks.cfg file.

Step 1A:  Enter BIOS on each Centera and reset to factory defaults, make sure that each node has PXE boot enabled on the NICs.

Note:  I noticed on some of the nodes that the hardware NIC enumeration does not match Ubuntu's ETH interface enumeration (i.e. on the 500GB nodes ETH0 is NIC2); just pay attention to this as it could cause some issues, and if you have the ports just cable all the NICs to make life a little easier.

Step 1B:  Boot servers and watch the magic of PXE.  Ten minutes from now all the servers will be booted and at the “kickstart login:” prompt.

Step 2:  Change the hostname and install openssh-server on each node.  Login to each node, vi /etc/hostname and update it to "nodeX", then execute "aptitude install openssh-server" (openssh-server will be installed from the PXE server repo; I only do this now so I can do the rest of the work remotely instead of sitting at the console).

Step 3:  After Step 2 is complete reboot the node.

Step 4:  Update /etc/apt/sources.list

Step 4 Alternative:  I didn't have the patience to wait for the repo to mirror, but you may want to do this and copy your sources.list.orig back to sources.list at a later date.

Note:  If you need to generate a sources.list file with the appropriate repos check out http://repogen.simplylinux.ch/

Step 4A:  Add the FHGFS repo to the /etc/apt/sources.list file

deb http://www.fhgfs.com/release/fhgfs_2011.04 deb6 non-free

Step 4B:  Once you update the /etc/apt/sources.list file run an apt-get update to update the repo, followed by an apt-get upgrade to upgrade the distro to the latest revision.

Step 5:  Install lvm2, default-jre, fhgfs-admon packages

aptitude install lvm2
aptitude install default-jre
aptitude install fhgfs-admon

Phase 2:  Preparing Storage on each node:

Because the Centera nodes use JBOD drives I wanted to get the highest performance by striping within each node (horizontally) and across the nodes (vertically).  This section focuses on the configuration of horizontal striping on each node (a consolidated script follows the steps below).

Note:  I probably could have taken a more elegant approach here, like booting from a USB key and using the entire capacity of the four internal disks for data, but this was a PoC so I didn't get overly focused on this.  Some of the workarounds I use below could probably have been avoided.

  1. Partition the individual node disks
    1. Run fdisk -l (will let you see all disks and partitions)
    2. For devices that do not have partitions create a primary partition on each disk with fdisk (in my case /dev/sda1 contained my node OS, /dev/sda6 was free, /dev/sdb, /dev/sdc and /dev/sdd had no partition table so I created a primary partition dev/sdb1, /dev/sdc1 and /dev/sdd1)
  2. Create LVM Physical Volumes (Note: If you haven’t realized it yet /dev/sda6 will be a little smaller than the other devices, this will be important later.)
      1. pvcreate /dev/sda6
      2. pvcreate /dev/sdb1
      3. pvcreate /dev/sdc1
      4. pvcreate /dev/sdd1
  3. Create a Volume Group that contains the above physical volumes
    1. vgcreate fhgfs_vg /dev/sda6 /dev/sdb1 /dev/sdc1 /dev/sdd1
    2. vgdisplay (make sure the VG was created)
  4. Create Logical Volume
    1. lvcreate -i4 -I4 -l90%FREE -nfhgfs_lvol fhgfs_vg --test
      1. The above command runs a test; notice the -l90%FREE flag, which says to only use 90% of each physical volume.  Because this is a stripe and the available extents differ on /dev/sda6, we need to equalize the extents by consuming only 90% of the available extents.
    2. lvcreate -i4 -I4 -l90%FREE -nfhgfs_lvol fhgfs_vg
      1. Create the logical volume
    3. lvdisplay (verify that the lvol was created)
    4. Note:  The above commands were performed on a node with 1TB drives; I also have nodes with 500GB drives in the same fhgfs cluster.  Depending on the drive size in the nodes you will need to make adjustments so that the extents are equalized across the physical volumes.  As an example, on the nodes with the 500GB drives the lvcreate command looks like this: lvcreate -i4 -I4 -l83%FREE -nfhgfs_lvol fhgfs_vg.
  5. Make a file system on the logical volume
    1. mkfs.ext4 /dev/fhgfs_vg/fhgfs_lvol
  6. Mount newly created file system and create relevant directories
    1. mkdir /data
    2. mount /dev/fhgfs_vg/fhgfs_lvol /data
    3. mkdir /data/fhgfs
    4. mkdir /data/fhgfs/meta
    5. mkdir /data/fhgfs/storage
    6. mkdir /data/fhgfs/mgmtd
  7. Add file system mount to fstab
    1. echo "/dev/fhgfs_vg/fhgfs_lvol     /data     ext4     errors=remount-ro     0     1" >> /etc/fstab

Note:  This is not a LVM tutorial, for more detail Google “Linux LVM”
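
Pulled together, the per-node storage prep looks roughly like this (a sketch for the 1TB nodes; device names and the FREE percentage need to be adjusted per node type as described above):

#!/bin/bash
# Per-node fhgfs storage prep (1TB nodes; use -l83%FREE on the 500GB nodes)
pvcreate /dev/sda6 /dev/sdb1 /dev/sdc1 /dev/sdd1
vgcreate fhgfs_vg /dev/sda6 /dev/sdb1 /dev/sdc1 /dev/sdd1
lvcreate -i4 -I4 -l90%FREE -nfhgfs_lvol fhgfs_vg
mkfs.ext4 /dev/fhgfs_vg/fhgfs_lvol
mkdir -p /data
mount /dev/fhgfs_vg/fhgfs_lvol /data
mkdir -p /data/fhgfs/meta /data/fhgfs/storage /data/fhgfs/mgmtd
echo "/dev/fhgfs_vg/fhgfs_lvol /data ext4 errors=remount-ro 0 1" >> /etc/fstab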

Enable password-less ssh login (based on a public/private key pair) on all nodes

  1. On the node that will be used for management run ssh-keygen (in my environment this is fhgfs-node01-r5)
    1. Note:  I have a six node fhgfs cluster fhgfs-node01-r5 to fhgfs-node06-r5
  2. Copy the ssh key to all other nodes (a loop version follows this list).  From fhgfs-node01-r5 run the following commands:
    1. cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node02-r5 'cat >> .ssh/authorized_keys'
    2. cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node03-r5 'cat >> .ssh/authorized_keys'
    3. cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node04-r5 'cat >> .ssh/authorized_keys'
    4. cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node05-r5 'cat >> .ssh/authorized_keys'
    5. cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node06-r5 'cat >> .ssh/authorized_keys'
  3. Note:  for more info Google “ssh with keys”
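
The five copy commands above can also be collapsed into a loop (same key, same nodes, just less typing):

# Push the management node's public key to the other fhgfs nodes
for N in 02 03 04 05 06; do
  cat ~root/.ssh/id_dsa.pub | ssh root@fhgfs-node${N}-r5 'cat >> .ssh/authorized_keys'
done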

Configure FraunhoferFS (how can you not love that name)

  1. Launch the fhgfs-admon-gui
    1. I do this using Cygwin-X on my desktop, sshing to the fhgfs-node01-r5 node, exporting the DISPLAY back to my desktop and then launching the fhgfs-admon-gui.  If you don't want to install Cygwin-X, Xming is a good alternative.
      1. java -jar /opt/fhgfs/fhgfs-admon-gui/fhgfs-admon-gui.jar
    2. Note:  This is not a detailed fhgfs install guide; reference the install guide for more detail: http://www.fhgfs.com/wiki/wikka.php?wakka=InstallationSetupGuide
  2. Adding Metadata servers, Storage servers, Clients
  3. Create basic configuration
  4. Start Services
  5. There are also a number of CLI commands that can be used (e.g. fhgfs-check-servers)
  6. If all works well a "df -h" yields the following
    1. Note the /mnt/fhgfs mount point (pretty cool)

Creating a CIFS/NFS share

  1. Depending on how you did the install of your base Ubuntu system you will likely need to load the Samba and NFS packages (Note:  I only loaded these on my node01 and node02 nodes, using these nodes as my CIFS and NFS servers respectively)
    1. aptitude install nfs-kernel-server
    2. aptitude install samba
  2. Configure Samba and/or NFS shares from /mnt/fhgfs
    1. There are lots of ways to do this; this is not a blog on NFS or Samba so refer to the following two links for more information:
      1. NFS:  https://help.ubuntu.com/community/SettingUpNFSHowTo
      2. Samba/CIFS:  http://www.samba.org/
    2. As a side note, I like to load Webmin on the nodes for easy web based administration of all the nodes, as well as NFS and Samba
      1. wget "http://downloads.sourceforge.net/project/webadmin/webmin/1.590/webmin_1.590_all.deb?r=http%3A%2F%2Fwww.webmin.com%2F&ts=1345243049&use_mirror=voxel" -O webmin_1.590_all.deb
      2. Then use dpkg -i webmin_1.590_all.deb to install

Side note:  Sometimes when installing a Debian package using dpkg you will have unsatisfied dependencies.  To solve this problem just follow these steps:

  1. dpkg -i webmin_1.590_all.deb
  2. apt-get -f --force-yes --yes install

Performance testing, replicating, etc…

Once I finished the install it was time to play a little.  From a Windows client I mapped to the share that I created from fhgfs-node01-r5 and started running some I/O to the FraunhoferFS.  I started benchmarking with IOzone; my goal is to compare and contrast my FraunhoferFS NAS performance with other NAS products like NAS4Free, OpenFiler, etc.  I also plan to do some testing with Unison, rsync and drbd for replication.

This is a long post so I decided to create a separate post for performance and replication.  To whet your appetite, here are some of the early numbers from the FhGFS NAS testing.

I created the above output quickly; in my follow-up performance post I will document the test bed and publish all the test variants and platform comparisons.  Stay tuned…

Ghetto Fabulous

Most environments running VMware would like some way to back up, protect and revision VMs. There are a number of commercial products that do a good job protecting VMs; products such as Veeam Backup and Replication, Quest Software (formerly Vizioncore) vRanger and PHD Virtual Backup to name a few. This post will focus on the implementation of a much lower cost (free) backup and recovery solution for VMware. As with any free or open source software there is no right or wrong implementation model, so this post will talk about how ghettoVCB was implemented with Data Domain to enhance the protection of VMs.

Why?…

What was the driver behind the requirement for image level protection of VMs in this particular instance? Within the environment that I am referencing in this post the customer has a fairly large ESX farm at their production site. Most of the production infrastructure is replicated to a DR location with the exception of some of the "less critical" systems. The DR site also has some running VMs, such as domain controllers, etc., also deemed "less critical", so these are not replicated. You may ask why these are not replicated; the short answer is that the customer uses EMC RecoverPoint to replicate data from Site A to Site B in conjunction with VMware SRM to facilitate failover, and until recently (VNX) RecoverPoint had a capacity based license, so dollars were saved by only replicating critical systems. Backups are taken of all systems but this does not provide the ability to restore an older VM image. A storage migration was being done from an older SAN infrastructure to a new SAN infrastructure; the migration was deemed complete, but there was one VMFS volume that was missed and never migrated, and the OEM was contracted to do a data erasure on the old SAN prior to removing it from the data center. It was at that time that the "less critical" systems were lost and everyone realized that they were not really "less critical". VMs needed to be rebuilt; this was labor intensive and could have been avoided had a good VM backup strategy been in place.

Discussions around how to protect against this in the future started to occur; the interesting thing was that as part of the new infrastructure Data Domain had been implemented as a backup to disk target, but there was no money left in the budget to implement a commercial VMware image level backup product. vGhetto ghettoVCB to the rescue! With a little bit of design ghettoVCB was implemented on all the ESX servers and has been running successfully for over a year.

How to get started…

Download the appropriate ghettoVCB code from the vGhetto Script Repository; there are multiple versions (you should use the latest version, the implementation discussed in this post uses ghettoVCBg2). All of the prerequisites and usage are well documented on the vGhetto site. Take your time and read, don't jump into this without reading the documentation.

Note: You will have to edit configuration files for vGhetto to setup alerts, retention, backup locations, etc… be sure to read the documentation carefully.

The Implementation details…

High-level Topology

Note: Site A and Site B backups target a share on each respective DD670 (e.g. \\siteADD670\siteAvmbackup for daily backups at Site A); these are replicated to the peer DD670. Replicated data is accessible at the target side by accessing the backup sharename (e.g. \\siteADD670\siteAvmbackup replicated data would be accessible by accessing \\siteBDD670\backup\siteA_vm_backup_replica).

In the environment where this deployment was done all of the ESX servers are running ESX 4.1 full (not ESXi), so the service console was leveraged; deployment models can differ, from using the remote support console to using the vMA (vSphere Management Assistant). This is why it is critical that you read the ghettoVCB documentation.

Step-by-Step…

  • Develop and document an architecture / design, this will require a little planning to make deployment as easy as possible.
  • Create a CIFS or NFS share on the Data Domain or other CIFS/NFS target.
    • If you want to keep the cost to nearly zero I recommend Opendedup
    • In this case Data Domain 670s already existed in both locations
    • I created two shares in each location one for daily backups and one for monthly backups (see High-level topology)

The reason for two shares is that only one (1) monthly is retained on the monthly share and fourteen (14) daily backups are maintained on the daily share. There is a tape backup job monthly that vaults the VM image backups from the monthly share.

  • There are basically three tasks that need to be performed on every ESX server in the environment:
    • Mount the target backup share(s):
      • Create mountpoint: mkdir /mnt/backup
      • For NFS: mount servername|IP:/sharename /mnt/backup
      • For CIFS: mount -t cifs //servername|IP/sharename /mnt/backup -o username=USERNAME,password=PASSWORD
  • Add the target backup share(s) to /etc/fstab to make them persistent:
    • For CIFS: echo "//servername|IP/sharename /mnt/backup cifs credentials=/root/.smbcreds 0 0" >> /etc/fstab
Note: FOR CIFS create .smbcreds file that contains the CIFS share login credentials. This file should contain the following two lines:
username=cifs_user_name
password=cifs_user_password
 
    • For NFS: echo "servername|IP:/sharename /mnt/backup nfs [any NFS mount options] 0 0" >> /etc/fstab
  • Create cron job(s):
    • Daily Job (runs Monday thru Friday at midnight): 0 0 * * 1-5 root /mnt/backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/backup/.files/logs/hostname_ghettoVCB.log 2>&1
    • Monthly Job (runs Saturday at midnight): 0 0 * * 6 root /mnt/monthly_backup/.files/ghettoVCB/ghettoVCB.sh -a > /mnt/monthly_backup/.files/logs/hostname_ghettoVCB.log 2>&1
Note: You will notice that the path to ghettoVCB.sh is under .files on the CIFS | NFS share; this is so I can make modifications post deployment, and since all the ESX servers use a shared location it is easy to maintain. More on this when I walk through my deployment methodology.

Note: crontab entries need to go in /etc/crontab. If you place them in the user crontab using crontab -e or vi /var/spool/cron/root it will NOT work.

Deployment…

Once you complete the above steps and test on a single server you are ready to roll out to all the servers in your environment. To simplify this I recommend storing the config files, scripts, etc… in a hidden directory on the CIFS or NFS share.

In my case I have a .files directory in the daily backup and monthly backup directories. This includes the ghettoVCB code, .smbcreds file and the deployment scripts.

Deployment Scripts:
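
The actual deployment script isn't reproduced here, but it boils down to something like the following (a sketch for the CIFS case; the server, share and credentials file are placeholders to adjust):

#!/bin/bash
# ghettoVCB deployment sketch (CIFS target) - run once per ESX host from the service console
DD_SERVER=siteADD670
DD_SHARE=siteAvmbackup
MNT=/mnt/backup
HN=$(hostname -s)

mkdir -p "$MNT"
mount -t cifs "//${DD_SERVER}/${DD_SHARE}" "$MNT" -o credentials=/root/.smbcreds

# make the mount persistent
grep -q "//${DD_SERVER}/${DD_SHARE}" /etc/fstab || \
  echo "//${DD_SERVER}/${DD_SHARE} ${MNT} cifs credentials=/root/.smbcreds 0 0" >> /etc/fstab

# daily backup job, Monday through Friday at midnight (note: /etc/crontab, not the user crontab)
echo "0 0 * * 1-5 root ${MNT}/.files/ghettoVCB/ghettoVCB.sh -a > ${MNT}/.files/logs/${HN}_ghettoVCB.log 2>&1" >> /etc/crontab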

Note: The above script assumes a CIFS target; modify accordingly for an NFS target.

Deployment is easy: as new ESX servers come online, using plink I remotely execute a mount of the appropriate share, copy the deployment script to /tmp and execute it.

All the changes are made to the fstab, cron, etc.. and VM image backups will now run on a regular basis.

Accessing backed up data…

You will now be able to browse the //servername|IP/sharename from any host and see your backups organized by date:

I use vmware-mount.exe, which is part of the VMware Virtual Disk Development Kit, on the vCenter server to mount the backup vmdk files for individual file restores; obviously for a full restore I just copy the vmdk back to the production datastore.

The following are the key steps to mount a backed up vmdk:

  • Mount the CIFS share (if using NFS you can usually share the volume via CIFS or SMB as well and gain access from Windows to use the process I am outlining here)
    • net use v: //servername|IP/sharename
    • net use

You should see something similar to this:

  • v:
  • dir (you should see all you VM backup dirs)
  • cd to the VM you want to perform a recovery from
  • cd to the proper backup image
  • dir

This is what the above command sequence looks like:

  • Now mount the vmdk
    • vmware-mount.exe z: “2003 SP2 Template.vmdk”
    • You can verify a successful mount by just typing vmware-mount.exe
  • z:
  • dir

You are now looking at the c: drive from the “2003 SP2 Template” VM from January 24, 2012.

You can navigate and copy files just like any normal drive.

Verizon Actiontec Router and Local DNS

I have been really busy and not posting much, but I have my home lab pretty much built out and have a bunch of new projects in the hopper; more on that in future posts.  If you have FiOS like I do, you probably have an Actiontec router provided by Verizon.  When building out my home lab I wanted to use my Actiontec router as my DNS server, for obvious reasons, and the web interface became frustrating pretty quickly.  So many clicks and the ability to only enter a single host registration at a time:

The ability to edit DNS from telnet is actually really nice on the Actiontec router.  The commands are pretty simple.

1) Enable Telnet on the router (Advanced –> Local Administration)

2) Once telnet is enabled, you can now telnet to your router using the same credentials used with the web interface.

3) After the telnet session is established there are basically three commands you need to be familiar with:

  • dns_get:  lists all DNS server entries
  • dns_set:  adds a DNS entry
  • dns_del:  deletes a dns entry

The syntax is pretty simple:

  • dns_get:  used by itself to list all DNS entries
  • dns_set: dns_set ID HOSTNAME IP_ADDRESS (e.g. – dns_set 1 host1 192.168.1.100)
  • dns_del:  dns_del ID (e.g. – dns_del 1)

This method of adding and removing DNS entries from the Actiontec router is significantly faster than using the web interface.

I use a Google Doc spreadsheet to track my IPs and build the command to add and remove DNS entries.  I have shared my template here:  https://docs.google.com/spreadsheet/ccc?key=0Alfr2Qqx-moWdE43YTFZLVRtRWM1X3VsdXY2UmFBVUE
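
If you'd rather skip the spreadsheet, the same commands can be generated from a plain text host list (a quick sketch; hosts.txt is just a hypothetical "hostname ip" file, one entry per line):

# Generate dns_set commands from a "hostname ip" list, using the line number as the entry ID
# (paste the output into the telnet session on the router)
awk '{ printf "dns_set %d %s %s\n", NR, $1, $2 }' hosts.txt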

Best Remote Connection Tool

I have tested a ton of tabbed remote connection tools.

RDTabs (http://www.avianwaves.com/tech/tools/rdtabs/):  Like it for pure RDP, no SSH, http, etc…

Terminals (http://terminals.codeplex.com/):  Slow and a little buggy IMO

Remote Desktop Manager (http://devolutions.net/):  Over built app, not portable, etc…

I am now using mRemoteNG (http://www.mremoteng.org/):  Love it!

This fits all my needs.  It supports all the protocols that I require, and a no-install portable version is available, which is perfect for me.  I have the portable version in my Dropbox (http://www.dropbox.com/) folder so I can launch it on any machine and have all my connections readily available.  I can add connections anywhere and they're synced via Dropbox.  The perfect solution for me.  The app is lightweight and fast; give it a try.

App that provides dramatic productivity improvements (for parents)

So this may seem like a strange post, as most people will think that I am going to be talking about an IDE application, a RAD tool, a CRM application or some sort of text-to-speech processor.  Regardless of what you are expecting, I can almost guarantee you will be expecting something a little more sexy than what you are about to see (especially if you are not a parent).

I think this app is so useful I am not only posting to my appoftheday.org blog but also to my gotitsolutions.org blog because it is that good.

Let me provide some background.  I have two wonderful little girls, a 5 year old and a 6 month old.  Anyone with children knows we have all retooled the human machine (ourselves) to have a CPU that is focused on work and a coprocessor that deals with our children while we try to focus (we can flip this paradigm as well).  I have to say my time slicing skills are second to none; you learn how to work in 2 minute slices while breaking away for 30 seconds to lend some CPU cycles to an often overheating parental coprocessor.  I often read emails back later that had the same thought double typed, missing words, etc… this is because I am processing too much information, my mental programming is way off.  I have this huge array of things I need to do, things I am doing, things I am being told to do, things my kids want to do, yadda, yadda, yadda…  Let's just say that I often suffer pointer corruption which leads to memory leaks, corruption and eventually a segmentation fault (in non techie lingo this is known as a freak out, but this is a technical blog hence the techie speak).

So to the point of the post.  There is this brilliant lady named Julie Aigner-Clark, the founder of The Baby Einstein Company; absolute best videos for kids under the age of one to help cool down the coprocessor (why didn't I start filming shiny lights and hand puppets 10 years ago?).  My 5 year old will even watch the videos.  There is this great website called YouTube where you can find Baby Einstein videos as well as other great videos like Oswald, WordGirl, Hannah Montana and The Pink Panther (a few of my older daughter's favorites).  So you are probably asking what relevance this has.  I will explain, be patient; I know how difficult this probably is because your 6 month old wants to eat and your 5 year old wants you to "Play Barbies" with her.

I am in my office trying to work and my daughter comes in; she wants me to stop what I am doing to play with her, and I attempt to stall and concentrate at the same time (very difficult).  I eventually sit her on my lap (applies to the 6 month old and the 5 year old), open YouTube in my browser and start playing our favorite Baby Einstein or WordGirl video.  Good so far.  I pop out the video window from youtube.com, resize my Excel sheet and attempt to work; here is a screen shot of what I am left with:

So on the left my daughter(s) can sit on my lap and watch the video while I work on the spreadsheet on the right.  Now here is the issue: I only have 3/4 of the screen, which can be a little annoying, and if I need to use another app it can be a big issue.  So what is the effect of me switching windows:

Oh no, the video moved to the background; scramble to resize the browser window to avoid a complete meltdown.  My reflexes are not that good so I rarely accomplish the goal.

Now for the introduction of a must have application that dramatically improves productivity, focus and sanity.  The app is called DeskPins and, simply put, it allows you to pin any window to the foreground, so let's look at a couple of examples of how I use this.

I follow the same process as before, finding a video on YouTube and popping out the video window, but now I pin the video window to the foreground.

Now I can maximize my spreadsheet (far better) without the video moving to the background, and I can move the video window around as needed.  I can open Firefox and not worry about losing the video to the background.

The app works on 32 and 64 bit versions of Windows (I am running it on 32 bit XP, 32 bit Win 7 and 64 bit Win 7) and has become an invaluable tool for me.  Hopefully this post helps with some use case examples and helps other parents occupy their children in times of need.  Enjoy!