Data Profiling with Windows PowerShell


A customer asked me the other day about a method to do some data profiling against a file system, so I thought I would share the request, my suggestion and a little PowerShell script I crafted to do the profiling.  The request read as follows:  “Do you know of any tools that will list all files in a directory, and all subs and provide attributes like filename, path, owner, create date, modify date, last access date and maybe some other attributes in report format?”

My recommendation was a commercial product called TreeSize Professional by JAM Software.  The product license costs ~$50 and is worth every penny: scan speeds are good, it supports UNC paths and the reporting is intuitive.  Overall an excellent product.

As an alternative, below is a quick PowerShell script (also attached to this post as data_profile.ps1) that will create a CSV file with data profiling information.  Once the CSV file is created, it can be opened in Excel (or your spreadsheet tool of choice) or imported into a DB and manipulated.

The above script scans all files recursively starting at c:\files and outputs the results to results.csv.  One thing to note is that the scan stores all data in an array in memory; this is because the PowerShell Export-Csv cmdlet does not support appending to a CSV file (you gotta wonder what Microsoft talks about in design meetings).  I will likely create a version of the script that uses Out-File to write each row to the CSV file as the scan happens, rather than storing everything in memory until the scan completes and then writing the entire array to the results.csv file.  The goal here is to reduce the memory footprint during large scans.
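The script itself is not reproduced in the text above, so as a rough illustration of the row-at-a-time idea, here is a minimal sketch in Python (not the PowerShell from data_profile.ps1).  The function name profile_tree, the column names and the choice of attributes are assumptions made for this example; file owner is left out because looking it up is platform specific.

```python
import csv
import os
from datetime import datetime


def profile_tree(root, csv_path):
    """Walk `root` recursively and write one CSV row per file as it is found.

    Rows are written during the scan, so memory use stays roughly flat
    regardless of how many files are visited. Returns the number of rows written.
    """
    fields = ["name", "path", "size_bytes", "created", "modified", "accessed"]
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(fields)
        for folder, _subdirs, files in os.walk(root):
            for name in files:
                full_path = os.path.join(folder, name)
                try:
                    info = os.stat(full_path)
                except OSError:
                    continue  # skip files that vanish or cannot be read
                writer.writerow([
                    name,
                    full_path,
                    info.st_size,
                    # st_ctime is creation time on Windows, but
                    # metadata-change time on most Unix systems
                    datetime.fromtimestamp(info.st_ctime).isoformat(),
                    datetime.fromtimestamp(info.st_mtime).isoformat(),
                    datetime.fromtimestamp(info.st_atime).isoformat(),
                ])
                count += 1
    return count
```

Calling profile_tree(r"c:\files", "results.csv") would mirror the scan described above, with each row hitting the file as soon as it is collected instead of being held in an array until the end.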

The output of this script will be similar to the following:

Blackberry Issues


So about a week ago my Blackberry (8900) booted to a screen stating “Error [some number I can’t remember]:  Reload OS”, obviously not good.  So I broke out JL_Cmder (a must-have for all hard-core Blackberry hackers) and proceeded to wipe the OS and reload my BB.  Yesterday afternoon I was sitting at my desk when I looked down at my Blackberry and it was sitting there with a white screen, nothing but a white screen.  I tried a soft reboot, and back to the white screen; I tried a battery pull, back to the white screen.  Then, in not my finest moment, I somehow rationalized that running over to the T-Mobile store would be the easiest/quickest fix.  They wasted a solid 30 minutes of my life pulling the battery repeatedly and praying that it would boot; I will never get those 30 minutes back.  As usual, I returned to the office and hooked the BB up to a laptop to see if I could connect from JL_Cmder: no luck.  I removed the battery, SIM card and my MicroSD memory card, replaced the battery and rebooted, and my BB returned.  I then started scouring the forums, and it turns out that a few others had seen an issue with a corrupted SD card that caused the white screen of death.  Last night I formatted my SD card (FAT32) and placed it back into my BB, and the phone booted fine (happy about that).  When will I learn to never call my carrier for technical support?  (I have been with Verizon, AT&T and now T-Mobile and they are all the same.  When you go to a BB specialist and the first thing they tell you to do is pull your battery, you have to wonder how special he or she is.)  Hope this helps someone.

What have I been up to… Project Hive…


Obviously my post frequency has dramatically decreased; this is due to a couple of factors.  First, I am busy, so I have less time to turn my experiences into easy-to-digest blog posts, and second, a few of my comrades and I have been developing something we call “Project Hive”.  As you can probably tell from many of my blog posts, most of my work in recent years has been associated with EMC technologies.  Throughout the years we realized that while there are some good framework tools out there, they are costly, require significant customization and often don’t solve the common day-to-day operational issues that system administrators face.  The goal of “Project Hive” is to dramatically simplify the common tasks associated with managing EMC technologies.  Being intimately familiar with these tasks, we have developed a platform that is based on distributed collection, aggregation and presentation.  We call this the “Honeycomb”.  Each Honeycomb contains modules, which we call “Workers”, that are responsible for the collection, aggregation and analysis of data from discrete infrastructure components.  All workers are centrally managed on the Honeycomb and use standards-based methods to collect data (e.g. WMI, SSH, SNMP, APIs, etc.).  “Project Hive” is a very active project and we are continually adding functionality to existing workers and building new workers as time permits or requirements dictate.
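To make the Honeycomb and Worker relationship a little more concrete, here is a loose sketch in Python.  It is not Project Hive code: the class names, method names and data shapes are all hypothetical, and the collectors are plain callables standing in for whatever WMI, SSH, SNMP or API call a real worker would make.

```python
from typing import Callable, Dict, List

# A collector is any callable that returns data for one infrastructure component.
Collector = Callable[[], Dict[str, object]]


class Worker:
    """Hypothetical module that collects data from one kind of component."""

    def __init__(self, name: str, targets: Dict[str, Collector]):
        self.name = name
        self.targets = targets  # target label -> collector callable

    def run(self) -> Dict[str, Dict[str, object]]:
        results = {}
        for label, collect in self.targets.items():
            try:
                results[label] = collect()
            except Exception as exc:
                # one failing target should not stop the rest of the run
                results[label] = {"error": str(exc)}
        return results


class Honeycomb:
    """Hypothetical central point that manages workers and aggregates output."""

    def __init__(self) -> None:
        self.workers: List[Worker] = []

    def register(self, worker: Worker) -> None:
        self.workers.append(worker)

    def collect_all(self) -> Dict[str, Dict[str, Dict[str, object]]]:
        # Aggregate every worker's results under the worker's name.
        return {worker.name: worker.run() for worker in self.workers}
```

The point of the shape is that adding coverage for a new component type means registering another worker, while the aggregation and presentation side stays in one place.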

Any EMC customer who has been through an upgrade is familiar with the EMCGrab process (the process of running the EMCGrab utility on each individual SAN-attached host within the environment and providing the output to EMC so they can validate the host environment prior to the upgrade).

In a reasonably sized environment this process can be tedious and time consuming.  One of our released workers centralizes and automates the EMCGrab process.  I recently created a video which contrasts the process of running an EMCGrab manually on an individual host vs. using the Hive Worker.  My hope is to publish more of these videos in the future, but as you can imagine they take a bit of time to produce.  If you are looking for more information, contact the Project Hive team at dev@projecthive.info.

A high-resolution video is available here.

EMC CX3-80 FC vs EMC CX4-120 EFD


This post is a high-level overview of some extensive testing conducted on the EMC (CLARiiON) CX3-80 with 15K RPM FC (Fibre Channel) disks and the EMC (CLARiiON) CX4-120 with EFD (Enterprise Flash Drives), formerly known as SSD (solid state disk).

Figure 1:  CX4-120 with EFD test configuration.

Figure 2:  CX3-80 with 15K RPM FC test configuration.

Figure 3:  IOPS Comparison

Figure 4:  Response Time

Figure 5:  IOPS Per Drive

Notice that the CX3-80 15K FC drives are servicing ~250 IOPS per drive.  This exceeds 180 IOPS per drive (the theoretical maximum for a 15K FC drive) and is due to write caching.  Note that cache is disabled for the CX4-120 EFD tests.  This is important because high write I/O load can cause something known as forced cache flushes, which can dramatically impact the overall performance of the array.  Because cache is disabled on EFD LUNs, forced cache flushes are not a concern.
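For readers wondering where a ceiling of roughly 180 IOPS comes from, one common rule of thumb treats a random I/O on a spinning disk as one average seek plus half a rotation.  The sketch below (Python, purely illustrative) works through that arithmetic.  The 3.5 ms average seek time is an assumed, typical figure for a 15K RPM drive, not a value measured in these tests, and the function names are made up for the example.

```python
def rotational_latency_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def estimated_iops(rpm, avg_seek_ms):
    """Rule-of-thumb random IOPS for one spindle: 1 / (seek + rotational latency)."""
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return 1000.0 / service_time_ms


# ASSUMPTION: ~3.5 ms average seek, a typical datasheet figure for 15K drives
per_drive = estimated_iops(rpm=15_000, avg_seek_ms=3.5)
print(f"Estimated IOPS per 15K drive: {per_drive:.0f}")
print(f"Estimated IOPS for 24 uncached drives: {24 * per_drive:.0f}")
```

With these assumptions the estimate lands close to 180 IOPS per drive, so a sustained ~250 IOPS per drive is only plausible when something like write cache is absorbing part of the load.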

The table below provides a summary of the test configuration and findings:

Array                                 CX3-80               CX4-120
Configuration                         (24) 15K FC Drives   (7) EFD Drives
Cache                                 Enabled              Disabled
Footprint                                                  ~42% drive footprint reduction
Sustained Random Read Performance                          ~12x increase over 15K FC
Sustained Random Write Performance                         ~5x increase over 15K FC

In summary, EFD is a game-changing technology.  There is no doubt that for small-block random read and write workloads (e.g. Exchange, MS SQL, Oracle, etc.) EFD dramatically improves performance and reduces the risk of performance issues.

This post is intended to be an overview of the exhaustive testing that was performed.  I have results with a wide range of transfer sizes beyond the 2k and 4k results shown in this post; I also have Jetstress results.  If you are interested in data that you don’t see in this post, please email me at rbocchinfuso@gmail.com.

New Navi Look-and-Feel


Navisphere emerging from the dark ages with a new look-and-feel
 
Key takeaways from Navisphere Birds-of-a-Feather session:
  • New Navisphere will adhere to the EMC Common Management Initiative
  • Task-focused UI vs. the old object-based UI
    • Improved Navigation, multiple entry points, drill down
    • Improved scalability
  • Summary pages with aggregated data
  • Hardware diagrams with exploded views
  • Tables
    • Customizable
    • Exportable
  • NaviAnalyzer will provide the ability to scope the logging (e.g. only log NAR data for a specific LUN, RG, etc.)

In the first Navisphere release NaviAnalyzer will not conform to the Common Management Interface.

Classic NaviCLI is going away completely in the next release of Navi, so UPGRADE YOUR SCRIPTS to NaviSecCLI if you have not already.

Overall the screenshots look pretty good.

 
