Creating a Narrative

After a week in London, I spent my weekend like I have spent most of my weekends for the past two years: in front of the computer. To stay sane, I often take small breaks from academic research and writing while trying to retain my flow; this usually involves a shift to leisure reading or creative writing. In this case, I revisited my notes from the past week. When I say notes, I mean not just the ones from daytime working sessions but all my notes, including the ones I made at 3 AM on my nightstand. When I have notes taken over a period, I always try to organize the information and my opinions into a narrative that I will use to challenge my views and inform others; I find bullets to be uninspiring, so a narrative is important to me. As I distill these ideas, a macro takeaway almost always emerges, and this becomes the fuel for my reflection and the core of my narrative.

It is always great to spend time with organizational peers and leadership. I don’t have the opportunity to do it enough, but when it does happen, it makes me think about and appreciate the people I have in my life and the incredible opportunities I have. Having a natural and agendaless dialog with organizational peers and leadership becomes a significant portion of the fuel that I use to reflect and recalibrate. Believe it or not, I have a takeaway from just about every conversation and interaction I have. These takeaways often become a short note to myself or an idea that I jot down at 3 AM when I awake with something on my mind as a result of a conversation or interaction. Over weeks or months, I use these takeaways, ideas, and thoughts to fuel my reflection and challenge myself. What an incredible luxury and opportunity. Some of these interactions are incredibly trivial, but they occupy precious space in my mind, heart, and soul for long periods of time. For instance, while having lunch at Heathrow airport on Friday, apparently I kept looking at my feet (which were under the table) during a conversation. The person I was talking to asked me if he was kicking me. I said no. He then asked, “Why do you keep looking down?” A great question, to which I responded: “There is a bolt holding the table to the floor, and it is under my foot.” For some reason this bolt had consumed a portion of my attention; after all, why did this bolt protrude so far from the floor? Why do I tell this story? First, it was odd behavior on my part; second, without feeling safe, I am not sure how this strange dialog takes place. Without the dialog, how would I ever have the fuel to reflect on the bizarre behavior of staring at my feet in the middle of a conversation? More on the value of feeling safe in a bit.

I love what I do, so much that my profession has become a spiritual calling. I recently sent a thank you email, and in the email, I referenced a Simon Sinek quote that reads like this: “When we feel safe inside the organization, we will naturally combine our talents and our strengths and work tirelessly to face the dangers outside and seize the opportunities.” Over the last twelve years, this individual made me and others feel safe when things weren’t so safe; he created stability amid volatility, and this allowed many of us to do remarkable things. It is easy to pick apart details of anything or anyone, but in my mind, there may be no better macro-level representation of leadership. I always appreciated all the latitude this person provided me over the years and the faith he put in me, even though I know my convicted and passionate visions are often overly complicated and unrelatable. I feel similar sentiments about so many others in my life who have trusted me and provided me with incredible opportunities throughout my life. I have enough self-awareness to realize that my vision and passion often outrun my ability to execute, but I am OK with this because it keeps me pushing myself and others and working tirelessly to make vision meet execution. As I reflected this weekend on the past week and last thirty days, it got me thinking about the broader scope of feeling safe and achieving remarkable things together. When we feel safe we will challenge each other, we will admit fear, and we will stand together to take it on. Feeling safe amongst each other is incredibly powerful. I believe this is what great leadership is about; great leaders don’t rely on organizational hierarchy or specific accomplishments to lead, because leadership is an intangible influencer that has nothing to do with ascribed authority and everything to do with achieved influence.

My mind is always racing, so much so that I often wish I could shut it off for the sake of others, my family and myself, but at this point in my life, I’ve come to accept it. While I have come to accept the anxious thought as intrinsically me, progression (personally, professionally, spiritually, etc.) is also inherently me. I’ve begun to focus on reflection and pivoting my mind to focus on listening and how perception and adaptation can significantly impact influence. While I am still frustrated by so many things (it’s a journey), I am working to recapture valuable cycles which I traditionally spent thinking about situations where the probability of influence was low. The methods outlined by Tim Ferriss in “Tools of Titans” have helped me; I constantly ask myself: “What is the absolute worst that could happen if I did what I am considering? Is it recoverable? Why or why not? What positive impact might there be? Is fear blinding me to the future?” My need for contrarian debate will always exist as a tool that I use to challenge myself with the hope of personal enlightenment. I credit my wife, who has figured out how to live with my contrarian personality for twenty-three years, with planting the seed of taking the positive contrarian view: digging deeper into the reason why something is, and identifying and defending the positive position. FWIW, I have been trying to use this tactic, and I find it quite effective, proving that being left or right of center is annoying to most regardless of the direction. 🙂 I am committed to a life of progression, and this takes commitment and perseverance; luckily I am passionately curious and willing to learn from anyone or anything. One of the notes that I have from my nightstand in the UK reads like this: “… thinks I don’t like salespeople. Why is this?” Perception is reality, and this can’t be the perception because it’s far from the truth. I admire a vast array of skills, and the optimism and perseverance of the salesperson are at the top of my list. FTR, I like salespeople; it’s people that I sometimes struggle with. 🙂

It’s interesting because the sentiments I have shared in this blog started with me revisiting an unpublished email (longer than this blog) that I began composing two weeks ago about organizational structure, focus, growth, etc. FWIW, I often write something, let it sit for weeks or months, come back to it, and find I have a different perspective and often a lot more to say; hence I write some very long emails. The reason for the revisit was a comment made this week during a casual conversation where someone said: “I don’t think anyone knows that.” My first mental reaction (and probably verbal, because there isn’t much I think that I don’t say) was “How is that possible?” But then I thought, why does it matter? What does matter is perception. Time spent asking why is not time well spent; changing the perception is where I need to focus my time.

I was extraordinarily impressed by so many of the people that I met last week. The attention to detail, the focus, the consistency of messaging, and the handle they had on their business were impressive. Nothing is perfect, and sure, I walked away with some questions, but the details do not detract from the macro-level story. From a cultural fit perspective, I am not sure how much better the week could have gone. What I think made me happiest were the conversations and the personal time I spent with people I have known forever and people I just met this past week. We are all human, we will make poor decisions, we will make mistakes, but when we feel safe, we can have conversations that help us grow together and avoid poor future choices. As leaders, we have the responsibility to empower others and make them feel safe. I genuinely believe that if we feel safe, setting and executing on clear expectations becomes exponentially easier.

I had set out to write a one paragraph intro to the email I reference above, and this is what I ended up with, more of a spiritual examination of my thoughts in the context of the last thirty days.

Stop whining about the changing market…


In recent weeks I have been thinking a lot about value creation and how it has changed over the past few years. As I thought about this topic more deeply, I came to realize that value creation hasn’t changed at all; what’s changed is the willingness, almost eagerness, of those who participate in the value chain to blame market conditions rather than themselves. Here is what I believe: there are only two ways to create value (as an engineer) in today’s market:

  1. Be a first mover (an innovator)
  2. Be a fast follower (an optimizer)

Neither of the above means you need to be “splitting the atom”; what it does mean is you need to be innovating. Think about DevOps, why developers have become “The New Kingmakers” and why the Ops guy is getting increasingly difficult to find; the answer lies in innovation. Good Ops engineers innovated, they automated, and they pushed into the developer space, while frustrated developers, watching infrastructure Maytag repairmen point-and-click over and over again, pushed into the Ops space, and DevOps was born. The value is in the developer’s ability to deliver Ops elegantly. I am so happy to see a robust API get more attention than a slick GUI; this is the way it should be and should have always been.

To be a successful first mover (innovator) you have to identify the problem, develop (innovate) the solution, and evangelize (create and sell) to the market, often defining both the problem and the solution, etc…

“Fast followers” enter defined markets, they still need to innovate, but their innovations focus on optimizing the delivery model, reducing COGS, lowering cost of sales, etc…

Innovators and “fast followers” have always existed; these are the two entrants discussed in “The Innovator’s Dilemma”, and no other viable market entrant has ever really existed. So I started thinking about what feels so different. Why are so many people, today, comfortable blaming market conditions for failure? I arrived at this conclusion: the market for “followers” who are not “fast followers” looking to innovate differently is contracting fast, nearing extinction. These followers are the paint-by-number crowd, the crowd who thinks executing the best practice as defined by the innovator can deliver value. They have convinced themselves of this; thus they are working hard and not understanding why it’s not yielding results, and the only rationalization is to blame that which they have no control over, aka “a changing market”.


I started crystallizing these thoughts in the context of my everyday. It’s common to watch businesses seemingly do all the right things, at least from a going-through-the-motions perspective, but continue to fail and contract. Why is this? Running a business, whether an entire company (aka the corporation) or a piece of the business (aka a division of a company), requires decision making. Required decisions are increasing in complexity, frequency, and velocity. Decision-makers need to be able to quickly identify all the constants, variables, and constraints (these are vast, objective and subjective, and often not obvious), then render a decision and move on. The ability to maintain focus on macro-level objectives and not trade off strategic initiatives for tactical wins, the ability to quickly assess risk and calculate opportunity cost, and the ability to act decisively and expediently are a few traits that separate those who are accepting and capitalizing on market change from those who are whining about it.

It’s incredibly important to have a business plan, to understand your cost structure, your team objectives, and how each team member will contribute to the macro-level goals.  Personally, I always expect exponential returns and strive for combinatorial explosion; I don’t like linear ROI models.

The challenge is obvious; it’s easier to find people who can paint the Mona Lisa using a paint-by-number model vs. finding the next Leonardo Da Vinci.  Enter the value of apprenticeship, a long-term strategic vision and a culture focused on exponential returns.

[Image: the Mona Lisa versus a Mona Lisa paint-by-number]

The same but very different!!

The Mona Lisa paint-by-number kit sold at Walmart sells for $44.99, and I’d be willing to bet that there is a depreciation of that $44.99 expenditure once the kit is opened, and even further depreciation once the painting is complete. The Mona Lisa painted by Leonardo Da Vinci has an estimated value of $790 million. The Mona Lisa painted by Da Vinci is an example of the work of an innovator; the paint-by-number vs. innovator idea here is pretty easy to follow.

There is a great blog that outlines five reasons why the Mona Lisa matters in the context of business and value creation.

  • Originality:  Doing something new and different requires a belief in a vision and a willingness to take risks to get there.
  • Uniqueness:  The way you execute creates uniqueness; if you’re not unique you’re painting-by-numbers, even if the lines and numbers aren’t on the canvas.
  • Time:  Perseverance, personal investment, the desire to deliver a masterpiece, etc…
  • Reputation:  Self-explanatory.  Subjective and objective reality are critical aspects of building a reputation.
  • Mystery:  There has to be mystery.  Ever hear the saying “when there’s mystery, there’s margin”?  If you are in a business where all the mystery has been removed, you need to find a new business, and fast.

There is a thriving “fast follower” ecosystem where innovation is happening; it’s just happening differently, with a different purpose. Admittedly, “fast followers” are often viewed as “bad actors” within their communities. These communities range from art to technology.

An example of a “fast follower” in the art community might be Mr. Brainwash. Those who have seen “Exit Through the Gift Shop” will know what I am talking about; those who haven’t should watch the documentary or do some reading on Mr. Brainwash. The tech sector is full of “fast followers”, but they are also innovators in their own right. Amazon is an innovator and a “fast follower”; their use of Open Source and lack of contribution back to the community has them labeled as an Open Source “bad actor”.

The success of any organization today requires a mix of traits that are not easy to find, develop or maintain but thriving organizations possess these traits.

  1. A shared vision and goals.
  2. A mature culture that trusts in and puts the organization’s goals above individual goals, realizing that individuals win when the organization wins.  Superstar overachievers possessing OCD welcome, heroes need not apply.
    Quick litmus test is to rank the following motivators in order of personal fulfillment:  Praise, Money, and Accomplishment
  3. Laser focus.  Only entertain tactical diversions which contribute to strategic objectives.
  4. The existence of a concise and transparent measurement system that can be leveraged at all levels of the organization to measure and drive individual impact in the context of macro goals.
  5. An expectation that the organization should be self-directed, self-healing and self-managing.
  6. Grit, hustle, drive, relentlessness, belief, etc…, etc… wrapped in humility rather than hubris.
  7. An insatiable thirst for differentiation through collective intellect.
  8. Results-oriented. (Results are empirical and defined by the concise and transparent measurement system mentioned above.)

The bottom line is without a strategy all that is left is excuses.

I’m a skeptic, satiated by large raw data sets, analysis & inference

Speak to anyone who knows me, and they will likely characterize me as a skeptical, pessimistic, anxious, intense, and persistent individual.

If someone sends me a spreadsheet and then calls me to walk me through the numbers my immediate assumption is that the purpose of the follow-up call is to shape my perception. If someone provides me a composite of the figures without the raw data, visible formulas and documented data sources, I also assume manipulation. With this said I am a realist, and I am willing to accept manipulation, but I am honest about acceptance rather than convincing myself otherwise. I am just wired to be vigilant.

For me the glass being half-full represents a lack of fear of it being half-empty, I am motivated to refill the glass by the reality that it is half-empty and what is likely an unhealthy fear of dying from dehydration, but it works for me. From my perspective, the half-empty glass is not depressing or a demotivator it is a potential reality. Now don’t get me wrong, I know there is water in the glass and death is not imminent, but I am incredibly aware and grateful for the opportunity to find a water source to refill my glass.

I spend my days listening to dozens of pitches, where I need to focus, why I need to do x or y, what I am missing out on by not doing x or y, etc… The pitches almost always start with a half-full perspective, selling the positive but it’s amazing how when it doesn’t go the way the pitchman expects the approach shifts to the half-empty perspective, relying on FOMO (fear of missing out) as a last ditch attempt at motivation.

Now let’s face it, no one likes to miss out, but as a realist, I recognize that I can’t do everything, so decisions are required. Forks in the road appear every minute of every hour of every day, and I am greeted at each fork by a host espousing the merits of their righteous path. For someone like me, these decisions need to be my own, driven by raw data (as raw as it can be), analysis and inference. I try to check the near-term outcomes at the door and focus on visualizing the long-term strategic outcomes, the vision. In my mind tactical activities require little to no thought; they just happen. For example, a visionary looking for a more sustainable model for garbage disposal doesn’t stop taking their garbage to the curb every Monday and Thursday. Accepting what is and executing without much thought IMO avoids paralysis and makes room in your life and brain for what will be.

So now we arrive at the origin of this blog. I have to make personal and professional bets on where the market is going, what is most relevant and where I should focus my time. Of course, I have a subjective opinion on where I believe the market is going, but I like to validate my opinions with some data. With so many people, organizations and news outlets selling their version of the future, the question becomes: how do I validate my opinions objectively? Social chatter is meaningful to me, as is sentiment analysis. The great news is that with a little Python, the use of some APIs and the ELK stack, it’s pretty easy to collect data from social media platforms, store it, analyze it and draw some conclusions. One such question that is very relevant to me is which technologies and which OEMs (original equipment manufacturers) have mindshare. I’ve been pulling social media data for a few weeks using #hashtags to see what techs and OEMs have the most buzz; I have also been doing sentiment analysis to see if the buzz is good or bad.
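To make the idea concrete, here is a minimal sketch of the counting and scoring step only. It is not the pipeline I actually run: the sample posts, the hashtags, and the tiny word lists are all made up for illustration, the API collection and the ELK storage are left out entirely, and a real analysis would use a proper sentiment model rather than a word lookup.

```python
import re
from collections import Counter

# Hypothetical sample posts standing in for data pulled from a social media API.
SAMPLE_POSTS = [
    "Loving the new #kubernetes release, great work",
    "#kubernetes upgrade was painful and slow",
    "Great demo of #docker today",
    "Is #openstack dead? Terrible docs",
    "#kubernetes and #docker make a great pair",
]

# Tiny illustrative word lists (an assumption, not a real sentiment lexicon).
POSITIVE = {"loving", "great", "good", "love"}
NEGATIVE = {"painful", "slow", "dead", "terrible", "bad"}

HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"[a-z']+")


def score(text):
    """Return (positive word count - negative word count) for one post."""
    words = WORD_RE.findall(text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mindshare(posts):
    """Count mentions and sum the post-level sentiment score per hashtag."""
    mentions = Counter()
    sentiment = Counter()
    for post in posts:
        post_score = score(post)
        # set() so a hashtag repeated within one post is counted once.
        for tag in set(HASHTAG_RE.findall(post.lower())):
            mentions[tag] += 1
            sentiment[tag] += post_score
    return mentions, sentiment


mentions, sentiment = mindshare(SAMPLE_POSTS)
for tag, count in mentions.most_common():
    print(f"#{tag}: {count} mentions, net sentiment {sentiment[tag]:+d}")
```

Mention counts are the "buzz" and the net score is a crude read on whether the buzz is good or bad; the two can easily disagree, which is the reason to track both.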

Here is my view of the market using social media buzz to determine mindshare (it actually feels pretty on the money):

The world has changed, are you paying attention?

This blog is the result of a restless night where I pondered a recent event where the idea (or existence) of NOC (Network Operations Center) was conveyed as a key component of the ITSM (Information Technology Service Management) paradigm. I find this to be an uber interesting topic and position given that the world has moved (and continues to move) in every way from a centralized to a disaggregated and distributed model. I believe this is true in computing (think cloud, microservices, twelve-factor apps, etc…) and it’s true in the area of human capital management and service delivery.

I thought I would share some of my opinions on the topic, my position as well as some anecdotes that I believe support my thoughts.

First, let me start by saying that we are engaged in a war, a war for human capital, a war where the best knowledge workers don’t look anything like what they looked like twenty years ago, they live in the shadows, digital nomads inhabiting a digital universe.

When I think NOC, here is what I envision:

[Image: the NOC from the movie WarGames]

The above is a picture of the NOC from the movie WarGames which was released in 1983, this was cool and impressed the audience, but it was 35 years ago! It’s probably obvious from looking at my blog header that I am a big WarGames fan. Let’s stay with the Hollywood portrayal of tech for a moment because I think it’s relevant.

Fast forward from 1983 to 2001, 18 years later, and the NOC has given way to the lone hacker, with umpteen monitors (quite a setup), working alone to High Voltage by The Frank Popp Ensemble.

Disaggregation and decentralization have become a pervasive theme, a message and a way of life. Nowhere is this more evident than in the Open Source community. Disaggregation and decentralization, coupled with a culture that has shifted the motivation of the knowledge worker, have given way to an unprecedented pace of innovation which would otherwise be impossible.

The Open Source statistics are truly staggering: https://octoverse.github.com/

Couple what the Open Source movement has taught us about the power of disaggregated and decentralized teams with “the surprising truth about what motivates us” and you’ll realize that the disaggregated and decentralized cultures being built are unlike anything we could have dreamed. The passion, commitment, engagement, communication, execution, and velocity are astounding.

Ask yourself where people (yourself included) go for help, how they build communities, what are trusted sources of information, etc…
Where do developers look for help? StackOverflow, Slack, IRC, Quora, etc…?
Where does the average person look for help? Facebook, YouTube, Twitter, etc…?

These are all platforms which enable the construction of disaggregated and decentralized communities which create cultures, subcultures, increase engagement, provide better time to value, etc… Are there no lessons to be learned here? There are lessons to be learned, and many are learning these lessons and adapting their engagement models.

I am a techie, and I believe that substance will always prevail over style and the question I continually ask myself as I adjust to keep up with a market which is innovating and changing at an unprecedented pace is how to define the culture? Is what we are doing relevant today and does it put us on a trajectory to where we’ll need to be in 24 months?

And now we have arrived at my thoughts regarding the NOC.

JetBlue made a bold move (which others followed) shifting from reservation call centers to hiring booking agents who work virtually, and their customer service is consistently rated the highest in the industry.

Relevant business models do NOT focus on resource colocation; they focus on resource capability, availability, and collaboration. I would go as far as to say that colocation favors style over substance.

The cultures we build need to focus on leveraging technology to deliver a great total customer experience (TCE). I believe that a 5.3” screen in the hands of hundreds of engineers, along with elegant engagement processes, procedures, and tools, delivers a better TCE than 60” monitors on the wall in a room with ten engineers with landline phones. Cultural agility over environmental rigidity.

The focus and value here is NOT a finite set of L1, L2 and L3 shift workers in a NOC. Big screen TVs on the wall, the Bat Phone and people sitting at a desk are style decisions which have no direct correlation to the ability to deliver substance. Our focus needs to be on how to engage and nurture the best knowledge workers the market can offer. Our mission needs to be the creation and cultivation of a culture which fosters engagement. Our ability to engage and escalate to a subject-matter expert (SME) at any time, to improve the TCE by building equitable partnerships which deliver distinct value, and to provide a meaningful escalation path that focuses on MTTW (Mean-Time-to-Workaround) while in parallel determining a root cause and resolution, lies in our culture.

We must understand that the world has changed.  We live in a world where seemingly forward-thinking paradigms are obsolete before they are implemented.  The path to success relies on agility and accountability, not rigidity and responsibility.

The market is depressed it’s not the “Depression”

The market had a huge day today with the Dow rising 936 points! Following the biggest rise in the market’s history, I thought I would post a few figures from Friday for all the people who pulled their money out of the market last week in a panic and who are most likely questioning the move after today.

As of Friday’s closing bell the US markets were down ~ 18%, EU markets ~ 25% and the Asian markets ~ 30%. While an 18% slump sucks, is it enough to make me panic? Well, let’s look at a few additional indicators today vs. the "Depression".

Indicator                      Today        "Depression"
Industrial production          Down ~ 1%    Down ~ 44%
Unemployment                   ~ 6%         ~ 25%
Housing payments in arrears    ~ 4%         ~ 40%

The market conditions today are a long way from the conditions in 1929.  Things look much closer to the market crash of 1973/74 where the Dow lost ~ 45% of its value.

Now let’s look at an investment made in 1972 and its present day value.  This would be representative of a thirty-something’s investment today with a retirement target cashout.

I looked at an investment in XOM made 01/70 and sold today; even following last week the gain is ~ +843%.  The Dow in the same time period is ~ +1060%.

I also looked at the past 10 years, from Oct 1998 until today, which shows ~ +107% for XOM and ~ +19% for the Dow.
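Total gains over very different holding periods are hard to compare, so it can help to convert them to a compound annual rate. The sketch below only re-expresses the figures quoted above; the holding periods (roughly 38.75 years for 01/70 until today, and 10 years for the second comparison) are my assumptions, and nothing here accounts for dividends, fees or inflation.

```python
def annualized_return(total_gain_pct, years):
    """Convert a total percentage gain into a compound annual growth rate (in %)."""
    growth_factor = 1 + total_gain_pct / 100
    return (growth_factor ** (1 / years) - 1) * 100


# Assumed holding periods: 01/1970 to 10/2008 is roughly 38.75 years.
LONG_YEARS = 38.75
TEN_YEARS = 10

# (total gain in %, holding period in years), taken from the figures in the text.
figures = {
    "XOM since 01/70": (843, LONG_YEARS),
    "Dow since 01/70": (1060, LONG_YEARS),
    "XOM last 10 years": (107, TEN_YEARS),
    "Dow last 10 years": (19, TEN_YEARS),
}

rates = {label: annualized_return(gain, years) for label, (gain, years) in figures.items()}

for label, rate in rates.items():
    print(f"{label}: ~{rate:.1f}% per year")
```

Seen this way, the headline percentages shrink to single-digit annual rates, which is exactly why the time horizon matters more than any one week.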

Conclusion, the market is not a place to put your cash if you need liquidity.  As a long term investment the market is the place to be!

Thin Provisioning 101

I was asked by someone for a quick overview of Thin Provisioning and when I view this as a valuable or applicable technology.

So let me start with a quick simplified visual overview of both Thin Provisioning and Traditional (aka Thick) Provisioning.

[Image: thin_provisioning_101]

So now that you understand the concepts of Thin Provisioning and Traditional (aka Thick) Provisioning, let me quickly talk about the only place where I see Thin Provisioning as a valuable technology.

I look at Thin Provisioning in the same way that a Disaster Recovery (DR) provider looks at taking on new customers.  If storage is your business (i.e. you are offering a shared storage model to customers co-located in your data center) then Thin Provisioning may be a key ingredient of your business model.  Let me expand on this: DR providers like SunGard oversubscribe their data centers, hedging that 100% of their customers will not have a disaster at the same time (BTW this has happened and put the provider out of business).  Thin Provisioning works in the same way by providing the user with the belief that they have 100% of the capacity, while in fact the capacity may be overprovisioned and the storage service provider (SSP) is hedging that 100% of the users co-located on the storage array will not demand 100% of the resources at the same time.
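The hedge described above is easy to put into numbers. The following is a toy model, not anything from a real array: the tenant names, the capacities, and the helper functions are all hypothetical and exist only to show the gap between what has been promised and what physically exists.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    provisioned_gb: int  # capacity the user believes they have
    used_gb: int         # capacity actually written so far


def oversubscription_ratio(volumes, physical_gb):
    """Promised capacity divided by physical capacity (> 1.0 means oversubscribed)."""
    return sum(v.provisioned_gb for v in volumes) / physical_gb


def physical_utilization(volumes, physical_gb):
    """Fraction of the physical capacity that is actually consumed."""
    return sum(v.used_gb for v in volumes) / physical_gb


def can_honor_all_commitments(volumes, physical_gb):
    """True only if every user could fill their volume at the same time."""
    return sum(v.provisioned_gb for v in volumes) <= physical_gb


# Hypothetical tenants sharing a hypothetical 1,000 GB array.
PHYSICAL_GB = 1000
volumes = [
    Volume("tenant-a", provisioned_gb=500, used_gb=120),
    Volume("tenant-b", provisioned_gb=500, used_gb=200),
    Volume("tenant-c", provisioned_gb=1000, used_gb=300),
]

ratio = oversubscription_ratio(volumes, PHYSICAL_GB)
utilization = physical_utilization(volumes, PHYSICAL_GB)
safe = can_honor_all_commitments(volumes, PHYSICAL_GB)

print(f"Oversubscription ratio: {ratio:.1f}x")
print(f"Physical utilization: {utilization:.0%}")
print(f"Can honor every commitment at once: {safe}")
```

In this made-up case the array is comfortably under its physical limit while promising twice what it holds, which is fine right up until everyone shows up at once.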

There are some very minor management benefits that I outlined in the pictorial above, but IMO, given some of the pitfalls associated with Thin Provisioning, these do not provide a compelling reason to consider Thin Provisioning.  Read an interesting article here that outlines one very real issue encountered with Thin Provisioning and NTFS.

So in conclusion, if you are an SSP of some sort consider Thin Provisioning, otherwise go thick or go home 🙂

Thoughts on Super Bowl XLII

Congratulations to the NY Giants and NY Giants fans. Unbelievable showing!!!! Yes, the Giants looked possessed on defense, but did the Patriots look like an 18-0 team? I have a theory on this; read on. Regardless, I couldn’t have cared less who won the game; neither team is my team and I did not have money on the game, but everyone loves the underdog and frankly I am so freakin tired of the Patriots. Congratulations to Eli Manning, MVP of Super Bowl XLII. I would like to offer up that there should be a co-MVP, or at least a defensive player of the game should be named… can you guess who?

[Image: Gisele Bundchen, co-MVP Super Bowl XLII]

Maybe next year Tom Brady should leave Gisele at home. While she is not a bad trophy / consolation prize, it may be a bit easier for Eli to share his trophy (the Vince Lombardi Trophy, shown here on the right) with the team and the fans. Congrats NY!

The perception is emerging but the reality is legacy…

The market perception seems to be that iSCSI is gaining tremendous steam and that many customers who would have adopted Fibre Channel as their interconnect a year ago are now adopting iSCSI. I would agree that this is the case, but iSCSI is not an emerging interconnect; it might be better classified as a legacy interconnect which is now experiencing a newsworthy adoption rate. An early concern with iSCSI was the performance penalty associated with TCP operations and the impact of software-based initiators. This spawned the TOE (TCP Offload Engine), which offloads TCP calculations from the system CPU to an onboard processor dedicated to TCP operations. Today most iSCSI implementations leverage software initiators (system CPU resources are for the most part so underutilized that most environments will never notice a 10% CPU utilization increase that may be associated with iSCSI). Some vendors addressed the TCP concern by modifying the iSCSI protocol to ride on UDP as opposed to TCP, thus increasing performance via proprietary protocols which resemble iSCSI (e.g. LeftHand Networks, HammerStorage and Zetera). It is important to note that LeftHand has pretty much abandoned the proprietary protocol they started with and has now adopted the iSCSI standard. It is also interesting to note that their adoption rate seems to have increased since doing this; when you are a new player, I think evangelizing your protocol’s superiority over the standard is probably a tall order. LeftHand has also morphed their business into a software play and embraced the VMware Virtual Appliance Marketplace as a way to propagate their technology.

There are a number of emerging interconnects that technologically outshine iSCSI. The question is how quickly the market makers can move to adopt these technologies, and whether the market makers are interested in accelerating the adoption curve. This is a complicated question. On one hand you could argue that technologies are more stable once they have been around longer (I remember reading papers on iSCSI in 2001; did it really take 7+ years to get iSCSI to where it is today?). On the other hand, if the market makers validate these technologies too early they run the risk of fierce competition from more nimble startups. It is a complex problem; my feeling is that for the most part the adoption cycle is slowed by the market makers as a way to recoup development cost and slow competition. IMO the byproduct of this is a slower innovation cycle.

AoE and HyperSCSI both offer the interconnect price point of iSCSI without the performance burden associated with TCP. AoE and HyperSCSI ride on Layer 2 and do not experience the protocol overhead associated with Layer 3 protocols. SoIP (Storage over IP) uses UDP as the transport protocol as opposed to TCP. We are also seeing the emergence of iSER and iWARP, next-generation TCP technology that closely resembles InfiniBand. iWARP (Internet Wide Area RDMA Protocol) is a superset of the VI architecture and is aimed at reducing the overhead associated with TCP. iSER (iSCSI Extensions for RDMA) maps the iSCSI protocol over RDMA networks like InfiniBand or iWARP. iSER addresses the overhead associated with TCP and out-of-order packet delivery. How many of us hear about these protocols?

It seems that the majority of marchitecture effort is being put into FCoE (Fibre Channel over Ethernet). Major players such as Brocade, Cisco, EMC, Emulex, QLogic, IBM, Intel, Sun and Mellanox have all gotten behind FCoE in a big way, so most likely this will be the next big thing in storage interconnects. The emergence of technologies that further leverage Ethernet as the interconnect will change the game; it is hard to imagine that Cisco's dominance will not continue to grow. It is likely that more storage services (e.g. replication, snapshots, etc.) will be handled at the network layer, and as these services move into the network layer we will continue to see the further commoditization of the storage market. It should be interesting to watch over the next few years.

While on the surface protocols that sit on top of Layer 2 (AoE and FCoE) may seem to be superior, there is a tremendous amount of functionality that is provided at Layer 3, so it is not a foregone conclusion that FCoE will win the battle. Right now the only foregone conclusion I can see is that Cisco wins regardless; the others will be battling for a piece of the pie. But who knows, anything can happen; after all, this is technology.

Oracle Storage Guy: Direct NFS on EMC NAS

I have been chomping at the bit to test VMware on dNFS on EMC NAS for a couple of reasons. A number of my customers who are looking at EMC NAS, in particular the NS20, would like to consolidate storage, servers, file services, etc. onto a unified platform and leverage a single replication technology like Celerra Replicator. dNFS may offer this possibility: .vmdks can now reside on an NFS volume, CIFS shares can be consolidated to the NS20, and all of it can be replicated with Celerra Replicator. The only downside to this solution that I can see is that right now the replicated volumes will be crash-consistent copies, but I think with some VMware scripting even this concern can be addressed. I hope to stand this configuration up in the lab in the next couple of weeks, so I should have more detail and a better idea of its viability shortly. You may be wondering why this post is entitled Oracle Storage Guy. The answer is that I was searching the blogosphere for an unbiased opinion and some performance metrics on VMware and dNFS, and this was the blog that I stumbled upon.

The performance numbers I have seen for VMware on dNFS come very close to the numbers I have seen for iSCSI. Both technologies offer benefits, but for the use case I mention above dNFS may become very compelling. I recommend reading this post, Oracle Storage Guy: Direct NFS on EMC NAS; it offers some great commentary on the performance characteristics and benefits of dNFS.

The Cache Effect

Following a fit of rage last night after I inadvertently deleted 2 hours' worth of content, I have now calmed down enough to recreate the post.

The story starts out like this: a customer who recently installed an EMC CX3-80 was working on a backup project rollout, and the plan was to leverage ATA capacity in the CX3-80 as a backup-to-disk (B2D) target. Once they rolled out the backup application they experienced very poor performance for the backup jobs that were running to disk. Additionally, the customer did some file system copies to this particular device and the performance appeared to be slow there as well.

The CX3-80 is actually a fairly large array, but for the purposes of this post I will focus on the particular ATA RAID group that was the target of the backup job where the performance problem was identified.

I was aware that the customer only had one power rail due to some power constraints in their current data center. The plan was to power up the CX using just the A side power until they could decommission some equipment and power on the B side. My initial thought was that cache was the culprit, but I wanted to investigate further before drawing a conclusion.

My first step was to log into the system and validate that cache was actually disabled, which it was. This was due to the fact that the SPS (standby power supply) only had one power feed and the batteries were not charging. In this case write-back cache is disabled to protect from potential data loss. Once I validated that cache was in fact disabled, I thought that I would take a scientific approach to resolving the issue by baselining the performance without cache and then enabling cache and running the performance test again.

The ATA RAID group which I was testing on was configured as a 15 drive R5 group with 5 LUNs (50 – 54) ~ 2 TB in size.

Figure 1:  Physical disk layout

R5

My testing was run against drive f: which is LUN 50 which resides on the 15 drive R5 group depicted above.  LUNs 51, 52, 53 and 54 were not being used so the RG was only being used by the benchmark I was running on LUN 50.

Figure 2:  Benchmark results before cache was enabled

Pre-cache

As you can see, the performance for writes is abysmal. I will focus on the 64k test as we progress through the rest of this blog. You will see above that the 64k test only pushed ~4.6 MB/s, which is very poor performance for a 15 drive stripe. I have a theory for why this is, but I will get to that later in the post.

Before cache could be enabled we needed to power the second power supply on the SPS. This was done by plugging the B power supply on the SPS into the A side power rail. Once this was complete and the SPS battery was charged, cache was enabled on the CX and the benchmark was run a second time.

Figure 3:  Benchmark results post cache being enabled (Note the scale on this chart differs from the above chart)

Post-cache

As you can see, the performance increased from ~4.6 MB/s for 64k writes to ~160.9 MB/s for 64k writes. I have to admit I would not have expected write cache to have such a dramatic effect.
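To put the two 64k results in perspective, here is a quick back-of-the-envelope calculation. The two throughput figures come from the benchmark runs above; the 500 GB backup size is a made-up number used purely for illustration, and the helper name `hours_to_write` is my own.

```python
# Throughput for the 64k write test, taken from Figures 2 and 3.
before_mb_s = 4.6     # write cache disabled
after_mb_s = 160.9    # write cache enabled

# Hypothetical backup job size, chosen only for illustration.
backup_gb = 500


def hours_to_write(size_gb, mb_per_s):
    """Return the hours needed to write size_gb at a sustained mb_per_s."""
    return size_gb * 1024 / mb_per_s / 3600


speedup = after_mb_s / before_mb_s
hours_before = hours_to_write(backup_gb, before_mb_s)
hours_after = hours_to_write(backup_gb, after_mb_s)

print(f"speedup: {speedup:.1f}x")
print(f"{backup_gb} GB without cache: {hours_before:.1f} hours")
print(f"{backup_gb} GB with cache:    {hours_after:.1f} hours")
```

That works out to roughly a 35x improvement. For the hypothetical job, it is the difference between a backup that takes more than a day and one that finishes in under an hour, assuming the benchmark rate could be sustained.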

After thinking about it for a while I formulated some theories that I hope to fully prove out in the near future. I believe that the performance characteristics that presented themselves in this particular situation were the result of a combination of things: the stripe width of 15 drives together with cache being disabled created the huge gap in performance.

Let me explain some RAID basics so hopefully the explanation will become a bit clearer.

A RAID group has two key components that we need to be concerned with for the purpose of this discussion:

  1. Stripe width – which is typically synonymous with the number of drives in the RAID group
  2. Stripe depth – which is the size of the write that the controller performs before it round-robins to the next physical spindle (depicted in Figure 4)

Figure 4: Stripe Depth

Stripe_depth
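To make the two terms concrete, here is a minimal sketch of how a logical byte offset maps onto spindles given a stripe depth and a stripe width. It is deliberately simplified: it assumes plain round-robin striping and ignores parity placement entirely (a real R5 group rotates parity across the drives), and the function name `locate` is my own, not anything from the array.

```python
def locate(offset_bytes, stripe_depth, stripe_width):
    """Map a logical byte offset to (drive index, stripe row, offset in chunk).

    Simplifying assumption: plain round-robin striping with no parity.
    """
    chunk = offset_bytes // stripe_depth   # which stripe-depth-sized chunk
    drive = chunk % stripe_width           # round-robin across the spindles
    row = chunk // stripe_width            # how many full stripes came before
    within = offset_bytes % stripe_depth   # position inside the chunk
    return drive, row, within


KB = 1024
# 8k stripe depth across 4 drives.
for offset in (0, 8 * KB, 16 * KB, 32 * KB):
    print(offset // KB, "KB ->", locate(offset, 8 * KB, 4))
```

With an 8k depth and 4 drives, every 8k of data lands on the next drive, and after 32k the layout wraps back around to the first drive on the next stripe row.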

The next concept is write cache, specifically two features of write cache known as write-back cache and write-gathering cache.

First, let's examine the I/O pattern without the use of cache. Figure 5 depicts a typical 16k I/O on an array with an 8k stripe depth and a 4 drive stripe width, with no write cache.

Figure 5:  Array with no write cache

No_cache

The effect of no write cache is twofold. First, there is no write-back, so the I/O needs to be acknowledged by the physical disk, which is obviously much slower than an ack from memory. Second, because there is no write-gathering, full-stripe writes cannot be facilitated, which means more back-end I/O operations, affectionately referred to as the read-modify-write penalty.
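The read-modify-write penalty can be counted with a small sketch. This is the textbook RAID 5 model, not necessarily what the CX does internally (a real controller may pick a different parity update strategy when it is cheaper). It assumes the I/O is aligned and fits within a single stripe, and both function names are my own.

```python
def chunks_touched(io_size, stripe_depth):
    """Number of chunks an aligned I/O of io_size covers (ceiling division)."""
    return -(-io_size // stripe_depth)


def read_modify_write_ios(num_chunks):
    """Back-end I/Os for a partial-stripe RAID 5 write using read-modify-write.

    Per data chunk touched: read the old data, write the new data.
    Once per stripe: read the old parity, write the new parity.
    """
    reads = num_chunks + 1
    writes = num_chunks + 1
    return reads + writes


KB = 1024
# The 16k host I/O on an 8k stripe depth described for Figure 5.
touched = chunks_touched(16 * KB, 8 * KB)
print("chunks touched:", touched)
print("back-end I/Os:", read_modify_write_ios(touched))
```

Under these assumptions the single 16k host write turns into six back-end disk operations (three reads and three writes), and the host does not get its ack until the disks have finished.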

Now let's examine the same configuration with write cache enabled, as depicted in Figure 6.

Figure 6:  Array with write cache enabled

W_cache

Here you will note that acks are sent back to the host before the data is written to physical spindles, which dramatically improves performance. Second, write-gathering cache is used to facilitate full-stripe writes, which negates the read-modify-write penalty.
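A rough way to see what write-gathering buys is to count back-end I/Os for a run of sequential, chunk-sized host writes with and without it. This is a simplified model under my own assumptions: writes are aligned and exactly one stripe-depth chunk each, a write handled alone costs the textbook four I/Os (read data, read parity, write data, write parity), and a gathered full stripe costs one write per drive with no reads. The function names are hypothetical.

```python
def ios_without_gathering(num_chunk_writes):
    """Each chunk-sized write is handled alone via read-modify-write (4 I/Os)."""
    return 4 * num_chunk_writes


def ios_with_gathering(num_chunk_writes, data_drives):
    """Gather sequential chunk writes into full stripes before destaging."""
    full_stripes, leftover = divmod(num_chunk_writes, data_drives)
    # Full stripe: one write per data drive plus one parity write, no reads.
    ios = full_stripes * (data_drives + 1)
    if leftover:
        # Partial stripe falls back to read-modify-write:
        # read and write each touched chunk, plus read and write parity.
        ios += 2 * (leftover + 1)
    return ios


# A 15 drive R5 group has 14 data chunks and 1 parity chunk per stripe.
data_drives = 14
writes = 28  # two full stripes' worth of chunk-sized host writes

print("without gathering:", ios_without_gathering(writes))
print("with gathering:   ", ios_with_gathering(writes, data_drives))
```

In this model the same 28 host writes cost 112 back-end I/Os when handled one at a time and 30 when gathered into two full stripes. The real gap on the array will depend on the actual stripe depth, alignment, and how the controller destages, none of which this sketch captures.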

Finally, my conclusion is that the loss of write cache could be somewhat negated by reducing stripe widths from 15 drives to 3 or 4 drives and creating a meta to accommodate larger LUN sizes. With a 15 drive RAID group the read-modify-write penalty can be severe, as I believe we have seen in Figure 2. This theory needs to be tested, which I hope to do in the near future. Obviously write-back cache also had an impact, but I am not sure that it was as important as write-gathering in this case. I could probably have tuned the stripe depth and file system I/O size to improve the efficiency without cache as well.