Mapping the VCDX Defense Genome Part 3: Is All Flash a Trap?!

Hey everyone,

So prior to the holidays I was helping mentor some new VCDX candidates for their re-defenses and kept coming across a few trends that I wanted to discuss, as they started creeping up over and over again. These trends were All Flash SANs, oversized compute, and questions being asked that were not part of the candidate’s design. We’ll get into the last point a little more, but first let’s tackle the first two.

All Flash SAN

Is an All Flash SAN a VCDX trap? No, of course not. The All Flash SAN is about as close to a silver bullet as storage gets, as it greatly reduces the complexity of the storage system while giving amazing performance. Gone are the days of tiering storage down to the last IOPS, automatic SAN storage migrations for hot data, and hybrid SANs. However, it comes at a cost. Although prices are constantly dropping, it typically costs more money, and it also has some pitfalls as far as a VCDX defense goes.

First let’s look at the pros of an All Flash SAN and how you can defend its cost:

  • Extremely high performance
  • Can run any IOPS load from small to large
  • Reduces administrative overhead, as you have fewer tiers of storage or pools of disk to administer, create hot spares for, etc.
  • Generally runs into fewer noisy neighbor and peak spike issues
  • Future-proofs the SAN from a performance standpoint

Now let’s look at some requirements and/or constraints this could fulfill:

  • R001: must reduce overall storage management overhead
  • R002: storage must be able to handle all applications and future infrastructure projects without negative performance impact
  • C001: small storage team requires simple storage design
  • C002: storage must be able to handle 20,000 IOPS at peak load times
  • C003: Budget Constraints

Now let’s look at some cons of an All Flash SAN:

  • Cost: All Flash SANs typically work out to be more expensive than a hybrid SAN or other SAN solutions, however this is a moving target as prices drop
  • Simpler design, which can lead to problems in the defense; we’ll talk about this in a little bit

Let’s break this out into a design decision table

| Decision Description | 1.0 SAN: All Flash vs Hybrid Storage |
|---|---|
| Decision & Justification | An All Flash SAN will provide the best performance while reducing complexity and management overhead. A standard or hybrid SAN could fulfill the customer’s requirements while staying within budget, but would not simplify the design or reduce management overhead for the small storage team. |
| Assumptions | Sufficient rack space and switch connections |
| Constraints | Small storage team, budget constraints |
| Risks & Considerations | SPOF, Cost |
| Containment | SPOF: Using only one SAN is a SPOF. This risk is present in the current environment and has been accepted. It can be mitigated by implementing a DR site or purchasing another SAN. Cost: The All Flash Array costs more and will cost more for future expansions if required. This risk has been accepted due to the simplicity of the design, the reduction in management, and future-proofing for performance. It could be mitigated with a hyperconverged option like VSAN or a slower disk SAN for other workloads at a lower cost. |
| Requirements Met | R001, R002, C001, C002 |
| Design Qualities | Manageability, Performance |

Table #1 Donated by Kiran Reid VCDX#225

So as you can see, this can be easily defended. The pitfall is that, due to the characteristics of an All Flash Array, it can also be used as an easy catch-all. This is where you might find some questions that don’t involve your design coming up in your defense when you were not expecting them. Things like SIOC, noisy neighbor, IOPS sizing, storage pools, Storage DRS, storage policies, and other deep technical design considerations that would typically show up with a traditional storage array don’t come into the picture. So to give the candidate a chance to score points, you may be asked these questions even though they have nothing to do with your design from a storage perspective.

At the end of the day you have to remember the VCDX is still an exam, and the defense panel has a rubric they will be scoring you against. So if the storage section in the rubric has scoring points for these topics and your design is lacking in these areas, or doesn’t include them because they are not required with an All Flash Array, you may see questions come up that you were not expecting.

So how can you prepare for this? Make sure you have your design decisions ready to defend the All Flash Array both from a design/technical level and from a business level, aligning with AMPRS. You may also want some backup slides with other storage considerations for the design as talking points, or just be prepared to speak technically or at a design level about the other storage considerations, as the panelists are checking to make sure you have a solid storage understanding from end to end.
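To make the traditional-array side a bit more concrete, here is a rough Python sketch of the kind of back-end IOPS math you could be asked to walk through. It takes the 20,000 IOPS peak from C002. The 70/30 read/write mix, the RAID 5 write penalty of 4, and the 180 IOPS per 15K spindle are common rule-of-thumb numbers that I am assuming purely for illustration. They are not from any real design.

```python
import math


def backend_iops(frontend_iops: float, write_fraction: float, write_penalty: int) -> float:
    """Translate host-facing IOPS into the IOPS the disks actually have to serve.

    Reads pass through once; each write costs `write_penalty` back-end I/Os.
    """
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * write_penalty


def spindles_needed(backend: float, iops_per_disk: float) -> int:
    """Round up to whole disks (ignores hot spares and cache effects)."""
    return math.ceil(backend / iops_per_disk)


# Assumed inputs (hypothetical, for illustration only)
PEAK_IOPS = 20_000        # from C002
WRITE_FRACTION = 0.30     # assumed 70/30 read/write mix
RAID5_WRITE_PENALTY = 4   # common rule of thumb for RAID 5
IOPS_PER_15K_DISK = 180   # rough rule of thumb for a 15K spindle

backend = backend_iops(PEAK_IOPS, WRITE_FRACTION, RAID5_WRITE_PENALTY)
disks = spindles_needed(backend, IOPS_PER_15K_DISK)
print(f"Back-end IOPS: {backend:.0f}, 15K disks needed: {disks}")
```

With these assumed inputs the 20,000 front-end IOPS turn into roughly 38,000 back-end IOPS, which is the kind of gap a panelist may want you to explain even when your actual design is all flash.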

Now onto the oversized compute side.

So I only needed 48 cores of processing power and 1TB of memory, but I got a DEAL!

We run into some of the same issues with oversizing compute, mostly with it being a catch-all instead of “sizing accordingly.” However, when most companies buy compute they are buying it for 5 years, and when they go to the well they typically only want to go once if possible. So they will buy big, or oversize. There could also be a number of other reasons why more hardware was purchased. Just to list a few:

  • Server promotion
  • Buy one get one
  • Q4 sale
  • Vendor salesperson having a bad quarter who just NEEDS something
  • Buying in bulk

I’m sure there are many more. This again can be defended as a VCDX design decision, but it will require some additional considerations. Let’s create some requirements and constraints and break this down in a design decision table again.

  • R001: must meet current compute sizing based on current state analysis, plus 5 years of growth at 20% a year
  • R002: must be able to absorb newly acquired companies as part of expansion plan
  • R003: must be able to meet application SLAs during peak usage
  • R004: Reduce management overhead where possible
  • C001: Solution must have consistency and stick with one vendor
  • C002: Budget Constraints
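
As a quick sanity check on R001, here is a small Python sketch of what 20% a year over 5 years compounds to. The 48-core baseline is only a hypothetical starting point for illustration; your own current state analysis would supply the real number.

```python
def projected_demand(current: float, annual_growth: float, years: int) -> float:
    """Compound today's demand forward at a fixed annual growth rate."""
    return current * (1 + annual_growth) ** years


# R001: 20% growth per year for 5 years
growth_factor = projected_demand(1.0, 0.20, 5)

# Hypothetical baseline, for illustration only
current_cores = 48
cores_year_5 = projected_demand(current_cores, 0.20, 5)

print(f"Growth factor: {growth_factor:.2f}x")
print(f"{current_cores} cores today -> about {cores_year_5:.0f} cores in year 5")
```

The point worth having ready is that 20% a year compounds to roughly 2.5x over five years, not 2x, and that is before any unknown growth from R002.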

 

| Decision Description | 2.0 Compute: 24 Blades vs 12 Blade Deployment |
|---|---|
| Decision & Justification | 24 blades were selected and accepted as the solution to meet the current requirements, as this could more easily absorb the 20% yearly growth along with the unknown additional growth from new company acquisitions under the company’s expansion plan. 12 blades could have met the requirements, however this would have introduced additional risk due to the ambiguity in the R002 sizing information. That ambiguity could force sudden additional compute purchases, which led to the decision to make a larger upfront purchase. |
| Assumptions | Sufficient rack space and switch connections |
| Constraints | Compute solution must have consistency and one vendor; budget constraints |
| Risks & Considerations | Cost |
| Containment | Cost: Purchasing 24 blades over 12 blades violated the C002 budget constraint, however purchasing 24 blades reduced the R002 risk while better fulfilling R003 and R001. Reducing the R002 risk could also lead to a lower TCO, as last-minute compute purchases could lead to higher overall costs. This risk, constraint violation, and risk mitigation plan were all accepted. Sticking with one vendor and buying all models at the same time also removes additional patching, firmware, and update management. |
| Requirements Met | R001, R002, R003, R004, C001 |
| Design Qualities | Manageability, Performance, Availability |

Table #2 Donated by Kiran Reid VCDX#225

So again we can defend this design decision based on AMPRS and the requirements / constraints, but you will also need to be prepared to answer the standard compute questions around vCPU ratios, NUMA, resource pools, limits, admission control, TPS, and how these are all affected by having a large compute cluster. Many times we can fall into the trap of avoiding a lot of these hard compute design choices by overbuying, so just be ready to answer these questions both if they are in your design and if they are not.

I was going to try and fit in another discussion piece on how the VCDX is still an exam and how it could be broken down from a points perspective, but seeing as this blog post is already getting long I’m going to break that out into the next one. So stay tuned for Mapping the VCDX Genome Part 4: Points breakdown.
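
As one example of the compute math worth having in your back pocket, here is a minimal Python sketch of how the vCPU to physical core ratio shifts between the 12 and 24 blade options once you hold back one host for failover. The 1,500 vCPUs, 16 cores per blade, and N+1 reserve are all hypothetical numbers picked for illustration.

```python
def vcpu_ratio(total_vcpus: int, hosts: int, cores_per_host: int, failover_hosts: int = 1) -> float:
    """vCPU:pCPU ratio on the hosts left after reserving failover capacity."""
    usable_hosts = hosts - failover_hosts
    if usable_hosts <= 0:
        raise ValueError("need at least one host beyond the failover reserve")
    return total_vcpus / (usable_hosts * cores_per_host)


# Hypothetical numbers, for illustration only
TOTAL_VCPUS = 1_500
CORES_PER_BLADE = 16

for blades in (12, 24):
    ratio = vcpu_ratio(TOTAL_VCPUS, blades, CORES_PER_BLADE)
    print(f"{blades} blades (N+1): {ratio:.1f} vCPUs per physical core")
```

Doubling the blades roughly halves the ratio in this sketch, which is exactly why overbuying lets you sidestep the question, and why you should still be able to explain what ratio you would have targeted on the smaller cluster.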

So in conclusion if you use an All Flash Array be prepared to possibly answer sizing questions on a traditional array / process to score additional points.   If you have an oversized compute design choice also be prepared to answer questions on how this can affect the technical design choices.

As always, discussion is encouraged and any feedback is welcome.

Till next time!

 

**Update**
Okay, I’m going to get ahead of this one before my Twitter and mailbox fill up :p Yes, the cost of All Flash Arrays has come down dramatically in the last few years, and drive sizes are now starting to get on par with SAS drive sizes as well, so All Flash Arrays are finally getting to the point of mass consumption. Nonetheless, the liberties that All Flash Arrays give us allow for a lot less design work than once had to take place. So until the storage standards and VCDX rubric change, you’ll still need to know and be prepared for the other sizing or design methodologies of standard arrays.

Thanks everyone for the feedback and discussions!

 
