Back 2 VSAN Basics


I see VSAN “in the wild” on a daily basis and very often I get questions about the basics of VSAN. Don’t get me wrong, questions are absolutely a good thing! What I also see, though, is that people often ignore the best practices and assume a next/next/finish install will be fine. You guessed it: this doesn’t go well!

Therefore I have decided to start a ‘VSAN basics’ blog series:

  • What is VSAN?
  • What’s important to run VSAN well?
  • And why is it important for the success of your project?

Let’s start with a bit of history

For the last decades we have been procuring new storage in the same way we did in previous renewal cycles. That’s neither a bad thing nor a good thing, but there is a constant in this story: until some clever people came up with HCI, we did nothing but add new features to the classical storage system, and that classical storage system became more and more complex. One of the drivers behind this complexity was the adoption of server virtualization. Suddenly we had to support a lot more on that classic storage: more machines had to be connected to the same storage enclosures; more storage had to be provided via classical methods (LUN, NFS, etc.) to an exploding number of virtual machines running on the same hosts; and then disaster recovery and disaster avoidance were added to the mix, which doubled the complexity of storage management again. And when complexity increases, you need more skills to keep up with the changes, while you still need to understand what the hell you are doing with all this new stuff flooding into your already crammed ‘silo of managing all things’.

Welcome to the simplified era

A couple of years ago VMware announced a framework to which OEMs could contribute, called VVols (Virtual Volumes). The philosophy behind VVols is that storage management becomes far less painful: instead, a storage policy is assigned to a VM. This policy then dictates to the underlying storage where each part of a VM should be stored, in which form it should be stored, and which SLA applies to it.
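To make the policy idea concrete, here is a minimal Python sketch of the concept. The rule names below are invented for illustration; they are not actual SPBM rule identifiers:

```python
# Hypothetical illustration of policy-driven storage: instead of carving
# LUNs per VM, the admin describes requirements once and attaches the
# policy to a VM; the storage layer decides placement that satisfies it.
gold_policy = {
    "failuresToTolerate": 1,   # how many host failures to survive
    "stripeWidth": 2,          # spread each object over N capacity devices
    "spaceReservation": 0,     # percent reserved up front (thin by default)
}

def apply_policy(vm, policy):
    """Attach a policy to a VM; placement is then the storage layer's job."""
    vm["storage_policy"] = dict(policy)
    return vm

vm = apply_policy({"name": "app01"}, gold_policy)
```

The point is the inversion of control: the VM carries its SLA with it, rather than inheriting whatever the LUN it happens to land on provides.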

At VMware we believe that storage policies are the way forward: they simplify the way we look at and think about storage, and they make storage and storage operations enterprise ready! With VSAN we are walking down the same road; we have reorganized storage management in a way that is totally different from, but easier than, managing traditional storage. OK, it might take a while for your grey matter to be reprogrammed with this new way of thinking about storage and to familiarize yourself with these new storage concepts, so I will introduce you to one VSAN concept at a time.

How does VSAN make your data Fault Tolerant?

For software-defined storage it is important to distribute the data over multiple hosts; as a bare minimum you have to build in some fault tolerance. In VSAN this fault tolerance is expressed as ‘Failures To Tolerate’ (FTT). You will notice that many familiar terms like RAID are used in VSAN as well. VSAN FTT has nothing to do with hardware RAID, but the methodology is the same, and the analogy with RAID in classic storage is handy because everyone knows how it works…

RAID 1 makes a complete mirror of the data, whereas RAID 5/6 works with parity spread across hosts in the cluster.

pic1
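To get a feel for what these protection choices cost, here is a rough Python sketch (not a VMware tool) of the raw-to-usable ratios and host minimums commonly cited for these policies; treat it as an illustration of the mirroring-vs-parity trade-off:

```python
# Sketch of capacity overhead per protection policy. RAID 1 mirrors
# full copies (plus witnesses for quorum); RAID 5/6 pay a smaller
# parity overhead but need more hosts.
POLICIES = {
    # (ftt, layout): (raw-to-usable multiplier, minimum hosts)
    (1, "RAID-1"): (2.0, 3),    # 2 full replicas + 1 witness
    (2, "RAID-1"): (3.0, 5),    # 3 full replicas + 2 witnesses
    (1, "RAID-5"): (4 / 3, 4),  # 3 data segments + 1 parity
    (2, "RAID-6"): (1.5, 6),    # 4 data segments + 2 parity
}

def raw_needed(vm_size_gb, ftt, layout):
    """Raw capacity consumed by an object of vm_size_gb, plus host minimum."""
    multiplier, min_hosts = POLICIES[(ftt, layout)]
    return vm_size_gb * multiplier, min_hosts

raw, hosts = raw_needed(100, 1, "RAID-1")  # 200 GB raw, 3 hosts minimum
```

So a 100 GB disk protected with FTT=1 RAID 1 consumes roughly 200 GB of raw capacity, while the same disk under FTT=1 RAID 5 consumes about 133 GB, at the price of a larger minimum cluster.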
Now, to make this blog post less confusing, let’s assume an all-flash VSAN configuration; I’ll dedicate a separate blog post to a hybrid VSAN configuration.
Of course we need to store the data somewhere, and it is really important to understand the flow of that data. Two concepts matter here: cache disks and capacity disks. At least one cache disk and one capacity disk are always in the game.
We use the cache as a write-back buffer: all writes, issued by the application hosted in a VM, land on the cache disk first and are later destaged to the capacity disks. Before committing, we need to fulfill the policy set for the object.
The data is destaged from cache to capacity based on an algorithm. One cache disk plus 1 to 7 capacity disks make up a disk group, and a host can have up to 5 disk groups.
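The disk-group rules above can be captured in a small sketch (a hypothetical helper, assuming the 1-cache / 1-to-7-capacity / 5-group limits just described):

```python
# Sketch of the disk-group composition rules: one cache device plus
# 1-7 capacity devices per group, at most 5 groups per host. The cache
# device is a write buffer and does not add to usable capacity.
MAX_CAPACITY_DISKS = 7
MAX_DISK_GROUPS = 5

def host_raw_capacity_gb(disk_groups):
    """disk_groups: list of (cache_gb, [capacity_gb, ...]) tuples."""
    if len(disk_groups) > MAX_DISK_GROUPS:
        raise ValueError("a host supports at most 5 disk groups")
    total = 0
    for cache_gb, capacity_disks in disk_groups:
        if not 1 <= len(capacity_disks) <= MAX_CAPACITY_DISKS:
            raise ValueError("each disk group needs 1 to 7 capacity disks")
        total += sum(capacity_disks)  # cache_gb intentionally excluded
    return total

# e.g. one group: 400 GB cache + four 1920 GB capacity disks = 7680 GB raw
host_raw_capacity_gb([(400, [1920, 1920, 1920, 1920])])
```

Note the asymmetry: a bigger cache device speeds up write absorption, but only the capacity tier counts toward the raw space your policies will carve up.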
With FTT=1 and RAID 1, the data is spread over 3 hosts: 2 hosts hold the actual data and a third hosts the witness. Screenshot underneath to illustrate this…

pic2
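The role of the witness is easiest to see with a toy availability check. This is not VSAN’s actual quorum code, just an illustration of the majority rule: an object stays accessible while more than half of its components are still reachable.

```python
# Toy model of component quorum: two data replicas plus one witness.
# The witness holds no VM data; it only contributes a vote so that a
# single host failure (or a network partition) leaves a clear majority.
def is_accessible(components, failed_hosts):
    """components: dict of host -> role ('data' or 'witness')."""
    alive = [h for h in components if h not in failed_hosts]
    return len(alive) > len(components) / 2

layout = {"esx1": "data", "esx2": "data", "esx3": "witness"}
is_accessible(layout, {"esx2"})           # 2 of 3 votes alive -> True
is_accessible(layout, {"esx2", "esx3"})   # 1 of 3 votes alive -> False
```

Without the witness, two equal replicas could each claim to be authoritative after a split; the third vote is what makes “more than half” decidable.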

You might have noticed: the screenshots in this post are all from the new Clarity UI. I’m making a lot of promises in this blog, so here is another one: I will also write a post about the HTML5 client. While writing this blog I found out there are multiple great improvements in this UI!
