Testing VSAN 6.7 network outages on Ravello is easy!


For quite some time I have been thinking about setting up a new virtual VSAN lab on Ravello and Oracle Cloud Infrastructure (OCI). There is no better time to get things done than the present, especially since vSphere 6.7 and VSAN 6.7 were released earlier this month.

First, a big thank you to Ian Sanderson and Raff Poltronieri for their posts on getting vSphere 6.7 to work on Ravello / Oracle OCI. You can read about that right here.

However, I was not really pleased with the network setup on my hosts. For starters, all my interfaces had been put on the same network, both the MGMT vmnics and the VSAN vmnics. I decided to change this. Starting today, each host has a total of four vmnics: two are used for MGMT, vMotion, VM Network, etc., and the remaining two are used for VSAN networking. MGMT, vMotion and VM Network use the 10.1.0.0 network, and VSAN uses the 10.2.0.0 network.
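To make the new per-host layout concrete, here is a minimal Python sketch that models the four vmnics and checks that an address lands in the network for its traffic role. The /16 prefix lengths, the vmnic numbering, the role names and the example addresses are all assumptions for illustration; the post only names the 10.1.0.0 and 10.2.0.0 networks.

```python
import ipaddress

# Assumed prefix lengths: the post only says "10.1.0.0" and "10.2.0.0".
NETWORKS = {
    "mgmt": ipaddress.ip_network("10.1.0.0/16"),  # MGMT, vMotion, VM Network
    "vsan": ipaddress.ip_network("10.2.0.0/16"),  # VSAN traffic
}

# Hypothetical mapping of the four vmnics per host to a traffic role.
HOST_VMNICS = {
    "vmnic0": "mgmt",
    "vmnic1": "mgmt",
    "vmnic2": "vsan",
    "vmnic3": "vsan",
}

def in_role_network(role: str, ip: str) -> bool:
    """True if an address sits in the network expected for its traffic role."""
    return ipaddress.ip_address(ip) in NETWORKS[role]

def vmnics_for(role: str) -> list:
    """List the vmnics assigned to a role in the hypothetical mapping."""
    return [nic for nic, r in HOST_VMNICS.items() if r == role]

if __name__ == "__main__":
    print(in_role_network("mgmt", "10.1.0.11"))  # True
    print(in_role_network("vsan", "10.1.0.11"))  # False: wrong network
    print(vmnics_for("vsan"))                    # ['vmnic2', 'vmnic3']
```

Keeping the role-to-network mapping in one place makes it easy to spot a VMkernel port that was given an address on the wrong network, which is exactly the kind of mix-up the original single-network setup hid.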

Ravello Network configuration

On each host, the interface configuration is now set up similar to this:

Of course this also means that I had to make some changes on the network side in Ravello:

To make this happen I had to set up two additional switches and subnets in the networking section of the Ravello application (the collection of ESXi servers, vCenter servers, etc.). I created a switch and chose to create a subnet at the same time. It looks like this:

[Screenshot: switch1]

I did the same for the 10.2.0.0 network.

Next, I added a VLAN tag of 10 to the 10.2.0.0 subnet. This subnet will be used for the VSAN network.

[Screenshot: switch2, VLAN tag 10]

vSphere network configuration

The other bits and pieces are the typical ESXi and DVS network configurations. I chose to use the untagged DPortGroup (I know, bad Kim!) for the MGMT network and set up a separate VSAN port group for VSAN traffic. I did tag the VSAN port group with VLAN tag 10, matching the Ravello network configuration mentioned above.

So for four hosts, each with four uplinks, my DVS looks like this:

[Screenshot: DVS]

[Screenshot: VSAN port group]

You cannot see it in the picture, but the VLAN tag has been set to 10 for the VSAN port group.
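Because the VLAN tag is set in two places (the Ravello subnet and the DVS port group), it is worth checking that they agree; if they differ, the VSAN VMkernel ports would most likely be unable to reach each other. The sketch below compares the two. The dictionaries are a hypothetical, hand-written snapshot of the settings described in this post (the port group name "VSAN" is assumed), not data returned by the Ravello or vSphere APIs.

```python
# Hypothetical snapshot of the settings described in the post. These dicts are
# illustrative only; they are not Ravello or vSphere API structures.
RAVELLO_SUBNET_VLANS = {
    "10.1.0.0": None,  # untagged MGMT subnet
    "10.2.0.0": 10,    # VSAN subnet, VLAN tag 10
}

DVS_PORTGROUPS = {
    "DPortGroup": {"subnet": "10.1.0.0", "vlan": None},  # untagged MGMT
    "VSAN": {"subnet": "10.2.0.0", "vlan": 10},          # assumed name
}

def vlan_mismatches(portgroups, subnet_vlans):
    """Return (portgroup, expected, actual) for each port group whose VLAN tag
    differs from the tag on the Ravello subnet it is meant to reach."""
    problems = []
    for name, pg in portgroups.items():
        expected = subnet_vlans[pg["subnet"]]
        if pg["vlan"] != expected:
            problems.append((name, expected, pg["vlan"]))
    return problems

if __name__ == "__main__":
    print(vlan_mismatches(DVS_PORTGROUPS, RAVELLO_SUBNET_VLANS))  # []
```

With the configuration from the post the list is empty; setting the VSAN port group to, say, VLAN 20 would make it report `("VSAN", 10, 20)`.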

Both DVS port groups have been configured as active-active, and 'Notify switches' and 'Failback' have been set to 'Yes'. I am leaving Failback set to Yes for the VSAN network as well, because I am going to cause a network outage and want to monitor what happens. How easily will the second vmnic take over from the primary vmnic once it goes down?
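The behaviour being tested can be pictured with a deliberately simplified Python model of one port group's teaming policy: traffic uses the healthy active uplinks, falls back to standby uplinks only if no active one is up, and, with failback enabled, picks a recovered uplink up again. This is a hypothetical illustration, not vSphere's actual load-balancing or failover algorithm; the uplink names and the `failback=False` handling are assumptions chosen to keep the model small.

```python
# Simplified, hypothetical model of NIC teaming failover for one port group.
# It is NOT vSphere's real algorithm; it only illustrates the idea.

def carrying_uplinks(active, standby, link_up, failback=True, previous=()):
    """Return the uplinks that carry traffic given the current link states.

    active, standby: ordered lists of uplink names
    link_up: dict of uplink name -> True if the link is up
    previous: uplinks that carried traffic before this change
    """
    healthy_active = [u for u in active if link_up.get(u, False)]
    healthy_standby = [u for u in standby if link_up.get(u, False)]
    if not failback and previous:
        # Without failback, keep using the uplinks that are still carrying
        # traffic rather than pulling a recovered uplink back in.
        still_up = [u for u in previous if link_up.get(u, False)]
        if still_up:
            return still_up
    if healthy_active:
        return healthy_active
    return healthy_standby

if __name__ == "__main__":
    active = ["uplink1", "uplink2"]  # active-active, no standby
    links = {"uplink1": True, "uplink2": True}
    print(carrying_uplinks(active, [], links))  # ['uplink1', 'uplink2']
    links["uplink2"] = False                    # simulate the NIC going down
    print(carrying_uplinks(active, [], links))  # ['uplink1']
    links["uplink2"] = True                     # link recovers, failback=Yes
    print(carrying_uplinks(active, [], links))  # ['uplink1', 'uplink2']
```

In an active-active setup the surviving uplink simply keeps carrying traffic, which is why the takeover in the test below is so quick.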

Causing network issues by bringing an interface down

Bringing an interface down in the Ravello lab is quite easy. Go to the settings of one of the NICs used by one of the hosts, and change the adapter type to one that ESXi does not support.

[Screenshot: NIC error]

This causes the host to disconnect, and VSAN throws some errors:

[Screenshot: host error]

However, within a very short time the host reconnects, and you will see that the secondary adapter for VSAN networking has taken over. No data resync happened, because the failover was too quick. Please ignore the two warnings below: one is because Update Manager hasn't been configured properly, and the other is because the SCSI controller used by Oracle isn't supported.

[Screenshot: host correction]

On the DVS you will see that uplink 2 is now using only 3 vmnics instead of 4.

[Screenshot: vmnic]

Works like a charm if you ask me!

 


2 Comments

  1. FYI: If you are wondering why I disable an adapter by switching it to an unsupported type: otherwise I would have to redo the configuration of the initial adapter. Selecting an unsupported type and leaving the configuration in place saves me time. Kim
