Moving from vSphere 5.5 to vSphere 6.5 Part 1 #Upgrade #Migration #Fresh #vExpert #vSphere #VCSA


So I have been meaning to write a post on this for a while, so I thought I better get started!

So let me tell you a story!

The Beginning 

Now as we all know vSphere 5.5 went EOL a little while ago. So in preparation, I was investigating the various options that were available for the business, to keep them supported.

When I first joined the business years ago, it was all a vSphere 5.1 environment that didn’t get patched often, and the hosts had very, very long uptimes. The reason I was brought on board was to get a handle on this and bring it all into line.

One of the first jobs was to actually re-IP the whole environment at Site A for various business and technical reasons. I spent a lot of time planning and testing this by cloning the environment out into an isolated one, domain controllers and all, and documenting every step and the hiccups along the way until I had a solid process I could present to the business.

After that was done, vSphere 5.1 was going EOL, and vSphere 6.0 had been out for a little while and was at U1. I tried to test upgrading to it and kept hitting errors. I had a case logged with VMware support, and after 2 weeks of pestering them I managed to find out there was a known issue with no resolution as of yet. All my prep for the 6.0 upgrade then helped me get both sites to 5.5 with ease, just before 5.1 went EOL.

I updated the firmware on all the hosts and upgraded them from 5.1 to 5.5 with no real issues. Now everything was much more standardized within the constraints of what the business already had in place. They had already paid for SRM but hadn’t configured it properly, so I got SRM configured and working so they could meet their contractual RPO and RTO needs and we did numerous DR tests to prove to the business and its customers that we had a solid plan with reports as proof.

So this is the vSphere 5.1 environment that was upgraded to vSphere 5.5:

 

As you can see, there is a lot of Windows in there, along with Linked Mode, vSphere Replication, SRM, and Veeam. I spent a lot of time discussing with people the main options I had available to me:

  • Upgrade each site and stay on Windows
  • Upgrade and Migrate to VCSA 6.0/6.5
  • Go fresh with a side by side migration from 5.5 to 6.0/6.5

Ok, so upgrading with Windows is supported, but it’s deprecated now and if you tell people you are doing that … they will hunt you down heh. But from a business point of view, there are licensing costs and patching to think about. Also, any bad decisions and choices made over the years carry through.

Migrating to the VCSA was a solid option, but investigating this further I realized that the SSL certs would need redoing at least at one site and any bad decisions and choices made over the years would carry through.

A fresh side by side migration would allow a full re-architecture of the environment and let me ditch all sorts of things that had carried through (such as users and config options that no one could justify but were too scared to remove). But this had its own challenges, be it SRM/VR or other valuable things such as DRS Rules, tags, etc.

Now if I jumped directly to 6.5 then SRM and VR would have to be binned and redone from scratch, because you can’t jump from SRM 5.8 to 6.5, and at every step of the upgrade the vCenter version has to match too. This made me very sad, although it has since been fixed in SRM 8.1! Also, Veeam would now see all VMs as new because the IDs in vCenter would change, and you would have an increase in storage consumption as a result.

So my original thought was to go with a migration to VCSA 6.0, because it meant I could do it staged and keep SRM/VR. This is when I discussed it with Graham Barker @VirtualG_UK and he bet me a #Nandos meal that I would end up doing a side by side upgrade to 6.5. I accepted that bet because I love any excuse to eat at Nandos!

As time went on and I discussed it with the business, it became clear that going clean to the VCSA would be the longer but better option. Decisions made by the original people who installed it at 5.1 could be binned, we could remove a bunch of Windows VMs, and we could migrate from SQL Enterprise to SQL Standard. Basically, we were re-architecting with a bigger emphasis on simplicity and security. Even if I wasn’t around, people could use it and patch it without fearing it would die on them, and if there were any issues, rollback would be a lot simpler. So I lost the bet!

I have taken him to Nandos and the debt is now paid in full!

One thing a lot of people forget is that before 6.7 U1, if you had external PSCs, backing them up and restoring them correctly was not a simple process, and you could run the risk of corrupting your SSO domain. This has now been fixed in 6.7 U1! Also, external PSCs are no longer required for ELM in 6.5 U2 and 6.7 either.

https://docs.vmware.com/en/VMware-vSphere/6.7/rn/vsphere-vcenter-server-671-release-notes.html

“With vCenter Server 6.7 Update 1, you can restore external Platform Services Controller instances which are replicating data with other external Platform Services Controller instances. This includes restore of external Platform Services Controller instances in all topologies supported in replication mode. The external Platform Services Controller being restored syncs with active peers or if no replication partner is available, it is restored to a backed-up state.”

Going clean would allow me to ditch things such as various logins, rules, and tags, and allow me to simplify things. In doing so, everything would be better from a patching, cost, and auditing point of view. The main thing I had to do was plan the switchover to avoid disruption where possible. I could bring the side by side environment up and get it configured without having to touch the current environment.

One of the main business requirements going this route was to provide an interim DR plan covering how to fail over while SRM was out of commission. Even though it would be a short switchover, that still needed to be covered off. This was a lot easier than you would think, because while I was setting up the original DR plan, all Live Production VMs were put onto their own dedicated datastores. This was done so we could use SRM with Dell Compellent’s SRA to fail over. Now, if you know anything about the OLD Dell Compellent SRA, you will know it was VERY BAD, to the point I told the business I could not trust it to fail over correctly. Once I made this clear to the business, we used vSphere Replication wherever we could. So now that we were binning VR and SRM, all I had to do was enable array-based replication on the datastores and document how to bring the replicas online in the interim. Since I had used seeds, the seeds remained when I disabled VR. This helped me because when I came to reconfigure in 6.5, the seeds were all there and good to go!

I would like to point out the NEW Dell Compellent SRA is MUCH MUCH better and works as expected, they have clearly put a lot of time and effort into sorting it out.

Some key things I had to work out were:

  • Create new VSS switches and migrate VMs from the VDS to the VSS before moving the hosts to the new VC. As we all know, a VDS is a construct local to a single vCenter. The only supported way of moving hosts across is to remove them from the VDS and have everything on standard switches. People have managed to get it to work without doing this, but that is unsupported and any glitches are hard to sort out.
  • Then moving VMs across to the VSS on each host.
  • Copying the vSphere Tags across
  • Copying over the DRS Rules
  • Copying over the folder structures
  • Copying over customization scripts
  • Applying all the tags back again
  • Moving the VMs into the correct folders again
  • Moving everything back to the VDS
  • Reconfiguring Veeam correctly
  • Redeploying SRM/VR and getting it up and running
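
To make the copy-and-reapply steps above a bit more concrete, here is a minimal sketch in plain Python of the bookkeeping involved. It is not the tooling I used. It assumes you have already pulled each VM's folder path and tags out of the old vCenter by some other means (PowerCLI, pyVmomi, whatever you prefer), and it matches VMs by name because, as mentioned earlier, the IDs change in the new vCenter. The names VmRecord, export_inventory and plan_reapply are made up for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class VmRecord:
    """Hypothetical record of what we want to carry across for one VM."""
    name: str                      # VM display name, used as the matching key
    folder: str                    # folder path in the old vCenter, e.g. "Prod/Apps"
    tags: list = field(default_factory=list)


def export_inventory(records):
    """Serialise the records to JSON, keyed by VM name.

    VM names are used as keys because vCenter object IDs do not
    survive a move to a freshly built vCenter.
    """
    return json.dumps({r.name: asdict(r) for r in records},
                      indent=2, sort_keys=True)


def plan_reapply(exported_json, new_vcenter_vm_names):
    """Build the list of actions to run against the new vCenter.

    Returns (actions, missing): actions is an ordered list of tuples,
    missing lists saved VMs that are not present in the new vCenter yet.
    """
    saved = json.loads(exported_json)
    actions, missing = [], []
    for name in sorted(saved):
        if name not in new_vcenter_vm_names:
            missing.append(name)
            continue
        rec = saved[name]
        # Put the VM back in its folder first, then reapply its tags.
        actions.append(("move_to_folder", name, rec["folder"]))
        for tag in rec["tags"]:
            actions.append(("assign_tag", name, tag))
    return actions, missing
```

The idea is to run export_inventory before any host leaves the old vCenter, keep the JSON somewhere safe, and then feed plan_reapply the set of VM names visible in the new vCenter. Anything that comes back in the missing list is a VM that has not made it across yet.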

Now onto Part 2!

