When you use SRM (in my case SRM version 5.8.1) in combination with VR to protect your VMs, SRM will own these VMs. You will not be able to recover these protected VMs using vSphere Replication on its own for as long as they are SRM protected. This became a problem for us because we hit an issue where the SRM service would fail to start at the DR site. Working with VMware Support we got it working again, but neither VMware Support nor I really knew what the root cause was.
A little SRA and SRM history
We are using SRM in combination with SRAs and had a Recovery Plan that was stuck in partial re-protect mode, and no amount of fiddling would get it sorted. To bypass this SRM protection, VMware Support provided me with a tool that interrogates the SRM database and removes all references to a specific Recovery Plan/Protection Group. This tool worked well and removed all references, but as soon as I used it, the SRM service at the DR site would not start, and the SRM logs were of no help in figuring out what had happened. We reverted the SQL DB from backup at both sites and re-did it, again and again and again. Every time we got the exact same result, at which point VMware said all that was left was to do a fresh install from scratch (urghhh).
Oddly enough, when I tried it the next day it did work! The services started, and no matter what I did or how many times I tried to reproduce the issue, I simply couldn’t.
VMware still cannot explain why this happened and neither can I. As great as it is that it is working, not knowing the root cause troubles me quite a bit!
A VR solution
So to stop this from happening again, I moved away from array-based replication to VR where possible, because I didn’t want to rely on the SRA (everyone I speak to, regardless of vendor, has issues with the SRA). These days the recommendation seems to be that if you can use VR you should, as it just works.
By using VR we knew that we could use it with SRM, and also use it without SRM and recover the VMs one by one using VR on its own if needed. Worst case, we could add the replicated VMs to the inventory ourselves.
Anyway, I noticed that when I clicked on a VM that was also protected by SRM to recover it on its own, I simply couldn’t.
This puzzled me for a while, as I had always assumed that VR worked with SRM but could always be run independently regardless. I spoke to people using 6.0 and they said the option to do it manually was there, but for some reason I just couldn’t, no matter what I did.
After much digging and asking around, no one really knew why, which was puzzling. I did know that if I removed the SRM protection I could recover the VM manually using VR just fine.
After more digging around in the SRM 5.8 admin guide, I found this on page 114:
http://pubs.vmware.com/srm-58/topic/com.vmware.ICbase/PDF/srm-admin-5-8.pdf page 114
Change vSphere Replication Settings
You can adjust global settings to change how Site Recovery Manager interacts with vSphere Replication.
- In the vSphere Web Client, click Site Recovery > Sites, and select a site.
- On the Manage tab, click Advanced Settings.
- Click vSphere Replication.
- Click Edit to modify the vSphere Replication settings.
Allow vSphere Replication to recover virtual machines that are included in Site Recovery Manager recovery plans independently of Site Recovery Manager. The default value is false.
If you configure vSphere Replication on a virtual machine and include the virtual machine in a Site Recovery Manager recovery plan, you cannot recover the virtual machine by using vSphere Replication independently of Site Recovery Manager. To allow vSphere Replication to recover virtual machines independently of Site Recovery Manager, select the allowOtherSolutionTagInRecovery check box.
Now this seems to be exactly what I was after, or so you would think, heh.
After changing the setting to “true” at both sites for SRM, the issue still remained. I restarted the SRM/VC/Web service at both sites and I was still unable to manually recover VR VMs that were also protected by SRM.
Next I deleted the Protection Group for my test VMs. They then became tagged by VR again and I could recover them manually, but adding them back to a Protection Group tagged them as SRM and I was unable to recover them manually again.
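To make that behaviour concrete, here is a minimal toy model in Python. It is only a sketch of what is described above, not SRM or VR code, and it calls no VMware API: `ReplicatedVM`, `owner_tag` and `can_recover_with_vr` are hypothetical names made up for illustration, and the `allow_other_solution_tag` parameter merely stands in for the `allowOtherSolutionTagInRecovery` setting.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedVM:
    """Hypothetical stand-in for a VM replicated by VR."""
    name: str
    protection_groups: set = field(default_factory=set)

    @property
    def owner_tag(self) -> str:
        # Observed behaviour: a VM in any SRM Protection Group is tagged
        # as SRM; once it is in none, it is tagged by VR again.
        return "SRM" if self.protection_groups else "VR"


def can_recover_with_vr(vm: ReplicatedVM,
                        allow_other_solution_tag: bool = False) -> bool:
    # Assumption based on the tests described in this post: the advanced
    # setting is accepted but made no difference to manual VR recovery,
    # so it is deliberately ignored here.
    return vm.owner_tag == "VR"


vm = ReplicatedVM("test-vm")
vm.protection_groups.add("PG-Test")
# Blocked while SRM protected, whatever the setting says.
blocked = not can_recover_with_vr(vm, allow_other_solution_tag=True)
vm.protection_groups.discard("PG-Test")
# Recoverable again once the Protection Group is gone.
recoverable = can_recover_with_vr(vm)
```

The only thing that changes the outcome in this model is Protection Group membership, which matches what happened when the Protection Group was deleted and re-created.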
I already had a VMware Support Case open, from when we were dealing with the SRM service dying at our DR site. The engineer seemed to think there was some kind of glitch as everyone in support assumed that you could always recover VR replicated VMs manually in the GUI regardless of whether they were part of an SRM Protection Group.
I did some more digging and spoke to @, who wrote the book on SRM back on 5.0. He hasn’t worked with SRM in a while, but he put me in touch with GS Khalsa @, who started to look into it and brought it up with the engineering team at VMware.
I decided to have a fiddle with the SRM HOL, which is currently at 6.1. At first glance it appeared that it would let you fail over VR VMs manually even if they were tagged by SRM.
But when you try, the first screen stops you, saying they are tagged by SRM.
The advanced setting made no difference either way! So exactly the same thing as I was experiencing in 5.8.1. From what I can tell, everyone assumes that since they can see the Red Play button in VR that it’ll allow them to recover the VM regardless of its SRM status!
After discussing it with GS and with him speaking to the engineering team, it actually looks like the documentation is incorrect, and as such he has opened an internal case to get the description for the advanced setting changed.
From what he has been told, the setting is more to do with how SRM interacts with other third-party solutions, something along the lines of “To allow SRM to recover VMs whose replications are managed by other solutions, check this box.”