So, I have been deploying a new 6.5 VCSA as part of my plan to move away from the current 5.5 deployments I am working with (a post for another time). During testing of this new deployment, I had a user come to me with a strange error.
When they changed their AD password, they lost all access to the VCSA. Everything else worked fine for them: they could log in to anything else using their AD account.
I could log in to the VCSA fine, and so could other users. We cleared the web cache, among other things, but no joy. I couldn’t see anything obviously wrong. A few web posts suggested that /sbin/pam_tally would do the trick. I tried it, but it would not even show any domain accounts, never mind reset anything!
So I then raised a ticket with VMware support, as I wanted to get to the root cause of this issue. Support was very good and on point!
We worked on various things:
– checked if VCSA was joined to AD
– checked the ID source
– unable to extract SSO users from the domain (“Error while extracting local SSO users”) when trying to add domain users
– lookup was not working
– removed the ID source and added back but no change in the situation
– removed AD on the VCSA level and rebooted (from the command line)
– when the services restarted, we re-added AD on the VCSA and rebooted
– still saw the same error
– tested using LDAP instead. This worked, indicating an issue at the back end in AD.
– I planned to carry out checks at my end to see if the AD issue can be addressed
When using Integrated Authentication, we could not pull any users for the main domain, but could still pull users for a couple of associated domains. When we tried LDAP and pointed it directly at two DCs, we could access the main domain fine again. So I agreed with VMware support that it was something on our end that needed further investigation. I sat down with the resident DNS/AD guy in the office and we worked through a few things, trying to pinpoint what was wrong.
Now that I had come across the “Error while extracting local SSO users” error, I could do some more digging.
We came across this post which mentioned that there could be issues with PTR records:
http://vninja.net/virtualization/vcenter-sso-unable-to-retrieve-ad-information-error-while-extracting-local-sso-users/
We thought this was unlikely, as we would have been seeing all sorts of other issues, but you will see... how wrong we were!
We did some digging using the dig command on the appliance and we noticed something was off:
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search dc003.prod.com
dc003.prod.com. 3600 IN A 10.x.x.10
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search dc004.prod.com
dc004.prod.com. 3600 IN A 10.x.x.11
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search dc002.prod.com
dc002.prod.com. 2268 IN A 10.x.x.2
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search dc001.prod.com
dc001.prod.com. 2265 IN A 10.1.x.2
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search -x 10.x.10
10.x.x.10.in-addr.arpa. 3600 IN PTR dc003.prodcom.
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search -x 10.x.x.11
11.x.x.10.in-addr.arpa. 3600 IN PTR dc004.prod.com.
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search -x 10.x.x.2
2.x.x.10.in-addr.arpa. 3105 IN PTR ntp.prod.com.
root@vrmvmwvcsa [ ~ ]# dig +noall +answer +search -x 10.1.x.2
2.x.1.10.in-addr.arpa. 3600 IN PTR ntp.prod.com.
As you can see, the last two reverse lookups point to ntp and not to dc001/dc002. It appears that when someone created DNS entries for the NTP configuration, they accidentally overwrote the PTR records for those two older DCs.
Once I set them back to point to the correct DCs and rebooted the appliance, everything started working again.
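If you want to run the same forward-then-reverse check against a list of DCs without eyeballing each dig result, something like the following could work. This is only a minimal sketch: check_ptr is a helper name I made up for illustration, it assumes each host has a single A record (it only looks at the first line dig returns), and it compares names case-sensitively.

```shell
# check_ptr HOST (hypothetical helper, not part of the VCSA):
# resolve HOST to an address, reverse-resolve that address, and report
# whether the PTR record points back at HOST.
check_ptr() {
    _host=$1
    # Forward lookup; assumes one A record and takes the first line only.
    _ip=$(dig +short "$_host" A | head -n 1)
    if [ -z "$_ip" ]; then
        echo "NO-A     $_host"
        return 1
    fi
    # Reverse lookup; dig prints the PTR target with a trailing dot.
    _ptr=$(dig +short -x "$_ip" | head -n 1)
    _ptr=${_ptr%.}
    if [ "$_ptr" = "$_host" ]; then
        echo "OK       $_host -> $_ip -> $_ptr"
    else
        echo "MISMATCH $_host -> $_ip -> ${_ptr:-<no PTR>}"
        return 1
    fi
}
```

Calling check_ptr dc001.prod.com on the appliance before the fix should have flagged a mismatch against ntp.prod.com, while check_ptr dc004.prod.com should have come back OK. A simple for loop over the DC names covers the rest.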
I find it interesting that this must have happened a while ago, but because the newer DCs are only now starting to be used, we had never seen it cause a problem. It is also odd that everything worked correctly at first. I don’t know the exact mechanism the VCSA uses to pick which DCs to communicate with, but once the PTR records were corrected the issue went away. When we set up LDAP, we pointed it directly at the new DCs (dc003/dc004), which skipped any need for it to go to dc001/dc002.
The reason a few other users and I could keep logging in was simply that the VCSA caches things locally. Once that user changed his password, the issue presented itself in that strange way.
SO ALAS IT WAS DNS