Virtualizing Microsoft SharePoint 2010/2013 on vSphere 5 Best Practices

Let’s now talk about Microsoft SharePoint 2013. MS SharePoint is one of the most complex Microsoft products. It’s a multi-tier product in which each tier can be scaled individually. Here is a summary of Microsoft SharePoint, quoted from Wikipedia:

“SharePoint can provide intranet portals, document and file management, collaboration, social networks, extranets, websites, enterprise search, and business intelligence. It also has system integration, process integration, and workflow automation capabilities.”

MS SharePoint consists mainly of three tiers: Web Interface, Application Server and Database. Each of these tiers needs a defined level of performance, availability and scalability. vSphere 5.x can provide these levels thanks to its flexibility, its ability to host different types of workloads and its advanced features, like: vSphere vMotion, vSphere HA and vSphere DRS.

I’ll follow the same schema as my previous posts and relate these best practices to our five Design Qualifiers (AMPRS – Availability, Manageability, Performance, Recoverability and Security) in addition to Scalability. For more information, check the References section below.

Availability:
1-) For Web Server Role, it’s recommended to deploy several Web Server VMs behind Load Balancers (HW/Virtual Appliances) to provide load balancing and high availability. For additional availability, leverage vSphere HA with VM Monitoring to restart any failed Web Server VMs on other hosts with min. downtime.

2-) For Application Server Role, it’s recommended to deploy several Application Server VMs to provide load balancing and high availability. SharePoint 2013 Farm will automatically balance the user load between all Application Server VMs. For additional availability, leverage vSphere HA with VM/App Monitoring to restart any failed Application Server VMs on other hosts with min. downtime.

3-) For DB Servers, it’s recommended to use SQL Server native availability techniques combined with vSphere HA and VM/App Monitoring for max. availability. Some SQL native availability techniques can be used with all types of SharePoint 2013 Farm DBs, while others are supported only with some of them. For example:

Availability Technique | Configuration DB | Central Administration DB | Content DB
DB Mirroring – High Safety Mode | Yes | Yes | Yes
DB Mirroring – High Performance Mode / Log Shipping | No | Yes | Yes
SQL 2012 AAG – Synchronous-commit Mode | Yes | Yes | Yes
SQL 2012 AAG – Asynchronous-commit Mode | No | No | Yes

For more information: Supported HA and DR options for SharePoint databases (SharePoint 2013).
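The support matrix above can be turned into a small lookup, which is handy when you want to check which techniques cover every DB type you plan to protect. This is only a sketch: the dictionary keys (`config`, `central_admin`, `content`) and the function name are my own labels, and the Yes/No values are copied from the table above.

```python
# Support matrix transcribed from the table above (True = supported).
# The short keys are illustrative labels, not SharePoint or SQL Server names.
SUPPORT = {
    "DB Mirroring - High Safety Mode":
        {"config": True, "central_admin": True, "content": True},
    "DB Mirroring - High Performance Mode / Log Shipping":
        {"config": False, "central_admin": True, "content": True},
    "SQL 2012 AAG - Synchronous-commit Mode":
        {"config": True, "central_admin": True, "content": True},
    "SQL 2012 AAG - Asynchronous-commit Mode":
        {"config": False, "central_admin": False, "content": True},
}


def techniques_for(db_types):
    """Return the techniques that support every database type in db_types."""
    return [
        name
        for name, row in SUPPORT.items()
        if all(row[db] for db in db_types)
    ]


if __name__ == "__main__":
    # Techniques usable for the whole farm (all three DB types).
    for technique in techniques_for(["config", "central_admin", "content"]):
        print(technique)
```

Asking for all three DB types leaves only the synchronous options, which matches the table: the asynchronous techniques can't protect the Configuration DB.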

4-) For Search Service Availability:
a- Deploy two or more Query Server VMs, each with a main Index Partition and a mirror of the other partition, for load balancing and redundancy.
b- Deploy two or more Crawl DB Server VMs, each holding a Crawl DB for a Crawl Server, for more redundancy and load balancing.
c- Protect Crawl DB with a suitable SQL native availability technique for higher availability if needed. For more information: Supported HA and DR options for SharePoint databases (SharePoint 2013).
d- Deploy two or more Crawl Server VMs, each with two or more Crawler Services, where each Crawler Service is connected to a different Crawl DB.
e- Separate the VMs of each pair onto different hosts, blade chassis and even different storage arrays using VM/Storage Anti-affinity Rules. Also apply vSphere VM/App monitoring for higher availability.
This provides the highest level of redundancy and load balancing for large environments with Enterprise Search Service. For more information: Plan the topology for Enterprise Search (SharePoint Server 2010).

5-) As Microsoft supports vMotion of SQL VMs, use DRS Clusters in Fully Automated Mode. DRS will keep load balancing your SharePoint VMs across the cluster while respecting all of your configured affinity/anti-affinity rules. For SQL Servers in your SharePoint Farms, follow all best practices for deploying the vMotion Network so that it can support migrations of large SQL VMs. Refer back to my blog post: Virtualizing Microsoft SQL Server 2012/2014 on vSphere Best Practices.

6-) For Web Server/Application Server VMs, use DRS VM Anti-affinity rules to separate them over different hosts. When HA restarts a VM, it will not respect the anti-affinity rule, but on the following DRS invocation, the VM will be migrated to respect the rule. In vSphere 5.5, configure the vSphere Cluster with “das.respectVmVmAntiAffinityRules” set to “true” so that HA restarts also respect all VM Anti-affinity rules.
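To make the anti-affinity idea concrete, here is a minimal sketch that checks a VM-to-host placement against a set of separation rules. It doesn't talk to vCenter at all; the VM names, host names and rule names are made up, and the function only reports where two VMs of the same rule share a host (for example right after an HA restart, before DRS runs again).

```python
from collections import defaultdict


def anti_affinity_violations(placement, rules):
    """Find anti-affinity rules that are currently broken.

    placement: dict mapping VM name -> host name.
    rules: dict mapping rule name -> list of VMs that must not share a host.
    Returns: dict mapping rule name -> {host: [VMs of that rule on the host]}.
    """
    violations = {}
    for rule, vms in rules.items():
        by_host = defaultdict(list)
        for vm in vms:
            host = placement.get(vm)
            if host is not None:  # ignore VMs that are not placed anywhere
                by_host[host].append(vm)
        shared = {h: sorted(v) for h, v in by_host.items() if len(v) > 1}
        if shared:
            violations[rule] = shared
    return violations


if __name__ == "__main__":
    # Hypothetical farm: both Web VMs ended up on esx1 after an HA restart.
    placement = {"web01": "esx1", "web02": "esx1",
                 "app01": "esx2", "app02": "esx3"}
    rules = {"web-separate": ["web01", "web02"],
             "app-separate": ["app01", "app02"]}
    print(anti_affinity_violations(placement, rules))
```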

7-) Try to leverage VM Monitoring to mitigate the risk of Guest OS failure. VMware Tools inside SharePoint VMs will send heartbeats to the HA driver on the host. If the heartbeats stop because of a Guest OS failure, the host will monitor the IO and network activity of the VM for a certain period. If there’s no activity either, the host will restart the VM. This adds an additional layer of availability for SharePoint VMs. For more information, refer to: vSphere HA VM Monitoring – Back to Basics | VMware vSphere Blog – VMware Blogs.
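The decision flow described above (heartbeat first, then IO/network activity, then restart) can be sketched as a tiny function. This is only an illustration of the logic, not how the HA agent is implemented; the function name, the returned labels and the two default intervals are assumptions, so take the real values from your own cluster’s VM Monitoring settings.

```python
def vm_monitoring_decision(secs_since_heartbeat, secs_since_io,
                           failure_interval=30, io_stats_interval=120):
    """Classify a VM the way the VM Monitoring flow above describes.

    secs_since_heartbeat: seconds since the last VMware Tools heartbeat.
    secs_since_io: seconds since the last observed disk/network activity.
    failure_interval, io_stats_interval: assumed example values in seconds.
    """
    if secs_since_heartbeat < failure_interval:
        return "healthy"      # heartbeats are still arriving
    if secs_since_io < io_stats_interval:
        return "suspect"      # no heartbeat, but the VM is still doing IO
    return "restart"          # no heartbeat and no activity: restart the VM


if __name__ == "__main__":
    print(vm_monitoring_decision(5, 1))
    print(vm_monitoring_decision(45, 10))
    print(vm_monitoring_decision(45, 300))
```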

8-) Try to leverage Symantec Application HA Agent for SharePoint with vSphere HA for max. availability. Using Application HA, the monitoring agent will monitor the SQL instance and SharePoint services, sending heartbeats to the HA driver on the ESXi host. In case of application failure, it may restart services or mount databases. If the Application HA Agent can’t recover the application from that failure, it’ll stop sending heartbeats and the host will initiate a VM restart as an HA action. For more information about Symantec Application HA, refer to this pdf.

Performance:
1-) As it’s so hard to give performance recommendations for green-field deployments of SharePoint 2013 farms, Microsoft has its own performance tests for some scenarios and recommendations based on these tests: Performance and Capacity test results and recommendations (SharePoint Server 2013).

2-) For Capacity Planning of a SharePoint 2013 Farm, it’s recommended to follow Microsoft recommendations as it’s really difficult to provide standard guidance: Capacity planning for SharePoint Server 2013.
You can also check Microsoft’s published case studies about different deployment scenarios with different capacities: Performance and capacity technical case studies (SharePoint Server 2010).

3-) You should have good knowledge of the DBs used by a SharePoint 2013 Farm and their performance characteristics before virtualizing the farm. SharePoint 2013 supports only SQL Server 2008 R2 SP1 or above. For more information about SharePoint 2013 DBs: Database types and descriptions (SharePoint 2013). A graphical poster is also available for download here.

4-) Follow Microsoft Best Practices for backend SQL Server: Best Practices for SQL Server in a SharePoint Server farm. Refer also to my blog post, Virtualizing Microsoft SQL Server 2012/2014 on vSphere Best Practices, for more best practices in case you’ll also virtualize the SQL DBs.

5-) Distributed Cache Application VMs should have their configured memory reserved. They depend heavily on their memory as a cache for the entire SharePoint Farm, so they should not be subject to any memory reclamation techniques, like: ballooning.

6-) CPU Sizing:
a- Assign vCPUs as required and don’t over-allocate vCPUs to the VM, to prevent CPU Scheduling issues at the hypervisor level and high RDY time. This approach can be applied to the three roles in your SharePoint Farm: Web, Application and DB VMs. For Application and Web Server VMs, it’s sometimes easier and better to follow the Scale-out approach, creating additional VMs to serve more load, than the Scale-up approach. Besides, vSphere DRS balances smaller VMs across the cluster more easily than larger VMs. Generally speaking, better CPU utilization in your SharePoint Farm means higher throughput and lower latency.
b- Don’t over-commit CPUs. Ratio of Virtual: Physical Cores should be 2:1 max (better to keep it nearly 1:1) for mission-critical SharePoint VMs. In some cases like small environments, over-commit is allowed after establishing a performance baseline.
c- Enable Hyperthreading when available. It won’t double the processing power, despite the ESXi host showing double the number of logical cores, but it’ll give a CPU processing boost of up to 20-25% in some cases. Don’t count hyperthreads when calculating the Virtual: Physical Cores ratio.
d- ESXi Hypervisor is NUMA aware and it leverages the NUMA topology to gain a significant performance boost. Try to size your SharePoint VMs to fit inside single NUMA node to gain the performance boost of NUMA node locality.
e- On your SQL Server, set the “Maximum Degree of Parallelism” setting to 1 from the SQL Server advanced properties to control how SQL Server divides incoming requests between VM vCPUs.
f- Generally speaking, SQL Server, Web Server and Application Server can start with 4 vCPUs then be scaled up, down or out according to your environment.
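The ratio and NUMA checks above are simple arithmetic, so they’re easy to script before you place VMs. The following is a rough sketch with made-up host and VM sizes; the function names are my own, and physical cores are counted without hyperthreads, as recommended above.

```python
def vcpu_to_core_ratio(vcpus_per_vm, physical_cores):
    """Total vCPUs on a host divided by its physical cores (no hyperthreads)."""
    return sum(vcpus_per_vm) / physical_cores


def fits_numa_node(vm_vcpus, vm_memory_gb, cores_per_node, memory_per_node_gb):
    """True if the VM's vCPUs and memory both fit inside one NUMA node."""
    return vm_vcpus <= cores_per_node and vm_memory_gb <= memory_per_node_gb


if __name__ == "__main__":
    # Hypothetical host: 2 sockets x 8 cores, 128 GB of RAM per NUMA node.
    vms = [4, 4, 4, 4, 8]  # vCPUs of the SharePoint VMs on this host
    ratio = vcpu_to_core_ratio(vms, physical_cores=16)
    print(f"vCPU:pCore ratio = {ratio}:1, within 2:1 limit: {ratio <= 2.0}")
    print("8 vCPU / 64 GB VM fits a node:", fits_numa_node(8, 64, 8, 128))
    print("12 vCPU / 32 GB VM fits a node:", fits_numa_node(12, 32, 8, 128))
```

A VM that fails the second check can still run, but it will span NUMA nodes and lose part of the locality benefit.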

7-) Memory Sizing:
a- Don’t over-commit memory, as SharePoint 2013 is a memory-intensive application. If needed, reserve the configured memory to provide the required performance level (more memory means more caching, better throughput and lower latency). Keep in mind that memory reservation affects other aspects, like: HA Slot Size, vMotion chances and time. In addition, fully reserving a VM’s memory shrinks its swapfile on the datastore to zero, so that space becomes usable for adding more VMs. In some cases, where there are a lot of underutilized SharePoint Servers, over-commitment is allowed to get higher consolidation ratios. Performance monitoring is mandatory in this case to maintain a baseline of normal-state utilization.
b- Leverage the Memory Hot-add feature to scale your VMs quickly. Keep in mind that some SharePoint servers, like: Distributed Cache, can’t use the added memory until a reboot.
c- For Web Server and Application Server, the min. recommended memory is 8GB for small environments, scaling up to 16GB for large ones. Putting more user load on your Web Servers or more applications on your Application Servers will require more memory than the recommended amount.
d- For Content DB Memory Sizing:

Combined size of content databases | RAM recommended for VM running SQL Server
Minimum for small production deployments | 8 GB
Minimum for medium production deployments | 16 GB
Recommendation for up to 2 terabytes | 32 GB
Recommendation for the range of 2 terabytes to 5 terabytes | 64 GB
Recommendation for more than 5 terabytes | >64 GB (estimated according to your DB size to provide enough cache to improve your SQL Server performance)

Keep in mind that any SQL Availability technique that creates additional secondary copies of the DB requires sizing the secondary SQL Server node with the same memory to provide the same performance in case of failover. When using SQL 2012 AAGs with readable secondaries, sizing the secondary SQL Server node properly will also improve Read operation performance.
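As a quick helper, the size-based rows of the Content DB table above can be written as a lookup. This is only a sketch with my own function name; the thresholds come straight from the table, and the result for more than 5 TB is deliberately left open because the table gives no fixed number there.

```python
def recommended_sql_ram(combined_content_db_tb):
    """Map combined content DB size (TB) to the RAM guidance in the table above."""
    if combined_content_db_tb <= 2:
        return "32 GB"
    if combined_content_db_tb <= 5:
        return "64 GB"
    # Beyond 5 TB the table only says "more than 64 GB": size per workload.
    return ">64 GB"


if __name__ == "__main__":
    for size_tb in (0.5, 2, 3.5, 8):
        print(f"{size_tb} TB of content DBs -> {recommended_sql_ram(size_tb)}")
```

Remember to apply the same figure to any secondary SQL Server node that may take over after a failover.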

8-) Storage Sizing:
a- Always consider any storage space overhead while calculating VMs space size required. Overhead can be: swapfiles, VMs logs or snapshots. It’s recommended to add 20-30% of space as an overhead.
b- All recommended best practices for deploying SQL Server Storage requirements must be applied when deploying SharePoint Farm backend DB Servers.
c- Separate different SharePoint VMs’ disks on different –dedicated if needed- datastores to avoid IOps contention, as SharePoint is an IO-intensive application with many components, each with different IOps requirements.
d- Provide at least 4 paths, through two HBAs, between each ESXi host and the Storage Array for max. availability.
e- RDM can be used in many cases, like: P2V migration or to leverage a 3rd Party array-based backup tool. Choosing between RDM disks and VMFS-based disks depends on your technical requirements. There’s no performance difference between these two types of disks.
f- Partition Alignment gives a performance boost to your backend storage, as spindles will not make two reads or writes to process a single request. VMFS5 datastores created using the vSphere (Web) Client will be aligned automatically, as will any disks formatted using newer versions of Windows. Any upgraded VMFS datastores or upgraded versions of Windows Guests will require a partition alignment process. For upgraded VMFS, it’s done by migrating the VMs’ disks to another datastore using Storage vMotion, then formatting and recreating the datastore on VMFS5.
g- Use the Paravirtual SCSI Driver in all of your SharePoint VMs, especially for disks used for DBs and Logs, for max. performance, least latency and least CPU overhead.
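The 20-30% overhead rule from point (a) is easy to apply in a script when you plan datastores. Here’s a minimal sketch; the disk sizes are made up and the 25% default is just a value I picked inside the recommended range.

```python
def datastore_size_gb(vm_disk_sizes_gb, overhead=0.25):
    """Provisioned VM disk space plus an overhead for swapfiles, logs and snapshots.

    overhead: fraction added on top of the disks; 0.20-0.30 per the rule above.
    """
    return sum(vm_disk_sizes_gb) * (1 + overhead)


if __name__ == "__main__":
    # Hypothetical VMs: Web (100 GB), Application (200 GB), SQL (500 GB).
    disks = [100, 200, 500]
    print("Planned datastore capacity (GB):", datastore_size_gb(disks))
    print("With 30% overhead (GB):", datastore_size_gb(disks, overhead=0.30))
```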

9-) Network Sizing:
a- Your Application and Web Servers should have two vNICs, one for public communication with users and the other for backend communication with SQL DB VMs.
b- Use VMXNet3 vNIC in all SharePoint VMs for max. performance and throughput and least CPU overhead.
c- SharePoint VMs port group should have at least 2 physical NICs for redundancy and NIC teaming capabilities. Connect each physical NIC to a different physical switch for max. redundancy.
d- Consider network separation between different types of networks, like: vMotion, Management, SharePoint production, SharePoint backend communication, Fault Tolerance, etc. Network separation is either physical or logical using VLANs.
e- It’s better to dedicate a physical NIC on ESXi hosts for backend communication network between Application and Web VMs and SQL DB VMs.
f- Provided that your design depends on creating multiple redundant instances of all SharePoint Roles, you can keep one Web, one Application and one backend DB Server VM together as one unit on a single ESXi Host. This keeps all their backend communication local in the host’s memory, which provides much more throughput and much lower latency than your network. Use DRS VM Affinity rules to keep these VMs together. Create many units of the three VMs and distribute them across your ESXi hosts for higher availability. Use Host-VM “Should” Affinity rules to control which unit runs on which host.
g- For your DB Servers, dedicate physical NIC (two for redundancy and load balancing) on ESXi hosts hosting them for replication traffic between redundant instances to keep them in tight lockstep for better availability and better RPO.

10-) Monitoring:
Try to establish a performance baseline for your SharePoint VMs and virtual infrastructure (VI) by monitoring the following:
– ESXi Hosts and VMs counters:

Resource | Metric (esxtop/resxtop) | Metric (vSphere Client) | Description
CPU | %USED | Used | CPU used over the collection interval (%)
CPU | %RDY | Ready | CPU time spent in ready state
CPU | %CSTP | Co-Stop | Percentage of time a vCPU spent in ready, co-descheduled state. Only meaningful for SMP virtual machines.
CPU | %MLMTD | - | Percentage of time a vCPU was ready to run but was deliberately not scheduled due to CPU limits.
CPU | %SYS | System | Percentage of time spent in the ESX/ESXi Server VMkernel
Memory | Swapin, Swapout | Swapinrate, Swapoutrate | Memory ESX/ESXi host swaps in/out from/to disk (per virtual machine, or cumulative over host)
Memory | MCTLSZ (MB) | vmmemctl | Amount of memory reclaimed from resource pool by way of ballooning
Disk | READs/s, WRITEs/s | NumberRead, NumberWrite | Reads and Writes issued in the collection interval
Disk | DAVG/cmd | deviceLatency | Average latency (ms) of the device (LUN)
Disk | KAVG/cmd | KernelLatency | Average latency (ms) in the VMkernel, also known as “Queuing Time”
Network | MbRX/s, MbTX/s | Received, Transmitted | Amount of data received/transmitted per second
Network | PKTRX/s, PKTTX/s | PacketsRx, PacketsTx | Received/Transmitted Packets per second
Network | %DRPRX, %DRPTX | DroppedRx, DroppedTx | Receive/Transmit Dropped packets per second

– In-guest counters:
For all in-guest counters that need to be monitored: Monitoring and maintaining SharePoint Server 2013.
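Once you’ve collected samples of a counter (for example %RDY exported from esxtop), a baseline can be as simple as a mean and a spread. The sketch below is one way to do it, not a VMware-recommended method: the sample values are invented and the “mean plus two standard deviations” cut-off is my own assumption, so tune it to your environment.

```python
from statistics import mean, pstdev


def baseline(samples, k=2.0):
    """Summarize normal-state samples of one counter.

    Returns the mean, the population standard deviation and an upper bound
    of mean + k * stdev (k=2 is an assumed, adjustable cut-off).
    """
    avg = mean(samples)
    spread = pstdev(samples)
    return {"mean": avg, "stdev": spread, "upper": avg + k * spread}


def outliers(samples, base):
    """Return the samples that exceed the baseline's upper bound."""
    return [s for s in samples if s > base["upper"]]


if __name__ == "__main__":
    # Invented %RDY samples taken during a normal working week.
    normal_week = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    base = baseline(normal_week)
    print("Baseline:", base)
    # New samples to compare against the baseline.
    print("Above baseline:", outliers([3.0, 12.5, 8.9], base))
```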

Manageability:
1-) Microsoft support for SharePoint 2013 Virtualization: Virtualization support and licensing in SharePoint 2013.

2-) Try to leverage the vApp feature in vSphere. It can be really helpful for packaging and exporting a group of SharePoint VMs with certain reserved resources for development or testing.

3-) Use vCenter Operations Manager to monitor your environment’s performance trends, establish a dynamic baseline of your VMs’ performance to prevent false static alerts, estimate the capacity required for further scaling and proactively protect your environment against sudden performance peaks that need immediate scaling-up of resources. It’s really helpful in SharePoint Environments as they’re really dynamic environments.

4-) Use the SharePoint Products Preparation Tool found on the SharePoint media to install all prerequisites on your SharePoint Servers.

5-) Install SharePoint Server binaries on all required Application and Web Server VMs before configuring any of them, to achieve configuration consistency and a stable SharePoint farm.

6-) Time Synchronization is one of the most important things in SharePoint environments. It’s recommended to do the following:
a- Let all your SharePoint VMs sync their time with DCs only, not with VMware Tools.
b- Totally disable time-sync between SharePoint VMs and Hosts using VMware Tools (even after unchecking the box on the VM settings page, a VM can still sync with the Host through VMware Tools on startup, resume, snapshotting, etc.) according to the following KB: VMware KB: Disabling Time Synchronization.
c- Sync all ESXi Hosts in the VI to the same Stratum 1 NTP Server, which should be the same time source as your forest/domain.

7-) Make sure that you set the “Full” recovery model on all SharePoint DBs that will be included in your SQL AAGs, and also make sure that at least one Full Backup has been taken.

Recoverability:
1-) Use VMware Site Recovery Manager (SRM) if available for DR. With SRM, automated failover to a replicated copy of the VMs in your DR site can be carried out in case of a disaster or even a failure of a single VM in your SharePoint Farm.

2-) If VMware SRM isn’t available, you can leverage some availability features of SQL itself for more recoverability of your backend DB infrastructure. You can either use a mix of AAG Synchronous/Asynchronous replicas or Database Mirroring in High-safety Mode with Log Shipping to the DR Site. This approach leads to lower cost, but with higher management overhead and higher RPO/RTO results than using VMware SRM.

3-) For the least protection, use warm clones of your VMs in the DR site that are ready to be powered up and deployed in case of disaster. This approach requires a consistent Backup/Restore cycle of your SharePoint Farm VMs. For more information about SharePoint Farm DR: Choose a DR strategy for SharePoint 2013.

4-) Try to leverage native backup techniques in SharePoint. For more information: Backup Solution in SharePoint 2013.

5-) Try to leverage any backup software that uses Microsoft Volume Shadow Copy Service (VSS). These are SQL-aware and don’t cause any corruption in the DB during the backup operation. In addition, they’re SharePoint-aware and use VSS writers to back up any Application or Web Server without any interruption. Of course, one of them is vSphere Data Protection Advanced. Check the following pdf.

Security:
1-) All security procedures used for securing physical Microsoft SharePoint Servers should also be applied to SharePoint VMs, like: Role-based Access Policy.

2-) Follow the VMware Hardening Guide (v5.1/v5.5) for more security procedures to secure both your VMs and vCenter Server.

Scalability:
1-) A SharePoint 2013 Farm contains many SQL DBs that differ in their required scalability approaches according to the number of each allowed in the farm, their performance characteristics and the max. recommended size. Generally speaking, the Configuration DB and Central Administration DB must be co-located and neither will grow beyond 1 GB. A SharePoint 2013 Farm must have only one of each of them and hence, in the rare case that you need to expand either of them, you should scale them up, not out. The Content DB will grow according to your deployment of your SharePoint 2013 Farm and can grow beyond 1 TB. It’s recommended to keep it below 200GB for max. performance. For more scalability, scale out your Web Servers and add another Content DB that should also be kept below 200GB, and so on. For more information: DBs Types and Descriptions – SharePoint 2013.
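To see how the 200GB guideline translates into a number of Content DBs, here’s a small sketch. The function name is mine and the total content size is made up; the 200 GB default is the recommended ceiling mentioned above.

```python
import math


def content_dbs_needed(total_content_gb, max_db_gb=200):
    """Smallest number of Content DBs that keeps each one at or under max_db_gb."""
    # A farm always needs at least one Content DB, even when it is nearly empty.
    return max(1, math.ceil(total_content_gb / max_db_gb))


if __name__ == "__main__":
    # Hypothetical farm expected to hold 1 TB (1024 GB) of content.
    print("Content DBs needed for 1024 GB:", content_dbs_needed(1024))
```

This gives you a starting count for planning; actual growth per site collection decides when the next Content DB has to be added.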

2-) Microsoft released some topologies for different sizes of SharePoint environments with the required components. These can be a starting point for you to size your environment to achieve the required performance, scalability and availability levels. Check SharePoint 2010 Topologies and SharePoint 2013 Topologies.

3-) Leverage CPU/Memory Hot-add with SharePoint VMs to scale them as needed. Some VMs, like SQL Server, may use added resources without a reboot, while others, like: Distributed Cache Server, will need a reboot to use them.

4-) The Scale-up Approach for SQL VMs requires large ESXi Hosts with many sockets and lots of RAM. It reduces the number of VMs required to serve a certain number of DBs and hence, a single failed VM will affect a large portion of users. That’s why the Scale-up Approach needs careful attention to availability and the use of SQL AAGs in parallel with vSphere HA. At the same time, it reduces the cost of software licenses and physical hosts. The Scale-out Approach requires smaller ESXi Hosts and gives more flexibility in designing a SQL VM, but requires a higher number of ESXi hosts to provide the required level of availability and more software licenses and hence, more cost. A single VM failure has less effect with the Scale-out Approach, and smaller VMs require less time for migration using vMotion and hence, DRS will be more effective. There’s no best approach here. It all depends on your environment and your requirements.

5-) Try to leverage vSphere Templates in your environment. Create a Golden Template for every tier of your VMs. This reduces the time required for deploying or scaling your SharePoint environment and preserves configuration consistency throughout your environment.

Long post again, but this post isn’t a complete guide to virtualizing SharePoint 2013 environments. It’s only an introduction to the many aspects and concerns that must be taken care of. A SharePoint Architect can be a great help in dealing with some of the aspects mentioned here, and in large-scale environments you should have one sitting beside you, helping in the design sessions. I hope this summary gives you a hand.

References:
** Virtualizing MS Business Critical Applications by Matt Liebowitz and Alex Fontana.
** MS SharePoint 2010 on VMware – Best Practices Guide.
** MS SharePoint 2010 on VMware – High Availability & Recovery Guide.
** MS SharePoint 2010 on VMware – Support/Licensing Guide.
** MS SharePoint 2010 Infrastructure Planning and Designing Kit.
** vSphere Design Sybex 2nd Edition by Scott Lowe, Kendrick Coleman and Forbes Guthrie
