Many Cores per Socket or Single-Core Socket Mystery

Hi All

While reading recently, I remembered the eternally debated question that comes up when creating an SMP VM (a VM with many vCPUs):
“Which is better: many cores in a single socket, or many sockets each with a single core?”
I remember how many times I debated this for hours with my technical manager while reviewing designs, but neither of us could prove our point.

To answer this question, we have to review some concepts:
NUMA CPU Configuration: Non-Uniform Memory Access (NUMA) is a CPU configuration in which each CPU has some memory DIMMs that are local and directly connected to it. Each CPU can access its local DIMMs with the lowest latency, and the remote DIMMs (local to another CPU) with higher latency over an interconnect bus.
Many vExperts, and VMware itself, talked about this when it was first supported in vSphere 3.5 (if I remember correctly!) and about how the vSphere platform uses the NUMA configuration to support SMP VMs. The best articles you can read about it and how it is supported are Frank Denneman’s:
Sizing VMs and NUMA nodes –
ESXi 4.1 NUMA Scheduling –
Node Interleaving: Enable or Disable? –

vNUMA CPU Configuration: vNUMA is Virtual NUMA. On a NUMA vSphere host running a large VM (9+ vCPUs), the host can expose the NUMA configuration to the VM for an additional performance boost. It was introduced with vSphere 5.0. To gain this boost, the VM hardware version must be 8 or later, and the guest OS and guest applications must be NUMA-aware.
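The conditions above can be written down as a small check. This is only a sketch of the rule as described in this post, not VMware code: the function name and parameters are made up for illustration, and the 9-vCPU threshold is passed in as a parameter because it can be changed through an advanced setting.

```python
def vnuma_exposed(num_vcpus, hw_version, min_vcpus=9):
    """Return True if a virtual NUMA topology would be exposed to the VM.

    Hypothetical helper that mirrors the two conditions in the text:
    VM hardware version 8 or later, and at least `min_vcpus` vCPUs
    (9 by default). Whether the guest actually benefits still depends
    on the guest OS and applications being NUMA-aware.
    """
    return hw_version >= 8 and num_vcpus >= min_vcpus


# A 12-vCPU VM on hardware version 8 qualifies, an 8-vCPU VM does not.
print(vnuma_exposed(12, 8))   # True
print(vnuma_exposed(8, 10))   # False
```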

Now let’s quote the passage from the vSphere 5.5 ePubs that gave me the key to solving the mystery:

“You can affect the virtual NUMA topology with two settings in the vSphere Web Client: number of virtual sockets and number of cores per socket for a virtual machine. If the number of cores per socket (cpuid.coresPerSocket) is greater than one, and the number of virtual cores in the virtual machine is greater than 8, the virtual NUMA node size matches the virtual socket size. If the number of cores per socket is less than or equal to one, virtual NUMA nodes are created to match the topology of the first physical host where the virtual machine is powered on.”
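To make the quoted rule easier to follow, here is a minimal sketch of it in Python. The function and parameter names are hypothetical. The "match the physical host" branch is my own simplification: I assume the vCPUs are split evenly over as few physical NUMA nodes as needed, which is not something the quote spells out.

```python
import math


def vnuma_node_size(num_vcpus, cores_per_socket, host_cores_per_node):
    """Estimate the virtual NUMA node size (in vCPUs) under the quoted rule.

    Returns None when the VM has 8 vCPUs or fewer, because the quote
    only describes VMs with more than 8 virtual cores.
    """
    if num_vcpus <= 8:
        return None
    if cores_per_socket > 1:
        # Quoted rule: the virtual NUMA node size matches the virtual socket size.
        return cores_per_socket
    # Flat layout (one core per socket): follow the first host's topology.
    # Assumption: an even split across the minimum number of physical nodes.
    nodes = math.ceil(num_vcpus / host_cores_per_node)
    return math.ceil(num_vcpus / nodes)


# 16 vCPUs as 4 sockets x 4 cores: nodes of 4 vCPUs, whatever the host looks like.
print(vnuma_node_size(16, 4, 8))
# 16 vCPUs as 16 sockets x 1 core on a host with 8 cores per node: nodes of 8 vCPUs.
print(vnuma_node_size(16, 1, 8))
```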

Looking for the related advanced settings, I found these two:
cpuid.coresPerSocket & numa.vcpu.maxPerVirtualNode.

Referring to them here:

[table id=2 /]

So, the first one is the number of cores per virtual socket, and the other one controls some behavior that is not clear at first sight.

Googling these two options led me to a VMware Blogs post by Mark Achtemichuk about how the number of cores per socket affects performance, and it explained everything:
Does corespersocket Affect Performance? | VMware vSphere Blog – VMware Blogs.

According to the article, control over the number of cores per socket in VMs was introduced by VMware purely to address licensing constraints in guest VMs, in pre-vSphere 5.0 releases and later versions alike. When vNUMA was introduced in vSphere 5.0, another player entered the game. Quoting the following:

“#1 When creating a virtual machine, by default, vSphere will create as many virtual sockets as you’ve requested vCPUs and the cores per socket is equal to one. I think of this configuration as “wide” and “flat.” This will enable vNUMA to select and present the best virtual NUMA topology to the guest operating system, which will be optimal on the underlying physical topology.”

“#2 When you must change the cores per socket though, commonly due to licensing constraints, ensure you mirror physical server’s NUMA topology. This is because when a virtual machine is no longer configured by default as “wide” and “flat,” vNUMA will not automatically pick the best NUMA configuration based on the physical server, but will instead honor your configuration – right or wrong – potentially leading to a topology mismatch that does affect performance.”

Now let’s put all the pieces together:
1-) When creating any VM: by default, vSphere sets the number of cores per virtual socket to 1 (cpuid.coresPerSocket=1). In this case, when the VM has more than 8 vCPUs (more than 8 virtual sockets, each with a single virtual core), a virtual NUMA topology is created automatically for the VM to match the underlying physical NUMA topology of the first host where the VM is powered on. The maximum number of vCPUs in a single virtual NUMA node created this way (which should equal the number of physical cores per NUMA node) is controlled by “numa.vcpu.maxPerVirtualNode”, which defaults to 8 (and has to be changed if the physical NUMA node has more than 8 physical cores).
2-) To set Virtual NUMA manually on a large VM: set the number of cores per virtual socket manually (cpuid.coresPerSocket>1). In this case, the virtual NUMA topology follows your configuration and ignores the underlying physical NUMA topology. “numa.vcpu.maxPerVirtualNode” has no effect in this case (check the last test in the VMware blog post, where the VM was configured as a single socket with 24 cores).
3-) To enable Virtual NUMA manually on a small VM with a specific virtual NUMA node size: first set “numa.vcpu.min” to less than 9 (at or below the VM's vCPU count). Then either set “numa.vcpu.maxPerVirtualNode” to the required node size while keeping “cpuid.coresPerSocket” at 1, or set the number of virtual sockets and cores per socket manually to get the required topology (which has to match at least the number of nodes in the underlying physical NUMA topology).
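The three cases above can be summarized as the advanced settings each one touches. The sketch below is illustrative only: the helper names are made up, the values are examples rather than recommendations, and the output uses the plain key = "value" form of a .vmx file. For case 3, I assume "numa.vcpu.min" is an inclusive threshold, so it is set to the VM's own vCPU count.

```python
def case_settings(case, node_size=8, num_vcpus=None):
    """Return the advanced settings touched in each of the three cases.

    Hypothetical helper: `node_size` is the wanted vCPUs per virtual
    NUMA node, `num_vcpus` is only needed for the small-VM case.
    """
    if case == 1:
        # Flat default layout; raise the cap if a physical node has > 8 cores.
        return {"cpuid.coresPerSocket": 1,
                "numa.vcpu.maxPerVirtualNode": node_size}
    if case == 2:
        # Manual layout: the socket size becomes the virtual NUMA node size.
        return {"cpuid.coresPerSocket": node_size}
    if case == 3:
        if num_vcpus is None or num_vcpus >= 9:
            raise ValueError("case 3 is for VMs with fewer than 9 vCPUs")
        # Assumption: the threshold is inclusive, so the vCPU count is enough.
        return {"numa.vcpu.min": num_vcpus,
                "cpuid.coresPerSocket": 1,
                "numa.vcpu.maxPerVirtualNode": node_size}
    raise ValueError("case must be 1, 2 or 3")


def vmx_lines(settings):
    """Render a settings dict as .vmx-style lines: key = "value"."""
    return [f'{key} = "{value}"' for key, value in settings.items()]


# Example: an 8-vCPU VM that should see two virtual NUMA nodes of 4 vCPUs.
for line in vmx_lines(case_settings(3, node_size=4, num_vcpus=8)):
    print(line)
```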

Now we can understand the importance and necessity of setting the number of virtual sockets and virtual cores correctly in any VM. We can also now answer the debated question:
“Which is better: many cores in a single socket, or many sockets each with a single core?”

The answer is: it depends. If you understand the underlying physical NUMA topology exactly, you can set many cores per socket in the VM so that the virtual NUMA topology mirrors it. Otherwise, take the easiest approach and leave the number of cores per socket at 1.
