ESXTOP overview

Remember: the best tool for performance troubleshooting on an ESXi host is esxtop.

1. ESXTOP Metrics overview

Changing views is easy, type the following keys for the associated views:

c = cpu
m = memory
n = network
i = interrupts
d = disk adapter
u = disk device
v = disk VM
p = power mgmt
x = vsan

V = only show virtual machine worlds
e = Expand/Rollup CPU statistics, show details of all worlds associated with group (GID)
k = kill world, for tech support purposes only!
l = limit display to a single group (GID), enables you to focus on one VM
# = limit the number of entities shown, for instance the top 5

2 = highlight a row, moving down
8 = highlight a row, moving up
4 = remove selected row from view
e = statistics broken down per world
6 = statistics broken down per world

CPU

Once in the esxtop screen press ‘c’ to display CPU statistics.

CPU load average is the average CPU load on the ESXi host over the last 1, 5 and 15 minutes. A load average of 0.5 suggests the CPU is half utilised, 1.0 suggests the CPU is fully utilised, and a value above this means the host is being asked for more CPU than its physical CPUs can currently provide.

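The %RDY column shows the percentage of time a world was ready to run but waiting for the CPU scheduler. To compare it with CPU ready values reported in milliseconds, consider that 1% is roughly 200 milliseconds and 100% is roughly 20,000 milliseconds (these figures correspond to a 20-second sample interval, as used by the vCenter real-time charts). Therefore a value of 5% to 10% or higher (1,000 to 2,000 milliseconds) could potentially be a cause for concern.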
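The conversion between milliseconds and percent can be sketched as below. This is a minimal illustration, not part of esxtop: the function names are made up for this example, and the default 20-second interval is an assumption taken from the 1% = 200 ms figure above.

```python
def ready_ms_to_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready value in milliseconds to a percentage of the sample interval.

    interval_s is assumed to be 20 seconds, which matches 100% = 20,000 ms.
    """
    return ready_ms / (interval_s * 1000.0) * 100.0


def ready_percent_to_ms(ready_pct, interval_s=20.0):
    """Convert a ready percentage back to milliseconds for the same interval."""
    return ready_pct / 100.0 * interval_s * 1000.0


if __name__ == "__main__":
    # 1,000 ms and 2,000 ms are the lower and upper "concern" figures from the text.
    for ms in (200, 1000, 2000):
        print(ms, "ms ->", round(ready_ms_to_percent(ms), 2), "%")
```

If the value comes from a chart with a different sample interval, pass that interval instead of relying on the default.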

%USED shows the percentage of time the virtual machine spent executing CPU core cycles. A substantially higher value on one virtual machine compared with others could mean it is the cause of performance issues on the host.

%SYS shows the percentage of time spent performing system activities on behalf of the world. A value of 10% to 20% or higher could be a symptom of an I/O-intensive virtual machine.

%CSTP applies to virtual machines with multiple vCPUs, and shows the time spent waiting for one or more of those virtual CPUs to become ready. If this is above 3% it generally means the number of vCPUs should be decreased.

%MLMTD shows the percentage of time a ready-to-run vCPU was not scheduled because of a CPU limit setting. If this value is above 0, removing the limit should improve performance.

%SWPWT shows the percentage of time spent waiting for swapped pages to be read from disk. If the value exceeds 5% you could potentially have an issue with memory over-commitment.
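As a quick way to apply the rules of thumb above to a set of readings, here is a small sketch. The threshold table simply restates the lower bounds mentioned in this section; the function name and the dictionary layout are illustrative assumptions, not anything esxtop provides.

```python
# Lower-bound "worth a look" thresholds taken from the descriptions above (percent).
CPU_THRESHOLDS = {
    "%RDY": 5.0,    # 5% to 10% or higher is a potential concern
    "%SYS": 10.0,   # 10% to 20% or higher may point to heavy I/O
    "%CSTP": 3.0,   # above 3%: consider fewer vCPUs
    "%MLMTD": 0.0,  # anything above 0 means a CPU limit is holding the VM back
    "%SWPWT": 5.0,  # above 5: possible memory over-commitment
}


def flag_cpu_counters(sample):
    """Return the sorted names of counters in `sample` that exceed their threshold.

    `sample` is a hypothetical dict of counter name -> value for one VM,
    typed in by hand or extracted from a batch capture.
    """
    return sorted(
        name
        for name, limit in CPU_THRESHOLDS.items()
        if sample.get(name, 0.0) > limit
    )


if __name__ == "__main__":
    vm = {"%RDY": 7.2, "%SYS": 2.0, "%CSTP": 0.0, "%MLMTD": 1.5, "%SWPWT": 0.0}
    print(flag_cpu_counters(vm))
```

These are starting points for investigation rather than hard limits, so adjust the table to suit your environment.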

MEMORY

Once in the esxtop screen press ‘m’ to display memory statistics.

MEM overcommit avg shows the average memory overcommit for the last 1, 5 and 15 minutes.

MCTLSZ shows the amount of guest physical memory in MB that the ESXi host is reclaiming by inflating the balloon driver. This occurs when the host is over-committed and does not have enough available physical memory.

SWCUR shows the amount of memory in MB swapped by the VMkernel. Any value over 0 is another symptom of memory over-commitment.

SWR/s and SWW/s show the rate at which the host is reading from or writing to swapped memory. Once again, any value over 0 indicates possible memory over-commitment.

CACHEUSD shows the amount of memory in MB that has been compressed by the ESXi host. Compression occurs when the host is over-committed on memory, so this should not be above 0.

ZIP/s and UNZIP/s indicate that the host is actively compressing memory and accessing compressed memory respectively. Values larger than 0 imply the host is over-committed on memory.
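Since every counter in this section follows the same "should be 0" rule, the check is easy to express in a few lines. The sketch below is illustrative only; the function name and the input dictionary are assumptions, and the counter names are the column labels described above.

```python
# Memory counters from this section; each one should normally read 0.
MEMORY_PRESSURE_COUNTERS = (
    "MCTLSZ", "SWCUR", "SWR/s", "SWW/s", "CACHEUSD", "ZIP/s", "UNZIP/s",
)


def memory_pressure_signs(sample):
    """Return the counters in `sample` that are above 0, in the order listed above.

    `sample` is a hypothetical dict of counter name -> value for one VM.
    Counters missing from the dict are treated as 0.
    """
    return [
        name
        for name in MEMORY_PRESSURE_COUNTERS
        if sample.get(name, 0.0) > 0
    ]


if __name__ == "__main__":
    vm = {"MCTLSZ": 512.0, "SWCUR": 0.0, "ZIP/s": 0.3}
    print(memory_pressure_signs(vm))
```

An empty result means none of the listed counters shows signs of ballooning, swapping or compression for that reading.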

NETWORK

Once in the esxtop screen press ‘n’ to display network statistics.

For network, look at the %DRPTX and %DRPRX columns. These show the percentage of transmit packets dropped and receive packets dropped respectively. Values above 0 here could signify high network utilisation.

The USED-BY and TEAM-PNIC columns show each virtual machine on the host and the vmnic it is using.

Outside of esxtop, but also useful, back at the command line use ‘esxcli network nic list’ to list the available network adapters. To see stats such as packets/bytes transmitted/received and dropped, enter ‘esxcli network nic stats get -n vmnic0’, changing the vmnic as appropriate.

 

You can also check VMware article

 

2. How to capture the results

  • esxtop -b -d 2 -n 100 > esxtopcapture.csv
    Here “-b” stands for batch mode, “-d 2” is a delay of 2 seconds between samples and “-n 100” is 100 iterations, so this specific capture covers roughly 200 seconds. If you want to record all metrics make sure to add “-a” to your string.
  • esxtop -b -a -d 2 -n 100 | gzip -9c > esxtopoutput.csv.gz
    You can also pipe the output straight through gzip to compress it as it is written.
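Before reaching for a dedicated tool, a captured file can be given a quick sanity check with a few lines of Python. The sketch below averages every column whose header contains a given string. It assumes the batch output is a perfmon-style CSV with one header row, a timestamp in the first column and counter paths such as `\\host\Group Cpu(1:vm-a)\% Ready` as column names; the sample data, host name and function name are invented for illustration, so check the header of your own capture first.

```python
import csv
import io

# Tiny made-up sample in the assumed layout: timestamp column, then counters.
SAMPLE = (
    '"(PDH-CSV 4.0) (UTC)(0)",'
    '"\\\\esx01\\Group Cpu(1:vm-a)\\% Ready",'
    '"\\\\esx01\\Group Cpu(1:vm-a)\\% Used"\n'
    '"01/01/2024 10:00:00","2.0","10.0"\n'
    '"01/01/2024 10:00:02","4.0","20.0"\n'
)


def average_columns(csv_text, needle):
    """Average every column whose header contains `needle`.

    Returns a dict of column header -> mean value over all complete rows.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if needle in name]
    sums = {i: 0.0 for i in wanted}
    rows = 0
    for row in reader:
        if len(row) < len(header):
            continue  # skip a truncated final row, e.g. from an interrupted capture
        for i in wanted:
            sums[i] += float(row[i])
        rows += 1
    if rows == 0:
        return {}
    return {header[i]: sums[i] / rows for i in wanted}


if __name__ == "__main__":
    for column, mean in average_columns(SAMPLE, "% Ready").items():
        print(column, mean)
```

For a real capture, read the file first, for example `average_columns(open("esxtopcapture.csv").read(), "% Ready")`. Captures taken with “-a” can be very large, so a streaming approach may be preferable there.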

3. How to Analyze the Capture

  • VisualEsxTop
  • perfmon
  • Excel
  • esxplot