Mem.MinFreePct and Memory Reclamation in vSphere 5

It feels good to be back..

Recently, Frank published a blog about the new sliding-scale-based estimation of the minimum free memory percentage in vSphere 5. It is an interesting read for anyone looking to estimate memory capacity for a vSphere-based virtual infrastructure. My good friend YP Chien from Kingston ran some tests to understand the memory reclamation techniques [ballooning, compression and host swapping] in vSphere 5. But he noticed that the host free-memory levels at which the various memory reclamation techniques kicked in were quite different from what they should have been based on the sliding-scale logic mentioned in Frank’s blog. YP immediately brought this to my attention (thanks, YP!). I dug into this a bit and found the issue. Instead of commenting on Frank’s blog, I thought of offering a deeper explanation here:

I will use the same example that was used in Frank’s blog. Consider a server configured with 96GB of RAM. The MinFreePct threshold will be set at 1597.36MB, based on the sliding scale shown in the following table:

Threshold    Range (MB)         Reserved Free Memory (MB)
6%           0 – 4095           245.76
4%           4096 – 12287       327.68
2%           12288 – 28671      327.68
1%           28672 and above    696.32 (in this case)
Total Free Memory               1597.36
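For anyone who wants to sanity-check the arithmetic, here is a minimal sketch of the sliding-scale calculation (my own illustration, not VMware code); the band boundaries and percentages are taken from the table above.

```python
def min_free_mb(host_memory_mb):
    """Estimate MinFree (in MB) using the vSphere 5 sliding scale."""
    # (upper bound of the band in MB, percentage applied within that band)
    bands = [(4096, 0.06), (12288, 0.04), (28672, 0.02)]
    reserved, lower = 0.0, 0
    for upper, pct in bands:
        if host_memory_mb <= lower:
            return reserved
        reserved += (min(host_memory_mb, upper) - lower) * pct
        lower = upper
    # 1% of whatever remains above the last band
    reserved += max(host_memory_mb - lower, 0) * 0.01
    return reserved

print(min_free_mb(96 * 1024))  # ~1597MB for the 96GB host in the example
```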

For the host considered in the above example, the various memory reclamation techniques kick in at different thresholds, as explained below:

Free Memory State    Threshold (% of MinFree)    Threshold (MB)         Reclamation Type
Soft to High         64 to 100                   1022.31 – 1597.36      Balloon
Low to Hard          16 to 64                    255.57 – 1022.31       Balloon, Compression and/or Swap
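A quick follow-on sketch, under the same assumptions as the previous one: the thresholds in the table are simply fixed fractions of MinFree.

```python
min_free = 1597.36  # MB, the MinFree value for the 96GB host above

soft_threshold = 0.64 * min_free  # ballooning active below this (~1022MB)
low_threshold = 0.16 * min_free   # compression/swapping also engaged above this (~256MB)
print(f"soft: {soft_threshold:.2f} MB, low: {low_threshold:.2f} MB")
```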

Please note:

  1. There is no separate reclamation target for memory compression; it uses the ‘Swap Target’ to reclaim memory.
  2. The choice between memory compression [when enabled] and host swapping is made dynamically. vSphere tries memory compression first, but if it cannot reclaim enough memory quickly, it resorts to host swapping.
  3. A decrease in memory pressure doesn’t mean that the respective reclamation targets are immediately set to zero. vSphere constantly monitors memory pressure in the host and gradually reduces a reclamation target once it finds that memory pressure has eased. Memory states, on the other hand, can change as soon as memory pressure in the host changes. Hence, you may see some memory reclamation (ballooning or swapping) continue for an extended time, until the respective reclamation targets reach zero, even after the memory states indicate no or reduced memory pressure.

Hope this helps you understand when a specific type of memory reclamation kicks in and why you may see it even when you don’t expect to. Feel free to throw in your comments or questions 🙂

Storage IO Control and Storage vMotion?

My colleague Duncan posted an article on yellow-bricks regarding Storage vMotion (sVMotion) of a virtual disk placed on a Storage I/O Control (SIOC) enabled datastore. I thought I would provide some more information on this topic.

Yes, sVMotion is treated as a regular stream of I/O requests coming from a particular VM to a vmdk that is placed on a SIOC-enabled datastore. If the datastore-wide I/O latency exceeds the congestion threshold of the datastore, SIOC kicks in and adjusts the device queue in each host according to the aggregate disk shares of all the VMs on that host that share the datastore. Within a particular host, the I/O requests of each VM are given priority based on the VM’s disk shares. The I/O requests can come from an application needing data or from ESX performing an sVMotion of the vmdk on non-VAAI-compatible storage.

How does SIOC treat sVMotion’s I/O traffic? When SIOC is active, a VM is allowed a certain number of concurrent I/O requests queued in the host for the SIOC-enabled datastore. If sVMotion is initiated on the VM while it is actively issuing I/O requests, the VM’s quota of concurrent I/O requests is shared by the sVMotion traffic and the VM’s other I/O traffic to the datastore. If the VM is only sparsely issuing I/O requests to the datastore, its quota of concurrent requests will be dominated by sVMotion traffic.

Note that in both cases, the total number of concurrent I/O requests (sVMotion + other I/O traffic) is limited to a value proportional to the VM’s disk shares on the datastore.
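To make the “proportional to disk shares” idea concrete, here is a small, purely illustrative sketch (not SIOC’s actual algorithm, and the queue depth and share values are made up): the host’s device queue slots are split among the VMs on the datastore in proportion to their shares, and a VM’s sVMotion plus guest I/O must fit within its slice.

```python
def per_vm_queue_slots(host_queue_depth, vm_shares):
    """Split a host's device queue slots in proportion to per-VM disk shares.

    vm_shares: dict mapping VM name -> disk shares on the datastore.
    """
    total_shares = sum(vm_shares.values())
    return {vm: host_queue_depth * shares / total_shares
            for vm, shares in vm_shares.items()}

# Hypothetical example: three VMs on a SIOC-enabled datastore, 32-slot device queue.
slots = per_vm_queue_slots(32, {"vm-a": 1000, "vm-b": 1000, "vm-c": 2000})
print(slots)  # vm-c gets ~16 slots; its sVMotion + guest I/O share that quota
```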

What happens when the storage is VAAI compatible? ESX issues the sVMotion command to the storage array, which performs the migration on behalf of ESX. ESX doesn’t even see the sVMotion traffic. In this case, the VM is free to use its full quota of concurrent I/O requests.

You will not be able to see the exact number of each I/O request type in the device queue. The good news is that you don’t have to worry about them; SIOC is capable of handling these varying traffic conditions for you. If you feel geeky and really want to dig into this, your best bet is to monitor the difference in sVMotion’s completion time under different load conditions in your VM. But know this – irrespective of the VM’s load, SIOC will not let sVMotion affect the I/O traffic on the datastore from any other VM. The response time of I/O operations in the VM on which sVMotion was initiated will, however, be affected by the sVMotion.

What if the datastore is not congested? SIOC lets sVMotion use its full quota of bandwidth until the datastore becomes congested (datastore-wide latency > congestion threshold). Then SIOC does what it is designed to do.

Here is a question for you – if you have to sVMotion a vmdk on a SIOC-enabled datastore, when do you do it? 😉

How cool is vscsiStats? Part-II

I enabled vscsiStats collection on my vSphere host before starting the purge2 operation in the vCenter database (check my white paper for more details). While the operation ran, I collected 20 samples of vscsiStats output at equal intervals (each interval was 7.5 seconds). The vscsiStats output consists of histograms of various metrics – outstanding IOs, seek distance, request length, arrival time – each split between reads and writes. To obtain the histogram of a given metric at a particular time instant, I divided the difference between the histogram values of the metric collected at successive sampling instants by the sampling interval.

Example:

Outstanding Read IOs (=1) at time t(x) = [Outstanding Read IOs (=1) until time t(x) − Outstanding Read IOs (=1) until time t(x−1)] / sampling interval

Outstanding Read IOs (=2) at time t(x) = [Outstanding Read IOs (=2) until time t(x) − Outstanding Read IOs (=2) until time t(x−1)] / sampling interval

⋮

Outstanding Read IOs (>64) at time t(x) = [Outstanding Read IOs (>64) until time t(x) − Outstanding Read IOs (>64) until time t(x−1)] / sampling interval

NOTE: If you are thinking that the above steps are very cumbersome, I agree with you. I have an Excel template that does it for me; all I need is the vscsiStats output of 20 consecutive samples saved in a single Excel file. Irfan, in his Virtual Scoop blog, has provided a few links to some neat blogs on visualizing vscsiStats. Check out his blog.
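If you would rather script the differencing than build a spreadsheet, the sketch below shows the idea; it assumes each sample is simply a list of cumulative per-bucket counts (an assumed layout, not vscsiStats’ actual output format).

```python
SAMPLING_INTERVAL = 7.5  # seconds between successive vscsiStats samples

def per_interval_rates(samples, interval=SAMPLING_INTERVAL):
    """Convert cumulative histogram samples into per-second rates per bucket."""
    rates = []
    for prev, curr in zip(samples, samples[1:]):
        rates.append([(c - p) / interval for p, c in zip(prev, curr)])
    return rates

# Hypothetical cumulative counts for three buckets across three samples.
samples = [[10, 0, 0], [130, 4, 0], [250, 8, 0]]
print(per_interval_rates(samples))  # per-second change for each bucket, per interval
```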

I followed the above steps for the following histograms in the vscsiStats output.

Outstanding IOs:

Figure 1. Outstanding IOs during purge2 operation.


The graphs in Figure 1 show the outstanding IOs during the purge2 operation. The number of outstanding read IOs was 64 (tidbit: the pvscsi driver installed in the guest operating system has a default queue length of 64; during the purge2 operation, the I/O queue in the pvscsi driver was full of read requests, hence the number of outstanding read IO requests coming from the VM was 64), whereas the number of outstanding write IOs was zero.

Request Type: The graphs in Figure 1 also show that the purge2 operation consisted of only read requests.

NOTE: Since the purge2 operation is completely dominated by reads, for the remaining vscsiStats histograms I considered only the respective read histograms.

Randomness: To identify the randomness of the purge2 operation, I looked at the ‘seek’ histogram in the vscsiStats output.

Figure 2. Seek distance between read requests


The seek distance histogram shows the distance between consecutive read requests in terms of logical blocks. A seek distance of 1 logical block between consecutive requests indicates a purely sequential workload, a seek distance of fewer than 10 logical blocks indicates a quasi-sequential workload, and a seek distance of 10 or more logical blocks indicates a random workload. In this case, the seek distance between successive read requests was more than 500,000 logical blocks, indicating a purely random read access pattern.
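As a rough illustration of how the rules of thumb above can be applied programmatically (the histogram layout and the 90% cut-offs here are my own simplification, not a vscsiStats feature):

```python
def classify_access_pattern(seek_histogram):
    """seek_histogram: dict mapping seek distance (logical blocks) -> request count."""
    total = sum(seek_histogram.values())
    sequential = sum(n for dist, n in seek_histogram.items() if abs(dist) <= 1)
    quasi = sum(n for dist, n in seek_histogram.items() if 1 < abs(dist) < 10)
    if sequential / total > 0.9:
        return "sequential"
    if (sequential + quasi) / total > 0.9:
        return "quasi-sequential"
    return "random"

# Hypothetical histogram resembling Figure 2: almost every seek is > 500,000 blocks.
print(classify_access_pattern({1: 20, 500000: 4980}))  # -> "random"
```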

Size of an I/O Read: The last parameter I needed was the size of an I/O read request during the purge2 operation. This was provided by the ‘ioLengthReads’ histogram.

Figure 3. Size of Read Requests during purge2 operation

The size of the read requests seen during the purge2 operation varied from 16KB to 64KB, with some requests as large as 128KB. The variation in I/O size indicates some kind of optimization employed during reads to fetch as much data as possible in one read operation.

Arrival Time for Reads: Another interesting histogram provided by vscsiStats (not required to create an IOmeter workload profile, but interesting nonetheless) is the arrival time of the I/O requests (in this case, reads).

Figure 4. Arrival Time for Reads during purge2 operation

An arrival time of ≤100 microseconds indicates that the purge2 operation was very I/O intensive (also evidenced by the 64 outstanding read requests throughout the operation).

With the information I collected from vscsiStats, I created a workload in IOmeter with the following parameters:

  • Outstanding IOs: 64
  • Access Type: 100%Read, 100%Random
  • Request size: 48KB (the median of 16KB, 32KB, 48KB, 64KB and 128KB – see the sketch below)
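For completeness, here is how the 48KB figure above falls out of the ioLengthReads histogram; this is a sketch of the reasoning, not necessarily how the white paper derived it.

```python
from statistics import median

# I/O-size buckets (in KB) that saw read traffic during purge2 (see Figure 3).
observed_sizes_kb = [16, 32, 48, 64, 128]
print(median(observed_sizes_kb))  # -> 48, the request size used for IOmeter
```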

The rest you will know when you read the white paper 😉

The next time you get into troubleshooting I/O problems or planning storage resources for your vSphere environment, remember that you have the secret sauce at your fingertips. Surprise your storage admins by speaking a language they understand – outstanding IOs, request size, access pattern and more.

Isn’t vscsiStats cool?

How cool is vscsiStats? Part-I

It has been a few days since I published a white paper on the performance characterization of a SQL Server-based vCenter database. A few people have asked me questions about the vscsiStats graphs in the appendix of the paper. Instead of answering the questions individually, I decided to blog here for the benefit of all readers.

As mentioned in the paper, I observed this rather unusual behavior (yes, I say unusual because I didn’t expect the performance of the virtual I/O stack to be better than that of native) during some of the experiments. Using the vCenter application to reproduce this behavior was rather complex and involved too many variables. Hence, I decided to use a simple I/O benchmark most of you are familiar with – IOmeter (http://www.iometer.org/). But to reproduce the issue, I needed to use the exact same I/O load as that produced by the stored procedures of the vCenter database. To create a custom workload profile in IOmeter, I needed to configure (at least) the outstanding IOs, the I/O request size, the read percentage and the percentage of randomness. The question was – how do I get these parameters from the workload?

vscsiStats provided the answer. Scott Drummonds (during his VMware days) wrote a great blog on vscsiStats. I highly encourage readers to go through that article and understand the basics of vscsiStats (if you are not already familiar with the tool). This will help you appreciate the content of this multi-part blog series. Instead of dwelling on the details of vscsiStats, I will illustrate its usefulness here.

First, a quick description of a sample histogram output from vscsiStats: all the histograms have a similar format and should be straightforward to understand.

If you are curious about this tool and want to learn more, check out these technical publications:

  1. “Storage Workload Characterization and Consolidation in Virtualized Environments” – Ajay Gulati, Chethan Kumar, and Irfan Ahmad, presented at VPACT ’09 (yes, I was one of the authors)
  2. “vscsiStats: Fast and Easy Disk Workload Characterization on VMware ESX Server” – a presentation by Irfan Ahmad at VMworld 2007 (an excellent presentation by one of the creators of this tool)

Up next: The histograms I collected …

Running Virtual Center Database in a Virtual Machine

I just completed an interesting project. For years, we at VMware have believed that SQL Server databases run well when virtualized, and we have illustrated this through several benchmark studies published as white papers. It was time for us to look at real applications. One such application, found in most vSphere-based virtual environments, is the database component of vCenter Server (the brain behind a vSphere environment). Using the vCenter database as the application, and the resource-intensive tasks of the vCenter database (implemented as stored procedures in SQL Server-based databases) as the load generator, I compared the performance of these resource-intensive tasks in a virtual machine (on a vSphere 4.1 host) to that on a native server.

From a sheer size perspective, the database doesn’t catch many eyes. But from a criticality standpoint, it can cause sleepless nights. The database, though only ~93GB in size, included an inventory that can represent a very large vSphere-based virtual datacenter. With 500 hosts and 8000 virtual machines, the performance of this vCenter database becomes extremely important. The tasks whose performance I studied are among the most common operations that can affect the performance of a vCenter database. These operations included:

  1. Rolling up performance statistics
  2. Calculating top resource consumers
  3. Purging old performance statistics

I won’t go into the details of these operations here; refer to the white paper for an explanation. Though I expected the performance of the virtualized database to be very close to that of the native database, I didn’t expect it to be this close. Both in terms of CPU utilization and I/O performance, the virtual machine was very close to native (in some cases better than native!). I also observed an unusual behavior of the native I/O stack during my experiments (I will blog about it next time, but for now check the appendix of the paper).

Here, this is for you, VI admins – if you are thinking of virtualizing the vCenter database (why not? you can virtualize anything and everything these days ;-)), here is a study that should give you the confidence to take the next step. If you have any comments, interesting facts, or tough performance issues to share with me or the rest of the community, feel free to drop a comment or two here.

Oh, and BTW, here is the link to the white paper.

Previous studies:

  1. http://www.vmware.com/files/pdf/perf_vsphere_sql_scalability.pdf
  2. http://www.vmware.com/pdf/SQL_Server_consolidation.pdf
  3. http://blogs.vmware.com/performance/2009/07/summary——–vmware-distributed-resource-scheduler-drs-dynamically–allocates-and-balances-computing-resources-in-a-clust.html
  4. http://communities.vmware.com/blogs/chethank/tags/performance