Wednesday, April 22, 2009

VMware/NFS performance with the Celerra NS-G8

Further testing with the Celerra raised the question of performance: can things be tweaked to improve it, and how does IP storage compare with FC? The scenario I am looking at is how VMware uses an NFS export as a datastore (something that isn't officially supported, yet a wizard exists to configure it).

Our Celerra will eventually be configured with a 10G Ethernet link into a Xsigo I/O Director. If you're not familiar with Xsigo, check them out, as there is some impressive technology there. We are waiting on parts to show up, so the current perf testing will use two 1G Ethernet links in an EtherChannel configuration.

The test plan is:
  1. Create VMDKs using FC, Celerra with FC disks, and Celerra with SATA disks
  2. Use IOMETER to test the various "baseline" configurations
  3. Tweak the VMware/Celerra configuration according to EMC best practices and benchmark again
  4. Once the 10G modules arrive, repeat the tests

Storage Configuration:
Our Clariion is a CX4-240 with 1 tray of 1TB SATA disks and 6 trays of 300GB FC disks. The SATA disks are configured into (2) 6+1 RAID5 arrays, with (4) 1.6TB LUNs each. The FC disks are configured into (3) 8+1 RAID5 arrays, with (2) 1.2TB LUNs each. The remaining FC disks are unused or configured for the Celerra root file system. A temporary FC MetaLUN was also created with the same configuration as the existing FC exports for baseline testing.

The SATA LUNs are combined on the Celerra to create one storage pool, and the FC LUNs are combined on the Celerra to create a second storage pool. These pools are then exported via NFS to VMware over the 2Gb EtherChannel link.

Once the baseline performance was captured using the default profiles in IOMETER, a white paper from EMC called "VMware ESX Server Optimization with EMC Celerra Performance Study Technical Note" (you may need a Powerlink account to access this) was used to tweak the NFS performance settings. According to their performance tests:
Based on the results of this study, the following recommendations should be considered when using VMware in a Celerra environment:
  - Recommendation #1: Use the uncached option on Celerra file systems provisioned to the ESX Server as NAS datastore. This can improve the overall performance of the virtual machines. This should be considered particularly with random and mixed workloads that have write I/O.
  - Recommendation #2: When using random and mixed workloads without replication, consider disabling prefetch on the Celerra file system. This can improve the overall performance of virtual machines provisioned with Celerra NFS storage.
  - Recommendation #3: Align virtual machines that are provisioned with Celerra iSCSI storage as it can improve the overall performance of the virtual machines.
  - Recommendation #4: Do not align virtual machines that are provisioned with Celerra NFS storage because this may degrade the overall performance of the virtual machines.
  - Recommendation #5: Consider using an iSCSI HBA on the ESX Server with sequential workloads with Celerra iSCSI storage. This can improve the overall performance of virtual machines with such workloads.

This added two more scenarios to the testing mix: enabling the uncached write option, and disabling read prefetch. Configuring the uncached write option is done with the following command:
server_mount ALL -option rw,uncached
Disabling read prefetch is done with the following command:
server_mount ALL -option rw,noprefetch
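
To keep track of which runs are needed, the scenarios above can be enumerated in a few lines of Python. This is only a sketch: the backend labels and the idea of measuring direct FC once as a reference are my shorthand for the test plan, and the option strings are simply the ones from the two commands above.

```python
from itertools import product

# Labels are illustrative shorthand for the storage paths in the test plan.
NFS_BACKENDS = ["Celerra NFS (FC disks)", "Celerra NFS (SATA disks)"]

# Mount option strings taken from the server_mount commands above;
# "rw" alone stands in for the untouched baseline.
MOUNT_OPTIONS = {
    "baseline": "rw",
    "uncached": "rw,uncached",
    "noprefetch": "rw,noprefetch",
}


def build_test_matrix():
    """Return (backend, scenario, mount_options) tuples for every planned run."""
    # Direct FC has no Celerra mount, so it is measured once as a reference.
    runs = [("FC (direct)", "baseline", None)]
    for backend, (scenario, options) in product(NFS_BACKENDS, MOUNT_OPTIONS.items()):
        runs.append((backend, scenario, options))
    return runs


if __name__ == "__main__":
    for backend, scenario, options in build_test_matrix():
        print(f"{backend:26} {scenario:11} {options or '-'}")
```

That gives seven runs per pass, and the whole set would be repeated once the 10G modules are in place.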



[Chart: MB/second results]
[Chart: IOPS results]
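
When reading the two result sets side by side, it helps to remember that they are tied together by the transfer size of each access specification: throughput is roughly IOPS multiplied by block size. A small sketch of that conversion follows; the function names are mine, and it assumes binary megabytes (1 MB = 1024 KiB), which may not match how your IOMETER version reports MB/s.

```python
def throughput_mb_per_s(iops: float, block_size_kib: float) -> float:
    """Convert an IOPS figure to MB/s, assuming 1 MB = 1024 KiB."""
    return iops * block_size_kib / 1024.0


def iops_from_throughput(mb_per_s: float, block_size_kib: float) -> float:
    """Inverse conversion: MB/s back to IOPS for a given block size."""
    return mb_per_s * 1024.0 / block_size_kib


if __name__ == "__main__":
    # Hypothetical numbers, not results from these tests.
    print(throughput_mb_per_s(5000, 4))    # 5000 IOPS at 4 KiB per I/O
    print(iops_from_throughput(100, 32))   # 100 MB/s at 32 KiB per I/O
```

So a small-block profile can show high IOPS with modest MB/s, while a large-block sequential profile will hit the link's bandwidth ceiling at a much lower IOPS figure.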

Conclusion:
While 2Gb Ethernet does not match the throughput and performance of 4Gb FC, it is close. I fully expect that if this testing were done with 4Gb Ethernet, the Ethernet/FC differences would be minimal, and that once our 10G modules are installed, Ethernet will outperform FC.
Additionally, at 2Gb Ethernet speed, there is little difference between SATA disks and FC disks. This suggests that the bottleneck is the transport mechanism, and that more differences will be identified once the bandwidth is increased.
Lastly, there is little visible difference between the baseline configuration and the EMC best practice configuration. Some of this may be due to the workload profiles being mostly sequential instead of random, but it does suggest that the out-of-the-box configuration is fairly optimized.
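
For a rough sense of the transport ceilings behind these conclusions, here is a back-of-the-envelope sketch. The Ethernet figures are raw line rate with no allowance for Ethernet, IP, TCP, or NFS overhead, and the 400 MB/s figure for 4Gb FC is the commonly quoted nominal per-direction number rather than anything measured here.

```python
def ethernet_ceiling_mb_s(gbits_per_s: float) -> float:
    """Raw line rate in decimal MB/s, ignoring all protocol overhead."""
    return gbits_per_s * 1000 / 8


# Theoretical upper bounds only; real-world results will be lower.
LINKS = {
    "2 x 1GbE EtherChannel": ethernet_ceiling_mb_s(2),
    "4Gb FC (nominal)": 400.0,  # commonly quoted figure, assumed here
    "10GbE": ethernet_ceiling_mb_s(10),
}

if __name__ == "__main__":
    for name, ceiling in LINKS.items():
        print(f"{name:24} ~{ceiling:.0f} MB/s")
```

These are upper bounds. The aggregate EtherChannel figure also assumes traffic is actually spread across both links, which depends on the hashing policy in use.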
