Tuesday, April 21, 2009

Celerra NFS and VMware testing

We just received an EMC Celerra NS-G8, and it is my job to implement NFS serving VMware. Beyond the standard "get away from VMFS", there were two features that piqued my interest: thin provisioning and deduplication.

Thin provisioning was a big letdown. If you are using NFS, you have some degree of thin provisioning by default. Additionally, almost any VMware function that touches the disks (Storage VMotion, cloning, deploy from template) will bloat the vmdk to full size. I did find a way to thin out the VMs (I called it a treadmill process), but it's not seamless and requires hours of downtime. I still have this feature enabled, but don't expect great things from it.

Deduplication was a bit of a surprise to me, since I didn't know this feature was available until I got the system. My previous experience with deduplication was with EMC Avamar, which is block-level deduplication that allows for over 90% deduplication rates. Celerra deduplication, however, is file-level, meaning space is only freed up for fully duplicate files. I have worked with Exchange and Windows Single-Instance-Storage before, so I know this is great for file servers where the same file may exist dozens or hundreds of times, but no two VMDKs are ever going to be alike.

Celerra deduplication, however, also does compression, which may be very useful if it can compress the zero blocks in a VMDK. To test this, I created a "fat" vmdk and copied it to an NFS datastore, then initiated the dedupe process and compared the sizes before and after.
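To illustrate why zero blocks looked promising, here is a minimal sketch that compresses a zero-filled block and a random block and compares the ratios. It uses Python's zlib purely as a stand-in; the algorithm the Celerra actually uses is not stated here, and the 1 MiB sample size is an arbitrary choice.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


BLOCK = 1024 * 1024  # 1 MiB sample, an arbitrary size chosen for illustration

# All-zero data, like the contents of a freshly created eagerzeroedthick disk
zero_ratio = compression_ratio(bytes(BLOCK))

# Random data is effectively incompressible, for comparison
random_ratio = compression_ratio(os.urandom(BLOCK))

print(f"zero-filled block ratio: {zero_ratio:.0f}")
print(f"random block ratio:      {random_ratio:.2f}")
```

With a general-purpose compressor, the zero-filled block shrinks by orders of magnitude while the random block does not shrink at all, which is the kind of gain I was hoping to see on a blank VMDK.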


Step 1: Create the bloated VMDK
The first thing needed is a bloated/fat/inflated disk to test against.
  1. SSH into the VMware host
  2. CD to /vmfs/volumes/
  3. Create the disk: vmkfstools -c 50G -d eagerzeroedthick foo.vmdk

The size of the disk can be confirmed by executing ls -l and by viewing it in the Datastore Browser. Make sure both locations list it as a full 50G in size (to ensure that thin provisioning isn't affecting us).
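The size reported by ls -l is the apparent size, which can be larger than what the file really occupies on a thin-provisioned file system. As a rough cross-check, the sketch below compares the two using stat(). It assumes a Unix-like client that has the NFS export mounted and that reports st_blocks in 512-byte units; the function names are my own and are not part of any VMware or EMC tool.

```python
import os


def disk_usage(path):
    """Return (apparent_bytes, allocated_bytes) for a file.

    apparent_bytes is the size `ls -l` shows; allocated_bytes is the space
    the file occupies according to stat(), counted in 512-byte blocks.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def is_thin(path):
    """True when the file occupies less space than its apparent size."""
    apparent, allocated = disk_usage(path)
    return allocated < apparent
```

Calling is_thin() on the disk's data file should return False for a genuinely fat disk. If it returns True, thin provisioning is still in play and the test results will be skewed.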


Step 2: Change the dedupe parameters
By default, deduplication is limited to files that meet the following requirements:
  • Haven't been accessed in 30 days
  • Haven't been modified in 60 days
  • Are larger than 24KB
  • Are smaller than 200MB

To test the dedupe process, we need to change these using the server_param command. To see the current settings, SSH into the Celerra and run server_param server_2 -facility dedupe -list. This will list all the deduplication settings. The settings can then be changed by running server_param server_2 -facility dedupe -modify <attribute> -value <value>. In my case I needed to reset the access and modified times to 0, and the maximum size to 1000.
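As a sanity check on these settings, here is a minimal sketch of the eligibility rules listed above. The class and field names are hypothetical and are not the real server_param attribute names. It also assumes the maximum size value is expressed in MB, which the 200MB default suggests but which should be confirmed against the -list output.

```python
from dataclasses import dataclass


@dataclass
class DedupePolicy:
    # Hypothetical field names; defaults mirror the limits listed above.
    min_days_since_access: int = 30
    min_days_since_modify: int = 60
    min_size_kb: int = 24
    max_size_mb: int = 200  # assumption: the value is in MB


def is_eligible(policy, size_bytes, days_since_access, days_since_modify):
    """Apply the four limits to a single file."""
    size_kb = size_bytes / 1024
    size_mb = size_kb / 1024
    return (
        days_since_access >= policy.min_days_since_access
        and days_since_modify >= policy.min_days_since_modify
        and size_kb > policy.min_size_kb
        and size_mb < policy.max_size_mb
    )


vmdk_bytes = 50 * 1024**3  # the 50G test disk from Step 1

default_policy = DedupePolicy()
test_policy = DedupePolicy(0, 0, 24, 1000)  # the values used in this test

print(vmdk_bytes // 1024**2)                        # size of the disk in MB
print(is_eligible(default_policy, vmdk_bytes, 0, 0))
print(is_eligible(test_policy, vmdk_bytes, 0, 0))
```

Under that MB assumption the 50G disk works out to 51200 MB, so a maximum size of 1000 would still leave it outside the limit. That is worth keeping in mind when reading the results.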


Step 3: Initiate the dedupe process
Every time a file system is configured for deduplication, the dedupe process is triggered - meaning we can start a dedupe job manually by telling the file system to enable deduplication (if that makes sense). There are two ways we can do this: via the web console, or via the command line.

To kick off a dedupe job via the web console, browse to the File Systems node and open the properties for the target file system. In the File System Properties page, set Deduplication = Suspended and click Apply. Then set Deduplication = On and click Apply. As soon as dedupe is set to On, a dedupe job will be initiated.

To kick off a dedupe job via the command line, SSH into the Celerra and run fs_dedupe -modify <fs_name> -state on, where <fs_name> is the target file system. This will automatically start a deduplication job. To view the status of the job, run fs_dedupe -info <fs_name>.

Step 4: Compare the results
Initiating a dedupe on a file system with only VMDKs ultimately results in 0 gain. Even with the disks being completely blank, compression doesn't seem to come into play - meaning a big waste of time in testing it.


Additional testing of dedupe with other files (ISOs, install files, home folders, etc.) shows that dedupe works properly on the file level, but not for VMDKs.

2 comments:

Anonymous said...

Try setting the maximum size to 51200 or larger to process a 50GB file.

Anonymous said...

The problem with Celerra de-duplication/compression is that even if your VMDK was optimized by it, as soon as you start the VM and cause any change in the VMDK file, the whole VMDK has to be decompressed to its full, un-optimized form before you can do any I/O to the file. The Celerra white paper states 8MB/sec decompression, which means forever at typical VMDK sizes. http://www.emc.com/collateral/hardware/white-papers/h6265-achieving-storage-efficiency-celerra-wp.pdf

Celerra de-duplication/compression is for completely inactive data. Changing Celerra-optimized data severely impacts file access performance.

Did you test Storwize? It looks like they can provide what you were looking for in this test.
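
For scale, the 8MB/sec decompression figure quoted in the comment above can be turned into a rough wait time. This is plain arithmetic on the numbers already mentioned in this post (a 50GB disk, 8MB/sec), assuming 1GB = 1024MB and a constant rate; the function name is just for illustration.

```python
def decompress_seconds(size_gb, rate_mb_per_s=8):
    """Seconds needed to decompress size_gb at a constant rate in MB/sec."""
    return size_gb * 1024 / rate_mb_per_s


secs = decompress_seconds(50)
print(f"{secs:.0f} s, about {secs / 3600:.1f} hours")  # 6400 s, about 1.8 hours
```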