Wednesday, December 02, 2009

Top 10 VMware Consumers Report

If you have even a small VMware environment, you will begin to worry about resource consumption in your VM clusters. Specifically, I was concerned about 1 or more systems behaving poorly and stealing all the CPU, memory, disk, and network resources. So the idea of a top 10 report that uses the VCenter performance statistics came to mind.

Step 1: Find the Cluster ID

This assumes you have more than one cluster in your environment, so you will need to select from them. A simple SQL query will give you the name and ID of the various clusters.

SELECT ID, Name
FROM
vpxv_entity
WHERE Type_ID=3
ORDER
BY Name

Step 2: Find the top 10 consumers

Now that you know which cluster you want, you can put the ID in a variable named @ClusterID. The below report looks at the average CPU utilization for the last 7 days for all systems in your cluster and returns the top 10 consumers. This query performs a lot of work, and if your database isn't working optimally it will take a long time to complete, so be careful when you first run it.

SELECT TOP 10 v.VMID
FROM vpxv_VMs v (NOLOCK)
INNER
JOIN vpxv_entity e (NOLOCK) ON v.HostID=e.ID
INNER JOIN vpxv_entity_moid m (NOLOCK) ON m.EntityID=v.VMID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY=m.MOID
WHERE e.type_id=1 AND e.Parent_ID=@ClusterID
AND
stat_name='usagemhz'
AND
STAT_ROLLUP_TYPE='average'
AND
sample_time > getdate()-7
GROUP
BY v.VMID, v.Name
ORDER BY sum(sd.stat_value) DESC

Step 3: Chart the top 10 consumers

Now that we know which cluster, and which systems are the top consumers, we can graph it. The below query embeds the query in Step 2 to limit its results, and then returns the name of the VM, sample time, and sample value for the last 7 days. Using SQL Server Reporting Services (SSRS), you can pipe this into a pretty graph and email it out automatically every week.

SELECT e.Name 'Host'
,
sd.stat_name, sd.sample_time, sd.stat_value
FROM vpxv_entity e (NOLOCK)
INNER
JOIN vpxv_entity_moid m (nolock) ON m.EntityID=e.ID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY=m.MOID
where e.type_id=0
AND
e.ID in
(SELECT top 10 v.
VMID
FROM vpxv_VMs v
INNER JOIN vpxv_entity e ON v.HostID=e.ID
INNER JOIN vpxv_entity_moid m (nolock) ON m.EntityID=v.VMID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY=m.MOID
WHERE e.type_id=1 AND e.Parent_ID=@ClusterID
AND
stat_name='usagemhz'
AND
STAT_ROLLUP_TYPE='average'
AND
sample_time > getdate()-7
GROUP
BY v.VMID, v.Name
ORDER BY sum(sd.stat_value) DESC)
AND
stat_name='usagemhz'
AND
STAT_ROLLUP_TYPE='average'
AND
sample_time > getdate()-7
ORDER
BY sample_time


Once the CPU graph is done, you can do the same for memory, disk, network, and many other data points. The monitors available to report on differ based on the monitoring level you have configured in VCenter, so if you don't see something you want, look if you can add it.

Below is a sample graph that came from one of my clusters. As you can see, there are 2 VMs consuming a majority of CPU resources, and for the most part the consumption appears flat across the entire week. This suggests that there is either a long-running job, or something is wrong with the VMs.



No comments: