One of the common discussions I have when work with a customer on their Operations Manager (SCOM) environment is how the data aggregation works and how that impacts on the database size. There are plenty of blogs around how to review the data retention periods, my favourite is:
http://blogs.technet.com/b/kevinholman/archive/2010/01/05/understanding-and-modifying-data-warehouse-retention-and-grooming.aspx
So now that you have read that blog and know that the defaults are a bad idea. How do we actually determine the data size that each aggregate dataset takes up?
The typical answer is to use the dwdatarp.exe tool (available from http://blogs.technet.com/b/momteam/archive/2008/05/14/data-warehouse-data-retention-policy-dwdatarp-exe.aspx) however some organizations may have change and security restrictions which prevent the use of such tools. Alternatively people may just be more comfortable with seeing exactly what data is being accessed..... That is the scenario I was faced with this week so I have come up with the below SQL query which will give you the output of each aggregate dataset, the total size, and the percentage of the total database size.
--Which DataSet is usin the most database space (alternative to using dwdatarp.exe USE OperationsManagerDW; WITH TableSizes AS ( SELECT so.name, Coalesce(8 * Sum(CASE WHEN si.indid IN (0, 1) THEN si.reserved END),0) AS data_kb, Coalesce(8 * Sum(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) AS index_kb, Coalesce(8 * Sum(CASE WHEN si.indid IN (255) THEN si.reserved END), 0) AS blob_kb FROM dbo.sysobjects AS so JOIN dbo.sysindexes AS si ON (si.id = so.id) WHERE 'U' = so.type GROUP BY so.name ) SELECT (CASE WHEN CHARINDEX('_',so.name) > 0 THEN LEFT(so.name,CHARINDEX('_',so.name)-1) ELSE so.name END) as dataset, Coalesce(8 * Sum(CASE WHEN si.indid IN (0, 1) THEN si.reserved END),0) + Coalesce(8 * Sum(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) + Coalesce(8 * Sum(CASE WHEN si.indid IN (255) THEN si.reserved END), 0) AS totalsize_kb, CAST (1. * (Coalesce(8 * Sum(CASE WHEN si.indid IN (0, 1) THEN si.reserved END),0) + Coalesce(8 * Sum(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) + Coalesce(8 * Sum(CASE WHEN si.indid IN (255) THEN si.reserved END), 0)) / (SELECT SUM(data_kb + index_kb + blob_kb) FROM TableSizes) * 100 AS DECIMAL(5,2) ) AS PercentOfTotal FROM dbo.sysobjects AS so JOIN dbo.sysindexes AS si ON (si.id = so.id) WHERE 'U' = so.type AND (so.name like '%daily%' OR so.name like '%hourly%' OR so.name like '%raw%') GROUP BY (CASE WHEN CHARINDEX('_',so.name) > 0 THEN LEFT(so.name,CHARINDEX('_',so.name)-1) ELSE so.name END) ORDER BY totalsize_kb DESC
In my lab environment, which is not doing anything, I get the following output. Typically though you should see PerfHourly as the highest user when the defaults are in effect.
With this information it becomes much easier to qualify the value in adjusting the aggregate table grooming setting in the blog above.
Legal Stuff: As always the contents of this blog is provided “as-is”. The information, opinions and views expressed are those of the author and do not necessarily state or reflect those of any other company with affiliation to the products discussed. This includes any URLs or Tools. The author does not accept any responsibility from the use of the information or tools mentioned within this blog, and recommends adequate evaluation against your own requirements to measure suitability.