DBA Repository – Database Size Statistics UDF

In a previous article, I demonstrated how to periodically poll all the SQL Server databases in an enterprise to record their size.  Once the size data is being gathered I need to know how things are changing over time.  The DBA Repository may have databases added or removed periodically so it is important to determine meaningful rates of change.  In my case I will determine an annualized rate of change for any given database.

I plan to create a Table-Valued User Defined Function to allow the caller to pass in a start and end date allowing the caller increase or shrink the sample dataset of polled DB size data to balance accuracy of reporting and performance of a query.  The wider the data range, the more data will need to be calculated – the result may be more accurate because size fluctuations will average out, but the result will be slower to return.  I’ll discuss the parts of the function individually, the whole function will be listed at the end of the article.

Some Housekeeping

The function’s arguments, return values and internal variables are described here:

Arguments and Return Values

The function will accept @StartDate, @EndDate, @ServerName, and @DatabaseName arguments.  @DatabaseName and @ServerName are optional allowing the caller to return data for a specific database or the data rolled up for all databases on a server instance.  It will store its calculations in a table variable called @DatabaseSizeStatistics which will be returned to the caller.

Internal Variables

There will be two table variables used internally in the function to determine what sample data to use based on what the caller passed in the arguments

@Boundaries holds the first and last polled dates within the @StartDate and @EndDate passed.  This is important because if there are fewer days in the DB size data than days in the date range passed the annualized rate of change need to use the @Boundaries data to create a more accurate result, here’s @Boundaries:

DECLARE @Boundaries TABLE (
      ServerName nvarchar(128)
    , DatabaseName nvarchar(128)
    , MinSampleDate datetime
    , MaxSampleDate datetime
    )

@SampledData holds all data points needed to for rate of change calculations from the dbo.DatabaseSize table based on the data we found for @Boundaries, here’s @SampledData:

DECLARE @SampledData TABLE (
      ServerName nvarchar(128)
    , DatabaseName nvarchar(128)
    , BeginSize decimal(15, 4)
    , EndSize decimal(15, 4)
    , SizeDelta decimal(15, 4)
    , DayDelta int
    , Periods smallint DEFAULT 365
    )

Let’s start calculating…

Gather Polled Dates

First we’ll load @Boundaries table with the appropriate MIN and MAX polled dates, they’ll either match the passed @StartDate and @EndDate or they’ll be some date range in-between based on the data we’ve collected:

DECLARE @Boundaries TABLE (
      ServerName nvarchar(128)
    , DatabaseName nvarchar(128)
    , MinSampleDate datetime
    , MaxSampleDate datetime
    )
INSERT  INTO @Boundaries
        SELECT  ServerName
              , DatabaseName
              , MinSampleDate = MIN(DayPolled)
              , MaxSampleDate = MAX(DayPolled)
        FROM    [database].DatabaseSize AS ds
        WHERE   1 = 1
                AND ds.DayPolled BETWEEN @StartDate AND @EndDate
                AND (ds.ServerName = @ServerName
                     OR @ServerName IS NULL)
                AND (ds.DatabaseName = @DatabaseName
                     OR @DatabaseName IS NULL)
        GROUP BY ServerName
              , DatabaseName

Find Size Deltas

The next step is to find the begin and end size of the DB, the amount it changed and the number of days over which it changed.  Remember, that number of days change is an important factor in accuracy.  If the caller passed in 180 days between the @StartDate and @EndDate but the data only contains data for the last 30 days, we calculate based on the data we have, 30, rather than the 180 requested.  This query loads @SampledData with those figures:

INSERT  INTO @SampledData (
          ServerName
        , DatabaseName
        , BeginSize
        , EndSize
        , SizeDelta
        , DayDelta
        )
        SELECT  ds.ServerName
              , ds.DatabaseName
              , BeginSize = SUM(CASE WHEN ds.DayPolled = b.MinSampleDate THEN ds.[DBSize(MB)]
                                     ELSE 0
                                END)
              , EndSize = SUM(CASE WHEN ds.DayPolled = b.MaxSampleDate THEN ds.[DBSize(MB)]
                                   ELSE 0
                              END)
              , SizeDelta = SUM(CASE WHEN ds.DayPolled = b.MaxSampleDate THEN ds.[DBSize(MB)]
                                     ELSE 0
                                END) - SUM(CASE WHEN ds.DayPolled = b.MinSampleDate THEN ds.[DBSize(MB)]
                                                ELSE 0
                                           END)
              , DayDelta = CASE WHEN DATEDIFF(dd, b.MinSampleDate, b.MaxSampleDate) < 1 THEN 1
                                ELSE DATEDIFF(dd, b.MinSampleDate, b.MaxSampleDate)
                           END
        FROM    [database].DatabaseSize AS ds
                INNER JOIN @Boundaries AS b ON ds.ServerName = b.ServerName
                                               AND ds.DatabaseName = b.DatabaseName
        WHERE   1 = 1
                AND ds.DayPolled BETWEEN @StartDate AND @EndDate
                AND (ds.ServerName = @ServerName
                     OR @ServerName IS NULL)
                AND (ds.DatabaseName = @DatabaseName
                     OR @DatabaseName IS NULL)
        GROUP BY ds.ServerName
              , ds.DatabaseName
              , b.MinSampleDate
              , b.MaxSampleDate

Calculate the Rates of Change

All the hard work is already complete, now we just do some math to find the Daily and Annualized rates of change to return to the caller.  Let’s load @DatabaseSizeStatistics with the results:

INSERT  INTO @DatabaseSizeStatistics
        SELECT  sd.ServerName
              , sd.DatabaseName
              , BeginningDatabaseSize = sd.BeginSize
              , CurrentDatabaseSize = sd.EndSize
              , sd.DayDelta
              , DailyRateOfChange = (sd.SizeDelta / sd.DayDelta) / sd.BeginSize
              , AnnualizedRateOfChange = ((sd.SizeDelta / sd.DayDelta) / sd.BeginSize) * sd.Periods
        FROM    @SampledData AS sd

Summary

We can now determine rates of change base on the dbo.DatabaseSize table we populate periodically via SSIS.  Since we’ve got detail data at the database level we can view the information by database or rolled up by server.  The UDF ensures the calculation uses the most available sample data and if there’s fewer days available for the sample than requested, the rates of change are adjusted appropriately.  Here’s the full UDF:

CREATE FUNCTION [database].[GrowthStatistics] (
      @StartDate datetime
    , @EndDate datetime
    , @ServerName nvarchar(128) = NULL
    , @DatabaseName nvarchar(128) = NULL
    )
RETURNS @DatabaseSizeStatistics TABLE (
      ServerName nvarchar(128)
    , DatabaseName nvarchar(128)
    , BeginningDatabaseSize decimal(15, 4)
    , CurrentDatabaseSize decimal(15, 4)
    , DayDelta decimal(15, 4)
    , DailyRateOfChange decimal(15, 4)
    , AnnualizedRateOfChange decimal(15, 4)
    )
AS
/*
Object:   [database].DatabaseGrowthStatistics
Description:  Finds the daily and annualized rate of growth for all databases between the
    provided date range
 
Usage:    @StartDate = Begin data for pulling sample data
    @EndDate = End date for pulling sample data
 
Returns:   Recordset
$Workfile: database.GrowthStatistics.sql $
$Author: David Masciangelo &
$Revision: 1.0 $
Created: 10/27/2009 10:25
$Modtime: 10/27/2009 10:25 $
Modification History:
 
*/
--UnREM for testing
--DECLARE @StartDate datetime = '20091001'
--  , @EndDate datetime = '20091231'
--  , @ServerName nvarchar(128) = 'CQDSC06'
--  , @DatabaseName nvarchar(128) = NULL
 
BEGIN
 
    DECLARE @Boundaries TABLE (
          ServerName nvarchar(128)
        , DatabaseName nvarchar(128)
        , MinSampleDate datetime
        , MaxSampleDate datetime
        )
    DECLARE @SampledData TABLE (
          ServerName nvarchar(128)
        , DatabaseName nvarchar(128)
        , BeginSize decimal(15, 4)
        , EndSize decimal(15, 4)
        , SizeDelta decimal(15, 4)
        , DayDelta int
        , Periods smallint DEFAULT 365
        )
 
-- Gather the polled data min and max within the passed start and end dates
    INSERT  INTO @Boundaries
            SELECT  ServerName
                  , DatabaseName
                  , MinSampleDate = MIN(DayPolled)
                  , MaxSampleDate = MAX(DayPolled)
            FROM    [database].DatabaseSize AS ds
            WHERE   1 = 1
                    AND ds.DayPolled BETWEEN @StartDate AND @EndDate
                    AND (ds.ServerName = @ServerName
                         OR @ServerName IS NULL)
                    AND (ds.DatabaseName = @DatabaseName
                         OR @DatabaseName IS NULL)
            GROUP BY ServerName
                  , DatabaseName
-- find the deltas of the polled data withing the start and end
    INSERT  INTO @SampledData (
              ServerName
            , DatabaseName
            , BeginSize
            , EndSize
            , SizeDelta
            , DayDelta
            )
            SELECT  ds.ServerName
                  , ds.DatabaseName
                  , BeginSize = SUM(CASE WHEN ds.DayPolled = b.MinSampleDate THEN ds.[DBSize(MB)]
                                         ELSE 0
                                    END)
                  , EndSize = SUM(CASE WHEN ds.DayPolled = b.MaxSampleDate THEN ds.[DBSize(MB)]
                                       ELSE 0
                                  END)
                  , SizeDelta = SUM(CASE WHEN ds.DayPolled = b.MaxSampleDate THEN ds.[DBSize(MB)]
                                         ELSE 0
                                    END) - SUM(CASE WHEN ds.DayPolled = b.MinSampleDate THEN ds.[DBSize(MB)]
                                                    ELSE 0
                                               END)
                  , DayDelta = CASE WHEN DATEDIFF(dd, b.MinSampleDate, b.MaxSampleDate) < 1 THEN 1
                                    ELSE DATEDIFF(dd, b.MinSampleDate, b.MaxSampleDate)
                               END
            FROM    [database].DatabaseSize AS ds
                    INNER JOIN @Boundaries AS b ON ds.ServerName = b.ServerName
                                                   AND ds.DatabaseName = b.DatabaseName
            WHERE   1 = 1
                    AND ds.DayPolled BETWEEN @StartDate AND @EndDate
                    AND (ds.ServerName = @ServerName
                         OR @ServerName IS NULL)
                    AND (ds.DatabaseName = @DatabaseName
                         OR @DatabaseName IS NULL)
            GROUP BY ds.ServerName
                  , ds.DatabaseName
                  , b.MinSampleDate
                  , b.MaxSampleDate
-- calculate the stats
    INSERT  INTO @DatabaseSizeStatistics
            SELECT  sd.ServerName
                  , sd.DatabaseName
                  , BeginningDatabaseSize = sd.BeginSize
                  , CurrentDatabaseSize = sd.EndSize
                  , sd.DayDelta
                  , DailyRateOfChange = (sd.SizeDelta / sd.DayDelta) / sd.BeginSize
                  , AnnualizedRateOfChange = ((sd.SizeDelta / sd.DayDelta) / sd.BeginSize) * sd.Periods
            FROM    @SampledData AS sd
    RETURN
 
END

Enjoy!

Leave a Reply

Your email address will not be published. Required fields are marked *