DBA Repository – Gather Database Connection Information

As anyone who works with data knows, the number of databases and data in those databases only ever seems to grow.  In my environment a significant number of database servers are the responsibility of staff in a variety of divisions.  I’ve found that the division technology staff doesn’t always have the same perspective with regard to size on disk as central IT.

In fact database servers using SAN disk are a mixed blessing, when extra space is required it’s pretty easy to allocate more.  This can give the impression of a infinite ocean of space to a division team member whose primary concern is his or her own slice of the enterprise pie.  Our central IT is constantly concerned about long term data growth rates across the enterprise and the possibility that unchecked data growth could lead to a very real resource problem.

Databases consume a significant percentage of any organization’s total disk space, as such, it make sense to find ways to eliminate unused databases.  One way I’ve attempted to determine if any given database is useful is to determine how often (if at all) people are connecting to it.  Those databases that are rarely or never used are candidates for deletion and freeing up of that vital disk space in the production system and any test/staging/development systems as well as the backup platform.

As part of the DBA Repository I’ve created an SSIS package that polls all databases across the enterprise, I run it every 5 minutes, to determine if connections are active on that database and how many there are.  It stores the maximum number of connections for each database each day.  Having that data allows me to send inquiries to the responsible division technology staff member to determine if unused databases can be eliminated.

Create the dbo.DatabaseConnections Table

We need a table to hold the daily connection information:

CREATE TABLE [dbo].[DatabaseConnections](
	  [DatabaseConnectionID] [bigint] IDENTITY(1,1) NOT NULL
	, [ServerName] [nvarchar](128) NOT NULL
	, [DatabaseName] [nvarchar](128) NOT NULL
	, [Day] [datetime] NOT NULL
	, [Connections] [int] NOT NULL
)

The Gather Database Connection Information Package

The package is simple but the data flow looks more complicated than it really is.  It performs the following logical steps:

  • Loop through all servers in the enterprise (how to do this is discussed in a previous post)
  • Compare the current connection information to what’s in the DBA Repository
  • Add new data or update the existing data if the number of connections is higher than the previous poll

Here’s what the package’s Control Flow looks like:

DBARepository_GatherDatabaseConnectionInfo

Gather database Connection Information Data Flow

The data flow is where all the work gets done here’s an image of the task:

Find Connections on Source Server

This OLE DB Source is based on a query that finds the current number of connections on each database and it will work for SQL 2000 and later instances, here ‘s the query:

DECLARE @ServerVersion varchar(18)
SET @ServerVersion = CAST(SERVERPROPERTY('ProductVersion') AS varchar(20))
SET @ServerVersion = LEFT(@ServerVersion, CHARINDEX('.', @ServerVersion) - 1)
 
IF CAST(@ServerVersion AS tinyint) <= 8
    SELECT  [Server] = @@SERVERNAME
          , [Database] = d.NAME
          , [Day] = CAST(CONVERT(char(10), GETDATE(), 101) AS datetime)
          , [Connections] = COUNT(*)
    FROM    sysprocesses AS p
            INNER JOIN sysdatabases AS d ON d.[dbid] = p.[dbid]
    WHERE   d.[name] NOT IN ('master', 'model', 'msdb', 'tempdb')
            AND p.loginame <> 'NT AUTHORITY\SYSTEM'
            AND LEN(p.loginame) > 0
    GROUP BY d.[name]
 
IF CAST(@ServerVersion AS tinyint) > 8
    SELECT  [Server] = @@SERVERNAME
          , [Database] = d.[name]
          , [Day] = CAST(CONVERT(char(10), GETDATE(), 101) AS datetime)
          , [Connections] = COUNT(*)
    FROM    sysprocesses AS p
            INNER JOIN sys.databases AS d ON p.[dbid] = d.database_id
    WHERE   d.[name] NOT IN ('master', 'model', 'msdb', 'tempdb')
            AND p.loginame <> 'NT AUTHORITY\SYSTEM'
            AND LEN(p.loginame) > 0
    GROUP BY d.[name]

Repository Connection Information

This OLE DB Source is based on a query that finds all the latest connection information currently stored in the DBA Repository for each database, here’s the query:

SELECT  dc.ServerName
      , dc.DatabaseName
      , [Day] = mxd.[Day]
      , Connections = dc.Connections
FROM    DBARepository.dbo.DatabaseConnections AS dc
        INNER JOIN (SELECT  ServerName
                          , DatabaseName
                          , [Day] = MAX(c.[Day])
                    FROM    DBARepository.dbo.DatabaseConnections AS c
                    GROUP BY c.ServerName
                          , c.DatabaseName) AS mxd ON dc.ServerName = mxd.ServerName
                                                      AND dc.DatabaseName = mxd.DatabaseName
                                                      AND dc.[Day] = mxd.[Day]

The next few steps simply sort the data so a merge join can be used to compare the data from the server to the data in the repository for each database.

Split Connection Data

This Conditional Split Transform directs the data to perform the appropriate subsequent steps.  Data is sent to the Unexpected Connection Information output if one of the criteria below is not met and the data is logged to the DBA Repository’s Log table.

New Database or New Day

The New Database or New Day output determines if the connection record is for a new database or a new day so the new record can be inserted rather than updated.  The expression that determines the data for this output is: ISNULL(RepositoryServerName) || (SourceDay > RepositoryDay)

Connection Updates

The Connection Updates output is for connection information for a database in the repository in the same day that has more connections that what was previously recorded in the DBA Repository.  Here’s the expression for that output: (SourceDay == RepositoryDay) && (SourceConnections > RepositoryConnections)

No Change

There’s a No Change output in the split but this is for data that hasn’t changed or has fewer connections than previously recorded.  This  process doesn’t record this additional data; I trap it and send it nowhere.  It needs to be split out as valid data so the errors can be sent to the Unexpected Connection Information output.  Here’s the expression for the No Change data: (SourceDay == RepositoryDay) && (SourceConnections <= RepositoryConnections)

Data updates

I won’t detail the destination logic after the split because they’re quite simple they either

  1. Update existing connection records with a larger connection value
  2. Insert new connection records for a new database or for a new day
  3. Log data that didn’t match those as errors

Summary

This Integration Services package polls all databases on all instances every five minutes and records the maximum number of connections for any given day.  Now that I have this info I can find databases that are not being used and speak with the responsible staff member to determine if the database can be removed and free up some space on the server and in the backup solution.

Leave a Reply

Your email address will not be published. Required fields are marked *