As anyone who works with data knows, the number of databases and data in those databases only ever seems to grow. In my environment a significant number of database servers are the responsibility of staff in a variety of divisions. I’ve found that the division technology staff doesn’t always have the same perspective with regard to size on disk as central IT.
In fact database servers using SAN disk are a mixed blessing, when extra space is required it’s pretty easy to allocate more. This can give the impression of a infinite ocean of space to a division team member whose primary concern is his or her own slice of the enterprise pie. Our central IT is constantly concerned about long term data growth rates across the enterprise and the possibility that unchecked data growth could lead to a very real resource problem.
Databases consume a significant percentage of any organization’s total disk space, as such, it make sense to find ways to eliminate unused databases. One way I’ve attempted to determine if any given database is useful is to determine how often (if at all) people are connecting to it. Those databases that are rarely or never used are candidates for deletion and freeing up of that vital disk space in the production system and any test/staging/development systems as well as the backup platform.
As part of the DBA Repository I’ve created an SSIS package that polls all databases across the enterprise, I run it every 5 minutes, to determine if connections are active on that database and how many there are. It stores the maximum number of connections for each database each day. Having that data allows me to send inquiries to the responsible division technology staff member to determine if unused databases can be eliminated.
Create the dbo.DatabaseConnections Table
We need a table to hold the daily connection information:
CREATE TABLE [dbo].[DatabaseConnections]( [DatabaseConnectionID] [bigint] IDENTITY(1,1) NOT NULL , [ServerName] [nvarchar](128) NOT NULL , [DatabaseName] [nvarchar](128) NOT NULL , [Day] [datetime] NOT NULL , [Connections] [int] NOT NULL )
The Gather Database Connection Information Package
The package is simple but the data flow looks more complicated than it really is. It performs the following logical steps:
- Loop through all servers in the enterprise (how to do this is discussed in a previous post)
- Compare the current connection information to what’s in the DBA Repository
- Add new data or update the existing data if the number of connections is higher than the previous poll
Here’s what the package’s Control Flow looks like:
Gather database Connection Information Data Flow
The data flow is where all the work gets done here’s an image of the task:
Find Connections on Source Server
This OLE DB Source is based on a query that finds the current number of connections on each database and it will work for SQL 2000 and later instances, here ‘s the query:
DECLARE @ServerVersion varchar(18) SET @ServerVersion = CAST(SERVERPROPERTY('ProductVersion') AS varchar(20)) SET @ServerVersion = LEFT(@ServerVersion, CHARINDEX('.', @ServerVersion) - 1) IF CAST(@ServerVersion AS tinyint) <= 8 SELECT [Server] = @@SERVERNAME , [Database] = d.NAME , [Day] = CAST(CONVERT(char(10), GETDATE(), 101) AS datetime) , [Connections] = COUNT(*) FROM sysprocesses AS p INNER JOIN sysdatabases AS d ON d.[dbid] = p.[dbid] WHERE d.[name] NOT IN ('master', 'model', 'msdb', 'tempdb') AND p.loginame <> 'NT AUTHORITY\SYSTEM' AND LEN(p.loginame) > 0 GROUP BY d.[name] IF CAST(@ServerVersion AS tinyint) > 8 SELECT [Server] = @@SERVERNAME , [Database] = d.[name] , [Day] = CAST(CONVERT(char(10), GETDATE(), 101) AS datetime) , [Connections] = COUNT(*) FROM sysprocesses AS p INNER JOIN sys.databases AS d ON p.[dbid] = d.database_id WHERE d.[name] NOT IN ('master', 'model', 'msdb', 'tempdb') AND p.loginame <> 'NT AUTHORITY\SYSTEM' AND LEN(p.loginame) > 0 GROUP BY d.[name]
Repository Connection Information
This OLE DB Source is based on a query that finds all the latest connection information currently stored in the DBA Repository for each database, here’s the query:
SELECT dc.ServerName , dc.DatabaseName , [Day] = mxd.[Day] , Connections = dc.Connections FROM DBARepository.dbo.DatabaseConnections AS dc INNER JOIN (SELECT ServerName , DatabaseName , [Day] = MAX(c.[Day]) FROM DBARepository.dbo.DatabaseConnections AS c GROUP BY c.ServerName , c.DatabaseName) AS mxd ON dc.ServerName = mxd.ServerName AND dc.DatabaseName = mxd.DatabaseName AND dc.[Day] = mxd.[Day]
The next few steps simply sort the data so a merge join can be used to compare the data from the server to the data in the repository for each database.
Split Connection Data
This Conditional Split Transform directs the data to perform the appropriate subsequent steps. Data is sent to the Unexpected Connection Information output if one of the criteria below is not met and the data is logged to the DBA Repository’s Log table.
New Database or New Day
The New Database or New Day output determines if the connection record is for a new database or a new day so the new record can be inserted rather than updated. The expression that determines the data for this output is: ISNULL(RepositoryServerName) || (SourceDay > RepositoryDay)
The Connection Updates output is for connection information for a database in the repository in the same day that has more connections that what was previously recorded in the DBA Repository. Here’s the expression for that output: (SourceDay == RepositoryDay) && (SourceConnections > RepositoryConnections)
There’s a No Change output in the split but this is for data that hasn’t changed or has fewer connections than previously recorded. This process doesn’t record this additional data; I trap it and send it nowhere. It needs to be split out as valid data so the errors can be sent to the Unexpected Connection Information output. Here’s the expression for the No Change data: (SourceDay == RepositoryDay) && (SourceConnections <= RepositoryConnections)
I won’t detail the destination logic after the split because they’re quite simple they either
- Update existing connection records with a larger connection value
- Insert new connection records for a new database or for a new day
- Log data that didn’t match those as errors
This Integration Services package polls all databases on all instances every five minutes and records the maximum number of connections for any given day. Now that I have this info I can find databases that are not being used and speak with the responsible staff member to determine if the database can be removed and free up some space on the server and in the backup solution.