Why Is COUNT(*) Taking So Long?

Phone rings… I answer… a DBA from a third-party vendor has supplied someone with two scripts. The first script is a simple insert with a where clause:

insert into [databasename].[dbo].[tablename]
( column1, column2, column3, column4 )

select column1, column2, column3, column4
from [databasename].[dbo].[tablename]

where column1 < [integeramount]
and column1 > [integeramount]

First question I ask: how much data is loading into the table? Answer: millions of records, and the vendor gave us a script to see if the number is increasing. What is the script; do you have it? Sure… script below:

select COUNT(*) from tablename

I was reminded of something I came across several years ago about this very scenario, so I figured why not put it to the test. I will try to explain to the best of my ability why the second query was taking an hour to run.

The first problem I see right off is that COUNT(*) has to do a table scan; reading every row is a requirement to calculate the result. Run that statement against a table with several million rows and a ton of reads on it, and you have a recipe for sitting, watching, and waiting for the result set to return.
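You can see the scan for yourself. A quick sketch (the table name here is a stand-in for whatever table you are counting):

```sql
SET STATISTICS IO ON;

SELECT COUNT(*) FROM dbo.tablename;
-- The Messages tab will show a large logical-read count for the table,
-- evidence that the engine walked the entire clustered index or heap.
```

The bigger the table and the busier the disks, the longer that walk takes.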

How Do I Get Around This?

It’s not that difficult, and here is a nice trick to provide a quick solution. I’m a huge fan of DMVs, and it just so happens that you can utilize one to return row counts for all tables in a database or for specific tables:

SELECT o.name, SUM(ddps.row_count) AS row_count
FROM sys.indexes AS i
INNER JOIN sys.objects AS o ON i.object_id = o.object_id
INNER JOIN sys.dm_db_partition_stats AS ddps ON i.object_id = ddps.object_id
    AND i.index_id = ddps.index_id
WHERE i.index_id < 2
    AND o.is_ms_shipped = 0
GROUP BY o.name

The result will give you each table name with its row count.


Don’t be alarmed by using the system objects. Unlike some older options, this row count does not depend on statistics being up to date, so the number is reliable. On this 132 million record table I can get the result set to return immediately.
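If you only care about one table, the same DMV can be narrowed down; a minimal sketch, assuming the table is dbo.tablename:

```sql
SELECT SUM(ddps.row_count) AS row_count
FROM sys.dm_db_partition_stats AS ddps
WHERE ddps.object_id = OBJECT_ID(N'dbo.tablename')
    AND ddps.index_id < 2;  -- heap (0) or clustered index (1) only
```

The SUM is there because a partitioned table has one row in the DMV per partition.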

Next time you get stuck waiting on a COUNT(*) statement to run, think about using a DMV; for a listing, check out the categories Microsoft has documented.

It is always nice to have some tricks up your sleeve; especially when dealing with outside vendors.

T-SQL Tuesday #040: File and Filegroup Wisdom


It’s that time again for the T-SQL Tuesday party! This party was created by none other than Adam Machanic (Twitter). If you are interested in hosting a party at some point this year, give him a shout; you need to have participated in two T-SQL Tuesdays along the way and also have maintained your own blog for at least 6 months.

Now that we know what the party is all about, let’s get into this month’s topic: Filegroups, hosted by Jen McCown / MidnightDBA.

My focus today is geared toward indexes on filegroups and what they can do for your index strategy. I’m a big fan of having strategies when tackling issues, problems, or even, believe it or not, from the beginning of a project. Placing indexes on filegroups carefully can improve query performance (at the same time I want to note that indexes can also hurt performance in some situations, so thorough testing needs to be taken into consideration).

Back from my 2008 R2 studies, if my memory serves me correctly, indexes are stored in the same filegroup as their base table by default; a non-partitioned clustered index and the associated table always reside in the same filegroup. However, you can do one of three things:

  1. Partition both clustered and nonclustered indexes to span multiple filegroups
  2. Create nonclustered indexes on a different filegroup
  3. Move a table from one filegroup to another

You can achieve performance gains by creating nonclustered indexes on a different filegroup if the filegroups are on different physical drives. The data and index information can be read in parallel by multiple disk heads when the physical drives are on their own controllers.
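In syntax terms, that is just an ON clause at the end of the index definition. A sketch with hypothetical names (filegroup FG2, table dbo.Orders), assuming FG2 already has a data file:

```sql
-- Option 2: place a nonclustered index on its own filegroup
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    ON FG2;

-- Option 3: move a table by rebuilding its clustered index on the target filegroup
CREATE CLUSTERED INDEX PK_Orders
    ON dbo.Orders (OrderId)
    WITH (DROP_EXISTING = ON)
    ON FG2;
```

The second statement works because moving the clustered index moves the table itself; the leaf level of a clustered index is the data.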

If you cannot foresee which data will be accessed or when, the better decision might be to spread your tables and indexes across all filegroups. This would guarantee all disks are being used, because all data and indexes would be spread evenly across them.

To bring this all back together, you can think of a filegroup in its simplest form. Every database you create has at least a data file and a log file, and every database has a primary filegroup, which contains the primary data file (.mdf) and any secondary files (.ndf) assigned to it. A filegroup can contain multiple data files.
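Creating that layout is straightforward; a minimal sketch with hypothetical database, filegroup, and path names:

```sql
-- Add a new filegroup, then give it a data file to live in
ALTER DATABASE SalesDb ADD FILEGROUP FG2;

ALTER DATABASE SalesDb
ADD FILE
(   NAME = N'SalesDb_FG2_1',
    FILENAME = N'E:\Data\SalesDb_FG2_1.ndf',  -- ideally a separate physical drive
    SIZE = 512MB
)
TO FILEGROUP FG2;
```

Until a filegroup has at least one file, nothing can be placed on it.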

In the end I have seen significant gains from indexes being placed on specific filegroups, but as I stated before, it is good to test all this out. Set up some scenarios on your test server and start doing some test cases to prove different theories and ideologies. One thing to remember as well is that not every case is the same; ensure that the decisions you are making are good for what you are working on, and never take a suggestion and drop it straight into a production environment. Prove the statement true or false no matter who it comes from.

Well, that’s a wrap for today’s party. Until next month…….

PowerShell Tactics

Last month’s T-SQL Tuesday party dealt with PowerShell. I debated whether or not to post a topic, but in the end I didn’t feel I should post something I had never really tried before. So I spent a few weeks going through some basic PowerShell scripting, and what I ended up with was a way to check my databases from a list of servers I keep in a text file, along with retrieving failed job history based on the same server listing.

I was impressed enough that I started to use some of this on my on-call days, where I have automated reports dumped out into Excel waiting for me to review when I get in.

While there are many variations of this out there, I researched and then put some of my own twists in when creating the scripts. Today I only plan to show one, which uses the SMO assembly to retrieve basic database information.

The Script

First I want to open Excel:

#Create a new Excel object using COM
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $True
$Workbook = $Excel.Workbooks.Add()
$Sheet = $Workbook.Worksheets.Item(1)

Now that Excel is open, I can begin to read the list of servers from the text file and create all the header information:

#Counter variable for rows
$intRow = 1

#Read through the contents of the Servers.txt file
#(the foreach block opened here closes at the end of the last chunk below)
foreach ($instance in Get-Content "C:\Server\Servers.txt")
{
    #Write the instance name header
    $Sheet.Cells.Item($intRow,1) = "INSTANCE NAME:"
    $Sheet.Cells.Item($intRow,2) = $instance
    $Sheet.Cells.Item($intRow,1).Font.Bold = $True
    $Sheet.Cells.Item($intRow,2).Font.Bold = $True
    $intRow++

    #Create column headers
    $Sheet.Cells.Item($intRow,1) = "DATABASE NAME"
    $Sheet.Cells.Item($intRow,2) = "OWNER"
    $Sheet.Cells.Item($intRow,3) = "AUTOSHRINK"
    $Sheet.Cells.Item($intRow,4) = "RECOVERY MODEL"
    $Sheet.Cells.Item($intRow,5) = "LAST DATABASE BACKUP"
    $Sheet.Cells.Item($intRow,6) = "SIZE (MB)"
    $Sheet.Cells.Item($intRow,7) = "SPACE AVAILABLE (MB)"

    #Format the column headers
    for ($col = 1; $col -le 7; $col++)
    {
        $Sheet.Cells.Item($intRow,$col).Font.Bold = $True
        $Sheet.Cells.Item($intRow,$col).Interior.ColorIndex = 48
        $Sheet.Cells.Item($intRow,$col).Font.ColorIndex = 34
    }
    $intRow++


Now the fun part; time to get all the information and watch your Excel file go:

    [System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.SMO') | Out-Null

    # Create an SMO connection to the instance
    $s = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $instance

    $dbs = $s.Databases

    ForEach ($db in $dbs)
    {
        #SpaceAvailable is reported in KB; divide by 1KB (1024) to get MB
        $dbSpaceAvailable = $db.SpaceAvailable/1KB

        #Format the result to a number with three decimal places
        $dbSpaceAvailable = "{0:N3}" -f $dbSpaceAvailable

        $Sheet.Cells.Item($intRow, 1) = $db.Name
        $Sheet.Cells.Item($intRow, 2) = $db.Owner

        #Change the background color of the cell depending on the AutoShrink property value
        if ($db.AutoShrink -eq $True)
        { $fgColor = 3 }
        else
        { $fgColor = 0 }

        $Sheet.Cells.Item($intRow, 3) = $db.AutoShrink
        $Sheet.Cells.Item($intRow, 3).Interior.ColorIndex = $fgColor

        if ($db.RecoveryModel -eq "Full")
        { $Sheet.Cells.Item($intRow, 4) = "FULL" }
        elseif ($db.RecoveryModel -eq "Simple")
        { $Sheet.Cells.Item($intRow, 4) = "SIMPLE" }
        elseif ($db.RecoveryModel -eq "BulkLogged")
        { $Sheet.Cells.Item($intRow, 4) = "BULK-LOGGED" }

        #A database that has never been backed up reports the minimum DateTime value
        if ($db.LastBackupDate -eq [datetime]::MinValue)
        { $Sheet.Cells.Item($intRow, 5) = "Never" }
        else
        { $Sheet.Cells.Item($intRow, 5) = $db.LastBackupDate }

        $Sheet.Cells.Item($intRow, 6) = "{0:N3}" -f $db.Size

        #Change the background color of the cell depending on the SpaceAvailable value
        if ($dbSpaceAvailable -lt 1.00)
        { $fgColor = 3 }
        else
        { $fgColor = 0 }

        $Sheet.Cells.Item($intRow, 7) = $dbSpaceAvailable
        $Sheet.Cells.Item($intRow, 7).Interior.ColorIndex = $fgColor

        $intRow++
    }

    $intRow++
}
I added and removed various objects in the information I was pulling back. There are a lot of great resources out there that show you what the objects are, and many books that provide the syntax to accomplish all sorts of things with PowerShell.

I was not happy to miss the party last month, but having gone through the exercise of diving in and doing some things with PowerShell, I am glad I did. I look forward to doing more involved work with it as I move forward ~ I’ve only begun to scratch the surface!