“I’m getting index seeks. Why are my row estimates still wrong?”

“I’m getting index seeks. Why are my row estimates still wrong?”

If you’ve got good indexes to support your query, and statistics to help SQL Server guess how many rows will come back, how can SQL Server still come up with terribly incorrect row estimates?

To demonstrate, I’ll use the 2018-06 version of the Stack Overflow database, but any recent version will work as long as you understand the problem with this demo query:

SELECT *
FROM dbo.Users
WHERE CreationDate > '2018-05-01'
  AND Reputation > 100
  ORDER BY DisplayName;

I’m asking SQL Server to find users created in the last month (because I’m dealing with the database export that finished in June 2018), who have accumulated at least 100 reputation points.

SQL Server knows:

  • There have been a lot of users created in the last month
  • There are a lot of users who have > 100 reputation points

But he doesn’t know that that’s a Venn diagram of people who don’t overlap:

“I’m getting index seeks. Why are my row estimates still wrong?”

So when I run the query and get the actual execution plan – even though I’ve created two supporting indexes, which also create supporting statistics:

DropIndexes;
GO
CREATE INDEX CreationDate_Reputation 
	ON dbo.Users(CreationDate, Reputation);
CREATE INDEX Reputation_CreationDate 
	ON dbo.Users(Reputation, CreationDate);
GO
SELECT *
FROM dbo.Users
WHERE CreationDate > '2018-05-01'
  AND Reputation > 100
  ORDER BY DisplayName;

SQL Server’s actual execution plan ignores both indexes and does a table scan:

To understand why, we read the query plan from right to left, starting with the Clustered Index Scan operator. It says “419 of 40041 (1%)” – which means SQL Server found 419 rows of an expected 40,041 rows, just 1% of the data it expected to find.

SQL Server over-estimated the population of users because it didn’t know that very few recently created users have earned over 100 reputation points.

Why you care about estimation problems
(once you’ve learned about ’em)

The overestimation means SQL Server doesn’t think using the indexes will be efficient here. SQL Server thinks there will be too many key lookups, which would result in a higher number of logical reads than scanning the whole table.

He’s wrong, of course, and we can prove that by copying the query into a new version with an index hint:

SET STATISTICS IO ON;

SELECT *
FROM dbo.Users
WHERE CreationDate > '2018-05-01'
  AND Reputation > 100
  ORDER BY DisplayName;

SELECT *
FROM dbo.Users WITH (INDEX = CreationDate_Reputation)
WHERE CreationDate > '2018-05-01'
  AND Reputation > 100
  ORDER BY DisplayName;

The first query does a table scan and 142,203 logical reads.

The second query does an index seek and 1,735 reads – that’s 82x less reads! SQL Server should be choosing this index, but doesn’t, and that’s why you care.

Index hints aren’t the answer, either.

Because here’s the query plan with the index hint:

See the yellow bang on the select? Hover your mouse over it:

Because SQL Server overestimated the number of rows it’d find, it also overestimated the memory required (leading to Page Life Expectancy dropping, for those of you who track that kind of thing.) In real-life-sized queries, this kind of operation usually causes SQL Server to allocate multiple CPU cores for parallel operations, too, leading to CXPACKET waits when those cores aren’t actually used.

What’s the real solution? Sadly, it’s not as simple as building your own fancy filtered statistics or indexes because date & reputation values are usually parameters that can be set by the user, and vary. I cover better solutions in my Mastering Query Tuning class.

This week’s sponsor: Top notch training on SQL Server, Synapse, Power BI, AI, Azure Data, PowerShell & more. Book a class, get the Summit free. Learn More.


Go to Source of this post
Author Of this post: Brent Ozar
Title Of post: “I’m getting index seeks. Why are my row estimates still wrong?”
Author Link: {authorlink}