
Thursday, March 22, 2012

Bookmark lookup cost factors (SQL Server 7.0)

Good day to you.
I am working with some large databases in which several tables hold
millions or tens of millions of records. The actual data is stored on a
SAN, while the database host is a 4-CPU SMP machine. In this setup the
SAN is often the bottleneck while the CPUs sit idle.
A situation I am consistently observing is extremely poor index
choices -- or rather, a failure to use indexes at all. What I am
constantly finding on these large tables is that highly selective
non-clustered indexes are being ignored and a table scan is performed
instead. The execution engine is (wrongly) assuming that the bookmark
lookups will be more costly than simply scanning the whole table. The
end result is that many queries run tens or hundreds of times longer
than they should unless I explicitly add index hints to force the
index to be used.
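For illustration, here is roughly what the workaround looks like (the
table, column, and index names are hypothetical, and this uses the
old-style table-hint syntax that SQL Server 7.0 accepts):

    -- Without a hint the optimizer chooses a full table scan:
    SELECT OrderId, OrderDate, Amount
    FROM Orders
    WHERE CustomerId = 12345

    -- Forcing the non-clustered index makes it seek the index and do
    -- the bookmark lookups instead:
    SELECT OrderId, OrderDate, Amount
    FROM Orders (INDEX = ix_Orders_CustomerId)
    WHERE CustomerId = 12345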
In one example a table with 5,000,000 or so records has a
non-clustered index on a field, and the statistics believe that around
141 records match the specific criteria requested (there are really
fewer -- as a sidenote, can one increase the number of steps stored by
statistics?). Instead of quickly pulling the presumed 141 index
matches and then doing the corresponding bookmark lookups, it table
scans all 5,000,000 records. Indeed, if I look at the estimated
execution plans, it claims that the forced use of the non-clustered
index, and hence the forced bookmark lookups, will yield 5x the
subtree cost. The end results tell the real story, though: with the
index hint the query completes instantly, while without it the query
takes hundreds of times longer.
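For reference, this is roughly how I have been checking the estimates
and the statistics (again with hypothetical names). As far as I know
the histogram step count is fixed and not user-configurable in 7.0,
but a FULLSCAN update at least makes the sampled distribution as
accurate as it can be:

    -- Show the histogram and density the optimizer is working from:
    DBCC SHOW_STATISTICS ('Orders', ix_Orders_CustomerId)

    -- Rebuild the statistics from every row rather than a sample:
    UPDATE STATISTICS Orders ix_Orders_CustomerId WITH FULLSCAN

    -- Compare the optimizer's cost estimates without executing:
    SET SHOWPLAN_ALL ON
    GO
    SELECT OrderId FROM Orders WHERE CustomerId = 12345
    GO
    SET SHOWPLAN_ALL OFF
    GO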
Is there a server or database setting that governs the estimation of
bookmark lookup cost? I do not wish to resort to unnecessary hardcoded
index hints when such a fundamentally incorrect assumption by the
query engine affects the entire system wherever large tables are
accessed. As a sidenote, I had previously heard the rule of thumb that
a non-clustered index will only be used if the estimated set yields
less than 2% of the total rows, due to the cost of bookmark lookups,
but in this case it isn't using the index even when the estimated set
is ~0.003%... how is it choosing?
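To put rough numbers on it, here is the back-of-envelope comparison as
I understand it (the rows-per-page figure is an assumption, not a
measurement):

    -- Full scan: ~5,000,000 rows at an assumed ~100 rows per 8 KB page.
    -- Index path: ~141 bookmark lookups, one random page read each,
    -- plus a handful of index pages.
    SELECT
        5000000 / 100 AS scan_page_reads,   -- ~50,000 sequential reads
        141 + 10      AS index_page_reads   -- ~151 mostly random reads

Even charging every bookmark lookup as a full random I/O, the index
path looks orders of magnitude cheaper, yet the optimizer costs it at
5x the scan.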
Thanks
Dennis Forbes

Hi Dennis,
I think that KB 285996, "PRB: Cost of Using Non-Clustered Indexes May
Be Incorrect if Data..." (http://support.microsoft.com/?id=285996), may
be applicable to your scenario. Unfortunately there are no 'knobs' that
tune the cost estimation for any operators inside SQL Server.
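One workaround that sidesteps both the hint and the bookmark lookups
is to make the non-clustered index covering, i.e. widen its key to
every column the query touches (a sketch with hypothetical names --
SQL Server 7.0 has no INCLUDE clause, so the extra columns must go
into the key itself):

    -- A seek on this index can satisfy the query from the index leaf
    -- alone, so no bookmark lookups are costed or performed:
    CREATE NONCLUSTERED INDEX ix_Orders_CustomerId_cover
        ON Orders (CustomerId, OrderDate, Amount)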
Thanks,
--R
"Dennis Forbes" <dennis_forbes@.hotmail.com> wrote in message news:32811a52.0
403190902.35fd2ef5@.posting.google.com...
Good day to you.
I am working with some large databases with several tables hosting
millions or tens of millions of records. The actual data is being
stored on a SAN, while the database host is a 4 CPU SMP machine. In
this case the SAN is often a bottleneck while the CPUs are idled.
A situation that I am consistently observing is extremely poor choices
for the uses of indexes -- or rather the lack of index use. What I am
constantly finding for these large tables is that highly unique
non-clustered indexes are being ignored, and instead a table scan is
being performed. The execution engine is (wrongly) assuming that the
bookmark lookups will be more costly than simply doing a tablescan.
The end result is that many queries run 10s or 100s of times longer
than they should without explicitly adding index hints to force it to
use the index.
In one example a table with 5,000,000 or so records has a
non-clustered index on a field, and the statistics believe that there
are around 141 records that match the requested specific criteria
(there are really less -- as a sidenote can one increase the number of
steps stored by statistics?). Instead of quickly pulling the presumed
141 index matches and then doing the corresponding bookmark lookups,
it's instead table scanning all 5,000,000 records. Indeed if I look at
the estimated execution plans, it's claiming that the forced use of
the non-clutered index, and hence forced bookmark lookups, will yield
5x the subtree cost. The end results tell the real story, though -
with the index hint it completes instantly, while without the hint it
takes hundreds of times longer.
Is there a server or database setting that governs the estimation of
the cost of bookmark lookups? I do not wish to resort to unnecessary
hardcoded index hints when it's such a basically incorrect assumption
by the query engine that affects the entire system wherever large
tables are accessed. As a sidenote I had previously heard the
statistic that a non-clustered index will only be used if the
estimated set yields less than 2% of the total set, due to the cost of
bookmark lookups, but in this case it isn't using it even when the
estimated set is ~ 0.003%... how is it choosing?
Thanks
Dennis Forbessql