Showing posts with label cost. Show all posts
Showing posts with label cost. Show all posts

Thursday, March 22, 2012

Bookmark lookup cost factors (SQL Server 7.0)

Good day to you.
I am working with some large databases with several tables hosting
millions or tens of millions of records. The actual data is being
stored on a SAN, while the database host is a 4 CPU SMP machine. In
this case the SAN is often a bottleneck while the CPUs are idled.
A situation that I am consistently observing is extremely poor choices
for the uses of indexes -- or rather the lack of index use. What I am
constantly finding for these large tables is that highly unique
non-clustered indexes are being ignored, and instead a table scan is
being performed. The execution engine is (wrongly) assuming that the
bookmark lookups will be more costly than simply doing a tablescan.
The end result is that many queries run 10s or 100s of times longer
than they should without explicitly adding index hints to force it to
use the index.
In one example a table with 5,000,000 or so records has a
non-clustered index on a field, and the statistics believe that there
are around 141 records that match the requested specific criteria
(there are really less -- as a sidenote can one increase the number of
steps stored by statistics?). Instead of quickly pulling the presumed
141 index matches and then doing the corresponding bookmark lookups,
it's instead table scanning all 5,000,000 records. Indeed if I look at
the estimated execution plans, it's claiming that the forced use of
the non-clutered index, and hence forced bookmark lookups, will yield
5x the subtree cost. The end results tell the real story, though -
with the index hint it completes instantly, while without the hint it
takes hundreds of times longer.
Is there a server or database setting that governs the estimation of
the cost of bookmark lookups? I do not wish to resort to unnecessary
hardcoded index hints when it's such a basically incorrect assumption
by the query engine that affects the entire system wherever large
tables are accessed. As a sidenote I had previously heard the
statistic that a non-clustered index will only be used if the
estimated set yields less than 2% of the total set, due to the cost of
bookmark lookups, but in this case it isn't using it even when the
estimated set is ~ 0.003%... how is it choosing?
Thanks
Dennis ForbesHi Dennis,
I think that 285996 PRB: Cost of Using Non-Clustered Indexes May Be Incorrect if Data (
http://support.microsoft.com/?id=285996) is something that may be applicable
to your scenario. Unfortunately there are no 'knobs' that tune the cost es
timation for any operators inside SQL Server.
Thanks,
--R
"Dennis Forbes" <dennis_forbes@.hotmail.com> wrote in message news:32811a52.0
403190902.35fd2ef5@.posting.google.com...
Good day to you.
I am working with some large databases with several tables hosting
millions or tens of millions of records. The actual data is being
stored on a SAN, while the database host is a 4 CPU SMP machine. In
this case the SAN is often a bottleneck while the CPUs are idled.
A situation that I am consistently observing is extremely poor choices
for the uses of indexes -- or rather the lack of index use. What I am
constantly finding for these large tables is that highly unique
non-clustered indexes are being ignored, and instead a table scan is
being performed. The execution engine is (wrongly) assuming that the
bookmark lookups will be more costly than simply doing a tablescan.
The end result is that many queries run 10s or 100s of times longer
than they should without explicitly adding index hints to force it to
use the index.
In one example a table with 5,000,000 or so records has a
non-clustered index on a field, and the statistics believe that there
are around 141 records that match the requested specific criteria
(there are really less -- as a sidenote can one increase the number of
steps stored by statistics?). Instead of quickly pulling the presumed
141 index matches and then doing the corresponding bookmark lookups,
it's instead table scanning all 5,000,000 records. Indeed if I look at
the estimated execution plans, it's claiming that the forced use of
the non-clutered index, and hence forced bookmark lookups, will yield
5x the subtree cost. The end results tell the real story, though -
with the index hint it completes instantly, while without the hint it
takes hundreds of times longer.
Is there a server or database setting that governs the estimation of
the cost of bookmark lookups? I do not wish to resort to unnecessary
hardcoded index hints when it's such a basically incorrect assumption
by the query engine that affects the entire system wherever large
tables are accessed. As a sidenote I had previously heard the
statistic that a non-clustered index will only be used if the
estimated set yields less than 2% of the total set, due to the cost of
bookmark lookups, but in this case it isn't using it even when the
estimated set is ~ 0.003%... how is it choosing?
Thanks
Dennis Forbessql

Bookmark lookup (Why)

I can't figure out why i have a bookmark lookup cost on ValidVendorPackages
on the below proc:
Select PriceViewHotelPrice.price as Totalprice, PriceViewHotelPrice.Docid
from price_view PriceViewHotelPrice Inner Join
(
select C1.Priceid as Pid,C1.VendorPackageId from
(
select distinct(HA.PriceId), HA.VendorPackageId from Criteria HA Inner Join
(
select VendorPackageId from ValidVendorPackages
where (Vendor = @.Vendor or @.Vendor = '')
and (Sitecode = @.Sitecode or @.Sitecode = '')
and (PackageType = @.PackageType or @.PackageType = '')
and (RequiredItems = @.RequiredItems or @.RequiredItems = -1)
)HB on HA.VendorPackageId = HB.VendorPackageId
and
(
CriteriaId in
(
select CriteriaID from ValidItemCriteria
where Destination = @.Destination
and LengthOfStay = @.LengthOfStay
and Ages = @.Ages
and ComponentType = 'H'
and (ValidItemType = @.HotelValidItemType_1 or @.HotelValidItemType_1 = '' )
and (ItemValue = @.HotelItemValue_1 or @.HotelItemValue_1 = '' )
)
)
) C1
) HotelCriteriaPriceId on PriceViewHotelPrice.priceid =
HotelCriteriaPriceId.Pid
and PriceViewHotelPrice.keydate between @.KeyDateMin and @.KeyDateMax
and ((PriceViewHotelPrice.price) between @.PriceLow and @.PriceHigh or
@.PriceLow = -1)
and (PriceViewHotelPrice.Volatility between @.VolatilityMin and
@.VolatilityMax or @.VolatilityMin = -1)
and (PriceViewHotelPrice.AvgMinsUpdateInterval between
@.AvgMinsUpdateIntervalMin and @.AvgMinsUpdateIntervalMax or
@.AvgMinsUpdateIntervalMin = -1)
Order by TotalPrice
can anyone see why?
will
On Wed, 15 Dec 2004 11:55:04 -0800, we7313 wrote:

>I can't figure out why i have a bookmark lookup cost on ValidVendorPackages
>on the below proc:
Hi we7313,
Without knowing your table structure, indexes, etc, I can't do anything
but guess. Peruse at your own risk :-)
Looking at this part of the query:
>select distinct(HA.PriceId), HA.VendorPackageId from Criteria HA Inner Join
>(
>select VendorPackageId from ValidVendorPackages
>where (Vendor = @.Vendor or @.Vendor = '')
>and (Sitecode = @.Sitecode or @.Sitecode = '')
>and (PackageType = @.PackageType or @.PackageType = '')
>and (RequiredItems = @.RequiredItems or @.RequiredItems = -1)
>)HB on HA.VendorPackageId = HB.VendorPackageId
>and
>(
>CriteriaId in
>(
>select CriteriaID from ValidItemCriteria
>where Destination = @.Destination
>and LengthOfStay = @.LengthOfStay
>and Ages = @.Ages
>and ComponentType = 'H'
>and (ValidItemType = @.HotelValidItemType_1 or @.HotelValidItemType_1 = '' )
>and (ItemValue = @.HotelItemValue_1 or @.HotelItemValue_1 = '' )
>)
>)
I think that the optimizer has decided to start with reducing the number
of matching rows in Criteria by checking the CriteriaId in (...)
condition. Criteria rows that match are then used in the join against
ValidVendorPackages (note: the derived table is easily transformed in a
simple join; the optimizer will probably execute it as a simple join).
Obviously, you have a nonclustered index on the column VendorPackageId in
ValidVendorPackages. In order to check the other conditions (on Vendor,
Sitecode, PackageType and RequiredItems) as well, SQL Server has to fetch
the complete row. The nonclustered index used to find the VendorPackageId
will also contain the values for the clustered index on this table: they
serve as a pointer to the complete row. SQL Server will now take these
values and use them to navigate the clustered index down to the row that
corresponds to the index entry. This action (using the clustered index
values stored in the nonclustered index to find the row) is called a
bookmark lookup.
Best, Hugo
(Remove _NO_ and _SPAM_ to get my e-mail address)