Sunday, February 12, 2012

Bitwise operation in SELECT killing performance, *sometimes*?

Hi there,

We have a high volume SQL Server which gets into trouble at times, with one query blocking out others for 10-15 minutes.

The query looks something like this:

SELECT * FROM MyTable WHERE MyQueryColumn & 128 = 128 ORDER BY MySortColumn

There is an index that exists containing MyQueryColumn. MyTable has somewhere around 550,000 rows, around 50 of which might match this query at any particular time.

Almost every time this query executes in under 1 second, and everybody is happy. However, once every few weeks (or months, even), this thing takes a dogs age, blocking loads of other queries trying to hit MyTable.

We did a showplan on the query, and saw something like this

WHERE (Convert([Mydatabase].[MyQueryColumn]) & 128 = 128)

The Convert is making me a little bit nervous; but it doesn't appear to be doing a table scan or anything heinous, although an index scan with that many rows might be bad enough.

Changing the query to something like this is an option:

SELECT * FROM MyTable WHERE MyQueryColumn > 127 ORDER BY MySortColumn

but I'd like someone to tell me that using that bitwise operator might cause such a problem before I modify our code.

Thanks,

Chris
Hi,

I'm not sure how to explain the slowdown without more info.

However, given what you've told us, I don't think the index helps you with that query. The index isn't used for bitwise operators on the indexed column.

You might consider something like

select * from MyTable where (MyQueryColumn > 127) and ((MyQueryColumn & 128) = 128)

or perhaps an indexed computed column.|||An indexed computed column would work, but you would be shifting the slowdown elsewhere, particularly inserts updates and deletes.

so what is the problem?

well. . .

Bitmasking is a violation of normalization!!!

Now that being said. . .

I am willing to bet each bit of the mask corresponds to a particular quality right? the set bit, determines a ID particular quality of some row in your database:

What you are really modeling is a Many to Many relationship -
Items >--< Quality
ItemQuality

Item (ID, ItemName)
Quality(ID, QualityName)
ItemQuality(ItemID, QualityID)
To get back your original table, given that QualityID matches your original bit scheme:
Select ID, ItemName, sum(QualityID) from
Item inner join ItemQuality on Item.Id = ItemQuality.ItemId

but in practice you wouldn't really do that. . . because in the realtional world, you dont care about the ID's/Bitmasks, you care about the realtions so typically you would do

Select ID, ItemName from
(Item inner join ItemQuality
on Item.Id = ItemQuality.ItemId)
inner join Quality
on ItemQuality.QualityID = Quality.ID
where Quality.Name in
('Some Quality', 'Some Other Quality')

Therefore, the "where clause" is in essence your bitmask!

I mean, imagine an application that has a set of check boxes for all the qualities. . .

You decide you want to mask a new quality. If using a bit mask, you have to rewrite your application. Using a normalized schema, the application could be dynamic:

<pseudocode>
For each Quality
GenerateCheckBox
</pseudocode>

so bag the bitmask!!!
By using a bitmask, you are bypassing the entire purpose of an RDBMS!!!|||If you can capture the query plan for the good/bad cases, this would be helpful

set statistics profile on
<<query in question>>
set statistics profile off

Note that this adds overhead, so please do this with care.

In general, an index won't be generally useful in this case (except in the case when you have a very wide row) - the sort isn't helping you because you are just pulling some bit out of the column.

Keeping statistics up-to-date will likely also help avoid/minimize cases where the optimizer gets confused and provides a less-than-optimal query plan.

Conor Cunningham
SQL Server Query Optimization Development Lead

No comments:

Post a Comment