Wednesday, March 7, 2012

Blocked process mystery

This morning, we discovered that our application was timing out on several
pages. We tracked it down to a table that couldn't be read without timing
out. In Enterprise Manager, we found a table (TAB) lock of mode IS on that
table that was blocking other spids. The text property of the lock showed
that it belonged to a simple reporting stored procedure, which just does a
SELECT on a couple of tables and should have taken only a second or two. We
tried to debug the problem for several minutes, but to no avail. Finally, we
killed the spid holding the lock, and the database problem was immediately
solved. A look at SQL Profiler (which we run continuously) showed that a
query that matched the text property of the lock and had the same spid as
the lock had been started last Tuesday and ended at roughly the same time as
we killed the lock. The index name listed with the lock in Enterprise
Manager was also strange, since that index is not used by the stored
procedure listed in the properties.

We have had several similar problems with our database in the past, but this
is the first time we didn't resort to just a reboot. Why would a simple
stored procedure executing a SELECT cause such problems? Why would this
procedure be allowed to run for a week? Why would we experience no problems
until days later (the locked table is core to almost every page in the
system and was fine until this morning)? Can the index listed in the lock
information be used to debug the problem?

An even bigger question is how to handle such a problem after it occurs.
Killing the spid seems to have caused some problems in the .NET application
and forced me to restart the app. Is there a more graceful method of rolling
back the offending process?
The spid you found did not actually hold a lock on the whole table. It was
an intent shared (IS) lock, which indicates that the process holds, or
intends to take, shared locks on lower-level resources within the table
(pages or rows). What version of SQL Server are you running, and which
service pack? If you are running SQL Server 2000 with SP3, then ::fn_get_sql
can be useful in the future to show which SQL statement a particular spid is
currently executing.
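As a rough sketch of how ::fn_get_sql is typically called on SQL Server 2000
SP3: the function takes the sql_handle of the process, which can be read
from sysprocesses. The spid value 52 below is a placeholder and not taken
from the original post.

```sql
-- Assumption: 52 is a placeholder; substitute the spid shown as blocking.
DECLARE @handle binary(20)

-- Look up the handle of the batch that the spid is currently running
SELECT @handle = sql_handle
FROM master.dbo.sysprocesses
WHERE spid = 52

-- Return the text of that batch
SELECT * FROM ::fn_get_sql(@handle)
```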
I would also switch on trace flag 1204 (startup option -T1204) on this
server so that lock details are written to the SQL Server error log. Note
that 1204 reports on deadlocks rather than on ordinary blocking.
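If a restart with the -T startup option is not convenient, the flag can
also be turned on from a query window. This is only a sketch: it assumes
sysadmin rights, and the setting is lost at the next restart. Trace flag
3605 is added here (it is not mentioned above) because it sends trace
output to the error log.

```sql
-- Enable trace flag 1204 for all connections (-1) until the next restart.
-- 3605 directs the trace output to the SQL Server error log.
DBCC TRACEON (1204, 3605, -1)

-- List the trace flags that are currently enabled
DBCC TRACESTATUS (-1)
```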
I would also look at your current setting for the query governor cost limit
and possibly reduce it to a value that is in line with your longest running
query (LRQ). If you have Profiler running regularly, you should be able to
determine what your longest running query is and set the query governor
cost limit accordingly, which should help prevent such problems.
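A minimal sketch of checking and changing the setting follows. The value
300 is purely illustrative; pick a number based on your own Profiler data.
Keep in mind that the governor refuses to start queries whose estimated
cost exceeds the limit, so it will not stop a query that is already
running, or one whose estimate was low.

```sql
-- 'query governor cost limit' is an advanced option
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
GO

-- Show the current value (0 means no limit)
EXEC sp_configure 'query governor cost limit'
GO

-- Assumption: 300 is a placeholder value, not a recommendation
EXEC sp_configure 'query governor cost limit', 300
RECONFIGURE
GO

-- Alternatively, set the limit for the current connection only
SET QUERY_GOVERNOR_COST_LIMIT 300
```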
Re: stopping the spid: the only option is to kill the spid (or to find the
application that started the process and close it).
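For reference, a sketch of what that might look like from Query Analyzer.
The spid 52 is a placeholder. Checking the input buffer first is a
precaution I am adding here, to confirm you have the right process before
killing it.

```sql
-- Assumption: 52 is a placeholder spid.
-- Show the last statement sent by the spid, to confirm it is the right one
DBCC INPUTBUFFER (52)

-- Terminate the process; any open transaction is rolled back
KILL 52

-- Report the progress of the rollback started by the KILL above
KILL 52 WITH STATUSONLY
```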
Next time it happens, it will be useful to run select * from
master.dbo.sysprocesses (nolock) to identify the waittype, waitresource,
and waittime for the query in question, as a means of establishing whether
there are other hidden issues within your system.
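Something along these lines narrows the output down to the interesting
rows. It is a sketch against the SQL Server 2000 sysprocesses table; the
choice of columns is mine, and the second query is one possible way of
finding the spid at the head of a blocking chain.

```sql
-- Sessions that are currently blocked, and the spid blocking each of them
SELECT spid, blocked, waittype, waittime, lastwaittype, waitresource,
       status, cmd, hostname, program_name, loginame, open_tran, last_batch
FROM master.dbo.sysprocesses (NOLOCK)
WHERE blocked <> 0

-- Head blockers: spids that block others but are not blocked themselves
SELECT spid, status, cmd, open_tran, last_batch, hostname, program_name
FROM master.dbo.sysprocesses (NOLOCK)
WHERE blocked = 0
  AND spid IN (SELECT blocked
               FROM master.dbo.sysprocesses (NOLOCK)
               WHERE blocked <> 0)
```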
HTH
Olu Adedeji
"Stephen Brown" <nospam@.telusplanet.net> wrote in message
news:cin2ht$rbf$1@.utornnr1pp.grouptelecom.net...
> [snip: verbatim quote of the original post above]
