Saturday, February 25, 2012

Decaying index?

We had an issue yesterday that we resolved but I would like some input as to
whether this is normal behavior for a database. I have had a SQL Server
backend on a web application that has been working flawlessly for about a
year. The app seemed to get a tiny bit slower each month, but it was no
worry in that no page ever took more than a second to return. One page was
taking about 10 seconds to return a full query, but it was still well within
a tolerable delay (and a filter was added to allow the user to speed it up).
Suddenly, yesterday the same query was taking about a minute to execute and
was causing the page to time out. We dug around and narrowed it down to one
table that has some 15,000 records and we started to look at the design of
the table. We noticed that the table had a single index, the primary key,
and it was created non-clustered.
We found that we couldn't duplicate the problem if we backed up the database
and restored it to a different server, but the problem would be the same if
we restored it to a different database on the same server. The indexes had
been rebuilt by the maintenance plan the night before, but rebuilding them
again didn't fix the problem, either. Then we set the index as clustered
and the problem vanished.
Is this a typical database maintenance thing? Did we goof when we setup the
index in the first place and a good DBA could have caught this right away?
Was this just some weird random chance happening? Is it likely that the
indexes were corrupted or might this have just been a fluke fix and the
problem could crop up again? I am just a little leary of installing this
application on other servers if I might get a call a year from now that
there is another problem like this and would like to know if it is
inevitable or if we can do something to ensure it doesn't happen again.Although you problem is strange... remember that SQL will use a nonclustered
index IF, the query is very selective ( perhaps < 5% of the records being
returned)... THis might account for SQL NOT using the nonclust, but using
the clust... it does NOT account for the same/different server you mention.
"Stephen Brown" <nospam@.telusplanet.net> wrote in message
news:cPL0b.168$f7.69628@.localhost...
> We had an issue yesterday that we resolved but I would like some input as
to
> whether this is normal behavior for a database. I have had a SQL Server
> backend on a web application that has been working flawlessly for about a
> year. The app seemed to get a tiny bit slower each month, but it was no
> worry in that no page ever took more than a second to return. One page
was
> taking about 10 seconds to return a full query, but it was still well
within
> a tolerable delay (and a filter was added to allow the user to speed it
up).
> Suddenly, yesterday the same query was taking about a minute to execute
and
> was causing the page to time out. We dug around and narrowed it down to
one
> table that has some 15,000 records and we started to look at the design of
> the table. We noticed that the table had a single index, the primary key,
> and it was created non-clustered.
> We found that we couldn't duplicate the problem if we backed up the
database
> and restored it to a different server, but the problem would be the same
if
> we restored it to a different database on the same server. The indexes
had
> been rebuilt by the maintenance plan the night before, but rebuilding them
> again didn't fix the problem, either. Then we set the index as clustered
> and the problem vanished.
> Is this a typical database maintenance thing? Did we goof when we setup
the
> index in the first place and a good DBA could have caught this right away?
> Was this just some weird random chance happening? Is it likely that the
> indexes were corrupted or might this have just been a fluke fix and the
> problem could crop up again? I am just a little leary of installing this
> application on other servers if I might get a call a year from now that
> there is another problem like this and would like to know if it is
> inevitable or if we can do something to ensure it doesn't happen again.
>|||>> Did we goof when we setup the index in the first place
and a good DBA could have caught this right away?
Nope - and it looks like you were correct for your data
distribution until recently.
You should look at the query plan. A clustered index has
the data page as the leaf level but a non-clustered index
needs an extra read to get the data if the index is not
covering. This means that a sometimes a table scan can be
more efficient than using an index (especially for a small
table like this) but a clustered index in the same
situation would be choosen.
If the index is covering then a non-clustered index would
probably be better.
It's difficult to predict the indexes the optimiser will
use (even when it gets it right - was a lot easier in the
good old days of 6.5) so you really have to look at the
query plans - a lot easier if all access is via stored
procs.
You have to keep monitoring systems and spot things like
this early.|||> We found that we couldn't duplicate the problem if we backed up the
database
> and restored it to a different server, but the problem would be the
same if
> we restored it to a different database on the same server.
Are both servers at the same SQL service pack level? Do they have the
same number of CPUs? I suggest you compare the execution plans in Query
Analyzer. I would expect you will find a difference to explain the
suboptimal performance. All things being equal, you should get
identical plans and performance. Execution plan caching that would
explain the difference with the original database but this shouldn't be
an issue with a newly restored database.
> Is this a typical database maintenance thing?
Nope. However, it is likely that the maintenance provided new
statistics for the optimizer and a different execution plan resulted.
> Did we goof when we setup the
> index in the first place and a good DBA could have caught this right
away?
If you have only one index on a table, it should be clustered unless you
have a specific reason to do otherwise. The SQL Server optinizer
behaves best when a table has a clustered index and you should choose a
table's clustered index thoughtfully. You are more likely to get a
sub-optimal execution plan when you don't have a clustered index and/or
inappropriate indexes.
--
Hope this helps.
Dan Guzman
SQL Server MVP
--
SQL FAQ links (courtesy Neil Pike):
http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
http://www.sqlserverfaq.com
http://www.mssqlserver.com/faq
--
"Stephen Brown" <nospam@.telusplanet.net> wrote in message
news:cPL0b.168$f7.69628@.localhost...
> We had an issue yesterday that we resolved but I would like some input
as to
> whether this is normal behavior for a database. I have had a SQL
Server
> backend on a web application that has been working flawlessly for
about a
> year. The app seemed to get a tiny bit slower each month, but it was
no
> worry in that no page ever took more than a second to return. One
page was
> taking about 10 seconds to return a full query, but it was still well
within
> a tolerable delay (and a filter was added to allow the user to speed
it up).
> Suddenly, yesterday the same query was taking about a minute to
execute and
> was causing the page to time out. We dug around and narrowed it down
to one
> table that has some 15,000 records and we started to look at the
design of
> the table. We noticed that the table had a single index, the primary
key,
> and it was created non-clustered.
> We found that we couldn't duplicate the problem if we backed up the
database
> and restored it to a different server, but the problem would be the
same if
> we restored it to a different database on the same server. The
indexes had
> been rebuilt by the maintenance plan the night before, but rebuilding
them
> again didn't fix the problem, either. Then we set the index as
clustered
> and the problem vanished.
> Is this a typical database maintenance thing? Did we goof when we
setup the
> index in the first place and a good DBA could have caught this right
away?
> Was this just some weird random chance happening? Is it likely that
the
> indexes were corrupted or might this have just been a fluke fix and
the
> problem could crop up again? I am just a little leary of installing
this
> application on other servers if I might get a call a year from now
that
> there is another problem like this and would like to know if it is
> inevitable or if we can do something to ensure it doesn't happen
again.
>|||It's very odd that restoring to a different server fails to show the problem
but restoring to a different database on the same server still has the
problem. That suggests it has nothing to do with a corrupt index. It
suggests it has more to do with the amount of memory on the machines, I/O
topologies, and interactions with other work going on in that instance of
SQL Server.
Your overall behavior would not be surprising if you are doing range scans
on the non-clustered index. Since there is no ordering to how the rows
themselves are stored in that index over time it is likely to become
increasingly random. For example, on an initial load, where you happened to
insert in sorted order of the primary key, the rows would be effectively
clustered together by primary key. Oversimplifying to make my point, if you
fit 20 rows per page and typically retrieved 15 rows in the query you'd
average ~1.5 I/Os to answer the query (the .5 covering the case where you
had to cross page boundaries). With inserts, updates, and deletes over time
though you'd lose the ordering. That same 15 row query would eventually
take ~15 I/Os. So you have an order of magnitude increase. Rebuilding the
non-clustered index doesn't help because the rows themselves aren't stored
as part of the index, they are in heap storage. With a clustered index you
don't have this problem. The rows are stored in the index. Essentially,
you'll stay at an average of ~1.5 I/Os indefinitely. The benefit of a
clustered index grows as the number of rows retrieved grows. In the
non-clustered index example 100 rows could lead to ~100 I/Os. In the
clustered index example, assuming a freshly rebuilt or defragmented index,
since SQL Server actually does 64K I/Os the entire 100 rows might come into
memory on a single I/O. Again allowing for some rows to be outside the
range, you stay at ~1.5 I/Os!
SQL Server has all kinds of algorithms to reduce I/O which I ignore in the
above example. I also ignored that there may be additional I/Os to retrieve
index nodes themselves. Severe fragmentation might cause my clustered index
example to grow slightly, but indexdefrag or rebuilding the index does fix
this. More memory implies better caching and more readahead, which is why
you might have seen a difference depending on which server you tried this
on. Spreading data over more drives (via files in filegroups not RAID) will
also increase readahead.
The key to non-clustered indices is to make they reduce the number of rows
that must actually be retrieved to an absolute minimum. The best case is
where the key is unique and you are doing exact key matches. The next best
case is where one or more indexes really reduce the number of rows to
retrieve rather dramatically. If you have a non-clustered index on
First-Name and ask the query "Select * from t1 where First-Name = 'John' and
Last-Name = 'Cohen' and City = 'New York'" your performance will be terrible
because you'll retrieve everyone with the first name John, then filter for
Cohen and New York. Things will be little better if your only NC index is
on Last-Name. It will be worse if it's on City. But if you have an index
on all three then the query processor will likely choose to intersect two or
three indexes and only retrieve the data rows that match all of them. In
fact, there is only one John Cohen in New York City (according to QwestDex)
so using all three indexes would get you down to a single row retrieval
whereas having only one of those indices would lead to perhaps thousands of
I/Os to answer this query.
So for your example you need to look at the data and the key queries and
rethink your index strategies. Perhaps a single clustered index is
sufficient. Adding a small number of non-clustered indices to that may
further improve performance and maintain it over time.
--
Hal Berenson, SQL Server MVP
True Mountain Group LLC
"Stephen Brown" <nospam@.telusplanet.net> wrote in message
news:cPL0b.168$f7.69628@.localhost...
> We had an issue yesterday that we resolved but I would like some input as
to
> whether this is normal behavior for a database. I have had a SQL Server
> backend on a web application that has been working flawlessly for about a
> year. The app seemed to get a tiny bit slower each month, but it was no
> worry in that no page ever took more than a second to return. One page
was
> taking about 10 seconds to return a full query, but it was still well
within
> a tolerable delay (and a filter was added to allow the user to speed it
up).
> Suddenly, yesterday the same query was taking about a minute to execute
and
> was causing the page to time out. We dug around and narrowed it down to
one
> table that has some 15,000 records and we started to look at the design of
> the table. We noticed that the table had a single index, the primary key,
> and it was created non-clustered.
> We found that we couldn't duplicate the problem if we backed up the
database
> and restored it to a different server, but the problem would be the same
if
> we restored it to a different database on the same server. The indexes
had
> been rebuilt by the maintenance plan the night before, but rebuilding them
> again didn't fix the problem, either. Then we set the index as clustered
> and the problem vanished.
> Is this a typical database maintenance thing? Did we goof when we setup
the
> index in the first place and a good DBA could have caught this right away?
> Was this just some weird random chance happening? Is it likely that the
> indexes were corrupted or might this have just been a fluke fix and the
> problem could crop up again? I am just a little leary of installing this
> application on other servers if I might get a call a year from now that
> there is another problem like this and would like to know if it is
> inevitable or if we can do something to ensure it doesn't happen again.
>|||Hi Stephan,
"Stephen Brown" <nospam@.telusplanet.net> wrote in message
news:cPL0b.168$f7.69628@.localhost...
>I have had a SQL Server
>backend on a web application that has been working flawlessly for about a
>year. The app seemed to get a tiny bit slower each month, but it was no
>worry in that no page ever took more than a second to return.
It's nothing to 'worry' about but definetely a thing to look into and
understand the reasons for this trend.
I guess you learned that lesson the hard way :-)
> We noticed that the table had a single index, the primary key,
> and it was created non-clustered.
If a table has no clustered index, it is a heap which means that there is no
logical order to the rows.
performance issues with heaps are basically physical defragmantation and
forwarding pointers for rows that had to be moved off their original pages.
This will naturally increase over time if not taken care of.
> The indexes had
> been rebuilt by the maintenance plan the night before, but rebuilding them
> again didn't fix the problem, either. Then we set the index as clustered
> and the problem vanished.
Rebuilding the non clustered index will not solve either problem mentioned
above since the table is a heap.
In a clustered index, the rows have to be logically ordered by the
clustering key so creating the clustered index solved both problems as it
made SQL Server rearrange all rows.
In order to get rid of the forwarding pointers and the physical
defragmantation of a heap, you can issue the DBCC SHRINKDATABASE or
SHRINKFILE and use the NOTRUNCATE option (Look in BOL for more details).
This will 'defragment' the pages in the data file, rearrange the rows and
get rid of all forwarding pointers.
Remember that clustered indexes also need maintenance. Those will be taken
care of by the agent's 'auto' maintenance plan index rebuild job.
> Is this a typical database maintenance thing?
Yes it is. performance will usually degrade due to defragmentation at
various levels, depending on the nature of the activity in your DB and your
index design.
> I am just a little leary of installing this
> application on other servers if I might get a call a year from now that
> there is another problem like this and would like to know if it is
> inevitable or if we can do something to ensure it doesn't happen again.
There is nothing to worry about. A proper maintenance strategy performed
regularly will keep your DB up and running in good shape for years and
years.
If you want to dig deeper into this issue, I recommend Kalen Delaney's
'Inside SQL Server 2000'.
It contains in depth information about index and data structures among other
topics.
HTH.|||I just reread your post and the other replies and realized I have missed the
'suddenly yesterday' issue :-)
If your table had not experienced any major updates yesterday, i guess my
suggestion is invalid in your case although it is still true and a good
practice...
HTH :-)
"Amy" <NOSPAM_l.a@.usa.com> wrote in message
news:Or0JHTzZDHA.3768@.tk2msftngp13.phx.gbl...
> Hi Stephan,
> "Stephen Brown" <nospam@.telusplanet.net> wrote in message
> news:cPL0b.168$f7.69628@.localhost...
> >I have had a SQL Server
> >backend on a web application that has been working flawlessly for about a
> >year. The app seemed to get a tiny bit slower each month, but it was no
> >worry in that no page ever took more than a second to return.
> It's nothing to 'worry' about but definetely a thing to look into and
> understand the reasons for this trend.
> I guess you learned that lesson the hard way :-)
> > We noticed that the table had a single index, the primary key,
> > and it was created non-clustered.
> If a table has no clustered index, it is a heap which means that there is
no
> logical order to the rows.
> performance issues with heaps are basically physical defragmantation and
> forwarding pointers for rows that had to be moved off their original
pages.
> This will naturally increase over time if not taken care of.
> > The indexes had
> > been rebuilt by the maintenance plan the night before, but rebuilding
them
> > again didn't fix the problem, either. Then we set the index as
clustered
> > and the problem vanished.
> Rebuilding the non clustered index will not solve either problem mentioned
> above since the table is a heap.
> In a clustered index, the rows have to be logically ordered by the
> clustering key so creating the clustered index solved both problems as it
> made SQL Server rearrange all rows.
> In order to get rid of the forwarding pointers and the physical
> defragmantation of a heap, you can issue the DBCC SHRINKDATABASE or
> SHRINKFILE and use the NOTRUNCATE option (Look in BOL for more details).
> This will 'defragment' the pages in the data file, rearrange the rows and
> get rid of all forwarding pointers.
> Remember that clustered indexes also need maintenance. Those will be taken
> care of by the agent's 'auto' maintenance plan index rebuild job.
> > Is this a typical database maintenance thing?
> Yes it is. performance will usually degrade due to defragmentation at
> various levels, depending on the nature of the activity in your DB and
your
> index design.
> > I am just a little leary of installing this
> > application on other servers if I might get a call a year from now that
> > there is another problem like this and would like to know if it is
> > inevitable or if we can do something to ensure it doesn't happen again.
> There is nothing to worry about. A proper maintenance strategy performed
> regularly will keep your DB up and running in good shape for years and
> years.
> If you want to dig deeper into this issue, I recommend Kalen Delaney's
> 'Inside SQL Server 2000'.
> It contains in depth information about index and data structures among
other
> topics.
> HTH.
>

No comments:

Post a Comment