It looks like T-SQL Tuesday is here again, brought to us this time by Brad Schultz (blog) on the topic of stored procedures and functions. It’s been a while since I joined this blog party (or blogged at all for that matter), but this topic jumped out at me, so here we go.
I was lucky enough to attend the PASS conference in Seattle a few weeks ago. A recurring theme in several of the sessions I attended was how to deal with bad query plans for stored procedures; or more specifically, how to deal with bad plans created as a result of parameter sniffing. This topic had a direct impact on an issue I was dealing with at the time.
Here’s the scenario:
A particular stored procedure that runs very often and usually runs very fast decides one day to start running much slower. This drives up CPU on the server and causes timeouts in the application. Manually updating statistics on the tables the procedure references fixes the problem temporarily, but it recurs the next day, and again a few days later.
Digging in, we found that this stored procedure contains a single parameterized SELECT query. The query joins several Product-related tables, with the final join to a ProductCategory table on ProductID. @CatID is an input parameter to the stored procedure and appears in the WHERE clause of the query.
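The shape of the procedure was roughly like this (the procedure, table, and column names here are illustrative, not the real schema):

```sql
-- Sketch of the procedure's shape; names are hypothetical
CREATE PROCEDURE dbo.GetProductsByCategory
    @CatID INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT p.ProductID, p.ProductName, pd.Description
    FROM dbo.Product AS p
    JOIN dbo.ProductDetail AS pd
        ON pd.ProductID = p.ProductID
    JOIN dbo.ProductCategory AS pc
        ON pc.ProductID = p.ProductID
    WHERE pc.CatID = @CatID;   -- the sniffed parameter
END;
```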
Looking deeper, we found that most of the CatID values appear in the ProductCategory table 100K or more times, but there are also some (new) CatIDs that do not appear in ProductCategory at all.
When the procedure is recompiled, we found that the plan created is highly dependent on the value of the input parameter, namely @CatID. There were two possibilities:
- The CatID passed in is one that exists in ProductCategory.
In this case, the plan created was fine: a clustered index seek on ProductCategory using ProductID and CatID, followed by a nested loop join.
- The CatID passed in is one that does not exist in ProductCategory.
Here, SQL Server created a plan that did a seek on a non-clustered index on ProductCategory(CatID) for that specific CatID, with an estimated row count of 1. That plan works fine when there are no (or very few) rows for the CatID. However, when the plan is reused for a CatID with 100K+ rows, it performs 100K+ individual seeks, each followed by a key lookup, which is very slow and very bad…
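One way to see this behavior is to force a recompile and then call the procedure with each kind of value; the plan compiled for the first call is reused for the second. The procedure name and parameter values below are illustrative:

```sql
-- Force the next execution to compile a fresh plan
EXEC sp_recompile N'dbo.GetProductsByCategory';

-- Compiled for a CatID with no rows: estimated row count = 1,
-- non-clustered index seek plus key lookups
EXEC dbo.GetProductsByCategory @CatID = 99999;

-- Reuses that same cached plan -- now 100K+ seeks and lookups
EXEC dbo.GetProductsByCategory @CatID = 42;
```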
It turned out that the optimizer was “sniffing” the value of CatID, and then using that sniffed value to optimize the plan. So, how did we fix this?
Before I reveal the solution we used, here are some of the potential solutions to the parameter sniffing problem I compiled from several PASS sessions, with the bulk coming from an excellent session by Gail Shaw (blog).
- Break the stored procedure into multiple stored procedures (i.e., a separate one for each condition)
- Rewrite using dynamic SQL with sp_executesql – this will create multiple query plans by condition. Erland Sommarskog (blog) has a couple of good references on this solution
- Assign the input parameters to local variables inside the procedure, and use the local variables in the query
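As a rough sketch of the dynamic SQL option (the query and branching condition here are hypothetical): because cached plans are keyed by the exact query text, varying the text per condition gives each condition its own plan, while sp_executesql keeps the query parameterized.

```sql
-- Inside the procedure body; @CatID is the procedure's input parameter.
DECLARE @sql NVARCHAR(MAX) = N'
    SELECT p.ProductID, p.ProductName
    FROM dbo.Product AS p
    JOIN dbo.ProductCategory AS pc ON pc.ProductID = p.ProductID
    WHERE pc.CatID = @CatID';

-- Make the query text differ per condition, so each condition
-- compiles and caches its own plan
IF EXISTS (SELECT 1 FROM dbo.ProductCategory WHERE CatID = @CatID)
    SET @sql += N' /* common CatID */';
ELSE
    SET @sql += N' /* rare CatID */';

EXEC sys.sp_executesql @sql, N'@CatID INT', @CatID = @CatID;
```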
For our issue, we chose the third option, and assigned the input parameters to local variables. This has the advantage of being a simple, relatively non-invasive change that could be implemented quickly.
Basically, we took the CatID input parameter, assigned it to a local variable defined in the stored procedure, and passed that local variable to the query. This single level of indirection was enough to prevent SQL Server from sniffing the parameter; instead, it created a generic plan based on the statistics of the table. That generic plan was the good plan described above, which was exactly the plan we needed all the time.
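In code, the change amounted to something like this (again with illustrative names):

```sql
-- Copy the parameter into a local variable. The optimizer cannot
-- sniff @LocalCatID's value at compile time, so it builds a generic
-- plan from the column's overall statistics instead.
CREATE PROCEDURE dbo.GetProductsByCategory
    @CatID INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @LocalCatID INT;
    SET @LocalCatID = @CatID;          -- the level of indirection

    SELECT p.ProductID, p.ProductName
    FROM dbo.Product AS p
    JOIN dbo.ProductCategory AS pc
        ON pc.ProductID = p.ProductID
    WHERE pc.CatID = @LocalCatID;      -- generic plan, no sniffing
END;
```

The trade-off is that the plan is no longer tailored to any particular value, but in our case the generic plan was the one we wanted every time.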
I would be remiss if I neglected to mention that parameter sniffing is a good thing most of the time, and very often gives excellent results. In cases where it causes problems though, it seems to cause big problems…
Thanks for reading, and happy Tuesday!