For May 2011, T-SQL Tuesday is brought to us by Chicago’s own Bob Pusateri (blog | @SQLBob). Bob chose common table expressions, otherwise know as CTEs as the topic of the month. Thanks for hosting Bob, and for the great choice of topics.
I’ve been a fan of common table expressions, or CTEs, ever since they were introduced in SQL Server 2005. Originally they took a little getting used to. In my opinion though, once you’ve used them for a while, they get easier and eventually feel more natural than using sub-queries in many cases.
One of the coolest thing about CTEs is that they allow for the use of recursion in SQL queries, something that wasn’t possible before CTEs entered the picture. When code is recursive, it simply means that the code can reference itself – and it can do so repeatedly to get a desired result. BOL has a decent description and example of recursive CTEs here.
The following is an example taken from a real-world project using a recursive CTE to do what I’m calling a “UNGROUP BY” operation. I’m sure that sounds a little odd and probably requires some explanation.
We have two staging tables that are part of the load process for a Sales database. The ImportHeader table contains customer info, the ImportDetail table holds product info, and the two are joined together on a TicketNumber. This is a standard one-to-many relationship, so for each ImportHeader record, there can be multiple ImportDetail records representing the different products purchased by a customer on a given sales ticket.
The tricky part is there is also a Qty field on each ImportDetail record representing the quantity. If a customer purchases multiple instances of the same product, the Qty field is set to the quantity purchased, rather than adding another line to the detail.
What makes that tricky is that the sales system is expecting exactly one line for each product purchased; it doesn’t know anything about a quantity field. So in effect, we need to “ungroup” the detail records where Qty > 1.
Make sense? The example code is shown below.
First, let’s create the tables and populate with some sample data.
CREATE TABLE dbo.ImportHeader ( ImportHeaderID INT PRIMARY KEY IDENTITY(1,1), TicketNumber VARCHAR(50), CustomerID INT, FName VARCHAR(50), LName VARCHAR(50), PostalCode VARCHAR(50) ); CREATE TABLE dbo.ImportDetail ( ImportDetailID INT PRIMARY KEY IDENTITY(1,1), TicketNumber VARCHAR(50), BrandName VARCHAR(50), ModelNumber VARCHAR(50), Price NUMERIC(12,2), Qty INT ); INSERT dbo.ImportHeader (TicketNumber, CustomerID, FName, LName, PostalCode) SELECT '1000', 112, 'Bob', 'Loblaw', '60654' UNION ALL SELECT '1001', 213, 'Carl', 'Weathers', '60654' UNION ALL SELECT '1002', 314, 'Steve', 'Holt', '60654' ; INSERT dbo.ImportDetail (TicketNumber, BrandName, ModelNumber, Price, Qty) SELECT '1000', 'Sony', 'KDL1000', '1999.99', 3 UNION ALL SELECT '1001', 'Sanyo', 'KDL1005', '1799.99', 1 UNION ALL SELECT '1002', 'Sanyo', 'KDL1005', '1799.99', 2 UNION ALL SELECT '1002', 'Samsung', 'KDL1011', '1699.99', 1 ;
Three customers were added with 3 different ticket numbers. The customers purchased 4 products, with varying quantities of each.
Now we can run a query to ungroup the data using a recursive CTE.
WITH Products AS ( SELECT TicketNumber, BrandName, ModelNumber, Price, Qty, 1 AS n FROM dbo.ImportDetail WHERE (Qty > 0) UNION ALL SELECT TicketNumber, BrandName, ModelNumber, Price, Qty, n + 1 FROM Products WHERE (Qty > n) ) SELECT ROW_NUMBER() OVER(ORDER BY a.TicketNumber, b.n) as RowNum, a.TicketNumber, a.CustomerID, a.FName, a.LName, a.PostalCode, b.BrandName, b.ModelNumber, b.Price, b.n FROM dbo.ImportHeader a JOIN Products b ON a.TicketNumber = b.TicketNumber ORDER BY RowNum ;
The recursive portion of the query works like this: it includes all of the detail records at the first level (Qty > 0), then unions that with the detail records where Qty > 1, then unions that with the detail records where Qty > 2, etc. This continues until there are no records left where Qty > n.
As you can see in the output below, each line represents a single product purchased by a customer. The records have been successfully ungrouped.
Recursive code is often described as elegant in its simplicity. If you were to compare this query to an iterative solution for the same task (which I didn’t write but try to imagine), you would probably agree.
One thing to keep in mind is that SQL Server puts a limit on the number of recursions that can happen. This is done as a safety measure; and by default, SQL sets this limit to 100. If you were change the Qty for the first detail record from 3 to say, 142, you’ll get a nice error message like this:
Msg 530, Level 16, State 1, Line 45
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
The value for the recursion limit can be changed using a query hint like so:
OPTION (MAXRECURSION 500), which would (obviously) change the limit to 500 levels.
Using OPTION (MAXRECURSION 0) removes the limit. This can be dangerous if a coding error causes you to end up with a circular reference. It’s always better to throw an error than to go into an infinite loop.
Thanks for reading, and happy Tuesday!