T-SQL Tuesday #026 – The Wrap Up

Thank you everyone who participated in T-SQL Tuesday #026! I really enjoyed hosting and reading all of the contributions. Thanks also to Adam Machanic (blog | @AdamMachanic) for starting this monthly event and for letting me run the show this time around.

We had 14 posts, covering about 10 different topics depending on how you count. Here’s a brief summary of them all, in the order received.

1. The first post was from Rob Farley (blog | @rob_farley), a T-SQL Tuesday regular and one of my favorite SQL bloggers. He writes about a limitation he ran into using APPLY when dealing with working calculations, something he had written about in an earlier post.
Original T-SQL Tuesday 17 / 25

2. Sebastian Meine (blog | @sqlity) delves into some index internals, explaining SQL Server’s B+ tree structure, and how that is different from a standard B-Tree.
Original T-SQL Tuesday 10 / 11

3. Steve Wales (blog | @walessj) tells a tale of T-SQL Tuesday, managing to incorporate references to all the past 25 events – very creative! He also let us in on his 2012 resolutions, which match up with mine in a lot of ways (except for that one with the “O” word :) )
Original T-SQL Tuesday 14

4. After a brief intro in French and some cool New Year’s Eve images, Jason Brimhall (blog | @sqlrnnr) covers several ways to perform JOINs on tables where a bitmask is applied to an ID field.
Original T-SQL Tuesday 23

5. I enjoyed reading Jason Grobe’s (blog | @sqlmental) resolutions, which cover personal, professional, and community-oriented goals. More power to you Jason!
Original T-SQL Tuesday 14

6. Nancy Hidy Wilson (blog | @NancyHidyWilson) extols the virtues of her favorite “new” features in SQL 2008 R2: CMS (Central Management Server) and PBM (Policy Based Management).
Original T-SQL Tuesday 7

7. Mike Fal (blog | @Mike_Fal) shows a trick for dynamically generating a restore command using the output of a RESTORE FILELISTONLY command. Very cool.
Original T-SQL Tuesday 25

8. Robert Pearl (blog | @PearlKnows) shows us the “best of all worlds” with a very comprehensive query against the missing index DMVs.
Original T-SQL Tuesday 25

9. Aaron Bertrand (blog | @AaronBertrand) shares some work he did at a previous job to implement customer time zone logic at the database layer, simplifying the presentation to the application.
Original T-SQL Tuesday 22

10. Steve Jones (blog | @way0utwest) discusses a topic I run into all the time when reviewing code: how to deal with the time component of a date range correctly.
Original T-SQL Tuesday 1

11. Colleen Morrow (blog | @ClevelandDBA) shares some of her new year’s resolutions with us as well as her strategy for keeping up with them. I really like her idea of breaking them down into monthly chunks so they don’t seem so daunting.
Original T-SQL Tuesday 14

12. My (blog | @daveh0ward) contribution was a brief discussion of what I think experienced DBAs provide to an organization.
Original T-SQL Tuesday 12

13. Carlos Bossy (blog | @carlosbossy) covers a topic I’m excited to learn more about: the new windowing functions in SQL 2012, specifically in dealing with sliding aggregations.
Original T-SQL Tuesday 16

14. Ricardo Leka (blog | @bigleka) shares an improvement to a previous SQLCMD script he wrote to allow for creating a database mirror in a more dynamic way.
Original T-SQL Tuesday 25

Thanks again to everyone who contributed! See you next month.

Posted in Uncategorized | 6 Comments

T-SQL Tuesday #26 – DBA Skillz

Since I’m hosting T-SQL Tuesday this month, I really wanted to participate as well. I chose a topic I missed: Why are DBA Skills Necessary? It was originally hosted by Paul Randal (blog | @PaulRandal) back in November of 2010.

This is an interesting topic for me because I’ve spent time on both sides of the fence, both as a developer and a DBA. My current role is primarily as a production DBA with many mission critical SQL Servers in-house. I’ve also worked in smaller companies that didn’t have a full-time DBA, but rather had multiple developers with some DBA experience. My contention is that not every organization needs a DBA, but every organization that has critical databases needs someone with DBA skills.

Below, I attempted to quickly come up with a list of skills a DBA should bring to a company. These are skills that companies need (to varying degrees) and that experienced DBAs should have (again, to varying degrees). They came off the top of my head, in no particular order:

  • Designer of physical architecture. Answers questions like how much memory do we need? What RAID level should we use?
  • Has knowledge of proper database design e.g. normalization, referential integrity
  • Protector of the data
  • Has knowledge of the business and how changes to the databases affect the business
  • Reviewer of code and schema changes with database performance and manageability in mind
  • Acts as a layer of security to meet regulatory compliance and to avoid company exposure to liability
  • Expert in DR/HA, Backup/Restore strategy, meeting recovery time SLAs
  • Saver of money by getting the most out of the current hardware
  • Integrator of data between systems
  • Knows what is going on behind the scenes, internals
  • Learns constantly
  • Keeps up with changes to SQL Server and to trends in the industry
  • Supporter of users
  • Monitors proactively, responds to issues before they become problems or big problems
  • Trainer of junior DBAs and developers in doing things the right way
  • Enforcer of standards
  • Lastly, someone who knows what to do when everything goes wrong

The world is all about data these days; data always seems to be the most important thing, and the amount of data is always growing. What would your company do if it lost its list of pending orders, or if the customer list fell into the wrong hands, or if a database became corrupt and there were no good backups to restore from?

Or any number of other things. DBA skills are as important now as ever.

Posted in Uncategorized | 1 Comment

T-SQL Tuesday in Local Time

The January edition of T-SQL Tuesday is next week! If you knew that, then you probably already know that the start and end times for T-SQL Tuesday are always at midnight UTC time, and that finding the current UTC date in SQL is as simple as using the GETUTCDATE() function.

SELECT GETUTCDATE();

You can obviously compare the two, but for any T-SQL Tuesday, the questions I really want answered are: “When does it start (in local time)?” and “How much time do I have before it starts and until it ends?” The script below will answer those questions.

This script runs for the current month, and assumes that T-SQL Tuesday is on the second Tuesday of the month (which it *usually* is). And for obvious reasons, it will only work on SQL 2008 and later.

Running it just now on my machine gave the following output:

Local time now: Jan 5 2012 8:09AM
The start of T-SQL Tuesday for this month is Jan 9 2012 6:00PM and the end is Jan 10 2012 6:00PM local time.
You have 4 days, 9 hours, 50 mins, and 53 seconds before it starts. Better get moving!

DECLARE 
	@Now DATETIMEOFFSET(0), 
	@TZ INT,
	@FDOM DATETIMEOFFSET(0), 
	@2ndTues DATETIMEOFFSET(0), 
	@TuesStart AS DATETIMEOFFSET(0), 
	@TuesEnd AS DATETIMEOFFSET(0),
	@Msg VARCHAR(MAX);
	
-- Save the current datetime and offset	
SELECT @Now = 
	SYSDATETIMEOFFSET(), @TZ = DATEPART(TZ,SYSDATETIMEOFFSET());

-- Get the first day of the current month (in UTC)
SET @FDOM = 
	DATEADD(dd, DATEDIFF(dd, 0, DATEADD(DAY, -1 * (DATEPART(DAY,@Now) - 1), @Now)), 0) ;

-- Get the second Tuesday of the current month and start/end datetimes (in UTC)
SET @2ndTues = 
	DATEADD(DAY, (7 + (10 - DATEPART(weekday,(DATEADD(DAY, 7, @FDOM)))) % 7), @FDOM);
SELECT @TuesStart = CAST(CAST(@2ndTues AS DATE) AS DATETIMEOFFSET(0)),
	@TuesEnd = CAST(CAST(DATEADD(day, 1, @2ndTues) AS DATE) AS DATETIMEOFFSET(0));

-- Display T-SQL Tuesday datetimes in local time	
SET @Msg = 'Local time now: ' + CAST(CONVERT(DATETIME, @Now) AS VARCHAR);
PRINT @Msg;
SET @Msg = 'The start of T-SQL Tuesday for this month is ' 
	+ CAST(CONVERT(DATETIME, SWITCHOFFSET(@TuesStart, @TZ)) AS VARCHAR) 
	+ ' and the end is ' + CAST(CONVERT(DATETIME, SWITCHOFFSET(@TuesEnd, @TZ)) AS VARCHAR) 
	+ ' local time.';
PRINT @Msg;

-- Countdown stats
IF @Now < @TuesStart
BEGIN
	SET @Msg = 'You have ' + CAST(DATEDIFF(day, @Now, @TuesStart)-1 AS VARCHAR) + ' days, ' 
		+ CAST((DATEDIFF(hour, @Now, @TuesStart) % 24)-1 AS VARCHAR) + ' hours, ' 
		+ CAST((DATEDIFF(minute, @Now, @TuesStart) % 60)-1 AS VARCHAR) + ' mins, and ' 
		+ CAST((DATEDIFF(second, @Now, @TuesStart) % 60) AS VARCHAR) 
		+ ' seconds before it starts. Better get moving!';
	PRINT @Msg;
END
ELSE IF @Now >= @TuesStart AND @Now <= @TuesEnd
BEGIN
	SET @Msg = 'You have ' + CAST(DATEDIFF(day, @Now, @TuesEnd)-1 AS VARCHAR) + ' days, ' 
		+ CAST((DATEDIFF(hour, @Now, @TuesEnd) % 24)-1 AS VARCHAR) + ' hours, ' 
		+ CAST((DATEDIFF(minute, @Now, @TuesEnd) % 60)-1 AS VARCHAR) + ' mins, and ' 
		+ CAST((DATEDIFF(second, @Now, @TuesEnd) % 60) AS VARCHAR) 
		+ ' seconds before T-SQL Tuesday ends. Publish!';
	PRINT @Msg;
END
ELSE IF @Now > @TuesEnd
	PRINT 'T-SQL Tuesday is over for this month.  See you next month!';

Confession: I really did this as an excuse to learn about some of the new date functionality in SQL 2008+ that I haven’t had a chance to use yet, like the DATETIMEOFFSET data type and the SYSDATETIMEOFFSET() function, but hopefully it’s helpful as well.

Thanks, and remember, the clock is ticking!

Posted in Uncategorized | 1 Comment

TSQL Tuesday #026 – Second Chances

UPDATE: The Wrap-Up is here.

Happy New Year everyone! Welcome to 2012.

Every new year brings with it a fresh start, a second chance to do things differently.  In the spirit of second chances, for this month’s T-SQL Tuesday, please pick any one of the previous 25 T-SQL Tuesday topics and write about it. You can choose one of the topics you participated in before, or you can choose one you missed for some reason. 

Maybe you were just too busy when Michael Swart invited everyone to write about indexes. Or maybe you weren’t quite happy with the CTE post you came up with when Bob Pusateri hosted.  Or maybe you have a ton to say about IO, but like me, you didn’t know anything about T-SQL Tuesday until long after Mike Walsh’s month.

Well now is your chance to set things straight!  In fact, if you like, you can even throw down a few resolutions for the new year (see Jen McCown’s #14).  And if this is your first T-SQL Tuesday, just pick a topic and jump in.

As with any T-SQL Tuesday, the standard rules apply:

  • Your post must go live next Tuesday GMT (between 00:00:00 GMT on Tuesday January 10 and 00:00:00 GMT on Wednesday January 11).
  • Your post should link back to this post, and you should leave a comment or a trackback here regarding your post.
  • “T-SQL Tuesday #026” should be included in the title of the post.
  • If you’re on Twitter, tweet about your post using the hash tag #TSQL2sDay

Since this month is a bit unusual, here are a couple additional rules:

  • Your post should be new in content; in other words, don’t just repost a previous T-SQL Tuesday entry (yours or anyone else’s :) )
  • Somewhere in your post, you should say what T-SQL Tuesday topic name and/or number you chose to write about.

I can’t wait to see which topics everyone chooses and the posts that come out of those.  And of course after it’s done, I’ll summarize the posts on this blog within a couple of days.

To refresh your memory, below is a history of all T-SQL Tuesday topics to date.  It’s a pretty amazing list.

  1. Adam Machanic – Date/Time Tricks
  2. Adam Machanic – A Puzzling Situation
  3. Rob Farley – Relationships
  4. Mike Walsh – IO
  5. Aaron Nelson – Reporting
  6. Michael Coles – “What about BLOB?”
  7. Jorge Segarra – Summertime in the SQL
  8. Robert Davis – Gettin’ Schooled
  9. Jason Brimhall – Beach Time
  10. Michael Swart – Indexes
  11. Sankar Reddy – Misconceptions in SQL Server
  12. Paul Randal – Why are DBA skills necessary?
  13. Steve Jones – What the Business Says is not What the Business Wants
  14. Jen McCown – Resolutions
  15. Pat Wright – Automation in SQL Server
  16. Jes Borland – Aggregations
  17. Matt Velic – Apply Knowledge
  18. Bob Pusateri – CTEs
  19. Allen Kinsel – Disasters & Recovery
  20. Amit Banerjee – T-SQL Best Practices
  21. Adam Machanic – Inelegant Yet Educational
  22. Robert Pearl – Data Presentation
  23. Stuart Ainsworth – Joins
  24. Brad Schultz – Prox ‘n’ Funx
  25. Allen White – T-SQL Tricks

Thanks in advance for participating!

Posted in Uncategorized | 27 Comments

TOP Tricks (T-SQL Tuesday #25)

T-SQL Tuesday is here again, brought to us this month by Allen White (blog|@SQLRunr). I’ve read Allen’s blog many times for PowerShell advice, but today he’s asking for your favorite T-SQL tricks.

I’m not sure if this qualifies as a “trick” per se, since it’s pretty simple and well documented. But it has come up several times in the past few weeks during code reviews, so I’m sure it’s something not everyone is familiar with.

It seems like most T-SQL developers know that the TOP keyword can be used to SELECT the top n records. I use it all the time when trying to get a quick look at the data, and SSMS has it as a built-in menu option.

Selecting the first 100 records of a table is easy.

 
SELECT TOP 100 *
FROM Sales.Customer;

So what if you don’t want 100 records, but instead you want to dynamically specify the number of rows to be returned? Often, developers will try to do something like this in a stored procedure:

DECLARE @NumRows int = 10;
SELECT TOP @NumRows * FROM Sales.Customer;

This results in an error:
Msg 102, Level 15, State 1, Line 2
Incorrect syntax near ‘@NumRows’.

An accurate but not very helpful error message.

From here, many developers will turn to dynamic SQL to solve the problem. While dynamic SQL is perfectly appropriate in many situations, it’s not really needed here at all. The sad part is that the developer was almost there with the first attempt, they were just missing one little thing – the parentheses.

DECLARE @NumRows int = 10;
SELECT TOP (@NumRows) * FROM Sales.Customer;

The TOP keyword can also be used with other DML statements such as INSERT, UPDATE, and DELETE. In those cases the parentheses are required, even if you specify only a number. The parentheses should be there when you SELECT a TOP number as well, but they are not required for backward-compatibility reasons, and in my experience, no one uses them. More info here.
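For example, here is a quick sketch of TOP with a variable in a DELETE, where the parentheses are mandatory (the table and column names here are hypothetical):

DECLARE @BatchSize int = 500;

-- Delete at most @BatchSize rows; omitting the parentheses here is a syntax error
DELETE TOP (@BatchSize)
FROM dbo.AuditLog                 -- hypothetical table
WHERE LoggedDate < '20110101';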

In fact, in Management Studio, when you right click on a table and choose SELECT TOP 1000 Rows…, the query SSMS produces has a TOP 1000 without parentheses. And this is still true in SQL 2012 (at least in CTP 3).

One last note: if the TOP amount used as an input parameter varies wildly, you may run into problems due to parameter sniffing. The best plan for a SELECT TOP 100000 might look very different from the one for a SELECT TOP 10, but the stored procedure will be compiled based on the first value passed in. There are ways to work around that, one of them being the full dynamic SQL solution.
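For what it’s worth, a rough sketch of that dynamic SQL workaround might look something like this – embedding the TOP value as a literal so each distinct value gets its own query text, and therefore its own plan:

DECLARE @NumRows int = 10;
DECLARE @sql nvarchar(max);

-- Different @NumRows values produce different query text,
-- so each gets its own plan instead of reusing the first one compiled
SET @sql = N'SELECT TOP (' + CAST(@NumRows AS nvarchar(20)) + N') * FROM Sales.Customer;';

EXEC sys.sp_executesql @sql;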

Thanks for reading and happy Tuesday!

Posted in Uncategorized | 2 Comments

Fixing the Query Plan for a Stored Procedure (T-SQL Tuesday #24)

It looks like T-SQL Tuesday is here again, brought to us this time by Brad Schultz (blog) on the topic of stored procedures and functions.  It’s been a while since I joined this blog party (or blogged at all for that matter), but this topic jumped out at me, so here we go.

I was lucky enough to attend the PASS conference in Seattle a few weeks ago.  A recurring theme in several of the sessions I attended was how to deal with bad query plans for stored procedures; or more specifically, how to deal with bad plans created as a result of parameter sniffing.  This topic had a direct impact on an issue I was dealing with at the time.

Here’s the scenario:

A particular stored procedure that runs very often, and usually runs very fast, decides one day to start running much slower. This causes elevated CPU on the server and some timeouts in the application. Manually updating statistics on the tables that the procedure references fixes the problem temporarily, but it recurs the next day and again a few days later.

Digging in, we found that this stored procedure contains a single parameterized SELECT query. The query is a combination of joins between several Product-related tables, with the final join to a ProductCategory table on ProductID. @CatID is an input parameter to the stored procedure, and is used in the WHERE clause of the query.

Looking deeper, we found that most of the CatID values appear in the ProductCategory table 100K or more times, but there are also some (new) CatIDs that don’t appear in ProductCategory at all.

When the procedure was recompiled, we found that the plan created was highly dependent on the value of the input parameter, namely @CatID. There were two possibilities:

  1. The CatID passed in is one that exists in ProductCategory.

     In this case, the plan created was fine. It contained a clustered index seek on ProductCategory using ProductID and CatID, followed by a nested loop join.

  2. The CatID passed in is one that does not exist in ProductCategory.

     Here, SQL created a plan that did a seek on a non-clustered index on ProductCategory(CatID) for that specific CatID, with an estimated row count of 1. That plan works fine when there are no (or very few) rows for that CatID in ProductCategory. However, when there are 100K+ of them, it ends up doing 100K+ individual seeks and lookups, which is very slow and very bad…

It turned out that the optimizer was “sniffing” the value of CatID, and then using that sniffed value to optimize the plan.  So, how did we fix this?
 
Before I reveal the solution we used, here are some of the potential solutions to the parameter sniffing problem that I compiled from several PASS sessions, with the bulk coming from an excellent session by Gail Shaw (blog).

  1. Break the stored procedure into multiple stored procedures (i.e., a separate one for each condition)
  2. Rewrite using dynamic SQL with sp_executesql – this will create multiple query plans by condition.  Below are a couple of references for this solution from Erland Sommarskog (blog)
  3. Assign the input parameters to local variables, and pass the local variables to the query (creates a “generic” plan, disables parameter sniffing and uses table statistics to do a best guess).
  4. Add “option recompile” to the procedure (fine if the query is not executed frequently; otherwise it may be too much overhead)
  5. Add query hints (like “optimize for value” or “optimize for unknown”) and plan guides – see the sketch just after this list
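To make the hint option concrete, here is a minimal, hypothetical sketch – the tables are stand-ins for the real ones, and @CatID is assumed to be the stored procedure’s input parameter:

-- Option 5 sketch: ignore the sniffed value and optimize from overall statistics
SELECT p.ProductID, pc.CatID
FROM dbo.Product AS p
JOIN dbo.ProductCategory AS pc
    ON pc.ProductID = p.ProductID
WHERE pc.CatID = @CatID
OPTION (OPTIMIZE FOR (@CatID UNKNOWN));

-- Or optimize for one specific, "typical" value instead:
-- OPTION (OPTIMIZE FOR (@CatID = 42));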

For our issue, we chose the third option, and assigned the input parameters to local variables.   This has the advantage of being a simple, relatively non-invasive change that could be implemented quickly.

Basically, we took the CatID input parameter and assigned it to a local variable defined in the stored procedure, then passed that local variable to the query. This single level of indirection was enough to keep SQL Server from using parameter sniffing, and to instead create a generic plan based on the statistics of the table. The generic plan was the good plan described above – exactly the plan we needed all along.
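Here is a minimal sketch of that change, using illustrative object names rather than the actual procedure:

CREATE PROCEDURE dbo.GetProductsByCategory    -- hypothetical procedure
    @CatID int
AS
BEGIN
    -- Copy the input parameter to a local variable
    DECLARE @LocalCatID int = @CatID;

    -- The optimizer cannot sniff a local variable at compile time, so it builds
    -- a "generic" plan from the table's overall statistics instead
    SELECT p.ProductID, pc.CatID
    FROM dbo.Product AS p
    JOIN dbo.ProductCategory AS pc
        ON pc.ProductID = p.ProductID
    WHERE pc.CatID = @LocalCatID;
END
GO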

I would be remiss if I neglected to mention that parameter sniffing is a good thing most of the time, and very often gives excellent results.  In cases where it causes problems though, it seems to cause big problems…

Thanks for reading, and happy Tuesday!

Posted in Uncategorized | 1 Comment

More Recursion–Removing Multiple Spaces

My last post about recursive CTEs reminded me of a recursive SQL Server function I wrote a while back for removing multiple spaces from strings.  By removing multiple spaces, I mean replacing all consecutive repeating spaces with a single space.

This function was originally created as part of a cleansing process for incoming customer data, especially address related fields. It worked very well, and I thought it performed well at the time, but as you’ll see, it was actually pretty slow compared to other options.

To solve this problem, the less experienced SQL developer might ask: why not just use the T-SQL REPLACE function? After all, the REPLACE function replaces all instances of one set of characters with another set of characters within a given string, right?

If you are only replacing double-spaces with single-spaces, a single REPLACE works just fine. However, if there are longer sequences of spaces in the string, REPLACE doesn’t finish the job unless you apply it multiple times. For example, a sequence of 4 spaces would be replaced with 2 single-spaces next to each other, which is a new double-space. In order to remove all the multiple spaces, REPLACE needs to be run multiple times, which is how the function below was born.

CREATE FUNCTION dbo.RemoveMultipleSpaces (@str NVARCHAR(MAX))
	RETURNS NVARCHAR(MAX)
AS
BEGIN
	IF CHARINDEX('  ', @str) = 0
		RETURN @str;

	RETURN dbo.RemoveMultipleSpaces(REPLACE(@str, '  ', ' '));
END
GO 

Here’s an example of calling REPLACE on an address string vs. calling the above function:

DECLARE @s VARCHAR(MAX) = '221B    BAKER       STREET';
SELECT REPLACE(RTRIM(@s), '  ', ' ') AS StillSpaced;
SELECT dbo.RemoveMultipleSpaces(@s) AS AllGood;

And the results:

An iterative version of this same function is straightforward as well, though not quite as much fun to talk about…

CREATE FUNCTION dbo.RemoveMultipleSpacesItr(@str NVARCHAR(MAX))
	RETURNS NVARCHAR(MAX)
AS
BEGIN
	WHILE CHARINDEX('  ', @str) > 0
		SET @str = REPLACE(@str, '  ', ' ');

	RETURN @str;
END
GO

 

A Third Option

Lastly, I ran across a very interesting post by Jeff Moden on SQL Server Central from a couple of months ago. It describes a set-based solution to this problem. Basically, it involves a series of replaces, first with an “unlikely character” and then with a single space. Check out the link for the details; a snippet of the code is below:

DECLARE @s VARCHAR(MAX) = '221B    BAKER       STREET';
SELECT REPLACE(
        REPLACE(
            REPLACE(
                LTRIM(RTRIM(@s))
            ,'  ',' '+CHAR(7))
        ,CHAR(7)+' ','')
    ,CHAR(7),'') AS CleanString ;

That code can be used as-is, or it could be put in a scalar or inline table-valued function, like so:

CREATE FUNCTION dbo.RemoveMultipleSpaces3Scalar (@str NVARCHAR(MAX))
	RETURNS VARCHAR(MAX)
AS
BEGIN
	DECLARE @str1 VARCHAR(MAX);

	SELECT @str1 = REPLACE(
				REPLACE(
					REPLACE(
						LTRIM(RTRIM(@str))
					,'  ',' '+CHAR(7))
				,CHAR(7)+' ','')
			,CHAR(7),'') ;

	RETURN @str1;
END
GO

CREATE FUNCTION dbo.RemoveMultipleSpaces3TVF(@str NVARCHAR(MAX))
RETURNS TABLE
AS
  RETURN (
   SELECT REPLACE(
            REPLACE(
                REPLACE(
                    LTRIM(RTRIM(@str))
                ,'  ',' '+CHAR(7))
            ,CHAR(7)+' ','')
        ,CHAR(7),'') AS CleanString
  );
GO

The scalar function would be called like the others, but the table valued function can be called by using a CROSS APPLY, for example:

SELECT *
FROM Customer a
CROSS APPLY dbo.RemoveMultipleSpaces3TVF(a.Address1)

 

A Word About Performance

Even though each of these pieces of code accomplishes the same task, the performance is very different. To test, I took the address record we’ve been using (‘221B    BAKER       STREET’) and inserted it into a table 1,000,000 times. Then I tested each of the functions by calling them against that table and inserting the results into a second table. Each was run a few times to be sure there weren’t any delays due to warming the cache or compilation, etc. The test setup looked roughly like the sketch below, and the results (listed in the order discussed in this post) follow.
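Here is a minimal sketch of that setup, assuming hypothetical table names (dbo.AddressTest and dbo.AddressClean) rather than the exact ones I used:

-- Load 1,000,000 copies of the sample address into a test table
CREATE TABLE dbo.AddressTest  (Address1 VARCHAR(MAX));
CREATE TABLE dbo.AddressClean (Address1 VARCHAR(MAX));

INSERT dbo.AddressTest (Address1)
SELECT TOP (1000000) '221B    BAKER       STREET'
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

-- Time one candidate by writing its output to the second table
SET STATISTICS TIME ON;

INSERT dbo.AddressClean (Address1)
SELECT dbo.RemoveMultipleSpaces(Address1)
FROM dbo.AddressTest;

SET STATISTICS TIME OFF;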

Not too surprisingly, the non-set-based solutions are the worst performers, with the recursive one bringing up the rear. The set-based solution outside of a function is smokin’ fast, relatively speaking – it’s not much slower than the control.

The performance difference looks like a lot, and it is, but that doesn’t always matter. Like I mentioned earlier, in the customer data loading process I was working on, the recursive function was used and worked fine. Looking back, that’s probably because the loads did many other things as well, so slower performance at this step wasn’t noticeable in the big picture.

If I were to repeat that project today though, I would take a hard look at the set based option. Because every little bit counts.

And because I’m a geek :)

Posted in Uncategorized | Leave a comment

The Missing UNGROUP BY Clause (T-SQL Tuesday #18)

For May 2011, T-SQL Tuesday is brought to us by Chicago’s own Bob Pusateri (blog | @SQLBob). Bob chose common table expressions, otherwise known as CTEs, as the topic of the month. Thanks for hosting, Bob, and for the great choice of topics.


I’ve been a fan of common table expressions, or CTEs, ever since they were introduced in SQL Server 2005. They took a little getting used to at first. In my opinion though, once you’ve used them for a while, they get easier and eventually feel more natural than using sub-queries in many cases.

One of the coolest things about CTEs is that they allow for the use of recursion in SQL queries, something that wasn’t possible before CTEs entered the picture. When code is recursive, it simply means that the code can reference itself – and it can do so repeatedly to get a desired result. BOL has a decent description and example of recursive CTEs here.
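If you haven’t seen one before, here is a minimal (and admittedly contrived) recursive CTE that just counts from 1 to 10 – an anchor query UNIONed with a recursive member that references the CTE itself:

WITH Numbers AS
(
    SELECT 1 AS n        -- anchor member
    UNION ALL
    SELECT n + 1         -- recursive member: references the CTE itself
    FROM Numbers
    WHERE n < 10
)
SELECT n
FROM Numbers;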

The following is an example taken from a real-world project using a recursive CTE to do what I’m calling an “UNGROUP BY” operation. I’m sure that sounds a little odd and probably requires some explanation.

Picture this…

We have two staging tables that are part of the load process for a Sales database. The ImportHeader table contains customer info, the ImportDetail table holds product info, and the two are joined together on a TicketNumber.  This is a standard one-to-many relationship, so for each ImportHeader record, there can be multiple ImportDetail records representing the different products purchased by a customer on a given sales ticket.

The tricky part is there is also a Qty field on each ImportDetail record representing the quantity.  If a customer purchases multiple instances of the same product, the Qty field is set to the quantity purchased, rather than adding another line to the detail.


What makes that tricky is that the sales system is expecting exactly one line for each product purchased; it doesn’t know anything about a quantity field.  So in effect, we need to “ungroup” the detail records where Qty > 1.

Make sense? The example code is shown below.

First, let’s create the tables and populate with some sample data.

CREATE TABLE dbo.ImportHeader
(
	ImportHeaderID INT PRIMARY KEY IDENTITY(1,1),
	TicketNumber VARCHAR(50),
	CustomerID INT,
	FName VARCHAR(50),
	LName VARCHAR(50),
	PostalCode VARCHAR(50)
);

CREATE TABLE dbo.ImportDetail
(
	ImportDetailID INT PRIMARY KEY IDENTITY(1,1),
	TicketNumber VARCHAR(50),
	BrandName VARCHAR(50),
	ModelNumber VARCHAR(50),
	Price NUMERIC(12,2),
	Qty INT
);

INSERT dbo.ImportHeader 
(TicketNumber, CustomerID, FName, LName, PostalCode)
SELECT '1000', 112, 'Bob', 'Loblaw', '60654'
UNION ALL
SELECT '1001', 213, 'Carl', 'Weathers', '60654'
UNION ALL
SELECT '1002', 314, 'Steve', 'Holt', '60654'
;

INSERT dbo.ImportDetail
(TicketNumber, BrandName, ModelNumber, Price, Qty)
SELECT '1000', 'Sony', 'KDL1000', '1999.99', 3
UNION ALL
SELECT '1001', 'Sanyo', 'KDL1005', '1799.99', 1
UNION ALL
SELECT '1002', 'Sanyo', 'KDL1005', '1799.99', 2
UNION ALL
SELECT '1002', 'Samsung', 'KDL1011', '1699.99', 1
;

Three customers were added with 3 different ticket numbers.  The customers purchased 4 products, with varying quantities of each. 

Now we can run a query to ungroup the data using a recursive CTE.

WITH Products AS
(
SELECT 
	TicketNumber, BrandName, ModelNumber, Price, Qty, 1 AS n
FROM dbo.ImportDetail
WHERE (Qty > 0)
UNION ALL
SELECT 
	TicketNumber, BrandName, ModelNumber, Price, Qty, n + 1
FROM Products
WHERE (Qty > n)
)
SELECT 
	ROW_NUMBER() OVER(ORDER BY a.TicketNumber, b.n) as RowNum,
	a.TicketNumber, a.CustomerID, a.FName, a.LName, a.PostalCode,
	b.BrandName, b.ModelNumber, b.Price, b.n
FROM dbo.ImportHeader a
JOIN Products b 
	ON a.TicketNumber = b.TicketNumber
ORDER BY RowNum
;

The recursive portion of the query works like this:  it includes all of the detail records at the first level (Qty > 0), then unions that with the detail records where Qty > 1, then unions that with the detail records where Qty > 2, etc.  This continues until there are no records left where Qty > n.

As you can see in the output below, each line represents a single product purchased by a customer.  The records have been successfully ungrouped.

CTE_Results

Recursive code is often described as elegant in its simplicity. If you were to compare this query to an iterative solution for the same task (which I didn’t write but try to imagine), you would probably agree.

Recursion Limit

One thing to keep in mind is that SQL Server puts a limit on the number of recursions that can happen. This is done as a safety measure, and by default SQL sets this limit to 100. If you were to change the Qty for the first detail record from 3 to, say, 142, you’ll get a nice error message like this:

Msg 530, Level 16, State 1, Line 45
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.

The value for the recursion limit can be changed using a query hint like so: 
OPTION (MAXRECURSION 500), which would (obviously) change the limit to 500 levels.

Using OPTION (MAXRECURSION 0) removes the limit.  This can be dangerous if a coding error causes you to end up with a circular reference.  It’s always better to throw an error than to go into an infinite loop.
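As a quick, self-contained illustration (a contrived counting CTE, not the ungrouping query above), the hint goes at the very end of the statement that references the CTE:

WITH Numbers AS
(
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM Numbers WHERE n < 150
)
SELECT n
FROM Numbers
OPTION (MAXRECURSION 500);  -- counting to 150 would fail with the default limit of 100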

Thanks for reading, and happy Tuesday!

Posted in Uncategorized | 6 Comments

Meme Monday–Running with SQL

My Meme Monday 11 word blog post:

“Running fast, heart racing, working through SQL problems in my mind.”

I was on vacation last week in Arizona. We were in Phoenix a lot, but also spent time in Sedona and visited the Grand Canyon. It was really beautiful there and very nice to be able to get away with the family to someplace warm and remote.

With some free time, I was also able to get out for a run almost every day. I love to run, and I’m just starting to get back into it after some time off. My history as a runner is kind of sporadic. I ran a lot during and after college, then got injured and stopped for a few years. Then in 2009 I trained for and ran (and finished!) the Chicago Marathon, which was an incredibly awesome experience. Since then I’ve run very little, at least up until the last few weeks.

One of the things I love about running is the chance to be alone with my thoughts for a short period of time. During those runs in AZ, I found myself thinking of many things, some related to database work. Those thoughts somehow evolved into thinking about how much running is like working with SQL Server. Here are a few I remember…

  • Both are fun, challenging and rewarding
  • They require practice and persistence
  • The more you give to either, the more you get out
  • You can do either alone, but both are better when you do them with other people

Given those and other similarities, it’s not surprising to me that there are several runners in the SQL community. Since this is far beyond 11 words, I’m going to finish here by tagging 3 SQL folks that I know like to run: Gill, Keith, Erin.

Happy Monday everyone!

Thanks Tom for the idea, and Noel for tagging me.

Posted in Uncategorized | 3 Comments

Disable all SQL Agent Jobs

Recently, I was looking for an easy way to disable all of the SQL Server Agent jobs as part of a migration to a new environment. My first thought was to just stop the SQL Agent service, but there were some jobs that were going to be used during the migration, so that was out.

Not wanting to reinvent the wheel, I ran a quick search to see if anyone had posted an example of this before.  I did find some results, but everything I found either used a cursor or updated system tables directly.  A cursor would be fine for this task performance-wise, but I hate using cursors if I don’t need to, and here you really don’t need to.

Here’s the simple script I came up with:

declare @sql nvarchar(max) = '';
select
@sql += N'exec msdb.dbo.sp_update_job @job_name = ''' + name + N''', @enabled = 0;
' from msdb.dbo.sysjobs
where enabled = 1
order by name;

print @sql;
exec (@sql);

 
This queries the sysjobs table, creates an exec statement calling sp_update_job for each enabled job, and then runs the whole batch. The generated batch looks like this:

exec msdb.dbo.sp_update_job @job_name = 'SQLAgentJob1', @enabled = 0;
exec msdb.dbo.sp_update_job @job_name = 'SQLAgentJob2', @enabled = 0;
exec msdb.dbo.sp_update_job @job_name = 'SQLAgentJob3', @enabled = 0;
…

 
You can use the same technique to avoid cursors in other situations – I’ll blog about that another time.
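As a quick preview, here is the same pattern applied to a completely different (and purely hypothetical) task – building a DBCC CHECKDB batch for every online user database, again with no cursor in sight:

declare @sql nvarchar(max) = '';
select
@sql += N'dbcc checkdb (' + quotename(name) + N') with no_infomsgs;
' from sys.databases
where database_id > 4        -- skip the system databases
and state_desc = 'ONLINE'
order by name;

print @sql;
exec (@sql);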

Posted in Uncategorized | 4 Comments