THE SQL Server Blog Spot on the Web
Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | Join | Help
in Search

Adam Machanic

Adam Machanic, Boston-based independent database consultant, writer, and speaker, shares his experiences with programming, performance tuning, and optimizing SQL Server 2000, 2005, and 2008, in conjunction with related technologies such as .NET.

  • Speaking Season About to Begin: MCM, Roadshow, PASS, and SQLCLR Precon at DevTeach

    Fall is now in full swing, and after a relatively quiet several months my speaking schedule is about to jump into fairly-packed mode.  Following is a list of where I'll be headed over the next couple of months:

    Next week I'm really excited to be teaching a day of the Microsoft Certified Master: Microsoft SQL server 2008 program's beta rotation. I'll be teaching developer topics including ADO.NET, XML, and SQLCLR. Should be a good time!

    November 12, the SQL Server 2008 Roadshow hits Boston, and I'll be presenting a half day on all sorts of SQL Server 2008 topics.  This one is free, so if you're in the Boston area you have nothing to lose by checking it out.

    November 19-21 is the PASS Summit, and I'm excited to be involved in three presentations. I'm doing a spotlight presentation on programmatic/application concurrency topics, a duo with Peter DeBetta called "[Anti]patterns and [Mal]practices", and a duo with James Luetkehoelter on performance metrics.

    December 1 and 2 I'll be in Montreal, speaking at the always-fun DevTeach/SQLTeach conference. Aside from speaking in the regular conference program, on December 1 I will be doing a full-day preconference on SQLCLR development. In this preconference I will start with SQLCLR beginner topics and take you all the way through to some really advanced use cases. So this one should be of interest even if you've been using SQLCLR for some time. Best practices will be stressed, and you might just learn about some of the dusty corners you've not yet had a chance to touch.  I'm also upgrading the content to cover what changes SQL Server 2008 brings to the table, so this will be a complete session on the topic.  I think that SQLCLR, while not used much in SQL Server 2005, will really pick up adoption in SQL Server 2008 and beyond.  So now is the time to pick up what will be a key skill going forward.


  • [New England] NESQL October 9: SQL Server 2008 Performance Studio

    Just a quick note to remind people in the greater Boston area that tomorrow night, Thursday October 9, Jeremy Gaige from Idera Software will be doing a presentation on SQL Server 2008 Performance Studio at the New England SQL Server Users Group.  This should be a really interesting session on a fantastic new SQL Server 2008 feature.  Hope to see you there!

    Note: if you are planning to attend, please RSVP by sending an e-mail to nesql [at] hotmail [dot] com.  Only people who RSVP will be able to win giveaways at the end of the night.


  • Solution for the "LIKE vs. ?" Puzzle

    In late April, I posted a puzzle to test readers' knowledge of SQL Server query processing internals. The goal of the puzzle was to take a simple yet incredibly inefficient query and rewrite it, without changing the base tables or adding any additional database objects, to improve performance.

    The deadline for entries was May 1. I expected to receive a few responses, sort through them, and get a solution post published within the week. But alas, that's not what happened. I received hundreds of submissions, many of them really, really good. I started sorting them, and during that process I spoke at a couple of conferences, my hard drive crashed and I lost my work (but none of the submissions, luckily), and I was generally too busy to sort through the hundreds of submissions.  Then I finally got some time and started working on this post on June 24, and then... Well, I have no further excuses.  I just didn't finish.

    So I would like to start with an apology for the extensive delay. I know how impatient DBAs are, and several readers were kind enough to e-mail me and call me out for my laziness prod me along all this time.

    I am happy to announce that today, finally, I finished sorting and judging the submissions, and I have to say that I am extremely impressed with everyone who submitted! After eliminating all but the very best submissions, I still had 20 on my list.  Choosing the top two was extremely difficult; I had to decide between excellence and excellence. After thinking about it for a while, I decided to give one award for the best code and a second for the best explanation (as both of these were required per the rules of the challenge).  And I still ended up awarding three people rather than just two.

    Before I get to the solutions, let's take a second (or Nth) look at the original query:

    SELECT *
    FROM b1
    JOIN b2 ON
    b2.blat2 LIKE b1.blat1 + '%'

    The difficulty here, from a query processing point of view, is the nature of the JOIN predicate. The task for the query processor is to figure out how, given a 'blat2' value, all of the associated 'blat1' values can be determined. The query processor has only three possible options for satisfying this (or any other) join:

    1. A merge join. In this case, it's not possible, as SQL Server's merge algorithm only supports equijoins. In this case, we're joining based on a range: all 'blat2' strings that start with the same characters asthose in 'blat1'.
    2. A hash join. Again, not possible, as we need an equijoin. SQL Server's hash algorithm is not ordered (which is not to say that it couldn't be in the future), so hash keys must be compared based on equality.
    3. A loop join. This join type can be used, for better or for worse, in all cases.  And in this case it does get used, for worse.  For every row of table b1, table b2 gets scanned and all of the matching rows are found. That's a lot of scans, and a lot of work.

    There must be a better way. Clearly, the key is to try one of those other join types, but we need an equijoin!  Virtually everyone who submitted responses figured this out, whether by accident or on purpose, and almost everyone came up with a query approximating the following:

    SELECT *
    FROM b1
    JOIN b2 ON
        LEFT(b2.blat2,5) = b1.blat1

    This query solves the problem, and solves it well.  It turns the query from a two-minute nightmare to a half-second dream, all via the magic that is the hash join.  But don't take my word for it.  Instead, why don't we let contest winner #1, Chamindu Munasinghe explain?  Following is a slightly edited version of his winning submission.

    The optimized query uses a hash match to perform the join operation. When a hash match is performed the table b1 is scanned and a hash table is built in memory. The hash key is computed using the b1.blat1 column and the rows from b1 are added to it. Then another table scan is performed on the b2 table and a hash key is computed using LEFT(b2.blat2, 5) value and then the previously built hash table is searched [ed: "probed"] on the key and matches are made.

    But if you consider the query that uses the LIKE operator, the database engine cannot perform a hash join because we don't have a value from b2 to compute a hash. Instead it has to do a nested loops join where for each row in b1 it loops over the b2 table and tries to match the blat1 and blat2 columns using the wild card pattern.

    If you have N.b1 rows in b1 table and N.b2 rows in the b2 table the number of operations required to do the join can be computed like this

    Optimized query = N.b1 + N.b2  (Scan both tables)
    LIKE query = N.b1 * N.b2 (Nested Loops)

    Due to that the optimized query is much faster. I have assumed that the number of rows in b1 can be accommodated in memory and a the hash match is done in memory. If this is not the case then the computations will be different because it can use a grace hash match or a recursive hash match. But the hash match is still faster than the nested loop.

    Chimandu pointed out in his submission that he is a .NET developer -- not a SQL Server professional -- and it's clear in his answer that it was his knowledge of the inner workings of data structures that helped him answer the question so well.

    Now that we've looked at the winning explanation, which explains why fastest possible way to solve the puzzle works, I will reveal the second set of winners -- a tie, actually, between two creative coders.

    Andrew den Hertog came up with the same solution as Chimandu and everyone else, but took it one step further. He was bothered by the fact that this solution was hardcoded and would only work with an input of five characters, so he smartly took a stab at making it dynamic:

    SELECT
        b1.*,
        b2.*
    FROM b1
    CROSS JOIN
    (
        SELECT
            MAX(LEN(blat1)) AS fieldSize
        FROM b1
    ) AS fs
    JOIN b2 ON
        LEFT(b2.blat2, fs.fieldsize) = b1.blat1

    This solution is a bit slower than the non-dynamic version, and almost works... But alas, it fails to return the same results if you modify the script that builds the two input tables, and change the number of characters to, say, 7 rather than 5. The reason this fails is due to the fact that some of the input values are only 5 characters long -- which is the reason I chose that as the initial length.

    Around the same time that Andrew came up with his solution, Kevan Riley sent me a non-dynamic way of solving the problem generally for any input character length. The trick is to delimit both of the input strings:

    SELECT *
    FROM b1
    JOIN b2 ON
        '#'+b1.blat1+'#' = '#'+LEFT(b2.blat2,7)+'#'

    Great job solving that one, Kevan!  Combining this solution with Andrew's, we get the best possible solution, totally dynamic and viable for all possible input cases:

    SELECT
        b1.*,
        b2.*
    FROM b1
    CROSS JOIN
    (
        SELECT
            MAX(LEN(blat1)) AS fieldSize
        FROM b1
    ) AS fs
    JOIN b2 ON
        '#'+LEFT(b2.blat2, fs.fieldsize)+'#' = '#'+b1.blat1+'#'

    Great job, Chamindu, Andrew, and Kevan!!!  I would also like to name four runners-up, each of whom also provided excellent submissions:

    Gordon Klundt
    Creighton Kagey
    Erik Eckhardt
    Greg Bodden

    Again, this was an extremely difficult choice and I'm sorry I was not able to award everyone, but if you're on this list please give yourself several pats on the back for your great work.

    Thanks again to everyone who participated.  I was truly impressed by all of the submissions.  This contest was a lot of fun, and I have a few ideas for some other challenges for the near future. Stay tuned.


  • Demos from my TechEd session (DAT302: Best Practices for Exception Handling and Defensive Programming in Microsoft SQL Server 2005 and 2008)

    If you attended my session this morning, thank you!  It's always flattering when people are willing to wake up before 8:00 a.m. just so that they can hear a talk about SQL Server.  I had a great time doing the talk, and I think it went well.  My sincerest apologies to the person I accidentally hit in the head with the t-shirt; I will work on my throwing arm and hopefully do a better job with the swag tossing next time!

    I had a few requests for the demo scripts, and they are attached to this post.  Please let me know if you have any questions or issues with them.  And thanks again for attending.

    For those who were not there, the presentation was recorded, and I believe it will be posted online at some point, so keep your eyes open for that.


  • When Your Hard Drive Comes Knocking... A Cautionary Tale

    Imagine yourself, happily computing (or whatever it is that you do with your computer). It's a fine, sunny day, narry a cloud in the sky, and you're happily typing along when all of a sudden you hear a rather alien sound eminating from your hard drive. Something that sounds, perhaps, like some combination of a roofer banging in a nail, and a miner's pick as he works on releasing a stubborn piece of ore from a cave wall. 

    Certainly not a good sound to hear coming from the general region of your hard drive on a nice, sunny day. Especially when you have not taken a backup in over two years...

    Now consider your options. You could...

    1. Turn off your computer immediately, Google "data recovery", and call one of the multitude of companies that pops up. They will charge you in the neighborhood of $500 to get your data safely off of your damaged disk.
    2. Immediately stop working, grab an external USB drive, and transfer all of your valuable data onto the external drive. Order a new internal hard disk, and keep working, confident in the knowledge that your disk is going to die soon, but you've already salvaged the valuable data and a new disk is on the way.
    3. Try to get clever. Think to yourself "maybe if I run 'chkdsk' it will go through the bad areas of the disk, mark the sectors as unusable, and I can keep using this disk."

    Such was the scene at my desk last Friday. And you have probably already figured out where this is going. Yes, I ran 'chkdsk'. The first three (of five) checks completed without error, and I thought everything was going to be fine. Then, during the fourth phase, after about an hour of intense clicking and banging noises, a message appeared on the screen, which I can only paraphrase at this point:

    Not enough space is available on the disk to fix the bad sectors.

    This was a concerning message, given that the disk was much less than half full.  But only a minute or two later my concerns were answered by another rather vague message:

    Unspecified error has occurred. Aborting.

    An unspecified error during a 'chkdsk' run is never a good thing. And so I rebooted, only to discover that my hard disk was no longer recognized by my system as a hard disk. I did what I should have done in the first place -- called a couple of the friendly data recovery companies -- and after listening to my story and, nicely enough, not laughing at me outright, they told me that my chances of data recovery were near zero percent. You see, when your disk is banging away like that, it's the sound of the heads hitting the platters. And they're not supposed to do that. When I ran 'chkdsk', I forced the heads to touch every surface of the platters, thereby scratching them into oblivion.

    A quick trip to my neighborhood computer store, a new hard drive, and I'm more or less back in business, minus two years worth of documents, most of which I never bothered to back up. A bad way to start the weekend, but I actually can't think of anything especially valuable or irreplacable that was lost.  This is more of an extreme annoyance.

    So please, learn from my mistake, and next time you hear an odd clicking sound, don't try to outsmart your already-broken hardware. Listen to your computer. It's sending you a very clear (albiet perhaps Morse-coded) message. And back up your data now and again! I work hard to tell my customers about all of the great reasons they should back up their enterprise data; but like most people, I never think about applying those same behaviors to my personal machines.

    As an aside, when I reinstalled Vista on the new drive I accidentally used the 32-bit rather than 64-bit media. After I realized my error I decided to stay on the 32-bit version for now and see how it performs. I was not happy with memory consumption before, and I suspect that the 32-bit version will be a bit leaner. Since I only have 2 GB of RAM in this machine, there is no great reason for me to run 64-bit anyway. I'll post again in a while about my thoughts on 32-bit vs. 64-bit Vista once I've had a chance to work with it a bit more intensely.


  • [New England] NESQL Special Meeting, Featuring Craig Freedman

    Next Thursday, May 8, the New England SQL Server Users Group will have a special meeting, featuring Craig Freedman from the SQL Server development team.  Craig is The Man when it comes to query optimizer internals, and wrote an incredibly detailed chapter on the topic for "Inside SQL Server 2005: Query Tuning and Optimization".

    At the meeting next week, Craig will discuss some of what he talked about in the chapter, including the basics of how the query processor works and what iterators are.  He'll cover the various operators you'll commonly see in query plans, and describe how they actually work internally. 

    This should be a great meeting, and we expect it to be very well attended.  In order to help us figure out food and drink, in addition to securing enough chairs for the meeting room, we need you to RSVP if you're planning to attend.  In order to RSVP, sign up for our mailing list.  I will send out an e-mail next Tuesday, and you can RSVP by replying to it.  Only attendees who RSVP will be eligible for our prize draw at the end of the night, so make sure to sign up for our list by Monday in order to guarantee that you don't get left out.

    We would like to thank Red Gate Software, who made a very generous donation to the group that allowed us to have this special meeting.  Red Gate makes some of my favorite SQL Server tools and provides a huge amount of community support in the SQL Server and .NET space, and we hope that you will give their products a try.


  • [OT] SQLTeach: Almost Here

    I was just reviewing my calendar for the next several weeks and noticed that the Toronto SQLTeach conference is now only a few weeks away.  This conference includes quite a few interesting SQL Server-related sessions, on topics ranging from best practices, to performance, to some of the new SQL Server 2008 features. I fully expect this to be a great show.

    I am doing two breakout sessions during the main conference:

    • SQL Server 2005: Authorization, Privilege, and Access Control.  In this talk I cover SQL Server 2005’s enhancements around granting permissions via stored modules (i.e., stored procedures, views, functions)
    • Designing Highly Concurrent Database Applications.  In this talk I get into the business requirements behind supporting concurrent processes, and the areas where SQL Server (and every other database product) falls short. I then go on to show how to solve the problems in the database programmatically.

    I am also doing a full-day post-conference session on SQLCLR programming. This will be the first time that I will be presenting all of my SQLCLR material in a single day; should be fun. I will take attendees from the basics all the way through some advanced applications and techniques, so if you’re interested in becoming a SQLCLR expert I highly recommend attending.

    The conference starts in just three weeks, but it is not too late to register.


  • SQL Server Query Processing Puzzle: LIKE vs ?

    How creative are you with manipulating your queries to produce more efficient plans? Try the following puzzle and e-mail your solution to me at [<my last name> @ pythian.com]. Make sure to include an explanation of why it works, as well as your mailing address. The best two solutions/explanations win a free copy of Expert SQL Server 2005 Development, a wonderful feeling of accomplishment, plus eternal fame and glory when I reveal your solutions here on the blog.

    Run the following T-SQL to create two tables in TempDB:

    USE TempDB
    GO

    CREATE TABLE b1 (blat1 CHAR(5) NOT NULL)
    CREATE TABLE b2 (blat2 VARCHAR(200) NOT NULL)
    GO

    INSERT b1
    SELECT LEFT(AddressLine1, 5) AS blat1
    FROM AdventureWorks.Person.Address

    INSERT b2
    SELECT AddressLine1 AS blat2
    FROM AdventureWorks.Person.Address
    GO
    Now consider the following query:
    SELECT *
    FROM b1
    JOIN b2 ON
    b2.blat2 LIKE b1.blat1 + '%'

    This query takes around three minutes to run on my notebook, and does over 1.8 million logical reads. Can you figure out a way to re-write it so that it performs better? No modification of the base tables or addition of any other objects is allowed (sorry, no indexed views!) -- the challenge is to tune this by doing nothing more than re-writing the query.

    Good luck! I'll leave the contest open for submissions until May 1.


  • SQL Server Procedure Cache: More Relief on the Way

    If you've read many of my blog posts, you know that I consider lack of procedure cache control to be a major SQL Server pain point. Badly written apps that use non-parameterized ad hoc queries can quickly flood SQL Server's memory pools and bring the server to its knees.

    SQL Server 2005 brought some relief in the form of the Forced Parameterization database option, and SP2 took things one step further with better throttling of the cache... but it's still not enough. We want a knob!

    The bad news: We're not getting quite the knob I was hoping for.

    The good news: SQL Server 2008 will include an sp_configure option called "optimize for ad hoc workloads".  This option will cause the procedure cache to only cache the parameterized stubs for ad hoc queries, rather than the full query with parameters.  This means that applications passing a large number of non-parameterized batches should see much lower procedure cache memory utilization and, therefore, better overall throughput.  I'm really looking forward to seeing this in action; this feature should be added with the next pre-release drop.

    Remember, there is simply no substitute for properly designing your application's data access layer, but hopefully this will help for those applications that simply can't be changed...


  • SQL Server 2008: Return of the Debugger

    A lot of people will be interested to know that at the launch event in LA it was announced that the T-SQL debugger is returning to Management Studio in SQL Server 2008.

    Personally, this is not a feature I've been lamenting the loss of; I never used it in SQL Server 2000, and unless we can view temp tables, table variables, etc, I just don't see it as something with a lot of utility for the way I personally develop T-SQL. 

    But this isn't just about me, and I know that there was a huge amount of interested in seeing the debugger come back into the core SQL Server tools.  So congrats to all of the step debug fans out there; get your F10 keys ready for SQL Server 2008!


  • Who's On First? Solving the Top per Group Problem (Part 1: Technique)

    Relative comparison is a simple matter of human nature. From early childhood we compare and contrast what we see in the world around us, building a means by which to rate what we experience. And as it turns out, this desire to discover top and bottom, rightmost and leftmost, or best and worst happens to extend quite naturally into business scenarios. Which product is the top seller? How about the one that's simply not moving off the shelves? Which of our customers has placed the most expensive order? What are the most recent orders placed at each of our outlets?

    In the world of common business questions, the edge cases are generally of most interest. What's in the middle is unimportant; it's often too difficult for the mind to compare and comprehend when there are hundreds, thousands, or even millions of items, transactions, or facts that are all within a similar range. Instead, we focus on those that stick out in some extraordinary way.

    Those of us who work with SQL products on a regular basis are faced with solving this same problem time and again as we work through various business requirements. Over time, I have noticed four basic query patterns that can be used to solve the problem; each are logically equivalent (within certain restrictions -- more on that later), but can have surprisingly different performance characteristics depending on the data being queried. In this first post, I will outline the available patterns/methods. In the following posts, I will show the results of testing each pattern against a variety of scenarios in an attempt to discover where and when each should be used.

    The four basic patterns are outlined below. Each of the methods is illustrated using a query to show all customers' names, plus their most recent order date, and the amount of that order. I've included notes that indicate where logic differences can arise among the various methods.

     

    Method 1: Join to full group and use correlated subquery with a MIN/MAX aggregate to filter

    In this method we use an inner join to get all required columns, then filter the resultant set using a correlated subquery in the WHERE clause.

    SELECT
        c.FirstName,
        c.LastName,
        o.OrderDate,
        o.OrderAmount
    FROM Customers c
    JOIN Orders o ON o.CustomerId = c.CustomerId
    WHERE
        o.OrderDate  =
        (
            SELECT MAX(o1.OrderDate)
            FROM Orders o1
            WHERE
                o1.CustomerId = o.CustomerId
        )

    Logic notes: With this method ties are automatically included in the output, unless a tiebreaker is specified (which can be tricky given that you only have one column to work with). This method does not allow you to pull back an arbitrary number of rows, such as top 10 per customer; you are limited to the edge and any ties that might exist.

     

    Method 1a: Join to full group and use correlated subquery with TOP(n) and ORDER BY to filter

    This method is almost identical to Method 1 (which is why it is classified here as 1a), but the TOP and ORDER BY allow for a bit more flexibility than the aggregates.

    SELECT
        c.FirstName,
        c.LastName,
        o.OrderDate,
        o.OrderAmount
    FROM Customers c
    JOIN Orders o ON o.CustomerId = c.CustomerId
    WHERE
        o.OrderDate  =
        (
            SELECT TOP(1)
                o1.OrderDate
            FROM Orders o1
            WHERE
                o1.CustomerId = o.CustomerId
            ORDER BY
                o1.OrderDate DESC
        )

    Logic notes: With this method you can more easily integrate a tiebreaker than with Method 1; the comparison column can be anything, including a primary key, and you can still order on whatever column makes most sense. In addition, you can take more rows than with Method 1 by using IN instead of = in the WHERE clause, and increasing the argument value to TOP.

     

    Method 2: CROSS APPLY to ordered TOP(n)

    In this method, SQL Server 2005's CROSS APPLY operator is used. This operator allows us to essentially create a table-valued correlated subquery -- something that impossible in previous versions of SQL Server. By using TOP in conjunction with ORDER BY we can get as many rows per group as needed.

    SELECT
        c.FirstName,
        c.LastName,
        x.OrderDate,
        x.OrderAmount
    FROM Customers c
    CROSS APPLY
    (
        SELECT TOP(1)
            o.OrderDate,
            o.OrderAmount
        FROM Orders o
        WHERE
            o.CustomerId = c.CustomerId
        ORDER BY
            o.OrderDate DESC
    ) x

    Logic notes: This method is almost identical, from a logic point of view, with Method 1a modified to use IN on a primary key column. With both methods WITH TIES can be added to the TOP in order to get ties.

     

    Method 3: Join to derived table that uses a partitioned, ordered windowing function, and filter in the outer query based on the row number

    In this method a derived table or CTE is used, in conjunction with a windowing function partitioned based on the required grain of the final query. So for the "most recent order per customer" query, the row number is partitioned based on the customer. This gives us a count starting at 1 for each customer, which can be filtered in the outer query.

    SELECT
        c.FirstName,
        c.LastName,
        x.OrderDate,
        x.OrderAmount
    FROM Customers c
    INNER JOIN
    (
        SELECT
            o.OrderDate,
            o.OrderAmount,
            o.CustomerId,
            ROW_NUMBER() OVER
            (
                PARTITION BY o.CustomerId
                ORDER BY o.OrderDate DESC
            ) AS r
        FROM Orders o
    ) x ON
        x.CustomerId = c.CustomerId
        AND x.r = 1

    Logic notes: If ties are important, use DENSE_RANK instead of ROW_NUMBER. ROW_NUMBER is good for arbitrary TOP(n), similar to Method 2. Unlike the previously described methods, in conjunction with DENSE_RANK this method can return an arbitrary TOP(n) rows, all of which can include ties. So if you would like to see the three most recent order dates and each happens to have multiple orders, this method will be able to return them all by simply filtering on x.r = 3. This would not be directly possible with any of the other methods described here.

     

    Method 4: "Carry-along sort"

    This is the only "tricky" method, and not one that I recommend using, except as a last resort. I'm including it here only for completeness and comparison because it happens to be a very high performance method in some cases. This method involves converting each of the required inner columns into a string, concatenating them, then applying an aggregate to the string as a whole. By putting the "sort" column first, the other data is "carried along" -- thus the name for the method. The concatenated data is then "unpacked" in the outer query. This can be surprisingly efficient from an I/O standpoint, but the resultant code is a maintenance nightmare and it is quite easy to introduce errors. In addition, this method can only return the top 1 per group -- no ties or multiple return items are supported.

    SELECT
        c.FirstName,
        c.LastName,
        CONVERT(DATETIME, SUBSTRING(x.OrderInfo, 1, 8)) AS OrderDate,
        CONVERT(MONEY, SUBSTRING(x.OrderInfo, 9, 15)) AS OrderAmount
    FROM Customers c
    INNER JOIN
    (
        SELECT
            o.CustomerId,
            MAX
            (
                CONVERT(CHAR(8), OrderDate, 112) +
                CONVERT(CHAR(15), SubTotal)
            ) OrderInfo
        FROM Orders o
        GROUP BY
            o.CustomerId
    ) x ON
        x.CustomerId = c.CustomerId

     

    This post is just the beginning; watch this space in the coming days for a series of performance tests and analysis of these methods, and some results that I personally found to be quite surprising.
     


  • In case you don't read our Roller... Breaking news on RTM release date

    And I quote...

    final Release to manufacturing (RTM) of SQL Server 2008 expected in Q3.

    Thanks to Jason Massie for the pointer (via our Roller, of course!) 


  • [New England] Special New England SQL Server Event with Itzik Ben-Gan

    Just a heads up for those in the Boston area: The New England SQL Server Users Group is doing a special event next Wednesday night (January 23), featuring Itzik Ben-Gan, talking about Grouping Sets in SQL Server 2008:

    SQL Server 2008 introduces enhanced support for aggregating data addressing the needs to analyze aggregated data dynamically. The enhancements include the new GROUPING SETS clause, the standard CUBE and ROLLUP clauses (not to confuse those with the existing non-standard CUBE and ROLLUP options), the GROUPING_ID function, and other T-SQL enhancements. This session will cover those enhancements in detail, and will describe and demonstrate their practical uses.

    This should be a great event, so mark your calendars and check the NESQL Web site for more information. Please note that RSVP is strongly encouraged for this event -- only people who RSVP will be eligible for giveaways that night, and we have a great selection. To get on the list to RSVP, please visit the Web site and sign up for our mailing list.  I will send a mailing -- which will include RSVP instructions -- next Monday.

     


  • A Gift of Script for 2008: Who's Active, What Are They Doing, and Who is Blocked?

    As with all of the blog posts I keep meaning to write -- I keep a list and given the infrequency with which I've been posting lately, it's getting quite large -- this script has been on the queue for quite some time. So here I find myself with a spare moment right on the cusp of the new year, and figured what better way to end the year than with a script that, at least for me, has been quite useful these last few months.

    The driving force behind my writing this script is that I found myself endlessly calling sp_who2 'active' to see who was doing what on servers I needed to take a look at. Then I would have to sort through the results, and end up calling DBCC INPUTBUFFER to take a look at the SQL being used. This was a serious pain, and I finally caved a few months back and decided to end the madness once and for all with the help of some DMVs.

    The following script primarily uses the sys.dm_exec_requests view, and finds all "active" requests -- i.e., those that are running, about to start running, or suspended. It also finds some other useful information, including the host name, login name, the start time of the batch, and whether or not the batch is currently blocked. In the outer query I use the sys.dm_exec_sql_text function to get the text of the SQL that all of the active requests are running, in addition to the SQL being run by the blocking sessions, if applicable. This way I don't have to do two lookups to chase down what's blocking what.

    You'll notice that I use FOR XML PATH in the subqueries that pull the SQL text. This gives us a nice little bonus: instead of copying the text out of the cell in SSMS and pasting it somewhere else, you can simply click on it -- and it maintains whatever formatting, including white space and carriage returns, that it originally had. This is much, much nicer than getting the batch on a single line. The only problem is that certain characters, such as greater-than and less-than, get "entitized" when the text is converted to XML. This means that some queries won't be able to be run without a bit of editing. A small price to pay for nicer output, in my opinion. If anyone out there has a solution for the entitization, please let me know! The only way I know to solve it is to convert back to VARCHAR, and that defeats the whole purpose...

    Anyway, thanks all for a great 2007. Here's to an even better 2008! Without further ado, the script:

     
    SELECT
        x.session_id,
        x.host_name,
        x.login_name,
        x.start_time,
        x.totalReads,
        x.totalWrites,
        x.totalCPU,
        x.writes_in_tempdb,
        (
            SELECT
                text AS [text()]
            FROM sys.dm_exec_sql_text(x.sql_handle)
            FOR XML PATH(''), TYPE
        ) AS sql_text,
        COALESCE(x.blocking_session_id, 0) AS blocking_session_id,
        (
            SELECT
                p.text
            FROM
            (
                SELECT
                    MIN(sql_handle) AS sql_handle
                FROM sys.dm_exec_requests r2
                WHERE
                    r2.session_id = x.blocking_session_id
            ) AS r_blocking
            CROSS APPLY
            (
                SELECT
                    text AS [text()]
                FROM sys.dm_exec_sql_text(r_blocking.sql_handle)
                FOR XML PATH(''), TYPE
            ) p (text)
        ) AS blocking_text
    FROM
    (
        SELECT
            r.session_id,
            s.host_name,
            s.login_name,
            r.start_time,
            r.sql_handle,
            r.blocking_session_id,
            SUM(r.reads) AS totalReads,
            SUM(r.writes) AS totalWrites,
            SUM(r.cpu_time) AS totalCPU,
            SUM(tsu.user_objects_alloc_page_count + tsu.internal_objects_alloc_page_count) AS writes_in_tempdb
        FROM sys.dm_exec_requests r
        JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
        JOIN sys.dm_db_task_space_usage tsu ON s.session_id = tsu.session_id and r.request_id = tsu.request_id
        WHERE r.status IN ('running', 'runnable', 'suspended')
        GROUP BY
            r.session_id,
            s.host_name,
            s.login_name,
            r.start_time,
            r.sql_handle,
            r.blocking_session_id
    ) x

     

    Enjoy!


  • Anti-Patterns and Malpractices, Volume 1: Tumbling Data

    "Lonely but free I'll be found
    Drifting along with the tumbling tumbleweeds"

     - Supremes, "Tumbling Tumble Weeds"

     

    Welcome to the first installment of what I hope will be a regular feature on this blog, Anti-Patterns and Malpractices. As a consultant, I get the honor of seeing a lot of different systems, with a lot of different code. Some of it is good, and some of it -- well -- I'll be featuring that which is not so good here. No names will be named, and code will be changed to protect the not-so-innocent; my goal is not to call out or embarrass anyone, but rather to expose those misguided patterns and practices which inevitably lead to problems (and a subsequent call to a consultant; perhaps if I post enough of these I'll have fewer less-than-appealing encounters in my work!)

    The topic du jour is the Tumbling Data Anti-Pattern, a name coined by my friend Scott Diehl. Much like the