Recently someone asked the following, where we can use (and almost "need" to use) dynamic SQL to accomplish a dynamic PIVOT with multiple groupings. The original example had some very specific data requirements, so I'm recreating the example to simplify the data – while still keeping to the overall requirements.
As I'll point out throughout this blog post, this is another case where identifying the pattern is part of the battle.
Before I go any further, I have 3 download files. The script to generate the data, the script for the query model we'll want to generate, and then the final query that will use dynamic SQL to dynamically generate the desired result.
Suppose we have this data: a table called dbo.SalesData (on the left), with Sales Amounts by Client on specific days….and a table called dbo.ReturnData, with Sales Returns by Data, Client, and a Return Reason. I realize most production systems would have a more elaborate table structure. I'm simply using the core necessary elements for this example.
Sales |
Returns w/multiple Return Types |
|
|
In the end, we'd like to produce a result set that shows the following. Notice three things:
- The columns essentially pivot all of the possible Return Reasons, after showing the sales amount
- The result set contains subtotals by the week ending (Sunday) date across all clients (where the Client is NULL)
- The result set contains a grand total row (where the Client and Date are both NULL)
First, before I get into the SQL end – we could use the dynamic pivot/matrix capability in SSRS. We would simply need to combine the two result sets by one column and then we could feed the results to the SSRS matrix control, which will spread the return reasons across the columns axis of the report. However, not everyone uses SSRS (though most people should!). But even then, sometimes developers need to consume result sets in something other than a reporting tool. So for this example, let's assume we want to generate the result set for a web grid page – and possibly the developer wants to "strip out" the subtotal rows (where I have a ResultSetNum value of 2 and 3) and place them in a summary grid. So bottom line, we need to generate the output above directly from a stored procedure.
And – as an added twist – next week there could be Return Reason X and Y and Z. So we don't know how many return reasons there could be. We simple want the query to pivot on the possible distinct values for Return Reason. Here is where the T-SQL PIVOT has a restriction – we need to provide it the possible values. Since we won't know that until run-time, we need to generate the query string dynamically using the dynamic SQL pattern. The dynamic SQL pattern involves generating the syntax, piece by piece, storing it in a string, and then executing the string at the end. Dynamic SQL can be tricky, as we have to embed syntax inside a string. But in this case, it our only true option if we want to handle a variable number of return reasons.
I've always found that the best way to create a dynamic SQL solution is by figuring out what the "ideal" generated-query would be at the end (in this case, given the Return reasons we know about)….and then reverse-engineering it by piecing it together one part at a time.
And so, here is the SQL we need – if we knew those Return Reasons (A through D) were static and would not change. The query does the following:
- Combines the data from SalesData with the data from ReturnData, where we "hard-wire" the word Sales as an Action Type form the Sales Table, and then use the Return Reason from the Return Data into the same ActionType column. That will give us a clean ActionType column on which to pivot. We are combining the two SELECT statements into a common table expression (CTE), which is basically a derived table subquery that we subsequently use in the next statement (to PIVOT)
- A PIVOT statement against the CTE, that sums the dollars for the Action Type being in one of the possible Action Type values. Note that this isn't the final result set. We are placing this into a CTE that reads from the first CTE. The reason for this is because we want to do multiple groupings at the end.
-
The final SELECT statement, that reads from the PIVOTCTE, and combines it with a subsequent query against the same PIVOTCTE, but where we also implement two groupings in the GROUPING SETS feature in SQL 2008:
- GROUPING by the Week End Date (dbo.WeekEndingDate)
- GROUPING for all rows ()
So if we knew with certainty that we'd never have more return reason codes, then that would be the solution.
However, we need to account for other reason codes. So we need to generate that entire query above as one big string – where we construct the possible return reasons as one comma separated list.
I'm going to show the entire T-SQL code to generate (and execute) the desired query. And then I'll break it out into parts and explain each step. So first, here's the entire code to dynamically generate what I've got above.
There are basically five steps we need to cover.
Step 1, we know that somewhere in the mix, we need to generate a string for this in the query:
[SalesAmount],[Reason A],[Reason B],[Reason C],[Reason D]
What we can do is built a temporary common table expression that combines the hard wired "Sales Amount" column with the unique list of possible reason codes. Once we have that in a CTE, we can use the nice little trick of FOR XML PATH('') to collapse those rows into a single string, put a comma in front of each row that the query reads, and then use STUFF to replace the first instance of a comma with an empty space. This is a trick that you can find in hundreds of SQL blogs. So this first part builds a string called @ActionString that we can use further down.
Step 2: we also know that we'll want to SUM the generated/pivoted reason columns, along with the standard sales column. So we'll need a separate string for that, which I'll call SUMSTRING. I'll simply use the original @ActionString, and then REPLACE the outer brackets with SUM syntax, plus the original brackets.
Step 3: Now the real work begins. Using that original query as a model, we want to generate the original query (starting with the UNION of the two tables), but replacing any references to pivoted columns with the strings we dynamically generated above. Also, while not absolutely required, I've also created a variable to simply any carriage return/line feed combinations that we want to embed into the generated query (for readability). So we'll construct the entire query into a variable called SQLPivotQuery.
Step 4: We continue constructing the query – again, concatenating the syntax we can "hard-wire" with the @ActionSelectString (that we generated dynamically to hold all the possible return reason values)
Step 5: Finally, we'll generate the final part of the Pivot Query, that reads from the 2nd common table expression (PIVOTCTE, from the model above) and generates the final SELECT to read from the PIVOTCTE and combine it with a 2nd read against PIVOTCTE to implement the grouping sets.
Finally, we can "execute" the string using the SQL system stored proc sp_executeSQL
So hopefully you can see that the process to following for this type of effort is…
- Determine what the final query would be, based on your current set of data and values (i.e. built a query model)
- Write the necessary T-SQL code to generate that query model as a string. Arguably the most important part is determining the unique set of values on which you'll PIVOT, and then collapsing them into one string using the STUFF function and the FOR XML PATH('') trick