One of the important new features in SQL Server 2008 is Change Data Capture (also known as CDC). CDC implements change tracking into the database management system. (Change Data Capture, as a concept, has been around for many years. You can search on the term and find a good history of information on it.)
Before I dive into CDC in SQL Server, a little history: until SQL Server 2008, if we wanted to implement audit trail history (i.e. log insertions, updates, changes), the most effective method was through database triggers. Even with the new OUTPUT/OUTPUT INTO feature in SQL Server 2005, triggers were still the most reliable and effective way to detect and log the "before/after" values the audit logs require. And with any form of audit trail processing, reliability is ABSOLUTELY CRITICAL. Many applications have strict requirements for audit trail processing, to view historical changes (along with who made the changes, and when).
While database triggers are great (I've used them for years and I love them), critics make a valid point that the database management system should provide some sort of audit log "out of the box". And so Microsoft implemented Change Data Capture in SQL Server 2008. In both the SetFocus SQL Server Master's Program and Business Intelligence Master's Program, I cover Change Data Capture in SQL Server 2008. Here's an overview and a brief example of CDC.
First, I need to make sure SQL Server Agent is running. CDC in SQL Server 2008 relies on SQL Server Agent jobs.
Second, I'll create a new test database called CDCTest.
Third, I'll run a new system stored procedure that enables CDC on a database. I only need to run this one time.
Fourth, I'll create a very simple database table called TestEmployees. (Eventually I'll insert/update/delete rows, and will expect SQL Server to log all the changes).
Fifth, I need to tell SQL Server to log any insertions/updates/deletes for this table. So I'll call another new system stored procedure, to activate CDC for the table:
Note that SQL Server will create a new system table called cdc.dbo_TestEmployees_CT. This is the change tracking table for TestEmployees. (So as you can see, CDC maintains a separate log table for each "regular" database table).
Sixth, I'll insert and update and delete a few rows (to give the log file something to work with):
OK, step 7: let's look at the change tracking log for the TestEmployees table (the name of the change tracking log is derived from the base table name):
The CDC log takes some time to get accustomed to: the _$operation column isn't intuitive (2 is an Insert, 3 and 4 are Old Value/New Value combinations for updates, and a 1 is a deletion), and we need to join the Log Sequence Number column to the system table cdc.lsn_time_mapping in order to retrieve the date and time and transaction number of each operation.
Now for the $64,000 question – suppose I add a column to the original TestEmployees table – will the underlying CDC processes automatically adjust and log any new columns?
There's bad news and there's good news. The bad news is that CDC will not automatically adjust the changing tracking log table, if I add a column to the original table. (This is rather unfortunate, as new columns are a fact of life). The good news is that we can write a little bit of database code to get around this. This involves 4 steps:
First, I need to backup the change tracking log into a temporary database. (The reason is that step 2 will erase the current change tracking log).
Second, I'll add a column to the TestEmployees table (a column for each employee's department):
Third, I'll disable (and then re-enable) change tracking for the TestEmployees table. (This will delete the change tracking table and then recreate it with the new structure that utilizes the new Department column:
Fourth, I'll insert the temporary log back into the newly created log (but before that, I'll add the same new department to the temporary log, to make the insert back into the new log clean). And then after that, I'll drop the temporary log.
Finally, I'll update the department on one of the existing rows, I'll insert a new row, and then I'll read the change tracking table again (which now contains a reference to the new department column).
I hope you enjoyed this walkthrough on CDC in SQL Server 2008. Again, in both the SetFocus SQL Server Master's Program and Business Intelligence Master's Program, I cover CDC in greater detail.
Comments