Home » Articles » SQL Server – Change Data Capture(CDC)

SQL Server – Change Data Capture(CDC)

Our Categories

Blog Stats

  • 33,407 hits

Calendar

January 2013
M T W T F S S
    Feb »
 123456
78910111213
14151617181920
21222324252627
28293031  

Please visit SQLVERSITY.COM for more details.

Thank you very much for reading my previous article Change Tracking (CT) and I hope you have enjoyed while reading the content of CT.

As I promised you guys, I am back with another awesome feature of SQL Server 2008 called ‘Change Data Capture (CDC)’.  As you all know that I am a lazy fellow and in this entire article I am going to call it as CDC for my convenience.

Microsoft has introduced this feature to overcome a small drawback in CT.

I hope you already got to know the drawback in CT. That is, we cannot get historical data in CT. That means, we can get the details of Primary key, DML (like I/D/U) changes which we have done to the table. But not entire modified data. By using CDC, we can retrieve historical changes also.

Here it goes……!!!

Before starting CDC, I would like to say one important thing to you guys. Do not think that CDC is similar to CT. Though they look similar, there is quite difference between them. We may have this discussion in the next article.

As the name itself says that, it captures the changed data (Updated/Deleted/Inserted) and stores it in separate tables which are automatically created when we enable CDC. It captures all fields values but does track only Primary key or does not depend on Primary key like CT.

I hope you already have an idea on SQL Server data files and log files. If not, let me give you brief introduction.

When we create a database, two data files (with .mdf and .ldf extensions) will be created automatically in

C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\location\

.mdf file: It is called as main data file which contains the current data of that database.

.ldf file: It is called as log data file which contains the transaction log details. That means if we do any operation like delete/insert/update…everything will be stored in this file.

Do not feel that I am explaining about mdf and ldf files unnecessarily. Since .ldf file is heart of CDC, I explained about them. As I explained above .ldf file holds data changes in a database, it will be useful while capturing changed data. So our CDC simply reads the .ldf file of a particular database and stores the required data in some tables which are created only to hold this kind of data.

Now I am creating a database and table to work with CDC. Since you all aware of creating database and tables or inserting data into it, I am just moving forward without explaining the below code.


---------------------------------------------------
-- Creating Database ChangeDataCaptureDB
---------------------------------------------------
USE [master]
GO

CREATE DATABASE [ChangeDataCaptureDB]
GO
---------------------------------------------------
-- Creating Table [dbo].[MSCertificationCapturing]
---------------------------------------------------
USE [ChangeDataCaptureDB]
GO

CREATE TABLE [dbo].[MSCertificationCapturing](
CertID INT PRIMARY KEY,
CertName Varchar(30) NOT NULL,
Eligibility Varchar(10),
RegFee$ INT
)
GO
----------------------------------------------------------------------
-- Insert 5 records into [dbo].[MSCertificationCapturing] table
----------------------------------------------------------------------
BEGIN TRANSACTION [CDC]
BEGIN TRY

INSERT [dbo].[MSCertificationCapturing]
VALUES(70433,'DBDevelopment','Developers',80)
INSERT [dbo].[MSCertificationCapturing]
VALUES(70483,'Programming in C#','Developers',60)
INSERT [dbo].[MSCertificationCapturing]
VALUES(70462,'Administring SQLServer','Admin',100)
INSERT [dbo].[MSCertificationCapturing]
VALUES(70685,'W7 Enterprise Support','Technician',50)
INSERT [dbo].[MSCertificationCapturing]
VALUES(70563,'.NET Framework','Developers',80)

COMMIT TRANSACTION [CDC]
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION [CDC]
END CATCH
GO
----------------------------------------------------------------------
SELECT * FROM [dbo].[MSCertificationCapturing]–- 5 rows
------------------------------------------------------------------------

Change Data Capture-1

Flow of CDC:

Change Data Capture-2

Let’s dig into this flow in detail.

1. Enable CDC on Database:

Before enabling CDC, let us check the properties of our database. As shown in the below screenshot, sys.databases table has a column called ‘is_cdc_enabled’ which represents whether CDC is enabled on that particular database or not. Possible values of this column are 0 or 1.

0 — indicates CDC is not enabled and

1 — indicates CDC is enabled on the database.

By default it is 0(disabled) when we create a database.

Please run the below query to check whether CDC is enabled on our database or not.

-------------------------------------------------
USE [ChangeDataCaptureDB]
GO

EXEC sp_cdc_enable_db
GO
-------------------------------------------------

Change Data Capture-3

Let’s enable CDC on our database by executing the below procedure.

-------------------------------------------------
USE [ChangeDataCaptureDB]
GO

EXEC sp_cdc_enable_db
GO
-------------------------------------------------

Change Data Capture-4

I do not know whether you have noticed or not, without knowing yourself you created the below.

When we enable CDC on a database, by default the following will be created. If they are not displaying in object explorer, please refresh your SQL Server instance and open.

Change Data Capture-5

Tables:

cdc.captured_columns: It holds list of captured column.

cdc.change_tables: It holds list of tables for which CDC is enabled.

cdc.ddl_history: It holds all DDL changes since CDC is enabled.

cdc.index_columns: It holds list of indexes which related to our CDC table.

cdc.lsn_time_mapping: It maps LSN number and time.

dbo.systranschemas: It is used to track schema changes.

No need to bother about these tables. So I am not going in detail 🙂

Users:

CDC

Schema:

CDC

2. Start SQL Server Agent service:

This service is required to schedule SQL Server agent jobs. Since we are going to use these jobs in future in this article, we need to start this service if it is not started automatically.

Change Data Capture-6

If it is not started, please follow below steps to start the service.

Start > Run > type Services.msc > Ok > SQL Server Agent (MSSQLSERVER) > Start > Ok.

Now check your SQL Server agent whether it is started or not.

Change Data Capture-7

3. Enable CDC on Table:

Please run the below query to check CDC is enabled on our table or not.

-------------------------------------------
USE [ChangeDataCaptureDB]
GO

SELECT is_tracked_by_cdc FROM sys.tables
WHERE name ='MSCertificationCapturing'
GO
-------------------------------------------

Change Data Capture-8

It is not enabled. Let’s enable CDC on it. Please execute the below procedure to enable CDC on [dbo].[MSCertificationCapturing] table.

------------------------------------------------
USE [ChangeDataCaptureDB]
GO

EXEC sys.sp_cdc_enable_table
@source_schema ='dbo',
@source_name ='MSCertificationCapturing',
@role_name =NULL,
@supports_net_changes = 1
GO
------------------------------------------------

Let’s have a discussion on the above parameters.

@source_schema:Schema name of the table that we want to enable CDC.

@source_name: Table name that we want to enable CDC.

@role_name: It is used to restrict the access to some users. If it is NULL, then it will be accessible to everyone.

@supports_net_changes: It can have 0 or 1.

0 — To track all changes.

1 — To track all changes and net changes.

Some more parameters are there in this procedure. But they are optional and not much important.

Change Data Capture-9

Now CDC is enabled. If we want we can check by running above query.

-------------------------------------------
USE [ChangeDataCaptureDB]
GO

SELECT is_tracked_by_cdc FROM sys.tables
WHERE name ='MSCertificationCapturing'
GO
-------------------------------------------

After enabling CDC on table, by default the following were created.

Table:

cdc.dbo_MSCertificationCapturing_CT – This is the main table which holds all the changed data since the CDC is enabled.

Below is the list of columns in this table.

  • __$start_lsn
  • __$end_lsn
  • __$seqval
  • __$operation
  • __$update_mask

Out of the above 5 columns, __$operation column is much important for our data capturing. It can have 1, 2, 3 and 4 as values. Below table shows the meaning of these values.

Value Meaning

1

Delete

2

Insert

3

Value before update

4

Value after update

So, based on this column value, we are going to say which DML operation has done on the data.

Change Data Capture-10

Jobs:

cdc.ChangeDataCaptureDB_capture:  It reads the data from .ldf file and stored all data in change table.

cdc.ChangeDataCaptureDB_cleanup: It cleans the old data. By default it cleans for 3 days.

Change Data Capture-11

Note: Not only above, some more procedures and functions also creates when we enable CDC.

4 & 5.Make changes and Track the data:

Before modifying data, let us see the records in tables.

----------------------------------------------------------------------
SELECT * FROM [dbo].[MSCertificationCapturing] -- 5 rows
----------------------------------------------------------------------
SELECT * FROM cdc.dbo_MSCertificationCapturing_CT -- 0 rows
----------------------------------------------------------------------

Change Data Capture-12

Now I am going to do modification in the existing data of [dbo].[MSCertificationCapturing] table.

Please run the below script.

----------------------------------------------------------------------
-- Making DML changes to [dbo].[MSCertificationCapturing] table
----------------------------------------------------------------------
BEGIN TRANSACTION [CDC_DML]
BEGIN TRY

INSERT [dbo].[MSCertificationCapturing]
VALUES(70620,'MS Excel Macros','Developers',90)
DELETE FROM [dbo].[MSCertificationCapturing]
WHERE CertID = 70433
UPDATE [dbo].[MSCertificationCapturing]
SET RegFee$ = 70
WHERE RegFee$ = 50

COMMIT TRANSACTION [CDC_DML]
END TRY

BEGIN CATCH
ROLLBACK TRANSACTION [CDC_DML]
END CATCH
GO
----------------------------------------------------------------------

Now check the data.

----------------------------------------------------------------------
SELECT * FROM [dbo].[MSCertificationCapturing] -- 5 rows
----------------------------------------------------------------------
SELECT * FROM cdc.dbo_MSCertificationCapturing_CT -- 4 rows
----------------------------------------------------------------------

Change Data Capture-13

Even if we do not know about the DML script which we ran above, we can find the modifications by looking at the above screenshot.

6. Disable CDC on Table:

Yes..!!! We are done with data capture.

Now ultimately our next step will be disabling CDC. We should disable on database as well as table.

But we should first disable on table and then table. As you already gone through my Change Tracking article, I need not explain the reason why am I disabling fist on table not on database.

Please execute the below procedure to disable CDC on table.

------------------------------------------------
USE [ChangeDataCaptureDB]
GO

EXEC sys.sp_cdc_disable_table
@source_schema ='dbo',
@source_name ='MSCertificationCapturing',
@Capture_Instance ='dbo_MSCertificationCapturing'
GO
------------------------------------------------

Change Data Capture-14

7. Disable CDC on database:

Please execute the below procedure to disable CDC on table.

------------------------------------------------
USE [ChangeDataCaptureDB]
GO

EXEC sys.sp_cdc_disable_db
GO
------------------------------------------------

Change Data Capture-15

Ufff…..!!! We are done with Change Data Capture :). 

Please do not think that this article is complete reference for CDC 🙂. This article gives you basic understanding and I almost covered all concepts. If I discuss each and every topic in depth, then it will not be an article and becomes a complete reference of CDC. So I am concluding here.

If you guys have little confusion between CT and CDC, do not worry 🙂. Let’s have a great discussion in my next article Comparison between Change Tracking and Change Data Capture’

Have a nice day…!!!

ALL THE BEST


13 Comments

  1. razvantim says:

    Very interesting topic and useful to track/recover changes of data. Thank you

  2. Venkatesh says:

    Nice work.. It helped alot.. Keep it up dude..

  3. Chetan says:

    G8 work. Helped me alot….
    Share some more dude…

  4. Rani says:

    very nice article with simple explanation for detailed understanding

  5. […] To overcome the above drawbacks, SQL Server 2008 introduced powerful and efficient tracking mechanisms called‘Change Tracking (CT)’ and Change data Capture (CDC)   […]

  6. UGAN says:

    Good Article….

  7. Igor says:

    Nice article!
    But unfortunatelly I have Standard Edition, so I can’t use this feature.
    I have to update several tables in my database and I must report users which columns are updated and show values before and after update. So, how can I do it without CTC?

  8. charles says:

    Thanks for sharing nice information about changes tracking. This article described about how captures the changed details. I have also found good information from http://www.sqlserveraudit.org/. This tool allows to track the SQL server changes as who, what, when and where changes and audit specific user activities. It provides data filter options that helps to unfold the minutest of information to the last level and allows to automate the report generation.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: