Generating Surrogate Keys

by Guest 4 Apr 2005 14:00

(By Jamie Thomson)

Introduction

Surrogate keys are generally considered fundamental building blocks of a data warehouse. They are used as identifiers for dimensional members and enable us to manage slowly changing dimensions.

SSIS does not contain a built in component for generating surrogate keys but there is still a mechanism for doing it – the Script Component. The Script Component allows us to modify the data in a data flow path using managed code and we can use it to generate surrogate keys.

The Row Number Transformation can be used to help generate surrogate keys.

Methodology

1. Creating the transformation

When you drag a script component from the toolbox to the design surface you will be prompted as to whether the component is to be a source adapter, a destination adapter or a transformation. Select Transformation.

Select Script Component Type

2. Configuring the transformation metadata

Your script component should contain 1 input and 1 output. This is to be a synchronous transformation so the SynchronousInputID property of the output must be the same as the ID property of the input. The output should have 1 column which in this instance I have called SK.

In the case of a synchronous transformation no output buffer is required; the input buffer is used by the output. Therefore, any columns that we create on our output actually appear as if they are part of the input buffer as we will see later in the script

A good rule of thumb for synchronous script transformations is that you should only add columns to the output which do not already exist in your input because the input columns will flow through the script transformation in any event

Inputs and Outputs

Note in the screenshot above the SynchronousInputID property of the output is set to the ID of the Input thus indicating that this is a synchronous transformation.

3. Building the script

We are going to use managed code to populate SK with an integer that acts as a surrogate key.

The code looks like this:

Code

There are four things to note about what is going on here:

  • We have declared a variable called counter that will contain a value to be outputted as the surrogate key.
  • We have initialized counter within the New() method which gets called just once on each execution
  • We increment counter within the Input_ProcessInputRow() method which gets called for each row in the input buffer
  • We are outputting the contents of the input buffer to the component output along with our surrogate key value from counter

Conclusion

And that's it! Its all very simple. Naturally your surrogate key values won’t always start from 1 as they do here but you can build on this method to pass in the current maximum value of the surrogate key in question in order to initialize counter.

The example package that has been illustrated above can be downloaded (7KB) Generating Surrogate Keys.zip. It is very simple to run as it does not require any configuration. The synchronous nature of the script transformation is demonstrated by passing through 100000 rows of data. Download it and try it out!

Comments (2) -

2/19/2009 4:54:41 AM #

namemo

Thanks. this is useful for me.

namemo Thailand

3/23/2010 11:06:43 AM #

KG


I assume this would pick the latest value of the SK from a table and write back at the end?

What happens in the event of a crash, particularly if loading records in batches of say 10000, what happens if the load crashes at row 12000 and the SK is not written back to the SK table?

I am probably missing something, would appreciate comments.

KG

Add comment

  Country flag

biuquote
  • Comment
  • Preview
Loading