Checksum Transformation

by Darren Green 6 Jun 2016 00:00
The Checksum Transformation computes a hash value, the checksum, across one or more columns, returning the result in the Checksum output column. The transformation provides functionality similar to the T-SQL CHECKSUM function, but is encapsulated within SQL Server Integration Services, for use within the pipeline without code or a SQL Server connection. As featured in The Microsoft Data Warehouse Toolkit by Joy Mundy and Warren Thornthwaite from the Kimbal Group. Have a look at the book samples especially Sample package for custom SCD handling. All input columns are passed through the tr... [More]

RegexClean Transformation

by Darren Green 6 Sep 2014 16:03
Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration. Two regular expressions are used, a match expression and a replace expression. The transformation is designed around the named capture groups or match groups, and even supports multiple expressions. This allows for rich and complex expressions to be built, all through an easy to reuse transformation where a bespoke Script Component was previously the only alter... [More]

Regular Expression Transformation

by Darren Green 6 Sep 2014 16:01
The regular expression transformation exposes the power of regular expression matching within the pipeline. One or more columns can be selected, and for each column an individual expression can be applied. The way multiple columns are handled can be set on the options page. The AND option means all columns must match, whilst the OR option means only one column has to match. If rows pass their tests then rows are passed down the successful match output. Rows that fail are directed down the alternate output. This transformation is ideal for validating data through the use of regular expressio... [More]

Row Count Plus Transformation

by Darren Green 6 Sep 2014 16:00
As the name suggests we have taken the current Row Count Transform that is provided by Microsoft in the Integration Services toolbox and we have recreated the functionality and extended upon it. There are two things about the current version that we thought could do with cleaning up Lack of a custom UI You have to type the variable name yourself In the Row Count Plus Transformation we solve these issues for you. Another thing we thought was missing is the ability to calculate the time taken between components in the pipeline. An example usage would be that you want to know how ma... [More]

Row Number Transformation

by Darren Green 6 Sep 2014 15:56
The Row Number Transformation calculates a row number for each row, and adds this as a new output column to the data flow. The column number is a sequential number, based on a seed value. Each row receives the next number in the sequence, based on the defined increment value. The final row number can be stored in a variable for later analysis, and can be used as part of a process to validate the integrity of the data movement. The Row Number transform has a variety of uses, such as generating surrogate keys, or as the basis for a data partitioning scheme when combined with the Conditional ... [More]

Red Gate join the SSIS custom component club

by Darren Green 11 Jul 2011 08:11
I recently noticed that Red Gate have launched themselves into the SSIS component market by releasing a new Data Cleanser component, albeit in beta for now. It seems to be quite a simple component, bringing together several features that you can find elsewhere, but with a suitable level  polish that you’d expect from them. String operations include find and replace with regular expressions, case formatting and trim, all of which are available today in one form or another, but will the RedGate factor appeal to people? Benefits include ease of use, all operations in one place, versus ... [More]

SSIS and StreamInsight Working Together.

by Allan Mitchell 18 Jan 2011 21:59
I have been thinking a lot recently about what it would be like to have StreamInsight and SSIS working together.  Well the CAT team have produced a paper on some of our options here. Here are some of my thoughts. There is of course a slight mismatch in their types of usage.  StreamInsight is an Event Stream processing engine capable of operating on new data in the sub second timeframe.  The engine allows you to do real time analytics and take decisions on events that have potentially only just happened.  SSIS on the other hand is a batch processing engine.  In... [More]

Sorting data in the SSIS Pipeline (Video)

by Allan Mitchell 29 May 2010 20:27
In this post I want to show a couple of ways to order the data that comes into the pipeline.  a number of people have asked me about this primarily because there are a number of ways to do it but also because some components in the pipeline take sorted inputs.  One of the methods I show is visually easy to understand and the other is less visual but potentially more performant.

Script Task/Component and Template Information

by Allan Mitchell 8 Dec 2009 06:46
The Script Task and Script component are often used by people developing SSIS packages because they are easy to use and now because SSIS could be perceived to be more developer friendly they are very powerful. That being said we should no be using them everywhere.  There are generally Tasks/Components already provided that will do the job it may be that we have to rethink the way we want to draw our package. I had cause last week to break out the script component in SQL Server 2008 SP1 and found that it was broken.  I don’t know when it broke as I do not use them all that often.... [More]

Attunity Oracle CDC Solution for SSIS - Beta

by Allan Mitchell 18 Feb 2009 12:23
We in no way work for Attunity but we were asked to test drive a beta version of their Oracle CDC solution for SSIS.  Everybody should know that moving more data than you need to takes too much time and uses resources that may better be employed doing something else.  Change data Capture is a technology that is designed to help you identify only the data that has had something done to it and you can therefore move only what is needed.  Microsoft have implemented this exact functionality into SQL server 2008 and I really like it there.  Attunity though are doing it on Oracle... [More]

SSAS Cache Warming Using SSIS

by Allan Mitchell 31 Oct 2008 08:22
Having your users walk into a cold cache on a morning is something that will cause them to have slower running queries initially.  Cache warming takes care of a lot of that by executing typical queries against the newly processed cube and thereby building the cache ready for your users when they get in.  There are a number of ways to warm the cache available. Chris Webb has already blogged about how he has done it and we thought we would show you a variation on that theme. UPDATE: The Trace File Source Adapteris now available. The package is relatively simple.     ... [More]

Expressions

by Darren Green 28 Oct 2008 08:40
Data Flow Property Expressions Property expressions within the data flow are not immediately obvious. Whilst most pipeline or data flow components do support them for their properties, the expression is not generally exposed through the component user interface, neither is the Expressions property shown properties grid for the component. Property expressions are only available at the task level, so for those components that do support expressions, you need to look in the parent data flow task's property Expressions collection. This can be accessed through the Property Grid for the task. You n... [More]

Expression Date Functions

by Darren Green 28 Oct 2008 07:58
Date Parts Expressions support a range of date related functions such as DATEADD, with the same basic syntax to that found in T-SQL. Whilst the familiarity is very helpful, the difference that catches me out is the format of date part which must be quoted. T-SQL allows this: DATEADD(n, -10, GETDATE()) DATEADD(mi, -10, GETDATE()) DATEADD(minute, -10, GETDATE()) The SSIS equivalent is: DATEADD("n", -10, GETDATE()) DATEADD("mi", -10, GETDATE()) DATEADD("minute", -10, GETDATE()) Related functions that use the same date part tokens - DATEAD... [More]

Extending SSIS with custom Data Flow components (Presentation)

by Darren Green 6 Mar 2008 14:00
Download the slides and sample code from my Extending SSIS with custom Data Flow components presentation, first presented at the SQLBits II (The SQL) Community Conference. Abstract Get some real-world insights into developing data flow components for SSIS. This starts with an introduction to the data flow pipeline engine, and explains the real differences between adapters and the three sub-types of transformation. Understanding how the different types of component behave and manage data is key to writing components of your own, and probably should but be required knowledge for anyone bu... [More]

Multicast Transform and Threading in SQL Server 2008

by Allan Mitchell 19 Sep 2007 14:00
The Multicast transform in SSIS 2005 enables us to take 1 input dataset and from it generate n output datasets that are an exact copy of the input dataset which is extremely useful at times and we have used it on a number of projects. The downside is that those outputs are synchronous outputs and therefore are on the exact same thread as the input dataset. Normally a Synchronous output is good as no memory has to swap buffers and it is exceptionally quick. A problem arises if we have on one or more of those outputs a blocking Asynchronous transform. When this happens processing of the other ou... [More]

OLE-DB Command Transformation

by Allan Mitchell 2 Aug 2004 14:00
In the newsgroups we have been asked on a number of occasions how to execute a stored procedure once for every row of input data. There's currently no really good way of doing this but in SQL Server 2005 DTS that is about to change. This article is going to show you how to take values from a source table and for every row execute a stored procedure which inserts values into another table. This package is really quite simple. It involves only one Data Flow task in the Workflow and two components in the Data Flow, an OLE DB Source adapter and the OLE DB Command Transformation. As alway... [More]

RecentComments

Comment RSS