RegexClean Transformation

Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration.

Two regular expressions are used, a match expression and a replace expression. The transformation is designed around the named capture groups or match groups, and even supports multiple expressions. This allows for rich and complex expressions to be built, all through an easy to reuse transformation where a bespoke Script Component was previously the only alternative.

Some simple properties are available for each column selected –

Behaviour
The two behaviour modes offer similar functionality but with a difference. Replace, replaces tokens with the input, and Emit overwrites the whole string.

Cascade
Cascade allows you to define multiple expressions, each on a new line. The match expression will be processed into one operation per line, which are then processed in order at run-time. Multiple replace expressions can also be specified, again each on a new line. If there is no corresponding replace expression for a match expression line, then the last replace expression will be used instead. It is common to have multiple match expressions, but only a single replace expression.

Match Expression
The expression used to define the named capture groups. This is where you can analyse the data, and tag or name elements within it as found by the match expression.

Replace Expression
The replace determines the final output. It will reference the named groups from the match expression and assembles them into the final output.

If you want to use regular expressions to validate data then try the Regular Expression Transformation.

Quick Start Guide

  • Select a column. A new output column is created for each selected column; there is no option for in-place replacement of column values. One input column can be used to populate multiple output columns, just select the column again in the lower grid, using the Input Columns drop-down selector.
  • Amend the output column name and size as required. They default to the same as the input column selected.
  • Amend the behaviour as required, the default is Replace.
  • Amend the cascade option as required, the default is true.
  • Finally enter your match and replace regular expressions

Quick Sample #1

Parse an email address and extract the user and domain portions. Format as a web address passing the user portion as a URL parameter. This uses two match groups, user and host, which correspond to the text before the @ and after it respectively.

Behaviour is Emit, and cascade of false, we only have a single match expression.

Match Expression ^(?<user>[^@]+)@(?<host>.+)$

Replace Expression – http://www.${host}?user=${user}

Results

Sample Input Sample Output
zheng0@adventure-works.com http://www.adventure-works.com?user=zheng0

The component is provided as an MSI file, however for 2005/2008 you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items…. Select the SSIS Data Flow Items tab, and then check the RegexClean Transformation from the list.

Downloads

Please choose the version to match your SQL Server version. You can install multiple versions and use them side by side if you are developing for multiple versions of SQL Server. For the recent releases of SQL Server Data Tools that can target different versions of SQL Server, please install all target server versions your intend to use.

RegexClean Transformation for SQL Server 2005

RegexClean Transformation for SQL Server 2008

RegexClean Transformation for SQL Server 2012

RegexClean Transformation for SQL Server 2014

RegexClean Transformation for SQL Server 2016

RegexClean Transformation for SQL Server 2017

RegexClean Transformation for SQL Server 2019

Version History

SQL Server 2019

Version 7.0.2.10 – SQL Server 2019.
(29 Sep 2019)

SQL Server 2017

Version 6.0.2.10 – SQL Server 2017.
(29 Sep 2019)

SQL Server 2016

Version 5.0.2.10 – SQL Server 2016 GA release.
(25 Jul 2017)

SQL Server 2014

Version 4.0.0.8 – SQL Server 2014 release. Includes upgrade support for both 2005, 2008 and 2012 packages to 2014.
(6 Sep 2014)

SQL Server 2012

Version 3.0.0.7 – SQL Server 2012 release. Includes upgrade support for both 2005 and 2008 packages to 2012.
(6 Jun 2012)

SQL Server 2008

Version 2.0.0.7 – SQL Server 2008 Release
(20 Oct 2008)

SQL Server 2005

Version 1.0.0.105 – Public Release
(28 Jan 2008)

Screenshot

RegexClean Transformation Editor dialog