RegexClean Transformation

by Darren Green 25 Jul 2017 16:03

Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration.

Two regular expressions are used, a match expression and a replace expression. The transformation is designed around the named capture groups or match groups, and even supports multiple expressions. This allows for rich and complex expressions to be built, all through an easy to reuse transformation where a bespoke Script Component was previously the only alternative.

Some simple properties are available for each column selected –

The two behaviour modes offer similar functionality but with a difference. Replace, replaces tokens with the input, and Emit overwrites the whole string.

Cascade allows you to define multiple expressions, each on a new line. The match expression will be processed into one operation per line, which are then processed in order at run-time. Multiple replace expressions can also be specified, again each on a new line. If there is no corresponding replace expression for a match expression line, then the last replace expression will be used instead. It is common to have multiple match expressions, but only a single replace expression.

Match Expression
The expression used to define the named capture groups. This is where you can analyse the data, and tag or name elements within it as found by the match expression.

Replace Expression
The replace determines the final output. It will reference the named groups from the match expression and assembles them into the final output.

If you want to use regular expressions to validate data then try the Regular Expression Transformation.

Quick Start Guide

  • Select a column. A new output column is created for each selected column; there is no option for in-place replacement of column values. One input column can be used to populate multiple output columns, just select the column again in the lower grid, using the Input Columns drop-down selector.
  • Amend the output column name and size as required. They default to the same as the input column selected.
  • Amend the behaviour as required, the default is Replace.
  • Amend the cascade option as required, the default is true.
  • Finally enter your match and replace regular expressions

Quick Sample #1

Parse an email address and extract the user and domain portions. Format as a web address passing the user portion as a URL parameter. This uses two match groups, user and host, which correspond to the text before the @ and after it respectively.

Behaviour is Emit, and cascade of false, we only have a single match expression.

Match Expression ^(?<user>[^@]+)@(?<host>.+)$

Replace Expression - http://www.${host}?user=${user}


Sample Input Sample Output

The component is provided as an MSI file, however for 2005/2008 you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox, and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the RegexClean Transformation from the list.


Please choose the version to match your SQL Server version. You can install multiple versions and use them side by side if you are developing for multiple versions of SQL Server. For the recent releases of SQL Server Data Tools that can target different versions of SQL Server, please install all target server versions your intend to use.

RegexClean Transformation for SQL Server 2005

RegexClean Transformation for SQL Server 2008

RegexClean Transformation for SQL Server 2012

RegexClean Transformation for SQL Server 2014

RegexClean Transformation for SQL Server 2016

Version History

SQL Server 2016

Version - SQL Server 2016 GA release.
(25 Jul 2017)

SQL Server 2014

Version - SQL Server 2014 release. Includes upgrade support for both 2005, 2008 and 2012 packages to 2014.
(6 Sep 2014)

SQL Server 2012

Version - SQL Server 2012 release. Includes upgrade support for both 2005 and 2008 packages to 2012.
(6 Jun 2012)

SQL Server 2008

Version - SQL Server 2008 Release
(20 Oct 2008)

SQL Server 2005

Version - Public Release
(28 Jan 2008)


RegexClean Transformation Editor dialog

Comments (28) -

1/22/2009 3:12:57 PM #

Tommy Petersson

A nice component, unfortunately it doesn't seem to be able to do what I want at the moment...

I would like it to check a column from a Flat File Source to see if it contains a date. Since there's no ISDATE in SSIS I wanted to see if this was an OK workaround. I'm OK with regexp checking for yyyy/mm/dd, even if it's 1981/44/55 - if the system creating the flat file has a date to export it's correct. Otherwise it's either an empty string or some blanks. If there is no date in the column, It's possible to derive that info from another column - but RegexClean don't let you get data from another column...

Tommy Petersson Sweden

1/23/2009 11:27:29 AM #

Allan Mitchell


It sounds like you may just need to validate the column value in which case may be more appropriate.  if your validation fails then on the alternate output you can hand a Derived Column transform which will be used to derive the dae from the other columns you mention.

Allan Mitchell Germany

2/20/2009 2:32:29 AM #


Anyone tried doing a replace with an empty expression?

I want to strip whitespace out of the input, and don't see any mechanism to specify empty place in RegEx, and the component won't accept an empty expression for a Replace.


2/20/2009 7:56:32 AM #

Allan Mitchell

I would probably use a Derived Column transform to replace my existing column.  There is a string REPLACE() function in there.

Allan Mitchell United Kingdom

2/24/2009 1:43:12 AM #


Cheers Allan, I had decided Replace was the way to go anyway.
Another issue: Doing a Replace to replace following characters with a space that I then do the Dervied Replace on:

The output column value is null though if none of those characters are in input - this doesn't seem right to me.


2/24/2009 1:44:34 AM #


My temp solution is to append a ' ' at end of column in the query..


3/26/2009 6:26:39 PM #


Hi, I am trying to remove all alphanumeric characters from the string using [^0-9] in Match section, but couldn't find the correct expression for the Replace section. Thanks,

Jason United Kingdom

5/1/2009 3:14:11 PM #

Steven Molen

Any way to access members within the group itself? i.e. if i have a string of and i have a match expression of ^(?<stringsonly>[^.]*), I want to access the different members separately i the replace syntax.

Steven Molen United States

6/25/2009 7:18:05 PM #



I did something similar and posted it out on MSDN. (  I didn't find an eloquent way to do it.

Devo United States

8/3/2009 9:34:47 PM #


I have unstructured text in a nvarchar(max) column that I am trying to extract into sepearate fields. I need to search for specific strings e.g. "THC METABOLITE CONFIRMED BY GC/MS 21 NG/ML" and extract the numbers after the GC/MS. I've written a regex \bTHC\sMETABOLITE\sCONFIRMED\sBY\sGC/MS\b.(?<amt>\d{4})to search for my string. How do I get the RegEx Clean to replace the string with the 21. The RegEx transformation correctly identifies every occurence, however I can not get the RegEx Clean to work. Any suggestions would be appreciated


8/6/2009 10:22:21 PM #


Built a Script Destination Component to put text into the data stream and used the RegEx clean in emit mode with the cascade option to search 34 specific patterns and replaced them all. Strings are similar to the ones in my previous post (\bTHC\sMETABOLITE\sCONFIRMED\sBY\sGC/MS\b.(?<amt>\d{4}). The transform works well.

Good job!


9/25/2009 5:44:28 PM #


Tool has been very useful- only my [mis]understanding of regexp getting in my way.
Although, I have yet to figure out how to remove a row once it's been created.
I am reading an http log file and culling out several data elements from the string. One of them is causing an error (I guess my regexp is incorrect) but I can't figure out how to remove that field from the transform in order to move forward.

Any help is appreciated.

Eric United States

1/27/2010 7:05:50 PM #


This works great for what I need to do involving dates.  However, I recieve an error when the column is null and the next output is to a derived column (I'm combining the date and time to a datetime after formatting the date and time in regex). "[RegexClean] Error: ocessing with an exception. Value cannot be null. Parameter name: input"

Any words?

Kyle United States

1/28/2010 7:55:14 PM #



I have written a SSIS component called RegExtractor that does what you would like to do.  You can specify a regular expression to operate on any text column and the captured matches of that regex get output to new columns.  The SSIS component is free and open source.  You can get it at:

Hope this helps (I realize I'm late in my response)


Eric United States

3/20/2010 2:26:23 AM #


Can someone expand upon what the Cascade option does? I've tried playing with it, but it doesn't produce any different results whether the box is checked or unchecked. I'm hoping someone can provide an example as to when you would or would not use it.

Spencer United States

8/10/2010 2:19:02 AM #


Have you adapted this for VS 2010 yet?

andrew Canada

8/10/2010 1:09:58 PM #


Sorry, I got it installed in the right place.  Really nice utility, thanks.

andrew Canada

9/17/2010 10:19:37 PM #

Dave Bunch

I want to extract characters [a-zA-Z0-9] from email addresses using the RegexClean component. When I first opened the RegexClean editor, for the input column 'email' I named the output column 'email' and for a 'Match Expression' I used:


I did not enter anything into or touch the 'Replace Expression' column. When I ran the package, RegexClean did exactly what I wanted it to:

Test email address before:   after: samsmithgmailcom

When I reopened the RegexClean editor, I accidentally clicked on the 'Match Expression' and from then on, received the error message: 'Replace Expression property value cannot be null.'  I can't seem to enter any replace expression that extracts the same output as before when I (incorrectly) used the component with an empty 'Replace Expression'. Any suggestions would be appreciated.

Dave Bunch United States

10/31/2010 9:28:00 PM #


This tool is pure dynamite!  I am having a ball with it.

Doug United States

11/1/2010 10:23:27 PM #


RegexClean does not appear to work with SSIS on SQL Server 2008 R2.  Is this product being maintained?

Doug United States

11/9/2010 12:39:14 PM #

Darren Green

Doug, Can you explain what problems you are having on R2? All our 2008 components are fully compatible with R2. If you have moved to a new R2 server, make sure you have installed the component on the new machine.

Darren Green United Kingdom

11/10/2010 10:11:13 PM #

Dave Bunch

Figured out how to remove non-alphanumeric characters from a string

This was such a pain in the arse to figure out that I wanted to post my solution to hopefully help others. I wanted to use RegexClean to strip non-alphanumeric characters from a string - in this case an email address -- and return only the alphanumeric characters that remained. Here are the RegexClean expressions I used to achieve this:

Match Expression: (?<stringIN>[^a-zA-Z0-9]+)|(?<stringOUT>[a-zA-Z0-9]+)

Replace Expression: ${stringOUT}

I really dig this component now. Thanks for sharing it!

Dave Bunch United States

1/25/2011 6:19:48 PM #

Kenneth Franklin

I am trying to use this component to clean flat file data. It works perfectly, unless there is no match on the row, when I just get NULL results. Using RegexClean with:
Match exp: \s{2,100}[|]
Replace exp: |
On these two rows (one column of data in the data flow):
00000|0000000|  |0000000000000|00000

Row 2 is modified properly, but row 1 is now NULL. This Match/Replace combination works in any regexp tester I have tried, but not with this component. Am I doing something wrong?


Kenneth Franklin United States

3/10/2011 8:38:40 AM #

Ty Parten

Thanks for the utilities! They help quite a lot. (Love the trash destination :)

Is there a way to replace the Match and Replace expressions dynamically prior to entering RegExClean? I'm processing property-value pairs and the configuration of the transform is dynamic from pair to pair. Of course I'm processing the transforms in a loop so that I only have to replace the expressions once per batch. (i.e. not replacing row-by-row, lol)

I'm familiar with how to set properties ahead of time (empty for loop) but I can't seem to locate the correct ones.


Ty Parten United States

12/21/2011 6:12:43 PM #

Craig Bobchin

This looks pretty cool. I have a question about a particular file I'm trying to clean however, I have a | delimited text file, One of the fields (in the middle of the record no less) can contain 1 or more embedded returns before you get to the next field.

I've got a RegEx macro that I use in Ultraedit to clean the file prior to importing, but I'd like to be able to do it in the SSIS package. This macro removes the embedded returns and puts the record back to one line/record.

Can this transform handle embedded returns in a field? If so how would I go about setting it up?


Craig Bobchin United States

12/21/2011 6:44:02 PM #

Craig Bobchin

I have one more question. It appears this does not work for text stream (memo) type fields. Is there any hope for adding support for them?



Craig Bobchin United States

9/5/2012 9:54:27 PM #

Colin Miller

I have used the following code in T-SQL to remove carriage returns:

set @NewLine=char(13)+char(10)
update dbo.[DOTDOT3]
     set WEB =Replace(WEB , @NewLine,'__')
WHERE WEB like '%' +@NewLine +'%'

For some reason I don't remember I replaced the carriage return with '__' then did search and replace to remove the '__'

Colin Miller United States

10/7/2016 8:11:52 PM #


Hello everyone.
I just download and used this tool (component RegexClean).
Works well. Many thanks to the creator(s).
Just a little bug (and his workaround):
If you try to replace something by an empty string, if you don't click or enter in the ReplaceExpressions textbox, it's work.
But as soon as you click or enter in the ReplaceExpressions textbox, you are asked to provide a string (cannot be null...).
I see that some of the previous comments ask for how to do this.
Very simple: don't touch this textBox.

Marty Canada

Pingbacks and trackbacks (2)+

Add comment

  Country flag

  • Comment
  • Preview