18 Jan 2011 21:59
I have been thinking a lot recently about what it would be like to have StreamInsight and SSIS working together. Well, the CAT team has produced a paper on some of our options here.
Here are some of my thoughts.
- There is, of course, a slight mismatch in their types of usage. StreamInsight is an event stream processing engine capable of operating on new data in the sub-second timeframe. The engine allows you to do real-time analytics and make decisions on events that have potentially only just happened. SSIS, on the other hand, is a batch processing engine.
- In general, I do not like having to invoke the same package more than once every 90 seconds or so, as it can start to get expensive. Usually, when doing batch processing, we have an hour or more of grace before we have to move data from A to B.
- StreamInsight operates on streams of data. Before anyone mentions it: yes, I know StreamInsight is equally adept at using the IEnumerable interface, but I would argue that live streaming and real-time analytics are the primary goal of the product.
- SSIS does not have an “Always On” button.
- I particularly do not like the idea of embedding StreamInsight inside SSIS as a transform. It means StreamInsight becomes a batch processing engine, because it can only operate when the SSIS package is running, and SSIS is in charge of when that happens.
- If I am to have StreamInsight within SSIS, then I prefer to have StreamInsight on the adapters. This way you can force the adapters to stay open and introduce events into your pipeline. SSIS has a much richer set of transforms out of the box than StreamInsight. Although “Always On” was not a design goal of SSIS, I have used it like this and it works just fine.
- SSIS being called from within StreamInsight: now that excites me. See below.
For a while now I have been thinking about what it would be like to decouple the Data Flow task (DFT) from the SSIS package and expose it as something with which you can interact. Anything could instantiate this version of a DFT, as it would expose one or more input interfaces and one or more output interfaces. I can imagine that this would be a big hit when moving to “The Cloud” as well. I could see the Data Flow task maybe being hosted in Azure AppFabric or some such layer. StreamInsight would be able to take advantage of this as well.
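To make the idea a little more concrete, here is a minimal sketch in Python of what such a decoupled data flow might look like. None of this is a real SSIS or StreamInsight API: `StandaloneDataFlow`, `derive_total` and `keep_large` are hypothetical names made up for illustration, and the single-input, single-output shape is a simplifying assumption.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, float]
# A transform takes a stream of rows and returns a stream of rows.
Transform = Callable[[Iterable[Row]], Iterator[Row]]


class StandaloneDataFlow:
    """Hypothetical host-independent data flow: rows in, rows out."""

    def __init__(self, transforms: List[Transform]) -> None:
        self.transforms = transforms

    def process(self, rows: Iterable[Row]) -> Iterator[Row]:
        # Chain the transforms lazily, so rows flow through one at a time
        # and the caller decides when (and how fast) input arrives.
        stream: Iterator[Row] = iter(rows)
        for transform in self.transforms:
            stream = transform(stream)
        return stream


def derive_total(rows: Iterable[Row]) -> Iterator[Row]:
    # Roughly analogous to a derived-column style transform.
    for row in rows:
        yield {**row, "total": row["qty"] * row["price"]}


def keep_large(rows: Iterable[Row]) -> Iterator[Row]:
    # Roughly analogous to a filter that keeps only matching rows.
    for row in rows:
        if row["total"] >= 100:
            yield row


flow = StandaloneDataFlow([derive_total, keep_large])
events = [{"qty": 2, "price": 10}, {"qty": 5, "price": 30}]
results = list(flow.process(events))
```

Because `process` accepts any iterable and returns a lazy iterator, the same flow could in principle be fed a finite batch or an open-ended generator of events, which is the kind of host independence I have in mind.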
I am interested to see where this goes and will be pressing for more meat around the subject when I visit Redmond soon.