Hopping/Tumbling Windows Could Introduce Latency.

by Allan Mitchell 5 Dec 2010 15:15

This is a prelude to an article I am going to write on adjusting an event's time and duration to satisfy business process requirements, but I think it is useful in its own right for understanding how Hopping/Tumbling windows work within StreamInsight. A Tumbling window is just a shortcut for a Hopping window where the width of the window is equal to the size of the hop.

Here is the simplest and most often used definition for a Hopping Window. You can find them all here

public static CepWindowStream<CepWindow<TPayload>> HoppingWindow<TPayload>(
    this CepStream<TPayload> source,
    TimeSpan windowSize,
    TimeSpan hopSize,
    WindowInputPolicy inputPolicy,
    HoppingWindowOutputPolicy outputPolicy
)

 

And here is the definition for a Tumbling Window

public static CepWindowStream<CepWindow<TPayload>> TumblingWindow<TPayload>(
    this CepStream<TPayload> source,
    TimeSpan windowSize,
    WindowInputPolicy inputPolicy,
    HoppingWindowOutputPolicy outputPolicy
)
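To make the relationship between the two definitions concrete, here is a minimal sketch in plain C# (it does not use StreamInsight at all). The `WindowMembership` class and its `WindowStarts` method are hypothetical names invented for this illustration. The sketch treats an event as a single point in time and assumes windows start at multiples of the hop size counted from `DateTime.MinValue`; the engine's actual alignment may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the StreamInsight API.
public static class WindowMembership
{
    // Returns the start time of every window [start, start + windowSize)
    // that contains the given point in time. Assumes windows begin at
    // multiples of hopSize counted from DateTime.MinValue.
    public static List<DateTime> WindowStarts(DateTime stamp, TimeSpan windowSize, TimeSpan hopSize)
    {
        var starts = new List<DateTime>();

        // The latest window that starts at or before the stamp.
        long latestStart = stamp.Ticks - (stamp.Ticks % hopSize.Ticks);

        // Walk backwards one hop at a time while the window still covers the stamp.
        for (long s = latestStart; s >= 0 && s > stamp.Ticks - windowSize.Ticks; s -= hopSize.Ticks)
        {
            starts.Add(new DateTime(s));
        }

        starts.Reverse(); // oldest window first
        return starts;
    }

    public static void Main()
    {
        var stamp = new DateTime(2010, 12, 5, 12, 10, 35);
        var oneMinute = TimeSpan.FromMinutes(1);

        // Hop equal to the window size (a Tumbling window): exactly one window.
        Console.WriteLine(WindowStarts(stamp, oneMinute, oneMinute).Count);                // prints 1

        // Hop of 30 seconds with a 1 minute window: the windows overlap.
        Console.WriteLine(WindowStarts(stamp, oneMinute, TimeSpan.FromSeconds(30)).Count); // prints 2
    }
}
```

When the hop equals the window size, every event lands in exactly one window, which is why a Tumbling window needs no separate hop parameter.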

 

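These methods allow you to group events into windows of a fixed temporal size. It is a really useful and simple feature in StreamInsight. One of the downsides, though, is that a window cannot be flushed until a CTI (Current Time Increment) at or beyond the end of that window has been enqueued, which in practice usually means waiting for activity in a following window. This means that you will potentially see the results for some events only after a delay or, if the stream goes quiet, not at all.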

Let me explain.

Remember that a stream is a potentially unbounded sequence of events. Events in StreamInsight are given a StartTime, and it is this StartTime that is used to calculate into which temporal window an event falls. It is best practice to assign a timestamp from the source system and not one from the system clock on the processing server. StreamInsight cannot know by itself when a window is over: it cannot tell whether it has received all the events in the window or whether some have been delayed, so it cannot flush the window for you.

Imagine you have events with the following Timestamps

12:10:10 PM
12:10:20 PM
12:10:35 PM
12:10:45 PM
11:59:59 PM

And imagine that you have defined a 1 minute Tumbling Window over this stream using the following syntax

// Count the events that fall into each one-minute Tumbling window
var HoppingStream = from shift in inputStream.TumblingWindow(
                        TimeSpan.FromMinutes(1),
                        HoppingWindowOutputPolicy.ClipToWindowEnd)
                    select new WindowCountPayload { CountInWindow = (Int32)shift.Count() };

 

The events between 12:10:10 PM and 12:10:45 PM will not be seen until a CTI that falls in a later window is enqueued. Say we enqueue a CTI after each of the first 4 events and then do not do so again until after the event at 11:59:59 PM. It is only then, nearly twelve hours later by the event timestamps, that StreamInsight knows the window carrying our previous events is finished and can be flushed. Remember that CTIs, not the events themselves, are responsible for moving events through the StreamInsight engine. This could be a real problem if you need to react to windows promptly. What I would say here is that this is not a problem unique to StreamInsight. Semantically, we have to wait until the stream enqueuing the events tells us, through a CTI, that a window is complete before we can process it.
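The behaviour can be sketched with a small simulation in plain C#. This is not StreamInsight code: `TumblingWindowSimulator`, `EnqueueEvent` and `EnqueueCti` are hypothetical names made up for the illustration, and the sketch assumes that each CTI carries the timestamp of the event it follows and that a window is complete once a CTI reaches or passes the end of that window.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the engine's windowing logic; illustration only.
public sealed class TumblingWindowSimulator
{
    private readonly TimeSpan windowSize;

    // Open windows keyed by window start time; the value is the event count so far.
    private readonly SortedDictionary<DateTime, int> openWindows =
        new SortedDictionary<DateTime, int>();

    public TumblingWindowSimulator(TimeSpan windowSize)
    {
        this.windowSize = windowSize;
    }

    // Adds an event to the window that its StartTime falls into.
    public void EnqueueEvent(DateTime startTime)
    {
        var windowStart = new DateTime(startTime.Ticks - (startTime.Ticks % windowSize.Ticks));
        int count;
        openWindows.TryGetValue(windowStart, out count); // count stays 0 for a new window
        openWindows[windowStart] = count + 1;
    }

    // A CTI promises that no earlier event will arrive, so every window that
    // ends at or before the CTI is complete and can be flushed.
    public List<KeyValuePair<DateTime, int>> EnqueueCti(DateTime cti)
    {
        var closed = openWindows.Where(w => w.Key + windowSize <= cti).ToList();
        foreach (var w in closed)
        {
            openWindows.Remove(w.Key);
        }
        return closed;
    }
}

public static class CtiFlushDemo
{
    public static void Main()
    {
        var day = new DateTime(2010, 12, 5);
        var sim = new TumblingWindowSimulator(TimeSpan.FromMinutes(1));
        var times = new[] { "12:10:10", "12:10:20", "12:10:35", "12:10:45", "23:59:59" };

        foreach (var t in times)
        {
            var stamp = day + TimeSpan.Parse(t);
            sim.EnqueueEvent(stamp);

            // Enqueue a CTI straight after each event and report any flushed windows.
            foreach (var w in sim.EnqueueCti(stamp))
            {
                Console.WriteLine("CTI at {0:HH:mm:ss} flushed window {1:HH:mm:ss} with {2} event(s)",
                    stamp, w.Key, w.Value);
            }
        }
    }
}
```

Nothing is reported for the first four CTIs because they all sit inside the 12:10 window. Only the CTI at 23:59:59 flushes that window with its count of 4, and the window holding the last event stays open until a later CTI arrives.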

This can always be worked around by using a different design pattern, but a lot of the examples I see assume there is a constant, very frequent stream of events, resulting in windows always being flushed.

Further examples of using windowing in StreamInsight can be found here
