StreamBase Pattern Matching Language

StreamBase queries can compare data from one or more input streams using pattern matching statements. This topic describes the basic concepts.

In an EventFlow module, pattern evaluation is performed by the Pattern operator, where separate controls on the Properties view's Pattern Settings tab are used to specify the desired template, dimension, and interval values. A predicate entry qualifies which tuples match the pattern. See Using the Pattern Operator for more on configuring the Pattern operator.

Syntax

template [template [pattern_operator [NOT] template]...] window

Pattern Elements

template

An expression that evaluates to a stream identifier or alias. The pattern can include nested templates, or templates combined using pattern operators.

pattern_operator

A logical operator that describes how templates relate. AND, OR, and THEN each relate a pair of templates; NOT negates the single template that follows it. See Pattern Operators for descriptions and shortcut notation.

window
  • For time-based patterns, specifies a time interval within which the operation must terminate.

  • For value-based patterns, specifies a maximum range of values in a particular field, called the order field. The window terminates when the range of values for the order field, across all streams, equals or exceeds the specified value.

    For value-based patterns, the order field must be a top-level numeric or timestamp field: that is, it cannot be a sub-field in an event schema. For example, consider the following schema, whose second field is a nested tuple:

    (id int, sub (id2 int))

    Given the preceding schema, the only valid order field is id; you cannot specify either sub or sub.id2.
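
The two value-based rules above can be sketched in Python. This is only an illustrative model, not StreamBase code: the dictionary schema representation, the function names, and the set of type names treated as numeric are all assumptions made for this example.

```python
# Assumed type names; the text says only "numeric or timestamp".
ORDERABLE_TYPES = {"int", "long", "double", "timestamp"}


def valid_order_fields(schema):
    """Return top-level fields usable as an order field.

    `schema` is a hypothetical representation: field name -> type name,
    or a nested dict for a tuple-typed field. Nested fields are skipped
    because only top-level fields qualify.
    """
    return [
        name
        for name, field_type in schema.items()
        if isinstance(field_type, str) and field_type in ORDERABLE_TYPES
    ]


def value_window_closed(order_values, limit):
    """True once the spread of order-field values seen so far, across
    all streams, equals or exceeds the window limit."""
    if not order_values:
        return False
    return max(order_values) - min(order_values) >= limit


# The schema from the text: (id int, sub (id2 int))
schema = {"id": "int", "sub": {"id2": "int"}}
print(valid_order_fields(schema))
print(value_window_closed([10, 12, 15], 5))
```

With the schema from the text, only id qualifies; sub is a tuple and sub.id2 is not top-level.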

If the pattern is composed of multiple subpatterns, the entire pattern, not just each subpattern in isolation, must match within the window limits.

In EventFlow modules, windows (either time or value) are defined graphically in the Pattern Settings tab of the operator's Properties view.

Pattern Operators

Operator names are not case-sensitive. Operators have the following order of precedence (starting with the most tightly bound):

  1. NOT

  2. AND

  3. OR

  4. THEN

The following list describes the use of each pattern operator in templates. Notice that the operator keywords can also be expressed using the shorthand symbols shown:

NOT A or !A (Absence)

Matches when no A tuple is received within the window interval.

A AND B or A && B (Conjunction)

Combines two subpatterns, and matches when both subpatterns match. It is like a join in that it produces the cross-product of two streams. The AND pattern is right-associative. For example, A AND B AND C is interpreted as A AND (B AND C).

A OR B or A || B (Disjunction)

Combines two subpatterns, and matches when either subpattern matches. It is like a union. The OR pattern is right-associative. For example, A OR B OR C is interpreted as A OR (B OR C).

A THEN B or A -> B (Followed By)

Matches when a match on the left side is followed in sequence by a match on the right side. The THEN pattern is right-associative. For example, A THEN B THEN C is interpreted as A THEN (B THEN C).
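
As a rough intuition for the AND, OR, and THEN descriptions above, the following Python sketch models each operator over two small in-memory lists. It is a deliberately simplified model under stated assumptions: windows are ignored, tuples are represented as hypothetical (arrival_order, label) pairs, and none of the function names correspond to StreamBase APIs.

```python
from itertools import product

# Hypothetical tuples: (arrival_order, label). Windows are ignored here.
stream_a = [(1, "a1"), (4, "a2")]
stream_b = [(2, "b1"), (3, "b2")]


def and_matches(left, right):
    """Conjunction: every pairing of a left tuple with a right tuple,
    like the cross-product of the two streams."""
    return list(product(left, right))


def or_matches(left, right):
    """Disjunction: every tuple from either side, in arrival order,
    like a union."""
    return sorted(left + right)


def then_matches(left, right):
    """Followed-by: only pairings where the left tuple arrived first."""
    return [(l, r) for l, r in product(left, right) if l[0] < r[0]]


print(len(and_matches(stream_a, stream_b)))
print(or_matches(stream_a, stream_b))
print(then_matches(stream_a, stream_b))
```

In this sample data, a2 arrives after both B tuples, so it appears in AND pairings but in no THEN pairing.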

Examples

Example 1. Equivalent Patterns

The following patterns are equivalent:

a AND b OR c THEN NOT d AND e
a && b || c -> !d && e
(((a and b) or c) then ((not d) and e))
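
To check the equivalence by hand, it can help to apply the precedence and associativity rules mechanically. The following Python sketch is a small recursive-descent parser that rewrites a pattern in fully parenthesized form. It is an illustration written for this topic, not the StreamBase parser; the function names and the token grammar for template names are assumptions.

```python
import re

# Binary operators from loosest to tightest binding, per the precedence
# list: THEN binds loosest, then OR, then AND. NOT (unary) binds tightest.
BINARY_LEVELS = ["THEN", "OR", "AND"]
SYMBOLS = {"->": "THEN", "||": "OR", "&&": "AND", "!": "NOT"}
KEYWORDS = {"AND", "OR", "THEN", "NOT"}
TOKEN_RE = re.compile(r"\s*(->|\|\||&&|!|\(|\)|[A-Za-z_]\w*)")


def tokenize(text):
    """Split a pattern into tokens, normalizing shorthand symbols and
    keyword case (operator names are not case-sensitive)."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tok = SYMBOLS.get(m.group(1), m.group(1))
        tokens.append(tok.upper() if tok.upper() in KEYWORDS else tok)
        pos = m.end()
    return tokens


def parenthesize(text):
    """Return the pattern with explicit parentheses around every operator."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_binary(level):
        if level == len(BINARY_LEVELS):
            return parse_unary()
        left = parse_binary(level + 1)
        op = BINARY_LEVELS[level]
        if peek() == op:
            advance()
            # Recursing at the same level makes the operator right-associative.
            right = parse_binary(level)
            return f"({left} {op} {right})"
        return left

    def parse_unary():
        if peek() == "NOT":
            advance()
            return f"(NOT {parse_unary()})"
        if peek() == "(":
            advance()
            inner = parse_binary(0)
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return inner
        if peek() is None or peek() in KEYWORDS or peek() == ")":
            raise ValueError("expected a template name")
        return advance()

    result = parse_binary(0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


for pattern in (
    "a AND b OR c THEN NOT d AND e",
    "a && b || c -> !d && e",
    "(((a and b) or c) then ((not d) and e))",
):
    print(parenthesize(pattern))
```

All three patterns from this example normalize to the same string, (((a AND b) OR c) THEN ((NOT d) AND e)).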

Example 2. Predicate

A predicate added after the pattern further constrains the potential matches. For example, consider this pattern statement:

A -> B

followed by this predicate:

WHERE A.id == B.id

This combination produces matches only where the ID of a tuple on the A stream matches the ID of a tuple on the B stream, and the tuple on the A stream arrived first. By contrast, a NOT predicate would specify which tuples resulting from the main pattern must not occur.
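
The effect of combining A -> B with the predicate can be modeled in a few lines of Python. This is a minimal sketch, not how StreamBase evaluates patterns: the Event class, the a_then_b function, the use of seconds for arrival times, and the reading of the window as a maximum gap between the two arrivals are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    stream: str   # "A" or "B"
    time: float   # arrival time in seconds (hypothetical unit)
    id: int


def a_then_b(events, window, predicate):
    """Return (a, b) pairs where a arrived before b, b arrived no more
    than `window` seconds after a, and predicate(a, b) holds."""
    pending_a = []
    matches = []
    for ev in sorted(events, key=lambda e: e.time):
        if ev.stream == "A":
            pending_a.append(ev)
        elif ev.stream == "B":
            for a in pending_a:
                if ev.time - a.time <= window and predicate(a, ev):
                    matches.append((a, ev))
    return matches


events = [
    Event("A", 0, 1),
    Event("B", 1, 2),   # no earlier A tuple with id 2
    Event("B", 2, 1),   # follows A id 1 within the window: a match
    Event("A", 4, 2),   # arrives after B id 2, so the order is wrong
    Event("B", 9, 1),   # same id as A at time 0, but outside the window
]

# Equivalent in spirit to: A -> B ... WHERE A.id == B.id
matches = a_then_b(events, window=5, predicate=lambda a, b: a.id == b.id)
print(matches)
```

Only the B tuple at time 2 pairs with an A tuple: it has a matching ID, arrives after the A tuple, and falls within the assumed window.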


Example 3. Negated Stream in Predicate

You can use fields from a negated stream in a predicate. For example, consider the following time-based pattern query:

SELECT A.id AS fi, C.id AS fo
FROM PATTERN A -> !B -> C WITHIN 5 TIME
WHERE B.id == A.id
INTO out;

The predicate causes the pattern to ignore any tuple arriving on Stream B if the tuple's ID field does not match the ID field of a tuple that previously arrived on A.

To illustrate, the following figure represents a sequence of tuples flowing into Streams A, B, and C. At each step, the figure shows the value of the ID field. Assume that all these events occur within the specified time interval:

Notice these events:

  • After Step 3, tuples have arrived in the specified sequence, and the Stream B ID field does not match the existing Stream A ID. This satisfies the conditions of both the pattern and the predicate, and so the selected A.id and C.id values are emitted as output.

  • No output can occur after Step 3 until the specified pattern is repeated, beginning in Step 6.

  • In Step 7, the tuple arriving on Stream C satisfies the specified sequence, because after Step 6, no intervening tuple arrived on Stream B. Therefore, another tuple is output.