To specify a field-based dimension, select Field from the Type drop-down list in the Edit Dimension dialog. The drop-down list labeled Field then shows all fields in the incoming tuple of type int, long, double, or timestamp (interpreted as an interval timestamp in seconds).
With a field-based dimension, a new window is established and evaluated based on the value of a numeric field in the incoming tuple. Typically, a window emits a tuple containing the results of the aggregation and then closes when a tuple arrives whose field value exceeds the range of the open window. The arriving tuple that triggered the close is placed in the next window.
A window also closes and emits when the period specified in the Close and emit after expression elapses, even if no tuple arrives.
The field you select is typically a timestamp field, or a numeric index or count that behaves like a timestamp. The values of any such field are assumed to increase with each new tuple, but not necessarily in a regular fashion. A runtime OutOfOrder exception occurs if input values for the selected numeric field do not increase monotonically. The window for a dimension based on a timestamp field can be set to emit and close based on elapsed time. For more information and a caveat about using time-based aggregation windows, see Aggregate Operator: Time-Based Dimension Options.
The Edit Dimension dialog for a field-based dimension has the following appearance.
In this example, even though the field dimension identified here as AggregateTradesDim represents time and its values are assumed to increase monotonically, it is not of data type timestamp, but is relative to some unspecified base time.
Using the group-as option, the Aggregate Operator Field Dimension Sample aggregates trading volume on a per-stock-symbol basis over time. Its input stream has the schema {Time: int, Symbol: string, Volume: int}. Its output stream has the schema {Symbol: string, TimeChunk: int, TotalVolume: int}. This Aggregate operator field dimension opens a window when the first tuple arrives and receives tuples for 30 seconds, until the next arriving tuple triggers the window to emit and close. The dimension then opens a new window immediately.
The following table illustrates the behavior of this field-based dimension, with windows of Size: 30, Advance: 30, and Offset: 0 based on field Time and grouping by field Symbol, as a sequence of input tuples for the stock symbols AMAT and INTC arrives.
Tuple | Field Values | Open? | Emit? | Close? |
---|---|---|---|---|
1 | 10, AMAT, 100 | Yes (window 1) | - | - |
2 | 20, AMAT, 200 | - | - | - |
3 | 40, AMAT, 100 | Yes (window 2) | Yes (window 1): Symbol AMAT TimeChunk 0 TotalVolume 300 | Yes (window 1) |
4 | 41, AMAT, 100 | - | - | - |
5 | 45, INTC, 100 | Yes (window 3) | - | - |
6 | 50, AMAT, 200 | - | - | - |
7 | 55, INTC, 300 | - | - | - |
8 | 65, AMAT, 100 | Yes (window 4) | Yes (windows 2 and 3): Symbol AMAT TimeChunk 30 TotalVolume 400; Symbol INTC TimeChunk 30 TotalVolume 400 | Yes (windows 2 and 3) |
To summarize this flow of events:
- Tuple 1 arrives. No windows exist, so one is created to receive it.
- Tuple 2 comes 10 seconds later and is added to window 1.
- When tuple 3 arrives, its Time value is 40, which falls outside the range of window 1 (Size 30, starting at 0). The first window emits the tuple {Symbol=AMAT, TimeChunk=0, TotalVolume=300} and closes. A new window opens to hold tuple 3.
- Tuple 4 arrives with values {41, AMAT, 100} and enters window 2.
- Tuple 5 arrives. Its Symbol value (INTC) is new, which causes window 3 to open to hold a new group.
- Tuples 6 and 7 arrive, and enter their respective group windows. As Time has not advanced by more than 30 seconds for either open window, no calculations or emissions occur.
- Tuple 8 arrives with values {65, AMAT, 100}. As the Time value is now greater than 60, windows 2 and 3 both calculate and emit values, and then close. Window 4 opens to receive tuple 8.
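The flow of events above can be reproduced with a short simulation. The following Python sketch is not StreamBase code and makes several assumptions: the function name aggregate_by_field is hypothetical, windows do not overlap (Advance equals Size), all group windows share the same range and close together, and only decreasing Time values are treated as out of order.

```python
def aggregate_by_field(tuples, size=30, offset=0):
    """Simulate non-overlapping field-based windows, grouped by symbol.

    Illustrative sketch only; it mimics the table above, not the real operator.
    """
    emitted = []        # (Symbol, TimeChunk, TotalVolume) in emission order
    open_value = None   # lower bound shared by the currently open group windows
    totals = {}         # symbol -> running sum of Volume in the open windows
    last_time = None

    for time, symbol, volume in tuples:
        # Assumption: a decreasing value is the out-of-order case.
        if last_time is not None and time < last_time:
            raise ValueError(f"out-of-order Time value: {time} after {last_time}")
        last_time = time

        # Largest multiple of size, plus offset, that is <= time.
        chunk = (time - offset) // size * size + offset

        # A tuple beyond the open range makes every open group window
        # emit its result and close before the new window opens.
        if open_value is not None and chunk > open_value:
            for sym, total in totals.items():
                emitted.append((sym, open_value, total))
            totals = {}

        open_value = chunk
        totals[symbol] = totals.get(symbol, 0) + volume

    return emitted


SAMPLE = [
    (10, "AMAT", 100), (20, "AMAT", 200), (40, "AMAT", 100), (41, "AMAT", 100),
    (45, "INTC", 100), (50, "AMAT", 200), (55, "INTC", 300), (65, "AMAT", 100),
]

for result in aggregate_by_field(SAMPLE):
    print(result)
```

With the eight sample tuples, the sketch emits one result when tuple 3 arrives and two results when tuple 8 arrives. The window opened by tuple 8 is still open at the end of the input, so it emits nothing.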
The following table describes the options available in the Edit Dimension dialog for field-based dimensions.
Category | Options and Meaning |
---|---|
Field | The drop-down list for this field shows all fields in the incoming schema that have the StreamBase data type of int, double, long, or timestamp (interpreted as an interval timestamp in seconds). Base the dimension only on input fields you know will contain monotonically increasing values. |
Opening policy: | Select one of these options:
|
Window size: | Select one of these options:
|
Emission policy: | Select one of these options:
|
Optional windows: | Select one of these options:
|
By default, at most one new window is created when the first tuple arrives or the difference between the current tuple and the previous tuple values (for the field on which the dimension is based) is greater than or equal to the specified Window size. The openval() for that window and subsequent windows is an integer multiple of Advance, plus Offset, and is equal to or less than the current tuple's value. However, more than one window can be created if the other option is selected under Optional windows, and the value you specified for Advance is smaller than that for Window size.
Suppose you have an application that calculates one-week moving averages of daily temperature records for localities. The input stream schema is: {Day int, City string, Low double, Average double, High double}. The application uses a field-based Aggregate operator to compute averages of the three temperature measurements and outputs the base day and the number of days represented by each computed average.
The Aggregate Functions tab view looks like this:
To produce weekly moving averages, the field dimension Opening policy should advance one day at a time, based on integer field Day, and have a Window size of 7 days, with no intermediate emissions. The Edit Dimension dialog looks like this:
This creates up to seven overlapping windows. Whether results are emitted for partially full windows at the start is controlled by the Optional windows setting:
- When Open only a single window for the first event or following a gap in values is selected, a window opens with the first tuple. Its openval is the largest integer multiple of Advance, plus Offset, that is less than or equal to that first value. The operator emits tuples when windows contain seven tuples. That is, the first emission occurs when the eighth input tuple is received. The StartDay output field, which is the dimension's openval, begins with the first value of the Day field. For a certain input stream, the output starts off as follows:

    StartDay=1 NumDays=7 LowAvg=64.4 AverageAvg=70.1 HighAvg=78.0
    StartDay=2 NumDays=7 LowAvg=64.7 AverageAvg=70.7 HighAvg=78.3
    StartDay=3 NumDays=7 LowAvg=65.4 AverageAvg=72.4 HighAvg=79.9
    ...

- If Optional windows is changed to the other option, a new window opens for each input tuple because Advance is set to 1. Emissions begin when the second tuple is received. The first emission averages one value, the second two values, and so on, until all windows contain seven tuples. The first StartDay output field, which indicates the dimension's openval, has a value of -5. At the seventh iteration, all windows are full and the output is the same as the single window case above. For the same input stream as above, the output starts out as follows:

    StartDay=-5 NumDays=1 LowAvg=62.0 AverageAvg=68.0 HighAvg=78.0
    StartDay=-4 NumDays=2 LowAvg=61.5 AverageAvg=67.0 HighAvg=75.5
    StartDay=-3 NumDays=3 LowAvg=61.7 AverageAvg=67.0 HighAvg=75.0
    StartDay=-2 NumDays=4 LowAvg=62.8 AverageAvg=68.8 HighAvg=76.5
    StartDay=-1 NumDays=5 LowAvg=64.6 AverageAvg=70.6 HighAvg=79.0
    StartDay=0 NumDays=6 LowAvg=64.8 AverageAvg=70.5 HighAvg=78.8
    StartDay=1 NumDays=7 LowAvg=64.4 AverageAvg=70.1 HighAvg=78.0
    StartDay=2 NumDays=7 LowAvg=64.7 AverageAvg=70.7 HighAvg=78.3
    StartDay=3 NumDays=7 LowAvg=65.4 AverageAvg=72.4 HighAvg=79.9
    ...
In both cases, a new window opens for every tuple received, and events after the first tuple enter multiple windows. In both cases, a window does not emit until a tuple arrives whose Day value lies beyond that window's seven-day range. In the second case, however, not all open windows receive new tuples; the oldest window stops receiving tuples whenever a new window opens, so the first six windows have fewer than seven events to average.
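The StartDay and NumDays values in both outputs follow from the openval arithmetic. The Python sketch below is an illustration only, not StreamBase code: the function names and the pre_open flag (standing in for the two Optional windows choices) are hypothetical, and it assumes each window covers the values from its openval up to, but not including, openval plus Window size.

```python
def open_values(value, size, advance, offset=0):
    """Return the openval of every window whose range contains value.

    Each openval is an integer multiple of advance, plus offset.
    """
    # Largest multiple of advance, plus offset, that is <= value.
    start = (value - offset) // advance * advance + offset
    result = []
    while start + size > value:   # window [start, start + size) holds value
        result.append(start)
        start -= advance
    return sorted(result)


def closed_window_counts(days, size, advance, offset=0, pre_open=False):
    """Count tuples per window, reporting only windows that have closed.

    pre_open=False models opening a single window for the first tuple;
    pre_open=True also opens the earlier windows that overlap it.
    """
    first_open = (days[0] - offset) // advance * advance + offset
    counts = {}
    for day in days:
        for start in open_values(day, size, advance, offset):
            if pre_open or start >= first_open:
                counts[start] = counts.get(start, 0) + 1
    # A window has closed once a value at or beyond its upper bound arrived.
    return {s: n for s, n in sorted(counts.items()) if s + size <= days[-1]}


days = list(range(1, 10))   # Day values 1 through 9

print(open_values(1, size=7, advance=1))
print(closed_window_counts(days, size=7, advance=1, pre_open=False))
print(closed_window_counts(days, size=7, advance=1, pre_open=True))
```

For Day values 1 through 9, the single-window case reports seven tuples for the windows starting at 1 and 2. With earlier windows allowed, the counts grow from one tuple for the window starting at -5 up to seven, matching the StartDay and NumDays columns of the two outputs above.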
To learn more about how field-based aggregation works, run the Aggregate Operator Field Dimension Sample. It contains an Aggregate operator that sums the volume of trades of particular stocks over 30 second windows, advancing every 30 seconds, as described above. Extend the sample by computing the average volume per symbol in each group window and adding it to the output. Click the green Plus Sign on the Aggregate Functions tab and add an output field named AverageVolume, produced by the expression: avg(Volume). You can also, in the Edit Dimension dialog, change the Advance value from 30 to 20. Setting Advance to be less than Window Size creates sliding windows with overlapping contents.
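To get a feel for what the suggested changes produce, the following Python sketch computes a sum and an average of Volume for each overlapping window, using the AMAT tuples from the table earlier in this topic with a Window size of 30 and an Advance of 20. It is a rough illustration under stated assumptions, not the sample itself: the function name sliding_stats is hypothetical, Offset is assumed to be 0, grouping by Symbol and emission timing are not modeled, and the average is assumed to be a floating-point value.

```python
from collections import defaultdict


def sliding_stats(tuples, size=30, advance=20, offset=0):
    """Return {openval: (total, average)} of Volume for overlapping windows."""
    volumes = defaultdict(list)   # openval -> volumes that fall in that window
    for time, volume in tuples:
        # Newest window that starts at or before this Time value.
        start = (time - offset) // advance * advance + offset
        # Walk back through every earlier window that still covers time.
        while start + size > time:
            volumes[start].append(volume)
            start -= advance
    return {
        start: (sum(vals), sum(vals) / len(vals))
        for start, vals in sorted(volumes.items())
    }


# (Time, Volume) pairs for symbol AMAT from the table in this topic.
AMAT = [(10, 100), (20, 200), (40, 100), (41, 100), (50, 200), (65, 100)]

for start, (total, average) in sliding_stats(AMAT).items():
    print(f"TimeChunk={start} TotalVolume={total} AverageVolume={average:.1f}")
```

Because Advance is smaller than Window size, tuples such as the one with Time 20 are counted in two windows (those starting at 0 and at 20), which is the overlap that sliding windows introduce.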