StreamBase Data Types


blob Data Type

Blobs provide a way of representing binary data in a tuple. They are designed to efficiently process large data objects such as video frames or other multimedia data, although performance might diminish with larger sizes.

In expressions, blobs can be used in any StreamBase function that supports all data types, such as string() or firstval(). To use a blob in other functions, the blob must first be converted to a supported data type. For example, error() does not accept a blob, but you can call error(string(b)), where you first recast the blob as a string.

The StreamBase blob() function converts a string to a blob.

Two or more blob values are comparable with relational operators. The comparison is bytewise.
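The page says only that blob comparison is "bytewise". It does not say whether bytes are treated as signed or unsigned, or how blobs of different lengths order. The following Java sketch assumes unsigned bytes, with a blob that is a strict prefix of another sorting first. That is how `java.util.Arrays.compareUnsigned` behaves. The `blob` helper is hypothetical and assumes UTF-8 encoding, so treat this as an illustration, not StreamBase's exact rule.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BlobCompare {
    // Assumption: bytes compare as unsigned values 0..255; a strict prefix sorts first.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Hypothetical stand-in for blob("..."): encodes the string as UTF-8 bytes (encoding assumed).
    static byte[] blob(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(compare(blob("abc"), blob("abd")) < 0);  // true: 'c' < 'd'
        System.out.println(compare(blob("ab"), blob("abc")) < 0);   // true: prefix sorts first
        // Under the unsigned assumption, 0xFF sorts after 0x01.
        System.out.println(compare(new byte[] {(byte) 0xFF}, new byte[] {0x01}) > 0);  // true
    }
}
```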

bool Data Type

A Boolean always evaluates to either true or false.

Two or more bool values are comparable with relational operators as follows: true > false.

capture Data Type

When you declare a capture field as part of the schema for a hygienic module, you assign a type name for that capture field. Thereafter, that capture type name appears in the dropdown list of data types in Studio, and can be referenced as @capture-typename in expressions. While a capture field is not strictly a type of data in the same sense as the other types on this page, a capture field's type name is effectively a data type in some circumstances. See Capture Fields in the Authoring Guide.

double Data Type

A double is always 8 bytes. Any numeric literal in scientific notation is considered a double. That is, you do not need a decimal point for a numeric literal to be recognized as a double (instead of an int). For example, 1e1 is considered to be 10.0 (double) instead of 10 (integer).

The StreamBase double type conforms to the IEEE 754 binary64 (double-precision) specification. If the precision of a value needs to be corrected before printing, writing to a log, or sending to an external system, use the round() function as the last step. Do not use round() before or within an aggregation or an iterative, looping, or large summation calculation, because doing so introduces cumulative mathematical error.
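The advice to round only as the last step can be illustrated in Java, whose `double` is also a 64-bit IEEE 754 binary type. The `round2` helper below is hypothetical and rounds to two decimal places. It is not StreamBase's round(). When it runs inside the loop, each small increment is rounded away. When it runs once at the end, the increments are preserved.

```java
final class RoundLast {
    // Hypothetical round-to-2-decimals helper, standing in for a rounding step.
    static double round2(double x) {
        return Math.round(x * 100.0) / 100.0;
    }

    // Accumulate first, round once at the end.
    static double sumThenRound(double step, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += step;
        }
        return round2(total);
    }

    // Round inside the loop: each 0.004 increment rounds back down to the previous total.
    static double roundEachStep(double step, int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total = round2(total + step);
        }
        return total;
    }

    public static void main(String[] args) {
        double s = 0.0;
        for (int i = 0; i < 10; i++) {
            s += 0.1;  // 0.1 has no exact binary representation
        }
        System.out.println(s);                           // 0.9999999999999999
        System.out.println(sumThenRound(0.004, 1000));   // 4.0
        System.out.println(roundEachStep(0.004, 1000));  // 0.0
    }
}
```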

Two or more double values are comparable numerically with relational operators.

function Data Type

function is a StreamBase data type and a reserved keyword. Use a constructor for the function data type to create a custom expression language function whose components are built-in functions, math operators, and even other functions.

The primary use for the function data type is to declare custom expression language functions in various contexts, and to call them like built-in expression language functions. A function can also be the data type of a field on a stream, and thus passed to other components. In addition, there are certain higher order functions such as foldleft that take a defined function as an argument.

The function data type is described in more detail in Using the Function Data Type.

int Data Type

An int is always 4 bytes. The valid range of the int data type is -2,147,483,648 [–2³¹] to 2,147,483,647 [2³¹–1], inclusive. Integers are always signed. Thus, the following expression is not valid, because the number 2,147,483,648 is not a valid int:

2147483648 + 0

However, integer computations wrap around cleanly, so the following expression is valid and evaluates to -2,147,483,648:

2147483647 + 1

The following expression is valid, because the period means that the number is treated as a floating-point value, which can have a much greater value:

2147483648. + 0.
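Java's `int` uses the same 32-bit signed range, so it makes a convenient analogy for the three expressions above. The following sketch shows the same wraparound, plus a Java-only option (`Math.addExact`) that throws instead of wrapping. It is a Java illustration, not StreamBase code.

```java
final class IntWrap {
    // Plain 32-bit addition: overflow wraps silently, as described above.
    static int addWrapping(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int wrapped = addWrapping(2147483647, 1);
        System.out.println(wrapped);                       // -2147483648
        System.out.println(wrapped == Integer.MIN_VALUE);  // true

        // Java-specific: Math.addExact throws ArithmeticException instead of wrapping.
        try {
            Math.addExact(2147483647, 1);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }

        // Floating-point values are not limited to the int range.
        System.out.println(2147483648.0 + 0.0);            // 2.147483648E9
    }
}
```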

Two or more int values are comparable numerically with relational operators.

list Data Type

A list is an ordered collection of values, called elements, each of which is of the same StreamBase data type, called the list's element type. The element type can be any StreamBase data type, such as an int, a tuple, or even a list (thus allowing for constructions such as a list of list of int).

In addition to the list() function, you can use the [] constructor to create a list. For example, ['this','constructs','a','list','of','type','string',] creates a list(string). As the example illustrates, the last list element can optionally be followed by a comma, which is ignored. This makes it easier to add another element to the list later.

Lists are returned by a variety of functions in the StreamBase expression language, such as list() and range().

Individual elements in a list can be accessed using their zero-based integer position (their index) in the list. In any expression in an EventFlow module, use brackets to address individual elements of a list. Thus, for a field named L with data type list, use L[0] to address the first element in the list, L[1] for the second element, and L[length(L)-1] to address the last element.

In bracket syntax and in most list-related functions that take an index, you can also use a negative index to count backward from the end of the list. Thus, for a list L, L[-1] is equivalent to L[length(L)-1].
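The following Java sketch shows how a negative index maps to a zero-based position. The `at` helper is hypothetical. It resolves `-1` to `size - 1`, which matches the L[-1] rule above, and throws for out-of-range indexes. This page does not say what StreamBase returns for an out-of-range index, so that part is an assumption.

```java
import java.util.List;

final class ListIndex {
    // Hypothetical helper: negative indexes count back from the end (-1 is the last element).
    static <T> T at(List<T> list, int index) {
        int resolved = index < 0 ? list.size() + index : index;
        if (resolved < 0 || resolved >= list.size()) {
            // Assumption: this sketch throws; StreamBase's behavior here is not described on this page.
            throw new IndexOutOfBoundsException("index " + index + " for size " + list.size());
        }
        return list.get(resolved);
    }

    public static void main(String[] args) {
        List<String> l = List.of("IBM", "DELL", "HPQ");
        System.out.println(at(l, 0));   // IBM, like L[0]
        System.out.println(at(l, -1));  // HPQ, like L[length(L)-1]
    }
}
```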

The number of elements in a list can range from zero to a theoretical maximum of 2³¹–1 elements (although that maximum cannot be reached in typical practice). The number of elements in a list is determined at application runtime.

Two or more lists are comparable with relational operators. The comparison is performed lexicographically: that is, like words in a dictionary, with each element in list AAA compared to its corresponding element in list BBB, just as the letters in two words are compared, one by one. For example, the list [1, 2, 3] is less than the list [1, 9, 3], and the list [1, 2] is less than the list [1, 2, 3].
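Lexicographic comparison can be sketched in a few lines of Java. The first unequal pair of elements decides the order. If one list runs out first, the shorter list is smaller. The sketch assumes non-null elements. How StreamBase orders null elements is not covered here.

```java
import java.util.List;

final class ListCompare {
    // Lexicographic order over non-null, comparable elements.
    static <T extends Comparable<? super T>> int compare(List<T> a, List<T> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;  // first differing element decides
            }
        }
        return Integer.compare(a.size(), b.size());  // common prefix: shorter list is smaller
    }

    public static void main(String[] args) {
        System.out.println(compare(List.of(1, 2, 3), List.of(1, 9, 3)) < 0);  // true
        System.out.println(compare(List.of(1, 2), List.of(1, 2, 3)) < 0);     // true
    }
}
```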

Lists with numeric element types can be coerced when two lists are concatenated or merged in a Union operator, following the rules listed in Data Type Coercion and Conversion. For example, if a list(int) is merged with a list(double), the result is a list(double). Two list(tuple) values merge successfully if a valid supertype tuple can be found. Coercion rules do not apply to output streams with declared, explicit schemas.
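The list(int) plus list(double) case amounts to widening each int element to double before combining the lists. The following Java sketch is only an illustration of that widening. The `concatWidening` name is hypothetical and does not describe how StreamBase performs the merge internally.

```java
import java.util.ArrayList;
import java.util.List;

final class ListCoerce {
    // Hypothetical helper: widen every int to double, then append the doubles,
    // so the combined list has the wider element type.
    static List<Double> concatWidening(List<Integer> ints, List<Double> doubles) {
        List<Double> out = new ArrayList<>(ints.size() + doubles.size());
        for (int i : ints) {
            out.add((double) i);  // int -> double widening
        }
        out.addAll(doubles);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(concatWidening(List.of(1, 2), List.of(3.5)));  // [1.0, 2.0, 3.5]
    }
}
```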

See Null Lists for a discussion of null lists compared to empty lists.

Using Lists in Expressions

Most lists you deal with in the StreamBase expression language are tuple fields with a declared list type in the tuple's schema. You address such a list by means of the field's name, optionally using the bracket syntax shown above to address individual elements of the list. Functions are provided to append to a list, insert elements, replace elements, return the first, last, minimum, or maximum element in a list, and much more. See the list management elements in Simple Functions: Lists.

You can create lists of your own using one of the functions that returns a list, such as list() and range(). As an alternative, you can define a list and specify its contents by placing a comma-separated list of elements in square brackets. Thus, list(100.00, 130.00, 78.34) and [100.00, 130.00, 78.34] express the same list.

Viewing and Specifying List Data in CSV Format

In contexts where list data appears in string form, such as the output of epadmin dequeue, lists are output in standard array format within square brackets. For example:

list(int) [1,3,5]
list(double) [34.78,123.23,90.84,85.00]

Lists of strings do not show each element enclosed in quotes, because the element type of such a list is known to be string. For example:

list(string) [IBM,DELL,HPQ]

When specifying lists as input to epadmin enqueue, enclose the list in quotes to escape the commas inside the list brackets. For list(string), there is no need to quote each element. For example, to input data for a stream with schema {int, list(double), list(string), list(int)}, use the following format:

9456,"[234.0,2314.44]","[IBM,DELL]","[3000,2000]"

When specifying strings and lists that occur within a tuple data type, use one pair of quotes around the tuple value, and use two pairs of quotes to surround the string and list members of that tuple. For example, to input data for a stream with schema tuple(int, int, int), tuple(string, list(string)), use the following format:

"1, 3, 3"," ""Alpha"", ""[Beta,Gamma,Delta]"" "

In the example above, one pair of quotes surrounds the first tuple field, which consists of three int values. Another pair surrounds the second tuple field, starting after the comma that follows the first field and running to the end of the line. Within the second field, doubled quotes surround both the string sub-field and the list(string) sub-field. Notice that there is still no need to quote each element of the list(string) sub-field.

long Data Type

A long is always 8 bytes. The range is -9,223,372,036,854,775,808 [–2⁶³] to +9,223,372,036,854,775,807 [2⁶³–1]. You can use the long data type to hold integer values that are too large to fit in the four-byte int data type.

When specifying a long value in a StreamBase expression, append L to the number. Thus, 100L and 314159L are both long values. Without the L, StreamBase interprets values in the int data type's range as ints. Values outside the range of an int are interpreted as longs without the L.
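Java's `long` has the same 8-byte range and uses the same `L` literal suffix, so it can illustrate why a long is needed when int arithmetic would overflow. This is a Java analogy only. Note that Java, unlike the rule stated above, rejects an unsuffixed literal outside the int range at compile time.

```java
final class LongRange {
    // 32-bit product: overflows and wraps.
    static int multiplyAsInt(int a, int b) {
        return a * b;
    }

    // The cast applies before *, so the multiplication is done in 64 bits.
    static long multiplyAsLong(int a, int b) {
        return (long) a * b;
    }

    public static void main(String[] args) {
        System.out.println(Long.MIN_VALUE);                  // -9223372036854775808
        System.out.println(Long.MAX_VALUE);                  // 9223372036854775807
        System.out.println(multiplyAsInt(100000, 100000));   // 1410065408 (wrapped)
        System.out.println(multiplyAsLong(100000, 100000));  // 10000000000
        long explicit = 314159L;                             // L suffix marks a long literal
        System.out.println(explicit);                        // 314159
    }
}
```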

Two or more long values are comparable numerically with relational operators.

Named Schema Data Type

When you define a named schema for a module or interface, StreamBase automatically generates a new function in the StreamBase expression language that allows you to construct tuples with that schema. Thereafter, the names of named schemas appear in the dropdown list of data types in Studio, which allows you to use a named schema's name wherever you would use the tuple data type. Thus, while not strictly a type of data in the same sense as the other entries on this page, the names of named schemas can be used as an effective data type. See Named Schema Constructor Function in the Authoring Guide.

string Data Type

A string is a sequence of text characters.

The theoretical maximum length for a string is maxint() characters, but the practical limit is much smaller. While StreamBase does support large tuples, including large string fields, be aware that moving huge amounts of data through any application negatively impacts its throughput.

Two or more string values are comparable with the relational operators. By default, strings are compared lexicographically based on ASCII sort order. If Unicode support is enabled for StreamBase Server (as described in Unicode Support), string elements are compared in the sort order for the current character set.
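For pure-ASCII text, Java's `String.compareTo` gives the same ordering as an ASCII sort. It compares character codes, so every uppercase letter sorts before every lowercase letter. The `Collator` call below shows one way to get locale-aware ordering. It is an assumption made for illustration, not a description of how StreamBase implements Unicode comparison.

```java
import java.text.Collator;
import java.util.Locale;

final class StringOrder {
    // Character-code order: equals ASCII order when both strings are ASCII.
    static int asciiCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-sensitive ordering, one possible model of character-set-aware comparison.
    static int collatedCompare(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        System.out.println(asciiCompare("Zebra", "apple") < 0);                     // true: 'Z' (90) < 'a' (97)
        System.out.println(collatedCompare("Zebra", "apple", Locale.ENGLISH) > 0);  // true: alphabetical
    }
}
```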

timestamp Data Type

The timestamp data type can hold either an absolute timestamp or an interval timestamp.

An absolute timestamp represents a date and time. Its value is the number of seconds between the epoch and that date and time, with a maximum precision of milliseconds. The epoch is defined as midnight of January 1, 1970 UTC.

An interval timestamp represents a duration. Its value is the number of seconds in the interval, with a maximum precision of milliseconds.

The range for timestamp values is –2⁶² to (2⁶² – 1), which holds absolute timestamps for plus or minus 146 million years, and holds interval timestamp values between -4,611,686,018,427,387,904 and +4,611,686,018,427,387,903.
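The figures above are consistent if the 64-bit value counts milliseconds, which matches the stated millisecond precision. That reading is an assumption, since the page describes values in seconds. The following Java arithmetic converts ±2⁶² milliseconds into years using the average Gregorian year of 365.2425 days.

```java
final class TimestampRange {
    // Assumption: timestamps are stored as a count of milliseconds.
    static final long MIN_MILLIS = -(1L << 62);     // -2^62
    static final long MAX_MILLIS = (1L << 62) - 1;  // 2^62 - 1

    // Average Gregorian year (365.2425 days) in milliseconds.
    static final double MILLIS_PER_YEAR = 365.2425 * 24 * 60 * 60 * 1000;

    static double millisToYears(long millis) {
        return millis / MILLIS_PER_YEAR;
    }

    public static void main(String[] args) {
        System.out.println(MIN_MILLIS);  // -4611686018427387904
        System.out.println(MAX_MILLIS);  // 4611686018427387903
        System.out.printf("about %.1f million years on each side of the epoch%n",
                millisToYears(MAX_MILLIS) / 1e6);
    }
}
```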

Absolute timestamps are expressed in the time format patterns of the java.text.SimpleDateFormat class described in the Oracle Java Platform SE reference documentation. For example, the now() function returns a timestamp value for the current time. The returned value is a representation of the internal value as a date and time. Thus, when run on 22 May 2019 in the US Eastern time zone (EDT, UTC-04:00), the now() function returns a value like the following:

2019-05-22 20:15:54.880-0400

By contrast, the expression hours(1) returns an interval timestamp, showing the number of seconds in one hour:

3600.000
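The now() sample above has the shape of `java.text.SimpleDateFormat` output. As a sketch, the pattern `yyyy-MM-dd HH:mm:ss.SSSZ` reproduces it when the instant 2019-05-23T00:15:54.880Z is formatted for `America/New_York`. This pattern is an assumption. The page does not name the exact pattern StreamBase uses.

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class AbsoluteTimestampFormat {
    // Pattern chosen to match the sample output; not taken from StreamBase itself.
    static String format(long epochMillis, String zoneId) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSZ", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(zoneId));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // 00:15:54.880 UTC on 23 May 2019 is 20:15:54.880 on 22 May in New York (EDT).
        long millis = Instant.parse("2019-05-23T00:15:54.880Z").toEpochMilli();
        System.out.println(format(millis, "America/New_York"));  // 2019-05-22 20:15:54.880-0400
    }
}
```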

You can add and subtract timestamp values in expressions, using the rules in the following table:

Operation	Result	Example
interval + interval	interval	`days(1) + hours(2)` Result: 93600.0, the number of seconds in 26 hours.
interval – interval	interval	`days(1) - hours(2)` Result: 79200.0, the number of seconds in 22 hours.
absolute + interval	absolute	`now() + hours(1)` Result: an absolute timestamp representing the time one hour from now.
absolute – absolute	interval	`today_utc() - today()` Result: an interval timestamp representing the number of seconds between midnight UTC and midnight in the local time zone.
absolute + absolute	absolute	Adding two absolute timestamp values does not produce an error, but the results are undefined.
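Java's `java.time` API follows similar typing rules, with `Duration` playing the interval role and `Instant` the absolute role. The following sketch reproduces the first rows of the table as an analogy. The `seconds` helper is hypothetical and formats a duration as seconds with three decimals. Java has no `Instant.plus(Instant)` method, so the undefined absolute + absolute case cannot be written at all.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;

final class IntervalMath {
    // Hypothetical display helper: seconds with millisecond precision, like 3600.000.
    static String seconds(Duration d) {
        return String.format(Locale.ROOT, "%.3f", d.toMillis() / 1000.0);
    }

    public static void main(String[] args) {
        Duration sum = Duration.ofDays(1).plus(Duration.ofHours(2));    // interval + interval -> interval
        Duration diff = Duration.ofDays(1).minus(Duration.ofHours(2));  // interval - interval -> interval
        System.out.println(seconds(sum));   // 93600.000
        System.out.println(seconds(diff));  // 79200.000

        Instant start = Instant.parse("2019-05-23T00:15:54.880Z");
        Instant later = start.plus(Duration.ofHours(1));               // absolute + interval -> absolute
        Duration gap = Duration.between(start, later);                 // absolute - absolute -> interval
        System.out.println(seconds(gap));   // 3600.000
    }
}
```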

Two or more timestamp values are comparable with relational operators such as > and <. You must compare timestamp values interval-to-interval or absolute-to-absolute. You cannot compare interval-to-absolute or absolute-to-interval.

In comparison expressions that use the operators ==, !=, <=, >=, <, or >, if one side of the comparison is a timestamp, and the other side is a string literal, StreamBase tries to interpret the string as a valid timestamp. If the string literal does not contain an explicit time zone, the string is interpreted as having the time zone set in the operating system of the computer that compiles the application. If the conversion of the string literal fails, then the comparison fails typechecking.

Specifying Time Zones

StreamBase supports three ways of specifying time zones. The following examples all indicate the same time zone (Central Europe):

Content	Data Type	Example
Offset from UTC in hours:minutes	double	`+01:00`
Zone Abbreviation	string	`CET`
Time Zone ID	string	`Europe/Paris`

Certain timestamp functions allow you to use these specifications as arguments, but not interchangeably. Selected time zone IDs for the United States, Europe, and Asia are shown below. Note that time zone ID literals are case-sensitive.

Time Zone or City	Time Zone ID	UTC Offset	DST Offset
Buenos Aires, Argentina	`America/Argentina/Buenos_Aires`	-03:00	-03:00
US Eastern Standard Time	`America/New_York`	-05:00	-04:00
US Central Standard Time	`America/Chicago`	-06:00	-05:00
US Mountain Standard Time (for areas that do not observe DST)	`America/Phoenix`	-07:00	-07:00
US Mountain Standard Time (for areas that observe DST)	`America/Denver`	-07:00	-06:00
US Pacific Standard Time	`America/Los_Angeles`	-08:00	-07:00
US Alaska Standard Time	`America/Anchorage`	-09:00	-08:00
US Hawaii Standard Time	`Pacific/Honolulu`	-10:00	-10:00
Greenwich Mean Time	`Etc/GMT` or `UTC`	00:00	00:00
London, UK	`Europe/London`	+00:00	+01:00
Zurich, Switzerland	`Europe/Zurich`	+01:00	+02:00
Tallinn, Estonia	`Europe/Tallinn`	+02:00	+03:00
Moscow, Russia	`Europe/Moscow`	+03:00	+03:00
Pune, India	`Asia/Kolkata`	+05:30	+05:30
Bangkok, Thailand	`Asia/Bangkok`	+07:00	+07:00
Singapore	`Asia/Singapore`	+08:00	+08:00
Seoul, Korea	`Asia/Seoul`	+09:00	+09:00
Auckland, New Zealand	`Pacific/Auckland`	+12:00	+13:00

For a full listing, see the Wikipedia article List of tz database time zones. You can also obtain the list by calling the Java method TimeZone.getAvailableIDs(). See the Javadoc for class TimeZone for details.
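The following Java sketch calls `TimeZone.getAvailableIDs()` and checks whether an ID appears in the result. The membership test uses exact string matching, which fits the note above that time zone ID literals are case-sensitive. Validating IDs this way is also useful in plain Java, because `TimeZone.getTimeZone()` silently falls back to GMT for an ID it does not recognize.

```java
import java.util.Arrays;
import java.util.TimeZone;

final class ListZoneIds {
    // Exact, case-sensitive check against the JVM's known time zone IDs.
    static boolean isKnownId(String id) {
        return Arrays.asList(TimeZone.getAvailableIDs()).contains(id);
    }

    public static void main(String[] args) {
        String[] ids = TimeZone.getAvailableIDs();
        System.out.println(ids.length + " time zone IDs available");
        Arrays.stream(ids)
              .filter(id -> id.startsWith("Europe/"))
              .limit(5)
              .forEach(System.out::println);
        System.out.println(isKnownId("Europe/Paris"));  // true
        System.out.println(isKnownId("europe/paris"));  // false: IDs match exactly
    }
}
```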

The get_* functions (get_second(), get_year(), and so on) take time zone IDs as an optional second argument to obtain timestamp fields for a given time zone rather than for local time, as described in Timestamp Fields.
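The idea behind passing a zone ID to the get_* functions is that one instant yields different field values in different zones. The following Java sketch illustrates this with `ZonedDateTime`. It also shows the standard and DST offsets from the table above. The method names are hypothetical Java helpers, not StreamBase functions.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class ZoneFields {
    // Hour of day for an instant as seen in the given zone.
    static int hourIn(Instant instant, String zoneId) {
        return ZonedDateTime.ofInstant(instant, ZoneId.of(zoneId)).getHour();
    }

    // UTC offset in effect for the zone at that instant (reflects DST rules).
    static ZoneOffset offsetIn(Instant instant, String zoneId) {
        return ZoneId.of(zoneId).getRules().getOffset(instant);
    }

    public static void main(String[] args) {
        Instant winter = Instant.parse("2019-01-15T12:00:00Z");
        Instant summer = Instant.parse("2019-07-15T12:00:00Z");

        System.out.println(hourIn(winter, "America/New_York"));    // 7
        System.out.println(hourIn(winter, "Asia/Kolkata"));        // 17 (local time 17:30)
        System.out.println(offsetIn(winter, "America/New_York"));  // -05:00
        System.out.println(offsetIn(summer, "America/New_York"));  // -04:00
        System.out.println(offsetIn(winter, "Pacific/Auckland"));  // +13:00 (southern summer)
    }
}
```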

tuple Data Type

The tuple data type is an ordered collection of fields, each of which has a name and a data type. The fields in the collection must be defined by a schema, which can be unnamed or named. Fields can be of any StreamBase data type, including other tuples, nested to any depth. The size of a tuple depends on the aggregate size of its fields.

See Null Tuples for a discussion of null tuples and empty tuples.

Two or more tuples are comparable with relational operators as long as the tuples being compared have identical schemas.

The following sections discuss features of the tuple data type:

Addressing Tuple Sub-Fields

Using the Tuple Data Type in Expressions

Viewing and Specifying Tuple Data in CSV Format

Copying Tuple Contents

Null Tuples, Empty Tuples, No-Fields Tuples

Addressing Tuple Sub-Fields

In expressions, you can address a tuple field's individual sub-fields using dot notation: tuplename.tuplefieldname.

In an EventFlow module, tuplename is the name of a field of type tuple, and tuplefieldname is the name of a sub-field.

Using the Tuple Data Type in Expressions

In an expression, use the tuple() function to create both schema and field values of a single tuple.

The name of a named schema automatically becomes a generated function that returns a single tuple with that schema. See named schema constructor function for details.

Viewing and Specifying Tuple Data in CSV Format

In contexts where a tuple value appears in textual string form, comma-separated value (CSV) format is used. Examples of such contexts include the contents of files read by the CSV Input Adapter or written by the CSV Output Adapter, and the result of the Tuple.toString() Java API method.

Use the nested quote techniques in this section to enter a field of type tuple when specifying input data at the command prompt with epadmin enqueue.

The string form of a tuple with three integer fields whose values are 1, 2, and 3 is the following:

1,2,3

We will refer to the above as tuple A.

When tuple A appears as a field of type tuple inside another tuple, surround tuple A with quotes. For example, a second tuple, B, whose first field is a string and whose second field is tuple A, has a CSV format like the following:

IBM,"1,2,3"

These quotes protect the comma-separated values inside the second field from being interpreted as individual field values.

With deeper nesting, the quoting gets more complex. For example, suppose tuple B, the two-field tuple above, is itself the second field inside a third tuple, C, whose first field is a double. The CSV format of tuple C is:

3.14159," IBM,""1,2,3"" "

The above form shows doubled pairs of quotes around 1,2,3, which is necessary to ensure that the nested quotes are interpreted correctly. There is another set of quotes around the entire second field, which contains tuple B.

StreamBase's quoting rules follow standard CSV practices, as defined in RFC 4180, Common Format and MIME Type for Comma-Separated Values (CSV) Files.
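Under RFC 4180, a field that contains a comma, quote, or line break is wrapped in quotes, and each embedded quote is doubled. The following Java sketch applies that rule repeatedly to build tuples A, B, and C from the examples above. It omits the padding spaces that the tuple C sample places inside the outer quotes. The same rule also reproduces the epadmin enqueue line shown earlier for a {int, list(double), list(string), list(int)} stream. The helper names are hypothetical, and this is not StreamBase's CSV code.

```java
import java.util.List;

final class CsvNesting {
    // RFC 4180 style: quote a field containing a comma, quote, CR, or LF, doubling embedded quotes.
    static String field(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join already-rendered field values into one CSV row.
    static String row(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(field(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = row(List.of("1", "2", "3"));   // 1,2,3
        String b = row(List.of("IBM", a));        // IBM,"1,2,3"
        String c = row(List.of("3.14159", b));    // 3.14159,"IBM,""1,2,3"""
        System.out.println(c);
        // 9456,"[234.0,2314.44]","[IBM,DELL]","[3000,2000]"
        System.out.println(row(List.of("9456", "[234.0,2314.44]", "[IBM,DELL]", "[3000,2000]")));
    }
}
```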

Copying Tuple Contents

You can duplicate any tuple field into another field of type tuple without using wildcards. For example, a Map operator might have an entry like the following in its Additional Expressions grid, where both IncomingTuple and CopyOfIncomingTuple are the names of tuple fields:

Action	Field Name	Expression
Add	CopyOfIncomingTuple	IncomingTuple

Use the .* syntax to flatten a tuple field into the top level of a stream.

For example, a Map operator might define an entry like the following in its Additional Expressions grid. When using this syntax, you must have an asterisk in both Field Name and Expression columns.

Action	Field Name	Expression
Add	*	IncomingTuple.*

Use the * AS * syntax for tuples defined with a named schema to copy the entire tuple into a single field of type tuple.

For example, let's say the tuple arriving at the input port of a Map operator was defined upstream with the NYSE_FeedSchema named schema. To preserve the input tuple unmodified for separate processing, the Map operator could add a field of type tuple using settings like the following in the Additional Expressions grid. When using the * AS * syntax in the Expression column, the name of the tuple field in the Field Name column has an implied asterisk for all of its fields.

Action	Field Name	Expression
Add	OriginalOrder	NYSE_FeedSchema(input1.* as *)

Because the Map operator has only one input port, the port does not need to be named:

Action	Field Name	Expression
Add	OriginalOrder	NYSE_FeedSchema(* as *)

Null Tuples, Empty Tuples, No-Fields Tuples

A null tuple results when the entire tuple is set to null (not just the fields of the tuple).

An empty tuple is a tuple with each individual field set to null.

A no-fields tuple is what is sent to an input stream that has an empty schema, which is a schema with no fields defined, as described in Using Empty Schemas. An input stream with an empty schema might be declared, for example, as a trigger for a Read All Rows operation on a Query Table. In this case, the tuple sent to this input stream is neither null nor empty: it is a no-fields tuple.