Using the TERR Predict Operator


Introduction

The TIBCO StreamBase® TERR Predict operator for TIBCO Enterprise Runtime for R (TERR) allows StreamBase to use TIBCO's implementation of the R language to load RDS files and perform predict operations.

How the Operators Locate TERR

In order to run correctly, the TERR operators assume that the machine running the StreamBase Runtime has a 64-bit version of TERR installed locally. The TERR operators were tested and validated with TERR version 4.5; the minimum supported TERR version is 4.2.

A copy of TERR Developer Edition is installed as part of your StreamBase installation. The Developer Edition has restricted license terms, as described on the StreamBase License Considerations page.

TERR Developer Edition is installed in the directory STREAMBASE_HOME/terr and, as a convenience, the environment variable TERR_HOME is set to this directory when using a StreamBase Command Prompt on Windows. TERR_HOME must be manually set

The operators locate the version of TERR to call using the following sequence:

If the option Use Embedded TERR is selected on the Operator Properties tab of the operator's Properties view, then the embedded TERR engine is used. If not selected:
- If you specify a path to a local TERR installation in the TERR Home Path property on the Operator Properties tab of the operator's Properties view, that version of TERR is used first.
- If that property is left blank, the operator looks for a path specified in the TERR_HOME environment variable.

This sequence lets you override the embedded TERR version with any newer or older version, compared to the embedded version, that your application requires.
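
The lookup order above can be sketched as a small shell function. This is only an illustration of the precedence, not the operator's actual implementation: the function name resolve_terr_home and its two arguments are hypothetical stand-ins for the Use Embedded TERR check box and the TERR Home Path property.

```shell
# Hypothetical helper that mirrors the lookup order described above.
#   $1 = "true" if Use Embedded TERR is selected, otherwise "false"
#   $2 = value of the TERR Home Path property (may be empty)
# Reads STREAMBASE_HOME and TERR_HOME from the environment.
resolve_terr_home() {
  use_embedded="$1"
  home_path_property="$2"

  if [ "$use_embedded" = "true" ]; then
    # The embedded engine is installed under STREAMBASE_HOME/terr.
    printf '%s\n' "${STREAMBASE_HOME:-}/terr"
  elif [ -n "$home_path_property" ]; then
    # An explicit TERR Home Path takes precedence over the environment.
    printf '%s\n' "$home_path_property"
  elif [ -n "${TERR_HOME:-}" ]; then
    # Fall back to the TERR_HOME environment variable.
    printf '%s\n' "$TERR_HOME"
  else
    echo "no TERR installation configured" >&2
    return 1
  fi
}

# Example call with a placeholder path; prints /opt/tibco/terrver
resolve_terr_home false "/opt/tibco/terrver"
```

The final branch corresponds to the typecheck failure that occurs when the embedded engine is not used and neither the property nor the environment variable is set.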

Installing an Alternate TERR Version

To determine the version of TERR installed with StreamBase 7.6.3 or later:

On Windows, using a StreamBase Command Prompt, run:

%STREAMBASE_HOME%\terr\bin\TERR --version

On Linux or macOS in a shell configured with the sbconfig --env command as described in the Installation Guide, run:

$STREAMBASE_HOME/terr/bin/TERR --version

TIBCO customers can download TERR from edelivery.tibco.com, or download an evaluation copy of TERR from the TIBCO Access Point.

For Linux

TERR is only provided for 64-bit Linux. Download the tar file provided. Untar the file into a temporary local directory, and run the ./INSTALL file provided. The default installation directory is /opt/tibco/terrver, where ver is the TERR version number.

For Windows

Download the zip file provided; unzip the file to find a single installer executable. Run this installer and accept its suggested default location (C:\Program Files\TIBCO\terrver) or install into the currently recommended location (C:\TIBCO\terrver), where ver is the TERR version number.

On Windows, the TERR installer provides both 32-bit and 64-bit versions of the TERR runtime code. When run on 64-bit Windows, the 64-bit version of TERR is automatically used. Since StreamBase supports only 64-bit Windows, it uses the 64-bit version of TERR.

For macOS

Download the DMG file provided and run the installer.

To connect StreamBase and its TERR operator to your local TERR installation, you must either:

- Set the TERR Home Path property in the Operator Properties tab of each operator's Properties view, providing the full, absolute path to the TERR installation directory.
- Set the TERR_HOME environment variable to point to the full, absolute path of your alternate TERR installation directory. Use this method if you anticipate using many operator instances in your StreamBase applications.
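
As a minimal sketch of the environment variable method on Linux or macOS, the following sets TERR_HOME and checks that it looks like a TERR installation before use. The helper name is_terr_home is hypothetical, the path is a placeholder, and the check assumes the launcher is at bin/TERR, as it is in the embedded copy.

```shell
# Hypothetical check: does the directory look like a TERR installation?
# Assumes the launcher is at bin/TERR, as in the embedded copy.
is_terr_home() {
  [ -n "$1" ] && [ -x "$1/bin/TERR" ]
}

# Placeholder path; substitute your actual installation directory.
TERR_HOME="/opt/tibco/terrver"
export TERR_HOME

if is_terr_home "$TERR_HOME"; then
  # Report the version of the installation that TERR_HOME points to.
  "$TERR_HOME/bin/TERR" --version
else
  echo "TERR_HOME does not point to a TERR installation: $TERR_HOME" >&2
fi
```

To make the setting persistent, the two TERR_HOME lines could be placed in your shell startup file.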

Operating System Configuration for TERR

The TERR operators recognize and honor the TERR_HOME environment variable if set, and if it points to a valid local TERR installation directory. However, setting TERR_HOME is not required.

On Windows and Linux, the TERR bin directory does not need to be in the system PATH, and no environment variables are required.

Using TERR on macOS requires additional settings:

For TERR versions 4.2 and 4.3, remove spaces in the path

TERR releases earlier than 4.4 require the path to the TERR_HOME/lib directory to have no spaces in the path. If you installed StreamBase using the DMG installer, your StreamBase home directory does contain spaces, and therefore so does the path to its embedded terr/lib subdirectory.

To use the TERR operators under macOS with TERR versions 4.2 or 4.3, including with the embedded TERR Developer Edition, rename the folder containing StreamBase to remove spaces in the folder name. For example, change TIBCO StreamBase 7.7.2 to TIBCOStreamBase7.7.2.
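
Before renaming anything, you can check whether the library path contains a space. The following is a rough sketch: the function name path_has_space is hypothetical, and the lib_dir value assumes the libraries are in the lib subdirectory of TERR_HOME or of the embedded terr directory.

```shell
# Returns success (0) if the given path contains a space character.
path_has_space() {
  case "$1" in
    *" "*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Assumed library location: TERR_HOME/lib, else the embedded terr/lib.
lib_dir="${TERR_HOME:-${STREAMBASE_HOME:-}/terr}/lib"

if path_has_space "$lib_dir"; then
  echo "warning: '$lib_dir' contains spaces; TERR releases earlier than 4.4 require a path without spaces" >&2
fi
```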

Identify the location of the TERR native libraries

The operators must know where to find the dynamic libraries that implement TERR on macOS. You can use either of these methods:

Set the DYLD_LIBRARY_PATH environment variable, OR

Configure your shell environment to include a line like the following:

export DYLD_LIBRARY_PATH=/absolute/path/to/libs

See below for example paths. This environment variable method may be more convenient if you are developing or running several StreamBase applications that use TERR operators.

Specify the library path in a configuration file

In your project's sbd.sbconf file (or one of its included .sbconf files), use the <library> child element of the <java-vm> element to specify the path. For example:

<java-vm>
  <library path="/absolute/path/to/libs"/>
</java-vm>

This setting must be configured in the configuration files for every StreamBase project that uses one of the TERR operators.

When using the embedded TERR Developer Edition, the value of /absolute/path/to/libs is a path like the following:

/Users/sbuser/Applications/TIBCOStreamBase7.7.2/terr/lib

When overriding the embedded TERR version with an external installation of TERR, the value of /absolute/path/to/libs is like the following:

/Library/Frameworks/TERR.framework/Versions/version-number/
   Resources/lib/x86_64-apple-darwin

In the example above, the long line is broken into two for clarity. Enter this path as a single unbroken line.

How the TERR Predict Operator Works

This operator allows a stream of tuples to be evaluated by an external TERR process performing a predict operation, with the results returned as another stream of tuples.

The operator can instantiate multiple TERR instances to improve performance. When more than one instance is required, the tuple execution can no longer be guaranteed to be in order, as the operator now works asynchronously.

The input tuple's terrVars field is converted directly into a global TERR variable. A predict operation is then run in that environment and the result variable retrieved and converted to the output tuple.

All tuple entries that are to be read into the TERR process must be in a top level tuple named terrVars.

A list of integers can be sent using the tuple (1) or (list (1, 2, 3)) or the enhanced form (tuple myInts (names = ["one", "two"], values=[1,2])). All data types are supported with the exception of capture fields and functions.

Once the variables are sent to the TERR process, the model is executed and the result is retrieved.

Using the TERR Predict Operator

To use a TERR Predict operator in a StreamBase EventFlow module, drag a token for the operator onto the canvas of your EventFlow Editor. Then select the newly placed operator to rename it and configure its properties.

Placing an Operator on the Canvas

The operator is a member of the Java Operators group in the Palette view in StreamBase Studio. Select the operator from the Insert an Operator or Adapter dialog. Invoke the dialog with one of the following methods:

- Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.
- Click on the canvas where you want to place the operator, then invoke the keyboard shortcut O V.
- From the top-level menu, invoke Insert → Operator → Java.

When the dialog is open, enter terr in the search field to narrow the list of operators.

Properties View Settings

This section describes the properties you can set for the TERR Predict Operator, using the various tabs of the Properties view in StreamBase Studio.

In the tables in this section, the Property column shows each property name as it appears on the corresponding tab of the Properties view for this operator.

General Tab

Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Class: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow module. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Property	Data Type	Description
Model	String	The model to load into each TERR instance at startup (RDS File).
Model Name	String	This will be the R variable name set when loading this model.
Predict Options	String	Specifies a comma-separated list of the predict method options to use. For example: 'interval="prediction", level = 0.99'
Use Embedded TERR	Check box	When enabled, the operator uses the embedded TERR engine that is bundled with StreamBase (licensed for development use only).
TERR Home Path	String	When not using the embedded TERR engine, you must supply the home path for the TERR installation to use. You can leave this blank if the TERR_HOME environment variable is set.
Enable Status Port	Check box	When enabled, the operator reports data on the status port regarding various operator states.
Log Level	INFO	Controls the level of verbosity the operator uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.

Advanced Tab

Property	Data Type	Description
TERR Instances	Integer	The number of instances of the TERR engine to use with this operator. NOTE: If greater than 1, the operator becomes asynchronous and tuple order is not guaranteed.
Enable Timing	Check box	When enabled, the result tuples produced include timing information.
Pause Before TERR Execution	Check box	If enabled, the EventFlow operation pauses in debug mode to allow you to execute R methods in the console on the current instance before executing the input tuple.
Pause After TERR Execution	Check box	If enabled, the EventFlow operation pauses in debug mode to allow you to execute R methods in the console on the current instance after executing the input tuple.
To TERR Date Format	String	The date format to use when converting tuple data into TERR.
From TERR Date Format	String	The date format to use when converting TERR data into tuples.
TERR Engine Parameters	String	The engine parameters to send into the TERR engine.
TERR Java Home Path	String	The path to the Java Home to use with the TERR instance. If blank, the Java instance embedded with the StreamBase installation is used.
TERR Java Options	String	The Java options to send into the TERR engine.
TERR Instance Process Affinity	Map	The processor affinity to set for each instance of TERR. Instance values are matched to processors; you can specify an instance number more than once to have multiple processors.
TERR Environment	Map	The environment to set for each instance of TERR.

Edit Schema Tab

Use the Edit Schema tab to specify the schema of the output tuple for this operator.

For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.

Use the Import proposed schemas link to import schemas as needed for the various TERR output types. The list of importable schemas is specified in the Definitions tab of the EventFlow Editor.

Only a single field is allowed in the output schema. This represents the result of an R predict execution that is retrieved after the execution of an input tuple.

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Use the TERR Instances property on the Advanced tab to enable parallel processing into multiple TERR instances as needed. You can still use the Concurrency tab, but it will have very little impact on performance.

Input Port

The TERR Predict operator has a single input port to handle all interactions. The schema for this can include any field, but the following are used by the operator; the remaining fields are passed through the operator into an inputTuple field on the output stream.

Field Name	Field Type	Description
terrVars	tuple	(Optional) The tuple data to convert into R variables. This field must be a tuple. Each field in the tuple is converted into an R variable based on the field's schema.
rData	blob	(Optional) The R byte data to load as the new model.
terrInstance	int	(Optional) The TERR instance to send this tuple to.

Output Ports

The TERR operator has two output ports: a data port and an optional status port.

Data Port

The data port outputs the result of each call into the TERR engine. The resulting tuple contains two or three fields, depending on whether timing is enabled.

terrData — The result data pulled from the TERR instance after execution. This field contains the values specified on the Edit Schema tab. Each sub field of the terrData field represents a variable from the TERR instance.
inputTuple — This tuple contains all the fields from the input tuple.
(Optional) timing — This tuple contains some timing information to help gauge what might be the bottleneck in execution. The timing tuple contains the following fields:
- eval — The time in nanoseconds it took for the TERR instance to evaluate and execute the R functions.
- tupleToTerr — The time in nanoseconds it took to convert the input tuple into TERR data objects to send to the TERR instance.
- terrToTuple — The time in nanoseconds it took to convert the TERR data objects from the TERR instance into the outbound tuple.
- terrSetVariable — The time in nanoseconds it took to send the TERR data objects into the running TERR instance.
- terrGetVariable — The time in nanoseconds it took to get the TERR data objects from the running TERR instance.
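
As a rough illustration of how the timing fields might be used to spot a bottleneck, the sketch below reports which field has the largest value. The function name slowest_stage is hypothetical, the sample numbers are made up, and how you obtain the values from the output tuple is left open.

```shell
# Prints the name of the timing field with the largest value.
# Arguments are name=nanoseconds pairs, for example eval=850000.
slowest_stage() {
  slowest_name=""
  slowest_ns=-1
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    ns="${pair#*=}"      # text after the first "="
    if [ "$ns" -gt "$slowest_ns" ]; then
      slowest_name="$name"
      slowest_ns="$ns"
    fi
  done
  printf '%s\n' "$slowest_name"
}

# Made-up sample values, for illustration only; prints eval
slowest_stage eval=850000 tupleToTerr=120000 terrToTuple=95000 \
  terrSetVariable=40000 terrGetVariable=38000
```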

Status Port

The status port emits tuples that describe the processing status for each input tuple. It is only present when the Enable Status Port property is selected. The schema of the output tuple consists of:

Field Name	Field Type	Description
type	String	The type of report, which follows normal log levels: DEBUG, ERROR, INFO, TRACE, and WARN.
action	String	The action that caused the report. These can be `Load R Data Objects`, `Init`, or `Execute`.
object	String	An optional object that has been affected by this status.
Message	String	A human-readable status message.
time	Tuple	The timestamp indicating when the status occurred.
inputTuple	Tuple	The input tuple that caused this status message. NOTE: This value is null when loading initialization data.

Data Type Conversion

This section describes how data is converted from a tuple into Terr Data objects and back again.

TERR to Tuple

This section describes how data is converted from Terr Data objects into a tuple result. Note that the best data conversion option is highlighted.

Note

Primitive values of NA (or NaN, for doubles) are converted to a null value in StreamBase.

Each Terr data type below is followed by the StreamBase field types it can be converted to.

Terr Byte (vector byte)
- blob — converts the vector elements into a blob field
- list(blob) — converts the vector elements into a list with a single blob element
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(boolean) — converts the vector elements to a list of boolean values, with a value of 1 being true and any other value being false
- boolean — converts the first vector element to a boolean, with a value of 1 being true and any other value being false
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- string — converts the vector elements to a string
- timestamp — converts the vector elements to a string and tries to parse it as a timestamp value using the given simple date format from the Advanced tab
- tuple(names list(string), values list(blob)) — converts the vector elements to a tuple that contains a list of names and a list of values for each element of the vector
* See Terr Generic below for completely generic conversion.

Terr Double (vector double)
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(blob) — converts the vector elements to a list of blobs, each with a single byte
- blob — converts the first vector element to a blob with a single byte
- list(boolean) — converts the vector elements to a list of boolean values, with a value of 1 being true and any other value being false
- boolean — converts the first vector element to a boolean, with a value of 1 being true and any other value being false
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- list(string) — converts the vector elements to a list of strings
- string — converts the first vector element to a string
- timestamp — converts the first vector element to a timestamp based on the double being the milliseconds from epoch, January 1, 1970 00:00:00.000 GMT
- tuple(names list(string), values list(double)) — converts the vector elements to a tuple that contains a list of names and a list of values for each element of the vector
* See Terr Generic below for completely generic conversion.

Terr Integer (vector integer)
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(blob) — converts the vector elements to a list of blobs, each with a single byte
- blob — converts the first vector element to a blob with a single byte
- list(boolean) — converts the vector elements to a list of boolean values, with a value of 1 being true and any other value being false
- boolean — converts the first vector element to a boolean, with a value of 1 being true and any other value being false
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- list(string) — converts the vector elements to a list of strings
- string — converts the first vector element to a string
- timestamp — converts the first vector element to a timestamp based on the int being the milliseconds from epoch, January 1, 1970 00:00:00.000 GMT
- tuple(names list(string), values list(int)) — converts the vector elements to a tuple that contains a list of names and a list of values for each element of the vector
* See Terr Generic below for completely generic conversion.

Terr String (vector string)
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(blob) — converts the vector elements to a list of blobs, each with a single byte
- blob — converts the first vector element to a blob with a single byte
- list(boolean) — converts the vector elements to a list of boolean values, with a value of 1 being true and any other value being false
- boolean — converts the first vector element to a boolean, with a value of 1 being true and any other value being false
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- list(string) — converts the vector elements to a list of strings
- string — converts the first vector element to a string
- timestamp — converts the first vector element to a timestamp parsed using the given simple date format from the Advanced tab
- list(timestamp) — converts the vector elements to a list of timestamps parsed using the given simple date format from the Advanced tab
- tuple(names list(string), values list(string)) — converts the vector elements to a tuple that contains a list of names and a list of values for each element of the vector
* See Terr Generic below for completely generic conversion.

Terr Logical (vector logical)
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
For all conversions listed below, NA is converted to a null StreamBase value.
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(blob) — converts the vector elements to a list of blobs, each with a single byte
- blob — converts the first vector element to a blob with a single byte
- list(boolean) — converts the vector elements to a list of boolean values
- boolean — converts the first vector element to a boolean
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- list(string) — converts the vector elements to a list of strings
- string — converts the first vector element to a string
- timestamp — no conversion available
- tuple(names list(string), values list(boolean)) — converts the vector elements to a tuple that contains a list of names and a list of values for each element of the vector
* See Terr Generic below for completely generic conversion.

Terr Factor
- list(double) — converts the vector elements to a list of doubles
- double — converts the first vector element to a double
- list(blob) — converts the vector elements to a list of blobs, each with a single byte
- blob — converts the first vector element to a blob with a single byte
- list(boolean) — converts the vector elements to a list of boolean values, with a value of 1 being true and any other value being false
- boolean — converts the first vector element to a boolean, with a value of 1 being true and any other value being false
- list(int) — converts the vector elements to a list of ints
- int — converts the first vector element to an int
- list(long) — converts the vector elements to a list of longs
- long — converts the first vector element to a long
- list(string) — converts the vector elements to a list of strings
- string — converts the first vector element to a string
- timestamp — converts the first vector element to a timestamp based on the int being the milliseconds from epoch, January 1, 1970 00:00:00.000 GMT
- tuple(names list(string), indexes list(int), levels list(string)) — converts the vector elements to a tuple that contains a list of names, a list of indexes, and a list of levels
* See Terr Generic below for completely generic conversion.

Terr List
- list(x) — Terr list types are converted to a StreamBase list type. The elements inside the list determine how the conversion proceeds further.

Terr DataFrame
- tuple(x,y,z) — Terr data frame types use the names values of the data frame to match sub fields of the tuple, and convert each sub field based on the rules already listed.
- list(x) — Each element in the list of data frames is converted based on the statement above.

Terr Generic
tuple(
  names list(string),
  doubles list(tuple(names list(string), values list(double))),
  integers list(tuple(names list(string), values list(integer))),
  factors list(tuple(names list(string), indexes list(int), levels list(string))),
  strings list(tuple(names list(string), values list(string))),
  logicals list(tuple(names list(string), values list(boolean))),
  bytes list(tuple(names list(string), values list(blob)))
)
This is a completely generic tuple format that, if specified as the output format, converts the inbound data into the specified data types. Only the names field is required for this kind of generic conversion; you can specify one or all of the remaining fields. The Import proposed schemas feature of the operator also creates this full tuple for you.

Tuple to TERR

This section describes how data is converted from a tuple into Terr Data objects.

Note

Primitive types (int, double, long, boolean) with a null value are converted to NA (or NaN, for doubles) in TERR.

StreamBase Field Type	Terr Data Types
boolean	TerrLogical — NULL values are converted to NA values.
list(boolean)	TerrLogical — NULL values are converted to NA values.
tuple(names list(string), values list(boolean))	TerrLogical — converts the list elements inside the tuple to a logical vector with names supplied.
int	TerrInteger
list(int)	TerrInteger
tuple(names list(string), values list(int))	TerrInteger — converts the list elements inside the tuple to an int vector with names supplied.
long	TerrDouble
list(long)	TerrDouble
tuple(names list(string), values list(long))	TerrDouble — converts the list elements inside the tuple to a double vector with names supplied.
double	TerrDouble
list(double)	TerrDouble
tuple(names list(string), values list(double))	TerrDouble — converts the list elements inside the tuple to a double vector with names supplied.
blob	TerrByte
list(blob)	TerrByte — All bytes from all the elements in the list are copied into a single Terr Byte.
tuple(names list(string), values list(blob))	TerrByte — converts the list elements inside the tuple to a byte vector with names supplied.
string	TerrString
list(string)	TerrString
tuple(names list(string), values list(string))	TerrString — converts the list elements inside the tuple to a string vector with names supplied.
timestamp	TerrString
list(timestamp)	TerrString
tuple(names list(string), values list(timestamp))	TerrString — converts the list elements inside the tuple to a string vector with names supplied.
tuple(names list(string), indexes list(int), levels list(string))	TerrFactor — converts the list elements inside the tuple to a factor vector with names supplied.
tuple(x string, y string, z double)	TerrData (DataFrame) — Each sub field of the tuple is converted to a field in the data frame, with the tuple's field names being the names supplied to the TerrData objects. The object types are converted based on the rules supplied in this list.
list(names list(string), values(tuple(x string, y string, z double))	TerrList (List) — This will create a list with a single row with each tuple field used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list.
list(names list(string), values(tuple(x list(string), y list(string), z list(double)))	TerrList (List) — This will create a list with multiple rows with each tuple field to create multiple rows used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list.
list(names list(string), values(list(tuple(x string, y string, z double)))	TerrList (List) — This will create a list item for each item in the values list with a single row with each tuple field used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list.
list(names list(string), values(list(tuple(x list(string), y list(string), z list(double))))	TerrList (List) — This will create a list item for each item in the values list with each tuple field to create multiple rows used against the names list in order that the fields appear. The object types are converted based on the rules supplied in this list.
Function	Function fields not supported.
Capture Field	Capture Fields are not supported.

Typecheck and Error Handling

Typechecking fails when:

- Any required fields are not filled in.
- The Use Embedded TERR property is disabled and neither the TERR Home Path property nor the TERR_HOME environment variable is set.
- Process Affinity is not an integer greater than 0.
- The Model RDS data file specified cannot be located.
- The Model Name is not specified.
- The output schema contains more than one field.
- The input schema is missing the terrVars field.
- The input field terrVars is not a tuple.
- The input field rData is not a blob.

Suspend and Resume Behavior

On suspension, the TERR Predict operator finishes processing the current tuple or tuples (depending on the TERR instance count), outputs the result tuples, then pauses, waiting for input.

On resumption, the TERR Predict operator continues processing with the next input tuple.

The TERR instance or instances remain running during suspension.
