Using the JPMML Model Evaluator Operator

Introduction

The TIBCO StreamBase® Operator for JPMML Model Evaluator enables StreamBase applications to execute numerical models expressed in PMML, using the Java-based JPMML library.

Predictive Model Markup Language (PMML) is an XML-based format that allows models built in data analytics tools to be passed to model execution environments, such as event processing systems.

The JPMML operator allows you to deploy an arbitrary number of models to be executed against incoming events. The EventFlow designer is responsible for converting the incoming data into the features that the models expect. This conversion frequently requires event enrichment, such as cross-referencing, attribute lookup, or the use of prior events related to the same context.

The operator processes input data given as a tuple or a list of tuples. The tuple schema corresponds to the models' input parameters. For each model, the operator generates output data that matches the defined output schema. Depending on the input data, the output is either a single tuple or a list of tuples.
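
The relationship between the input shape and the output shape can be illustrated with a minimal Java sketch. It is not the operator's implementation: tuples are modeled as plain maps, and the class name FrameShapeSketch, the method scoreFrame, and the stand-in model are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; tuples are modeled as Map<String, Object>.
final class FrameShapeSketch {
    /** Scores a frame; the result has the same shape (tuple or list) as the input. */
    static Object scoreFrame(Object frame,
                             Function<Map<String, Object>, Map<String, Object>> model) {
        if (frame instanceof List) {
            List<Map<String, Object>> scores = new ArrayList<>();
            for (Object sample : (List<?>) frame) {
                scores.add(model.apply(asTuple(sample))); // preserves input order
            }
            return scores;
        }
        return model.apply(asTuple(frame));
    }

    @SuppressWarnings("unchecked")
    private static Map<String, Object> asTuple(Object value) {
        return (Map<String, Object>) value;
    }

    public static void main(String[] args) {
        // Stand-in model (an assumption): flags samples whose "amount" exceeds 100.
        Function<Map<String, Object>, Map<String, Object>> model =
            sample -> Map.of("flagged", ((Number) sample.get("amount")).doubleValue() > 100.0);

        System.out.println(scoreFrame(Map.of("amount", 250.0), model));
        System.out.println(scoreFrame(
            List.of(Map.of("amount", 50.0), Map.of("amount", 250.0)), model));
    }
}
```

A single tuple yields a single result, while a list of tuples yields a list of results in the same order.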

Dynamic model definitions allow you to provide additional metadata to the deployed models. The metadata is attached to the model result, allowing the EventFlow to take action based on the model attributes. Examples of attributes include: the champion/challenger flag, the category for propensity scoring, and so on.

The JPMML model evaluator is implemented using the version of the JPMML library listed on the Supported Configurations page.

The operator supports an arbitrary number of models simultaneously, as well as scoring single samples and data frames.

Operator Properties

This section describes the properties you can set for this operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
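
The naming rule above can be expressed as a regular expression. The following Java sketch is only an illustration of the rule as stated; it assumes that "alphabetic characters" means ASCII letters, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; assumes "alphabetic" means ASCII letters A-Z and a-z.
final class ComponentNameCheck {
    // First character: letter or underscore. Remaining: letters, digits, underscores.
    private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidName("JPMML_Scorer1")); // letters, digits, underscore
        System.out.println(isValidName("jpmml-scorer"));  // contains a hyphen
        System.out.println(isValidName("1scorer"));       // starts with a digit
    }
}
```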

Operator: A read-only field that shows the formal name of the operator.

Class: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow fragment. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an epadmin container resume command (or its sbadmin equivalent), or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Property Type Description
Control Port check box Enables dynamic reconfiguration of the model list. Enabling the control port also enables an output port that reports the status of each model loading request. The control port uses all-or-nothing semantics: either the full list loads successfully and replaces the currently deployed models, or the request reports a failure.
Status Port check box Enables failure notifications. If scoring fails, the failure is emitted to the status port together with the original input tuple.
Timing Info check box Enables fine-grained timing information. The operator collects the time spent in input conversion, model evaluation, and output conversion. Times are reported in nanoseconds.
Log Level drop-down list Controls the level of verbosity the operator uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.
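
To illustrate what the Timing Info property measures, the following Java sketch times three stages with System.nanoTime. It is a sketch under stated assumptions, not the operator's code: the stage functions are placeholders, and the key names inputConversionNanos, evaluationNanos, and outputConversionNanos are hypothetical rather than the operator's actual field names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of per-stage timing in nanoseconds.
final class TimingSketch {
    static <I, F, R, O> Map<String, Long> timeStages(I input,
            Function<I, F> convertInput,
            Function<F, R> evaluate,
            Function<R, O> convertOutput) {
        long start = System.nanoTime();
        F features = convertInput.apply(input);
        long converted = System.nanoTime();
        R result = evaluate.apply(features);
        long evaluated = System.nanoTime();
        convertOutput.apply(result);
        long finished = System.nanoTime();

        // Key names are assumptions made for this sketch.
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("inputConversionNanos", converted - start);
        timings.put("evaluationNanos", evaluated - converted);
        timings.put("outputConversionNanos", finished - evaluated);
        return timings;
    }

    public static void main(String[] args) {
        Map<String, Long> timings = timeStages("3.5",
            (String raw) -> Double.parseDouble(raw), // stand-in input conversion
            (Double x) -> x * 2.0,                   // stand-in model evaluation
            (Double y) -> "score=" + y);             // stand-in output conversion
        System.out.println(timings);
    }
}
```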

Models Tab

Property Type Description
Model URLs name/value pairs List of models specified at design time. Each model consists of a name and a URL pointing to the model definition. Models can also be deployed in HDFS format.

Schemas

Property Type Description
Result Data Schema schema Anticipated schema for model output. Only fields defined in the schema are used in the output tuple.
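
The effect of the Result Data Schema on the output tuple can be sketched as a projection of the model output onto the schema's fields. The Java example below is an illustration only: tuples are modeled as maps, the names are hypothetical, and setting a schema field that the model did not produce to null is an assumption, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the operator's implementation.
final class ResultSchemaSketch {
    /** Keeps only the fields named in the schema, in schema order. */
    static Map<String, Object> projectToSchema(Map<String, Object> modelOutput,
                                               List<String> schemaFields) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        for (String field : schemaFields) {
            // Assumption: a schema field missing from the model output becomes null.
            tuple.put(field, modelOutput.get(field));
        }
        return tuple;
    }

    public static void main(String[] args) {
        Map<String, Object> modelOutput =
            Map.of("predicted", "yes", "probability", 0.93, "debugInfo", "node 17");
        // "debugInfo" is not in the schema, so it is dropped from the output tuple.
        System.out.println(projectToSchema(modelOutput, List.of("predicted", "probability")));
    }
}
```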

AMS Tab

Use the AMS tab to specify which artifacts should be pulled from a running TIBCO Artifact Management Server, which is a separately installed product.

Note

When you deploy an artifact from AMS, the operator first checks your list of artifacts for a matching path and, if it finds one, uses the model name given there. If no path matches, the artifact's file name without its extension is used as the model name. For example, sample/audit.rds resolves to the model name audit.
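
The name resolution rule in this note can be sketched in Java as follows. This is an illustration of the rule as described, not the operator's code; the class name, the method name, and the representation of the artifact list as a map from path to model name are assumptions.

```java
import java.util.Map;

// Hypothetical illustration of the model name resolution rule.
final class ModelNameSketch {
    /** configuredArtifacts maps an artifact path to the model name given for it. */
    static String resolveModelName(String artifactPath, Map<String, String> configuredArtifacts) {
        String configured = configuredArtifacts.get(artifactPath);
        if (configured != null) {
            return configured; // path matched: use the given model name
        }
        // No match: use the file name without its extension.
        String fileName = artifactPath.substring(artifactPath.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(resolveModelName("sample/audit.rds", Map.of()));
        System.out.println(resolveModelName("sample/audit.rds",
            Map.of("sample/audit.rds", "champion")));
    }
}
```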

Property Type Description
Required On Startup check box When enabled, the artifacts listed are requested from AMS at initialization and the system waits until all artifacts are loaded.
Artifacts list (string, string) List of artifacts to load from AMS. The first value of the path is the project name followed by the full path to the artifact. Use a / separator with an optional @version at the end. If @version is not specified, then the latest version is assumed.

For example: project/path1/path2/artifact@1
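
The structure of an artifact reference can be made concrete with a small parser. The Java sketch below follows the format described above (project name, path, optional @version); the class ArtifactRefSketch and its fields are hypothetical, and representing "latest version" as a null version is an assumption of this sketch.

```java
// Hypothetical illustration of the project/path@version format.
final class ArtifactRefSketch {
    final String project;
    final String path;
    final String version; // null means "latest" in this sketch

    private ArtifactRefSketch(String project, String path, String version) {
        this.project = project;
        this.path = path;
        this.version = version;
    }

    static ArtifactRefSketch parse(String reference) {
        String body = reference;
        String version = null;
        int at = reference.lastIndexOf('@');
        if (at >= 0) {
            version = reference.substring(at + 1);
            body = reference.substring(0, at);
        }
        int slash = body.indexOf('/');
        if (slash <= 0 || slash == body.length() - 1) {
            throw new IllegalArgumentException("Expected project/path[@version]: " + reference);
        }
        // The first path element is the project; the rest is the artifact path.
        return new ArtifactRefSketch(body.substring(0, slash), body.substring(slash + 1), version);
    }

    public static void main(String[] args) {
        ArtifactRefSketch ref = parse("project/path1/path2/artifact@1");
        System.out.println(ref.project + " | " + ref.path + " | " + ref.version);
    }
}
```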

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Data Input Port

The data port is the default input port for the model operator. It is always enabled. Use the data port to execute model scoring.

The default schema for the data input port is:

  • frame, tuple or list(tuple). Samples to be scored by the deployed models.

    The tuple structure contains primitive fields (int, long, double, string or boolean) with names corresponding to model input fields.

    Sparse dictionaries. A sparse dictionary is a list of tuples with name (string) and value (any supported primitive type) fields. Use sparse dictionaries when (1) the fields are not known at design time, (2) the field names are not supported by StreamBase, or (3) the number of fields is large. An input tuple may contain any number of sparse dictionaries, for example, to provide categorical and continuous values separately or to group fields from various domains.

  • modelName (optional), string. If this field exists and is not null, it specifies which model the input tuple is scored against. If this field is missing or null, all models are scored.

  • * arbitrary pass through parameters.

Unrecognized fields are passed through transparently to the output. The frame field is not propagated, and the input must not contain a field named scores.
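
The way a sample with sparse dictionaries maps onto model input fields can be illustrated with a short Java sketch. It is an assumption about how the flattening could work, not the operator's implementation: tuples are modeled as maps, a sparse dictionary as a list of maps with name and value keys, and the names SparseDictionarySketch and toFeatures are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of flattening a sample into model features.
final class SparseDictionarySketch {
    static Map<String, Object> toFeatures(Map<String, Object> sample) {
        Map<String, Object> features = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : sample.entrySet()) {
            Object value = field.getValue();
            if (value instanceof List) {
                // Sparse dictionary: each entry contributes one feature, keyed by "name".
                for (Object entry : (List<?>) value) {
                    Map<?, ?> pair = (Map<?, ?>) entry;
                    features.put((String) pair.get("name"), pair.get("value"));
                }
            } else {
                // Primitive field: the field name is the feature name.
                features.put(field.getKey(), value);
            }
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, Object> sample = new LinkedHashMap<>();
        sample.put("age", 42);
        // "marital-status" contains a hyphen, so it is carried in a sparse dictionary.
        sample.put("categorical",
            List.of(Map.of("name", "marital-status", "value", "single")));
        System.out.println(toFeatures(sample));
    }
}
```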

Scores Output Port

The scores port provides a list of model evaluation results.

The schema for the scores output port is:

  • scores, list(tuple). List of records, one for each currently deployed model.

  • scores.modelName, string. Name of the model defined in the Model URLs or provided via the control port.

  • scores.modelUrl, string. URL defining the model configured in the Model URLs or provided via the control port.

  • scores.modelData, blob. Binary data defining the model, if binary data was used to load the model.

  • scores.score, tuple or list(tuple). The type matches the type of the frame input. For a list input, the scores appear in the same order as the input list. The schema is defined by the Result Data Schema property.

  • scores.*. Arbitrary parameters provided during model redeployment on the control port.

  • *. Pass-through parameters other than frame.

The scores port transparently replicates unrecognized fields. The frame field is not propagated.
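
The following Java sketch shows how an output tuple of this shape could be assembled from an input tuple. It is an illustration under stated assumptions, not the operator's code: tuples are modeled as maps, models as plain functions, and the names ScoresSketch and score are hypothetical. Producing an empty scores list when modelName names a model that is not deployed is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of assembling the scores output tuple.
final class ScoresSketch {
    static Map<String, Object> score(Map<String, Object> input,
                                     Map<String, Function<Object, Object>> models) {
        Object requested = input.get("modelName"); // null or absent: score all models
        List<Map<String, Object>> scores = new ArrayList<>();
        for (Map.Entry<String, Function<Object, Object>> model : models.entrySet()) {
            if (requested != null && !requested.equals(model.getKey())) {
                continue; // a specific model was requested and this is not it
            }
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("modelName", model.getKey());
            record.put("score", model.getValue().apply(input.get("frame")));
            scores.add(record);
        }
        Map<String, Object> output = new LinkedHashMap<>(input); // pass-through fields
        output.remove("frame");                                  // frame is not propagated
        output.put("scores", scores);
        return output;
    }

    public static void main(String[] args) {
        Map<String, Function<Object, Object>> models = new LinkedHashMap<>();
        models.put("champion", frame -> 0.9);   // stand-in models returning fixed scores
        models.put("challenger", frame -> 0.4);
        System.out.println(score(
            Map.of("frame", Map.of("x", 1), "requestId", "r1"), models));
    }
}
```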

Control Input Port

The control port enables runtime redeployment of models. Models are deployed with all-or-nothing semantics: the provided models fully replace the current set only if all of them load successfully; otherwise, the request reports a failure.
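
All-or-nothing replacement can be sketched in Java by loading every model into a candidate set first and swapping the set in a single step. This is a minimal sketch of the general technique, not the operator's implementation; the class ModelRegistrySketch, its methods, and the loader function are hypothetical, and keeping the previous set unchanged on failure is an assumption that follows from the all-or-nothing description.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical illustration of all-or-nothing model redeployment.
final class ModelRegistrySketch<M> {
    private final AtomicReference<Map<String, M>> current =
        new AtomicReference<>(Collections.emptyMap());

    /** Returns true and replaces the deployed set only if every model loads. */
    boolean redeploy(Map<String, String> nameToUrl, Function<String, M> loader) {
        Map<String, M> candidate = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, String> model : nameToUrl.entrySet()) {
                candidate.put(model.getKey(), loader.apply(model.getValue()));
            }
        } catch (RuntimeException loadFailure) {
            return false; // assumption: the previously deployed set stays in place
        }
        current.set(Collections.unmodifiableMap(candidate)); // single atomic swap
        return true;
    }

    Map<String, M> deployed() {
        return current.get();
    }

    public static void main(String[] args) {
        ModelRegistrySketch<String> registry = new ModelRegistrySketch<>();
        // Stand-in loader: the "model" is just its URL string.
        System.out.println(registry.redeploy(Map.of("audit", "file:audit.pmml"), url -> url));
        System.out.println(registry.deployed().keySet());
    }
}
```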

The schema for the control input port is as follows:

  • models, list(tuple). List of records, one for each model to be deployed.

  • models.modelName, string. Logical name of the model.

  • models.modelUrl, string. URL defining the model.

  • models.modelData, blob. The binary data of the model. This field can be used to load a model directly from any source. This field is only used if models.modelUrl is null or empty.

  • models.*. Arbitrary parameters describing the model. They are later provided in the score.

  • *. Arbitrary parameters provided during model redeployment on the control port.

The status port transparently replicates unrecognized fields. The control input port schema must not explicitly use the status or message fields.

Status Output Port

The status port provides responses for runtime model deployment. Tuples are emitted only as responses to the control port tuples.

The schema for the status output port is as follows:

  • status, string. Deployment status, which can be success or failure.

  • message, string. Descriptive status message.

  • models, list(tuple). List of records, one for each model to be deployed.

  • models.status, string. Model loading status, which can be success or failure.

  • models.message, string. Descriptive model status message.

  • models.modelName, string. Logical model name.

  • models.modelUrl, string. URL defining the model.

  • models.modelData, blob. Binary data defining the model, if binary data was used to load the model.

  • models.*. Arbitrary parameters describing the model. They are later provided in the score.

  • *. Pass-through parameters other than models.

The status port transparently replicates unrecognized fields from the control port.
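
The shape of a status tuple can be illustrated with the following Java sketch, which builds a response from a control tuple and a set of load errors. It is an illustration only: tuples are modeled as maps, the names StatusSketch, buildStatus, and loadErrors are hypothetical, and the message texts are invented for the example. Only the success and failure status values come from the schema above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of building a status output tuple.
final class StatusSketch {
    /** loadErrors maps a model name to the error message of its failed load. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> buildStatus(Map<String, Object> control,
                                           Map<String, String> loadErrors) {
        List<Map<String, Object>> reported = new ArrayList<>();
        boolean allLoaded = true;
        for (Map<String, Object> model : (List<Map<String, Object>>) control.get("models")) {
            String error = loadErrors.get((String) model.get("modelName"));
            Map<String, Object> entry = new LinkedHashMap<>(model); // keeps models.* parameters
            entry.put("status", error == null ? "success" : "failure");
            entry.put("message", error == null ? "loaded" : error); // message text is invented
            reported.add(entry);
            allLoaded &= (error == null);
        }
        Map<String, Object> status = new LinkedHashMap<>(control); // pass-through fields
        status.put("status", allLoaded ? "success" : "failure");
        status.put("message", allLoaded ? "all models deployed" : "deployment rejected");
        status.put("models", reported);
        return status;
    }

    public static void main(String[] args) {
        Map<String, Object> control = new LinkedHashMap<>();
        control.put("models", List.of(
            Map.of("modelName", "audit", "modelUrl", "file:audit.pmml")));
        control.put("requestId", 7);
        System.out.println(buildStatus(control, Map.of()));
    }
}
```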