Using the Python Operator

Introduction

  The Python operator allows you to execute any valid Python code within a StreamBase module. The Python operator and its companion Python Instance operator allow Python-centric teams to reuse existing Python code in an event processing context without requiring major rewrites to that code. The Python operators allow the execution of Python-based statistical modeling, data science processing, and machine learning produced with Python packages such as SciPy and TensorFlow.

The Python operators do not turn the StreamBase EventFlow language into a Python interpreter. Instead, Python stateful sessions managed by an external Python interpreter are attached as child processes to EventFlow modules. The Python operators interact with these sessions by setting input variables, executing the script, and reading output variables, potentially emitting Python results as fields of StreamBase tuples. The Python operator guarantees that these three Python management operations are executed sequentially, even if there are multiple operator instances touching the same session, or if the operator is running in asynchronous mode.
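The three-step cycle described above can be pictured with a small, self-contained sketch. It is only a conceptual model: the class name SessionSketch and its run method are hypothetical, an in-process dictionary stands in for the external interpreter session, and a lock stands in for whatever mechanism the operator actually uses to keep the steps sequential.

```python
import threading


class SessionSketch(object):
    """Toy stand-in for one stateful Python session (not the real operator)."""

    def __init__(self):
        self._namespace = {}           # session variables persist between calls
        self._lock = threading.Lock()  # keeps the three steps together

    def run(self, input_vars, script, output_names):
        with self._lock:
            # 1. Set input variables in the session.
            self._namespace.update(input_vars)
            # 2. Execute the script against the session's namespace.
            exec(script, self._namespace)
            # 3. Read the requested output variables back.
            return {name: self._namespace.get(name) for name in output_names}


script = "total = globals().get('total', 0) + x"
session = SessionSketch()
first = session.run({"x": 2}, script, ["total"])
second = session.run({"x": 5}, script, ["total"])
print(first, second)
```

Because the namespace outlives each call, the second call sees the total left behind by the first, which is the sense in which the session is stateful.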

The Python operators support most Python interpreters compliant with Python 2.7 or 3.x. The Python script provided to a particular Python operator must be compliant with the Python interpreter configured for the containing EventFlow application to use. This means that the Python script can only use the libraries and language structures available to the configured Python interpreter. The operator treats the script code as opaque and does not attempt to parse or compile it before sending it to the interpreter. Thus, all the power of the configured interpreter's Python libraries (or Java classes for Jython, or .NET access for IronPython) is accessible from the script.

Python and Python Instance Operators

Use the Python operator to execute Python code and to optionally emit the results of that execution.

By contrast, you use the Python Instance operator to specify and configure the name and features of a constrained local instance execution environment for the set of Python operators that you configure to use that name and environment.

By default, Python operators execute in the global execution environment of the containing EventFlow module. However, it is possible to use EventFlow concurrency features to run multiple copies of a single module that contains Python operators. In this case, you can use a Python Instance operator to define and name a local execution environment for the Python operators in that concurrent module. In this way, the execution of Python in one concurrent instance of that module does not interfere with other instances.
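As a rough illustration of why separate execution environments matter, the sketch below runs the same script against two independent namespaces. The names instance_a and instance_b are hypothetical, and plain dictionaries stand in for the sessions that the real operators manage in an external interpreter.

```python
# The same script, as it might be configured in two concurrent module copies.
script = "counter = globals().get('counter', 0) + 1"

instance_a = {}  # hypothetical local environment for module copy A
instance_b = {}  # hypothetical local environment for module copy B

exec(script, instance_a)
exec(script, instance_a)
exec(script, instance_b)

# Each environment keeps its own counter; neither sees the other's state.
print(instance_a["counter"], instance_b["counter"])
```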

You can also configure a Python Instance operator with a control port that lets you enable and disable at runtime the execution of Python operators configured to use that Python instance.

Python Compatibility

The Python operator integration layer uses a minimal set of constructs that are common to Python 2.7 and Python 3.x. It requires only the pickle library and TCP/IP networking.
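To check that a candidate interpreter can exchange pickled data across the 2.7 and 3.x boundary, a round trip like the following may help. It is a sketch only: the payload is made up, and the operator's actual wire format is not described here. Pickle protocol 2 is chosen because it is the highest protocol that Python 2.7 supports and Python 3 can still read and write it.

```python
import pickle

# Hypothetical payload resembling a set of variables.
payload = {"symbol": "ABC", "price": 101.5, "sizes": [100, 200]}

# Protocol 2 is readable by both Python 2.7 and Python 3.x.
wire_bytes = pickle.dumps(payload, protocol=2)
restored = pickle.loads(wire_bytes)

print(restored == payload)
```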

The following table provides a summary of the Python interpreters supported by the Python operator. See Python Versions Supported for detailed information.

Runtime      Version
Python 2     2.7.x
Python 3     3.x.x (but the TensorFlow sample is limited to 3.4, 3.5, or 3.6)
PyPy         5.x
Jython       2.7.0
IronPython   2.7.7

Data Type Conversion

The Python operator's input tuple can include a single field of type tuple whose name is inputVars. The type of data passed to the Python script from the inputVars tuple is inferred from the data type of each field.

The operator optionally emits a field of type tuple named outputVars. When you define the data type for the outputVars tuple fields, the operator makes its best effort to cast the Python objects to StreamBase data types. The following table summarizes the data type conversions.

StreamBase type   To Python                                                    From Python
boolean           truth                                                        truth, int, float
int               int                                                          truth, int, float
double            float                                                        truth, int, float
string            unicode (Python2), str (Python3)                             str, bytes, bytearray, unicode (Python2)
timestamp         datetime.datetime (absolute), datetime.timedelta (interval)  datetime.datetime, datetime.date, datetime.time (absolute), datetime.timedelta (interval)
blob              bytes                                                        bytes, bytearray
list              list                                                         list, tuple, array.array, materialized generator (list)
tuple             dict                                                         dict
capture           unsupported                                                  unsupported
function          unsupported                                                  unsupported
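The "from Python" side of the table can be read as a set of best-effort casts. The sketch below mimics two of them, for double and list fields. The helper names to_double and to_list are hypothetical, and the operator's real conversion rules may differ in detail.

```python
import array
import types


def to_double(value):
    """Cast for a double field: accepts truth values, ints, and floats."""
    # bool is a subclass of int, so it is covered by the int check.
    if isinstance(value, (int, float)):
        return float(value)
    raise TypeError("cannot cast %r to double" % (value,))


def to_list(value):
    """Cast for a list field: materializes tuples, arrays, and generators."""
    if isinstance(value, (list, tuple, array.array, types.GeneratorType)):
        return list(value)
    raise TypeError("cannot cast %r to list" % (value,))


print(to_double(True), to_double(3))
print(to_list((1, 2)), to_list(x * x for x in range(3)))
print(to_list(array.array("d", [1.5])))
```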

Configuring a Global Python Instance

The global Python instance is the default, module-scoped execution environment in which all Python operators in a module perform their operations. The global Python instance environment is configured in the adapter's configuration file, src/main/resources/adapter-configurations.xml.

By contrast, the Python Instance operator is configured using the operator's Properties view on the EventFlow canvas.

Use the adapter-configurations.xml file to define the parameters of the module's global Python environment. The following table describes the valid child elements for the <adapter-configuration name="python"> element of this file.

Property Type Default Description
instance string   Provides an arbitrary name for the global instance. This name is displayed in the drop-down list for the Global Instance ID control in the Properties view for both Python and Python Instance operators. This name links the two operators together when using a Python Instance operator.
executable string python Specifies the full path to the Python executable. If not specified, the operator invokes the command python, which is assumed to be on the PATH. This default Python command is not likely to be the same if you develop on Windows or Mac and deploy on Linux, so be sure to specify the exact path to the Python interpreter you intend to use for each platform. See Python Versions Supported below for example paths.
useTempFile boolean false This flag indicates that the operator's integration layer should create a temporary file with Python code wrapping the interactions with StreamBase, instead of pushing it through standard input. Using standard input works for most Python interpreters and is the default.

Note

You must set this property to true when using IronPython.

captureOutput boolean false Modifies the stdout and stderr behavior of the operator. By default, both are chained to the parent process's stdout and stderr. For running tests that include output, you can set this flag to capture the output.
workingDir string . Specifies the working directory for the launched process. When not specified, the process starts in the same directory as the parent StreamBase process.
envVariables section/ setting   Specifies environment variables to be passed before launching the Python interpreter, potentially overriding variables in the platform's environment. You can use more than one <setting> line.
arguments section/ setting   Specifies arguments to be passed to the Python interpreter (not the Python script), and can be defined multiple times. The usual use for this property is to pass -u, which forces Python to use unbuffered stdin, stdout, and stderr streams. Consult your Python interpreter's documentation for information on launch parameters.

The following shows an example adapter-configurations.xml file for a global instance named pythonic that uses Python 2.7 on Windows.

<adapter-configurations>
  <adapter-configuration name="python">
    <section name="python">
      <setting name="instance" val="pythonic"/>
      <setting name="executable" val="C:/Python27/python.exe"/>   
      <setting name="useTempFile" val="false"/>
      <setting name="captureOutput" val="false"/>
      <setting name="workingDir" val="."/>
      <section name="envVariables">
        <setting name="CUDA_VISIBLE_DEVICES" val="0,1"/>
      </section>
      <section name="arguments">
        <setting val="-u"/>
      </section>
    </section>
  </adapter-configuration>
</adapter-configurations>

Python Operator Properties

This section describes the properties you can set for the Python operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this required field to specify or change the component's name, which must be unique in the current module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow fragment. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an epadmin container resume command (or its sbadmin equivalent), or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Property Type Description
Instance Type radio button Select Global to specify that this operator executes in the global execution environment of the containing EventFlow module, which is defined in the module's configuration file.

Select Local to specify that this operator executes in the constrained environment defined by a Python Instance operator present in this EventFlow module.

Local Instance ID text Only active when Instance Type specifies Local. In this case, enter the canvas name of a Python Instance operator in this EventFlow module that defines a local Python execution environment.
Global Instance ID drop-down list Only active when Instance Type specifies Global. In this case, the global Python execution environment is defined in this module's configuration file. Use the drop-down control to select the name of a global instance defined in configuration.
Asynchronous check box When selected, the operator executes the Python script using a non-blocking call. This allows long operations to be executed without suspending the processing of the containing module. Make sure module invariants are preserved around the call. Note that, in contrast to the concurrent parallel execution feature of StreamBase, this operator does not allocate additional threads and uses lightweight job scheduling.
Log Level drop-down list Controls the level of verbosity the adapter uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.

Script Tab

Property Type Description
Enable control port check-box When selected, enables the control port for this operator. (The available command is "Load"; fill in the Script field.)
Script source radio button Select File to specify that the executable script comes from a Python file.

Select Script text to use the text entered below as the executable script.

Script file resource-chooser Choose a valid Python file to load.
Script multiline text Specifies a script of known working Python code to be executed for each incoming tuple.
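A script for the Script field might look like the following sketch. The variable names prices, tuples_seen, and mean_price are hypothetical: prices is assumed to arrive through an inputVars field of the same name, and mean_price and tuples_seen are assumed to be declared in the output variables schema. The try/except fallbacks exist only so that the sketch also runs on its own, outside the operator.

```python
# Hypothetical input variable. Inside the operator it would be set from
# an inputVars field named "prices" before the script runs.
try:
    prices
except NameError:
    prices = [10.0, 10.5, 11.0]

# Session state survives between tuples, so initialize it only once.
try:
    tuples_seen
except NameError:
    tuples_seen = 0

# Work done for each incoming tuple.
tuples_seen += 1
mean_price = sum(prices) / float(len(prices))

print(tuples_seen, mean_price)
```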

Output Tab

Property Type Description
Output variables schema definition Definition for the expected output variables. Each field defined for the schema corresponds to a Python session variable expected to be stored by this operator's script, or any previous call. The output variables must be of a type castable to a StreamBase field type, as shown in Data Type Conversion above.

AMS Tab

Use the AMS tab to specify the artifacts that should be pulled from a running TIBCO Artifact Management Server, which is a separately installed product.

Note

If you deploy an artifact from the AMS system, it first checks your list of artifacts to match the specified path. If matched, AMS uses the specified model name. If the path is not matched, then the artifact's filename is used without the file extension as the model name. For example, sample/audit.rds would resolve to a model name of audit.
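The fallback naming rule in the note above amounts to taking the file name and dropping its extension. A minimal sketch, using a hypothetical helper named default_model_name, and covering only the fallback case where the path is not matched:

```python
import posixpath


def default_model_name(artifact_path):
    """Return the artifact's file name without its extension."""
    filename = posixpath.basename(artifact_path)
    return posixpath.splitext(filename)[0]


model_name = default_model_name("sample/audit.rds")
print(model_name)
```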

Property Type Description
Required On Startup check-box When enabled, the specified artifacts are requested from AMS at initialization of the EventFlow module that contains this Python operator, and the system waits until all artifacts are loaded.
Artifact's Name String The first value of the path is the project name followed by the full path to the artifact. For example: project/path1/path2/artifact
Artifact's Version int If version is not specified, then the latest version is assumed.

Input and Output Port

The input port transparently accepts any incoming tuple, which can optionally contain a field of type tuple with the reserved name inputVars. The outgoing tuple contains a field of type tuple named outputVars.

  • inputVars — optional tuple containing variables to be set in the Python session.

  • outputVars — tuple whose structure is defined in the Output tab of the Properties view, containing variables read from the Python session.

All other incoming fields are transparently passed. The inputVars field is not propagated to the output. The outputVars field is not allowed in the input port.
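The field-handling rules above can be summarized in a few lines. In this sketch, tuples are modeled as dictionaries and compose_output is a hypothetical helper, not part of any StreamBase API.

```python
def compose_output(input_tuple, output_vars):
    """Build the outgoing tuple from the incoming tuple and the script results."""
    # outputVars is reserved for the operator and must not arrive as input.
    if "outputVars" in input_tuple:
        raise ValueError("outputVars is not allowed in the input port")
    # Pass every field through except inputVars, which is consumed here.
    outgoing = {
        name: value
        for name, value in input_tuple.items()
        if name != "inputVars"
    }
    outgoing["outputVars"] = output_vars
    return outgoing


incoming = {"symbol": "ABC", "inputVars": {"x": 2}}
outgoing = compose_output(incoming, {"total": 2})
print(outgoing)
```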

Python Versions Supported

This section describes the Python interpreters supported by the StreamBase Python operators. Python versions are described as of March, 2018.

In general:

  • Your system can have as many Python versions, from as many vendors, as you require, as long as the names or paths of the Python executables are different. Thus, a single system can have python, python3, pypy, and pypy3 installed at the same time, if necessary.

  • The first Python found in your shell's PATH is the version that runs when you call Python at the shell prompt.

  • However, the StreamBase Python operators use the Python version specified in the adapter's configuration file, src/main/resources/adapter-configurations.xml.

  • The Python operator only calls a Python version named python on the system PATH as a fallback, if there is no Python version specified in configuration.

  • For Windows, use forward slashes in path names.
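One way to find the exact path to enter in the executable setting is to ask the interpreter itself. The short script below is a sketch of that idea: run it with the interpreter you intend to use and copy the path it prints.

```python
import sys

# Normalize to forward slashes, as recommended above for Windows paths.
executable_path = (sys.executable or "").replace("\\", "/")
version = "%d.%d.%d" % sys.version_info[:3]

print("%s (Python %s)" % (executable_path, version))
```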

Windows: Python.org

Download Python 2.7 or Python 3 from python.org and run the installer provided. Python.org provides:

Python Command Version Installation path to specify in configuration files
python 2.7.14 C:/Python27/python.exe
python3 3.6.4 C:/Users/sbuser/AppData/Local/Programs/Python/Python36/python.exe

(Replace sbuser in the Python 3 path with your Windows login name.)

Pythons from python.org come with a minimal set of included Python packages. This means there is less to update than with ActiveState, and you are free to install only the packages you need. Both Python versions include pip and/or pip3 commands.

Windows: ActiveState

Download Python 2.7 or Python 3 from activestate.com. ActiveState includes the pip and/or pip3 commands, and a large collection of Python packages with their Python downloads, including many data science packages. At this writing, ActiveState encourages the use of their Python 3.5.4 edition for the best data science support. They do provide a Python 3.6 installer, but it does not include the full set of packages provided by their 3.5 installer.

Python Command Version Installation path to specify in configuration files
python 2.7.14 C:/Python27/python.exe (or python27.exe)
python3 3.5.4 C:/Python35/python.exe (or python35.exe)
python3 3.6.0 C:/Python36/python.exe (or python36.exe)

Because ActiveState bundles data science packages with their installer, including TensorFlow and OpenCV, you must update those packages to the latest release after installing ActiveState Python 3.5. Use commands like the following:

pip3 install --upgrade tensorflow
pip3 install --upgrade opencv-python

Windows: Pypy

Pypy is an alternative Python implementation that claims a significant speed advantage over CPython-based Python implementations. Download either a Python 2.7 or 3.x equivalent from pypy.org. Pypy provides:

Python Command   Version   Python Equivalent   Installation path to specify in configuration files
pypy             5.9.0     2.7.13              No default path (see below)
pypy3            5.10.1    3.5.3               No default path (see below)

Both are delivered as zip files with no default installation path. Unzip the contents and arrange PATH and PYTHONPATH as you require. For example:

  • C:/Pypy2/pypy.exe

  • C:/Pypy3/pypy3.exe

Pypy has its own equivalents of the pip and pip3 commands, but installing them and using them to download packages to Pypy's site-packages directory can be daunting.

MacOS: Default and Homebrew

MacOS ships with Python 2.7.10 as of macOS Sierra 10.12.6 and High Sierra 10.13.4. You can install pip for the macOS Python 2.7 with the following command:

sudo easy_install pip

You can obtain Python 3.x from Homebrew; at this writing, the version is 3.6.4.

Python Command Version Installation path to specify in configuration files
python 2.7.10 /usr/bin/python
python3 3.6.4 /usr/local/bin/python3

After March, 2018, Homebrew's Python 3 includes the pip3 command. If you installed an earlier version of Homebrew's Python 3 but don't have pip3, run these commands:

brew update
brew upgrade
brew postinstall python3

You can also download and install Python 3 manually from python.org.

MacOS versions since 10.11 El Capitan have included a feature called System Integrity Protection, or SIP. SIP restricts the ability of processes to replace, upgrade, or overwrite commands that ship with macOS, and that includes /usr/bin/python. SIP also makes it difficult to use pip to install or upgrade certain packages already in place as part of the macOS-provided Python 2.7.10 installation.

This feature can prevent you from installing complex data science packages such as TensorFlow, which has many dependencies. The best workarounds are either to use Homebrew's Python 3 with TensorFlow, or to install Homebrew's python2 package to bypass the Python 2 installation shipped with macOS.

MacOS: Pypy from Homebrew

Homebrew makes Pypy available for macOS in two releases:

Python Command Version Python Equivalent Installation path to specify in configuration files
pypy 5.10.0 2.7.13 /usr/local/bin/pypy
pypy3 5.10.1 3.5.3 /usr/local/bin/pypy3

Each Pypy version from Homebrew installs with its own pip_pypy or pip_pypy3 command.

To use Pypy on macOS, make sure you have configured your locale setting correctly, either in the shell environment inherited by Studio, or explicitly set in the Environment tab of the Run Configuration for any EventFlow module that includes a Python operator. In particular, you may need to set the LANG environment variable equal to en_US.UTF-8 (or to the equivalent setting for your locale). See locale(1) and the LANG environment variable in your platform's reference documentation.
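To see what locale settings an interpreter actually inherits, a quick check like the following can be run with the same interpreter and environment. It is a diagnostic sketch only, and the UTF-8 hint it prints is an assumption based on the en_US.UTF-8 example above.

```python
import locale
import os

lang = os.environ.get("LANG")  # None when the variable is not set
encoding = locale.getpreferredencoding()

print("LANG=%s, preferred encoding=%s" % (lang, encoding))

# Assumption: a UTF-8 locale is what the setting above is meant to achieve.
if encoding.upper().replace("-", "") != "UTF8":
    print("Consider setting LANG to a UTF-8 locale such as en_US.UTF-8")
```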

Linux: Python from Yum

This section applies both to RHEL 7 and CentOS 7.

There is no default installation configuration for RHEL or CentOS, and the packages you get depend on the installation options you choose at installation time. But even after choosing the Development and Creative Workstation and Python installation options, you still end up with only Python 2.7.5 and no pip command.

Run the following commands on CentOS 7 to install Python 3.4.5 and both pip and pip3 commands:

sudo yum install epel-release
sudo yum install python34 python34-devel python34-setuptools
cd /usr/lib/python3.4/site-packages
sudo -H python3 easy_install.py pip

On RHEL 7, replace the first line above with the following line, assuming your site policies allow you to install from the epel repository:

sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Next, upgrade the pip and pip3 commands to their latest versions with:

sudo -H pip3 install --upgrade pip
sudo -H pip install --upgrade pip

These commands install the following versions:

Python Command Version Installation path to specify in configuration files
python 2.7.5 /usr/bin/python
python3 3.4.5 /usr/bin/python3

You can also download and install Python 3 manually from python.org.

Linux: Pypy

On either RHEL 7 or CentOS 7, the following command installs the Python 2 compatible release of Pypy:

sudo yum install pypy

At this writing, no Python 3 compatible release of Pypy is available in the RPM databases used by the yum command on RHEL or CentOS 7. You can install pypy3 manually from the “portable Linux binaries” link on the Pypy.org Downloads page.

This provides you with:

Python Command Version Python Equivalent Installation path to specify in configuration files
pypy 5.0.1 2.7.10 /usr/bin/pypy
pypy3 5.10.1 3.5.3 Delivered as a tar.bz2 file with no default installation path. Untar the contents and arrange PATH and PYTHONPATH as you require. For example:
  • /usr/local/pypy3/bin/pypy3