Using the Python Operator

Introduction

The Python operator allows you to execute any valid Python code within a StreamBase module. The Python operator and its companion Python Instance operator allow Python-centric teams to reuse existing Python code in an event processing context without requiring major rewrites to that code. The Python operators allow the execution of Python-based statistical modeling, data science processing, and machine learning produced with Python packages such as SciPy and TensorFlow.

The Python operators do not turn the StreamBase EventFlow language into a Python interpreter. Instead, Python stateful sessions managed by an external Python interpreter are attached as child processes to EventFlow modules. The Python operators interact with these sessions by setting input variables, executing the script, and reading output variables, potentially emitting Python results as fields of StreamBase tuples. The Python operator guarantees that these three Python management operations are executed sequentially, even if there are multiple operator instances touching the same session, or if the operator is running in asynchronous mode.
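The following toy model illustrates the set/execute/read cycle described above. It is not the operator's actual integration layer: the class name ToySession, its run method, and the use of exec plus a threading.Lock to serialize the three steps are all assumptions made for illustration.

```python
import threading


class ToySession:
    """Toy stand-in for a stateful Python session (illustration only)."""

    def __init__(self):
        self.namespace = {}            # session variables persist across runs
        self._lock = threading.Lock()  # serializes the three steps per call

    def run(self, script, input_vars, output_names):
        # Hold the lock for the whole cycle so that concurrent callers
        # sharing this session cannot interleave their steps.
        with self._lock:
            self.namespace.update(input_vars)       # 1. set input variables
            exec(script, self.namespace)            # 2. execute the script
            return {name: self.namespace.get(name)  # 3. read output variables
                    for name in output_names}


session = ToySession()
script = "total = total + x if 'total' in globals() else x"
print(session.run(script, {"x": 2}, ["total"]))  # {'total': 2}
print(session.run(script, {"x": 3}, ["total"]))  # {'total': 5}
```

Because the namespace outlives each call, the second run sees the total stored by the first, which mirrors the "stateful session" behavior the operator relies on.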

The Python operators support several Python interpreters compliant with Python 2.7 or 3.4+. The Python script provided to a particular Python operator must be compliant with the Python interpreter configured for the containing EventFlow application to use. This means that the Python script can only use the libraries and language structures available to the configured Python interpreter. The operator treats the script code as opaque and does not attempt to parse or compile it before sending it to the interpreter. Thus, all the power of the configured interpreter's Python libraries is accessible from the script.

StreamBase includes a set of samples of using the Python and Python Instance operators, described in Python Operator Samples.

Python and Python Instance Operators

Use the Python operator to execute Python code and to optionally emit the results of that execution.

By contrast, use the Python Instance operator to define a named, constrained local execution environment and configure its features. Python operators that you configure to use that name execute in that environment.

By default, Python operators execute in the global execution environment of the containing EventFlow module. However, it is possible to use EventFlow concurrency features to run multiple copies of a single module that contains Python operators. In this case, you can use a Python Instance operator to define and name a local execution environment for the Python operators in that concurrent module. In this way, the execution of Python in one concurrent instance of that module does not interfere with other instances.

You can also configure a Python Instance operator with a control port that lets you enable and disable, at runtime, the execution of the Python operators configured to use that Python instance.

Python Compatibility

The Python operator integration layer relies on a minimal set of features common to Python 2.7 and Python 3.4+: the pickle library and TCP/IP networking.
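The operator's wire format is not described here. As an illustration of why pickle plus a socket is enough to bridge Python 2.7 and 3.x, the sketch below assumes variables are exchanged as a pickled dict framed with a 4-byte length prefix, using pickle protocol 2, the highest protocol Python 2.7 can read. The framing, the function names send_vars and recv_vars, and the protocol choice are assumptions, not the operator's implementation.

```python
import pickle
import socket
import struct

PROTOCOL = 2  # highest pickle protocol that Python 2.7 can read


def send_vars(sock, variables):
    """Pickle a dict of variables and send it with a 4-byte length prefix."""
    payload = pickle.dumps(variables, protocol=PROTOCOL)
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise if it closes early."""
    chunks = []
    while n:
        chunk = sock.recv(n)
        if not chunk:
            raise EOFError("socket closed mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_vars(sock):
    """Receive one length-prefixed pickled dict (only from trusted peers)."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return pickle.loads(_recv_exact(sock, length))


# Demonstrate over a connected local socket pair.
a, b = socket.socketpair()
send_vars(a, {"x": 1.5, "name": "sensor-7"})
print(recv_vars(b))  # {'x': 1.5, 'name': 'sensor-7'}
a.close()
b.close()
```

Unpickling executes constructors named in the stream, so a design like this is only safe between processes that trust each other, such as a parent engine and the child interpreter it launched.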

The following table provides a summary of the Python interpreters supported by the Python operator. See Python Versions Supported for detailed information for each operating system.

Runtime | Equivalent to
Python 2 | Python 2.7.x
Python 3 | Python 3.4 or newer
PyPy 5.x | Python 2.7
PyPy 7.x | Python 2.7, 3.5, or 3.6
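To check which of these runtimes a configured instance actually launched, a script along these lines could be run by a Python operator. The output variable name runtimeInfo is hypothetical and would need a matching string field in the operator's Output tab; the code sticks to %-formatting so that it runs under both Python 2.7 and 3.x.

```python
import platform
import sys


def describe_runtime():
    """Return a short description of the interpreter running this code."""
    impl = platform.python_implementation()   # e.g. 'CPython' or 'PyPy'
    lang = "%d.%d.%d" % sys.version_info[:3]  # language-level version
    if impl == "PyPy":
        # sys.pypy_version_info only exists on PyPy
        pypy = "%d.%d.%d" % sys.pypy_version_info[:3]
        return "PyPy %s (Python %s)" % (pypy, lang)
    return "%s %s" % (impl, lang)


runtimeInfo = describe_runtime()  # hypothetical output variable
print(runtimeInfo)
```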

Data Type Conversion

The Python operator's input tuple can include a single field of type tuple whose name is inputVars. The type of data passed to the Python script from the inputVars tuple is inferred from the data type of each field.

The operator optionally emits a field of type tuple named outputVars. When you define the data type for the outputVars tuple fields, the operator makes its best effort to cast the Python objects to StreamBase data types. The following table summarizes the data type conversions.

StreamBase type | To Python | From Python
boolean | bool (truth value) | bool (truth value), int, float
int | int | bool (truth value), int, float
double | float | bool (truth value), int, float
string | unicode (Python 2), str (Python 3) | str, bytes, bytearray, unicode (Python 2)
timestamp | datetime.datetime (absolute), datetime.timedelta (interval) | datetime.datetime, datetime.date, datetime.time (absolute), datetime.timedelta (interval)
blob | bytes | bytes, bytearray
list | list | list, tuple, array.array, generator (materialized as a list)
tuple | dict | dict
capture | not supported | not supported
function | not supported | not supported
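When testing your own scripts, it can help to encode the "From Python" column as a lookup. The sketch below does that for Python 3 only (the Python 2 unicode entries are omitted). The names ACCEPTED and castable are hypothetical; the operator performs its own casting, and a True result only means the value's type appears in the table, not that a particular value (for example, an int too large for a StreamBase int) converts successfully.

```python
import array
import datetime
import types

# Python types accepted for each StreamBase output field type, per the
# "From Python" column of the table above (Python 3 only).
# bool is listed explicitly for clarity; it is also a subclass of int.
ACCEPTED = {
    "boolean": (bool, int, float),
    "int": (bool, int, float),
    "double": (bool, int, float),
    "string": (str, bytes, bytearray),
    "timestamp": (datetime.datetime, datetime.date, datetime.time,
                  datetime.timedelta),
    "blob": (bytes, bytearray),
    "list": (list, tuple, array.array, types.GeneratorType),
    "tuple": (dict,),
}


def castable(sb_type, value):
    """Return True if the value's Python type is listed for sb_type."""
    accepted = ACCEPTED.get(sb_type)  # capture and function: not supported
    return accepted is not None and isinstance(value, accepted)


print(castable("double", 3))                     # True
print(castable("blob", "text"))                  # False
print(castable("list", (n for n in range(3))))   # True
print(castable("function", len))                 # False
```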

Configuring a Global Python Instance

The global Python instance is the default, module-scoped execution environment in which all Python operators in a module perform their operations. The global Python instance environment is configured in the module's configuration file, such as sbd.sbconf.

By contrast, the Python Instance operator is configured using the operator's Properties view on the EventFlow canvas.

Use the <adapter-configuration> element of the sbconf file to define the parameters of the module's Python environment. The following table describes the valid child elements for the <adapter-configuration> element.

Property | Type | Default | Description
instance | string | (none) | Provides an arbitrary name for the global instance. This name appears in the drop-down list for the Global Instance ID control in the Properties view of both Python and Python Instance operators, and links the two operators together when you use a Python Instance operator.
executable | string | python | Specifies the full path to the Python executable. Use forward-slash path separators, even on Windows. If not specified, the operator invokes the command python, which is assumed to be on the PATH. This default command is likely to resolve to a different interpreter if you develop on Windows or Mac and deploy on Linux, so specify the exact path to the intended Python interpreter for each platform. See Python Versions Supported below for example paths.
useTempFile | boolean | false | When true, the operator's integration layer writes the Python code that wraps its interactions with StreamBase to a temporary file, instead of pushing that code through standard input. Standard input, the default, works for most Python interpreters.
captureOutput | boolean | false | Modifies the stdout and stderr behavior of the operator. By default, both are chained to the parent process's stdout and stderr. Set this flag to capture the output, for example when running tests that include output.
workingDir | string | . | Specifies the working directory for the launched process. When not specified, the process starts in the same directory as the parent StreamBase process.
envVariables | section of <setting> elements | (none) | Specifies environment variables to set before launching the Python interpreter, potentially overriding variables in the platform's environment. You can use more than one <setting> line.
arguments | section of <setting> elements | (none) | Specifies arguments to pass to the Python interpreter (not to the Python script), and can be defined multiple times. The usual use for this property is to pass -u, which forces Python to use unbuffered stdin, stdout, and stderr streams. Consult the command-line documentation for your Python interpreter for information on launch parameters.

The following shows an example <adapter-configuration> element for a global instance named pythonic that uses Python 2.7 on Windows.

<adapter-configurations>
  <adapter-configuration name="python">
    <section name="python">
      <setting name="instance" val="pythonic"/>
      <setting name="executable" val="C:/Python27/python.exe"/>   
      <setting name="useTempFile" val="false"/>
      <setting name="captureOutput" val="false"/>
      <setting name="workingDir" val="."/>
      <section name="envVariables">
        <setting name="CUDA_VISIBLE_DEVICES" val="0,1"/>
      </section>
      <section name="arguments">
        <setting val="-u"/>
      </section>
    </section>
  </adapter-configuration>
</adapter-configurations>
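To confirm that a global instance launched with the intended settings, you might run a short diagnostic script in a Python operator. The sketch below assumes an output schema with three string fields named pyExecutable, workingDir, and cudaDevices; these names are hypothetical. It reads only standard-library values: sys.executable should match the executable setting, os.getcwd() reflects workingDir, and CUDA_VISIBLE_DEVICES comes from the envVariables section of the example above.

```python
import os
import sys

# Hypothetical output variables; each needs a matching string field
# in the operator's Output tab to be emitted.
pyExecutable = sys.executable                              # interpreter actually launched
workingDir = os.getcwd()                                   # effect of the workingDir setting
cudaDevices = os.environ.get("CUDA_VISIBLE_DEVICES", "")   # set via envVariables
```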

Python Operator Properties

This section describes the properties you can set for the Python operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow module. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager.

Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Instance Type (radio button): Select Global to specify that this operator executes in the global execution environment of the containing EventFlow module, which is defined in the module's configuration file. Select Local to specify that this operator executes in the constrained environment defined by a Python Instance operator present in this EventFlow module.

Local Instance ID (text): Active only when Instance Type is set to Local. Enter the canvas name of a Python Instance operator in this EventFlow module that defines a local Python execution environment.

Global Instance ID (drop-down list): Active only when Instance Type is set to Global, in which case the global Python execution environment is defined in this module's configuration file. Use the drop-down control to select the name of a global instance defined in that configuration.

Asynchronous (check box): When selected, the operator executes the Python script using a non-blocking call. This allows long operations to run without suspending the processing of the containing module; make sure module invariants are preserved around the call. Unlike the concurrent parallel execution feature of StreamBase, this operator does not allocate additional threads; it uses lightweight job scheduling.

Log Level (default INFO): Controls the verbosity of the informational traces the adapter writes to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are OFF, ERROR, WARN, INFO, DEBUG, and TRACE.

Script Tab

Enable control port (check box): When selected, the Python operator gains a control port on the canvas. Connect to this port an Input Stream whose schema contains two string fields named Command and Script. With the control port in place, you can upload a new Python script to a running Python operator, replacing the script loaded at startup from either the File or Script text option described next.

To use the control port, send a tuple whose Command field contains Load, the only supported command, and whose Script field contains an entire Python script, with line breaks and indentation intact.

When testing the control port with the Manual Input view in StreamBase Studio, you can copy a script and paste it into the Script field. The field appears to accept only one line, but it actually accepts the entire script. Use the Up Arrow and Down Arrow keys (or Ctrl+Up Arrow and Ctrl+Down Arrow) to view the separate lines of the pasted script.

Script source (radio button): Select File to load the initial executable script from the specified Python file. Select Script text to use the code in the Script field as the initial executable script.

Script file (resource chooser): Choose a valid Python file to load. Use the Choose button to browse the current module's Resource Search Path for Python files; the browse dialog opens in the project's src/main/resources folder.

Script (multiline text): Specifies known working Python code to execute for each incoming tuple.

Output Tab

Output variables (schema definition): Defines the expected output variables. Each field in the schema corresponds to a Python session variable that this operator's script, or any previous call, is expected to store. Each output variable must be of a type castable to a StreamBase field type, as shown in Data Type Conversion above.
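Because session variables persist between calls, a script can keep running state across tuples. The sketch below maintains a running average. It assumes an inputVars field named price (double) and output fields count (int) and avgPrice (double), all hypothetical names. Only the SCRIPT text would go in the Script field; the loop at the bottom merely simulates the operator by re-running the script in one shared namespace.

```python
# Contents of the Script field (or script file):
SCRIPT = """
try:
    count             # does the session already hold state?
except NameError:
    count = 0         # first call: initialize session state
    total = 0.0
count += 1
total += price        # price is set from inputVars before each run
avgPrice = total / count
"""

# Simulation only: re-run the script in one namespace, as a session would.
session = {}
for p in (10.0, 20.0, 30.0):
    session["price"] = p
    exec(SCRIPT, session)
    print(session["count"], session["avgPrice"])
# 1 10.0
# 2 15.0
# 3 20.0
```

The try/except NameError guard is one way to initialize state on the first call without a separate setup step; any previous call that stored count and total would be picked up instead.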

Input and Output Port

The input port transparently accepts any incoming tuple, which can optionally contain a field of type tuple with the reserved name inputVars. The outgoing tuple contains a field of type tuple named outputVars.

  • inputVars — optional tuple containing variables to be set in the Python session.

  • outputVars — tuple whose structure is defined in the Output tab of the Properties view, containing variables read from the Python session.

All other incoming fields are transparently passed. The inputVars field is not propagated to the output. The outputVars field is not allowed in the input port.

Python Versions Supported

This section describes the Python interpreters tested for use with the StreamBase Python operators. Python versions are described as of April 2019.

In general:

  • Your system can have as many Python versions, from as many vendors, as you require, as long as each Python executable has a different name or path. Thus, a single system can have python, python3, pypy, and pypy3 installed at the same time, if necessary.

  • When you run a Python command at the shell prompt, the shell uses the first matching executable found on your PATH.

  • However, the StreamBase Python operators use the Python version specified in the adapter's configuration files, src/main/resources/adapter-configurations.xml or src/main/configurations/sbengine.conf.

  • The Python operator falls back to calling the executable named python on the system PATH only if no Python version is specified in configuration.

  • For Windows, use forward slashes in path names.
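The lookup rule in the list above can be modeled in a few lines, which is handy when reasoning about which interpreter will run on a given machine. The function resolve_python is hypothetical, and the operator's own lookup may differ in details, such as how it handles a configured path that does not exist.

```python
import shutil


def resolve_python(configured=None):
    """Return the interpreter the rule above would launch.

    configured: the 'executable' setting from the adapter configuration,
    or None (or an empty string) if the setting is absent.
    """
    if configured:
        # A configured path wins; forward slashes work on Windows too.
        return configured
    # Fallback: the first executable named 'python' found on PATH,
    # or None if there is none.
    return shutil.which("python")


print(resolve_python("/usr/local/bin/python3"))  # /usr/local/bin/python3
print(resolve_python())  # whatever 'python' resolves to on this PATH, or None
```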

Windows: Python.org

Download 64-bit Python 2.7 or Python 3.7 from python.org and run the installer provided. Python.org provides:

Python command | Version | Installation path to specify in configuration files
python.exe | 2.7.16 | C:/Python27/python.exe
python.exe | 3.7.3 | C:/Program Files/Python37/python.exe (when installed for all users using the Advanced options)
python.exe | 3.7.3 | C:/Users/sbuser/AppData/Local/Programs/Python/Python37/python.exe (when installed for the current user)

(Replace sbuser in the Python 3 path with your Windows login name.)

Python installations from python.org include a minimal set of Python packages. Compared with ActiveState, this means there is less you need to update, and you have the freedom to install only the packages you need. Both Python versions include pip and/or pip3 commands.

Windows: ActiveState Python

Download Python 2.7 or Python 3 from activestate.com. ActiveState includes the pip and/or pip3 commands with its Python downloads, along with a large collection of Python packages, including many data science packages. At this writing, ActiveState encourages the use of its Python 3.5.4 edition for the best data science support. ActiveState also provides a Python 3.6 installer, but it does not include the full set of packages provided by the 3.5 installer.

Python command | Version | Installation path to specify in configuration files
python | 2.7.14 | C:/Python27/python.exe (or python2.7.exe or python2.exe)
python3 | 3.5.4 | C:/Python35/python.exe (or python3.5.exe)
python3 | 3.6.6 | C:/Python36/python.exe (or python3.6.exe)

ActiveState bundles data science packages, including TensorFlow and OpenCV, with its installer. Because the bundled versions may lag behind current releases, update those packages to the latest release after installing ActiveState Python 3.5, using commands like the following:

python3 -m pip install --upgrade tensorflow
python3 -m pip install --upgrade opencv-python

Windows: Pypy

Pypy is an alternative Python implementation that claims a significant speed advantage over CPython-based Python implementations. Download either a Python 2.7 or 3.x equivalent from pypy.org. Only 32-bit editions are available for Windows. Pypy provides:

Python command | Version | Python equivalent
pypy | 7.1.0 | 2.7.13
pypy3 | 7.0.0 | 3.5.3
pypy3 | 7.1.0 | 3.6

Pypy is delivered as zip files with no default installation path. Unzip the contents and arrange PATH and PYTHONPATH as you require. For example, specify paths such as these in configuration files:

  • C:/Pypy2/pypy.exe

  • C:/Pypy3/pypy3.exe

Pypy has its own equivalents of the pip and pip3 commands; installing them and using them to download packages into Pypy's site-packages directory can be daunting.

MacOS: Default Python 2

MacOS ships with Python 2.7.10 as of macOS Sierra 10.12, High Sierra 10.13, and Mojave 10.14. You can install pip for the macOS Python 2.7 with the following command:

sudo /usr/bin/easy_install pip

Use the following path to specify the macOS-shipped Python in the StreamBase Python operator's configuration files.

Python command | Version | Installation path to specify in configuration files
python | 2.7.10 | /usr/bin/python

MacOS versions since 10.11 El Capitan have included a feature called System Integrity Protection, or SIP. SIP restricts the ability of processes to replace, upgrade, or overwrite commands that ship with macOS, and that includes /usr/bin/python. SIP also makes it difficult to use pip to install or upgrade certain packages already in place as part of the macOS-provided Python 2.7.10 installation.

This feature can prevent you from installing complex data science packages such as TensorFlow, which has many dependencies. The best workarounds are to either use Homebrew's Python 3 with TensorFlow, or to install Homebrew's python2 package to bypass the Python 2 installation shipped with macOS.

MacOS: Python from Homebrew

You can obtain Python from Homebrew. Use the following command to install Python 3; at this writing, the latest version is 3.7.3:

brew install python

Use this command to install Homebrew's alternative Python 2:

brew install python@2

Use the following paths to specify the Homebrew-installed Python in the StreamBase Python operator's configuration files.

Python command | Version | Installation path to specify in configuration files
python or python2 | 2.7.16 | /usr/local/bin/python (or /usr/local/bin/python2)
python3 | 3.7.3 | /usr/local/bin/python3

Recent versions of Homebrew's Python 3 include the pip3 command. If you installed an earlier version of Homebrew's Python 3 and don't have pip3, run these commands:

brew update
brew upgrade
brew postinstall python3

MacOS: Pypy from Homebrew

Homebrew makes Pypy available for macOS in two releases installed with these commands:

brew install pypy
brew install pypy3

Use the following paths to specify the Homebrew-installed Pypy in the StreamBase Python operator's configuration files.

Python command | Version | Python equivalent | Installation path to specify in configuration files
pypy | 7.1.0 | 2.7.13 | /usr/local/bin/pypy
pypy3 | 7.0.0 | 3.6.1 | /usr/local/bin/pypy3

Each Pypy version from Homebrew installs with its own pip_pypy or pip_pypy3 command.

To use Pypy on macOS, make sure you have configured your locale setting correctly, either in the shell environment inherited by Studio, or explicitly set in the Environment tab of the Run Configuration for any EventFlow module that includes a Python operator. In particular, you may need to set the LANG environment variable equal to en_US.UTF-8 (or to the equivalent setting for your locale). See locale(1) and the LANG environment variable in your platform's reference documentation.
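A quick way to see what locale the interpreter picks up is a short standalone script run with the same interpreter and environment the operator will use, for example /usr/local/bin/pypy3 check_locale.py (the file name is hypothetical). It uses only standard-library calls that exist in both Python 2.7 and 3.x; treating any encoding other than UTF-8 as a problem is an assumption based on the en_US.UTF-8 recommendation above.

```python
import locale
import os
import sys

# Report what this interpreter sees for LANG and its preferred encoding.
lang = os.environ.get("LANG", "")
encoding = locale.getpreferredencoding(False)

print("LANG=%r preferred encoding=%s" % (lang, encoding))
if encoding.lower().replace("-", "") != "utf8":
    sys.stderr.write("Warning: locale is not UTF-8; try LANG=en_US.UTF-8\n")
```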

Linux: Python from Yum

This section applies both to RHEL 7 and CentOS 7.

There is no default installation configuration for RHEL or CentOS, and the packages you get depend on the installation options you choose at installation time. But even after choosing the Development and Creative Workstation and Python installation options, you still end up with only Python 2.7 and no pip command.

On CentOS 7, run the following commands to install Python 3.4 and both pip and pip3 commands:

sudo yum install epel-release
sudo yum install python34 python34-devel python34-setuptools
cd /usr/lib/python3.4/site-packages
sudo -H python3 easy_install.py pip

On RHEL 7, replace the first line above with the following line, assuming your site policies allow you to install from the epel repository:

sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

Next, upgrade the pip and pip3 commands to their latest versions with:

sudo -H pip3 install --upgrade pip
sudo -H pip install --upgrade pip

These commands install the following versions:

Python command | Version | Installation path to specify in configuration files
python | 2.7.5 | /usr/bin/python
python3 | 3.4.9 | /usr/bin/python3

As of April 2019, Python 3.4 is deprecated. If your site policies allow, you can download and install Python 3.5 or 3.6 manually from python.org, or build Python from source.

Linux: Pypy

On either RHEL 7 or CentOS 7, the following command installs the Python 2 compatible release of Pypy:

sudo yum install pypy

At this writing, no Python 3 compatible release of Pypy is available in the RPM databases used by the yum command on RHEL or CentOS 7. You can install pypy3 manually from the “portable Linux binaries” link on the Pypy.org Downloads page, or you can build from source as described on their site.

These options provide you with:

Python command | Version | Python equivalent | Installation path to specify in configuration files
pypy | 5.0.1 | 2.7.10 | /usr/bin/pypy
pypy3 | 7.1.0 | 3.6.1 | No default installation path (see below)

Pypy3 is delivered as a tar.bz2 file with no default installation path. Untar the contents and arrange PATH and PYTHONPATH as you require. For example:

  • /usr/local/pypy3/bin/pypy3