The Python operator allows you to execute any valid Python code within a StreamBase module. The Python operator and its companion Python Instance operator allow Python-centric teams to reuse existing Python code in an event processing context without requiring major rewrites to that code. The Python operators allow the execution of Python-based statistical modeling, data science processing, and machine learning produced with Python packages such as SciPy and TensorFlow.
The Python operators do not turn the StreamBase EventFlow language into a Python interpreter. Instead, Python stateful sessions managed by an external Python interpreter are attached as child processes to EventFlow modules. The Python operators interact with these sessions by setting input variables, executing the script, and reading output variables, potentially emitting Python results as fields of StreamBase tuples. The Python operator guarantees that these three Python management operations are executed sequentially, even if there are multiple operator instances touching the same session, or if the operator is running in asynchronous mode.
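The set-inputs, execute, read-outputs cycle described above can be pictured with a small stand-alone sketch. This is not the operator's actual integration layer: the `PythonSession` class and its `run` method are hypothetical names used only for illustration, and a plain dictionary stands in for the state held by the external interpreter. A lock models the guarantee that the three steps run as one uninterrupted sequence.

```python
import threading


class PythonSession(object):
    """Hypothetical stand-in for one stateful Python session."""

    def __init__(self):
        self._namespace = {}           # variables that persist between calls
        self._lock = threading.Lock()  # serializes the three-step cycle

    def run(self, input_vars, script, output_names):
        with self._lock:
            # Step 1: set the input variables in the session.
            self._namespace.update(input_vars)
            # Step 2: execute the script against the session state.
            exec(script, self._namespace)
            # Step 3: read the requested output variables back.
            return {name: self._namespace.get(name) for name in output_names}


session = PythonSession()
first = session.run({"x": 2}, "y = x * 3", ["y"])
# The session is stateful: y from the first call is still visible here.
second = session.run({"x": 5}, "z = x + y", ["y", "z"])
```

In this sketch, the second call can read `y` because both calls share one session, which mirrors how variables stored by an earlier execution remain available to later ones.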
The Python operators support several Python interpreters compliant with Python 2.7 or 3.4+. The Python script provided to a particular Python operator must be compliant with the Python interpreter configured for the containing EventFlow application to use. This means that the Python script can only use the libraries and language structures available to the configured Python interpreter. The operator treats the script code as opaque and does not attempt to parse or compile it before sending it to the interpreter. Thus, all the power of the configured interpreter's Python libraries is accessible from the script.
StreamBase includes a set of samples of using the Python and Python Instance operators, described in Python Operator Samples.
Use the Python operator to execute Python code and to optionally emit the results of that execution.
By contrast, use the Python Instance operator to name and configure a constrained, local execution environment that is shared by the Python operators you configure to use it.
By default, Python operators execute in the global execution environment of the containing EventFlow module. However, it is possible to use EventFlow concurrency features to run multiple copies of a single module that contains Python operators. In this case, you can use a Python Instance operator to define and name a local execution environment for the Python operators in that concurrent module. In this way, the execution of Python in one concurrent instance of that module does not interfere with other instances.
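The isolation between named local environments can be illustrated with a rough sketch. The `get_environment` helper and the instance names are hypothetical, and separate dictionaries stand in for the separate Python sessions that the real operators manage.

```python
# Hypothetical illustration: each named instance owns a separate namespace.
environments = {}


def get_environment(instance_name):
    """Return the namespace for a named instance, creating it on first use."""
    return environments.setdefault(instance_name, {})


# Two concurrent copies of a module, each bound to its own local instance.
exec("counter = 1", get_environment("module_copy_A"))
exec("counter = 100", get_environment("module_copy_B"))
exec("counter += 1", get_environment("module_copy_A"))

counter_a = get_environment("module_copy_A")["counter"]
counter_b = get_environment("module_copy_B")["counter"]
```

Updating `counter` in one environment leaves the other untouched, which is the behavior a Python Instance operator is meant to provide for concurrent module instances.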
You can also configure a Python Instance operator with a control port that lets you enable and disable at runtime the execution of Python operators configured to use that Python instance.
The Python operator integration layer uses a minimal set of features from Python 2.7 and Python 3.4+, requiring the pickle library and TCP/IP networking. The constructs used are compatible with Python 2.7 and 3.4+.
The following table provides a summary of the Python interpreters supported by the Python operator. See Python Versions Supported for detailed information for each operating system.
Runtime | Version | Equivalent to ... |
---|---|---|
Python 2 | 2.7.x | — |
Python 3 | 3.4 or newer | — |
PyPy | 5.x | Python 2.7 |
PyPy | 7.x | Python 2.7, 3.5, or 3.6 |
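As a quick sanity check, you might run something like the following with the interpreter you plan to configure. The function names are illustrative only and are not part of StreamBase; the checks simply restate the requirements above (Python 2.7 or 3.4+, with pickle and TCP/IP networking available).

```python
import sys


def is_supported_version(version_info):
    """True for Python 2.7.x or Python 3.4 and newer."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        return minor == 7
    return (major, minor) >= (3, 4)


def has_required_modules():
    """The integration layer needs pickle and TCP/IP networking (socket)."""
    try:
        import pickle  # noqa: F401
        import socket  # noqa: F401
    except ImportError:
        return False
    return True


this_interpreter_ok = is_supported_version(sys.version_info) and has_required_modules()
```

PyPy reports the Python language level it is equivalent to in `sys.version_info`, so the same check applies there.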
The Python operator's input tuple can include a single field of type tuple whose name is inputVars. The type of data passed to the Python script from the inputVars tuple is inferred from the data type of each field.
The operator optionally emits a field of type tuple named outputVars. When you define the data type for the outputVars tuple fields, the operator makes its best effort to cast the Python objects to StreamBase data types. The following table summarizes the data type conversions.
StreamBase type | to Python | from Python |
---|---|---|
boolean | truth | truth, int, float |
int | int | truth, int, float |
double | float | truth, int, float |
string | unicode (Python2), str (Python3) | str, bytes, bytearray, unicode (Python2) |
timestamp | datetime.datetime (absolute), datetime.timedelta (interval) | datetime.datetime, datetime.date, datetime.time (absolute), datetime.timedelta (interval) |
blob | bytes | bytes, bytearray |
list | list | list, tuple, array.array, materialized generator (list) |
tuple | dict | dict |
capture | not supported | not supported |
function | not supported | not supported |
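To make the table concrete, the following sketch builds Python values whose types line up with the "from Python" column. The function and field names are hypothetical, the table's "truth" entry is read here as Python's bool, and the input list is assumed to be non-empty.

```python
import datetime


def build_outputs(prices, label):
    """Return values whose Python types map onto StreamBase field types."""
    count = len(prices)  # assumes at least one price
    return {
        "count": count,                                   # int -> int
        "mean": sum(prices) / float(count),               # float -> double
        "has_data": count > 0,                            # bool -> boolean
        "label": label.upper(),                           # str -> string
        "window": datetime.timedelta(seconds=30),         # timedelta -> interval timestamp
        "levels": sorted(prices),                         # list -> list
        "extremes": {"low": min(prices), "high": max(prices)},  # dict -> tuple
    }


result = build_outputs([3.0, 1.0, 2.0], "abc")
```

Each key would correspond to one field of the outputVars schema, with the field type chosen to match the comment on that line.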
The global Python instance is the default, module-scoped execution environment in which all Python operators in a module perform their operations. The global Python instance environment is configured in the module's configuration file, such as sbd.sbconf.
By contrast, the Python Instance operator is configured using the operator's Properties view on the EventFlow canvas.
Use the <adapter-configuration> element of the sbconf file to define the parameters of the module's Python environment. The following table describes the valid child elements for the <adapter-configuration> element.
Property | Type | Default | Description |
---|---|---|---|
instance | string | | Provides an arbitrary name for the global instance. This name is displayed in the drop-down list for the Global Instance ID control in the Properties view for both Python and Python Instance operators. This name links the two operators together when using a Python Instance operator. |
executable | string | python | Specifies the full path to the Python executable. Use forward slash path separators, even on Windows. If not specified, the operator invokes the command python, which is assumed to be on the PATH. This default Python command is not likely to be the same if you develop on Windows or Mac and deploy on Linux, so be sure to specify the exact path to the Python interpreter you intend to use for each platform. See Python Versions Supported below for example paths. |
useTempFile | boolean | false | This flag indicates that the operator's integration layer should create a temporary file with Python code wrapping the interactions with StreamBase, instead of pushing it through standard input. Using standard input works for most Python interpreters and is the default. |
captureOutput | boolean | false | Modifies the stdout and stderr behavior of the operator. By default, both are chained to the parent process's stdout and stderr. For running tests that include output, you can set this flag to capture the output. |
workingDir | string | . | Specifies the working directory for the launched process. When not specified, the process starts in the same directory as the parent StreamBase process. |
envVariables | section/setting | | Specifies environment variables to be passed before launching the Python interpreter, potentially overriding variables in the platform's environment. You can use more than one <setting> line. |
arguments | section/setting | | Specifies arguments to be passed to the Python interpreter (not the Python script), and can be defined multiple times. The usual use for this property is to pass -u, which forces Python to use unbuffered stdin, stdout, and stderr streams. Consult the documentation for your Python interpreter for information on its launch parameters. |
The following shows an example <adapter-configuration> element for a global instance named pythonic that uses Python 2.7 on Windows.
<adapter-configurations>
  <adapter-configuration name="python">
    <section name="python">
      <setting name="instance" val="pythonic"/>
      <setting name="executable" val="C:/Python27/python.exe"/>
      <setting name="useTempFile" val="false"/>
      <setting name="captureOutput" val="false"/>
      <setting name="workingDir" val="."/>
      <section name="envVariables">
        <setting name="CUDA_VISIBLE_DEVICES" val="0,1"/>
      </section>
      <section name="arguments">
        <setting val="-u"/>
      </section>
    </section>
  </adapter-configuration>
</adapter-configurations>
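If you want to double-check a configuration fragment like this one before deploying it, a short script along the following lines can read the settings back. This is only a sketch: `read_python_settings` is a hypothetical helper, the embedded XML is a trimmed copy of the example, and StreamBase itself does not require or provide this step.

```python
import xml.etree.ElementTree as ET

CONFIG = """
<adapter-configurations>
  <adapter-configuration name="python">
    <section name="python">
      <setting name="instance" val="pythonic"/>
      <setting name="executable" val="C:/Python27/python.exe"/>
      <section name="envVariables">
        <setting name="CUDA_VISIBLE_DEVICES" val="0,1"/>
      </section>
      <section name="arguments">
        <setting val="-u"/>
      </section>
    </section>
  </adapter-configuration>
</adapter-configurations>
"""


def read_python_settings(xml_text):
    """Return (scalar settings, environment variables, interpreter arguments)."""
    root = ET.fromstring(xml_text)
    section = root.find(
        "./adapter-configuration[@name='python']/section[@name='python']")
    # Direct <setting> children hold the scalar properties.
    settings = {s.get("name"): s.get("val") for s in section.findall("setting")}
    env = {s.get("name"): s.get("val")
           for s in section.findall("section[@name='envVariables']/setting")}
    args = [s.get("val")
            for s in section.findall("section[@name='arguments']/setting")]
    return settings, env, args


settings, env, args = read_python_settings(CONFIG)
# The executable path should use forward slashes, even on Windows.
uses_forward_slashes = "\\" not in settings["executable"]
```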
This section describes the properties you can set for the Python operator, using the various tabs of the Properties view in StreamBase Studio.
Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow module. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
Property | Type | Description |
---|---|---|
Instance Type | radio button | Select Global to specify that this operator executes in the global execution environment of the containing EventFlow module, which is defined in the module's configuration file. Select Local to specify that this operator executes in the constrained environment defined by a Python Instance operator present in this EventFlow module. |
Local Instance ID | text | Only active when Instance Type specifies Local. In this case, enter the canvas name of a Python Instance operator in this EventFlow module that defines a local Python execution environment. |
Global Instance ID | drop-down list | Only active when Instance Type specifies Global. In this case, the global Python execution environment is defined in this module's configuration file. Use the drop-down control to select the name of a global instance defined in configuration. |
Asynchronous | check box | When selected, the operator executes the Python script using a non-blocking call. This allows long operations to be executed without suspending the processing of the containing module. Make sure module invariants are preserved around the call. Note that, in contrast to the concurrent parallel execution feature of StreamBase, this operator does not allocate additional threads and uses lightweight job scheduling. |
Log Level | INFO | Controls the level of verbosity the adapter uses to issue informational traces to the console. This setting is independent of the containing application's overall log level. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE. |
Property | Type | Description |
---|---|---|
Enable control port | check box | When selected, the Python operator gains a control port on the canvas. Connect an Input Stream to this port whose schema contains two string fields named Command and Script. When the control port exists, a new Python script can be uploaded to a running Python operator, replacing a script loaded at startup using either the File or Script text options described next. To use the control port, send in a tuple with the Command and Script fields set. When testing the control port with the Manual Input view in StreamBase Studio, you can copy a script and paste it into the Script field. |
Script source | radio button | Select File to specify that the initial executable script comes from the specified Python file. Select Script text to specify using the code specified in the Script field as the initial executable script. |
Script file | resource chooser | Choose a valid Python file to load. Use the button to browse the current module's Resource Search Path for Python files. This browse dialog opens in the project's src/main/resources folder. |
Script | multiline text | Specifies a script of known working Python code to be executed for each incoming tuple. |
Property | Type | Description |
---|---|---|
Output variables | schema definition | Definition for the expected output variables. Each field defined for the schema corresponds to a Python session variable expected to be stored by this operator's script, or any previous call. The output variables must be of a type castable to a StreamBase field type, as shown in Data Type Conversion above. |
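As an illustration of how a script relates to the Output variables schema, here is a sketch of a script that keeps running state in the session. The field names are assumptions: it supposes that inputVars carries a double field named `price` and that the output schema defines `tick_count` (int) and `average_price` (double). The `NameError` fallbacks exist only so the script also runs stand-alone.

```python
# Hypothetical operator script. The operator is assumed to set `price` as a
# session variable from the inputVars tuple before each execution.
try:
    price
except NameError:
    price = 101.5  # stand-alone fallback, not needed inside the operator

# State created on the first execution persists in the session afterwards.
try:
    tick_count
except NameError:
    tick_count = 0
    running_total = 0.0

tick_count += 1
running_total += price

# Variables named in the Output variables schema are read back after the
# script finishes; running_total stays in the session but is not emitted.
average_price = running_total / tick_count
```

Because `tick_count` and `running_total` were stored by an earlier call, later executions update them instead of resetting them.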
The input port transparently accepts any incoming tuple, which can optionally contain a field of type tuple with the reserved name inputVars. The outgoing tuple contains a field of type tuple named outputVars.
- inputVars: optional tuple containing variables to be set in the Python session.
- outputVars: tuple whose structure is defined in the Output tab of the Properties view, containing variables read from the Python session.
All other incoming fields are transparently passed. The inputVars field is not propagated to the output. The outputVars field is not allowed in the input port.
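The pass-through rules above can be summarized in a few lines of Python. This is a model of the described behavior, not the operator's implementation: dictionaries stand in for StreamBase tuples, and `compose_output` and the sample field names are hypothetical.

```python
def compose_output(incoming, output_vars):
    """Model how the outgoing tuple is assembled from the incoming one."""
    if "outputVars" in incoming:
        # outputVars is reserved for the output side.
        raise ValueError("outputVars is not allowed in the input tuple")
    # Every field except inputVars passes through unchanged.
    outgoing = {name: value for name, value in incoming.items()
                if name != "inputVars"}
    # Variables read from the Python session travel in outputVars.
    outgoing["outputVars"] = dict(output_vars)
    return outgoing


incoming = {"symbol": "ABC", "inputVars": {"price": 101.5}}
outgoing = compose_output(incoming, {"average_price": 101.5})
```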
This section describes the Python interpreters tested for use with the StreamBase Python operators. Python versions are described as of April, 2019.
In general:
- Your system can have as many Python versions, from as many vendors, as you require, as long as the names or paths of the Python executables are different. Thus, a single system can have python, python3, pypy, and pypy3 installed at the same time, if necessary.
- The topmost Python in your shell's PATH is the dominant Python version for Python called at the shell prompt.
- However, the StreamBase Python operators use the Python version specified in the adapter's configuration files, src/main/resources/adapter-configurations.xml or src/main/configurations/sbengine.conf.
- The Python operator only calls for Python with the name python on the system PATH as a fallback, if there is no Python version specified in configuration.
- For Windows, use forward slashes in path names.
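One way to find the exact path to put in configuration is to ask the interpreter itself. The sketch below assumes you run it with the interpreter you intend to configure; `to_config_path` is a hypothetical helper that only swaps backslashes for forward slashes.

```python
import sys


def to_config_path(path):
    """Convert a native path to the forward-slash form used in configuration."""
    return path.replace("\\", "/")


# Path of the interpreter running this script, ready to paste into configuration.
current_interpreter = to_config_path(sys.executable or "")
# A Windows-style path converts as follows.
windows_example = to_config_path("C:\\Python27\\python.exe")
```

Printing `current_interpreter` from each installed Python (python, python3, pypy, and so on) shows which executable each command name actually resolves to.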
Download 64-bit Python 2.7 or Python 3.7 from python.org and run the installer provided. Python.org provides:
Python Command | Version | Installation path to specify in configuration files | Notes |
---|---|---|---|
python.exe | 2.7.16 | C:/Python27/python.exe | |
python.exe | 3.7.3 | C:/Program Files/Python37/python.exe | When installed for all users using the Advanced options. |
python.exe | 3.7.3 | C:/Users/sbuser/AppData/Local/Programs/Python/Python37/python.exe | When installed for the current user. |
(Replace sbuser in the Python 3 path with your Windows login name.)
Pythons from python.org come with a minimal set of included Python packages. This means there is less you need to update, compared to ActiveState, but also means you have the freedom to install only the packages you need. Both Python versions include pip and/or pip3 commands.
Download Python 2.7 or Python 3 from activestate.com. ActiveState includes the pip and/or pip3 commands, and a large collection of Python packages with their Python downloads, including many data science packages. At this writing, ActiveState encourages the use of their Python 3.5.4 edition for the best data science support. They do provide a Python 3.6 installer, but it does not include the full set of packages provided by their 3.5 installer.
Python Command | Version | Installation path to specify in configuration files |
---|---|---|
python | 2.7.14 | C:/Python27/python.exe (or python2.7.exe or python2.exe) |
python3 | 3.5.4 | C:/Python35/python.exe (or python3.5.exe) |
python3 | 3.6.6 | C:/Python36/python.exe (or python3.6.exe) |
Because ActiveState bundles data science packages with their installer, including TensorFlow and OpenCV, you must update those packages to the latest release after installing ActiveState Python 3.5. Use commands like the following:
python3 -m pip install --upgrade tensorflow
python3 -m pip install --upgrade opencv-python
Pypy is an alternative Python implementation that claims a significant speed advantage over CPython-based Python implementations. Download either a Python 2.7 or 3.x equivalent from pypy.org. Only 32-bit editions are available for Windows. Pypy provides:
Python Command | Version | Python Equivalent | Installation path to specify in configuration files |
---|---|---|---|
pypy | 7.1.0 | 2.7.13 | Delivered as zip files with no default installation path. Unzip the contents and arrange PATH and PYTHONPATH as you require. |
pypy3 | 7.0.0 | 3.5.3 | |
pypy3 | 7.1.0 | 3.6 | |
There are pip and pip3 equivalents specific to Pypy, but installing them and using them to download packages into Pypy's site-packages directory can be daunting.
MacOS ships with Python 2.7.10 as of macOS Sierra 10.12, High Sierra 10.13, and Mojave 10.14. You can install pip for the macOS Python 2.7 with the following command:
sudo /usr/bin/easy_install pip
Use the following path to specify the macOS-shipped Python in the StreamBase Python operator's configuration files.
Python Command | Version | Installation path to specify in configuration files |
---|---|---|
python | 2.7.10 | /usr/bin/python |
MacOS versions since 10.11 El Capitan have included a feature called System Integrity Protection, or SIP. SIP restricts the ability of processes to replace, upgrade, or overwrite commands that ship with macOS, and that includes /usr/bin/python. SIP also makes it difficult to use pip to install or upgrade certain packages already in place as part of the macOS-provided Python 2.7.10 installation.
This feature can prevent you from installing complex data science packages such as TensorFlow, which has many dependencies. The best workarounds are to either use Homebrew's Python 3 with TensorFlow, or to install Homebrew's python2 package to bypass the Python 2 installation shipped with macOS.
You can obtain Python from Homebrew. Use the following command to install Python 3; at this writing, the latest version is 3.7.3:
brew install python
Use this command to install Homebrew's alternative Python 2:
brew install python@2
Use the following paths to specify the Homebrew-installed Python in the StreamBase Python operator's configuration files.
Python Command | Version | Installation path to specify in configuration files |
---|---|---|
python or python2 | 2.7.16 | /usr/local/bin/python (or python2) |
python3 | 3.7.3 | /usr/local/bin/python3 |
Recent versions of Homebrew's Python 3 include the pip3 command. If you installed an earlier version of Homebrew's Python 3 but don't have pip3, run these commands:
brew update
brew upgrade
brew postinstall python3
Homebrew makes Pypy available for macOS in two releases installed with these commands:
brew install pypy
brew install pypy3
Use the following paths to specify the Homebrew-installed Pypy in the StreamBase Python operator's configuration files.
Python Command | Version | Python Equivalent | Installation path to specify in configuration files |
---|---|---|---|
pypy | 7.1.0 | 2.7.13 | /usr/local/bin/pypy |
pypy3 | 7.0.0 | 3.6.1 | /usr/local/bin/pypy3 |
Each Pypy version from Homebrew installs with its own pip_pypy or pip_pypy3 command.
To use Pypy on macOS, make sure you have configured your locale setting correctly, either in the shell environment inherited by Studio, or explicitly set in the Environment tab of the Run Configuration for any EventFlow module that includes a Python operator. In particular, you may need to set the LANG environment variable equal to en_US.UTF-8 (or to the equivalent setting for your locale). See locale(1) and the LANG environment variable in your platform's reference documentation.
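To confirm what the interpreter actually sees, you could check the LANG value from Python. The following is a small sketch with a hypothetical helper name; it only inspects the codeset portion of LANG and does not cover the other locale variables described in locale(1).

```python
import os


def lang_is_utf8(environ):
    """Return True if LANG names a UTF-8 codeset, such as en_US.UTF-8."""
    lang = environ.get("LANG", "")
    # LANG has the form language_TERRITORY.codeset@modifier; keep the codeset.
    codeset = lang.partition(".")[2].partition("@")[0]
    return codeset.replace("-", "").lower() == "utf8"


# Check the environment of the current process.
current_lang_ok = lang_is_utf8(os.environ)
```

Running this check from a script executed by the Python operator shows the environment the launched interpreter inherited, which may differ from your interactive shell.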
This section applies both to RHEL 7 and CentOS 7.
There is no default installation configuration for RHEL or CentOS, and the packages you get depend on the installation options you choose at installation time. But even after choosing the Development and Creative Workstation and Python installation options, you still end up with only Python 2.7 and no pip command.
On CentOS 7, run the following commands to install Python 3.4 and both pip and pip3 commands:
sudo yum install epel-release
sudo yum install python34 python34-devel python34-setuptools
cd /usr/lib/python3.4/site-packages
sudo -H python3 easy_install.py pip
On RHEL 7, replace the first line above with the following line, assuming your site policies allow you to install from the epel repository:
sudo rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Next, upgrade the pip and pip3 commands to their latest versions with:
sudo -H pip3 install --upgrade pip
sudo -H pip install --upgrade pip
These commands install the following versions:
Python Command | Version | Installation path to specify in configuration files |
---|---|---|
python | 2.7.5 | /usr/bin/python |
python3 | 3.4.9 | /usr/bin/python3 |
As of April 2019, Python 3.4 is deprecated. If your site policies allow, you can download and install Python 3.5 or 3.6 manually from python.org, or build and install Python from source.
On either RHEL 7 or CentOS 7, the following command installs the Python 2 compatible release of Pypy:
sudo yum install pypy
At this writing, no Python 3 compatible release of Pypy is available in the RPM databases used by the yum command on RHEL or CentOS 7. You can install pypy3 manually from the “portable Linux binaries” link on the Pypy.org Downloads page, or you can build from source as described on their site.
These options provide you with:
Python Command | Version | Python Equivalent | Installation path to specify in configuration files |
---|---|---|---|
pypy | 5.0.1 | 2.7.10 | /usr/bin/pypy |
pypy3 | 7.1.0 | 3.6.1 | Delivered as a tar.bz2 file with no default installation path. Untar the contents and arrange PATH and PYTHONPATH as you require. |