Unicode Support

Both StreamBase Server and StreamBase Studio are fully capable of processing and displaying Unicode character sets, but neither is configured to do so by default. To see Unicode characters correctly displayed in the Output Streams view in Studio, you must configure both Server and Studio independently.

Follow the instructions on this page to enable Unicode support for Server, Studio, and for StreamBase clients written with the StreamBase Client Library.

Java programmers writing any StreamBase extension that writes to a file (including a Java operator or adapter) must set the Java system property file.encoding=UTF-8 for complete Unicode support in such files. This requirement is independent of the settings described below, some of which also set file.encoding=UTF-8.
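As a minimal sketch, assuming a custom Java operator or adapter that writes text lines to a file, the hypothetical helper below warns when the JVM's file.encoding is not UTF-8 and also passes UTF-8 explicitly to the writer, so the bytes written do not depend on the platform default. The class name Utf8FileOutput and its methods are illustrative only and are not part of any StreamBase API. Unicode escapes keep the source file itself ASCII-only.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for a custom operator or adapter; not a StreamBase API class.
final class Utf8FileOutput {

    private Utf8FileOutput() {}

    // Returns true when the JVM reports file.encoding as UTF-8.
    static boolean defaultEncodingIsUtf8() {
        String enc = System.getProperty("file.encoding");
        return enc != null && (enc.equalsIgnoreCase("UTF-8") || enc.equalsIgnoreCase("UTF8"));
    }

    // Writes each line with an explicit UTF-8 charset, independent of the platform default.
    static void writeLines(Path target, List<String> lines) throws IOException {
        if (!defaultEncodingIsUtf8()) {
            System.err.println("Warning: file.encoding is " + System.getProperty("file.encoding")
                    + "; start the JVM with -Dfile.encoding=UTF-8 for full Unicode support.");
        }
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                out.write(line);
                out.newLine(); // writes System.lineSeparator()
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("unicode-demo", ".txt");
        // "Gruesse" with u-umlaut and sharp s, and three CJK ideographs, as Unicode escapes.
        writeLines(tmp, List.of("Gr\u00fc\u00dfe", "\u65e5\u672c\u8a9e"));
        System.out.println(Files.readAllLines(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

Passing the charset explicitly is a defensive choice on top of the file.encoding requirement above, not a replacement for it: other code in the same JVM may still rely on the default encoding.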

Configuring StreamBase Server for Unicode Support

Configure StreamBase Server to process Unicode characters in streams by setting three Java system properties to UTF-8 for the JVM that runs the server. You can make this change in the engine configuration file by adding a systemProperties object, as shown in the following example:

name = "sbengine1"
version = "1.0.0"
type = "com.tibco.ep.streambase.configuration.sbengine"
configuration = {
  StreamBaseEngine = {
    systemProperties = {
      "file.encoding" = "UTF-8",
      "sun.jnu.encoding" = "UTF-8",
      "streambase.tuple-charset" = "UTF-8"
    }
  }
}

(Notice that the separator between streambase and tuple is a period, while the separator between tuple and charset is a hyphen.)
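To confirm that the three properties actually reached a JVM, code running inside it can compare each value against UTF-8. The standalone sketch below is only an illustration under that assumption; the class UnicodePropertyCheck is hypothetical, and how you would hook such a check into a running server (for example, from a custom Java operator) is left open. Note that the key it checks is spelled streambase.tuple-charset, with a period and then a hyphen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: reports which Unicode-related system properties are not UTF-8.
final class UnicodePropertyCheck {

    // The three keys set in the engine configuration's systemProperties object.
    static final String[] KEYS = {"file.encoding", "sun.jnu.encoding", "streambase.tuple-charset"};

    // Maps each key whose value is missing or not UTF-8 to its current value (null if unset).
    static Map<String, String> nonUtf8Properties() {
        Map<String, String> problems = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = System.getProperty(key);
            if (value == null || !value.equalsIgnoreCase("UTF-8")) {
                problems.put(key, value);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, String> problems = nonUtf8Properties();
        if (problems.isEmpty()) {
            System.out.println("All three properties are set to UTF-8.");
        } else {
            problems.forEach((k, v) -> System.out.println(k + " = " + v + " (expected UTF-8)"));
        }
    }
}
```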

For this change to take effect when you run applications in Studio, place the configuration file in the src/main/configurations folder of your Studio project.

Configuring StreamBase Studio for Unicode Support

Configure StreamBase Studio to process and display Unicode characters in streams by setting the Java system property streambase.tuple-charset to UTF-8 for the JVM that runs Studio. Here, unlike for the server, you must make the change with the environment variable STREAMBASE_STUDIO_VMARGS.

Important

Remember that the STREAMBASE_STUDIO_VMARGS variable replaces the default VM arguments passed to Studio. If you set the variable for any purpose, you MUST include memory settings like the following:

STREAMBASE_STUDIO_VMARGS=-Xms512M -Xmx1024M

To use this environment variable correctly, set values for the default arguments -Xms and -Xmx, then add your new setting at the end.

See Java VM Memory Settings for a discussion of alternative settings.

Configure this environment variable globally for your system, or temporarily in the UNIX terminal or StreamBase Command Prompt environment from which you run the sbstudio command. Use a command like the following for Windows. This example is shown on two lines for publication clarity, but should be typed as one long line:

set STREAMBASE_STUDIO_VMARGS=-Xms512M -Xmx1024M
    -Dstreambase.tuple-charset=UTF-8

Use a command like the following for the Bash shell on Linux. Quote the value so that the shell assigns all three arguments to the variable as one string:

export STREAMBASE_STUDIO_VMARGS="-Xms512M -Xmx1024M -Dstreambase.tuple-charset=UTF-8"

Configuring StreamBase Clients for Unicode Support

For complete Unicode support, you must configure both ends of any communication with StreamBase Server. This applies to StreamBase client applications written with any of the StreamBase Client Libraries.

For clients written with the StreamBase Java API, you must configure the JVM that runs your client code. You can do this in any of three ways:

  1. Set the streambase.tuple-charset system property to UTF-8 with System.setProperty() in your client code.

  2. Set the environment variable STREAMBASE_TUPLE_CHARSET to UTF-8 in the environment that runs your client application.

  3. Start the JVM that runs your client code with the -Dstreambase.tuple-charset=UTF-8 option.
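Option 1 can be sketched as follows, assuming the property should be set at the very start of main, before any StreamBase client object is created; that ordering is a cautious assumption rather than a documented requirement. The sketch only sets the property when it was not already supplied with -D (option 3), so a command-line value still wins. The class UnicodeClientMain is hypothetical, and the StreamBase client calls themselves are deliberately omitted.

```java
// Hypothetical client entry point; the StreamBase client calls themselves are omitted.
final class UnicodeClientMain {

    static final String TUPLE_CHARSET = "streambase.tuple-charset";

    // Applies option 1 unless the JVM was already started with -Dstreambase.tuple-charset (option 3).
    static void ensureUtf8TupleCharset() {
        if (System.getProperty(TUPLE_CHARSET) == null) {
            System.setProperty(TUPLE_CHARSET, "UTF-8");
        }
    }

    public static void main(String[] args) {
        // Set the property first, before creating any StreamBase client (assumed safe ordering).
        ensureUtf8TupleCharset();
        System.out.println(TUPLE_CHARSET + " = " + System.getProperty(TUPLE_CHARSET));
        // ... create the StreamBase client and connect here ...
    }
}
```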

For clients written with the StreamBase C++, .NET, or Python APIs, you must set the environment variable STREAMBASE_TUPLE_CHARSET=UTF-8 in the environment that runs your client application.
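If a C++, .NET, or Python client is started from a Java program (for example, a test harness; this scenario is an assumption), the variable can be placed in the child process's environment with ProcessBuilder. The class ClientLauncher and the command python3 my_client.py are hypothetical placeholders for your own launcher and client.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher for a client written with the C++, .NET, or Python API.
final class ClientLauncher {

    // Builds a ProcessBuilder whose child environment includes STREAMBASE_TUPLE_CHARSET=UTF-8.
    static ProcessBuilder withUtf8TupleCharset(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // environment() is a modifiable copy of this JVM's environment; changes affect only the child.
        pb.environment().put("STREAMBASE_TUPLE_CHARSET", "UTF-8");
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "python3 my_client.py" is a placeholder for your own client command.
        ProcessBuilder pb = withUtf8TupleCharset(List.of("python3", "my_client.py"));
        int exitCode = pb.start().waitFor();
        System.out.println("Client exited with code " + exitCode);
    }
}
```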

Unicode Strings in Expressions

You can perform expression language operations such as substr() on Unicode strings. Unicode strings on input streams are canonicalized to UTF-8 NFC (Normalization Form C, as described in Unicode Normalization Forms).
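To illustrate what NFC canonicalization means, the sketch below uses the JDK's java.text.Normalizer rather than anything StreamBase-specific; it is an analogy for the behavior described above, not StreamBase's own code path. The letter é can be written as a plain "e" followed by a combining acute accent, or as one precomposed code point; NFC maps the first form to the second. The class name NfcDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Illustrates NFC with the JDK's Normalizer; this is not StreamBase's own implementation.
final class NfcDemo {

    // Returns the NFC (Normalization Form C) form of s.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";   // 'e' followed by COMBINING ACUTE ACCENT
        String precomposed = "\u00e9";   // LATIN SMALL LETTER E WITH ACUTE
        System.out.println(decomposed.equals(precomposed));                            // false
        System.out.println(toNfc(decomposed).equals(precomposed));                     // true
        System.out.println(decomposed.getBytes(StandardCharsets.UTF_8).length);        // 3
        System.out.println(toNfc(decomposed).getBytes(StandardCharsets.UTF_8).length); // 2
    }
}
```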

With Unicode support enabled as described above, some expression language functions that operate on strings behave differently than in the default configuration, where Unicode is disabled. For example, some functions count characters as graphemes with Unicode enabled, but as bytes with Unicode disabled. Each affected function is noted on the Expressions page.
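The difference between counting bytes and counting graphemes can be sketched with the JDK alone: String.getBytes(UTF_8) gives the byte count, and BreakIterator.getCharacterInstance() approximates user-perceived characters. This is an illustration of the two counting models, not StreamBase's implementation; the exact rule for each expression function is the one documented on the Expressions page. The class LengthDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;

// Contrasts UTF-8 byte counts with grapheme (user-perceived character) counts, JDK only.
final class LengthDemo {

    // Number of bytes in the UTF-8 encoding of s.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of user-perceived characters, using the JDK's character BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s; e + combining acute; three CJK ideographs.
        String[] samples = {"Gr\u00fc\u00dfe", "e\u0301", "\u65e5\u672c\u8a9e"};
        for (String s : samples) {
            System.out.println(s + ": " + utf8Bytes(s) + " bytes, " + graphemes(s) + " graphemes");
        }
    }
}
```

For the three samples this reports 7 bytes and 5 graphemes, 3 bytes and 1 grapheme, and 9 bytes and 3 graphemes, which is why a byte-based function and a grapheme-based function can return different results for the same string.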

Studio Saves UTF-8 Encoded Files

EventFlow applications are saved in an XML format that specifies UTF-8 encoding, which ensures that any Unicode strings you use in expressions in operators are preserved. In addition, all other files created and saved by StreamBase Studio, such as server configuration files, are saved as Unicode-compliant files with UTF-8 encoding.

The UTF-8 encoding of files created by StreamBase Studio occurs by default, and is independent of the Server, Studio, or client configuration settings described in the first three sections of this page.
