Using the Field Serializer Operator

This topic explains how to use the Field Serializer operator and describes the configuration settings you can make in the operator's Properties view.

Introduction

The Field Serializer operator is a Java operator that provides a way to serialize the unused fields of a large tuple into a single blob field, leaving untouched the fields that your application must process. At the end of your processing chain, after processing the fields of interest, you can deserialize the blob field to reconstruct the tuple's unused fields. This effectively compresses a large tuple as it passes through your application, for a potential throughput and performance improvement.

The Field Serializer operator is a member of the Java Operator group in the Palette view in StreamBase Studio. Select the Field Serializer operator from the Insert an Operator or Adapter dialog, which you invoke with one of the following methods:

  • Drag the Adapters, Java Operators token from the Operators and Adapters drawer of the Palette view to the canvas.

  • Click in the canvas where you want to place the operator, and invoke the keyboard shortcut O V.

  • From the top-level menu, invoke Insert>Operator>Java.

From the Insert an Operator or Adapter dialog that opens, select Field Serializer and double-click or press OK.

The Field Serializer operator is expected to be used in matched pairs: one in serialize mode, and another farther downstream in deserialize mode.

For example, you might have a multi-module application that must process a complex data feed whose incoming tuple has 100 fields, although your StreamBase application needs to analyze and modify only 10 of those fields. In many applications, you can simply discard the unused 90 fields with a Map or Filter operator near the beginning of your processing chain. However, there are scenarios where you must preserve the entire incoming tuple throughout, perhaps for consumption by another application downstream that expects the full tuple. Without the Field Serializer operator, the StreamBase application would carry 90 untouched fields through every step of processing.

In those cases, you can use a Field Serializer operator in serialize mode to specify:

  • The field names of the 90 fields your application does not need.

  • The name of a field to be appended to the tuple by the operator.

The Field Serializer operator serializes the specified 90 fields into the appended field. Thus, in this example, the incoming tuple has 100 fields, but the outgoing tuple has 11 fields: the incoming 10 that you did not mark for serialization, plus the appended field that contains the serialized 90 fields.

Farther downstream in your application, specify another Field Serializer operator, this time in deserialize mode. In this case, the incoming tuple must include, at minimum, the blob field that contains the serialized 90 fields. The incoming tuple can also contain the 10 fields processed by your application. The resulting output tuple is the original 100-field tuple, with 10 fields processed and 90 fields unchanged.
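The 100-field example above can be sketched in plain Java. This is only a conceptual illustration and does not use the StreamBase API: the class FieldSerializerSketch, its serialize and deserialize methods, the field names f1 through f100, and the choice of DataOutputStream as the encoding are all assumptions made for the sketch. It also assumes every field is a string and that deserialize mode drops the blob field once its contents are restored.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the operator's actual implementation.
class FieldSerializerSketch {

    // Serialize mode: pack the listed fields into one appended blob field
    // and remove them from the outgoing tuple.
    static Map<String, Object> serialize(Map<String, Object> tuple,
                                         List<String> unused, String blobField) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String name : unused) {
                out.writeUTF((String) tuple.get(name)); // sketch assumes string fields
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> result = new LinkedHashMap<>(tuple);
        result.keySet().removeAll(unused);
        result.put(blobField, bytes.toByteArray());
        return result;
    }

    // Deserialize mode: unpack the blob using the same field list, in the
    // same order, and restore the fields to the tuple.
    static Map<String, Object> deserialize(Map<String, Object> tuple,
                                           List<String> unused, String blobField) {
        Map<String, Object> result = new LinkedHashMap<>(tuple);
        byte[] blob = (byte[]) result.remove(blobField);
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            for (String name : unused) {
                result.put(name, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> original = new LinkedHashMap<>();
        for (int i = 1; i <= 100; i++) {
            original.put("f" + i, "value" + i);
        }
        List<String> unused = new ArrayList<>();
        for (int i = 11; i <= 100; i++) {
            unused.add("f" + i);
        }

        Map<String, Object> slim = serialize(original, unused, "serializedFields");
        System.out.println("Fields after serialize: " + slim.size());

        slim.put("f1", "processed"); // the application works only on the 10 kept fields

        Map<String, Object> restored = deserialize(slim, unused, "serializedFields");
        System.out.println("Fields after deserialize: " + restored.size());
    }
}
```

In this sketch the intermediate tuple carries 11 entries (the 10 kept fields plus the blob), and the restored tuple carries all 100 again. The same list of field names drives both directions, which mirrors the requirement that the two operators of a matched pair agree on the serialized schema.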

Properties View Settings

This section describes the properties you can set for a Field Serializer operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

This section describes the properties on the General tab in the Properties view for the Field Serializer operator.

Name: Use this required field to specify or change the name of this instance of this component. The name must be unique within the current EventFlow module. The name can contain alphanumeric characters, underscores, and escaped special characters. Special characters can be escaped as described in Identifier Naming Rules. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Class name: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this operator starts.

Enable Error Output Port: Select this checkbox to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally, enter text to briefly describe the purpose and function of the component. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

This section describes the properties on the Operator Properties tab in the Properties view for the Field Serializer operator.

Property: Output type
Data Type: Radio buttons
Default: Serialize
Description: Choose Serialize or Deserialize to specify the operation of this operator instance.

Property: Serialized Field
Data Type: string
Default: serializedFields
Description: For operators whose Output type is Serialize, specifies the name of the field this operator is to append to the outgoing tuple. The appended field is always of data type blob, and contains the serialization of the fields as listed in the Edit Schema tab.

For operators whose Output type is Deserialize, specifies the name of the blob field in the incoming tuple that contains a set of fields serialized by an upstream Field Serializer operator. You must specify the expected field contents of the serialized field in the Edit Schema tab.

Edit Schema Tab

For the Field Serializer operator, use the Edit Schema tab in two cases:

  • For operators whose Output type is Serialize, to specify the fields of the incoming tuple to be serialized into the appended field.

  • For operators whose Output type is Deserialize, to specify the schema of the serialized fields in the incoming field designated as the Serialized Field.

The schema of the fields you serialize must exactly match the schema of the fields you deserialize. It is a best practice to use a named schema to make sure the serialize and deserialize schemas are identical.

Note

Typechecking of your application module cannot validate that the schemas of your serialize and deserialize operators are identical. If you have a schema mismatch in the two operators of a matched pair of Field Serializer operators, the error can only be reported at runtime in the Error Log view in Studio, or on the console for command-line launches of StreamBase Server.
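To see why such a mismatch surfaces only at runtime, consider the following plain Java sketch. It does not use the StreamBase API or reproduce the operator's actual encoding or error messages: the class SchemaMismatchSketch and its pack and unpack methods are hypothetical, and the schema is reduced to a count of string fields. The blob is an opaque byte array, so nothing in its type reveals the fields it holds, and the disagreement shows up only when the reader tries to decode it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the operator's actual implementation.
class SchemaMismatchSketch {

    // Writer side: encode each string field value into one byte array.
    static byte[] pack(List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reader side: decode as many string fields as the reader's schema expects.
    static List<String> unpack(byte[] blob, int expectedFieldCount) {
        List<String> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            for (int i = 0; i < expectedFieldCount; i++) {
                values.add(in.readUTF());
            }
        } catch (IOException e) {
            // Reached only while decoding, that is, at runtime.
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] blob = pack(List.of("alpha", "beta")); // writer schema: two fields

        System.out.println(unpack(blob, 2)); // matching schema decodes cleanly

        try {
            unpack(blob, 3); // reader schema expects three fields
        } catch (UncheckedIOException e) {
            System.out.println("Runtime failure: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

Both calls to unpack compile without complaint, but the second fails during decoding because the blob holds fewer fields than the reader expects. Sharing one schema definition between writer and reader, as a named schema does for a matched pair of operators, removes this class of error.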

For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Ports

By default, the Field Serializer operator has one input port and one output port.

In its default configuration, the operator is almost a pass-through operator: it copies the tuple on its input port to its output port and appends one field of type blob, whose name is set in the Serialized Field control on the Operator Properties tab. By default, the appended field does not contain any serialized fields. That is, the operator does not serialize any fields until you list the fields to be serialized in the Edit Schema tab.

You can also add an optional Error Output port, which outputs a StreamBase error tuple for any error thrown by the operator, as described in General Tab.