The XML to Tuple Java operator converts XML-encoded messages to StreamBase tuples. The operator's input port schema has a single string field that passes an XML-encoded message to the operator. The operator parses the XML message and populates tuple fields corresponding to the elements and attributes found in the message. Each XML message enqueued to the operator results in a single tuple emitted on its output port.
The operator's output schema determines the set of fields retrieved from the XML messages. The hierarchy of the fields in the schema must match that of the elements in the XML message. Fields not present in the XML message are set to null in the emitted tuple.
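The schema-driven matching described above can be sketched in Python. This is an illustration only, not the operator's implementation: the nested-dict schema notation, the `populate` and `xml_to_tuple` helpers, and the `Order` message are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema notation: a dict maps each field name to either a
# Python type (leaf field) or a nested dict (tuple field).
SCHEMA = {"Order": {"Id": int, "Customer": {"Name": str, "Email": str}}}

def populate(parent, schema):
    """Fill one dict per schema level; fields with no matching element become None."""
    result = {}
    for name, field_type in schema.items():
        child = parent.find(name)  # direct children only, so the hierarchy must match
        if child is None:
            result[name] = None
        elif isinstance(field_type, dict):
            result[name] = populate(child, field_type)
        else:
            result[name] = field_type(child.text)
    return result

def xml_to_tuple(xml_text, schema):
    """Parse one XML message and return one dict standing in for the output tuple."""
    root = ET.fromstring(xml_text)
    # Wrap the root so that top-level schema fields are matched against the root tag.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    return populate(wrapper, schema)

message = "<Order><Id>7</Id><Customer><Name>Ada</Name></Customer></Order>"
print(xml_to_tuple(message, SCHEMA))
```

In this sketch the `Email` field has no matching element, so it comes back as `None`, mirroring the null behavior described above.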
Repeated XML elements at a given level can be retrieved with a StreamBase field of type list. For example, an XML message <MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt> could be retrieved with a tuple field named MyInt of type list<int>. If, however, the MyInt tuple field is of type int, it would be populated with the value of the first MyInt XML element instance, the subsequent instances would be discarded, and a warning would be emitted for each discarded instance.
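The difference between a list<int> field and an int field can be sketched as follows. This is a rough Python illustration under stated assumptions: `read_field` is a hypothetical helper, the `Msg` wrapper element is added only so the fragment parses as a single document, and warnings are returned as strings rather than logged.

```python
import xml.etree.ElementTree as ET

def read_field(parent, name, is_list):
    """Return (value, notes) for one int or list<int> field (hypothetical helper)."""
    values = [int(e.text) for e in parent.findall(name)]
    if is_list:
        return values, []          # list field: keep every instance
    if not values:
        return None, []            # no matching element: null
    # Scalar field: keep the first instance, note each discarded one.
    notes = [f"discarded extra <{name}> value {v}" for v in values[1:]]
    return values[0], notes

root = ET.fromstring("<Msg><MyInt>1</MyInt><MyInt>2</MyInt><MyInt>3</MyInt></Msg>")
print(read_field(root, "MyInt", is_list=True))
print(read_field(root, "MyInt", is_list=False))
```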
Tuple list fields can be used to retrieve not only repeated leaf XML elements (as in the example above), but also repeated non-leaf elements using fields of type list<tuple>. One of the StreamBase applications shipped with this operator's sample, xml2tuple-datatypes.sbapp, illustrates both scenarios.
Support for XML attributes is controlled through an operator property. When attributes are disabled, the tag of an XML leaf element typically matches the name of the tuple field that receives its value. For example, a tuple field named MyInt of type int receives the value 123 when the XML element <MyInt>123</MyInt> is processed.
When attributes are enabled, an XML element's value and attributes are retrieved through subfields of the tuple whose name matches the XML element's tag. For example, to retrieve the value and attributes of an XML element <MyInt myattr="myattrvalue">123</MyInt>, a tuple field named MyInt of type tuple containing two subfields named _VALUE and _ATTRIBUTES should be present in the output schema. The _VALUE subfield would be of type int and receive 123, while the _ATTRIBUTES subfield would be of type list<tuple<string Name, string Value>> and receive a list with a single tuple whose Name and Value fields would contain myattr and myattrvalue, respectively.
An alternate mechanism is available for retrieving attribute values. Rather than using an _ATTRIBUTES subfield, a subfield with a name matching the attribute name and type compatible with the attribute value can be used. Thus, to retrieve the myattr attribute from the XML element above, a subfield named myattr of type string could be used.
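Both attribute mechanisms can be sketched for the MyInt element above. This is a rough Python illustration, not the operator's code: `element_to_subtuple` is a hypothetical helper, and filling both the _ATTRIBUTES list and a named subfield in the same result is done here only to show the two shapes side by side.

```python
import xml.etree.ElementTree as ET

def element_to_subtuple(elem, value_type, named_attrs=()):
    """Build a dict standing in for the subtuple of one XML element."""
    sub = {
        # Element text, converted to the type of the _VALUE subfield.
        "_VALUE": value_type(elem.text),
        # One Name/Value pair per attribute, mirroring list<tuple<string Name, string Value>>.
        "_ATTRIBUTES": [{"Name": k, "Value": v} for k, v in elem.attrib.items()],
    }
    # Alternate mechanism: a subfield named after the attribute itself.
    for attr in named_attrs:
        sub[attr] = elem.attrib.get(attr)  # None when the attribute is absent
    return sub

elem = ET.fromstring('<MyInt myattr="myattrvalue">123</MyInt>')
print(element_to_subtuple(elem, int, named_attrs=["myattr"]))
```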
When attributes are enabled, a _VALUE subfield must be used to retrieve an XML element's value even if no attributes are to be retrieved from that specific element.
This section describes the properties you can set for an XML to Tuple operator, using the various tabs of the Properties view in StreamBase Studio.
Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Operator: A read-only field that shows the formal name of the operator.
Class: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start with application: If this field is set to Yes (default) or to a module parameter that evaluates to true, this instance of this operator starts as part of the JVM engine that runs this EventFlow module. If this field is set to No or to a module parameter that evaluates to false, the operator instance is loaded with the engine, but does not start until you send an sbadmin resume command, or until you start the component with StreamBase Manager.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
|XSLT File||The name of the file used to transform the incoming XML before the XML to tuple operations. If this entry is empty, no XSLT transform occurs. NOTE: The intermediate XML is displayed in the console when the adapter's log level is set to debug.|
|Element Value Field Name||The name of the tuple subfield that receives an XML element's value; the default is _VALUE.|
|Attribute Values Supported||If enabled (the default), attributes can be retrieved from XML elements, either through a field specified by the Attribute Values Field Name property or through a field with the same name as the XML attribute.|
|Attribute Values Field Name||The name of the tuple subfield that receives an XML element's attributes; the default is _ATTRIBUTES.|
|Date/Time Format||The format to use in converting StreamBase date-time strings to timestamps in parsing XML messages. The format of the format
string is described in the
|Assume Local Time Zone||If enabled, date-time strings containing no timezone specifier are assumed to represent local time. If disabled (the default), date-time strings are assumed to represent GMT.|
|Include Null List Values||If enabled (the default), list values containing nulls are included in the generated tuple.|
|Null List Value Representation||Representation of null list values in XML. The default is
|Use Namespaces||If enabled (default is disabled), the system tries to match namespaces, as well as XML elements, to schema field names. If
disabled, the namespaces in the XML are ignored. For example, if enabled and an XML element is
|Namespace Field Separator||The string value to join the namespace to the field when evaluating against a schema field. For example if the separator is
|Enable Pass Through Fields||When enabled, all fields from the incoming tuple are replicated in the output. When selecting this option, you must specify the XML Field parameter. Default is disabled.|
|XML Field||Identifies the field of the incoming tuple that contains the XML data. This parameter is only used when Enable Pass Through Fields is enabled.|
|Field Name Replacements||Specifies key-value pairs for mapping XML tag elements to field names. The mappings are applied before trying to match XML
elements to schema field names. For example, if a key is
|Enable Status Port||If enabled (the default), status tuples are sent to port 2. If disabled, no status is reported. If disabled after previously being enabled, the arc connected to port 2 is deleted.|
|Log Level||Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.|
|XPath Expressions||A mapped list of XPath expressions to use and their associated schema field names.|
|XPath Namespaces||The XPath namespaces to resolve when parsing the XML. These values must be set in order to use namespaces with XPath operations.|
The XPath can be any valid XPath v1 statement.
Each XPath must be mapped to a top level schema field name from the edit schema tab.
The schema field can be of any type, but note that if the XPath produces multiple node values and the data type is not a list, then the last node is used as the value.
To produce XML strings from the XPath statement, your field name must end with __XML and the data type must be a string or list of strings.
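The XPath mapping rules above can be sketched as follows. This is an approximation in Python: `eval_xpath` is a hypothetical helper, the `Book` message and field names are invented for illustration, and `xml.etree.ElementTree` supports only a subset of XPath 1.0, so it stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

def eval_xpath(root, xpath, field_name, is_list):
    """Evaluate one XPath-to-field mapping (hypothetical helper)."""
    nodes = root.findall(xpath)
    if field_name.endswith("__XML"):
        # Field names ending in __XML receive serialized XML instead of text.
        values = [ET.tostring(n, encoding="unicode") for n in nodes]
    else:
        values = [n.text for n in nodes]
    if is_list:
        return values
    # Non-list field with several matching nodes: the last node wins.
    return values[-1] if values else None

root = ET.fromstring("<Book><Author>A</Author><Author>B</Author></Book>")
print(eval_xpath(root, "./Author", "Authors", is_list=True))
print(eval_xpath(root, "./Author", "LastAuthor", is_list=False))
print(eval_xpath(root, "./Author", "Author__XML", is_list=False))
```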
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
As shown in the diagram below, the operator has one input port and two output ports to communicate with the surrounding application.
The XML to Tuple operator's ports are used as follows:
XMLIn: The XML message to be converted to a tuple. The XMLIn port has the following schema:
XML, string: The contents of the XML message to be converted.
TupleOut: This output port contains one or more top-level fields, each of which receives the result of parsing an XML message with a different top-level tag. For example, a TupleOut schema with a top-level field MyString could be used to parse the XML message <MyString>This is a string</MyString>. If Enable Pass Through Fields is checked, this port also contains all the fields from the input port.
Status: A tuple is emitted on this port when an attempt to convert an XML message to a tuple fails. The Status port has the following schema:
type, string: Contains the following value describing the type of event that occurred:
action, string: Contains the following value indicating the conversion failed:
object, string: Contains a string representation of the input tuple.
message, string: Contains a human-readable description of the conversion failure.
time, timestamp: Contains the time of the conversion failure.
inputTuple, tuple: Contains a copy of the input tuple.
The XML to Tuple operator uses typecheck messages to help you configure the operator in your StreamBase application. In particular, the operator generates typecheck messages when:
The XMLIn port schema does not contain exactly one field of type string and Enable Pass Through Fields is unchecked.
The TupleOut port schema contains a field of type list<list<?>> (which is not allowed).
The TupleOut port schema contains an Element Value Field (default name _VALUE) of type list (which is also not allowed).
The TupleOut port schema contains an Attribute Values Field (default name _ATTRIBUTES) that is not of type list<tuple<string Name, string Value>>.
The Attribute Values Supported property is enabled and no Attribute Values Field Name is specified.
The Element Value Field Name and Attribute Values Field Name properties contain the same non-empty value.
The TupleOut port schema contains at least one timestamp field and no Date/Time Format string is specified.
The Enable Pass Through Fields property is checked and no XML Field is specified.
A value is specified in the Field Name Replacements property but no key.
An invalid Date/Time Format string is specified.
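A few of the conditions listed above can be expressed as a small validator. This is a loose Python sketch, not the operator's typecheck logic: the `typecheck` function, the flat dict of field names to type strings, and the message texts are all hypothetical.

```python
ATTR_TYPE = "list<tuple<string Name, string Value>>"

def typecheck(fields, value_name="_VALUE", attrs_name="_ATTRIBUTES", datetime_format=None):
    """Return a list of problems for a flat {field name: type string} schema."""
    errors = []
    if value_name and value_name == attrs_name:
        errors.append("Element Value and Attribute Values field names are identical")
    for name, ftype in fields.items():
        compact = ftype.replace(" ", "")
        if "list<list<" in compact:
            errors.append(f"{name}: a list of lists is not allowed")
        if name == value_name and compact.startswith("list"):
            errors.append(f"{name}: the element value field cannot be a list")
        if name == attrs_name and ftype != ATTR_TYPE:
            errors.append(f"{name}: must be of type {ATTR_TYPE}")
        if ftype == "timestamp" and not datetime_format:
            errors.append(f"{name}: timestamp field needs a Date/Time Format")
    return errors

print(typecheck({"MyInt": "int", "Matrix": "list<list<int>>", "When": "timestamp"}))
```

Returning every problem found, rather than stopping at the first, is a design choice of this sketch; the source does not say how the operator orders or batches its typecheck messages.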
The operator generates messages on the status port when an attempt to convert an XML message to a tuple fails.