Regular Expression Socket Reader Input Adapter

Introduction

The TIBCO StreamBase® Regular Expression Socket Reader input adapter allows StreamBase applications to read custom-formatted text from a TCP socket, parsed with regular expressions. It closely resembles the Regular Expression File Reader adapter.

Unlike the Regular Expression File Reader adapter, though, this socket adapter reads input data from a TCP socket connected to a specified external address. Also unlike the file reader, this adapter's input is open-ended and naturally timed: data arrives whenever the remote end sends it. For that reason, the adapter has no repetition or timing properties.

Properties

Property | StreamSQL Property or Type | Default | Description
--- | --- | --- | ---
Host Name | HostName | none | A string specifying the host name or IP address to connect to.
Port | Port | none | An integer specifying the TCP port to connect to.
Use Default Charset | check box | Selected | If selected, the Java platform's default character set is used. If cleared, a valid character set name must be specified for the Character Set property.
Character Set | string | none | The name of the character set encoding that the adapter uses to read input.
Format | Format | none | A string specifying the regular expression used to parse the input. This must be a Java regular expression as expected by the java.util.regex.Pattern class. For example, ([^,]*),([^,]*) could be used to parse simple, two-field CSV input.
Drop Mismatches | DropMismatches | checked (true) | If this check box is selected, records that do not match the regular expression in the Format field are ignored and the next record is immediately examined. Otherwise, a tuple with all fields set to null is emitted when a non-matching input line is encountered.
Timestamp Format | TimestampFormat | MM/dd/yyyy hh:mm:ss aa | Specifies the format used to parse timestamp fields extracted from the input. Specify a string in the form expected by the java.text.SimpleDateFormat class described in the Oracle Java Platform SE reference documentation.
Log Level | drop-down list | INFO | Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.
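
To make the Format and Timestamp Format values more concrete, the following standalone Java sketch applies the example expression and the default timestamp pattern using the same JDK classes the property descriptions name. It is only an illustration: the class and method names are hypothetical, and whole-line matching and the US locale are assumptions, not documented adapter behavior.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the StreamBase API.
class FormatSketch {
    // The two-field CSV expression given as the Format example.
    static final Pattern FORMAT = Pattern.compile("([^,]*),([^,]*)");

    // The default value of the Timestamp Format property.
    static final String TIMESTAMP_FORMAT = "MM/dd/yyyy hh:mm:ss aa";

    /** Returns one string per capture group, or null if the line does not match. */
    static String[] split(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {          // assumption: the whole line must match
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);   // capture groups are numbered from 1
        }
        return fields;
    }

    /** Parses a timestamp string, returning null if it does not fit the pattern. */
    static Date parseTimestamp(String text) {
        // Assumption: US locale, so the AM/PM marker is read as "AM" or "PM".
        SimpleDateFormat fmt = new SimpleDateFormat(TIMESTAMP_FORMAT, Locale.US);
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            return null;
        }
    }
}
```

With these definitions, FormatSketch.split("IBM,101.5") yields the two fields IBM and 101.5, while a three-field line such as a,b,c does not match at all because neither group may contain a comma.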

Use the Edit Schemas tab to specify the schema to output from the adapter.

Typechecking and Error Handling

Typechecking fails if the Format property contains an invalid regular expression, if the number of fields in the output schema does not match the number of capture groups in the Format property, or if the Timestamp Format is malformed.
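
The three typecheck conditions above can be approximated outside StreamBase with plain JDK calls. The sketch below is not the adapter's implementation; the class name, method name, and error messages are hypothetical, and the schema is reduced to a simple field count.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the adapter's typecheck step.
class TypecheckSketch {
    /** Returns a list of problems; an empty list means all three checks passed. */
    static List<String> check(String format, int schemaFieldCount, String timestampFormat) {
        List<String> errors = new ArrayList<>();
        try {
            // groupCount() reports the number of capturing groups in the pattern.
            int groups = Pattern.compile(format).matcher("").groupCount();
            if (groups != schemaFieldCount) {
                errors.add("schema has " + schemaFieldCount
                        + " fields but Format has " + groups + " capture groups");
            }
        } catch (PatternSyntaxException e) {
            errors.add("invalid Format: " + e.getDescription());
        }
        try {
            // The constructor rejects patterns containing illegal pattern letters.
            new SimpleDateFormat(timestampFormat);
        } catch (IllegalArgumentException e) {
            errors.add("invalid Timestamp Format: " + e.getMessage());
        }
        return errors;
    }
}
```

For example, checking the expression ([^,]*),([^,]*) against a three-field schema reports a single capture-group mismatch, because the expression defines only two groups.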

Malformed records (lines that do not match the Format regular expression) cause the adapter either to ignore the input line or to emit a tuple with all fields set to null, depending on the value of the Drop Mismatches property.

If a field extracted from the input cannot be coerced into the type specified for that field in the schema (for example, if "abc" is extracted where an int field is expected), that field is set to null in the output tuple. Likewise, if a capture group in the Format expression fails to match, but the overall regular expression does match, the corresponding field in the output tuple is set to null.
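
The mismatch and coercion rules above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the adapter's code: the class is hypothetical, the output schema is assumed to be (string, int), and a tuple is modeled as a plain Object array.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of per-line handling for an assumed (string, int) schema.
class RecordSketch {
    static final Pattern FORMAT = Pattern.compile("([^,]*),([^,]*)");

    /**
     * Returns null when the record is dropped; otherwise returns a two-element
     * tuple whose fields may individually be null.
     */
    static Object[] toTuple(String line, boolean dropMismatches) {
        Matcher m = FORMAT.matcher(line);
        if (!m.matches()) {
            // Malformed record: skip it, or emit a tuple with every field null.
            return dropMismatches ? null : new Object[2];
        }
        return new Object[] { m.group(1), toInt(m.group(2)) };
    }

    /** Coerces text to an Integer, yielding null when coercion is not possible. */
    static Integer toInt(String text) {
        if (text == null) {
            return null;   // the capture group did not take part in the match
        }
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) {
            return null;   // for example, "abc" where an int is expected
        }
    }
}
```

Under these assumptions, the line IBM,abc still produces a tuple, with the first field set to IBM and the second set to null, whereas a line with no comma is either skipped or emitted as an all-null tuple depending on the dropMismatches flag.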

Suspend/Resume Behavior

On suspend, this adapter closes its input socket.

On resumption, it reconnects its socket and continues reading tuples from it.

This adapter does not leave its socket open during suspend because the input source is naturally timed, so the input source itself cannot be paused. Leaving the socket open could lead to buffering problems, ultimately causing the socket to close with an error.
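
The close-on-suspend, reconnect-on-resume pattern described above might look roughly like the following in plain Java. The class and its methods are hypothetical and do not reflect the StreamBase adapter API; passing a null character set name is used here as a stand-in for the Use Default Charset option.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.Charset;

// Hypothetical sketch of the socket lifecycle; not the adapter's implementation.
class SocketLifecycleSketch {
    private final String host;
    private final int port;
    private final Charset charset;
    private Socket socket;
    private BufferedReader reader;

    SocketLifecycleSketch(String host, int port, String charsetName) {
        this.host = host;
        this.port = port;
        // Assumption: a null name means "use the platform default character set".
        this.charset = (charsetName == null)
                ? Charset.defaultCharset()
                : Charset.forName(charsetName);
    }

    /** Opens a fresh connection; does nothing if already connected. */
    synchronized void resume() {
        if (socket != null) {
            return;
        }
        try {
            Socket s = new Socket(host, port);
            reader = new BufferedReader(new InputStreamReader(s.getInputStream(), charset));
            socket = s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the connection so that no unread data accumulates while suspended. */
    synchronized void suspend() {
        if (socket == null) {
            return;
        }
        try {
            socket.close();   // closing the socket also closes its input stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            socket = null;
            reader = null;
        }
    }

    synchronized boolean isConnected() {
        return socket != null;
    }

    /** Reads one line, or returns null when suspended or at end of stream. */
    String readLine() {
        BufferedReader r;
        synchronized (this) {
            r = reader;
        }
        if (r == null) {
            return null;
        }
        try {
            return r.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each call to resume() creates a new connection rather than reusing the old one, which mirrors the reconnect behavior described above: data sent by the remote end while the reader is suspended is never buffered on a half-idle socket.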