The TIBCO StreamBase® CSV File Reader is an embedded adapter that reads comma-separated value (CSV) files.
An embedded adapter is an adapter that runs in the same process as StreamBase Server. The CSV File Reader reads records from a CSV file, creates tuples from these records, then sends these tuples to the operator downstream from it in its StreamBase application. A record typically corresponds to one line in the CSV file. However, a record that contains a quoted field can span more than one line.
The CSV File Reader is similar to an input stream that supplies its own input from a CSV file. As with an input stream, a schema needs to be specified for the CSV File Reader. The schema used by the CSV File Reader is specified in the Edit Schema tab of the Properties view in StreamBase Studio.
An embedded adapter that reads from a CSV file differs from an external data source in that it consumes its input file as rapidly as it can. The rate at which it consumes records and produces tuples is limited only by how fast it can read records from disk and create tuples from them. An external data source typically does not behave this way, and this behavior may not be what you want. The Period property of the CSV File Reader governs the rate at which it consumes records: the period is the amount of time the CSV File Reader pauses between records. That is, the CSV File Reader reads one record, processes it to completion, pauses for the specified period, and then reads the next record.
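The read, process, and pause cycle can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the adapter's implementation: the class PacedReader and its run method are hypothetical names, records are modeled as plain strings, and processing a record is reduced to handing it to a callback.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the Period behavior; not a StreamBase API.
final class PacedReader {
    private PacedReader() {}

    /**
     * Hands each record to emit, then pauses for periodMs milliseconds
     * before taking the next record. A period of 0 means no pause.
     * Returns the number of records emitted.
     */
    static int run(List<String> records, long periodMs, Consumer<String> emit) {
        int emitted = 0;
        for (String record : records) {
            emit.accept(record); // process the record to completion
            emitted++;
            if (periodMs > 0) {
                try {
                    Thread.sleep(periodMs); // pause before reading the next record
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop
                    break;
                }
            }
        }
        return emitted;
    }
}
```

With a period of 0 the loop runs as fast as the records can be handed over, which corresponds to the unthrottled behavior described above.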
The name of the CSV file is specified as a property of the CSV File Reader. If you use the File Name field without the Start Control Port option, the specified file must exist in the current project folder in StreamBase Studio, or in a referenced project's folder. If you use the File Name field in conjunction with the Start Control Port option, you can specify a relative or absolute path to the CSV file. If you specify a relative path, the adapter searches for the named file in the directory specified in the StreamBase Server configuration file.

In the <global> section of that file, look for the operator-resource-search parameter, which is commented out by default. Uncomment the element and specify a path. For example:
<global> <operator-resource-search directory="/home/sbuser/mysbapps/resources"/> </global>
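A relative file name might be resolved against the configured search directory roughly as follows. This Java sketch assumes simple path joining; ResourcePathResolver is a hypothetical helper, and the server may apply additional lookup rules (for example, resource lookup as controlled by the Read As Resource property).

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of relative-path resolution; not a StreamBase API.
final class ResourcePathResolver {
    private ResourcePathResolver() {}

    /** Absolute names are used as given; relative names are joined to searchDirectory. */
    static Path resolve(String fileName, String searchDirectory) {
        Path requested = Paths.get(fileName);
        if (requested.isAbsolute()) {
            return requested.normalize();
        }
        return Paths.get(searchDirectory).resolve(requested).normalize();
    }
}
```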
Practical considerations may limit the size of a CSV file, so it may not be feasible to provide all the desired data in a single file. One solution is to iterate over the same CSV file several times, using the Repeat property. If Repeat is 0, the CSV File Reader iterates over the CSV file indefinitely.
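The Repeat semantics can be sketched like this, assuming the records of one pass are already in memory. RepeatingReader is hypothetical; the callback returning false is an invented way to stop an indefinite run, standing in for the application shutting the adapter down.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the Repeat property; not a StreamBase API.
final class RepeatingReader {
    private RepeatingReader() {}

    /**
     * Emits the records repeat times, or indefinitely when repeat == 0.
     * emit returns false to stop early (how this sketch ends an indefinite run).
     * Returns the total number of records emitted.
     */
    static long run(List<String> records, int repeat, Predicate<String> emit) {
        if (repeat < 0) {
            throw new IllegalArgumentException("repeat must be >= 0");
        }
        if (records.isEmpty()) {
            return 0; // avoid spinning forever on an empty file
        }
        long emitted = 0;
        for (long pass = 0; repeat == 0 || pass < repeat; pass++) {
            for (String record : records) {
                emitted++;
                if (!emit.test(record)) {
                    return emitted;
                }
            }
        }
        return emitted;
    }
}
```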
Note that you can either import the CSV file into your StreamBase Studio project or create and edit it in Studio. To create a new file, open the New File dialog and specify the file's name and project. A new, empty file opens in a text editor, where you can edit and save it.
The CSV File Reader allows you to specify a string that, when encountered in an incoming CSV field, is translated into a null tuple field value. The default string is null, but you can specify any string in the NULL String property.
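A minimal sketch of the null-string substitution follows, assuming an exact, case-sensitive comparison of the raw field text. NullStringMapper is a hypothetical name, and treating an empty setting as "use the default" is an assumption of this sketch.

```java
// Hypothetical sketch of NULL String handling; not a StreamBase API.
final class NullStringMapper {
    static final String DEFAULT_NULL_STRING = "null";

    private NullStringMapper() {}

    /** Returns null when rawField equals the configured null string, else rawField. */
    static String map(String rawField, String nullString) {
        // Assumption: an unset or empty property falls back to the default "null".
        String marker = (nullString == null || nullString.isEmpty())
                ? DEFAULT_NULL_STRING
                : nullString;
        return marker.equals(rawField) ? null : rawField; // exact, case-sensitive match
    }
}
```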
The CSV File Reader can read files compressed in the zip or gzip formats, automatically extracting the file to be read from the compressed archive. For this to work, the compressed file must have the extension .zip, .gz, or .bz2, and the adapter expects to find exactly one CSV file inside each compressed file. This feature allows the adapter to read market data files that a market data vendor provides in compressed format, without needing to uncompress the files in advance.
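Opening the input by extension might look like the following sketch, which uses the JDK's java.util.zip classes. CompressedCsvOpener is hypothetical. The JDK has no built-in bzip2 decoder, so this sketch rejects .bz2 files rather than guessing at a third-party API, and it reads the first file entry of a zip archive instead of verifying that exactly one CSV file is present.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch: choose a decompressor from the file extension.
final class CompressedCsvOpener {
    private CompressedCsvOpener() {}

    static InputStream open(Path file) throws IOException {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        InputStream raw = new BufferedInputStream(Files.newInputStream(file));
        try {
            if (name.endsWith(".gz")) {
                return new GZIPInputStream(raw);
            }
            if (name.endsWith(".zip")) {
                ZipInputStream zip = new ZipInputStream(raw);
                ZipEntry entry = zip.getNextEntry();
                while (entry != null && entry.isDirectory()) {
                    entry = zip.getNextEntry(); // skip directory entries
                }
                if (entry == null) {
                    throw new IOException("no file inside " + file);
                }
                return zip; // positioned at the start of the first file entry
            }
            if (name.endsWith(".bz2")) {
                throw new IOException("bzip2 is not supported by the JDK; not handled in this sketch");
            }
            return raw; // plain, uncompressed CSV
        } catch (IOException | RuntimeException e) {
            try {
                raw.close();
            } catch (IOException suppressed) {
                e.addSuppressed(suppressed);
            }
            throw e;
        }
    }
}
```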
The CSV File Reader considers lines starting with the number sign (#), also known as the hash character, to be comments and discards them.
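How comment lines and quoted multi-line records could be assembled into logical records is sketched below. CsvRecordAssembler is hypothetical; the sketch assumes that a record continues on the next line whenever it contains an odd number of quote characters so far, and that a # line appearing inside an open quoted field is data, not a comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build logical CSV records from physical lines.
final class CsvRecordAssembler {
    private CsvRecordAssembler() {}

    static List<String> records(BufferedReader in, char quote) throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder pending = null; // non-null while inside a multi-line record
        String line;
        while ((line = in.readLine()) != null) {
            if (pending == null) {
                if (line.startsWith("#")) {
                    continue; // comment line, discarded
                }
                pending = new StringBuilder(line);
            } else {
                pending.append('\n').append(line); // keep the embedded line break
            }
            if (quotesBalanced(pending, quote)) {
                out.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            out.add(pending.toString()); // unterminated quote at end of file
        }
        return out;
    }

    // A doubled quote ("") adds two characters, so it never changes the balance.
    private static boolean quotesBalanced(CharSequence s, char quote) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == quote) {
                count++;
            }
        }
        return count % 2 == 0;
    }
}
```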
This section describes the properties you can set for this adapter, using the various tabs of the Properties view in StreamBase Studio.
In the tables in this section, the Property column shows each property name as found in the one or more adapter properties tabs of the Properties view for this adapter.
Name: Use this required field to specify or change the name of this instance of this component, which must be unique in the current EventFlow module. The name must contain only alphabetic characters, numbers, and underscores, and no hyphens or other special characters. The first character must be alphabetic or an underscore.
Adapter: A read-only field that shows the formal name of the adapter.
Class name: Shows the fully qualified class name that implements the functionality of this adapter. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.
Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this adapter starts.
Enable Error Output Port: Select this check box to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.
Description: Optionally enter text to briefly describe the component's purpose and function. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.
Property | Data Type | Default | Description |
---|---|---|---|
File Name | drop-down list | None | The name of the CSV file to read, without any path. The specified file must be in the current project folder, or in a referenced project's folder. You must enter a file name in this field, enable the Start Control Port, or both. If Start Control Port is disabled, the file specified in this field is the only file read by the current adapter instance. If Start Control Port is enabled, a file specified in this field is the default file to be read, as described below. This adapter automatically uncompresses the input file before attempting to interpret the CSV content if the input file was compressed with Zip and has the .zip extension. |
Read As Resource | check box | Selected | If selected and the given path is not absolute, the file is resolved as a resource file. |
Use Default Charset | check box | Selected | If selected, the adapter uses the Java platform's default character set. If cleared, you must specify a valid character set name in the Character Set property. |
Character Set | string | None | The name of the character set encoding that the adapter uses to read its input. |
Start Control Port | check box | Cleared | Select this check box to give this adapter instance an input port that you can use to control which CSV files to read, and in which order. The input schema for the Start Control Port must have at least one field of type string. You can optionally define a more complex schema for this port for use with the Map Control Port to Event Port option; in this case, the first field must be of type string, and the second field, used for the user, must also be of type string. The schema is typechecked as you define it. If the File Name property is empty, the adapter begins reading when it receives a control tuple on this port. Specify the full, absolute path to the CSV file to be read in the first field of the tuple, and optionally specify the user in the second field. There is no need to surround the full path with quotes if the path contains spaces. If the File Name property specifies a file name, there are two cases: |
Start Event Port | check box | Cleared |
Select this check box to create an output port that emits an informational tuple each time a CSV file is opened or closed. The informational tuple schema has five fields:
For a file open event, the event port tuple's
For a file close event,
If you enable the Map Control Port to Event
Port option below, the event port tuple also includes a sixth
field named When running in Studio, remember that tuples from more than one output port may appear in the Output Streams view in a different order than they are emitted from the adapter. Thus, you may see the Close event appear on the output of this event port while data tuples are still displaying. |
Map Control Port to Event Port | check box | Cleared |
Select this check box to pass all information received on the control
input port to the event output port. When enabled, this property adds a
field of type tuple named |
Tail Mode | check box | Cleared | Select this check box to process records as they are appended to the CSV file. Newly appended records are not emitted until the reader detects the line-ending character appropriate for the operating system. |
Ignore Existing Records | check box | Selected | Select this check box to ignore existing records in the CSV file when in tail mode. |
Tail Update Interval | int | 1000 | The time, in milliseconds, between checks for updates to the CSV file when in tail mode. |
Log Level | drop-down list | INFO | Controls the level of verbosity the adapter uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level will be used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE. |
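Tail mode with Ignore Existing Records and Tail Update Interval can be approximated by a poller that remembers its byte offset and returns only complete lines. TailPoller is a hypothetical class; the sketch assumes UTF-8 content, uses System.lineSeparator() as the line ending appropriate for the operating system, and, as an added assumption, starts over if the file shrinks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of tail-mode polling; not a StreamBase API.
final class TailPoller {
    private static final byte[] EOL = System.lineSeparator().getBytes(StandardCharsets.UTF_8);

    private final Path file;
    private long offset;                  // bytes of the file already consumed
    private byte[] pending = new byte[0]; // bytes after the last complete line

    TailPoller(Path file, boolean ignoreExistingRecords) throws IOException {
        this.file = file;
        this.offset = ignoreExistingRecords ? Files.size(file) : 0L;
    }

    /** Returns the complete lines appended since the previous call. */
    List<String> poll() throws IOException {
        byte[] fresh;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < offset) { // file was truncated: start over (assumption)
                offset = 0;
                pending = new byte[0];
            }
            fresh = new byte[Math.toIntExact(length - offset)];
            raf.seek(offset);
            raf.readFully(fresh);
            offset = length;
        }
        byte[] data = Arrays.copyOf(pending, pending.length + fresh.length);
        System.arraycopy(fresh, 0, data, pending.length, fresh.length);

        List<String> lines = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i + EOL.length <= data.length) {
            if (endsLineAt(data, i)) {
                lines.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                i += EOL.length;
                start = i;
            } else {
                i++;
            }
        }
        pending = Arrays.copyOfRange(data, start, data.length); // incomplete line waits
        return lines;
    }

    private static boolean endsLineAt(byte[] data, int i) {
        for (int k = 0; k < EOL.length; k++) {
            if (data[i + k] != EOL[k]) {
                return false;
            }
        }
        return true;
    }
}
```

A caller would invoke poll() once per Tail Update Interval, for example by sleeping 1000 ms between calls. Buffering bytes rather than characters keeps a multibyte UTF-8 character that is split across two appends from being decoded incorrectly.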
Property | Data Type | Default | Description |
---|---|---|---|
Field Delimiter | string | , (comma) | The delimiter used to separate tokens in the input file. Control characters can be entered as `&#ddd;`, where ddd is the character's ASCII value. For example, use `&#09;` for a tab character. As a special exception, you can also enter `\t` in this field to represent a tab delimiter. |
String Quote Character | string | " (double quote) | The optional quote character used in pairs to delimit string constants. |
Timestamp Format | string | MM/dd/yyyy HH:mm:ss aa |
The string format used to represent timestamp fields extracted from the
input file. The default and ideal is the form expected by the
If a timestamp value is read that does not match the specified format
string, the entire record is discarded and a WARN message appears on the
console that includes the text |
Lenient Parsing | boolean | Selected | Set this to true to parse timestamp values that do not conform to the specified format by using default formats. |
NULL String | string | None | The string that, if encountered in a CSV field when reading a file, is translated as a null value for the corresponding tuple field. If unspecified, the default string is null. You can designate any string as the null value string. |
Preserve Whitespace | boolean | Cleared | Set this to true to preserve leading and trailing white space in string fields. |
Header Type | drop-down list | No header |
The type of header used in the CSV file. Choose one of the following:
|
Incomplete Records | radio button | Populate with nulls | Specifies what the adapter does when it reads a record with fewer than the required number of fields. |
Discard Empty Records | check box | Selected | Handles the special case of empty lines. Leave this selected to emit tuples for rows that contain fields but not for empty lines. Clear it to emit an empty tuple for each empty line. |
Log Warning | check box | Cleared | Select this check box to log warning messages when incomplete records are encountered. If cleared, no warning messages are logged for records with fewer than the required number of fields. |
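The Field Delimiter escapes, quoting with the String Quote Character, and the Populate with nulls policy for incomplete records can be illustrated with the following Java sketch. CsvFieldSketch and its methods are hypothetical; treating a doubled quote inside a quoted field as a literal quote is a common CSV convention assumed here, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of delimiter decoding, field splitting, and null padding.
final class CsvFieldSketch {
    private static final Pattern CHAR_REF = Pattern.compile("&#(\\d{1,3});");

    private CsvFieldSketch() {}

    /** Decodes the Field Delimiter setting: "\t", "&#ddd;", or one literal character. */
    static char decodeDelimiter(String setting) {
        if ("\\t".equals(setting)) {
            return '\t';
        }
        Matcher m = CHAR_REF.matcher(setting);
        if (m.matches()) {
            return (char) Integer.parseInt(m.group(1));
        }
        if (setting.length() != 1) {
            throw new IllegalArgumentException("delimiter must be one character: " + setting);
        }
        return setting.charAt(0);
    }

    /** Splits one record; delimiters inside quotes are data, "" inside quotes is a quote. */
    static List<String> split(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == quote && i + 1 < record.length() && record.charAt(i + 1) == quote) {
                    cur.append(quote); // escaped quote
                    i++;
                } else if (c == quote) {
                    inQuotes = false; // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == quote) {
                inQuotes = true;
            } else if (c == delimiter) {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    /** "Populate with nulls": pad a short record to the schema width. */
    static List<String> padWithNulls(List<String> fields, int schemaWidth) {
        List<String> out = new ArrayList<>(fields);
        while (out.size() < schemaWidth) {
            out.add(null);
        }
        return out;
    }
}
```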
Property | Data Type | Default | Description |
---|---|---|---|
Repeat | int | 1 | The number of times to iterate over the CSV file. 0 specifies iterating indefinitely. Note that when Repeat is set to iterate indefinitely, a new file sent on the control port to be read is not picked up. |
Emit Policy | radio button | Periodic | Specifies whether to emit tuples with a regular period or based on a field in the data. Specify Periodic, the default setting, to use the Period property below; in this case, the two Time field properties are dimmed. Specify Field based to use a field in the output tuple to control the tuple emission rate; in this case, the Period property is dimmed. Specify the field to use in the Time field property, and specify how to use that field with a selection in the Time field meaning property. |
Period | int | 0 | Active only when Emit Policy is Periodic. Specifies the time, in milliseconds, to wait between processing records. |
Time field meaning | drop-down list | Emission times relative to the first record. | Active only when Emit Policy is Field based. In the drop-down list, select one of the following options to specify how to use the time field named in the next property. |
Time field | string | none | Active only when Emit Policy is Field based. Specifies the name of a field in the output tuple whose values are used to control the tuple emission rate. |
Capture Transform Strategy | radio button | FLATTEN | The strategy to use when transforming capture fields for this operator: FLATTEN or NEST. |
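For the Emission times relative to the first record option, the wait before each tuple could be computed as in this sketch. FieldBasedPacer is hypothetical; it assumes time field values are in milliseconds and takes the current time as a parameter (normally System.nanoTime()) so the arithmetic is easy to check.

```java
// Hypothetical sketch of field-based pacing relative to the first record.
final class FieldBasedPacer {
    private Long firstRecordTime; // time field value of the first record, in ms
    private long startNanos;      // wall-clock time when the first record was emitted

    /** Returns how many ms to wait before emitting a record whose time field is recordTimeMs. */
    long delayFor(long recordTimeMs, long nowNanos) {
        if (firstRecordTime == null) {
            firstRecordTime = recordTimeMs;
            startNanos = nowNanos;
            return 0; // the first record is emitted immediately
        }
        long targetOffsetMs = recordTimeMs - firstRecordTime;  // offset in the data
        long elapsedMs = (nowNanos - startNanos) / 1_000_000;  // offset in real time
        return Math.max(0, targetOffsetMs - elapsedMs);        // never wait a negative time
    }
}
```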
Use the Edit Schema tab to specify the schema of the output tuple for this adapter. For general instructions on using the Edit Schema tab, see the Properties: Edit Schema Tab section of the Defining Input Streams page.
Use the settings in this tab to allow this operator or adapter to start and stop based on conditions that occur at runtime in a cluster with more than one node. During initial development of the fragment that contains this operator or adapter, and for maximum compatibility with TIBCO Streaming releases before 10.5.0, leave the Cluster start policy control in its default setting, Start with module.
Cluster awareness is an advanced topic that requires an understanding of StreamBase Runtime architecture features, including clusters, quorums, availability zones, and partitions. See Cluster Awareness Tab Settings on the Using Cluster Awareness page for instructions on configuring this tab.
Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.
Caution
Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.
Typechecking fails if the schema does not have at least one field, if the Field Delimiter is not a single-character string, if the String Quote Character is longer than one character, or if the Timestamp Format is malformed. The File Name field fails to typecheck only if it is blank and you have not enabled the Start Control Port option.
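The typecheck rules listed above can be restated as a small validation routine. CsvReaderConfigCheck is a hypothetical helper, not Studio's typechecker; it assumes the delimiter has already been decoded from any \t or &#ddd; escape, and it treats a timestamp pattern as malformed when java.text.SimpleDateFormat rejects it.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;

// Hypothetical restatement of the typecheck rules; not a StreamBase API.
final class CsvReaderConfigCheck {
    private CsvReaderConfigCheck() {}

    /** Returns one message per failed rule; an empty list means the settings pass. */
    static List<String> check(int schemaFieldCount, String delimiter, String quoteChar,
                              String timestampFormat, String fileName, boolean startControlPort) {
        List<String> errors = new ArrayList<>();
        if (schemaFieldCount < 1) {
            errors.add("schema must have at least one field");
        }
        if (delimiter == null || delimiter.length() != 1) {
            errors.add("Field Delimiter must be a single character");
        }
        if (quoteChar != null && quoteChar.length() > 1) {
            errors.add("String Quote Character must be at most one character");
        }
        try {
            new SimpleDateFormat(timestampFormat); // throws on an illegal pattern
        } catch (IllegalArgumentException | NullPointerException e) {
            errors.add("Timestamp Format is malformed");
        }
        if ((fileName == null || fileName.isBlank()) && !startControlPort) {
            errors.add("File Name is blank and Start Control Port is disabled");
        }
        return errors;
    }
}
```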
A warning is emitted if the File Name property is empty and a null control tuple is received on the Start Control Port.
On suspend, the CSV File Reader adapter finishes processing the current record, outputs the tuple, and then pauses. The input file remains open and the adapter retains its position in the file. The adapter stays paused until it is either shut down or resumed.
On resumption, the CSV File Reader adapter continues processing with the next record in the input file.
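The suspend and resume behavior amounts to a gate checked between records, so the current record always completes and the reader keeps its position. SuspendGate and readLoop are hypothetical names in this Java sketch, which uses an iterator to stand in for the open input file.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical sketch of suspend/resume between records; not a StreamBase API.
final class SuspendGate {
    private boolean suspended;

    synchronized void suspend() {
        suspended = true;
    }

    synchronized void resume() {
        suspended = false;
        notifyAll(); // wake a reader blocked in awaitRunnable()
    }

    synchronized boolean isSuspended() {
        return suspended;
    }

    /** Called by the reader thread between records; blocks while suspended. */
    synchronized void awaitRunnable() throws InterruptedException {
        while (suspended) {
            wait();
        }
    }

    /** Emits records one at a time, pausing only at record boundaries. */
    static void readLoop(Iterator<String> records, Consumer<String> emit, SuspendGate gate)
            throws InterruptedException {
        while (records.hasNext()) {
            gate.awaitRunnable();        // pause here, between records, while suspended
            emit.accept(records.next()); // the iterator keeps its position across a pause
        }
    }
}
```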