Using the Descriptive Statistics Operator

Introduction

The Spotfire Streaming Descriptive Statistics Operator provides basic statistical information for each specified variable, including measures of central tendency (for example, the mean) and of dispersion (for example, the standard deviation).

Descriptive Statistics Properties

This section describes the properties you can set for this operator, using the various tabs of the Properties view in StreamBase Studio.

General Tab

Name: Use this required field to specify or change the name of this instance of this component. The name must be unique within the current EventFlow module. The name can contain alphanumeric characters, underscores, and escaped special characters. Special characters can be escaped as described in Identifier Naming Rules. The first character must be alphabetic or an underscore.

Operator: A read-only field that shows the formal name of the operator.

Class name: Shows the fully qualified class name that implements the functionality of this operator. If you need to reference this class name elsewhere in your application, you can right-click this field and select Copy from the context menu to place the full class name in the system clipboard.

Start options: This field provides a link to the Cluster Aware tab, where you configure the conditions under which this operator starts.

Enable Error Output Port: Select this checkbox to add an Error Port to this component. In the EventFlow canvas, the Error Port shows as a red output port, always the last port for the component. See Using Error Ports to learn about Error Ports.

Description: Optionally, enter text to briefly describe the purpose and function of the component. In the EventFlow Editor canvas, you can see the description by pressing Ctrl while the component's tooltip is displayed.

Operator Properties Tab

Log Level: Controls the level of verbosity the operator uses to send notifications to the console. This setting can be higher than the containing application's log level. If set lower, the system log level is used. Available values, in increasing order of verbosity, are: OFF, ERROR, WARN, INFO, DEBUG, TRACE.

Lower quartile: If enabled, computes the lower quartile (the value for which 25% of the observations lie below and 75% lie above that value) for each specified variable.

Maximum: If enabled, computes the maximum value for each specified variable.

Mean: If enabled, computes the mean (average) for each specified variable.

Median: If enabled, computes the median (the value for which 50% of the observations lie below and 50% lie above that value) for each specified variable.

Median absolute deviation: If enabled, computes the median absolute deviation for each specified variable.

Minimum: If enabled, computes the minimum value for each specified variable.

Missing data deletion: Specifies the missing data deletion method used by the Descriptive Statistics Operator when computing results. Casewise deletion removes, before the results are computed, all cases that have missing data on any of the specified variables. Pairwise deletion computes the statistics for a given variable using all available data for that variable, ignoring the missing data patterns of all other selected variables.

N: If enabled, computes the number of observations for each specified variable.

Standard deviation: If enabled, computes the sample standard deviation for each specified variable.

Sum: If enabled, computes the sum for each specified variable.

Upper quartile: If enabled, computes the upper quartile (the value for which 75% of the observations lie below and 25% lie above that value) for each specified variable.

Variance: If enabled, computes the sample variance for each specified variable.
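
The difference between casewise and pairwise deletion can be illustrated with a small sketch. This is not the operator's implementation; the helper names (is_missing, casewise, pairwise), the sample field names, and the choice to treat None and NaN as missing are assumptions made for illustration only.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def casewise(rows, variables):
    """Drop every case (row) that is missing a value for ANY selected variable."""
    kept = [r for r in rows if not any(is_missing(r[v]) for v in variables)]
    return {v: [r[v] for r in kept] for v in variables}


def pairwise(rows, variables):
    """Per variable, keep all of its non-missing values, ignoring other variables."""
    return {v: [r[v] for r in rows if not is_missing(r[v])] for v in variables}


# Hypothetical input: four cases, two variables, two missing values.
rows = [
    {"price": 10.0, "volume": 100.0},
    {"price": 12.0, "volume": None},
    {"price": None, "volume": 300.0},
    {"price": 14.0, "volume": 200.0},
]
variables = ["price", "volume"]

cw = casewise(rows, variables)
pw = pairwise(rows, variables)

for v in variables:
    print(v, "casewise N =", len(cw[v]), "pairwise N =", len(pw[v]))
```

In this sample, casewise deletion leaves two observations per variable, while pairwise deletion leaves three, so statistics such as N and Mean can differ between the two settings.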

Field Select Tab

Variable list: Specify the list of variables for the analysis. Regular expression matching is supported.
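
As a rough illustration of selecting variables by regular expression, the following sketch filters a list of field names against a list of patterns. The function name select_variables and the sample field names are hypothetical, and the sketch assumes each pattern must match the whole field name; this page does not state whether the operator uses full or partial matching, or which regular expression dialect applies.

```python
import re


def select_variables(field_names, patterns):
    """Return the field names that fully match any pattern, in schema order.

    Full-string matching is an assumption of this sketch.
    """
    compiled = [re.compile(p) for p in patterns]
    return [f for f in field_names if any(c.fullmatch(f) for c in compiled)]


# Hypothetical schema field names.
fields = ["price", "volume", "bid_size", "ask_size", "symbol"]

# One literal name plus one pattern covering every field ending in "_size".
selected = select_variables(fields, ["price", r".*_size"])
print(selected)  # ['price', 'bid_size', 'ask_size']
```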

Cluster Aware Tab

Use the settings in this tab to enable this operator or adapter for runtime start and stop conditions in a multi-node cluster. During initial development of the fragment that contains this operator or adapter, and for maximum compatibility with releases before 10.5.0, leave the Cluster start policy control in its default setting, Start with module.

Cluster awareness is an advanced topic that requires an understanding of StreamBase Runtime architecture features, including clusters, quorums, availability zones, and partitions. See Cluster Awareness Tab Settings on the Using Cluster Awareness page for instructions on configuring this tab.

Concurrency Tab

Use the Concurrency tab to specify parallel regions for this instance of this component, or multiplicity options, or both. The Concurrency tab settings are described in Concurrency Options, and dispatch styles are described in Dispatch Styles.

Caution

Concurrency settings are not suitable for every application, and using these settings requires a thorough analysis of your application. For details, see Execution Order and Concurrency, which includes important guidelines for using the concurrency options.

Operator Ports

The operator expects the variables or fields to be analyzed to be of type double. The output tuple consists of a list of statistics for each variable, based on the selections made on the Operator Properties tab. Additionally, the incoming data is passed through the operator.
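
To make the listed statistics concrete, the sketch below computes each of them for a single double-valued variable using Python's standard statistics module. It is an approximation under stated assumptions, not the operator's algorithm: the function name describe and the result layout are hypothetical, the median absolute deviation is taken as the unscaled median of absolute deviations from the median, and the quartiles use the module's "inclusive" interpolation, which may differ from the method the operator applies.

```python
import math
import statistics


def describe(values):
    """Compute the selectable statistics for one variable (illustrative only)."""
    xs = [float(x) for x in values]  # the operator expects double-valued fields
    med = statistics.median(xs)
    # Quartile interpolation method is an assumption; other methods give other values.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "N": len(xs),
        "Sum": math.fsum(xs),
        "Mean": statistics.fmean(xs),
        "Median": med,
        "Minimum": min(xs),
        "Maximum": max(xs),
        "Lower quartile": q1,
        "Upper quartile": q3,
        # Unscaled MAD: median of absolute deviations from the median (assumption).
        "Median absolute deviation": statistics.median(abs(x - med) for x in xs),
        "Variance": statistics.variance(xs),       # sample variance (n - 1 divisor)
        "Standard deviation": statistics.stdev(xs),  # sample standard deviation
    }


# Hypothetical observations for one variable.
stats = describe([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0])
for name, value in stats.items():
    print(f"{name}: {value}")
```

Both Variance and Standard deviation use the n - 1 divisor here, matching the "sample" wording on the Operator Properties tab.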