This sample demonstrates the use of the Spotfire Streaming Chi-Square operator. The Chi-Square operator tests whether one or more pairs of discrete (categorical) variables are statistically independent. The operator also computes Cramér's V, a statistical measure of association bounded in the interval [0, 1], where values closer to 0 indicate independence and values closer to 1 indicate stronger association. The two-way crosstabulation for each pair of variables, including row, column, and total percentages, is also available on request.
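As a rough illustration of the statistics described above, the following Python sketch computes the Pearson chi-square statistic, its degrees of freedom, and Cramér's V for one pair of categorical lists. It is not the operator's implementation: the function name `chi_square_and_cramers_v` and its return shape are hypothetical, and the p-value is omitted because it needs a chi-square distribution function that the standard library does not provide.

```python
from collections import Counter
from math import sqrt


def chi_square_and_cramers_v(xs, ys):
    """Return (chi2, degrees_of_freedom, cramers_v) for two categorical lists.

    Hypothetical helper for illustration only; not part of any StreamBase API.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")

    n = len(xs)
    cell = Counter(zip(xs, ys))  # observed count for each (x, y) cell
    row = Counter(xs)            # row totals
    col = Counter(ys)            # column totals

    chi2 = 0.0
    for r, r_total in row.items():
        for c, c_total in col.items():
            expected = r_total * c_total / n  # expected count under independence
            observed = cell.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(row) - 1) * (len(col) - 1)
    k = min(len(row), len(col)) - 1
    # Cramér's V is undefined when either variable has only one level.
    v = sqrt(chi2 / (n * k)) if k > 0 else float("nan")
    return chi2, dof, v


# Two identical lists are perfectly associated; two balanced, crossed lists are independent.
dependent = chi_square_and_cramers_v([1, 2, 3, 4] * 5, [1, 2, 3, 4] * 5)
independent = chi_square_and_cramers_v([1, 1, 2, 2], [1, 2, 1, 2])
```

In this sketch, Cramér's V is the square root of chi2 / (n * (min(rows, columns) - 1)), which is what keeps the value inside [0, 1].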
The provided StreamBase module uses a randomly generated data set consisting of the fields X, Y, and Z, each with values ranging from 1 to 4. In this sample, the Matrix operator creates a sliding window that collects 30 rows of data and emits results after every 30 rows collected. The Chi-Square operator takes the resulting data lists as inputs and tests the null hypothesis that X, Y, and Z are independent.
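The data flow described above can be approximated outside StreamBase with a short Python sketch. It makes several assumptions that do not come from the sample itself: values are drawn uniformly from 1 to 4, the window is modeled as consecutive non-overlapping batches of 30 rows, and the names `random_rows`, `windows`, and `pairwise_crosstabs` are hypothetical. It only shows how rows become per-field lists and how the three variable pairs are formed.

```python
import random
from collections import Counter
from itertools import combinations

WINDOW_SIZE = 30          # rows per emitted batch, as in the sample
FIELDS = ("X", "Y", "Z")


def random_rows(count, seed=0):
    """Yield `count` rows with X, Y, Z drawn uniformly from 1..4 (assumed distribution)."""
    rng = random.Random(seed)
    for _ in range(count):
        yield {name: rng.randint(1, 4) for name in FIELDS}


def windows(rows, size=WINDOW_SIZE):
    """Collect rows into consecutive batches; emit one dict of lists per full batch."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield {name: [r[name] for r in batch] for name in FIELDS}
            batch = []  # start collecting the next batch


def pairwise_crosstabs(columns):
    """Return the observed cell counts for every pair of fields."""
    return {
        (a, b): Counter(zip(columns[a], columns[b]))
        for a, b in combinations(FIELDS, 2)
    }


for columns in windows(random_rows(90)):
    tabs = pairwise_crosstabs(columns)
    print(sorted(tabs))  # the pairs of fields tested for this window
```

Each emitted window yields one crosstab per pair of fields, and those observed counts are the input to a chi-square test of independence.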
In StreamBase Studio, import this sample with the following steps:
- From the top-level menu, click > .
- In the search field, type ChiSquare to narrow the list of options.
- Select Chi-Square operator from the Streaming Datascience Operators category.
- Click . StreamBase Studio creates a single project containing the sample files.
- In the Project Explorer view, expand sample_datascience_chisquare, then find and open ChiSquare.sbapp. Make sure the application is the currently active tab in the EventFlow Editor.
- Click the Run button. This opens the SB Test/Debug perspective and starts the application.
- In the Feed Simulations view, select ChiSquare.sbfs and click Run to start feeding the data.
- Randomly generated data with X, Y, and Z columns starts streaming to the Chi-Square operator, which tests the independence between the X, Y, and Z variables and sends the results downstream.
- When done, press F9 or click the Stop Running Application button.