This sample shows basic usage of the Spark MLlib model evaluator. The scenario uses the audit dataset from the PMML examples at http://dmg.org/pmml/pmml_examples/index.html.
The EventFlow module in this sample demonstrates runtime binomial classification. The input events contain demographic data about bank customers. Each event leaves the flow enriched with the Target value computed by the model.
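To make the enrichment step concrete, the following is a minimal Python sketch of what a binomial classifier does to each event: it reads the input fields, computes a score, and adds a 0/1 Target field. This is not the Spark model used by the sample. The `score` and `enrich` functions, the weights, the bias, and the threshold are all hypothetical stand-ins chosen only for illustration.

```python
import math

# Hypothetical weights for illustration only. The sample's real model is
# evaluated by the Spark model operator, not by this formula.
WEIGHTS = {"Age": 0.03, "Hours": 0.02, "Income": -0.00001}
BIAS = -2.5
THRESHOLD = 0.5


def score(event):
    """Return a logistic score in (0, 1) computed from the numeric fields."""
    z = BIAS + sum(w * float(event[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def enrich(event):
    """Return a copy of the event with a 0/1 Target field added."""
    enriched = dict(event)
    enriched["Target"] = 1 if score(event) >= THRESHOLD else 0
    return enriched


# One input event, using field names from the sample's output tuples.
event = {"Age": 38.0, "Employment": "Private", "Income": 81838.0, "Hours": 72.0}
result = enrich(event)
print(result)
```

The input event is left unchanged and a new, enriched copy is returned, which mirrors how the events in this sample leave the flow with an added Target field.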
Run this sample in Studio as follows:
If you have not yet loaded the Spark MLlib Model sample into Studio, follow the steps in Importing This Sample.
Download the Apache Spark 1.6.0 binary distribution, which includes MLlib, from http://spark.apache.org, extract it, and save the assembly file to the lib directory.
Add the extracted assembly to the project's build path.
In the Project Explorer view, open the
Double-click to open the
Make sure the application is the currently active tab in the EventFlow Editor, then click the Run button. This opens the SB Test/Debug perspective and starts the application.
In the Feed Simulations view, select audit.sbfs and click Run.
In the Application Output view, select All Output Streams and view the data emitted by the StreamBase application. It starts with tuples similar to the following:
Target=0, Age=38.0, Employment=Private, Education=College, Marital=Unmarried, Occupation=Service, Income=81838.0, Gender=Female, Deductions=0.0, Hours=72.0
Target=0, Age=35.0, Employment=Private, Education=Associate, Marital=Absent, Occupation=Transport, Income=72099.0, Gender=Male, Deductions=0.0, Hours=30.0
Target=0, Age=32.0, Employment=Private, Education=HSgrad, Marital=Divorced, Occupation=Clerical, Income=154676.74, Gender=Male, Deductions=0.0, Hours=40.0
Target=0, Age=45.0, Employment=Private, Education=Bachelor, Marital=Married, Occupation=Repair, Income=27743.82, Gender=Male, Deductions=0.0, Hours=55.0
Target=0, Age=60.0, Employment=Private, Education=College, Marital=Married, Occupation=Executive, Income=7568.23, Gender=Male, Deductions=0.0, Hours=40.0
Target=0, Age=74.0, Employment=Private, Education=HSgrad, Marital=Married, Occupation=Service, Income=33144.4, Gender=Male, Deductions=0.0, Hours=30.0
Target=1, Age=43.0, Employment=Private, Education=Bachelor, Marital=Married, Occupation=Executive, Income=43391.17, Gender=Male, Deductions=0.0, Hours=50.0
Target=0, Age=35.0, Employment=Private, Education=Yr12, Marital=Married, Occupation=Machinist, Income=59906.65, Gender=Male, Deductions=0.0, Hours=40.0
Target=0, Age=25.0, Employment=Private, Education=Associate, Marital=Divorced, Occupation=Clerical, Income=126888.91, Gender=Female, Deductions=0.0, Hours=40.0
Target=0, Age=22.0, Employment=Private, Education=HSgrad, Marital=Absent, Occupation=Sales, Income=52466.49, Gender=Female, Deductions=0.0, Hours=37.0
Target=0, Age=48.0, Employment=Private, Education=College, Marital=Divorced, Occupation=Service, Income=291416.11, Gender=Female, Deductions=0.0, Hours=35.0
When done, press F9 or click the Stop Running Application button.
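If you copy the emitted tuples out of the Application Output view for further inspection, a small script can turn them into records. The following Python sketch assumes the text form shown in the steps above: one tuple per line, fields separated by a comma and a space, and no commas inside values. The `parse_tuple` helper is a hypothetical name and is not part of StreamBase.

```python
def parse_value(text):
    """Convert a field value to int or float when possible, else keep the string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return text


def parse_tuple(line):
    """Parse 'Name=value, Name=value, ...' into a dict keyed by field name."""
    fields = {}
    for pair in line.strip().split(", "):
        name, _, value = pair.partition("=")
        fields[name] = parse_value(value)
    return fields


# Two tuples taken from the sample output listed in this document.
lines = [
    "Target=0, Age=38.0, Employment=Private, Education=College, "
    "Marital=Unmarried, Occupation=Service, Income=81838.0, Gender=Female, "
    "Deductions=0.0, Hours=72.0",
    "Target=1, Age=43.0, Employment=Private, Education=Bachelor, "
    "Marital=Married, Occupation=Executive, Income=43391.17, Gender=Male, "
    "Deductions=0.0, Hours=50.0",
]

rows = [parse_tuple(line) for line in lines]
positives = sum(1 for row in rows if row["Target"] == 1)
print(positives, "of", len(rows), "tuples have Target=1")
```

Trying `int` before `float` keeps Target as an integer class label while fields such as Age and Income become floats.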
In StreamBase Studio, import this sample with the following steps:
From the top-level menu, select→ .
Type Spark to narrow the list of options.
Select Spark from the StreamBase Model Operators category.
StreamBase Studio creates a single project for the Spark operator samples in your current Studio workspace.
When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.
Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.
Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:
See Default Installation Directories for the location of studio-workspace on your system.
In the default TIBCO StreamBase installation, this sample's files are initially installed in:
See Default Installation Directories for the default location of studio-workspace on your system.