Query TopN Sample

About This Sample

Suppose you want to extract a range of values from a query table, such as the ten highest- or lowest-priced stocks, or the best- or worst-selling items in an inventory. One method is to use a Query operator to repeatedly read all the rows in the table, and then use a series of other operators to process the result, comparing prices across stock symbols or products to find the top n. Alternatively, you could pre-process tuples upstream of a query table that has just one row containing n fields: each time a tuple arrives on the input stream, compare its values against that row to see whether the table needs to be updated.
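To make the cost of the second alternative concrete, here is a minimal Python sketch of per-tuple top-n maintenance. It is an analogy only, not StreamBase code: the function name update_top_n and the sample prices are hypothetical, and a min-heap stands in for the one-row table with n fields.

```python
import heapq

def update_top_n(top_n, new_value, n):
    """Keep the n largest values seen so far in a min-heap (hypothetical helper)."""
    if len(top_n) < n:
        # Still filling the "table": always keep the value.
        heapq.heappush(top_n, new_value)
    elif new_value > top_n[0]:
        # New value beats the smallest of the current top n: swap it in.
        heapq.heapreplace(top_n, new_value)
    return top_n

top3 = []
for price in [12.5, 99.0, 47.3, 63.1, 8.2]:  # made-up input values
    update_top_n(top3, price, n=3)

result = sorted(top3, reverse=True)
```

Every arriving value has to be compared against the stored values, and n is fixed when the structure is set up, which is the kind of bookkeeping the sample avoids.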

Both of these options require complicated processing to compare the values and continually recalculate the top n values. The QueryTopN sample application demonstrates a simpler method that combines a b-tree index on the table with the Query operator's built-in option to limit the number of output rows.

The sample has one input stream, enterN, on which you enter the value of n as the int field howMany. The table is populated from a CSV data file containing randomly generated values. In a real application, data from an input stream or another module would update the table dynamically.

The table is indexed on the field value in descending order, so that the first n entries of the index are always the largest ones. Sending a tuple on the enterN input stream triggers a read operation in which the Limit field of the Query operator is set to the input value howMany, so the read outputs just the current n highest values.
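The idea of reading the first n entries of a descending index can be illustrated with a small Python sketch. This is an analogy under stated assumptions, not the StreamBase implementation: the class DescendingIndexTable, its methods, and the sample rows are hypothetical, and a sorted list stands in for the b-tree index.

```python
import bisect

class DescendingIndexTable:
    """Toy stand-in for a query table indexed on `value`, descending."""

    def __init__(self):
        self._keys = []  # negated values, kept in ascending order
        self._rows = []  # (value, symbol) rows, parallel to _keys

    def insert(self, value, symbol):
        # Negating the key makes the largest value sort first.
        key = -value
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._rows.insert(pos, (value, symbol))

    def read(self, limit):
        # Analogous to a read with a row limit: take the first
        # `limit` index entries, with no comparison logic needed.
        return self._rows[:limit]

table = DescendingIndexTable()
for value, symbol in [(12.5, "AAA"), (99.0, "BBB"), (47.3, "CCC"), (63.1, "DDD")]:
    table.insert(value, symbol)  # made-up rows

top2 = table.read(limit=2)
```

Because the ordering is maintained when rows are written, each read only has to stop after howMany rows, and n can change from one request to the next.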

The tuples output from the table are split into two streams and processed by an Aggregate operator and a Map operator, respectively. The Aggregate operator uses aggregatelist(tuple(...)) in a predicate dimension to generate a list of the top n tuples. The dimension has only a Close expression, count()=howMany, so each window closes once it has collected n tuples. The Map operator restores the original field names and drops the input field howMany to output n individual tuples on the lower stream.
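The two branches can be sketched in Python as follows. This is a rough analogy, not EventFlow: the function split_output and the sample rows are hypothetical, each row is assumed to carry value, symbol, and howMany, and the field renaming done by the Map operator is omitted because the intermediate field names are not given here.

```python
def split_output(rows):
    """Mimic the Map branch and the Aggregate branch on the read results."""
    top_n_tuples = []  # Map-like branch: one tuple per row
    top_n_list = []    # Aggregate-like branch: one list per closed window
    window = []
    for row in rows:
        # Map-like branch: keep value and symbol, drop howMany.
        slim = {"value": row["value"], "symbol": row["symbol"]}
        top_n_tuples.append(slim)
        # Aggregate-like branch: collect rows until count() = howMany.
        window.append(slim)
        if len(window) == row["howMany"]:
            top_n_list.append(list(window))
            window.clear()
    return top_n_tuples, top_n_list

rows = [  # made-up read results for howMany = 2
    {"value": 99.0, "symbol": "BBB", "howMany": 2},
    {"value": 63.1, "symbol": "DDD", "howMany": 2},
]
tuples_out, list_out = split_output(rows)
```

In this sketch tuples_out holds two separate tuples while list_out holds a single entry containing both, which mirrors the difference between the two output streams.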

Importing This Sample into StreamBase Studio

This sample is part of the operator samples. In StreamBase Studio, import the operator samples with the following steps:

  • From the top menu, click File > Load StreamBase Sample.

  • Type operator to narrow the list of options.

  • Select Operator sample group from the Data Constructs and Operators category.

  • Click OK.

StreamBase Studio creates a single project containing all the operator samples.

Running QueryTopN.sbapp in StreamBase Studio

  1. In the Project Explorer, open the sample you just loaded.

  2. Open the src/main/eventflow folder.

  3. Open the package folder. Most samples contain a single package folder; if your sample contains more than one folder, open the top-level package folder.

  4. Open the QueryTopN.sbapp application and click the Run button. This opens the SB Test/Debug perspective and starts the application.

    If you see red marks, wait a moment for the project in Studio to load its features.

    If the red marks do not resolve themselves in a moment, select the project, right-click, and select Maven > Update Project from the context menu.

  5. In the Application Output view, make sure that All Output Streams is selected in the Output stream control.

  6. Enter 1 for howMany (the number of values you want to output) in the Manual Input view and click Send Data.

  7. Observe the output streams in the Application Output view. Note that:

    1. The topNtuples stream contains one tuple having fields value and symbol. It is the highest value in the table.

    2. The topNlist stream contains one tuple, a list containing the above tuple.

  8. Repeat steps 6 and 7, increasing howMany to 2, 3, and so on, to see the set of top n values grow.

  9. When done, press F9 or click the Stop Running Application button.

Sample Location

When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:

studio-workspace/sample_operator

See Default Installation Directories for the default location of studio-workspace on your system.