Managing the Size of Data Tables

Overview

This article describes the various ways to manage LiveView table size by deleting rows from a data table.

To accomplish this, LiveView provides a number of ways to delete rows. These various deletion methods can be used in a number of different application and administration contexts. Which deletion methods you choose to use as an application designer depends on a number of application- and context-specific considerations. However, you are most likely going to have to choose, as it is common for the table sizes to increase without bound over time by virtue of the publishing activity that occurs. If the primary key is, for example, a unique ID that keeps incrementing, then the table can grow without bound and exhaust available resources. So some kind of explicit table size maintenance is commonly required for every LiveView application.

If you want a table to automatically trim itself, methods include using a delete table rule, an Alert Rule with a Delete Rows action or scheduling a periodic DELETE query to run using some scheduling mechanism such as LiveView's built-in cron mechanism, which you can use to issue periodic DELETE queries.

Using Query Methodology or Publish Methodology to Delete Rows

Row deletion methods in LiveView break down broadly into two categories: through the publish methodology, or through the query methodology.

By publish methodology refers to deleting based on sending a tuple to the table's DataIn stream that causes a zero or one row deletion in line with insert and update publications to the same table. The affected table row on the publish path is always identified by primary key.

By query methodology refers to issuing a DELETE query against the table that causes one or more rows to be deleted asynchronously to publish path actions. The affected rows are constrained by evaluating the WHERE clause of the query — an arbitrary predicate. For categorization purposes, the Alert Rule Delete Rows action uses the query path mechanism.

By comparison, a DELETE that occurs via the publish methodology affects zero or one rows, and must specify a primary key value. To delete more than one row in a single action or to delete based on an arbitrary predicate or being time-bound, you must use the query methodology.

The remaining sections below describe the most frequently used row deletion methods available to LiveView applications.

The examples refer to the Hello LiveView application sample and assume that the Hello LiveView application is running.

Query Methodology for Row Deletion

Alert Rule: Delete Rows Action

A powerful and flexible way to automatically trim a LiveView table is to define an Alert Rule that has a Delete Rows action. This is a preferred way of continuously controlling table size based on flexible criteria that may include queries on other tables.

An Alert Rule establishes a continuous query against a table that is evaluated every time an event is published to the table. Alert rules only fire on the transition from false to true, so you must ensure the delete action drives the alert predicate to false. If the alert predicate does not return to false, the delete action will never run again. Also, the table against which the Delete Row action operates may be a different table than the table that the Alert Rule's condition monitors.

When using an Alert Rule action for deleting rows, consider safe-guarding it from firing too often. It is a best practice for Delete actions to suppress subsequent execution for an order of magnitude longer than you expect the delete to take, since trimming a very large table could take a while and trimming the table does not typically have to occur all that often.

Example 1

You can set an alert rule on the LiveViewStatistics table with a condition predicate such as:

(TableName == 'ItemSales') && RecordCount > 20000

You can configure a Delete Rows action for this rule for table ItemSales with a query predicate:

where true order by transactionTime asc limit 5000

This delete action will remove 5000 of the oldest rows in the table.

Example 2

Suppose a LiveView table has a timestamp column that gets something like now() published to it regularly. For the Hello LiveView sample, that would be the ItemsSales table:

when transactionTime between today()-days(1) and today()-hours(23)

The alert will occur when transactionTime transitions to one hour before midnight. You must also set the “suppress subsequent execution for” to something like far greater that you expect the delete to occur (for example, 60 seconds). The delete action predicate should be configured as:

transactionTime < today()

This delete action deletes everything older whose transitionTime is older than midnight of today.

Example 3

Another common Delete Rows action pattern can resemble

when transactionTime between now()-hours(2) and now()

where the delete predicate resembles

transactionTime < now()+hours(1)

Every two hours, this will delete all rows with a transactionTime older than one hour. Again, use an “After running actions, suppress subsequent execution for __ seconds” of some number of seconds, such as 30 or 60 seconds. This ensures the delete starts and completes before more rows hit the time window and issue more deletes.

Example 4

The Hello LiveView sample application contains a Delete Rows Alert Rule called Trim ItemsSales Table that deletes all ItemsSales table rows that are more than four minutes old, every five minutes. The run period is slightly longer than the trim period to give the delete rows action time to run on a very large table before the next alert would run.

There are several interfaces for creating and managing Alert Rules. While the details of these interfaces are beyond the scope of this article, briefly they are:

  • lv-client addalertrule

  • LiveView Web Monitor Interface

  • LiveView Java Client API (LiveViewConnection.getAlertManager())

  • LiveView JavaScript Client API (LiveView.Connection.addAlertRule())

Deleting Rows Using a DELETE Query

LiveQL supports a DELETE query type that deletes all the rows in the specified table that matches the query's WHERE predicate. The DELETE query type is SNAPSHOT only; there are no CONTINUOUS variants of the DELETE query type.

A LiveView client can issue a DELETE query using the command line, the LiveView Delete Adapter, and the LiveView Java and .NET Client APIs.

lv-client delete command

For ad-hoc manual deletes, you can use the lv-client delete command. See lv-client(1) for more information. The general syntax is:

lv-client -u sb://lvserver:port "delete from tablename where predicate"

For example, to delete a row from the Hello LiveView sample, you would use:

lv-client -u sb://localhost:10000 "delete from ItemSales where Item='Wagon'"

LiveView Delete Output Adapter for StreamBase

The Spotfire LiveView Delete Output adapter allows a StreamBase application to delete one or more rows in a LiveView table based on a specified predicate. See LiveView Delete Output Adapter for more information.

LiveView Query, Delete, Publish, and Ready Adapters sample

This sample contains the file lv2sbd.sbapp, which demonstrates a small example of using the LiveView Delete adapter. See the sample's README file for more information.

Publish Methodology for Row Deletion

Publish path row deletes are primary key-based deletes. These have a place in some situations. However, for common table-size and time-based table maintenance activities, the query methodology is required. Given the desire to delete a window of time, you must pose a query as you do not have the primary keys of all rows that fall within the time window. Consider the options (query methodology versus publish methodology) as table maintenance versus record maintenance.

Using a row-delete-rule Element in the Data Table lvconf

A row-delete-rule is an optional rule specifying a predicate that is evaluated against the data in any incoming insert or update. If the incoming data satisfies the predicate and a row with a matching primary key exists, then no update occurs and the row is deleted. If the incoming data satisfies the predicate and no row with a matching primary key exists, then no insert occurs. If the incoming data does not satisfy the predicate, then an insert or update occurs as normal.

Since a row-delete-rule's predicate can only refer to values from the incoming event tuple's field value, it may be helpful to think of it more as an input preprocessing validation, or as a way to do deletions based on "commands" transmitted within the input events itself.

Examples
<!-- delete rows with incoming events more than a day old -->
<row-delete-rule>transactionTime < now()-days(1)</row-delete-rule>
<row-delete-rule>myEventType=='DELETE'</row-delete-rule>

Using a Retention Policy in the Data Table

The Data Table wizard and the Data Table lvconf editor support retention policies to define how long you want table data retained. When setting a retention policy, the current LiveView server time is compared with the timestamp field value, and that becomes the basis for whether a row is deleted.

Using a LiveView Publish Output Adapter

You can use the LiveView Publish Output Adapter to delete rows from a data table. The Delete Field Name adapter property is an optional input boolean field that if true will delete the row corresponding to the value of the primary key fields in the input tuple.

Delete Query Limitations

There are caveats to be aware of when using delete in the following cases:

  • DELETE with LIMIT

  • DELETE ORDER BY

Delete queries against tables with snapshot-parallelism greater than 1 and with a LIMIT specified that actually has effect results in some unexpected characteristics you must take into consideration when performing a delete operation.

One caveat revolves around the fact that there's a single QueryIn stream that goes to all parallel regions. For a read query with a LIMIT, LiveView dumps extra rows at the top level transparently with no side effects. Delete with a LIMIT is different, however. LiveView takes the LIMIT number, divides it by the number of parallel regions, and sends that to the parallel regions. In rare cases, this can cause not exactly the expected rows to be deleted.

The first problem is rounding. For example, if you have two parallel regions and a delete LIMIT set to 3, you will actually only ever delete two rows. If you have two parallel regions and delete LIMIT is 1, you actually delete nothing. If any parallel region has fewer than LIMIT/NUMREGIONS rows while other shards have more than LIMIT/NUMREGIONS rows, then you will delete fewer rows then you requested. The number fewer is the sum of number of rows less than LIMIT/NUMREGIONS in all parallel regions.

DELETE ORDER BY always requires a LIMIT, just like any ORDER BY. For multiple shard tables, the same Delete approach is taken where each shard will delete LIMIT/NUMREGIONS rows, and within the shard the ORDER BY definition is adhered to. However it is possible that one parallel region has a disproportionate number of the rows that would be in the overall table order. In this case it is possible that some rows will be deleted that would not expect to be, while other rows that would expect to be deleted will not be.

Another artifact of multiple parallel region ORDER BY deletes is that rows will be deleted within the parallel region in strict ORDER BY definition. However, queries that the deleted rows participated in will not necessary see rows removed in strict order. The way in which removes are merged from parallel regions and delivered to the client is somewhat random. The removed rows will have some order to them, but not strictly.

One recommendation for using DELETE ORDER BY is to clean out some amount of old, no longer interesting data. For example, when the orders table reaches 10,000 rows, delete 1,000 of the oldest orders. In this scenario, it is less critical if this operation deletes a few fewer rows or if a few of the “wrong” rows are deleted.

In situations where not having the exact expected rows removed is a concern, using ORDER BY is not recommended for those deletes.

Delete with Limit Characteristics

With default snapshot-parallelism (1), using a delete with a LIMIT set will always delete exactly the correct number and expected rows.

The primary use case for delete with LIMIT is to delete a fairly large number of rows (thousands) from fairly large tables. Moreover, the deleted data is in the class of "no longer interesting", or perhaps "least interesting" data.

DELETE with LIMIT should not be used on tables with snapshot-parallelism greater than 1 when an exact set of data needs to be removed. This use case is sometimes characterized by (but not limited to) small LIMIT numbers (tens) and/or small table sizes.