Using Product Features

This chapter describes the key StreamBase® features and shows how to use them to ensure optimal application performance.

Managed Objects

  • Transactional.

  • Persisted in shared memory.

  • Shared across multiple JVMs.


Compared to a POJO, a managed object incurs additional processing to provide transactional coherency, additional temporary shared memory to provide rollback capability, and shared memory to persist the object.

Use managed objects:

  • As a replacement for storing object state in a database.

  • To transactionally synchronize multi-threaded access to shared application data.

  • To provide in-memory objects that can be located by key.

  • When application state needs to be persisted across multiple invocations of the JVM.

  • When application state needs to be shared between multiple JVMs.

Avoid using managed objects:

  • For temporary objects.

  • For data that does not need to be persisted.


Transactions

  • Provide multi-reader, single-writer object locking.

  • Can lock both managed objects and transactional POJOs.

  • Detect deadlocks automatically.

  • Automatically roll back modifications to transactional resources when a deadlock or error is encountered.


Transactions incur additional processing for each locked object, additional processing and temporary heap space for each transactional field that is modified, additional processing for deadlock detection, and temporarily blocked threads when there is transaction contention.

Use transactions:

  • To access managed objects.

  • May be used to transactionally isolate and protect modifications to POJOs.

  • Used when multiple-reader, single-writer access to a resource is desired. Provides scalability for multiple readers running simultaneously in multiple threads, while still providing data consistency through exclusive write locking.

  • Small transactions scale better than large transactions.

Avoid:

  • Using transactions to manage non-transactional resources (for example, network I/O).

  • Using transactions when transactional semantics for a resource are not required (for example, a counter that must increment atomically but never roll back).

  • Deadlocks. Although the StreamBase® runtime automatically rolls back and replays deadlocked transactions, this is very expensive compared to avoiding the deadlock entirely. If deadlocks are seen in your testing, the involved code should be reorganized or rewritten to eliminate the possibility of deadlock.

  • Promotion locks. A promotion lock occurs when a transaction takes a read lock on an object and later upgrades it to a write lock. When two threads concurrently run the same code path containing a promotion lock, a deadlock is generated. Several different techniques can be used to eliminate promotion locks:

    Changing the code to take a write lock instead of a read lock at the first access in the transaction to the managed object to be modified.

    When finding an object through a query, use either LockMode.WRITELOCK or LockMode.NOLOCK.

    When iterating objects from ManagedObject.extent() or KeyQuery.getResults() use either LockMode.WRITELOCK or LockMode.NOLOCK.

    When the modification of an already read-locked object does not need to be done in the same transaction, move it to an @Asynchronous method, and it will run in another transaction after the current transaction commits.

  • Transaction lock contention. When a transaction is blocked waiting to acquire a lock, it remains blocked at least until the transaction holding the lock commits or cancels. It may remain blocked longer if there are multiple threads competing for the same transaction locks.

  • Long-running transactions. Transactional resources in multi-threaded applications are generally shared between threads. Locking a resource in a long running transaction can block other threads for the duration of the transaction. Short running transactions scale better than long-running transactions.

  • Large transactions (those that contain many locked resources). Large transactions tend to be more prone to generating contention and deadlocks. When there is contention between large transactions, even if there are no deadlocks, the deadlock detection becomes more expensive.
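The read-to-write promotion problem above can be illustrated without the product API. The sketch below uses the standard JDK `ReentrantReadWriteLock` purely as an analogy (it is not the StreamBase transaction-locking implementation): a thread holding a read lock cannot upgrade to the write lock, which is exactly why taking the write lock up front, as the guidance above suggests, avoids the hazard.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PromotionLockDemo {
    // Attempt the problematic pattern: read first, then upgrade to write.
    // ReentrantReadWriteLock never grants the upgrade, so this returns false.
    public static boolean tryUpgrade() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        lock.readLock().lock();
        try {
            return lock.writeLock().tryLock(); // fails while the read lock is held
        } finally {
            lock.readLock().unlock();
        }
    }

    // The recommended pattern: take the write lock at the first access.
    public static boolean tryWriteFirst() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean acquired = lock.writeLock().tryLock();
        if (acquired) {
            lock.writeLock().unlock();
        }
        return acquired;
    }

    public static void main(String[] args) {
        System.out.println(tryUpgrade());    // prints false
        System.out.println(tryWriteFirst()); // prints true
    }
}
```

In the JDK analogy the upgrade simply fails; in transactional locking the equivalent situation between two threads is resolved by an expensive deadlock rollback and replay, which is why write-first is the cheaper pattern.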


Transactions are a powerful tool for maintaining application data consistency and scaling. But this feature comes at a cost. Avoid using transactions where they are not necessary.
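One avoid-list item above mentions a counter that must increment atomically but never roll back. For that case the standard `java.util.concurrent.atomic` classes give lock-free atomicity with no transactional overhead; this is a plain-JDK sketch, not product-specific code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class EventCounter {
    private final AtomicLong count = new AtomicLong();

    // Thread-safe increment without any transaction; the update is
    // immediately visible and is never rolled back.
    public long increment() {
        return count.incrementAndGet();
    }

    public long value() {
        return count.get();
    }

    public static void main(String[] args) throws InterruptedException {
        EventCounter counter = new EventCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(counter.value()); // prints 4000
    }
}
```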

Java Monitors

  • Monitors (the Java synchronized keyword) provide a simple mutual exclusion mechanism.

  • Lighter weight than transactions.

  • Easy to cause undetected deadlocks.

  • Multiple threads sharing read access to a resource become single-threaded when accessing the resource.

  • Use a monitor when synchronization is required for non-transactional resources.

  • Avoid using monitors on transactional resources; they are already protected by transaction locking.
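A minimal example of the appropriate use: guarding a non-transactional resource (here an in-heap `StringBuilder`, a hypothetical stand-in for any non-transactional state) with a monitor. Note the trade-off listed above: the monitor serializes readers as well as writers.

```java
public class AppendLog {
    // A non-transactional, non-thread-safe resource.
    private final StringBuilder buffer = new StringBuilder();

    // The monitor on 'this' provides simple mutual exclusion.
    public synchronized void append(String line) {
        buffer.append(line).append('\n');
    }

    // Even pure readers take the monitor, so concurrent readers are
    // single-threaded here, unlike multi-reader transaction locking.
    public synchronized int length() {
        return buffer.length();
    }

    public static void main(String[] args) throws InterruptedException {
        AppendLog log = new AppendLog();
        Thread a = new Thread(() -> log.append("from-a"));
        Thread b = new Thread(() -> log.append("from-b"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(log.length()); // prints 14 (two 7-character lines)
    }
}
```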

Transaction Isolation Level

Use of the READ_COMMITTED_SNAPSHOT isolation level carries a performance penalty. An extra shared memory copy of the object data must be made the first time the data is accessed within a transaction. Subsequent accesses use this read image, and the memory is freed when the transaction commits.

The default isolation level, SERIALIZABLE, does not carry this penalty.

Keys and Indexes

  • Keys are only allowed on managed objects.

  • Allow the application to navigate quickly and efficiently to a unique managed object or a group of managed objects.

  • Support unique, non-unique, ordered, and unordered keys and queries.


Each key requires additional processing at object creation time and additional shared memory.

  • Use keys as you would use an index in a database.

  • Use unique keys instead of extent iteration for finding a single object.

  • Use non-unique keys instead of extent iteration for finding a group of ordered or unordered objects.

Avoid:

  • Using keys on objects that do not require navigation to find them.

  • Defining unnecessary key fields.
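The "use keys as you would use an index in a database" guidance can be sketched with a plain-Java analogy (standard collections, not the product's key API): a unique key behaves like a map lookup, while extent iteration is a linear scan. The `Order` type and its fields are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderIndex {
    record Order(String id, String customer) {}

    private final List<Order> extent;                        // analogy for extent iteration
    private final Map<String, Order> byId = new HashMap<>(); // analogy for a unique key

    public OrderIndex(List<Order> orders) {
        this.extent = orders;
        // Like a key, the index is maintained when objects are created.
        for (Order o : orders) {
            byId.put(o.id(), o);
        }
    }

    // Extent iteration: O(n) scan of every object to find one.
    public Order findByScan(String id) {
        for (Order o : extent) {
            if (o.id().equals(id)) {
                return o;
            }
        }
        return null;
    }

    // Unique-key lookup: O(1), like a database index.
    public Order findByKey(String id) {
        return byId.get(id);
    }

    public static void main(String[] args) {
        OrderIndex idx = new OrderIndex(List.of(
                new Order("A1", "ann"), new Order("B2", "bob")));
        System.out.println(idx.findByKey("B2").customer()); // prints bob
    }
}
```

The same reasoning explains the avoid-list items: every entry in `byId` costs memory and insert-time work, so an index on objects never looked up by that field is pure overhead.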

High Availability

  • Transparent, transactional, high-performance replication of object data across nodes.

  • Horizontal scaling of data using dynamic or static data distribution policies.

  • Transparent routing of data to a partition or node.

  • High performance, automated support for migration of object ownership from a failed active node to a replica node.


Highly available objects require additional CPU cycles and memory for managing the internal transaction resources when an object is modified, and additional network I/O for propagating the modifications to the replica nodes.

Reads of highly available objects have the same performance as reads of managed objects. No extra cycles are consumed and no network I/O is generated.

Static data distribution policies require node names to be explicitly configured, making configuration of elastic clusters operationally difficult.

  • Use highly available objects to provide non-stop access to application data in the case of node failure.

  • Use partitions with multiple replica nodes to provide a transparent, transactional push mechanism of object data to a group of nodes.

  • Use highly available object methods to run behavior on the currently active node for a partition.

  • Use data distribution policies to transparently scale an application load horizontally across multiple nodes.

  • Use dynamic data distribution policies until performance testing on the actual application indicates that optimal data locality is required to achieve latency targets.

Avoid:

  • Modifying highly available objects unnecessarily. Modifications cause network I/O and processing cycles on the replica nodes. If data being modified does not need to be visible to the application after a failover, do not keep it in a highly available object; use either managed objects or POJOs.


    In comparison to managed objects and POJOs, a highly available object incurs extra processing costs even when there are no replica nodes defined for its partition.

  • Making highly available objects larger than necessary. Each time a modification occurs, the entire object is copied to the replica nodes.

  • Replicating object data to more nodes than is required. Each additional replica node requires additional network I/O and processing.

  • For simple load balancing, consider using a hardware-based solution instead of the location transparent routing provided by highly available objects.

  • Using static data distribution policies when application latency requirements can be met with dynamic data distribution policies.


Distributed Objects

  • Direct application access to the creation of remote objects and their data.

  • Direct application access to remote method invocation.

  • Optionally cached on remote nodes.

  • Distributed deadlocks. Distributed deadlock detection uses a timeout to detect a deadlock. This means that a distributed transaction waits for the entire timeout period before a deadlock is reported; during this time, the transaction is stalled.

  • For simple load balancing, consider using a hardware-based solution instead of the location transparent routing provided by distributed objects.