Cluster Upgrades

Nodes in a cluster can be upgraded independently of other nodes in the cluster. These upgrades include:

  • Product versions

  • Application versions

  • Operating system versions

The upgrade functionality ensures that a cluster never needs to be completely brought down for any upgrades.

Each node in a cluster can run a different product version. Version differences are detected when a node joins a cluster, and any required protocol negotiation is done automatically at that time. This allows product versions to be upgraded on each node independently.

Different application versions can also be running on each node in a cluster. Application differences between two nodes are detected, and the objects are made compatible at runtime, either transparently or by application-specific code that resolves the inconsistencies. This allows application versions to be upgraded on each node independently.

All nodes in a cluster can use different operating system versions. This allows operating system version upgrades to be done on each node independently.

Application Versions

Classes in an application are versioned using a serialVersionUID. The rules used to determine which class is the latest version are:

  • The class with the larger serialVersionUID value is considered newer than the class with the smaller value.

  • A class that does not have a serialVersionUID defined is considered older than a class with a serialVersionUID defined.

  • If classes have the same serialVersionUID value, the class on the node with the newest shared memory time stamp (see Locations) is considered newest.
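
The rules above can be sketched as a comparison function. This is a minimal illustration, not the product's implementation: ClassVersion, its fields, and isNewerThan are hypothetical names, and the shared memory time stamp is assumed to be available as a number. The source does not say what happens when neither class defines a serialVersionUID; the sketch assumes the time stamp tie-break applies in that case too.

```java
import java.util.OptionalLong;

/** Hypothetical description of one node's copy of a class; not a product API. */
final class ClassVersion {
    final OptionalLong serialVersionUID;  // empty when the class defines no serialVersionUID
    final long sharedMemoryTimestamp;     // assumed numeric form of the node's shared memory time stamp

    ClassVersion(OptionalLong serialVersionUID, long sharedMemoryTimestamp) {
        this.serialVersionUID = serialVersionUID;
        this.sharedMemoryTimestamp = sharedMemoryTimestamp;
    }

    /** Returns true if this copy of the class is considered newer than the other copy. */
    boolean isNewerThan(ClassVersion other) {
        boolean mine = serialVersionUID.isPresent();
        boolean theirs = other.serialVersionUID.isPresent();
        // A class without a serialVersionUID is older than a class that defines one.
        if (mine != theirs) {
            return mine;
        }
        // The larger serialVersionUID value is the newer version.
        if (mine && serialVersionUID.getAsLong() != other.serialVersionUID.getAsLong()) {
            return serialVersionUID.getAsLong() > other.serialVersionUID.getAsLong();
        }
        // Same value (or, by assumption, both undefined): newest shared memory time stamp wins.
        return sharedMemoryTimestamp > other.sharedMemoryTimestamp;
    }
}
```

For example, a class with serialVersionUID 2 is newer than one with serialVersionUID 1 regardless of time stamps, and the time stamp matters only when the values are equal.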

Detecting Version Changes

Version changes are detected automatically during initialization and as classes are loaded into JVMs running on a node. As nodes connect to each other, and as new types are loaded into a JVM, a type exchange occurs between the two nodes. A type exchange is performed for both application classes and product runtime structures. The type exchange protocol is shown in Figure 1, “Type exchange”.

Figure 1. Type exchange

The steps in Figure 1, “Type exchange” are:

  1. Node One sends CRC values for all types defined on node One.

  2. Node Two compares the CRC values received from node One with the CRC values of the same types found on node Two.

  3. If the CRC values are different for a type, node Two sends node One its definition of the type.

  4. Node One saves the definition of the types received from node Two in a type mismatch table for node Two.

  5. Node One sends node Two its definition of the mismatched types received from node Two.

  6. Node Two saves the type definitions received from node One in a type mismatch table for node One.
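
The six steps can be simulated in memory. This is a sketch under stated assumptions, not the actual wire protocol: ExchangeNode and TypeExchange are hypothetical names, a type definition is modeled as a plain string, and java.util.zip.CRC32 stands in for whatever CRC the product computes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

/** In-memory stand-in for a node; every name here is hypothetical. */
final class ExchangeNode {
    final String name;
    final Map<String, String> types = new HashMap<>();  // type name -> local definition
    // remote node name -> (type name -> remote definition)
    final Map<String, Map<String, String>> mismatchTables = new HashMap<>();

    ExchangeNode(String name) { this.name = name; }

    /** CRC32 is an assumption; the source does not name the CRC algorithm. */
    static long crc(String definition) {
        CRC32 crc = new CRC32();
        crc.update(definition.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Step 1: CRC values for all types defined on this node. */
    Map<String, Long> crcValues() {
        Map<String, Long> out = new HashMap<>();
        types.forEach((type, def) -> out.put(type, crc(def)));
        return out;
    }

    /** Steps 2 and 3: local definitions of types whose CRC differs from the remote CRC. */
    Map<String, String> mismatchedDefinitions(Map<String, Long> remoteCrcs) {
        Map<String, String> out = new HashMap<>();
        remoteCrcs.forEach((type, remoteCrc) -> {
            String local = types.get(type);
            if (local != null && crc(local) != remoteCrc) {
                out.put(type, local);
            }
        });
        return out;
    }

    /** Steps 4 and 6: save remote definitions in the mismatch table kept for that node. */
    void saveMismatches(String remoteNode, Map<String, String> remoteDefinitions) {
        if (!remoteDefinitions.isEmpty()) {
            mismatchTables.computeIfAbsent(remoteNode, k -> new HashMap<>())
                          .putAll(remoteDefinitions);
        }
    }

    /** Step 5: this node's definitions of the types the remote node reported. */
    Map<String, String> definitionsFor(Iterable<String> typeNames) {
        Map<String, String> out = new HashMap<>();
        for (String type : typeNames) {
            String local = types.get(type);
            if (local != null) {
                out.put(type, local);
            }
        }
        return out;
    }
}

final class TypeExchange {
    /** Runs the six steps with node One as the initiator. */
    static void run(ExchangeNode one, ExchangeNode two) {
        Map<String, Long> crcs = one.crcValues();                            // step 1
        Map<String, String> fromTwo = two.mismatchedDefinitions(crcs);       // steps 2 and 3
        one.saveMismatches(two.name, fromTwo);                               // step 4
        Map<String, String> fromOne = one.definitionsFor(fromTwo.keySet());  // step 5
        two.saveMismatches(one.name, fromOne);                               // step 6
    }
}
```

After TypeExchange.run completes, each node holds the other node's definition of every mismatched type, and types with matching CRC values appear in neither table.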

The CRC used above is a numeric value computed from a type definition; comparing CRC values shows whether the definition has changed. The CRC value is identical on nodes that have the same type definition. If the CRC values differ, the type information sent is a complete type definition that includes:

  • Field definitions

  • Inheritance hierarchy

  • Version information

The use of a CRC to determine type changes minimizes network bandwidth in the case where type information is identical.
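
To illustrate why a CRC is enough to detect a change, the sketch below checksums a canonical text form of a type definition. The class name TypeDefinitionCrc, the canonical format, and the choice of java.util.zip.CRC32 are all assumptions made for illustration; the source does not specify how the product computes its CRC.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical helper; not a product API. */
final class TypeDefinitionCrc {
    /**
     * Builds a canonical string from the type name, super type, version,
     * and field definitions, then returns its CRC32 checksum.
     */
    static long crcOf(String typeName, String superType, long version, List<String> fields) {
        String canonical = typeName + "|" + superType + "|" + version + "|"
                + String.join(",", fields);
        CRC32 crc = new CRC32();
        crc.update(canonical.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

Two nodes with the same definition compute the same value and exchange only that number; the full definition needs to travel only when the values differ.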

Type mismatch tables exist for each node for which mismatched type information was detected. Type mismatch tables contain this information:

  • Complete type definition, including the type name

  • Version number

Whenever objects of a type are marshaled (read or written), the type mismatch table is checked to see whether the type matches on the two communicating nodes. If the type is found in the type mismatch table, the object is upgraded as described in Object Upgrades.
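
The marshaling check can be pictured as a table lookup keyed by remote node and type name. The sketch below is illustrative only: MismatchAwareMarshaler and its methods are hypothetical names, and the table stores just a version number rather than the complete type definition.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical marshaling front end; not a product API. */
final class MismatchAwareMarshaler {
    enum Action { MARSHAL_AS_IS, UPGRADE_OBJECT }

    // remote node name -> (type name -> version number recorded for that node)
    private final Map<String, Map<String, Long>> mismatchTables = new HashMap<>();

    /** Records that the named type differs on the given remote node. */
    void recordMismatch(String remoteNode, String typeName, long remoteVersion) {
        mismatchTables.computeIfAbsent(remoteNode, k -> new HashMap<>())
                      .put(typeName, remoteVersion);
    }

    /** Consulted on every read and write of an object of the given type. */
    Action actionFor(String remoteNode, String typeName) {
        Map<String, Long> table = mismatchTables.get(remoteNode);
        boolean mismatched = table != null && table.containsKey(typeName);
        return mismatched ? Action.UPGRADE_OBJECT : Action.MARSHAL_AS_IS;
    }
}
```

A node with no table for a remote node, or with no entry for the type, marshals the object unchanged.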

Object Upgrades

Objects are always upgraded on the node that contains the newest version of the class (see Application Versions). This technique is called "most current version makes right". It applies both when sending objects to and when receiving objects from another node. This ensures that no application changes are required on nodes running an earlier version of a class.
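
The rule can be stated as a small decision function: whichever side holds the newer class does the conversion, whether it is sending or receiving. UpgradeSite and its parameters are hypothetical names used only for this sketch.

```java
/** Hypothetical helper showing where an object conversion takes place. */
final class UpgradeSite {
    enum Site { SENDER, RECEIVER, NONE }

    /**
     * @param typeMismatched      true if the type appears in the mismatch table
     * @param senderHasNewerClass true if the sending node holds the newer class version
     */
    static Site choose(boolean typeMismatched, boolean senderHasNewerClass) {
        if (!typeMismatched) {
            return Site.NONE;  // identical definitions: nothing to convert
        }
        // The node with the newest class version always does the work.
        return senderHasNewerClass ? Site.SENDER : Site.RECEIVER;
    }
}
```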

Object upgrades can be transparent or non-transparent. Transparent changes are handled automatically without any application support. Non-transparent changes require an application to implement an object mismatch trigger. See the StreamBase Java Developer's Guide for details on supported upgrades and transparent versus non-transparent changes.

Error Handling

The overriding error handling policy for upgraded classes is to do no harm on nodes running older versions.

If an error is detected when reading an object from a remote node with an earlier version of a class definition, StreamBase logs the error, but does not propagate it back to the transaction initiator on the remote node. The error is not propagated to the initiator because the previous version of the class file has no knowledge of the new class version and it would not have any mechanism to handle the error. This is consistent with the do no harm policy.

Possible causes of errors are:

  • Application defect in upgrade code

  • Non-unique key errors because of inconsistent key values

The node administrator decides whether these errors are acceptable. If they are not acceptable, the node is taken offline and the upgraded classes are restored to a previous working version. Another upgrade can be attempted after resolving the errors.

When an object is sent to a remote node with an earlier version of a class definition, any errors detected on the node with the earlier class version are propagated back to the transaction initiator. In this case, either the new class version handles the errors, or the errors indicate a bug in the version mapping code provided by the application. Again, this is consistent with the do no harm policy.
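
The two cases reduce to one question: does the transaction initiator run the newer class version? The sketch below encodes that reading of the policy. UpgradeErrorPolicy and its methods are hypothetical names, and java.util.logging stands in for the product's own logging.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical encoding of the do no harm policy; not a product API. */
final class UpgradeErrorPolicy {
    enum Action { LOG_ONLY, PROPAGATE_TO_INITIATOR }

    private static final Logger LOG = Logger.getLogger("UpgradeErrorPolicy");

    /** An initiator on an older class version cannot handle upgrade errors, so they are only logged. */
    static Action decide(boolean initiatorRunsNewerVersion) {
        return initiatorRunsNewerVersion ? Action.PROPAGATE_TO_INITIATOR : Action.LOG_ONLY;
    }

    /** Applies the decision: rethrows toward the initiator, or logs and continues. */
    static void handle(boolean initiatorRunsNewerVersion, RuntimeException error) {
        if (decide(initiatorRunsNewerVersion) == Action.PROPAGATE_TO_INITIATOR) {
            throw error;
        }
        LOG.log(Level.WARNING, "Upgrade error not propagated to older initiator", error);
    }
}
```

With this shape, a non-unique key error raised while reading an object from an older node is logged locally, while the same error raised on an older node during a write started by a newer node reaches the initiator.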