Performance Monitoring

This chapter describes the tools and approach to monitoring TIBCO StreamBase applications.

JVM Tools

Standard Java tools may be used to monitor the JVM.

Visual VM

https://visualvm.dev.java.net

Visual VM

Visual VM

Java Mission Control and Flight Recorder

Flight Recorder - If using Oracle Java 7 or later, Java Mission Control and Flight Recorder is a built-in, low overhead tool for collecting JVM diagnostic and profiling data.

To enable (but not start) add the following VM deploy options:

-XX:+UnlockCommercialFeatures -XX:+FlightRecorder

The related GUI tool is Java Mission Control, jmc. Once the application is started your test run, you can select your JVM in mission control, select flight recorder and start recording.

Java Mission Control, JVM Select

Java Mission Control, JVM Select

Start Flight Recorder

Start Flight Recorder

Once the capture is finished, mission control will enable exploring the captured data.

Java Mission Control Explorer

Java Mission Control Explorer

Recording from the command line my be done with the jcmd command. See the jcmd manual page from the JDK, or Oracle online documentation for details.

Node Monitoring

You can generate a variety of reports using epadmin display statistics, which apply to an entire node.

Node Processes Report

Display a list of the application, node, and system processes running on the node, with epadmin display statistics --statistics=engine. These statistics are available while the node is running.

$ epadmin --servicename=a.test display statistics --statistics=engine
[a.test] 
[a.test] ======== Node Processes report for a.test ========
[a.test] 
[a.test] Data gathered at 2016-10-13 11:05:07 PDT.
[a.test] 
[a.test]     ID       Build Type Contains JVM        State                          Engine Name
[a.test]    100      DEVELOPMENT           NO      running                 System::swcoordadmin
[a.test]    101      DEVELOPMENT           NO      running                         System::kssl
[a.test]    102      DEVELOPMENT          YES      running               System::administration
[a.test]    103      DEVELOPMENT           NO      running                    Dtm::distribution
[a.test]    105      DEVELOPMENT          YES      running                  application::Hello0
  • ID - Node-unique integer identifier for the process.

  • Build Type - Whether the process contains PRODUCTION binaries or DEVELOPMENT (debug) binaries.

  • Contains JVM - Whether or not this process contains a Java VM.

  • State - Whether the process is running or stopped.

  • Process Name - The name of the process.

JVM Class Transformation Calls

Use --statistics=classtransform to show per-JVM information about the number of classes rewritten at load time. These statistics are available for all running JVMs in the node.

$ epadmin servicename=a.test display statistics statistics=classtransform
[a.test] 
[a.test] ======== Class Transformation report for a.test ========
[a.test] 
[a.test] Data gathered between 2018-03-23 14:53:57 PDT and 2018-03-23 14:57:30 PDT.
[a.test] 
[a.test]       Action   Calls      Total    Average    Minimum    Maximum JVM name
[a.test]                        Seconds  Microseconds.................
[a.test]         skip    1647      0.000        0.1        0.0        4.4 application::hello0
[a.test] no-transform     816      0.095      116.7        2.3     6495.8 application::hello0
[a.test]    transform     220      1.102     5009.1       40.5   216470.9 application::hello
  • Action - Action taken by the class transformer.

    • skip - the class was in the skip packages list and not examined.

    • no-transform - the class was examined and no transformations were required.

    • transform - the class was transformed.

  • Calls - Number of calls made (classes passed) to the class transformer since the last time this statistic was cleared.

  • Total Seconds - Total number of seconds spent in the class transformer since the last time this statistic was cleared.

  • Average Microseconds - Average number of microseconds per call to the class transformer since the last time this statistic was cleared.

  • Minimum Microseconds - Minimum number of microseconds spent in a call to the class transformer since the last time this statistic was cleared.

  • Maximum Microseconds - Maximum number of microseconds spent in a call to the class transformer since the last time this statistic was cleared.

  • JVM name - Name of the JVM.

JVM Class Type Descriptors

Use --statistics=classtypedescriptor to show per-JVM information about the system type descriptor metadata created for the loaded managed classes and enums. The collection of class type descriptor statistics imposes a slight performance penalty and is not enabled by default.

When collected, this statistics are available for running JVMs.

$ epadmin servicename=a.test display statistics statistics=classtypedescriptor
[a.test] 
[a.stone] ======== Class Type Descriptors report for a.test ========
[a.test] 
[a.test] Data gathered between 2019-04-27 22:27:32 PDT and 2019-04-27 22:27:55 PDT.
[a.test] 
[a.test] Engine Microseconds   Dup #Fields #Methods #Params Class
[a.test] hello0    25009.545  true       2       62     261 com.tibco.ep.sb.ads.internal.Node
[a.test] hello0    19867.623  true       8       22      18 com.kabira.platform.management.Target
...     
  • Engine - Name of the JVM.

  • Microseconds - Number of microseconds used to create type descriptors for the class.

  • Dup - Whether or not this type descriptor has already been created by another running JVM.

  • #Fields - Number of non-static fields in the class.

  • #Fields - Number of non-static methods in the class.

  • #Params - Total number of parameters used by all methods in the class.

  • Class - Name of the class. class.

Life Cycle Events

Show node and engine startup and shutdown event records.

These statistics are always available, but may be cleared.

$ epadmin --servicename a.test display statistics --statistics lifecycle
fig:-> epadmin --servicename a.test display statistics --statistics lifecycle
[a.test] 
[a.test] ======== Life Cycle Events report for a.test ========
[a.test] 
[a.test] Data gathered between 2019-03-11 12:38:35 PDT and 2019-03-11 12:38:59 PDT.
[a.test] 
[a.test] Time         Milliseconds Engine     Event               Message
[a.test] 12:38:35.422           76  720944343 Engine Start        System::swcoordadmin
[a.test] 12:38:35.777            2 1568875692 JVM Initialize      
[a.test] 12:38:36.203           19 1568875692 Component Load      HOCON parser
[a.test] 12:38:36.311          108 1568875692 Component Load      DTM, StreamBase, and LDM Security 
                                                                  Configuration
[a.test] 12:38:36.397           85 1568875692 Component Load      Administration Service Framework
[a.test] 12:38:36.469           72 1568875692 Component Load      Configuration Target
...
  • Time - Timestamp for when the event occurred.

  • Milliseconds - Milliseconds used by the event.

  • Engine - An engine instance identifier. This can be mapped to any engine name by looking for Engine Start event with a matching identifier in the Engine column. The engine name will be found in the Message column.

    The engine name mapping may also be found by running the engine report. See Node Monitoring.

  • Event - The event type. Most events come in begin/end pairs.

  • Message - A message associated with the event.

Internally, life cycle events come in begin/end pairs. The default report (--detailed false only shows the end events, collected when each life cycle event completes. Adding the --detailed true option causes all events to be displayed. Begin events have no entry in the Milliseconds column.

System Impact of Monitoring

Statistics Collection Impact

Some statistics collection turns on extra code or storage paths in the runtime. For each of these, an effort was made in the runtime to minimize the costs, but those costs are not zero. Application performance measurement is the best way to characterize these effects.

For a few of the statistics (such as --statistics=eventbus --detailed=true), there is also unbounded memory consumption.

If you leave the statistic enabled, you risk the node running out of memory.

Statistics Reporting Impact

Generally, statistics reporting can be more expensive (both computationally and in terms of contention) than statistics collection. There reasons for this are:

  • Most statistics are stored as a set of integer values that are relatively inexpensive to update. But nearly all reporting involves row by row string formatting of the statistic data, often including type lookups.

  • The synchronization for the collection of some statistics (particularly those in performance sensitive paths) is parallelized to allow concurrent writers. Where possible, this is done using atomic memory counter primitives, otherwise using pools of mutexes, or in some cases a single mutex. For the runtime execution path, locking is minimized. But for the statistics clear path, in paths where the statistic is protected by mutexes, one or all of the mutexes get locked. The data collection with the worst impact on the system would be the allocator report with the detailed=true option. This data reporting can cause expensive contention in the shared memory allocator.

  • The returning of report data through the administration plug-in service uses the same runtime support that an application does; that is creating objects, consuming shared memory, and so on. A large report (generally those that use the detailed=true option) can temporarily consume a great deal of shared memory.

Recommendations

Run statistics reporting judiciously in a production system. Do not design an operator console where a room full of operators are all continuously issuing statistics commands.

Unless there is a good reason, avoid the detailed=true reporting.

Measure. Using your existing application performance testing, measure the impact of collecting the desired statistics. Also measure the impact of reporting the desired statistics.