Contents
This chapter describes the tools and approach to monitoring StreamBase® applications.
Standard Java tools may be used to monitor the JVM.
Flight Recorder - If using Oracle Java 7 or later, Java Mission Control and Flight Recorder is a built-in, low overhead tool for collecting JVM diagnostic and profiling data.
To enable (but not start) add the following VM deploy options:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder
The related GUI tool is Java Mission Control, jmc. Once the application is started your test run, you can select your JVM in mission control, select flight recorder and start recording.
Once the capture is finished, mission control enables exploring the captured data.
Recording from the command line may be done with the jcmd
command. See the jcmd manual page from the JDK, or Oracle
online documentation for details.
You can generate a variety of reports using epadmin display statistics, which apply to an entire node.
Display a list of the application, node, and system processes running on the node, with epadmin display statistics --statistics=engine. These statistics are available while the node is running.
$ epadmin --servicename=a.test display statistics --statistics=engine [a.test] [a.test] ======== Node Processes report for a.test ======== [a.test] [a.test] Data gathered at 2016-10-13 11:05:07 PDT. [a.test] [a.test] ID Build Type Contains JVM State Engine Name [a.test] 100 DEVELOPMENT NO running System::swcoordadmin [a.test] 101 DEVELOPMENT NO running System::kssl [a.test] 102 DEVELOPMENT YES running System::administration [a.test] 103 DEVELOPMENT NO running Dtm::distribution [a.test] 105 DEVELOPMENT YES running application::Hello0
-
ID - Node-unique integer identifier for the process.
-
Build Type - Whether the process contains PRODUCTION binaries or DEVELOPMENT (debug) binaries.
-
Contains JVM - Whether this process contains a Java VM.
-
State - Whether the process is running or stopped.
-
Process Name - The name of the process.
Use --statistics=classtransform to show per-JVM information about the number of classes rewritten at load time. These statistics are available for all running JVMs in the node.
$ epadmin servicename=a.test display statistics statistics=classtransform [a.test] [a.test] ======== Class Transformation report for a.test ======== [a.test] [a.test] Data gathered between 2018-03-23 14:53:57 PDT and 2018-03-23 14:57:30 PDT. [a.test] [a.test] Action Calls Total Average Minimum Maximum JVM name [a.test] Seconds Microseconds................. [a.test] skip 1647 0.000 0.1 0.0 4.4 application::hello0 [a.test] no-transform 816 0.095 116.7 2.3 6495.8 application::hello0 [a.test] transform 220 1.102 5009.1 40.5 216470.9 application::hello
-
Action - Action taken by the class transformer.
-
skip
- the class was in the skip packages list and not examined. -
no-transform
- the class was examined and no transformations were required. -
transform
- the class was transformed.
-
-
Calls - Number of calls made (classes passed) to the class transformer since the last time this statistic was cleared.
-
Total Seconds - Total number of seconds spent in the class transformer since the last time this statistic was cleared.
-
Average Microseconds - Average number of microseconds per call to the class transformer since the last time this statistic was cleared.
-
Minimum Microseconds - Minimum number of microseconds spent in a call to the class transformer since the last time this statistic was cleared.
-
Maximum Microseconds - Maximum number of microseconds spent in a call to the class transformer since the last time this statistic was cleared.
-
JVM name - Name of the JVM.
Use --statistics=classtypedescriptor to show per-JVM information about the system type descriptor metadata created for the loaded managed classes and enums. The collection of class type descriptor statistics imposes a slight performance penalty and is not enabled by default.
When collected, this statistics are available for running JVMs.
$ epadmin servicename=a.test display statistics statistics=classtypedescriptor [a.test] [a.stone] ======== Class Type Descriptors report for a.test ======== [a.test] [a.test] Data gathered between 2019-04-27 22:27:32 PDT and 2019-04-27 22:27:55 PDT. [a.test] [a.test] Engine Microseconds Dup #Fields #Methods #Params Class [a.test] hello0 25009.545 true 2 62 261 com.tibco.ep.sb.ads.internal.Node [a.test] hello0 19867.623 true 8 22 18 com.kabira.platform.management.Target ...
-
Engine - Name of the JVM.
-
Microseconds - Number of microseconds used to create type descriptors for the class.
-
Dup - Whether this type descriptor has already been created by another running JVM.
-
#Fields - Number of non-static fields in the class.
-
#Fields - Number of non-static methods in the class.
-
#Params - Total number of parameters used by all methods in the class.
-
Class - Name of the class. class.
Show node and engine startup and shutdown event records.
These statistics are always available, but may be cleared.
$ epadmin --servicename a.test display statistics --statistics lifecycle fig:-> epadmin --servicename a.test display statistics --statistics lifecycle [a.test] [a.test] ======== Life Cycle Events report for a.test ======== [a.test] [a.test] Data gathered between 2019-03-11 12:38:35 PDT and 2019-03-11 12:38:59 PDT. [a.test] [a.test] Time Milliseconds Engine Event Message [a.test] 12:38:35.422 76 720944343 Engine Start System::swcoordadmin [a.test] 12:38:35.777 2 1568875692 JVM Initialize [a.test] 12:38:36.203 19 1568875692 Component Load HOCON parser [a.test] 12:38:36.311 108 1568875692 Component Load DTM, StreamBase, and LDM Security Configuration [a.test] 12:38:36.397 85 1568875692 Component Load Administration Service Framework [a.test] 12:38:36.469 72 1568875692 Component Load Configuration Target ...
-
Time - Timestamp for when the event occurred.
-
Milliseconds - Milliseconds used by the event.
-
Engine - An engine instance identifier. This can be mapped to any engine name by looking for
Engine Start
event with a matching identifier in theEngine
column. The engine name is found in theMessage
column.The engine name mapping may also be found by running the
engine
report. See Node Monitoring. -
Event - The event type. Most events come in begin/end pairs.
-
Message - A message associated with the event.
Internally, life cycle events come in begin/end pairs. The default report
(--detailed false
only shows the end events, collected
when each life cycle event completes. Adding the --detailed
true
option causes all events to be displayed. Begin events have no entry in
the Milliseconds
column.
Some statistics collection turns on extra code or storage paths in the runtime. For each of these, an effort was made in the runtime to minimize the costs, but those costs are not zero. Application performance measurement is the best way to characterize these effects.
For a few of the statistics (such as --statistics=eventbus --detailed=true), there is also unbounded memory consumption.
If you leave the statistic enabled, you risk the node running out of memory.
Generally, statistics reporting can be more expensive (both computationally and in terms of contention) than statistics collection. There reasons for this are:
-
Most statistics are stored as a set of integer values that are relatively inexpensive to update. But nearly all reporting involves row by row string formatting of the statistic data, often including type lookups.
-
The synchronization for the collection of some statistics (particularly those in performance-sensitive paths) is parallelized to allow concurrent writers. Where possible, this is done using atomic memory counter primitives, otherwise using pools of mutexes, or in some cases a single mutex. For the runtime execution path, locking is minimized. But for the statistics clear path, in paths where the statistic is protected by mutexes, one or all of the mutexes get locked. The data collection with the worst impact on the system would be the allocator report with the detailed=true option. This data reporting can cause expensive contention in the shared memory allocator.
-
The returning of report data through the administration plug-in service uses the same runtime support that an application does; that is creating objects, consuming shared memory, and so on. A large report (generally those that use the detailed=true option) can temporarily consume a great deal of shared memory.
Run statistics reporting judiciously in a production system. Do not design an operator console where a room full of operators are all continuously issuing statistics commands.
Unless there is a good reason, avoid the detailed=true reporting.
Measure. Using your existing application performance testing, measure the impact of collecting the desired statistics. Also measure the impact of reporting the desired statistics.