Monitoring Zing

An important metric to monitor for many applications is the application response time, for others it is throughput or both combined. Those need to be measured from the outside with the perspective of the user. Many components and factors have an influence on those metrics and the JVM is only one of them, but an important one.

In addition monitoring the JVM, here in this article specifically Zing, helps to detect trends early, before the application metrics are affected.


Standard JVM Pausetime Logging

The following JVM standard command line options for monitoring application pauses caused by the GC and Safepoints are also usable on Zing. But on Zing no problematic pauses are expected as with its special GC algorithm the short synchronization steps for phase transitions inside the JVM are independent of the heap size and in most practical use cases are not exceeding single digit ms (1/1000 s). Here the standard JVM command line options for tracking application pauses caused by the JVM and at the same time track the time the application was running in parallel to GC activity:

Those are logged to the garbage collection log which is enabled by the following: -Xloggc:gc.log

Example entry in the gc.log of Zing or OpenJDK:

$ grep -E 'Application time:|Total time.*stopped' gc.log
12.836: Application time: 0.6752136 seconds
12.836: Total time for which application threads were stopped: 0.0000602 seconds, Stopping threads took: 0.0000074 seconds


Standard JVM JMX MBeans

If your monitoring system is already tracking the Java application heap utilization over JMX, for example to detect application memory leaks, the standard Java MXBeans like the following can be continued to be used on Zing:

As Zing's Garbage Collector always runs concurrently to the application, the application continues to allocate memory while the GC is running. That's a normal and desired behavior for any concurrent GC as it increases application performance. But it means on the monitoring side that you'll notice higher values on the MXBean attribute "used" on Zing than on other GCs which add pauses. But for detecting application memory leaks and long term trend analysis those metrics are still valuable. You might only need to increase the percentage of the warning threshold to avoid false alerts.

Additional note when enabling Zing's ZST component where the feature Contingency Memory is available by default: Under high memory pressure, the attribute value "used" can grow larger than 100% on Zing with ZST as there Contingency Memory is available to cover situations when applications temporarily need more heap than assigned by -Xmx. If your monitoring calculates percentages, that detail is important to avoid calculation errors or unintended negative values. Further details: on MemoryPoolMXBean the attribute "max" stays constant and equals -Xmx but on MemoryMXBean the attribute "max" will also grow above -Xmx when Contingency Memory is actively used under high memory pressure.


Zing Metrics from JMX MBeans

To monitor the CPU activity of Zing's concurrent GC in more detail over JMX or to view the stress on the GC, special MXBeans are available by adding the following command line option"


The primarily interesting metrics there to monitor are the following:

PercentageOfTimeCollectorIsRunning: This is useful for analyzing the CPU activity caused by the GC and the stress on the GC. Setting a warning threshold when it exceeds 30% for more than 5 minutes is a good starting point. Occasional short spikes to 100% are not problematic.

PercentJavaHeapOccupiedAfterCollection: This is useful for analyzing the actual memory requirements of the application and to detect application memory leaks. Setting a warning threshold when it exceeds 70% for 5 minutes is a good starting point.

For more details regarding ZingMXBeans and its most important metrics please start reading in the section Monitoring the ZVM Using JMX and Zing MXBeans on the following page


Zing Metrics from GC.log

The same metrics offered over JMX are also available in Zing's garbage collection log file activated by adding the standard JDK parameter -Xloggc:gc.log. Zing includes the example script /opt/zing/zing-jdk8/etc/gcLogScraper to extract metrics from that log file. 

Example usage to extract current heap utilization in MBytes (NewGen + OldGen + PermGen), CodeCache usage, Linux memory usage outside of the heap in kBytes and application thread delays: 

$ gcLogScraper -l GC -c 'jvmuptime_s=%{end#end} curheap_MB=%{heap#cur} maxheap_MB=%{heap#max} codecache_MB=%{heap#codec} rss_kB=%{heap#rss} maxdelay_s=%{delay#max}' < gc_zing.log
jvmuptime_s=29.701548 curheap_MB=6968 maxheap_MB=20480 codecache_MB=1 rss_kB=117817 maxdelay_s=-
jvmuptime_s=34.316951 curheap_MB=8834 maxheap_MB=20480 codecache_MB=1 rss_kB=134111 maxdelay_s=0.000
jvmuptime_s=36.048295 curheap_MB=9588 maxheap_MB=20480 codecache_MB=1 rss_kB=139652 maxdelay_s=0.000
jvmuptime_s=38.341703 curheap_MB=10538 maxheap_MB=20480 codecache_MB=1 rss_kB=147521 maxdelay_s=0.000

Example usage to extract the percentage of time the concurrent GC is active in parallel to the application and the live Java objects in MBytes. Those are separately tracked for the NewGen and OldGen memory pools and GC threads:

$ gcLogScraper -l GC -c '%{end#end} %{type} %{intracycle#sec} %{intercycle#sec} %{heap#live}' < gc_zing.log | grep New | awk '{ printf("jvmuptime_ms=%d type=%s gctime_%=%.0f liveheap_MB=%d\n", $1 * 1000, $2, $3 / ($3 + $4) * 100, $5) }'
jvmuptime_ms=62664 type=New gctime_%=44 liveheap_MB=4242
jvmuptime_ms=65855 type=New gctime_%=33 liveheap_MB=4454
jvmuptime_ms=67806 type=New gctime_%=38 liveheap_MB=4373
$ gcLogScraper -l GC -c '%{end#end} %{type} %{intracycle#sec} %{intercycle#sec} %{heap#live}' < gc_zing.log | grep Old | awk '{ printf("jvmuptime_ms=%d type=%s gctime_%=%.0f liveheap_MB=%d\n", $1 * 1000, $2, $3 / ($3 + $4) * 100, $5) }'
jvmuptime_ms=21032 type=Old gctime_%=31 liveheap_MB=2043
jvmuptime_ms=25274 type=Old gctime_%=28 liveheap_MB=1667
jvmuptime_ms=29701 type=Old gctime_%=27 liveheap_MB=1599

Next to the GC another component of any JVM using CPU resources in parallel to the application is the JIT Compiler. The Zing log also records metrics about its activity. Here an example to track the number of active compiler threads per compiler tier and CPU time spend in JIT compilation:

$ gcLogScraper -l COMPSTATS -c 'jvmuptime_s=%{logtime} jitcomp_tier=%{Tier} jitcomp_activethreads=%{Active} jitcomp_cumulated_cputime_ms=%{CPUTime}' < gc_zing.log
jvmuptime_s=12.939 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=129
jvmuptime_s=12.939 jitcomp_tier=2 jitcomp_activethreads=3 jitcomp_cumulated_cputime_ms=17569
jvmuptime_s=16.288 jitcomp_tier=1 jitcomp_activethreads=2 jitcomp_cumulated_cputime_ms=197
jvmuptime_s=16.288 jitcomp_tier=1 jitcomp_activethreads=1 jitcomp_cumulated_cputime_ms=197
jvmuptime_s=16.544 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=198
jvmuptime_s=16.544 jitcomp_tier=2 jitcomp_activethreads=2 jitcomp_cumulated_cputime_ms=18923
jvmuptime_s=21.036 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=245

Further details:


Add Comment



Please sign in to leave a comment.

Was this article helpful?
0 out of 0 found this helpful