Monitoring Azul Platform Prime

For many applications, an important metric to monitor is the application response time; for others it is throughput, or both combined. These metrics need to be measured from the outside, from the perspective of the user. Many components and factors influence them, and the JVM is only one of them, but an important one.

In addition, monitoring the JVM helps to detect trends early, before the application metrics are affected.

This article describes ways to monitor the Azul Zulu Prime JVM.

 

Graphical Interactive Tools

Azul Zulu Prime JVM includes graphical tools for analyzing JVM metrics and performance.

Those tools are intended for interactive analysis and for analyzing recorded data or logs.

For continuous monitoring of production systems, the methods described below exist:

 

Standard JVM Pause Time Logging

The standard JVM command line options for monitoring application pauses caused by the GC and safepoints are also usable on the Azul Zulu Prime JVM. On Azul Zulu Prime JVM, however, no problematic pauses are expected: with its special GC algorithm, the short synchronization steps for phase transitions inside the JVM are independent of the heap size and in most practical use cases do not exceed single-digit milliseconds. The following standard JVM command line options track application pauses caused by the JVM and, at the same time, the time the application was running in parallel to GC activity:

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime
Those are logged to the garbage collection log, which is enabled with: -Xloggc:gc.log

Example entry in the gc.log of Azul Zulu Prime JDK or OpenJDK:

$ grep -E 'Application time:|Total time.*stopped' gc.log
12.836: Application time: 0.6752136 seconds
12.836: Total time for which application threads were stopped: 0.0000602 seconds, Stopping threads took: 0.0000074 seconds

For JDK 11 and newer, you can use Unified Logging instead:

-Xlog:gc,safepoint:gc.log

And the following command lists the application stopped times on any JVM:

grep safepoint.*Total gc.log \
| sed -E 's/.*\[([0-9\.]+)s\].*Total[a-z ]*: ([0-9\.]+ [a-z]+).*/\1 \2/'
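On JDK 11 the matching gc.log lines look similar to the following illustrative example (the exact message wording differs between JDK versions), which the pipeline above reduces to the JVM uptime and the stopped time:

[2.567s][info][safepoint] Total time for which application threads were stopped: 0.0000602 seconds, Stopping threads took: 0.0000074 seconds

becomes

2.567 0.0000602 seconds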

Often a statistical analysis of those times is interesting, for example the percentiles to compare against performance SLAs. Here is an example using the open source datamash tool to collect the P50, P99, P99.9 and maximum of the pause times, skipping the first 60 seconds of warmup time:

grep safepoint.*Total gc.log \
| sed -E 's/.*\[([0-9\.]+)s\].*Total[a-z ]*: ([0-9\.]+ [a-z]+).*/\1 \2/' \
| awk '$1 > 60 { print $1,$2,$3 }' \
| datamash -t' ' perc:50 2 perc:99 2 perc:99.9 2 max 2

Output example:

0.0002 0.0034 0.0035 0.0160

Interpretation of this example output:
50% of all pauses are below 0.2 ms, 99% are below 3.4 ms, 99.9% are below 3.5 ms, and all are below 16 ms.

 

Standard JVM JMX MBeans

If your monitoring system is already tracking the Java application heap utilization over JMX, for example to detect application memory leaks, standard Java MXBeans like the following can continue to be used on Azul Zulu Prime:

java.lang.management.MemoryPoolMXBean
java.lang.management.MemoryMXBean

As the Azul Zulu Prime JVM's garbage collector always runs concurrently to the application, the application continues to allocate memory while the GC is running. That is normal and desired behavior for any concurrent GC, as it increases application performance. On the monitoring side, however, it means you will notice higher values for the MXBean attribute "used" on the Azul Zulu Prime JVM than with other GCs that add pauses. For detecting application memory leaks and for long-term trend analysis these metrics are still valuable; you might only need to increase the percentage of the warning threshold to avoid false alerts.

An additional note applies when Azul Zulu Prime JVM's ZST component is enabled, where the Contingency Memory feature is available by default: under high memory pressure, the value of the "used" attribute can grow above 100% of -Xmx on Azul Zulu Prime JVM with ZST, because Contingency Memory covers situations in which the application temporarily needs more heap than assigned by -Xmx. If your monitoring calculates percentages, that detail is important to avoid calculation errors or unintended negative values. Further details: on MemoryPoolMXBean the attribute "max" stays constant and equals -Xmx, but on MemoryMXBean the attribute "max" also grows above -Xmx while Contingency Memory is actively used under high memory pressure.
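As an illustration, here is a minimal in-process sketch of polling those standard MXBeans; the class name, the 10-second interval and the output format are only examples and not part of any Azul API:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapUsagePoller {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            // Overall heap view: with ZST Contingency Memory, "max" (and "used")
            // can temporarily grow above -Xmx, so do not assume used/max is always <= 100%.
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.printf("heap used=%d MB max=%d MB (%.1f%% of max)%n",
                    heap.getUsed() >> 20, heap.getMax() >> 20,
                    100.0 * heap.getUsed() / heap.getMax());

            // Per-pool view: here "max" stays constant (equals -Xmx),
            // so "used" above "max" indicates Contingency Memory is in use.
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() != MemoryType.HEAP) {
                    continue;
                }
                MemoryUsage usage = pool.getUsage();
                System.out.printf("  pool %s used=%d MB max=%d MB%n",
                        pool.getName(), usage.getUsed() >> 20, usage.getMax() >> 20);
            }
            Thread.sleep(10_000);
        }
    }
}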

 

Azul Zulu Prime JVM Metrics from JMX MBeans

To monitor the CPU activity of Azul Zulu Prime JVM's concurrent GC in more detail over JMX, or to view the stress on the GC, special MXBeans are made available by adding the following command line option:

-XX:+UseZingMXBeans

The most interesting metrics to monitor there are the following:

PercentageOfTimeCollectorIsRunning: This is useful for analyzing the CPU activity caused by the GC and the stress on the GC. Setting a warning threshold when it exceeds 30% for more than 5 minutes is a good starting point. Occasional short spikes to 100% are not problematic.

PercentJavaHeapOccupiedAfterCollection: This is useful for analyzing the actual memory requirements of the application and to detect application memory leaks. Setting a warning threshold when it exceeds 70% for 5 minutes is a good starting point.
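To show how these attributes can be read, here is a minimal in-process sketch, assuming the JVM was started with -XX:+UseZingMXBeans. The ObjectName under which the Zing MXBeans are registered is described in the documentation linked below, so this sketch deliberately does not hard-code it and instead scans all registered MBeans for the two attribute names:

import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ZingGcMetricsProbe {
    // Attribute names as listed above; exposed when -XX:+UseZingMXBeans is set.
    private static final String[] WANTED = {
            "PercentageOfTimeCollectorIsRunning",
            "PercentJavaHeapOccupiedAfterCollection"
    };

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Scan every registered MBean instead of assuming a specific ObjectName.
        for (ObjectName name : server.queryNames(null, null)) {
            for (MBeanAttributeInfo attribute : server.getMBeanInfo(name).getAttributes()) {
                for (String wanted : WANTED) {
                    if (wanted.equals(attribute.getName())) {
                        System.out.println(name + " " + wanted + "="
                                + server.getAttribute(name, wanted));
                    }
                }
            }
        }
    }
}

The same attributes can also be polled remotely by a monitoring system over a standard JMX connection.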

For more details regarding ZingMXBeans and their most important metrics, start with the section Monitoring the ZVM Using JMX and Zing MXBeans on the following page: https://docs.azul.com/zing/ZingMXBeans.htm

 

Azul Zulu Prime JVM Metrics from GC.log

The same metrics offered over JMX are also available in Azul Zulu Prime JVM's garbage collection log file activated by adding the standard JDK parameter -Xloggc:gc.log. Azul Zulu Prime JVM includes the example script /opt/zing/zing-jdk8/etc/gcLogScraper to extract metrics from that log file. 

Example usage to extract current heap utilization in MBytes (NewGen + OldGen + PermGen), CodeCache usage, Linux memory usage outside of the heap in kBytes and application thread delays: 

$ gcLogScraper -l GC -c 'jvmuptime_s=%{end#end} curheap_MB=%{heap#cur} maxheap_MB=%{heap#max} codecache_MB=%{heap#codec} rss_kB=%{heap#rss} maxdelay_s=%{delay#max}' < gc_zulu_prime.log
jvmuptime_s=29.701548 curheap_MB=6968 maxheap_MB=20480 codecache_MB=1 rss_kB=117817 maxdelay_s=-
jvmuptime_s=34.316951 curheap_MB=8834 maxheap_MB=20480 codecache_MB=1 rss_kB=134111 maxdelay_s=0.000
jvmuptime_s=36.048295 curheap_MB=9588 maxheap_MB=20480 codecache_MB=1 rss_kB=139652 maxdelay_s=0.000
jvmuptime_s=38.341703 curheap_MB=10538 maxheap_MB=20480 codecache_MB=1 rss_kB=147521 maxdelay_s=0.000

Example usage to extract the percentage of time the concurrent GC is active in parallel to the application and the live Java objects in MBytes. These values are tracked separately for the NewGen and OldGen memory pools and their GC threads:

$ gcLogScraper -l GC -c '%{end#end} %{type} %{intracycle#sec} %{intercycle#sec} %{heap#live}' < gc_zulu_prime.log | grep New | awk '{ printf("jvmuptime_ms=%d type=%s gctime_%%=%.0f liveheap_MB=%d\n", $1 * 1000, $2, $3 / ($3 + $4) * 100, $5) }'
jvmuptime_ms=62664 type=New gctime_%=44 liveheap_MB=4242
jvmuptime_ms=65855 type=New gctime_%=33 liveheap_MB=4454
jvmuptime_ms=67806 type=New gctime_%=38 liveheap_MB=4373
$ gcLogScraper -l GC -c '%{end#end} %{type} %{intracycle#sec} %{intercycle#sec} %{heap#live}' < gc_zulu_prime.log | grep Old | awk '{ printf("jvmuptime_ms=%d type=%s gctime_%%=%.0f liveheap_MB=%d\n", $1 * 1000, $2, $3 / ($3 + $4) * 100, $5) }'
jvmuptime_ms=21032 type=Old gctime_%=31 liveheap_MB=2043
jvmuptime_ms=25274 type=Old gctime_%=28 liveheap_MB=1667
jvmuptime_ms=29701 type=Old gctime_%=27 liveheap_MB=1599

Besides the GC, another component of any JVM that uses CPU resources in parallel to the application is the JIT compiler. The Azul Zulu Prime JVM log also records metrics about its activity. Here is an example to track the number of active compiler threads per compiler tier and the CPU time spent in JIT compilation:

$ gcLogScraper -l COMPSTATS -c 'jvmuptime_s=%{logtime} jitcomp_tier=%{Tier} jitcomp_activethreads=%{Active} jitcomp_cumulated_cputime_ms=%{CPUTime}' < gc_zulu_prime.log
jvmuptime_s=12.939 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=129
jvmuptime_s=12.939 jitcomp_tier=2 jitcomp_activethreads=3 jitcomp_cumulated_cputime_ms=17569
jvmuptime_s=16.288 jitcomp_tier=1 jitcomp_activethreads=2 jitcomp_cumulated_cputime_ms=197
jvmuptime_s=16.288 jitcomp_tier=1 jitcomp_activethreads=1 jitcomp_cumulated_cputime_ms=197
jvmuptime_s=16.544 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=198
jvmuptime_s=16.544 jitcomp_tier=2 jitcomp_activethreads=2 jitcomp_cumulated_cputime_ms=18923
jvmuptime_s=21.036 jitcomp_tier=1 jitcomp_activethreads=0 jitcomp_cumulated_cputime_ms=245

Further details: https://docs.azul.com/zing/UsingZing_gcLogScraper_Overview.htm

 
