"Detected TTSP issue" messages in logs or on console

This article applies if you see messages like the following in a log file or printed to stderr by your process:

Detected TTSP issue: start: 31.259 wait: 17.526
Dumping stack for thread 0x0000440005e00000
"Thread-1" id: 30703 prio: 5 os_prio: 0 sched: SCHED_OTHER allowed_cpus: 000000000ffffff
lock_release: 31.276
last_cpu: 20 cpu_time: 17563
0 0x00000000209f41d5 SafepointProfilerBuf::record_sync_stack(JavaThread*)
1 0x0000000020a35d68 JavaThread::safepoint_profiler_record_sync_stack()


These messages are not errors. They are produced by a mechanism in the Zing Virtual Machine (ZVM) called the Safepoint Profiler, which helps you work out what is happening in your application when threads take longer than a given threshold to reach a safepoint. The Safepoint Profiler is designed to have minimal impact on your application, and Azul Support recommends that you keep it on so that you have the information you need to diagnose time-to-safepoint (TTSP) issues if they occur.

When the ZVM needs to bring all running threads to a safepoint in order to perform some critical task, it notifies all the threads. Sometimes one or more threads take longer than the others to respond to the notification, and the threads that responded promptly are delayed while they wait for the late ones.

The Safepoint Profiler messages show you the stacks of the late threads. You can examine these stacks to work out what made a thread late to reach the safepoint, and then take action to prevent it in future.
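
When a log contains many of these reports, it can help to collect them programmatically before reading the stacks by hand. The following is a minimal sketch in Python, assuming the output always has the shape of the sample message above; the regular expressions and the function name find_ttsp_reports are illustrative and are not part of any Azul tool. Other ZVM versions may print a different layout, and the start and wait values are kept as raw numbers because their units are not stated in this article.

```python
import re

# Patterns assumed from the sample message in this article only.
TTSP_HEADER = re.compile(
    r"Detected TTSP issue: start: (?P<start>\d+(?:\.\d+)?) wait: (?P<wait>\d+(?:\.\d+)?)"
)
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)" id: (?P<id>\d+)')
FRAME_LINE = re.compile(r"^(?P<index>\d+) (?P<address>0x[0-9a-fA-F]+) (?P<symbol>.+)$")


def find_ttsp_reports(lines):
    """Group log lines into one dict per 'Detected TTSP issue' report."""
    reports = []
    current = None
    for raw in lines:
        line = raw.strip()
        header = TTSP_HEADER.search(line)
        if header:
            # A new report starts at every header line.
            current = {
                "start": float(header["start"]),
                "wait": float(header["wait"]),
                "thread": None,
                "frames": [],
            }
            reports.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first report
        thread = THREAD_LINE.match(line)
        if thread:
            current["thread"] = thread["name"]
            continue
        frame = FRAME_LINE.match(line)
        if frame:
            current["frames"].append(frame["symbol"])
    return reports
```

A typical use would be to pass an open log file to find_ttsp_reports and then sort the result by the "wait" value to see the slowest threads first.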

There is a set of flags that you can use to control the Safepoint Profiler:

-XX:+SafepointWaitTimeProfiler
This flag turns the whole profiling mechanism on or off. It is on by default; use -XX:-SafepointWaitTimeProfiler to turn it off.
-XX:SafepointWaitTimeProfilerLog=<logfile>
By default the profiling information is written to stderr. This flag redirects the output to a file.
-XX:SafepointProfilerThreshold=n
The threshold, expressed as a percentage of the safepoint/checkpoint timeout, at which profiling is activated. With the current default SafepointTimeoutDelay of 50000 ms, profiling of the threads starts once the wait crosses 1000 ms (2% of the timeout). The default value of this flag is 2.
-XX:SafepointProfilerInterval=n
The interval, expressed as a percentage of the safepoint/checkpoint timeout, at which additional stack samples are collected for a TTSP issue. The default value of this flag is 1.
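
Because both percentage flags are relative to a timeout, it is easy to misjudge the actual wait times they correspond to. The arithmetic can be sketched in Python as below. The helper percent_of_timeout_ms is a hypothetical name used only for illustration, the 50000 ms timeout is the default quoted in this article, and applying the interval percentage to that same timeout is an assumption rather than something this article confirms.

```python
def percent_of_timeout_ms(timeout_ms, percent):
    """Convert a percentage-style flag value into milliseconds."""
    return timeout_ms * percent / 100


# Default timeout quoted in this article, in milliseconds.
DEFAULT_TIMEOUT_MS = 50000

# -XX:SafepointProfilerThreshold=2 (default): profiling starts at this wait.
threshold_ms = percent_of_timeout_ms(DEFAULT_TIMEOUT_MS, 2)  # 1000.0

# -XX:SafepointProfilerInterval=1 (default): assumed gap between stack samples.
interval_ms = percent_of_timeout_ms(DEFAULT_TIMEOUT_MS, 1)  # 500.0
```

If you change the timeout, the same percentages translate into different absolute times, so recompute them before tuning either flag.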

For more information on safepoint profiling, including a more detailed overview and advice on how to read the output, please consult the Safepoint Profiler section of the Zing User Guide.

 
