Monitor job timeout
Each monitor has a configurable timeout parameter and must finish its execution within the allocated time.
If a monitor shows the TIMEOUT state, it was interrupted before it could terminate normally.
Interrupting a monitor has the following consequences:
Alerts and metrics won't be generated for the monitored component
Short dumps may be generated in the target system
The execution of other monitors may be delayed
A timeout can have several causes, and the resolution depends on the cause:
The monitor consumes too much computing time in the system
The monitor fetches too much data
Slow SAP systems
Non-responding components in SAP (such as RFC destinations)
Too much old data in the system
Too many monitors scheduled within the allocated batch time
How to investigate
Identify which monitors timed out, because they slow down monitoring by blocking other monitors:
In the Pro.Monitor Monitor errors screen, look for the TIMEOUT and KILLED statuses
Monitor timeouts can also be detected in the worker logs
If configured, an alert is sent when a monitor does not run correctly
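As a rough illustration of the worker log check, the sketch below scans log lines for the two status words. It assumes that TIMEOUT and KILLED appear verbatim in the worker logs. The sample lines are invented for the demo and do not reflect the real log format, and the function name count_interruptions is hypothetical.

```python
import re
from collections import Counter

# Assumption: the status words appear verbatim in the worker log lines.
STATUS_PATTERN = re.compile(r"\b(TIMEOUT|KILLED)\b")


def count_interruptions(log_lines):
    """Return (counts per status word, list of matching lines)."""
    counts = Counter()
    matches = []
    for line in log_lines:
        found = STATUS_PATTERN.search(line)
        if found:
            counts[found.group(1)] += 1
            matches.append(line.rstrip("\n"))
    return counts, matches


if __name__ == "__main__":
    # Invented sample lines, not the real worker log format.
    sample = [
        "monitor A finished",
        "monitor B ended with status TIMEOUT",
        "monitor C ended with status KILLED",
    ]
    counts, matches = count_interruptions(sample)
    for status, number in counts.items():
        print(status, number)
    for line in matches:
        print(line)
```

To scan a real file, pass an open file object instead of the sample list, since the function only iterates over lines.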
How to fix
First:
In all cases, you can start by increasing the individual timeout of the monitors
Run a test to see how long the monitor takes to complete. Monitors are designed to run for a few seconds up to a minute. A monitor that takes longer than that can be problematic.
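Once test durations have been noted down, a small helper can list the monitors that exceed the one-minute guideline. This is only a sketch: the durations are entered by hand, the monitor names are placeholders, and slow_monitors is a hypothetical helper rather than part of Pro.Monitor.

```python
# One minute, the upper end of the design range mentioned above.
DESIGN_LIMIT_SECONDS = 60


def slow_monitors(test_durations, limit=DESIGN_LIMIT_SECONDS):
    """Return the names of monitors whose test run exceeded the limit, slowest first."""
    slow = {name: seconds for name, seconds in test_durations.items() if seconds > limit}
    return sorted(slow, key=slow.get, reverse=True)


if __name__ == "__main__":
    # Placeholder names and hand-measured durations in seconds.
    measured = {"monitor_a": 5, "monitor_b": 90, "monitor_c": 61}
    print(slow_monitors(measured))
```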
Second:
Check the allocated batch time: Monitors are executed in batches in a dedicated OS process. This process has a configurable maximum run time in which all the monitors must fit.
If a batch reaches its time limit before all the scheduled monitors have been processed, the running monitors are killed and immediately rescheduled, together with the remaining ones, in a new process.
Giving the process more time improves the chances that all tasks complete and that no monitor is killed.
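The effect of the batch time limit can be estimated with simple arithmetic. The sketch below makes a simplifying assumption that monitors in a batch run one after another. The real scheduler may behave differently, and split_batch is a hypothetical helper, not a Pro.Monitor function.

```python
def split_batch(durations, batch_limit):
    """Estimate which monitors complete within the batch time limit.

    durations: list of (monitor name, expected run time in seconds), in execution order.
    Returns (completed, rescheduled). Assumes sequential execution.
    """
    elapsed = 0.0
    completed = []
    for index, (name, seconds) in enumerate(durations):
        if elapsed + seconds > batch_limit:
            # The running monitor would be killed; it and all remaining
            # monitors are rescheduled in a new process.
            return completed, [n for n, _ in durations[index:]]
        elapsed += seconds
        completed.append(name)
    return completed, []


if __name__ == "__main__":
    # Placeholder names and durations in seconds.
    batch = [("monitor_a", 20), ("monitor_b", 30), ("monitor_c", 25)]
    print(split_batch(batch, 60))  # monitor_c does not fit in 60 seconds
    print(split_batch(batch, 80))  # all three fit in 80 seconds
```

With a 60-second limit the third monitor is cut off after 50 seconds of completed work, while raising the limit to 80 seconds lets the whole batch complete.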
Third:
If the monitor execution exceeds one or two minutes, you have different options to reduce its execution time: