===== Alarms and Metrics =====

==== Purpose ====
  * Defines the global configuration for alarms and metrics.
  * All alarms and metrics generated by Pro.Monitor can be propagated, by email or to third-party applications, through plugins only.
  * To use different plugins depending on the origin of an alarm (SAP or internal), define Alarm rules.
  * Internal alarms are typically sent by email to the Pro.Monitor administrator.
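
The idea behind routing alarms by origin can be illustrated with a minimal sketch. The ''Alarm'' class, the origin values, and the plugin names below are hypothetical and are not taken from Pro.Monitor; actual Alarm rules are defined in the application itself.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping from alarm origin to plugin names (illustration only).
RULES = {
    "SAP": ["ticketing_plugin"],
    "internal": ["email_plugin"],
}


@dataclass
class Alarm:
    origin: str   # assumed values: "SAP" or "internal"
    message: str


def plugins_for(alarm: Alarm) -> List[str]:
    """Return the names of the plugins that should receive this alarm."""
    # Assumption: unknown origins fall back to email so nothing is dropped.
    return RULES.get(alarm.origin, ["email_plugin"])
```

In this sketch an SAP alarm goes to the ticketing plugin, while an internal alarm goes to email, which mirrors the typical setup described above.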

==== How to access Alarms and Metrics feature ====
  * At the top right of the screen, click the settings icon.
  * Select the admin configuration sub-menu.
  * Click the Alarms/Metrics tab.

\\
==== System availability alerts ====
  * **Max connection resp. time (sec) :** An alert is generated if a system has not responded within the number of seconds set in this field. The severity of the alert is set with the corresponding dropdown list.
  * **Max system down time (sec) :** An alert is generated once attempts to reach a system have failed for the number of seconds set in this field. The severity of the alert is set with the corresponding dropdown list.
  * **Time zone alarm:** An alert is generated if the time zone of a system is not properly set or cannot be resolved. This option defines the severity of that alert.
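
One plausible reading of the two time thresholds above is sketched below. The function name, its parameters, and the returned labels are assumptions made for illustration; the actual evaluation logic in Pro.Monitor may differ.

```python
from typing import Optional


def availability_alert(resp_time_sec: Optional[float],
                       down_for_sec: float,
                       max_resp_sec: float,
                       max_down_sec: float) -> Optional[str]:
    """Return the availability alert that applies, or None.

    resp_time_sec is None when the system did not answer at all;
    down_for_sec is how long the system has been unreachable so far.
    """
    # Unreachable for longer than the allowed down time.
    if resp_time_sec is None and down_for_sec > max_down_sec:
        return "system_down"
    # No answer yet, or an answer slower than the allowed response time.
    if resp_time_sec is None or resp_time_sec > max_resp_sec:
        return "slow_response"
    return None
```

With a 10 second response limit and a 60 second down-time limit, a system that answers in 2 seconds raises nothing, and one that has been silent for 120 seconds is reported as down.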

{{..:..:..:userguide:administration:adminconfig:pasted:20190329-181135.png}}

\\
==== Internal alerts ====
  * **Monitor job execution error :** An alert is generated if a Monitor job encounters an error during its execution. The severity of the alert is set with the corresponding dropdown list.
  * **CCMS errors :** An alert is generated if a CCMS job encounters an error during its execution. The severity of the alert is set with the dropdown list.
  * **Monitor Tree loading errors :** An alert is generated if a Monitor Tree job encounters an error while loading data from SAP. The severity of the alert is set with the dropdown list.

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-111002.png}}
==== Agents ====
These alarm settings detect problems on an agent and trigger a notification:

  * **Max agent down time (sec) :**
    * Notifies you when an agent is not responding.
    * Defines the maximum time, in seconds, that the agent may be unavailable before a notification is sent.
  * **Min schedule ratio (%) :**
    * Detects when an agent does not have enough time to execute all its monitors.
    * The server computes the ratio between executed monitors and rescheduled ones and compares it to the threshold.
    * A ratio of 100% is expected on well-configured agents.
  * **Min successful exec. ratio (%) :**
    * Detects when an agent returns many execution errors for its monitors.
    * The server computes the ratio between successful executions and failed ones.
    * Occasional monitor failures are normal, but many failures may indicate a problem on the agent (resources or network).
  * **Max result send time (sec) :**
    * Detects when sending results from the agent to the primary server takes too long.
    * This can be caused by network problems, or by a resource problem on the agent or the primary server.
    * A notification is sent if the send time is over the threshold.
  * **Max time without results (sec) :**
    * Detects when an agent is not sending any results to the server.
    * This can indicate a resource problem on the agent.
    * A notification is sent if the time since the last received result is over the threshold.
  * **Max VM Heap usage (%) :**
    * Detects when an agent is using all its allocated memory.
    * Agent memory usage reaching 100% may indicate memory starvation and instability.
    * A notification is sent if VM memory usage reaches the threshold.
  * **Max OS RAM usage (%) :**
    * Detects when the overall OS memory usage is too high.
    * High OS memory usage may prevent the server from using its allocated memory and may cause paging, which decreases performance.
    * A notification is sent if OS memory usage is over the threshold.
  * **Max OS disk usage (%) :**
    * Detects when the application disk space is running low.
    * A full disk must be avoided at all costs, as it may bring the service down.
    * A notification is sent if the used disk space is over the threshold.
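
The two ratio settings above can be sketched as follows. The exact formula used by the server is not documented here, so this sketch assumes the ratio is the share of "good" events (executed monitors, or successful executions) among all events, which matches the statement that 100% is expected on a well-configured agent. The function names are hypothetical.

```python
def ratio_percent(good: int, bad: int) -> float:
    """Share of `good` events among all events, as a percentage."""
    total = good + bad
    # Assumption: with no events at all, there is nothing to report.
    return 100.0 if total == 0 else 100.0 * good / total


def below_minimum(good: int, bad: int, min_percent: float) -> bool:
    """True when the ratio falls under the configured minimum."""
    return ratio_percent(good, bad) < min_percent
```

For example, an agent that executed 90 monitors and rescheduled 10 has a ratio of 90%, which would trigger a notification with a minimum of 95%.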

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-111235.png}}


\\
==== Plugins ====
  * **Max plugin down time (sec) :**
    * Detects when a plugin is failing to send events.
    * This is usually critical, because monitoring data might not be visible in the corresponding third-party platform.
    * A notification is sent if the plugin error lasts longer than the threshold.

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-112140.png}}
\\
==== Licenses ====
  * **Max expiration delay (days) :**
    * Notifies you when a license is about to expire.
  * **Invalid license severity :**
    * Notifies you when a license is not valid.
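
A minimal sketch of the expiration check, assuming a notification is due as soon as the number of remaining days is at or below the configured delay. The function names and the comparison rule are assumptions for illustration, not the documented behavior.

```python
from datetime import date


def days_until_expiry(expires_on: date, today: date) -> int:
    """Number of whole days left before the license expires."""
    return (expires_on - today).days


def license_expiring(expires_on: date, today: date, max_delay_days: int) -> bool:
    """True when the license expires within the configured delay."""
    return days_until_expiry(expires_on, today) <= max_delay_days
```

With a delay of 30 days, a license expiring on 2019-04-10 is reported on 2019-04-01, while one expiring on 2019-12-31 is not.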

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-111642.png}}

\\
==== Internal alarms settings ====
  * **Clear alarms :** 
    * If set, all clearable alarms are cleared (by sending an alarm with the //toClear// parameter set to true) once the problem is no longer detected.
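
The clearing behavior can be sketched as a small state tracker. The ''AlarmEvent'' and ''AlarmTracker'' classes are hypothetical; only the //toClear// parameter comes from the description above, and it is modeled here as the ''to_clear'' field.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class AlarmEvent:
    key: str        # identifies the monitored problem
    to_clear: bool  # mirrors the toClear parameter


class AlarmTracker:
    def __init__(self, clear_alarms: bool) -> None:
        self.clear_alarms = clear_alarms  # the "Clear alarms" setting
        self.active: Set[str] = set()     # problems currently raised

    def update(self, key: str, problem_detected: bool) -> List[AlarmEvent]:
        """Return the alarm events to send for this check result."""
        events: List[AlarmEvent] = []
        if problem_detected and key not in self.active:
            # New problem: raise a regular alarm.
            self.active.add(key)
            events.append(AlarmEvent(key, to_clear=False))
        elif not problem_detected and key in self.active:
            # Problem gone: send a clearing alarm only if the option is set.
            self.active.discard(key)
            if self.clear_alarms:
                events.append(AlarmEvent(key, to_clear=True))
        return events
```

When the option is disabled, the tracker in this sketch still forgets the resolved problem but sends no clearing alarm.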

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-111723.png}}

\\
==== Metrics sources ====
  * **Alarm source :** SID, HOST, FQDN, TITLE, INSTANCE, IP
  * **Metric source :** SID, HOST, FQDN, TITLE, INSTANCE, IP

{{..:..:..:userguide:administration:adminconfig:pasted:20190227-111809.png}}