==== Database backups ====

=== Purpose ===

Backups are essential for disaster recovery and must be monitored closely. This monitor checks backup age, size, status and duration for Oracle, MSSQL and DB2.

=== Configuration hints ===

For each type of backup, you can define customized monitoring. The monitor checks the backup status, duration, size (MSSQL only) and occurrence.

Use the surveillance table to adjust the monitoring settings: thresholds, severity, filters, etc.

**Note:** This monitor only looks at backups that occurred since the last check. Most backup metrics and alerts are therefore sent just after a backup occurs. Don't expect to get metrics all day long for backups that occur once a day.

=== Surveillance table ===

^Parameter^Description^
^Active|Use this field to activate or deactivate a line of configuration.|
^Backup type|Defines which type of backup you want to monitor. Be careful to choose a type appropriate for your database. If ANY is selected, all backup entries will be checked.|
^Error sev.|Defines the severity of the alarm sent when a backup ends in error. Set to DISABLED if not used.|
^Unknown sev.|Defines the severity of the alarm sent when a backup has an UNKNOWN status. Set to DISABLED if not used.|
^Min. error level|For MSSQL databases, there are some error statuses that you might want to ignore. Use this field to set the lowest level that you want to consider as an error.|
^Max Duration (min)|Defines the maximum duration of a backup, in minutes.|
^Max size (Mb)|Defines the maximum size of a backup, in megabytes (works for MSSQL only).|
^Max age (min)|Defines the maximum time elapsed since the last backup, in minutes.|
^Default criticity|The default severity level applied to a generated alarm if the multiple severity syntax is not used for a threshold.|
^Auto clear|If checked, the alarm will be cleared as soon as the alarm condition is no longer met.|
^Alarm tag|Adds custom text to the alarm message. The %MSG% variable contains the generated message and can be used as follows: "my_prefix %MSG% my_suffix". By default, the tag is used as a prefix.|
^Alarm|If checked, this line of surveillance will be used for alarm generation.|
^Metric|If checked, this line of surveillance will be used for metric generation.|
^Report|If checked, this line of surveillance will be used for showing the threshold and severity in the daily report.|

=== Examples ===

^Active^Backup type^Error sev.^Unknown sev.^Min. error level^Max Duration (min)^Max size (Mb)^Max age (min)^Backup schedules^Default criticity^Auto clear^Alarm tag^Alarm^Metric^Report^
|true|FULL|CRITICAL|WARNING|1|50|500|1440| |MAJOR|true| |true|false|false|

**Effect**: Watch for FULL backups. Sends a CRITICAL alarm on error status and a WARNING alarm on unknown status. Sends a MAJOR alarm if a backup ran for more than 50 minutes, if its size is greater than 500 MB, or if the last FULL backup occurred more than 24 hours ago.

^Active^Backup type^Error sev.^Unknown sev.^Min. error level^Max Duration (min)^Max size (Mb)^Max age (min)^Backup schedules^Default criticity^Auto clear^Alarm tag^Alarm^Metric^Report^
|true|LOG|CRITICAL|WARNING|2|30|0|45| |WARNING|true| |true|false|true|

**Effect**: Watch for LOG backups. Sends a CRITICAL alarm on error status (level > 1) and a WARNING alarm on unknown status. Sends a WARNING alarm if a backup ran for more than 30 minutes or if no LOG backup occurred in the last 45 minutes.

=== Generated metrics ===

^metricId^metricUnit^metricTarget^metricDescription^
|DATABASE_BACKUP_DURATION|Minutes|TYPE|Sends the duration of the last backup of the given type|
|DATABASE_BACKUP_SIZE|Megabytes|TYPE|Sends the size of the last backup of the given type|
|DATABASE_BACKUP_STATUS|Status|TYPE|Sends the status of the last backup of the given type|
|DATABASE_BACKUP_AGE|Minutes|TYPE|Sends the age of the last backup of the given type|
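
To illustrate how a surveillance line such as the first example above could translate into alarms, here is a minimal Python sketch. It is not the monitor's actual implementation: the class names, field names, status values ("OK", "ERROR", "UNKNOWN") and the ''evaluate'' function are hypothetical, a threshold set to ''None'' is assumed to mean "not checked", and the Min. error level filter is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SurveillanceLine:
    """One row of the surveillance table (illustrative field names)."""
    backup_type: str                 # e.g. "FULL", "LOG" or "ANY"
    error_sev: str                   # severity name, or "DISABLED"
    unknown_sev: str                 # severity name, or "DISABLED"
    max_duration_min: Optional[int]  # None = threshold not checked (assumption)
    max_size_mb: Optional[int]
    max_age_min: Optional[int]
    default_criticity: str


@dataclass
class BackupRecord:
    """Hypothetical summary of the last backup of a given type."""
    backup_type: str
    status: str        # assumed values: "OK", "ERROR", "UNKNOWN"
    duration_min: float
    size_mb: float
    age_min: float


def evaluate(line: SurveillanceLine, backup: BackupRecord) -> List[Tuple[str, str]]:
    """Return (severity, message) pairs for every condition the backup violates."""
    alarms: List[Tuple[str, str]] = []

    # A line only applies to its own backup type, unless it is set to ANY.
    if line.backup_type not in ("ANY", backup.backup_type):
        return alarms

    # Status checks use their own severities and can be disabled.
    if backup.status == "ERROR" and line.error_sev != "DISABLED":
        alarms.append((line.error_sev, "backup ended in error"))
    if backup.status == "UNKNOWN" and line.unknown_sev != "DISABLED":
        alarms.append((line.unknown_sev, "backup status is unknown"))

    # Threshold checks all use the default criticity.
    checks = [
        ("duration", backup.duration_min, line.max_duration_min, "min"),
        ("size", backup.size_mb, line.max_size_mb, "MB"),
        ("age", backup.age_min, line.max_age_min, "min"),
    ]
    for name, value, limit, unit in checks:
        if limit is not None and value > limit:
            alarms.append(
                (line.default_criticity, f"{name} {value} {unit} > {limit} {unit}")
            )
    return alarms


# First example from the table: FULL backups, 50 min / 500 MB / 1440 min limits.
full_line = SurveillanceLine("FULL", "CRITICAL", "WARNING", 50, 500, 1440, "MAJOR")
slow_backup = BackupRecord("FULL", "OK", duration_min=62, size_mb=420, age_min=30)
print(evaluate(full_line, slow_backup))
```

In this sketch a FULL backup that succeeds but runs for 62 minutes raises a single MAJOR alarm for duration, while a backup of another type is ignored by the line entirely.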