====== ALM Exception Monitoring ======
Retrieves exception count metrics from SAP Cloud ALM Exception Monitoring (EXM). Publishes time-series metrics and evaluates alarms
per configured row.
* API: ''GET /api/calm-metrics/v1/metrics?provider=exm''
===== Prerequisites =====
==== Cloud ALM Connector (required) ====
Requires a Web Service connector with authentication type **CLOUD_ALM**.
[[..:sap_cloud_alm|SAP Cloud ALM Connector]]
Based on a service key from the SAP Cloud ALM API service instance in the BTP subaccount.
Required OAuth scopes in the ''authorities'' list of the instance parameters:
* ''$XSMASTERAPPNAME.calm-api.exm.read''
* ''$XSMASTERAPPNAME.calm-api.metrics.read''
===== API Endpoints =====
^ Endpoint ^ Purpose ^
| POST /oauth/token | Authentication (BTP UAA) |
| POST /api/calm-analytics/v1/analytics/providers/filters | Fetch EXM service IDs (data collection) |
| GET /api/calm-metrics/v1/metrics?provider=exm | Retrieve EXM metrics |
===== Key Features =====
* Publishes six metric types per EXM datapoint: ''exm.counter'', ''exm.counter_available'', ''exm.counter_disruption'', ''exm.counter_degradation'', ''exm.counter_maintenance'', ''exm.counter_unknown''
* Time window is a **5-minute aligned UTC window** with a 1-minute delay: ''to = floor(now - 1min, 5min)'', ''from = to - 5min'' (format ''yyyyMMddHHmmss'' UTC)
* Paginated fetching up to 5000 records per page using ''x-total-count'' response header
* All six measures are converted from gauge to count (non-monotonic delta sum) before storage
* Per-row thresholds with standard ''G2W:80 W2M:90'' syntax. [[..:commonsettings#multi_thresholds_syntax|Multi Threshold Syntax]]
* ''Attributes Filter'' narrows which datapoints a row evaluates
* ''Exclusive'' controls whether a matched datapoint is consumed or passed to later rows
* Glob support for ''Metric'' and ''Service name'' fields (''*'' = all)
* Optional alarm tag for grouping or routing
* Auto-clear when alarm condition no longer matches
* **Load Services** button: auto-discovers EXM service IDs and names from the live tenant
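The paginated fetch listed above can be pictured roughly as follows. This is a minimal sketch, not the monitor's actual code: ''fetch_page'' is a hypothetical callable standing in for one HTTP request, assumed to return the page's records together with the integer value of the ''x-total-count'' header. The offset/limit style of paging is also an assumption.

```python
from typing import Callable, List, Tuple

PAGE_SIZE = 5000  # maximum records per page, as stated in the feature list


def fetch_all(fetch_page: Callable[[int, int], Tuple[List[dict], int]],
              page_size: int = PAGE_SIZE) -> List[dict]:
    """Collect every record by requesting pages until the total is reached.

    fetch_page(offset, limit) is a hypothetical helper: it performs one
    request and returns (records, total), where total is the integer value
    of the x-total-count response header.
    """
    records: List[dict] = []
    offset = 0
    while True:
        page, total = fetch_page(offset, page_size)
        records.extend(page)
        offset += len(page)
        # Stop when everything is fetched or the server returns an empty page.
        if not page or offset >= total:
            return records
```

Stopping on an empty page as well as on the total guards against a loop that never ends if the header and the returned data disagree.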
===== Data Collection =====
Data collection populates ''Service name'' and ''Service ID'' from the Cloud ALM EXM service registry.
Always run data collection before adding rows. A service name not returned by data collection may not match what the metrics API returns.
Data collection runs one call:
- ''POST /api/calm-analytics/v1/analytics/providers/filters'' with body ''{"providerName":"EXM_DATAPROVIDER","providerVersion":"v1"}'' — returns the EXM service filter list. Finds the entry with ''key = serviceId'' and extracts each service UUID and its label (e.g. ''S4H.100'').
Click **Load Services** to run data collection and populate the surveillance table.
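The extraction step might look like the sketch below. The request body is taken from the call described above. The response layout (a list of filter entries, each with a ''key'' and a ''values'' list of ''key''/''label'' pairs) is an assumption made for illustration and should be checked against a real response from the tenant. ''extract_services'' is a hypothetical helper name.

```python
import json
from typing import Dict, List

# Request body sent to the providers/filters endpoint (from the text above).
REQUEST_BODY = json.dumps(
    {"providerName": "EXM_DATAPROVIDER", "providerVersion": "v1"}
)


def extract_services(filters: List[dict]) -> Dict[str, str]:
    """Map service UUID -> label from the entry whose key is 'serviceId'.

    Assumed (hypothetical) entry layout:
    {"key": "serviceId", "values": [{"key": "<uuid>", "label": "S4H.100"}]}
    """
    for entry in filters:
        if entry.get("key") == "serviceId":
            return {v["key"]: v.get("label", "") for v in entry.get("values", [])}
    return {}  # no serviceId filter present in the response
```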
===== Configuration =====
==== Method 1: Load Services ====
- Open monitor configuration
- Click **Load Services**
- Table populates with ''Service name'' and ''Service ID'' from the live tenant
- Enable rows, set thresholds, save
==== Method 2: Manual / Wildcard ====
- Set ''Metric'' = ''*'' and/or ''Service name'' = ''*'' to match all
- Use only service names returned by data collection
==== Settings Reference ====
^ Field ^ Type ^ Default ^ Description ^
| Active | Boolean | true | Enable or disable this row |
| Service name | String | * | Glob matched against EXM service label e.g. ''S4H.100''. ''*'' = all. Populated by data collection |
| Service ID | String | (empty) | UUID of service. Auto-populated by data collection |
| Metric | String | * | Metric type to match. Exact values or ''*''. See [[#metric_types|Metric types]] |
| Attributes Filter | String | (empty) | Narrows datapoints. Format: ''key:value,key2:value2''. Empty = no restriction. See [[#attributes_filter|Attributes Filter]] |
| Thresholds | String | G2W:80 W2M:90 | Alarm thresholds. [[..:commonsettings#multi_thresholds_syntax|Multi Threshold Syntax]] |
| Alarm tag | String | (empty) | Optional tag appended to alarm message |
| Exclusive | Boolean | true | If true, the datapoint is consumed by this row and is not re-evaluated by later rows |
| Alarm | Boolean | true | Enable alarm evaluation for this row |
| Metric | Boolean | true | Publish metric datapoints for this row. This is the publish flag, separate from the ''Metric'' match field above |
==== Metric types ====
^ Value ^ Description ^
| exm.counter | Total exception count |
| exm.counter_available | Exceptions while system status was Available |
| exm.counter_disruption | Exceptions during Disruption |
| exm.counter_degradation | Exceptions during Degradation |
| exm.counter_maintenance | Exceptions during Maintenance |
| exm.counter_unknown | Exceptions with unknown status |
| * | All of the above |
> **Note**: metric names in the ''Metric'' field use the snake_case form shown above (e.g. ''exm.counter_disruption'', not ''exm.counterDisruption'').
==== Attributes Filter ====
Narrows which datapoints a row matches. All clauses must match (AND). Matching is case-insensitive. Malformed clauses are silently ignored.
^ Attribute ^ Description ^ Example ^
| categoryName | Exception category name | ''ABAP_SHORT_DUMP'' |
| serviceType | Service type identifier | ''S4'' |
| useCase | Use case identifier | ''EXM'' |
Example filter value:
  categoryName:ABAP_SHORT_DUMP,serviceType:S4
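The matching rules can be sketched as below. This is an illustration under stated assumptions, not the monitor's implementation: ''parse_filter'' and ''matches'' are hypothetical names, a clause is treated as malformed when it lacks a key, a colon, or a value, and case-insensitivity is applied to both keys and values.

```python
from typing import Dict


def parse_filter(text: str) -> Dict[str, str]:
    """Parse 'key:value,key2:value2' into a dict of lowercased keys and values.

    Clauses missing a key, a colon, or a value are skipped, mirroring the
    'malformed clauses are silently ignored' rule (exact rule is assumed).
    """
    clauses: Dict[str, str] = {}
    for clause in text.split(","):
        key, sep, value = clause.partition(":")
        key, value = key.strip().lower(), value.strip().lower()
        if sep and key and value:
            clauses[key] = value
    return clauses


def matches(filter_text: str, attributes: Dict[str, str]) -> bool:
    """Return True when every clause matches the datapoint (AND semantics)."""
    lowered = {k.lower(): str(v).lower() for k, v in attributes.items()}
    return all(lowered.get(k) == v for k, v in parse_filter(filter_text).items())
```

An empty filter yields no clauses, so ''matches'' returns true for every datapoint, which agrees with "Empty = no restriction". Note that a filter made up only of malformed clauses also matches everything under this sketch, which is why validating the format before saving matters.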
==== Filter Evaluation Order ====
- No rows configured: nothing is published. Add at least one active row to collect data.
- ''Service name'': glob matched against EXM service label. ''S4H*'' matches ''S4H.100''. ''*'' matches all.
- ''Metric'': exact match or ''*''. Partial globs do NOT work.
- ''Attributes Filter'': all clauses must match. Empty = match all datapoints.
- ''Exclusive = true'': datapoint consumed by first matching row. Later rows skip it.
- ''Metric = false'': row evaluates alarms but publishes no metric datapoints.
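The evaluation order above can be sketched as a single pass over the rows. This is a simplified illustration, not the product's code: ''Row'' and ''matching_rows'' are hypothetical names, the ''Attributes Filter'' check is left out for brevity, and case-sensitive glob matching via ''fnmatchcase'' is an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass
class Row:
    service_name: str = "*"  # glob pattern, e.g. "S4H*"
    metric: str = "*"        # exact metric value or "*"
    exclusive: bool = True
    active: bool = True


def matching_rows(rows: List[Row], service: str, metric: str) -> List[int]:
    """Return indexes of the rows that evaluate one datapoint, in table order."""
    hits: List[int] = []
    for idx, row in enumerate(rows):
        if not row.active:
            continue
        if not fnmatchcase(service, row.service_name):
            continue
        if row.metric != "*" and row.metric != metric:
            continue  # metric is exact-or-star; partial globs never match
        hits.append(idx)
        if row.exclusive:
            break  # datapoint consumed; later rows skip it
    return hits
```

With rows ''S4H.100'' followed by ''*'' (both exclusive), a datapoint for ''S4H.100'' is handled by the first row only, while any other service falls through to the second.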
===== Collected Metrics =====
Base key: ''promonitor.cloud_alm.exm.*''
^ Metric key ^ Unit ^ Description ^
| ''promonitor.cloud_alm.exm.exm.counter'' | count | Total exception count |
| ''promonitor.cloud_alm.exm.exm.counter_available'' | count | Exceptions during Available status |
| ''promonitor.cloud_alm.exm.exm.counter_disruption'' | count | Exceptions during Disruption |
| ''promonitor.cloud_alm.exm.exm.counter_degradation'' | count | Exceptions during Degradation |
| ''promonitor.cloud_alm.exm.exm.counter_maintenance'' | count | Exceptions during Maintenance |
| ''promonitor.cloud_alm.exm.exm.counter_unknown'' | count | Exceptions with unknown status |
Tags published with each datapoint: ''service.name'', ''sap.service.name'', ''service.namespace'', ''service.instance.id'', ''sap.service.display_name'', plus the datapoint-level tags ''serviceId'', ''serviceName'', ''categoryName'', ''serviceType'', ''useCase''.
All metrics are stored as non-monotonic delta counts (converted from gauge before storage).
===== Alarm Evaluation =====
Alarms use a suppression key to deduplicate. Format:
* With category: ''{monitorId}_{connectorId}_alm_exm_{service}_{metric}_{categoryName}_{rowIdx}''
* Without category: ''{monitorId}_{connectorId}_alm_exm_{service}_{metric}_{rowIdx}''
One alarm per unique key. A new value overwrites the previous alarm state for the same key.
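As an illustration, the two key formats can be produced by one small function. ''suppression_key'' is a hypothetical helper, and whether ''{service}'' holds the service name or the service ID is not stated here, so the sketch simply passes through whatever string it is given.

```python
from typing import Optional


def suppression_key(monitor_id: str, connector_id: str, service: str,
                    metric: str, row_idx: int,
                    category_name: Optional[str] = None) -> str:
    """Build the deduplication key in the two formats listed above."""
    parts = [monitor_id, connector_id, "alm_exm", service, metric]
    if category_name:
        parts.append(category_name)  # only present in the 'with category' form
    parts.append(str(row_idx))
    return "_".join(parts)
```

Because the row index is part of the key, the same datapoint evaluated by two non-exclusive rows produces two distinct alarms.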
===== Time Window =====
The time window is computed at collection time using 5-minute alignment with a 1-minute delay:
  adjustedNow = UTC.now() - 1 minute
  minuteFloor = floor(adjustedNow.minute / 5) * 5
  to          = adjustedNow with minute=minuteFloor, second=0, nanosecond=0
  from        = to - 5 minutes
Example at 14:37 UTC: ''from=20260601143000'' ''to=20260601143500''
This ensures the query always covers a complete 5-minute window that the EXM backend has already finalized.
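The same computation in runnable form, as a minimal sketch. ''exm_window'' is a hypothetical function name. The alignment rule and the ''yyyyMMddHHmmss'' format come from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

STAMP = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss


def exm_window(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (from, to) for the last complete 5-minute UTC slot."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Apply the 1-minute delay, then floor to the 5-minute boundary.
    adjusted = now.astimezone(timezone.utc) - timedelta(minutes=1)
    to = adjusted.replace(minute=adjusted.minute // 5 * 5, second=0, microsecond=0)
    start = to - timedelta(minutes=5)
    return start.strftime(STAMP), to.strftime(STAMP)
```

Calling ''exm_window(datetime(2026, 6, 1, 14, 37, tzinfo=timezone.utc))'' returns ''("20260601143000", "20260601143500")'', which matches the 14:37 UTC example.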
===== Examples =====
==== 1. Publish all metrics for all services ====
Set ''Service name'' = ''*'' and ''Metric'' = ''*''. An empty table sends nothing.
==== 2. Alarm on total exceptions above threshold ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | true | true |
Raises an alarm as soon as one exception appears: ''G2W:1'' means any non-zero count triggers a warning.
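How a threshold string such as ''G2W:1 W2M:10'' relates to a count can be sketched as below. This is only an illustration: the authoritative rules are in the Multi Threshold Syntax page, the function names are hypothetical, and the "value is greater than or equal to the limit" comparison is an assumption inferred from the statement that ''G2W:1'' fires on any non-zero count.

```python
from typing import Dict, Optional


def parse_thresholds(text: str) -> Dict[str, float]:
    """Parse 'G2W:1 W2M:10' into {'G2W': 1.0, 'W2M': 10.0}."""
    result: Dict[str, float] = {}
    for token in text.split():
        name, _, limit = token.partition(":")
        result[name] = float(limit)
    return result


def highest_transition(text: str, value: float) -> Optional[str]:
    """Return the transition with the largest limit that value reaches, or None."""
    reached = [(limit, name)
               for name, limit in parse_thresholds(text).items()
               if value >= limit]  # assumed comparison, see lead-in
    return max(reached)[1] if reached else None
```

Under these assumptions a count of 0 reaches no transition, a count of 1 reaches ''G2W'', and a count of 12 reaches ''W2M''.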
==== 3. Alarm on short dumps for one service ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | S4H.100 | exm.counter | categoryName:ABAP_SHORT_DUMP | G2W:1 W2M:5 | true | true | true |
Only datapoints for ''S4H.100'' and category ''ABAP_SHORT_DUMP'' are evaluated.
==== 4. Alarm on disruption exceptions only ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter_disruption | (empty) | G2W:1 W2M:5 | true | true | true |
Alarms when any exception occurs during a Disruption period.
==== 5. Alarm without publishing metrics ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Alarm ^ Metric ^
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | false |
''Metric = false'': alarms fire, but no datapoints are written to the time series.
==== 6. Monitor disruption and degradation with different tags ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Alarm tag ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter_disruption | (empty) | G2W:1 W2M:5 | EXM_DISRUPT | true | true | true |
| true | * | exm.counter_degradation | (empty) | G2W:1 W2M:5 | EXM_DEGRADE | true | true | true |
Each metric type gets its own alarm tag. Exclusive rows prevent cross-matching.
==== 7. Two services with different thresholds ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | S4H.100 | exm.counter | (empty) | G2W:1 W2M:5 | true | true | true |
| true | * | exm.counter | (empty) | G2W:5 W2M:20 | true | true | true |
Row 1 consumes ''S4H.100'' datapoints (''Exclusive = true''). Row 2 applies a looser threshold to all other services.
==== 8. Suppress a category from alarming ====
^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter | categoryName:INFORMATIONAL | G2W:80 W2M:90 | true | false | false |
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | true | true |
Row 1 consumes informational exceptions (''Exclusive = true'', ''Alarm = false''). Row 2 alarms on all others.
===== Troubleshooting =====
^ Symptom ^ Check ^
| Empty table after Load Services | Connector uses ''CLOUD_ALM'' auth. Service key is valid. Tenant has active EXM data. |
| HTTP 401 or 403 | Regenerate the service key in the BTP instance. Verify that the OAuth scopes include both ''calm-api.exm.read'' and ''calm-api.metrics.read''. |
| Metrics show 0 but Cloud ALM has data | Some measures legitimately report 0 when no exceptions occurred in the 5-minute window. |
| Alarms not triggering | Row is Active. Alarm is enabled. Metric and Service name match incoming data. Threshold syntax is correct. |
| Duplicate alarms for same category | Add ''Attributes Filter'' with ''categoryName:'' to target a specific category per row. |
| No metrics stored when data exists | Enable ''Metric'' on the relevant row. |
| Stale data after adding new services | Click **Load Services** again or wait for the next scheduled run. |
===== Limitations =====
* ''Metric'' field does not support partial glob patterns. ''*disruption'' does NOT match ''exm.counter_disruption''. Use exact values or ''*''.
* ''Service ID'' is auto-populated by data collection. Manual entry is possible if the UUID is known.
* Malformed ''Attributes Filter'' clauses are silently ignored. Validate format before saving.
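* The time window is always the last complete 5-minute slot that ended at least 1 minute ago. Data newer than the end of that slot (between 1 and just under 6 minutes old, depending on when collection runs) is not included.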
* ''Exclusive = true'' means first matching row wins. Order rows from most specific to most general.
* Thresholds apply to the raw numeric count value of the selected metric.