====== ALM Exception Monitoring ======

Retrieves exception count metrics from SAP Cloud ALM Exception Monitoring (EXM). Publishes time-series metrics and evaluates alarms per configured row.

  * API: ''GET /api/calm-metrics/v1/metrics?provider=exm''

===== Prerequisites =====

==== Cloud ALM Connector (required) ====

Requires a Web Service connector with authentication type **CLOUD_ALM**. See [[..:sap_cloud_alm|SAP Cloud ALM Connector]].

The connector is based on a service key from the SAP Cloud ALM API service instance in the BTP subaccount.

Required OAuth scopes in the ''authorities'' list of the instance parameters:

  * ''$XSMASTERAPPNAME.calm-api.exm.read''
  * ''$XSMASTERAPPNAME.calm-api.metrics.read''

===== API Endpoints =====

^ Endpoint ^ Purpose ^
| POST /oauth/token | Authentication (BTP UAA) |
| POST /api/calm-analytics/v1/analytics/providers/filters | Fetch EXM service IDs (data collection) |
| GET /api/calm-metrics/v1/metrics?provider=exm | Retrieve EXM metrics |

===== Key Features =====

  * Publishes six metric types per EXM datapoint: ''exm.counter'', ''exm.counter_available'', ''exm.counter_disruption'', ''exm.counter_degradation'', ''exm.counter_maintenance'', ''exm.counter_unknown''
  * Time window is a **5-minute aligned UTC window** with a 1-minute delay: ''to = floor(now - 1min, 5min)'', ''from = to - 5min'' (format ''yyyyMMddHHmmss'' UTC)
  * Paginated fetching with up to 5000 records per page, using the ''x-total-count'' response header
  * All six measures are gauge-to-count converted (non-monotonic delta sum) before storage
  * Per-row thresholds with standard ''G2W:80 W2M:90'' syntax. See [[..:commonsettings#multi_thresholds_syntax|Multi Threshold Syntax]]
  * ''Attributes Filter'' narrows which datapoints a row evaluates
  * ''Exclusive'' controls whether a matched datapoint is consumed or passed to later rows
  * Glob support for the ''Metric'' and ''Service name'' fields (''*'' = all)
  * Optional alarm tag for grouping or routing
  * Auto-clear when the alarm condition no longer matches
  * **Load Services** button: auto-discovers EXM service IDs and names from the live tenant

===== Data Collection =====

Data collection populates ''Service name'' and ''Service ID'' from the Cloud ALM EXM service registry.

Always run data collection before adding rows. A service name not returned by data collection may not match what the metrics API returns.

Data collection runs one call:

  - ''POST /api/calm-analytics/v1/analytics/providers/filters'' with body ''{"providerName":"EXM_DATAPROVIDER","providerVersion":"v1"}''. This returns the EXM service filter list. The monitor finds the entry with ''key = serviceId'' and extracts each service UUID and its label (e.g. ''S4H.100'').

Click **Load Services** to run data collection and populate the surveillance table.

===== Configuration =====

==== Method 1: Load Services ====

  - Open the monitor configuration
  - Click **Load Services**
  - The table populates with ''Service name'' and ''Service ID'' from the live tenant
  - Enable rows, set thresholds, save

==== Method 2: Manual / Wildcard ====

  - Set ''Metric'' = ''*'' and/or ''Service name'' = ''*'' to match all
  - Use only service names returned by data collection

==== Settings Reference ====

^ Field ^ Type ^ Default ^ Description ^
| Active | Boolean | true | Enable or disable this row |
| Service name | String | * | Glob matched against the EXM service label, e.g. ''S4H.100''. ''*'' = all. Populated by data collection |
| Service ID | String | (empty) | UUID of the service. Auto-populated by data collection |
| Metric | String | * | Metric type to match. Exact values or ''*''. See [[#metric_types|Metric types]] |
| Attributes Filter | String | (empty) | Narrows datapoints. Format: ''key:value,key2:value2''. Empty = no restriction. See [[#attributes_filter|Attributes Filter]] |
| Thresholds | String | G2W:80 W2M:90 | Alarm thresholds. See [[..:commonsettings#multi_thresholds_syntax|Multi Threshold Syntax]] |
| Alarm tag | String | (empty) | Optional tag appended to the alarm message |
| Exclusive | Boolean | true | If true, the datapoint is consumed by this row and not re-evaluated by later rows |
| Alarm | Boolean | true | Enable alarm evaluation for this row |
| Metric | Boolean | true | Publish metric datapoints for this row |

==== Metric types ====

^ Value ^ Description ^
| exm.counter | Total exception count |
| exm.counter_available | Exceptions while system status was Available |
| exm.counter_disruption | Exceptions during Disruption |
| exm.counter_degradation | Exceptions during Degradation |
| exm.counter_maintenance | Exceptions during Maintenance |
| exm.counter_unknown | Exceptions with unknown status |
| * | All of the above |

> **Note**: metric names in the ''Metric'' field use the snake_case form shown above (e.g. ''exm.counter_disruption'', not ''exm.counterDisruption'').

==== Attributes Filter ====

Narrows which datapoints a row matches. All clauses must match (AND). Matching is case insensitive. Malformed clauses are silently ignored.

^ Attribute ^ Description ^ Example ^
| categoryName | Exception category name | ''ABAP_SHORT_DUMP'' |
| serviceType | Service type identifier | ''S4'' |
| useCase | Use case identifier | ''EXM'' |

Example filter value:

  categoryName:ABAP_SHORT_DUMP,serviceType:S4

==== Filter Evaluation Order ====

  - No rows configured: nothing is published. Add at least one active row to collect data.
  - ''Service name'': glob matched against the EXM service label. ''S4H*'' matches ''S4H.100''. ''*'' matches all.
  - ''Metric'': exact match or ''*''. Partial globs do NOT work.
  - ''Attributes Filter'': all clauses must match. Empty = match all datapoints.
  - ''Exclusive = true'': the datapoint is consumed by the first matching row. Later rows skip it.
  - ''Metric = false'': the row evaluates alarms but publishes no metric datapoints.

===== Collected Metrics =====

Base key: ''promonitor.cloud_alm.exm.*''

^ Metric key ^ Unit ^ Description ^
| ''promonitor.cloud_alm.exm.exm.counter'' | count | Total exception count |
| ''promonitor.cloud_alm.exm.exm.counter_available'' | count | Exceptions during Available status |
| ''promonitor.cloud_alm.exm.exm.counter_disruption'' | count | Exceptions during Disruption |
| ''promonitor.cloud_alm.exm.exm.counter_degradation'' | count | Exceptions during Degradation |
| ''promonitor.cloud_alm.exm.exm.counter_maintenance'' | count | Exceptions during Maintenance |
| ''promonitor.cloud_alm.exm.exm.counter_unknown'' | count | Exceptions with unknown status |

Tags published with each datapoint: ''service.name'', ''sap.service.name'', ''service.namespace'', ''service.instance.id'', ''sap.service.display_name'', plus the datapoint-level tags ''serviceId'', ''serviceName'', ''categoryName'', ''serviceType'', ''useCase''.

All metrics are stored as non-monotonic delta counts (converted from gauge before storage).

===== Alarm Evaluation =====

Alarms use a suppression key to deduplicate. Format:

  * With category: ''{monitorId}_{connectorId}_alm_exm_{service}_{metric}_{categoryName}_{rowIdx}''
  * Without category: ''{monitorId}_{connectorId}_alm_exm_{service}_{metric}_{rowIdx}''

One alarm per unique key. A new value overwrites the previous alarm state for the same key.
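
The two suppression key formats can be illustrated with a minimal Python sketch. The function name ''suppression_key'', its parameters, and the sample IDs are hypothetical and only follow the format strings given above; the monitor's actual implementation may differ.

```python
from typing import Optional


def suppression_key(monitor_id: str, connector_id: str, service: str,
                    metric: str, row_idx: int,
                    category_name: Optional[str] = None) -> str:
    """Build a key in the documented format (hypothetical helper)."""
    parts = [monitor_id, connector_id, "alm_exm", service, metric]
    if category_name:
        # The category segment is only present when a category is known.
        parts.append(category_name)
    parts.append(str(row_idx))
    return "_".join(parts)


# One alarm state per unique key: a later value replaces the earlier one.
alarm_states = {}
key = suppression_key("42", "7", "S4H.100", "exm.counter", 0, "ABAP_SHORT_DUMP")
alarm_states[key] = 3.0
alarm_states[key] = 5.0  # overwrites the previous state for the same key

print(key)  # 42_7_alm_exm_S4H.100_exm.counter_ABAP_SHORT_DUMP_0
```

Because the row index is part of the key, the same datapoint evaluated by two non-exclusive rows would produce two separate alarm states in this sketch.
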
===== Time Window =====

The time window is computed at collection time using 5-minute alignment with a 1-minute delay:

  adjustedNow = UTC.now() - 1 minute
  minuteFloor = floor(adjustedNow.minute / 5) * 5
  to          = adjustedNow with minute=minuteFloor, second=0, nanosecond=0
  from        = to - 5 minutes

Example at 14:37 UTC: ''from=20260601143000'' ''to=20260601143500''

This ensures the query always covers a complete 5-minute window that the EXM backend has already finalized.

===== Examples =====

==== 1. Publish all metrics for all services ====

Set ''Service name'' = ''*'' and ''Metric'' = ''*''. An empty table sends nothing.

==== 2. Alarm on total exceptions above threshold ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | true | true |

Alarms as soon as 1 exception appears. ''G2W:1'' means any non-zero value triggers a warning.

==== 3. Alarm on short dumps for one service ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | S4H.100 | exm.counter | categoryName:ABAP_SHORT_DUMP | G2W:1 W2M:5 | true | true | true |

Only datapoints for ''S4H.100'' and category ''ABAP_SHORT_DUMP'' are evaluated.

==== 4. Alarm on disruption exceptions only ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter_disruption | (empty) | G2W:1 W2M:5 | true | true | true |

Alarms when any exception occurs during a Disruption period.

==== 5. Alarm without publishing metrics ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Alarm ^ Metric ^
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | false |

''Metric = false'': alarms fire but no datapoints are written to the time series.

==== 6. Monitor disruption and degradation with different tags ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Alarm tag ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter_disruption | (empty) | G2W:1 W2M:5 | EXM_DISRUPT | true | true | true |
| true | * | exm.counter_degradation | (empty) | G2W:1 W2M:5 | EXM_DEGRADE | true | true | true |

Each metric type gets its own alarm tag. Exclusive rows prevent cross-matching.

==== 7. Two services with different thresholds ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | S4H.100 | exm.counter | (empty) | G2W:1 W2M:5 | true | true | true |
| true | * | exm.counter | (empty) | G2W:5 W2M:20 | true | true | true |

Row 1 consumes ''S4H.100'' datapoints (''Exclusive = true''). Row 2 applies a looser threshold to all other services.

==== 8. Suppress a category from alarming ====

^ Active ^ Service name ^ Metric ^ Attributes Filter ^ Thresholds ^ Exclusive ^ Alarm ^ Metric ^
| true | * | exm.counter | categoryName:INFORMATIONAL | G2W:80 W2M:90 | true | false | false |
| true | * | exm.counter | (empty) | G2W:1 W2M:10 | true | true | true |

Row 1 consumes informational exceptions (''Exclusive = true'', ''Alarm = false''). Row 2 alarms on all others.

===== Troubleshooting =====

^ Symptom ^ Check ^
| Empty table after Load Services | Connector uses ''CLOUD_ALM'' auth. Service key is valid. Tenant has active EXM data. |
| HTTP 401 or 403 | Regenerate the service key in the BTP instance. Verify that the OAuth scopes include ''calm-api.exm.read'' and ''calm-api.metrics.read''. |
| Metrics show 0 but Cloud ALM has data | Some measures legitimately report 0 when no exceptions occurred in the 5-minute window. |
| Alarms not triggering | Row is Active. Alarm is enabled. Metric and Service name match incoming data. Threshold syntax is correct. |
| Duplicate alarms for same category | Add ''Attributes Filter'' with ''categoryName:'' to target a specific category per row. |
| No metrics stored when data exists | Enable ''Metric'' on the relevant row. |
| Stale data after adding new services | Click **Load Services** again or wait for the next scheduled run. |

===== Limitations =====

  * The ''Metric'' field does not support partial glob patterns. ''*disruption'' does NOT match ''exm.counter_disruption''. Use exact values or ''*''.
  * ''Service ID'' is auto-populated by data collection. Manual entry is possible if the UUID is known.
  * Malformed ''Attributes Filter'' clauses are silently ignored. Validate the format before saving.
  * The time window is always the last complete 5-minute slot that ended at least 1 minute ago. Depending on alignment, data from roughly the most recent 1 to 6 minutes is not included.
  * ''Exclusive = true'' means the first matching row wins. Order rows from most specific to most general.
  * Thresholds apply to the raw numeric count value of the selected metric.
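
To make the last-complete-slot limitation concrete, the following Python sketch reproduces the rule ''to = floor(now - 1min, 5min)'', ''from = to - 5min'' from the Time Window section. The function name ''exm_window'' is hypothetical; only the arithmetic and the ''yyyyMMddHHmmss'' format come from this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

FMT = "%Y%m%d%H%M%S"  # yyyyMMddHHmmss


def exm_window(now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (from, to) for the last complete 5-minute UTC slot.

    'now' must be timezone-aware; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Apply the 1-minute delay, then floor the minute to a multiple of 5.
    adjusted = now.astimezone(timezone.utc) - timedelta(minutes=1)
    to = adjusted.replace(minute=adjusted.minute // 5 * 5,
                          second=0, microsecond=0)
    start = to - timedelta(minutes=5)
    return start.strftime(FMT), to.strftime(FMT)


print(exm_window(datetime(2026, 6, 1, 14, 37, tzinfo=timezone.utc)))
# ('20260601143000', '20260601143500')
```

A call at 14:40:59 UTC still returns the 14:30 to 14:35 slot, which is why the newest data can lag by almost 6 minutes.
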
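
The exact-match and first-match-wins limitations can be sketched in Python as follows. This is an illustration under assumptions, not the monitor's code: the dictionary keys, the helper names, the case-sensitive service glob via ''fnmatchcase'', and the service label ''BTP.200'' are all hypothetical.

```python
from fnmatch import fnmatchcase


def parse_filter(text):
    """Parse 'key:value,key2:value2' into a lowercase dict.

    Clauses without a key, a colon, or a value are skipped silently.
    """
    clauses = {}
    for clause in text.split(","):
        key, sep, value = clause.partition(":")
        if sep and key.strip() and value.strip():
            clauses[key.strip().lower()] = value.strip().lower()
    return clauses


def row_matches(row, dp):
    """Return True if the datapoint passes all row filters."""
    if not row.get("active", True):
        return False
    # Service name: glob match (case sensitivity is an assumption).
    if not fnmatchcase(dp["serviceName"], row["service"]):
        return False
    # Metric: exact value or '*' only, so '*disruption' never matches.
    if row["metric"] != "*" and row["metric"] != dp["metric"]:
        return False
    # Attributes Filter: every clause must match (AND), ignoring case.
    attrs = {k.lower(): str(v).lower()
             for k, v in dp.get("attributes", {}).items()}
    wanted = parse_filter(row.get("filter", ""))
    return all(attrs.get(k) == v for k, v in wanted.items())


def matching_rows(rows, dp):
    """Return indexes of rows that evaluate dp, honoring Exclusive."""
    hits = []
    for idx, row in enumerate(rows):
        if row_matches(row, dp):
            hits.append(idx)
            if row.get("exclusive", True):
                break  # datapoint consumed; later rows skip it
    return hits


# Rows from example 7: specific row first, catch-all row second.
rows = [
    {"service": "S4H.100", "metric": "exm.counter", "filter": "", "exclusive": True},
    {"service": "*", "metric": "exm.counter", "filter": "", "exclusive": True},
]
s4h = {"serviceName": "S4H.100", "metric": "exm.counter", "attributes": {}}
other = {"serviceName": "BTP.200", "metric": "exm.counter", "attributes": {}}
print(matching_rows(rows, s4h))    # [0]
print(matching_rows(rows, other))  # [1]
```

Swapping the two rows would let the catch-all row consume ''S4H.100'' datapoints first, which is why rows should be ordered from most specific to most general.
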