HANA performance monitoring: the signals that matter
A practical shortlist of what to watch on a SAP HANA database — memory, CPU, savepoints, disk, delta merges, blocked transactions, system replication — and what to alert on versus trend.
HANA exposes an enormous number of system views, and it is easy to drown in them. The useful question is not "what can I monitor" but "what should I watch continuously so that a problem surfaces while it is still small". This is a practical shortlist — the signals that earn a permanent place on a HANA monitoring view, and what to alert on versus what to merely trend.
Memory: the signal HANA lives and dies by
HANA is an in-memory database, so memory is the first thing to watch and the one most likely to cause a bad day. Two numbers matter more than the rest:
- Used memory against the global allocation limit. Not against the physical RAM — against the limit HANA is allowed to use. Sustained pressure here is what precedes an out-of-memory event. Trend it always; alert when it crosses a high-water mark with enough headroom to react.
- Resident memory against physical RAM. If other processes on the host are squeezing HANA, that shows up here before it shows up as a HANA error.
Column-store table memory is worth a periodic look too — a table that grows unexpectedly is often the early sign of a housekeeping job that stopped running.
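As a minimal sketch of the two memory checks above, the following Python classifies one sample. It assumes the four numbers have already been collected from the database and the host; the `MemorySample` type, the function names, and the 80% and 90% thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class MemorySample:
    used_bytes: int              # memory HANA reports as used
    allocation_limit_bytes: int  # global allocation limit, not physical RAM
    resident_bytes: int          # resident memory of the HANA processes
    physical_bytes: int          # physical RAM on the host


def memory_status(sample: MemorySample,
                  warn_ratio: float = 0.80,
                  alert_ratio: float = 0.90) -> str:
    """Classify used memory against the global allocation limit.

    The default ratios are placeholders; pick a high-water mark that
    leaves enough headroom to react on your own system.
    """
    ratio = sample.used_bytes / sample.allocation_limit_bytes
    if ratio >= alert_ratio:
        return "alert"
    if ratio >= warn_ratio:
        return "warn"
    return "ok"


def resident_ratio(sample: MemorySample) -> float:
    """Share of physical RAM held by HANA's resident memory."""
    return sample.resident_bytes / sample.physical_bytes
```

The first function is the one to alert on. The second is better trended: a change in the resident share of physical RAM is a prompt to look at what else is running on the host.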
CPU and the savepoint rhythm
CPU on a HANA host is usually fine until it suddenly is not, often when a heavy batch window collides with online users. Trend host CPU and watch for sustained saturation, which means the workload has outgrown the sizing. Alongside it, watch savepoint duration. Savepoints write changed pages to the data volume, so savepoints that start taking noticeably longer are a sign of I/O pressure there and an early warning that the storage layer needs attention.
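One way to turn "noticeably longer" into something a trend report can flag is to compare recent savepoint durations with a longer baseline. This is a sketch under assumptions: the durations are already collected as seconds, and the function name and the factor of 2 are illustrative.

```python
from statistics import median
from typing import Sequence


def savepoints_slowing(baseline_secs: Sequence[float],
                       recent_secs: Sequence[float],
                       factor: float = 2.0) -> bool:
    """True if the recent median duration exceeds factor x the baseline median.

    Medians are used so that a single slow savepoint does not trip the check.
    Returns False when either window is empty, since nothing can be compared.
    """
    if not baseline_secs or not recent_secs:
        return False
    return median(recent_secs) > factor * median(baseline_secs)
```

For example, a baseline of `[2, 3, 2, 4, 3]` seconds against a recent window of `[7, 8, 9]` seconds is flagged, while a recent window of `[3, 4, 5]` is not.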
Disk: data, log, and the trace directory nobody watches
- Data and log volume fill. The classic slow-motion outage. A log volume that fills stops the database. Trend the fill rate and alert with days of headroom, not minutes.
- The trace directory. Verbose tracing left on after a troubleshooting session can quietly fill a filesystem. It is the unglamorous cause of a surprising number of incidents. Watch the directory size, not just the data and log volumes.
- Backup catalog and log backups. A backup that silently stopped is only discovered when you need a restore. Monitor that log backups are actually happening, not just that the job is scheduled.
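Alerting "with days of headroom" on volume fill implies a forecast rather than a fixed percentage. A minimal sketch, assuming fill samples are stored as (day, used GB) pairs and that growth is roughly linear over the window; the function name and data shape are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days until a volume is full from its recent fill rate.

    samples: (day, used_gb) pairs ordered by day.
    Returns None when there are too few samples or the volume is not growing.
    """
    if len(samples) < 2:
        return None
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day == first_day:
        return None
    rate_gb_per_day = (last_used - first_used) / (last_day - first_day)
    if rate_gb_per_day <= 0:
        return None
    return (capacity_gb - last_used) / rate_gb_per_day
```

A volume that grew from 400 GB to 500 GB over ten days, with 800 GB of capacity, has about 30 days left at that rate. The same calculation applies to the trace directory's filesystem.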
Delta merges and the column store
HANA's column store accumulates writes in a delta store and periodically merges them into the main store. When merges fall behind — because they are failing, or because the system is too busy to run them — query performance degrades and memory use climbs. Watch for tables with a large or growing delta, and for merge operations that error. This is one of the signals that separates "the database is slow" from "the database is slow for this specific, fixable reason".
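A sketch of the "large delta" check, assuming per-table main and delta sizes have already been collected. The `TableDelta` type and both thresholds (1 GiB absolute, 10% of main) are illustrative assumptions to be tuned per system.

```python
from dataclasses import dataclass
from typing import List, Sequence

GIB = 1 << 30


@dataclass
class TableDelta:
    name: str
    main_bytes: int   # size of the main store
    delta_bytes: int  # size of the delta store


def delta_suspects(tables: Sequence[TableDelta],
                   min_delta_bytes: int = GIB,
                   max_delta_to_main: float = 0.10) -> List[str]:
    """Names of tables whose delta is large in absolute and relative terms.

    Results are sorted with the largest delta first so the worst
    offender is at the top of the list.
    """
    flagged = [
        t for t in tables
        if t.delta_bytes >= min_delta_bytes
        and t.delta_bytes > max_delta_to_main * t.main_bytes
    ]
    flagged.sort(key=lambda t: t.delta_bytes, reverse=True)
    return [t.name for t in flagged]
```

Requiring both conditions keeps small tables with a proportionally large delta out of the report. Running the check on successive samples shows whether a flagged delta is also growing.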
Blocked transactions and long-runners
A single long-running statement holding a lock can stall a queue of others behind it. Watch for blocked transactions and for statements running far longer than their historical norm. The point is not to kill them automatically — it is to make the human aware while the queue is short rather than after the helpdesk lights up.
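"Far longer than their historical norm" can be sketched as a per-statement comparison. The example assumes elapsed and typical runtimes are keyed by some statement identifier; the identifiers, the factor of 5, and the 60-second floor are all hypothetical. It only reports statements and takes no action against them.

```python
from typing import Dict, List


def long_runners(running_secs: Dict[str, float],
                 typical_secs: Dict[str, float],
                 factor: float = 5.0,
                 floor_secs: float = 60.0) -> List[str]:
    """Statements running longer than factor x their typical runtime.

    floor_secs stops very short statements from being flagged on a
    large ratio alone. Statements without history are skipped because
    there is no norm to compare against.
    """
    flagged = []
    for key, elapsed in running_secs.items():
        typical = typical_secs.get(key)
        if typical is None:
            continue
        if elapsed > max(floor_secs, factor * typical):
            flagged.append(key)
    return sorted(flagged)
```

With typical runtimes of 20, 10 and 300 seconds for `q1`, `q2` and `q3`, and current runtimes of 900, 30 and 400 seconds, only `q1` is reported.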
System replication, if you run it
If the database is protected by HANA System Replication, the replication state is a first-class signal. Watch that the secondary is connected and in sync, and watch the replication backlog. A secondary that has silently fallen out of sync is a DR posture that exists on paper only — and you find out at the worst possible moment.
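The three replication checks can be sketched as one function that returns a list of problems, empty when the secondary is healthy. The `ReplicationSample` fields are an assumed, simplified shape, not the layout of any HANA view, and the 1 GiB backlog threshold is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicationSample:
    connected: bool     # secondary is reachable
    in_sync: bool       # secondary reports it is in sync
    backlog_bytes: int  # data not yet shipped to the secondary


def replication_problems(sample: ReplicationSample,
                         max_backlog_bytes: int = 1 << 30) -> List[str]:
    """Return human-readable problems; an empty list means healthy."""
    problems = []
    if not sample.connected:
        problems.append("secondary not connected")
    if not sample.in_sync:
        problems.append("secondary not in sync")
    if sample.backlog_bytes > max_backlog_bytes:
        problems.append("replication backlog above threshold")
    return problems
```

Returning every problem, rather than stopping at the first, gives the operator the full picture in a single alert.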
What to alert on versus what to trend
A monitoring view that alerts on everything trains its operators to ignore it. A rough division that works:
- Alert on the things that lead to an outage with little warning: memory crossing the danger threshold, a log volume nearing full, replication broken, backups not running.
- Trend and review the things that degrade slowly: CPU saturation patterns, savepoint duration, delta-store growth, column-table memory. These inform sizing and housekeeping conversations rather than paging someone at 3 a.m.
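The division above can be written down as a small routing table, so the decision is explicit rather than buried in individual checks. The signal names here are made up for illustration, and defaulting unknown signals to trending is an assumption, chosen so that a newly added metric does not page anyone by accident.

```python
ALERT = "alert"
TREND = "trend"

# Signal names are hypothetical labels for the checks in this article.
ROUTING = {
    "memory_used_vs_allocation_limit": ALERT,
    "log_volume_fill": ALERT,
    "replication_state": ALERT,
    "backup_freshness": ALERT,
    "cpu_saturation": TREND,
    "savepoint_duration": TREND,
    "delta_store_growth": TREND,
    "column_table_memory": TREND,
}


def route(signal: str) -> str:
    """Return ALERT or TREND for a signal; unknown signals are trended."""
    return ROUTING.get(signal, TREND)


def pages_someone(signal: str) -> bool:
    """True only for signals that should notify a human immediately."""
    return route(signal) == ALERT
```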
Multi-tenant adds one rule
On a multitenant database container (MDC) system, every signal above exists per tenant as well as for SYSTEMDB, and a healthy SYSTEMDB does not mean a healthy tenant. The monitoring layer has to show them as one operational view while keeping each tenant's data isolated. That is a harder problem than collecting the metrics, and it is the subject of a separate guide on multi-tenant HANA.
Where Farrenio fits
Farrenio's HANA database monitoring is built around exactly this: the handful of signals worth watching continuously, across many databases, with the tenant boundary intact and an audit trail behind every access. It sits next to the application-side Basis transactions so the database view and the SM50/SM37/ST22 view are one screen, not two tools.
If you want to see these signals against your own HANA, write to contact@farrenio.com and we will scope a short trial on a non-production database.