we run heartbeats to keep track of both uptime for catchup subscriptions and end-to-end latency.
a heartbeat generator app has a pacemaker which calls each API (we have several) once per minute.
everything goes through the normal plumbing: the API generates a heartbeat command, an aggregate is read from its stream, and a heartbeat event is emitted.
all downstream projectors subscribe to the $ce-heartbeat stream and apply heartbeats to their relevant readstore.
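as a rough illustration of the projector side, here's a minimal sketch in Python. it is not our actual code: `HeartbeatEvent`, its fields, `InMemoryReadstore` and `project` are hypothetical names, and the subscription to $ce-heartbeat is stood in for by a plain list of events.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable


@dataclass(frozen=True)
class HeartbeatEvent:
    # Hypothetical event shape: which API emitted it and when the generator stamped it.
    api: str
    generated_at: datetime


class InMemoryReadstore:
    """Stand-in for a real readstore; keeps only the latest heartbeat per API."""

    def __init__(self) -> None:
        self.latest: Dict[str, datetime] = {}

    def apply_heartbeat(self, event: HeartbeatEvent) -> None:
        current = self.latest.get(event.api)
        # Ignore replays and out-of-order events so the stored value never goes backwards.
        if current is None or event.generated_at > current:
            self.latest[event.api] = event.generated_at


def project(events: Iterable[HeartbeatEvent], readstore: InMemoryReadstore) -> None:
    # In a real projector these events would arrive from the category stream subscription.
    for event in events:
        readstore.apply_heartbeat(event)
```

keeping only the newest timestamp per API means a catchup replay can be applied safely more than once.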
a separate heartbeat monitor process polls each readstore every minute and asserts the latest heartbeat is not too stale, firing alerts on failure.
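the staleness check itself is simple. a minimal sketch, assuming the monitor has already read the latest heartbeat timestamp from each readstore into a dict (the function name, readstore names and the 2 minute threshold are all made up for illustration):

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def stale_readstores(
    latest_heartbeats: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
) -> List[str]:
    """Return the names of readstores whose latest heartbeat is missing or older than max_age."""
    stale = []
    for name, latest in latest_heartbeats.items():
        if latest is None or now - latest > max_age:
            stale.append(name)
    return sorted(stale)


now = datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc)
latest = {
    "orders": datetime(2024, 1, 1, 12, 4, 30, tzinfo=timezone.utc),  # 30s old: fine
    "billing": datetime(2024, 1, 1, 12, 1, 0, tzinfo=timezone.utc),  # 4 min old: stale
    "search": None,                                                   # never seen: stale
}
print(stale_readstores(latest, now, max_age=timedelta(minutes=2)))
# → ['billing', 'search']
```

anything returned here would be what triggers an alert. treating a missing heartbeat as stale catches a projector that never started, not just one that fell behind.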
and this is the output - with alerting when thresholds are exceeded.
you can see 4 APIs receiving heartbeats (the ordering is a bit out, the first one should be “heartbeats generated from generator to APIs”)
then we happen to have 4 readstores, though the number of readstores hasn’t always matched the number of APIs. And you can see there’s actually a fifth projector which writes to an external system
the key graph/SLO is time from generator to database. That’s our eventual consistency window.
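to make that concrete, a small sketch of how the window could be computed per heartbeat and checked against a threshold. this assumes each heartbeat carries the generator's timestamp and the projector records when it wrote to the database; the function names and the 5 second threshold are hypothetical, not what we actually use.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def consistency_window(generated_at: datetime, applied_at: datetime) -> timedelta:
    """Time from the generator stamping the heartbeat to the projector writing it."""
    return applied_at - generated_at


def slo_attainment(windows: List[timedelta], threshold: timedelta) -> Optional[float]:
    """Fraction of heartbeats that reached the database within the threshold."""
    if not windows:
        return None  # no heartbeats observed, so nothing to measure
    within = sum(1 for w in windows if w <= threshold)
    return within / len(windows)


generated = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
windows = [
    consistency_window(generated, generated + timedelta(seconds=1)),
    consistency_window(generated, generated + timedelta(seconds=2)),
    consistency_window(generated, generated + timedelta(seconds=10)),
]
attainment = slo_attainment(windows, threshold=timedelta(seconds=5))
```

here two of the three heartbeats land within 5 seconds, so `attainment` comes out at two thirds. both timestamps need to come from reasonably synchronised clocks for the number to mean anything.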
hope that’s useful.