Hey peoples,
We use collectd + Riemann + InfluxDB + Grafana for monitoring. After a recent spate of Event Store flakiness, I’d like to add some monitoring to our Event Store cluster. I’m assuming the simplest route is to curl the gossip and stats endpoints, but does anyone know which metrics are the vital ones?
Which conditions should we flag as critical, and which should we just raise for further investigation?
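
For context, here is a rough Python sketch of the kind of polling I have in mind. It assumes each node serves `/gossip` and `/stats` as JSON on port 2113, and that gossip members carry `state` and `isAlive` fields. The function names, the expected node count, and the ok/warning/critical thresholds are placeholders I made up to illustrate the question, not recommendations.

```python
import json
import urllib.request

# Assumption: the node's HTTP interface is reachable here; adjust as needed.
BASE_URL = "http://localhost:2113"


def fetch_json(path, base_url=BASE_URL, timeout=5):
    """GET base_url + path and decode the JSON body."""
    req = urllib.request.Request(
        base_url + path, headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_gossip(gossip):
    """Count total members, live members, and live masters in a gossip payload.

    Assumes a top-level "members" list whose entries have "state" and
    "isAlive" keys; missing keys are treated as not alive / not master.
    """
    members = gossip.get("members", [])
    alive = [m for m in members if m.get("isAlive")]
    masters = [m for m in alive if m.get("state") in ("Master", "Leader")]
    return {"members": len(members), "alive": len(alive), "masters": len(masters)}


def classify(summary, expected_nodes=3):
    """Placeholder severity rules (hypothetical thresholds).

    critical: not exactly one live master, or no majority of nodes alive
    warning:  majority alive but at least one expected node missing
    ok:       everything expected is alive
    """
    if summary["masters"] != 1 or summary["alive"] <= expected_nodes // 2:
        return "critical"
    if summary["alive"] < expected_nodes:
        return "warning"
    return "ok"


if __name__ == "__main__":
    summary = summarize_gossip(fetch_json("/gossip"))
    print(summary, classify(summary))
    # The stats payload is large; which parts of it matter is my actual question.
    stats = fetch_json("/stats")
    print(sorted(stats.keys()))
```

The output of something like this would be pushed to Riemann from a collectd exec plugin or a cron job; the part I can't fill in is which fields belong in `classify`.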
– Bob.