Cost of collecting stats

I’ve been looking at performance on our Eventstore cluster (4.0.3), and one thing that stands out in the logs is warnings from the monitoring queue, e.g.:

[00001,61,12:14:56.603] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1094ms. Q: 0/0.

[00001,108,12:14:59.976] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1482ms. Q: 0/0.

[00001,26,12:15:02.853] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1318ms. Q: 0/0.

[00001,12,12:15:08.748] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1213ms. Q: 0/0.

[00001,10,12:15:15.655] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1100ms. Q: 0/0.

[00001,50,12:15:20.699] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1166ms. Q: 0/0.

While we have Eventstore set to record stats at the default interval of 30 seconds, we also have Prometheus scraping metrics off the /stats HTTP endpoint, and it turns out this operation appears to be more expensive than expected.

Is collecting stats expected to be an expensive operation? Would this be CPU bound or IO bound? Is it possible that calculating stats may have an impact on other ongoing operations?

Cheers,

Kristian

Kristian,
I’m seeing similar SLOW QUEUE MSG messages, which is how I found this post. Did you ever find a cause or solution?

ES stores the stats in a stream so you can fetch the latest stats from the stream without incurring the cost of generating new ones.

curl --request GET \
  --url http://127.0.0.1:2113/streams/%24stats-0.0.0.0%3A2113/head/backward/1 \
  --header 'accept: application/vnd.eventstore.atom+json' \
  -u "admin:changeit"

The response is a feed with a single entry, and from there you can fetch the latest event itself. The docs on reading streams cover the details:

https://eventstore.org/docs/http-api/reading-streams/index.html
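
If you would rather do this from a script (for example, in front of a Prometheus exporter), here is a rough sketch of the same two steps in Python: build the URL for the head of the stats stream, then pull the link to the newest event out of the feed. The helper names stats_head_url and latest_event_uri are made up for illustration, the sample feed is a trimmed, hypothetical payload, and the assumption that each entry carries an "alternate" link should be checked against what your node actually returns.

```python
import json
from urllib.parse import quote


def stats_head_url(base_url, node_address, count=1):
    """Build the URL for the newest `count` events of a node's $stats stream."""
    # The stream is named "$stats-<ip>:<port>"; "$" and ":" must be percent-encoded.
    stream = quote(f"$stats-{node_address}", safe="")
    return f"{base_url.rstrip('/')}/streams/{stream}/head/backward/{count}"


def latest_event_uri(feed):
    """Return the 'alternate' link of the first (newest) feed entry, or None."""
    entries = feed.get("entries") or []
    if not entries:
        return None
    # Assumption: each entry has a "links" list with a "relation"/"uri" pair.
    for link in entries[0].get("links", []):
        if link.get("relation") == "alternate":
            return link.get("uri")
    return None


# Hypothetical, trimmed feed body used only to illustrate the parsing step;
# the event number 42 is invented.
sample_feed = json.loads("""
{
  "entries": [
    {
      "links": [
        {"uri": "http://127.0.0.1:2113/streams/%24stats-0.0.0.0%3A2113/42",
         "relation": "edit"},
        {"uri": "http://127.0.0.1:2113/streams/%24stats-0.0.0.0%3A2113/42",
         "relation": "alternate"}
      ]
    }
  ]
}
""")

print(stats_head_url("http://127.0.0.1:2113", "0.0.0.0:2113"))
print(latest_event_uri(sample_feed))
```

In a real script you would GET the first URL with the same accept header and credentials as the curl call above, then GET the returned link to read the stats event. That way the scraper only reads what the node has already written every 30 seconds instead of triggering GetFreshStats on each scrape.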

Cheers

Mike