Slow queue reported and server stopped accepting requests

Morning all,

We ran into an issue with our eventstore deployment yesterday where it started reporting warnings about extremely slow queues. It eventually stopped accepting requests and had to be restarted.

When it came back online, it had to rebuild the read index, which took a considerable amount of time (roughly 60 minutes for about 30 million items).

I noticed that the eventstore process started consuming a large amount of memory and eventually began paging aggressively, which I assume is what caused the performance issues.

We are running version 3.9.3. We have an automated task which runs a scavenge every evening at 9 PM. The issue happened at 5:30 PM, so no scavenge was running at the time.

I have attached the error log and stats. Any idea what could have caused this?

Thanks

10.10.0.101-2113-cluster-node-err.log (50.2 KB)

10.10.0.101-2113-cluster-node-stats.csv (6.96 MB)

Would you mind attaching the normal logs as well?

Hi Pieter,

Normal logs attached, thanks

10.10.0.101-2113-cluster-node.log.zip (3.09 MB)