Until a few days ago we saw almost none (in August, there were 13 across four days). Now we’re seeing 5,000-10,000 per day.
Previous deploy was two weeks ago.
There is nothing in our application logs that suggests that anything else is going wrong.
No evidence of connections dropping.
Does anyone have any ideas what could be causing this?
Are the persistent subscriptions feeding a service that loads into a read model?
Pure guess here, but I suspect whatever is consuming the events is either 1) failing or 2) taking longer than usual. The default retry is 10 seconds per event.
They’ve all got timeouts of 30 seconds.
They’re generally pretty simple: Hydrate model from events, call method and persist.
Perhaps ES is taking too long to do one of the operations?
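One way to check which operation is slow is to time the three steps separately. This is only a minimal sketch, not the actual handler code: `hydrate`, `call_method` and `persist` are hypothetical callables standing in for whatever the real handlers do, and the 30 second figure is taken from the timeout mentioned above.

```python
import time
from typing import Any, Callable, Dict

# Assumption: mirrors the 30 second message timeout quoted in the thread.
MESSAGE_TIMEOUT_S = 30.0


def timed_handle(
    event: Any,
    hydrate: Callable[[Any], Any],            # hypothetical: rebuild model from events
    call_method: Callable[[Any, Any], None],  # hypothetical: apply the domain logic
    persist: Callable[[Any], None],           # hypothetical: write the result back
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, float]:
    """Run the three handler phases and return the seconds spent in each."""
    timings: Dict[str, float] = {}

    start = clock()
    model = hydrate(event)
    timings["hydrate"] = clock() - start

    start = clock()
    call_method(model, event)
    timings["call"] = clock() - start

    start = clock()
    persist(model)
    timings["persist"] = clock() - start

    # Sum is taken over the three phase entries before "total" is added.
    timings["total"] = sum(timings.values())
    return timings


def exceeds_timeout(timings: Dict[str, float],
                    timeout_s: float = MESSAGE_TIMEOUT_S) -> bool:
    """True when the handler took at least as long as the message timeout."""
    return timings["total"] >= timeout_s
```

Sending each of the per-phase timings to statsd alongside the existing fetch metrics should show whether the slow part is the read, the method call or the write.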
How can I tell whether ES is at its limit or close to it?
I’ve got logging and statsd metrics around fetches (length of stream and time taken), but neither appears to indicate that ES is saturated (e.g. only 30 seconds of each minute is spent reading events).
Buffer Size: 20
Check Point After: 2000
Extra Statistics: false
Live Buffer Size: 500
Max Checkpoint Count: 1000
Max Retry Count: 500
Message Timeout (ms): 30000
Min Checkpoint Count: 10
Consumer Strategy: RoundRobin
Read Batch Size: 10
Resolve Link tos: true
Start From Event: -1
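A rough back-of-the-envelope check on these numbers, as a sketch under stated assumptions: it assumes the timeout clock for an event starts when the event is handed to the consumer rather than when the handler begins work on it (worth verifying against the Event Store documentation), and that the consumer handles events one at a time. The `in_flight` value of 20 is only an illustration borrowed from the Buffer Size setting; the number of events actually outstanding per consumer is not stated in this thread.

```python
import math


def events_acked_before_timeout(seconds_per_event: float,
                                message_timeout_s: float = 30.0) -> int:
    """Events a sequential consumer can finish within the message timeout.

    Assumes every in-flight event's timer started at the same moment.
    """
    if seconds_per_event <= 0:
        raise ValueError("seconds_per_event must be positive")
    return math.floor(message_timeout_s / seconds_per_event)


def events_at_risk(in_flight: int,
                   seconds_per_event: float,
                   message_timeout_s: float = 30.0) -> int:
    """In-flight events that would still be unacknowledged at the timeout."""
    finished = events_acked_before_timeout(seconds_per_event, message_timeout_s)
    return max(0, in_flight - finished)


# Illustration only: 20 events in flight, 30 second timeout.
fast = events_at_risk(in_flight=20, seconds_per_event=0.5)
slow = events_at_risk(in_flight=20, seconds_per_event=2.0)
```

If those assumptions hold, a modest slowdown per event could be enough to push the events at the back of the queue past the timeout and into retry, even though no single handler call comes close to 30 seconds.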