Hello!
We are currently facing a problem with the consumption of events from our persistent subscriptions. At any point in time, many of our subscribers sit idle with all of their slots available while there are still many events left to handle.
We have a large number of events across many streams being funneled into larger category streams via the $by_category projection. Our persistent subscription is registered against one of these category streams, and our clients consume events from it. Because of the volume of events, we scaled the consuming service out horizontally to get through them more quickly; we currently have 10 instances of the service consuming from the subscription.
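For context, roughly what each instance does is sketched below, written here against the Node client (@eventstore/db-client). The connection string, stream, group, and buffer size are placeholders, and the exact client calls are my best recollection rather than our production code:

```typescript
import { EventStoreDBClient } from "@eventstore/db-client";

// Placeholder connection string; the real cluster details differ.
const client = EventStoreDBClient.connectionString("esdb://eventstore:2113?tls=false");

async function consume(): Promise<void> {
  const subscription = client.subscribeToPersistentSubscriptionToStream(
    "$ce-orders",       // category stream built by $by_category (placeholder name)
    "orders-group",     // persistent subscription group (placeholder name)
    { bufferSize: 50 }  // matches the 50 slots visible in the stats further down
  );

  for await (const resolved of subscription) {
    try {
      // ... our actual processing of resolved.event?.data happens here ...
      await subscription.ack(resolved);
    } catch (err) {
      // NAK with "retry" so the server redelivers, up to maxRetryCount.
      await subscription.nack("retry", String(err), resolved);
    }
  }
}

consume().catch((err) => console.error(err));
```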
Here is the configuration for the persistent subscription:
```json
"config": {
  "resolveLinktos": true,
  "startFrom": 0,
  "messageTimeoutMilliseconds": 25000,
  "extraStatistics": false,
  "maxRetryCount": 10,
  "liveBufferSize": 500,
  "bufferSize": 500,
  "readBatchSize": 20,
  "preferRoundRobin": false,
  "checkPointAfterMilliseconds": 2000,
  "minCheckPointCount": 10,
  "maxCheckPointCount": 1000,
  "maxSubscriberCount": 100,
  "namedConsumerStrategy": "Pinned"
}
```
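For reference, the per-connection statistics below come from the subscription's info endpoint on the HTTP API. The snippet is just an illustration of how we pull them, with placeholder host, credentials, stream, and group names:

```typescript
// Placeholder host, default admin credentials, and placeholder stream/group names.
// Note the "$" in the category stream name must be URL-encoded as "%24".
const url = "http://eventstore:2113/subscriptions/%24ce-orders/orders-group/info";

async function fetchSubscriptionInfo(): Promise<void> {
  const res = await fetch(url, {
    headers: {
      Authorization: "Basic " + Buffer.from("admin:changeit").toString("base64"),
    },
  });
  const info = await res.json();
  // "connections" holds one entry per connected client, like the sample below.
  console.log(JSON.stringify(info.connections, null, 2));
}

fetchSubscriptionInfo().catch((err) => console.error(err));
```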
Here is a sample of the connection entries the persistent subscription reports for our clients:
```json
{
  "from": "instance-1-ip",
  "username": "some-username",
  "averageItemsPerSecond": 0,
  "totalItemsProcessed": 39809,
  "countSinceLastMeasurement": 0,
  "extraStatistics": [],
  "availableSlots": 50,
  "inFlightMessages": 0
},
{
  "from": "instance-2-ip",
  "username": "some-username",
  "averageItemsPerSecond": 17,
  "totalItemsProcessed": 42887,
  "countSinceLastMeasurement": 14,
  "extraStatistics": [],
  "availableSlots": 0,
  "inFlightMessages": 50
}
```
As you can see, one connection has 50 messages in flight while the other has 0, even though there are still thousands (even hundreds of thousands) of events left to process.
We know some events are being NAK'd and retried by the services, but I would still expect the other instances to be able to consume the remaining events. Is that not the case?
Any help tracking down the problem would be much appreciated.
Thank you!
Napoleone