Message retried even with high timeout on Competing Consumer

Eric_Swann · May 10, 2016, 1:21pm

We’re using ES as both an event store and a lightweight event bus, with competing consumers supporting the event bus flow. In some cases, the event consumer performs a relatively long operation (such as a couple of external HTTP calls that take a few seconds). We’ve bumped the MessageTimeout on those competing consumers to something ridiculously high (like a minute) just to validate what we are seeing. Even though the MessageTimeout is a large value, I see the message get reprocessed after what looks to be 5 seconds or more. The situation is being handled, but we’d rather not have the message processed twice. The message is not erroring in the handler; it completes successfully. Also, we are testing with only one consumer.

Is there some other timeout or situation that would impact this and cause the message to be reprocessed?

Thanks
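
Since competing consumers can redeliver a message, one common way to tolerate a retry while the cause is tracked down is to make the handler idempotent. The following is only a minimal sketch of that idea, not necessarily what the poster's system does: `IdempotentHandler` and its in-memory set are hypothetical, and a real consumer would need a durable record of processed event IDs.

```python
from typing import Callable, Set


class IdempotentHandler:
    """Wrap a handler so a redelivered event (same ID) is processed only once."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        # Hypothetical in-memory store; it does not survive a restart.
        self._seen: Set[str] = set()

    def handle(self, event_id: str, event: dict) -> bool:
        """Run the handler unless this event ID was already processed.

        Returns True if the handler ran, False if the event was a duplicate.
        """
        if event_id in self._seen:
            return False
        self._handler(event)
        # Record the ID only after the handler completed successfully.
        self._seen.add(event_id)
        return True
```

With this wrapper, a second delivery of the same event ID returns `False` without calling the wrapped handler again.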

pieter.germishuys · May 10, 2016, 1:49pm

Can you confirm that the timeout is actually being set to a minute?
You can check either through the UI or with a web request:

curl -i http://localhost:2113/subscriptions/stream_name/group_name/info
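
The same check can be scripted. This is a rough sketch that assumes the info endpoint returns JSON containing a `config` object with a `messageTimeoutMilliseconds` field (as in the settings quoted later in this thread), that an `Accept: application/json` header is honoured, and that no authentication is required; `stream_name` and `group_name` are placeholders, and both function names are made up for illustration.

```python
import json
import urllib.request

# URL taken from the curl command above; stream_name and group_name are placeholders.
INFO_URL = "http://localhost:2113/subscriptions/stream_name/group_name/info"


def message_timeout_ms(info: dict) -> int:
    """Return the configured message timeout from a subscription info document."""
    return int(info["config"]["messageTimeoutMilliseconds"])


def fetch_info(url: str = INFO_URL) -> dict:
    """Fetch the info document as JSON (assumes no authentication is needed)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Calling `message_timeout_ms(fetch_info())` against a running node would then give the value the server is actually using, which can be compared with what the client code intended to set.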


Eric_Swann · May 10, 2016, 2:04pm

Hi Pieter,

It does show it as a minute in the UI (60,000 ms). Let me also specify that this appears to work locally in my dev environment (on Windows) but not on our dev server (Docker/Linux). But I also have some other settings that are different (like locally I have a very long heartbeat timeout for debugging purposes).

Here are the settings:

"config": {
  "resolveLinktos": true,
  "startFrom": -1,
  "messageTimeoutMilliseconds": 60000,
  "extraStatistics": false,
  "maxRetryCount": 5,
  "liveBufferSize": 500,
  "bufferSize": 500,
  "readBatchSize": 20,
  "preferRoundRobin": true,
  "checkPointAfterMilliseconds": 2000,
  "minCheckPointCount": 1,
  "maxCheckPointCount": 1,
  "maxSubscriberCount": 10,
  "namedConsumerStrategy": "RoundRobin"
},

Greg_Young1 · May 10, 2016, 2:38pm

Are you getting a connection drop or something similar while processing?
That would cause a retry.

Eric_Swann · May 10, 2016, 4:46pm

Hi Greg, I’ll double check on that but I didn’t see anything in our logs on dropped connections (which we do log). Is there anything you can think of that would cause the connection to get dropped based on some other timeout or criteria? If it happened only occasionally I wouldn’t sweat it but it seems to be happening quite a lot.

Thanks

Greg_Young1 · May 10, 2016, 4:47pm

Heartbeat timeouts are the most common reason. Often this can be
caused by threadpool exhaustion or jitter depending on the settings.
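
To illustrate why a heartbeat timeout can cause a retry long before the message timeout expires, here is a deliberately simplified model. It assumes the peer sends a heartbeat request at fixed intervals and closes the connection if no reply arrives within the timeout, and that a stalled client (for example, one with an exhausted threadpool) cannot reply until the stall ends. The function name and all numbers are hypothetical, and the real protocol differs in detail.

```python
from typing import Optional


def first_drop_time(stall_start_ms: int, stall_ms: int,
                    interval_ms: int, timeout_ms: int) -> Optional[int]:
    """Return when the peer would close the connection, or None if it survives.

    Simplified model: heartbeat requests go out at multiples of interval_ms,
    and the client can only answer once its stall is over.
    """
    stall_end = stall_start_ms + stall_ms
    # First heartbeat request sent at or after the start of the stall
    # (integer ceiling division).
    request_time = -(-stall_start_ms // interval_ms) * interval_ms
    if request_time >= stall_end:
        return None  # the stall ended before any heartbeat was requested
    deadline = request_time + timeout_ms
    # The reply can only be sent at stall_end; too late means a dropped connection.
    return deadline if deadline < stall_end else None
```

In this model, with a hypothetical 2000 ms interval and 1000 ms timeout, a 5000 ms stall starting at 500 ms leads to a drop at 3000 ms, well under a 60000 ms message timeout. A dropped connection would then trigger the retry regardless of the message timeout.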