.NET ClientAPI: operation blocked by pending reconnection

Hi,

I am using EventStore 4.0.3 and EventStore.ClientAPI.NetCore 4.0.2-rc, and I am seeing behavior that looks wrong to me.

I start an EventStore node and connect to it with KeepReconnecting, so that I avoid the “Object Disposed” error if my single connection is closed (server down, for example).

=> 1) First, is there a better way to manage a single, always-up connection? Should I manually instantiate a new connection when the connection is closed?

The strange behavior with KeepReconnecting is that if I append events while my connection is down (and the client is trying to reconnect), all AppendToStreamOperation instances are enqueued but never time out.

I checked the code and saw that “_operations.CheckTimeoutsAndRetry(_connection);” is called only when the state is Connected (see TimerTick in EventStoreConnectionLogicHandler). So, as I understand it, while the client is reconnecting, operations are enqueued and never time out until the connection is up again. In a web app, this means every request blocks until reconnection, with more and more parallel Tasks waiting until we hit some limit…and then boom, the web app is down.

=> 2) Isn’t this a potential issue? Shouldn’t an operation time out if reconnecting takes longer than the timeout defined for operations?
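
For illustration, a caller-side stopgap for the web app scenario could look like the sketch below. It uses only the .NET base class library; the WithTimeout extension method is a hypothetical helper, not part of EventStore.ClientAPI. It only unblocks the awaiting caller: the operation itself presumably stays queued inside the client and may still be sent once the connection comes back.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of EventStore.ClientAPI): bounds how long
// a caller waits on a task that might never complete.
public static class TaskTimeoutExtensions
{
    public static async Task<T> WithTimeout<T>(this Task<T> task, TimeSpan timeout)
    {
        // Race the real task against a timer.
        var delay = Task.Delay(timeout);
        var finished = await Task.WhenAny(task, delay).ConfigureAwait(false);

        if (finished != task)
        {
            // The timer won. The original task is NOT cancelled;
            // we only stop waiting for it.
            throw new TimeoutException(
                $"Operation did not complete within {timeout}.");
        }

        // The task won: propagate its result or its exception.
        return await task.ConfigureAwait(false);
    }
}
```

Assuming the append call returns a Task with a result, as AppendToStreamAsync does, it could be wrapped as connection.AppendToStreamAsync(...).WithTimeout(TimeSpan.FromSeconds(5)), so that a request fails fast instead of piling up behind the reconnection.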

Let me know if I should create an issue on GitHub. Note that I also checked the source code of EventStore.ClientAPI, and it looks like it has the same behavior.

Thanks a lot.

Clément

I added a unit test here to show the expected behavior: https://github.com/devcrafting/ClientAPI.NetCore/tree/fix/operation-timeout-while-reconnecting

Note that I also added a very naive implementation that calls “_operations.CheckTimeoutsAndRetry(_connection)” even in the Connecting state. We probably don’t want to retry in that state, but only check for timeouts?
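
To make the "only check timeouts" idea concrete, here is a small self-contained model. It is not the ClientAPI's code: PendingOperation, PendingOperations and CheckTimeouts are hypothetical names, and the real operations manager tracks more state. The point is only that expiry depends on elapsed time, not on the connection state, and that nothing is retried.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified model; NOT the actual ClientAPI types.
public sealed class PendingOperation
{
    public DateTime EnqueuedAt { get; }
    public TaskCompletionSource<bool> Completion { get; } =
        new TaskCompletionSource<bool>();

    public PendingOperation(DateTime enqueuedAt) { EnqueuedAt = enqueuedAt; }
}

public sealed class PendingOperations
{
    private readonly List<PendingOperation> _waiting = new List<PendingOperation>();
    private readonly TimeSpan _timeout;

    public PendingOperations(TimeSpan timeout) { _timeout = timeout; }

    // Queue an operation, recording when it was enqueued.
    public PendingOperation Enqueue(DateTime now)
    {
        var op = new PendingOperation(now);
        _waiting.Add(op);
        return op;
    }

    // Intended to run on every timer tick, whatever the connection state.
    // Expired operations are failed and removed; nothing is retried here.
    // Returns the number of operations that timed out.
    public int CheckTimeouts(DateTime now)
    {
        var expired = _waiting.FindAll(op => now - op.EnqueuedAt >= _timeout);
        foreach (var op in expired)
        {
            _waiting.Remove(op);
            op.Completion.TrySetException(
                new TimeoutException("Operation timed out while waiting for a connection."));
        }
        return expired.Count;
    }
}
```

With this split, the Connected state would still do the retry work, while the Connecting state would only call the timeout check, so callers get a TimeoutException instead of waiting indefinitely.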

I created an issue: https://github.com/EventStore/ClientAPI.NetCore/issues/20

This is not specific to .NET Core; it is also reproducible with the classic .NET library.

Am I really the only one who misunderstands how the connection should be used, or who is experiencing this weird behavior?

Thanks

Hi Clément,

Thank you for reporting this and for providing a sample.

This appears to be the same as this issue on GitHub; you are welcome to track it there.