CatchUpError handling

Hi,

I have faced problem with ES subscription when it cannot establish connection.

I have prepared cluster with 3 nodes (on local machine).

Simple scenario with basic code to subscribe to stream is working correctly till all 3 nodes are disconnected; when it happen then after 10 attempts to connect subscription is falling into infinite loop where subscription drop handler is invoked and ES connection object is disposed.
Same when 2 of 3 nodes are disconnected (on of them must be the node that subscription have established connection).

From this post - https://groups.google.com/forum/#!starred/event-store/AdKzv8TxabM - I’m deducing it should be like this and somehow I should handle it.

My idea is to call Close() and then Dispose() and then create new connection object and try to connect to ES but only when subscription drop reason would be CatchUpError. Is this proper way to handle CatchUpError ?

Gist with basic subscription

What do the logs say (on the client side) eg enableverboselogging() and on the server side.

Regarding server side logs you mean EventStore ?
If so then nothing special besides losing connection to first disconnected node and when second is disconnect then the only one node that is alive logging :

[PID:09896:013 2018.03.12 08:47:14.547 DEBUG ElectionsService ] ELECTIONS: (V=22) TIMED OUT! (S=ElectingLeader, M=).
[PID:09896:013 2018.03.12 08:47:14.547 DEBUG ElectionsService ] ELECTIONS: (V=23) SHIFT TO LEADER ELECTION.
[PID:09896:013 2018.03.12 08:47:14.547 DEBUG ElectionsService ] ELECTIONS: (V=23) VIEWCHANGE FROM [127.0.0.1:1113, {38a5486c-8183-4022-b12b-1b41cb6ada4f}].
[PID:09896:004 2018.03.12 08:47:14.575 INFO TcpConnectionManager] Connection ‘master-normal’ ({d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}) to [127.0.0.1:2111] failed: ConnectionRefused.
[PID:09896:013 2018.03.12 08:47:15.082 TRACE TcpConnectionManager] Closing connection ‘master-normal’ [127.0.0.1:2111, L, {d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}] cleanly. Reason: Reconnecting from old master [127.0.0.1:2111] to new master: [127.0.0.1:2111].
[PID:09896:013 2018.03.12 08:47:15.082 INFO TcpConnection ] ES TcpConnection closed [08:47:15.082: N127.0.0.1:2111, L, {d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}]:Received bytes: 0, Sent bytes: 0
[PID:09896:013 2018.03.12 08:47:15.082 INFO TcpConnection ] ES TcpConnection closed [08:47:15.082: N127.0.0.1:2111, L, {d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}]:Send calls: 0, callbacks: 0
[PID:09896:013 2018.03.12 08:47:15.082 INFO TcpConnection ] ES TcpConnection closed [08:47:15.082: N127.0.0.1:2111, L, {d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}]:Receive calls: 0, callbacks: 0
[PID:09896:013 2018.03.12 08:47:15.082 INFO TcpConnection ] ES TcpConnection closed [08:47:15.082: N127.0.0.1:2111, L, {d8a341b8-14c0-4671-96da-a4fa8ffc7cdd}]:Close reason: [Success] Reconnecting from old master [127.0.0.1:2111] to new master: [127.0.0.1:2111].
[PID:09896:013 2018.03.12 08:47:15.082 INFO ReplicaService ] Subscribing at LogPosition: 384046769 (0x16E416B1) to MASTER [127.0.0.1:2111, {e23f6b7d-25a8-4c37-bafa-83a7a5511e86}] as replica with SubscriptionId: {f8ce3413-009a-4df6-9f04-d77498462e06}, ConnectionId: {d6eb9875-3a3b-4ce9-bd53-04aa973cfda3}, LocalEndPoint: [], Epochs:

Repost log entries (previouse message was removed … maybe I should not enclose entries exported to csv).
Client getting over and over :

12 Mar 2018 09:21:26.986

[STOP] Subscription DummyStream-1 dropped with reason CatchUpError

12 Mar 2018 09:21:26.985

Catch-up Subscription to DummyStream-1: dropping subscription, reason: CatchUpError System.ObjectDisposedException: Cannot access a disposed object.
Object name: ‘ES-52500d65-bc5e-4063-861a-8b76da0d5765’…

12 Mar 2018 09:21:26.984

EventStoreConnection ‘ES-52500d65-bc5e-4063-861a-8b76da0d5765’: enqueueing message EventStore.ClientAPI.Internal.StartOperationMessage…

12 Mar 2018 09:21:26.983

Catch-up Subscription to DummyStream-1: pulling events…

12 Mar 2018 09:21:26.977

Catch-up Subscription to DummyStream-1: running…

12 Mar 2018 09:21:26.977

[START] Subscription to DummyStream-1 subscribed

12 Mar 2018 09:21:26.977

Catch-up Subscription to DummyStream-1: starting…

12 Mar 2018 09:21:26.976

[START] Subscription to DummyStream-1 starting

12 Mar 2018 09:21:26.976

[STOP] Subscription to DummyStream-1 resurrecting

12 Mar 2018 09:21:26.976

Catch-up Subscription to {0}: unhooking from connection.Connected.

12 Mar 2018 09:21:26.975

Catch-up Subscription to DummyStream-1: requesting stop…

12 Mar 2018 09:21:26.975

[STOP] Subscription DummyStream-1 dropped with reason CatchUpError

It looks like setting maxreconnections to -1 helped in case of 2 nodes down in 3-nodes cluster.
I don’t receive anymore CatchUpError System.ObjectDisposedException.

Problem still occures when whole cluster is turned off. Client attempts 10 times reconnect and then it receives:

CloseConnection, reason Failed to resolve TCP end point to which to connect., exception System.AggregateException: One or more errors occurred. —> EventStore.ClientAPI.Exceptions.ClusterException: Failed to discover candidate in 10 attempts.
at EventStore.ClientAPI.Internal.ClusterDnsEndPointDiscoverer.<>c__DisplayClass10_0.b__0()
at System.Threading.Tasks.Task1.InnerInvoke() at System.Threading.Tasks.Task.Execute() --- End of inner exception stack trace --- ---> (Inner Exception #0) EventStore.ClientAPI.Exceptions.ClusterException: Failed to discover candidate in 10 attempts. at EventStore.ClientAPI.Internal.ClusterDnsEndPointDiscoverer.<>c__DisplayClass10_0.<DiscoverAsync>b__0() at System.Threading.Tasks.Task1.InnerInvoke()
at System.Threading.Tasks.Task.Execute()<—

Next getting:

EventStore.ClientAPI.Exceptions.ConnectionClosedException: Connection ‘ES-91b69d3e-4301-4353-93f0-31c9771cc4a4’ was closed
EventStoreConnection ‘ES-91b69d3e-4301-4353-93f0-31c9771cc4a4’: CloseTcpConnection.
EventStoreConnection ‘ES-91b69d3e-4301-4353-93f0-31c9771cc4a4’: IGNORED
(_state: Closed, _conn.ID: {4e39a407-ac9d-4418-8fd8-3ff77646e701}, conn.ID: {4e39a407-ac9d-4418-8fd8-3ff77646e701}): TCP connection to [127.0.0.1:1112, L127.0.0.1:4493] closed.

  Catch-up Subscription to test_stream_636566976673691773: requesting stop.

EventStoreConnection ‘ES-91b69d3e-4301-4353-93f0-31c9771cc4a4’: Closed. Reason: Failed to resolve TCP end point to which to connect.
Catch-up Subscription to {0}: unhooking from connection.Connected.

but after that when client try to establish new subscription then each time getting:

  Catch-up Subscription to test_stream_636566976673691773: starting

Catch-up Subscription to test_stream_636566976673691773: running.
Catch-up Subscription to test_stream_636566976673691773: pulling events.

  Catch-up Subscription to test_stream_636566976673691773: dropping subscription, reason: CatchUpError System.ObjectDisposedException: Cannot access a disposed object.

Object name: ‘ES-91b69d3e-4301-4353-93f0-31c9771cc4a4’.

How I should proper handle ClusterException ? Should I drop old ES connection object and create new one ? Is it ClusterException critical exception that should in result cause client app being turn off ?