We have some long running subscriptions to EventStore that sometimes may sit for days without any messages appearing. Recently, one of these appears to have terminated itself without error in production, leading to missing actions when events eventually did start to appear.
From the logs, I can see that EventStore has replied with reason Unsubscribed, suggesting that it is user initiated, but our service has definitely not requested that the subscription be stopped:
12:05:26.000[esio-1-1] - c.g.m.e.t.h.OperationHandler - HandleTcpPackage SUBSCRIPTION DECISION EndOperation (SubscriptionDropped: Unsubscribed), Subscription VolatileSubscriptionOperation (e029e4ca-ba0a-4bfc-88dc-f6da809738df): com.github.msemys.esjc.subscription.VolatileSubscriptionOperation@78ede8dc, is subscribed: true, retry count: 0, created: 2016-10-09T09:53:26Z, last updated: 2016-10-09T09:53:26Z
The EventStore server logs show that nodes appear to be restarting and master election taking place at the time.
Running against EventStore 3.7.0 in a 3 node cluster, with the ESJC client in Java (https://github.com/msemys/esjc).
Are there any known circumstances under which the server would close a subscription with reason Unsubscribed, without it being the result of a user initiated subscription close? I’ve noticed that Unsubscribed is the default in the protobuf message, is it possible that a reason may not have been set?
Cheers,
Kristian
Lots of reasons. Losing a connection to a node is one. Back pressure
is another. Volatile subscriptions are, well, volatile.
I understand that there would be multiple reasons for a subscription to be dropped, but I would have expected the reason to be a bit more specific.
The ESJC client converts Unsubscribed to a reason of UserInitiated - is this an issue in the client implementation? (see https://github.com/msemys/esjc/blob/master/src/main/java/com/github/msemys/esjc/subscription/AbstractSubscriptionOperation.java#L146).
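To make the concern concrete, here is a minimal, self-contained sketch of how an unset reason could end up reported as UserInitiated. It is not the ESJC or server code: the class DropReasonMapping, both enums, and the decode helper are hypothetical stand-ins, and the reason names other than Unsubscribed and UserInitiated are assumptions. The Optional parameter only models a field that may be absent on the wire and falls back to its declared default.

```java
import java.util.Optional;

public class DropReasonMapping {
    // Hypothetical mirror of the wire-level drop reasons (assumed names).
    public enum WireReason { Unsubscribed, AccessDenied, NotFound }

    // Hypothetical mirror of the reasons surfaced to application code.
    public enum ClientReason { UserInitiated, AccessDenied, NotFound }

    // Models reading an optional field whose default is Unsubscribed:
    // if the sender never set a reason, the reader still sees Unsubscribed.
    public static WireReason decode(Optional<WireReason> fieldOnWire) {
        return fieldOnWire.orElse(WireReason.Unsubscribed);
    }

    // Mapping of the kind discussed above: Unsubscribed becomes UserInitiated.
    public static ClientReason toClientReason(WireReason wire) {
        switch (wire) {
            case Unsubscribed: return ClientReason.UserInitiated;
            case AccessDenied: return ClientReason.AccessDenied;
            case NotFound:     return ClientReason.NotFound;
            default: throw new IllegalArgumentException("Unknown reason: " + wire);
        }
    }
}
```

Under these assumptions, a drop with no reason set and a genuine client-requested unsubscribe are indistinguishable once they reach the application.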
Yeah, I think UserInitiated might be wrong here (looking at it, the .NET
client does the same). In either case it is the responsibility of the
client to reconnect the subscription at this point (at least in the
.NET client). E.g. "We have some long running subscriptions to
EventStore that sometimes may sit for days without any messages
appearing." My guess is the error came up and no re-subscription was
made? In general, using catch-up subscriptions for such use cases is a
better overall strategy (they will also get the messages that appeared
while not subscribed).
This is in fact using a catch up subscription, and yes - we were not resubscribing.
Our logic would trigger a resubscribe on all errors except UserInitiated, as that was assumed to be an intended close. This is now getting changed on our side so that we resubscribe on all errors including “user initiated”, and only terminate cleanly when we know that close was called by us in process.
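A rough sketch of that change, assuming nothing about the ESJC API: the class ResubscribePolicy, its DropReason enum, and the method names are all hypothetical. The idea is that the reported reason is ignored and the decision depends only on a flag our own code sets before it closes the subscription.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ResubscribePolicy {
    // Hypothetical stand-in for the client's drop reasons (assumed names).
    public enum DropReason { USER_INITIATED, CONNECTION_CLOSED, UNKNOWN }

    // Set only by our own shutdown path, never by the client library.
    private final AtomicBoolean closeRequestedByUs = new AtomicBoolean(false);

    // Call this immediately before our code closes the subscription.
    public void markCloseRequested() {
        closeRequestedByUs.set(true);
    }

    // The reported reason is deliberately not trusted: a drop reported as
    // "user initiated" still triggers a resubscribe unless we asked for the close.
    public boolean shouldResubscribe(DropReason reason) {
        return !closeRequestedByUs.get();
    }
}
```

The subscription-dropped callback would call shouldResubscribe and start a new catch-up subscription from the last processed position when it returns true.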
Do you know what the official Java client would do in this case (EventStore.JVM)?
Regarding the .NET client: my understanding of EventStoreCatchUpSubscription is that it will automatically resubscribe (read all the events and start live processing again) when the underlying connection reconnects itself.
I just happened to be here, read your statement about this, and immediately got confused. Looking at the code, it should handle reconnects itself, right?
i.e. this code:
    private void OnReconnect(object sender, ClientConnectionEventArgs clientConnectionEventArgs)
    {
        if (Verbose) Log.Debug("Catch-up Subscription to {0}: recovering after reconnection.");
        if (Verbose) Log.Debug("Catch-up Subscription to {0}: unhooking from connection.Connected.");
        _connection.Connected -= OnReconnect;
        RunSubscription();
    }
Cheers