So I’ve found the ShouldStop flag in the code… and I think I might have worked it out. The SubscriptionDropReason.CatchUpError and SubscriptionDropReason.UserInitiated reasons both cause the RunSubscription method to exit before the OnReconnect event is attached.
ES auto-reconnects (e.g. ConnectionClosed): cool, just let it.
Do not reconnect (e.g. AccessDenied): bad things have happened, no point retrying.
Manually handle it (e.g. CatchUpError): I’m wondering what the underlying reasons for this might be and how recoverable it is.
As a side question, does ES provide a way to monitor/query subscriptions? I.e., if I create 10 subscriptions, is there an API where I can query the state of each of them? I’m presuming not, and that we have to handle this in our application code by subscribing to the events.
The whole point of the subscription model for catch-up subscriptions is that it’s a client-driven subscription: the server has no subscription state (only the client does). If you want such things, use persistent subscriptions (competing consumers).
Seems like CatchUpError, SubscribingError, and ProcessingQueueOverflow (and maybe ServerError?) could resubscribe after a back-off. UserInitiated is just shutdown. And the rest are reasons for a view update service to crash.
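For what it's worth, here is a rough sketch of that split, written in Python just to keep it short and runnable. Everything in it is an assumption for illustration: the reasons are plain strings copied from this thread (not the client library's enum), `backoff_delays` and `next_step` are made-up helper names, and ServerError is left out of the retry set because it was only a "maybe".

```python
# Drop reasons suggested above as candidates for a back-off resubscribe.
# Plain strings mirroring the names in this thread, not the client's enum type.
BACKOFF_REASONS = frozenset(
    {"CatchUpError", "SubscribingError", "ProcessingQueueOverflow"}
)


def backoff_delays(base_seconds=1.0, factor=2.0, cap_seconds=60.0):
    """Yield an endless sequence of exponentially growing delays, capped."""
    delay = base_seconds
    while True:
        yield delay
        delay = min(delay * factor, cap_seconds)


def next_step(reason, delays):
    """Map a drop reason to (action, delay_in_seconds_or_None)."""
    if reason == "UserInitiated":
        return ("shutdown", None)
    if reason in BACKOFF_REASONS:
        # Take the next delay so repeated drops wait longer each time.
        return ("resubscribe", next(delays))
    # Anything else is treated as fatal for the view update service.
    return ("crash", None)
```

The caller would sleep for the returned delay before resubscribing, and create a fresh `backoff_delays()` generator once a subscription has been healthy for a while.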
We’ll improve the docs on this. Originally the idea was that on a catch-up subscription everything except a user dropping the subscription by calling subscription.Stop() would automatically continue on reconnect, and that for volatile subscriptions nothing would continue on reconnect. That might not be the current state of the world though (I’d need to look through the code to investigate). This could probably use better test coverage as well.
I decided that for me, the only drop reason to safely retry on (after a backoff period) is ProcessingQueueOverflow.
Most of the other errors I will get on restart (or at least a view restart), and I would rather know that immediately than have it retry for a period. So I let them cause a crash. For example, if you start a subscription with a connection that has already been closed, it gets immediately dropped with CatchUpError, because it can’t read the old events.
ConnectionClosed is retried automatically, so I ignore this one.
UserInitiated, I also ignore, because I call subscription.Stop synchronously, so I don’t need/want to be notified through a back channel that it stopped.
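Spelled out as code, the policy I described looks roughly like this. It's only a sketch in Python, not the client API: `DropReason`, `Action` and `on_dropped` are stand-in names I made up for the example, and the enum lists only the reasons mentioned in this thread.

```python
from enum import Enum, auto


class DropReason(Enum):
    # Stand-in for the client's drop reasons; only those named in this thread.
    UserInitiated = auto()
    NotAuthenticated = auto()
    AccessDenied = auto()
    SubscribingError = auto()
    ServerError = auto()
    ConnectionClosed = auto()  # the connection drop that is retried automatically
    CatchUpError = auto()
    ProcessingQueueOverflow = auto()
    EventHandlerException = auto()


class Action(Enum):
    IGNORE = auto()
    RETRY_AFTER_BACKOFF = auto()
    CRASH = auto()


def on_dropped(reason):
    """Decide what the service does when a subscription reports a drop."""
    if reason is DropReason.ProcessingQueueOverflow:
        # The only reason treated as safe to retry, after a back-off period.
        return Action.RETRY_AFTER_BACKOFF
    if reason in (DropReason.ConnectionClosed, DropReason.UserInitiated):
        # Reconnect is automatic, or we stopped the subscription ourselves.
        return Action.IGNORE
    # Everything else should surface immediately rather than be retried.
    return Action.CRASH
```

Defaulting to a crash means any reason I haven't thought about fails loudly instead of being silently retried.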
On non-catchup subscriptions, I haven’t been able to find any resubscribe logic. On a catchup subscription, as far as I can tell only ConnectionClosed will resubscribe, and that’s only if the subscription has already started live processing. A failure before live processing started will have resulted in a CatchUpError or UserInitiated, and no resubscribe will be attempted.
I only see one drop reason that is absolutely resubscribe-able: ProcessingQueueOverflow, after draining the queue.
CatchUpError might be resub-able. It happens when there is any kind of error while reading catchup events or trying to create a live subscription after catching up. The likely scenario is that the server can’t be reached when you start the catch-up subscription. It could also happen after an automatic resubscribe, but the server/network would have to be pretty unstable to successfully reconnect but fail on catchup. Either way, my choice is to not retry… it’s likely to happen during a maintenance window when starting a view update service, and I want ops to see it fail/stop immediately, not minutes later after retries are exhausted.
ServerError, it is doubtful that a resub will help. It happens when the server responds with a bad TCP command, or with an unknown response. Considering TCP is a “reliable” protocol, there’s likely a bug or the payload has exceeded some platform limit, and resending isn’t going to help.
subscription stopped manually, i.e. before exiting the app/service (UserInitiated)
security configuration problem (NotAuthenticated, AccessDenied)
event handling code is broken (EventHandlerException)
Oh, and SubscribingError doesn’t appear to be hooked up to anything. It’s called from an internal method, but I can’t find any place where that method is called.