Hi,
for weeks now, I am trying to diagnose a rare problem that occurs every few days.
For no apparent reason, most or all of the EventStoreConnection’s of an app close the socket (cleanly) to the EventStore, then immediately and successfully reconnects only to close the socket again.
Log:
Thu Sep 25 21:43:36.2828 +00:00 2014 MyAppName : Info, ClientAPI TcpConnection closed [21:43:36.282: N10.1.3.4:1113, L10.1.1.4:49251, {63690066-057c-4aa4-ad18-ea7be4706dbe}]: Received bytes: 4744759, Sent bytes: 460351 Send calls: 20921, callbacks: 20921 Receive calls: 21795, callbacks: 21795 Close reason: [Success] Socket closed
Thu Sep 25 21:43:36.2828 +00:00 2014 MyAppName : Trace, TcpPackageConnection: connection [10.1.3.4:1113, L10.1.1.4:49251, {63690066-057c-4aa4-ad18-ea7be4706dbe}] was closed cleanly.
Thu Sep 25 21:43:36.2985 +00:00 2014 MyAppName : Info, EventSubscriber for Stream “MyStream”: EventStoreConnection disconnected
Thu Sep 25 21:43:36.2985 +00:00 2014 MyAppName : Warn, EventSubscriber “MyStream”: Subscription stopped, Reason: ConnectionClosed, Exception: EventStore.ClientAPI.Exceptions.ConnectionClosedException: Connection was closed.
Thu Sep 25 21:43:36.5172 +00:00 2014 MyAppName : Info, EventSubscriber for Stream "MyStream": EventStoreConnection reconnecting
Thu Sep 25 21:43:36.5172 +00:00 2014 MyAppName : Trace, TcpPackageConnection: connected to [10.1.3.4:1113, L10.1.1.4:49823, {9929ffd3-ee4e-4434-b48c-0f1d62718f7b}].
Thu Sep 25 21:43:36.5172 +00:00 2014 MyAppName : Info, EventSubscriber for Stream “MyStream”: EventStoreConnection connected
Thu Sep 25 21:43:39.5251 +00:00 2014 MyAppName : Info, ClientAPI TcpConnection closed [21:43:39.525: N10.1.3.4:1113, L10.1.1.4:49824, {ad5ce277-b856-4b76-847c-67293114c064}]: Received bytes: 22, Sent bytes: 0 Send calls: 0, callbacks: 0 Receive calls: 2, callbacks: 2 Close reason: [Success] Socket closed
Thu Sep 25 21:43:39.5251 +00:00 2014 MyAppName : Trace, TcpPackageConnection: connection [10.1.3.4:1113, L10.1.1.4:49824, {ad5ce277-b856-4b76-847c-67293114c064}] was closed cleanly.
Thu Sep 25 21:43:40.2907 +00:00 2014 MyAppName : Info, EventSubscriber for Stream “MyStream”: EventStoreConnection disconnected
Thu Sep 25 21:43:41.2935 +00:00 2014 MyAppName : Info, EventProcessor “MyStream”: EventStoreConnection reconnecting
Thu Sep 25 21:43:41.2935 +00:00 2014 MyAppName : Trace, TcpPackageConnection: connected to [10.1.3.4:1113, L10.1.1.4:49831, {5e9c3ca2-8da2-40be-8479-19ec1b0669b6}].
Thu Sep 25 21:43:41.2935 +00:00 2014 MyAppName : Info, EventSubscriber for Stream “MyStream”: EventStoreConnection connected
Thu Sep 25 21:43:43.7334 +00:00 2014 MyAppName : Info, EventStoreConnection ‘ES-4a751cfb-8a13-4b72-a293-fc4ff884fc5b’: closing TCP connection [10.1.3.4:1113, L10.1.1.4:49831, {5e9c3ca2-8da2-40be-8479-19ec1b0669b6}] due to HEARTBEAT TIMEOUT at pkgNum 21585.
``
This continues on endlessly, it never starts with Heartbeat Timeouts but always has them after at least 1 or 2 reconnect cycles.
The network connectivity of the (Azure-based) VM on which the Event Store resides is limited to 5 MBit/s, which usually is more than enough because the network usage never goes beyond a fracture of that, but as soon as this cycling begins, Network Out of the EventStore VM is always at the maximum of 5 MBit/s. I’d guess that is because of the overhead of endlessly reconnecting sockets which would also explain the Heartbeat Timeouts.
I can’t find any reason for this, no log entries that explain what could lead to this and no logged exceptions. My best guess is that there is a small bug in the reconnection logic and the first disconnect is just a normal network hickup (as is expected inside Azure) but somehow the reconnect misbehaves after that. Obviously thats just a wild theory.
Restarting the VM with the EventStore-accessing application solves the problem, the endless reconnections stop, network usage of the EventStore-VM goes down to a normal (low) level and after the application is restarted everything works fine again.
EventStore ConnectionSettings are set with KeepReconnecting and KeepRetrying aswell as a custom Logger.
Regards,
Nicolas