TCP Connection guidelines?

I’ll have to rewrite the connectionhandling for an app, since it’s become a bit of a mess (heartbeat timeouts wich it cant seem to recover from. Might be down to multithreaded access to the same connection, and hence reconnection chaos when an error occurs), so I figured I’d check if there are any guidelines?

Right now there are two connections, one for read/write and one for subscriptions. Both of them are kept open. Should the read/write one maybe be opened/closed per operation instead? Or should I strive to have only one connection in the entire app? Use a connection pool?

It’s my vacation, and trying solve some urgent problems for a company I don’t even work for so no time for experimentation I’m afraid. Very grateful for advice!

/Peter

You should be able to keep one connection open without issue.

Open/close per operation is much less efficient

Greg

Also you said you can’t recover? Can you go into more detail about your issue?

Seems like I get another heartbeat timeout a couple of seconds after I reconnect:

[PID:02124:028 2014.07.16 12:14:44.801 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}].
[PID:02124:028 2014.07.16 12:14:50.557 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61258, L127.0.0.1:1113, {5c6bbbdc-4b97-4779-8416-f6d11dedd468}].
[PID:02124:028 2014.07.16 12:14:50.573 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61259, L127.0.0.1:1113, {062947b4-753f-4093-845b-8f255cb2c283}].
[PID:02124:015 2014.07.16 12:14:52.835 TRACE TcpConnectionManager] Closing connection ‘external-normal’ [127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}] cleanly. Reason: HEARTBEAT TIMEOUT at msgNum 0
[PID:02124:015 2014.07.16 12:14:52.835 INFO TcpConnection ] [12:14:52.835: N127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}]:
Received bytes: 0, Sent bytes: 22
Send calls: 1, callbacks: 1
Receive calls: 1, callbacks: 0
Close reason: [Success] HEARTBEAT TIMEOUT at msgNum 0

[PID:02124:015 2014.07.16 12:14:52.835 INFO TcpConnectionManager] Connection ‘external-normal’ [127.0.0.1:61254, {958ad3a4-3f62-4d33-9ba3-91252ca59453}] closed: Success.

Settings are:

.SetHeartbeatInterval(TimeSpan.FromMinutes(1))

.SetHeartbeatTimeout(TimeSpan.FromMinutes(10))

Like I said, the code has become a bit of mess trying to catch errors,recover and retrying, so it might be down to trying to dispose and reconnect the same connection on separate threads. The original implementation only used the Disconnect callback, but couldn’t recover either though.

I can’t reproduce this locally unfortenately, and it seems to happen max 1-2 times per day on customer site. Right now the only solution is to restart the service.

Looks like it’s triggered by slow writes to disk:

ID:02124:014 2014.07.16 12:09:12.755 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 889ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:09:13.535 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionData - 780ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:09:17.841 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 2043ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:09:18.886 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 1045ms. Q: 0/2.
[PID:02124:014 2014.07.16 12:09:25.642 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionPrepare - 3057ms. Q: 0/1.
[PID:02124:044 2014.07.16 12:09:26.437 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 218ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:11.866 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 1996ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:10:17.685 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3042ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:10:21.617 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3416ms. Q: 0/2.
[PID:02124:014 2014.07.16 12:10:28.965 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 858ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:31.648 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 639ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:36.609 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3556ms. Q: 0/5.
[PID:02124:014 2014.07.16 12:10:40.619 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 2979ms. Q: 0/2.
[PID:02124:015 2014.07.16 12:11:11.851 TRACE TcpConnectionManager] Closing connection ‘external-normal’ [127.0.0.1:56730, L127.0.0.1:1113, {e02f0b9f-65af-46b2-a839-d9da3f07cf12}] cleanly. Reason: HEARTBEAT TIMEOUT at msgNum 11282

There’s a MS SQL database on the same disk I’m afraid, so even if there aren’t many transaction per second it’s very hard to predict performance…

Skickat från Windows E-post

.SetHeartbeatInterval(TimeSpan.FromMinutes(1))

.SetHeartbeatTimeout(TimeSpan.FromMinutes(10))

Those timings are pretty weird - I’d expect heartbeat interval and timeout to be in the order of 1 second, not 10 minutes!

Yeah, I’ve increased them more and more trying to avoid this happening at all… They started out a lot lower .

/Peter

What are the rest of the connection settings and your reconnection logic etc?

See below. Just realized that heartbeat timeout might trigger Closed instead of disconnected though, so maybe that was the problem all along?


protected GetEventStoreStreamBase(IPEndPoint ipEndPoint, ISerializer serializer)
        {
            _logger = new EventStoreLog4netLogger();

            _ipEndPoint = ipEndPoint;
            _serializer = serializer;
            _settings = ConnectionSettings.Create()
                                          .DisableVerboseLogging()
                                          .FailOnNoServerResponse()
                                          .SetHeartbeatInterval(TimeSpan.FromMinutes(1))
                                          .SetHeartbeatTimeout(TimeSpan.FromMinutes(10))
                                          .KeepRetrying()
                                          .LimitRetriesForOperationTo(10)
                                          .SetOperationTimeoutTo(TimeSpan.FromMinutes(2))
                                          .UseCustomLogger(_logger)
                                          .OnClosed((c, _) => _logger.Info("Eventstore connection closed"))
                                          .OnConnected((c, _) =>
                                              {
                                                  _isConnected = true;
                                                  _logger.Debug("Connected");
                                              })
                                          .OnDisconnected((c, _) =>
                                              {
                                                  _isConnected = false;
                                                  _logger.Info("Disconnected, attempting to reconnect");
                                                  using (_connection) { }
                                                  Connect();
                                              })
                                          .OnErrorOccurred((c, e) => _logger.Error(e, "Error in EventStore connection"))
                                          .OnReconnecting(c => _logger.Debug("Reconnecting..."))
                                          .SetDefaultUserCredentials(new UserCredentials("admin", "changeit"));

            Connect();
        }

 private void Connect()
        {
            lock (_settings)
            {
                if (_isConnected)
                {
                    _logger.Debug("Already connected");
                    return;
                }

                _logger.Debug("Connecting...");

                _connection = EventStoreConnection.Create(_settings, _ipEndPoint);
                _connection.Connect();

                _logger.Debug("Connected!");
            }
        }


Skickat från Windows E-post