Seems like I get another heartbeat timeout a couple of seconds after I reconnect:
[PID:02124:028 2014.07.16 12:14:44.801 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}].
[PID:02124:028 2014.07.16 12:14:50.557 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61258, L127.0.0.1:1113, {5c6bbbdc-4b97-4779-8416-f6d11dedd468}].
[PID:02124:028 2014.07.16 12:14:50.573 INFO TcpService ] External TCP connection accepted: [Normal, 127.0.0.1:61259, L127.0.0.1:1113, {062947b4-753f-4093-845b-8f255cb2c283}].
[PID:02124:015 2014.07.16 12:14:52.835 TRACE TcpConnectionManager] Closing connection ‘external-normal’ [127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}] cleanly. Reason: HEARTBEAT TIMEOUT at msgNum 0
[PID:02124:015 2014.07.16 12:14:52.835 INFO TcpConnection ] [12:14:52.835: N127.0.0.1:61254, L127.0.0.1:1113, {958ad3a4-3f62-4d33-9ba3-91252ca59453}]:
Received bytes: 0, Sent bytes: 22
Send calls: 1, callbacks: 1
Receive calls: 1, callbacks: 0
Close reason: [Success] HEARTBEAT TIMEOUT at msgNum 0
[PID:02124:015 2014.07.16 12:14:52.835 INFO TcpConnectionManager] Connection ‘external-normal’ [127.0.0.1:61254, {958ad3a4-3f62-4d33-9ba3-91252ca59453}] closed: Success.
Settings are:
.SetHeartbeatInterval(TimeSpan.FromMinutes(1))
.SetHeartbeatTimeout(TimeSpan.FromMinutes(10))
Like I said, the code has become a bit of mess trying to catch errors,recover and retrying, so it might be down to trying to dispose and reconnect the same connection on separate threads. The original implementation only used the Disconnect callback, but couldn’t recover either though.
I can’t reproduce this locally unfortenately, and it seems to happen max 1-2 times per day on customer site. Right now the only solution is to restart the service.
Looks like it’s triggered by slow writes to disk:
ID:02124:014 2014.07.16 12:09:12.755 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 889ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:09:13.535 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionData - 780ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:09:17.841 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 2043ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:09:18.886 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 1045ms. Q: 0/2.
[PID:02124:014 2014.07.16 12:09:25.642 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionPrepare - 3057ms. Q: 0/1.
[PID:02124:044 2014.07.16 12:09:26.437 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [StorageReaderQueue #4]: ReadStreamEventsForward - 218ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:11.866 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 1996ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:10:17.685 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3042ms. Q: 0/1.
[PID:02124:014 2014.07.16 12:10:21.617 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3416ms. Q: 0/2.
[PID:02124:014 2014.07.16 12:10:28.965 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 858ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:31.648 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 639ms. Q: 0/0.
[PID:02124:014 2014.07.16 12:10:36.609 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 3556ms. Q: 0/5.
[PID:02124:014 2014.07.16 12:10:40.619 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WriteTransactionStart - 2979ms. Q: 0/2.
[PID:02124:015 2014.07.16 12:11:11.851 TRACE TcpConnectionManager] Closing connection ‘external-normal’ [127.0.0.1:56730, L127.0.0.1:1113, {e02f0b9f-65af-46b2-a839-d9da3f07cf12}] cleanly. Reason: HEARTBEAT TIMEOUT at msgNum 11282
There’s a MS SQL database on the same disk I’m afraid, so even if there aren’t many transaction per second it’s very hard to predict performance…
Skickat från Windows E-post