Hi,
I have an EventStore cluster that runs fine most of the time. Occasionally, however, we see hiccups where a large number of TCP connections are closed at once. The server simply logs them as closed, as do the clients; the connections appear to be cut off on both ends without a stated reason.
Is there anything that could cause EventStore to mass evict connected TCP clients?
In the logs for the EventStore node, I see the following (followed by many more lines describing other connections being closed). Everything then recovers, with the TCP clients reconnecting.
I [00001,80,14:47:38.387] SLOW QUEUE MSG [Worker #5]: TcpSend - 761ms. Q: 0/6.
I [00001,51,14:47:38.918] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 1336ms. Q: 0/0.
I [00001,58,14:47:39.295] ES TcpConnection closed [14:47:39.295: N10.4.5.187:45604, L10.4.4.13:1113, {6be7fdbc-adb7-44e8-b01d-4c32893af28d}]:Send calls: 5331, callbacks: 5331
I [00001,58,14:47:39.295] ES TcpConnection closed [14:47:39.296: N10.4.5.187:45604, L10.4.4.13:1113, {6be7fdbc-adb7-44e8-b01d-4c32893af28d}]:Receive calls: 5330, callbacks: 5330
I [00001,97,14:47:39.295] Lost connection from 10.4.2.247:57822
I [00001,58,14:47:39.295] ES TcpConnection closed [14:47:39.296: N10.4.5.187:45604, L10.4.4.13:1113, {6be7fdbc-adb7-44e8-b01d-4c32893af28d}]:Close reason: [Success] Socket closed
I [00001,58,14:47:39.296] Connection 'external-normal' [10.4.5.187:45604, {6be7fdbc-adb7-44e8-b01d-4c32893af28d}] closed: Success.
I [00001,77,14:47:39.296] Lost connection from 10.4.5.187:45604
I [00001,50,14:47:39.306] ES TcpConnection closed [14:47:39.306: N10.4.14.58:45086, L10.4.4.13:1113, {e8f946f2-3029-4d2b-a288-c26afeef86e6}]:Received bytes: 1107177, Sent bytes: 21937708
I [00001,50,14:47:39.306] ES TcpConnection closed [14:47:39.307: N10.4.14.58:45086, L10.4.4.13:1113, {e8f946f2-3029-4d2b-a288-c26afeef86e6}]:Send calls: 47019, callbacks: 47019
I [00001,50,14:47:39.306] ES TcpConnection closed [14:47:39.307: N10.4.14.58:45086, L10.4.4.13:1113, {e8f946f2-3029-4d2b-a288-c26afeef86e6}]:Receive calls: 46744, callbacks: 46744
I [00001,50,14:47:39.306] ES TcpConnection closed [14:47:39.307: N10.4.14.58:45086, L10.4.4.13:1113, {e8f946f2-3029-4d2b-a288-c26afeef86e6}]:Close reason: [Success] Socket closed
Is it possible that disk IO is starving network traffic, or something similar? Or could one slow TCP connection be starving the others and causing the drops?
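To check whether the SLOW QUEUE MSG entries consistently precede the mass closures, the node log could be tallied with something like the rough Python sketch below. It is not an EventStore tool: the function name `summarize` and the regular expressions are my own assumptions, derived only from the line format in the excerpt above, and they may need adjusting for other log lines.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt: "I [00001,80,14:47:38.387] <message>"
LINE_RE = re.compile(r"^\w \[\d+,\d+,(\d{2}:\d{2}:\d{2})\.\d{3}\] (.*)$")
# Assumed shape: "SLOW QUEUE MSG [<queue>]: <message type> - <n>ms."
SLOW_RE = re.compile(r"^SLOW QUEUE MSG \[(.+?)\]: (\w+) - (\d+)ms")
# Each closed connection logs exactly one "Close reason" line in the excerpt.
CLOSE_RE = re.compile(r"Close reason: \[(\w+)\]")


def summarize(lines):
    """Return (slow_messages, closes_per_second) for an iterable of log lines."""
    slow = []           # (hh:mm:ss, queue name, message type, duration in ms)
    closes = Counter()  # hh:mm:ss -> number of connections closed in that second
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log line
        second, body = m.groups()
        s = SLOW_RE.match(body)
        if s:
            slow.append((second, s.group(1), s.group(2), int(s.group(3))))
        elif CLOSE_RE.search(body):
            closes[second] += 1
    return slow, closes


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < node.log   (file names are placeholders)
    slow, closes = summarize(sys.stdin)
    for second, queue, msg, ms in slow:
        print(f"{second} slow {queue} {msg} {ms}ms")
    for second, count in sorted(closes.items()):
        print(f"{second} closed {count}")
```

If the seconds with many closures always follow a slow TcpSend or similar entry, that would at least support the starvation theory; if not, the cause is probably elsewhere.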
Cheers,
Kristian