We have had this system running for about a year and half, it is a 3 node cluster. We are receiving this in our client logs:
{ “date”: “2016-10-14 11:20:01.9779”, “level”: “Error”, “logger”: “EventStore.ClientAPI.IEventStoreConnection”, “machinename”: “RD0003FFA47995”, “processname”: “WebJobTransactionSubscription”, “processid”: “34516”, “threadid”: “3”, “message”: “TcpPackageConnection: [104.40.20.177:1113, L100.106.76.88:50043, {1b588cdf-a98c-4de0-b267-5c31ee51bd1a}]. Invalid TCP frame received.”, “exception”: “Package size is out of bounds: 69991366 (max: 67108864). This is likely an exceptionally large message (reading too many things) or there is a problem with the framing if working on a new client.” }
{ “date”: “2016-10-14 11:20:01.9779”, “level”: “Info”, “logger”: “EventStore.ClientAPI.IEventStoreConnection”, “machinename”: “RD0003FFA47995”, “processname”: “WebJobTransactionSubscription”, “processid”: “34516”, “threadid”: “3”, “message”: “ClientAPI TcpConnection closed [11:20:01.977: N104.40.20.177:1113, L100.106.76.88:50043, {1b588cdf-a98c-4de0-b267-5c31ee51bd1a}]:\nReceived bytes: 134858453, Sent bytes: 6319\nSend calls: 147, callbacks: 147\nReceive calls: 17625, callbacks: 17624\nClose reason: [Success] Invalid TCP frame received.\n” }
Our eventstore log:
[PID:01656:041 2016.10.14 11:20:01.704 TRACE InMemoryBus ] SLOW BUS MSG [Worker #1 Bus]: TcpSend - 598ms. Handler: TcpSendService.
[PID:01656:041 2016.10.14 11:20:01.704 TRACE QueuedHandlerThreadP] SLOW QUEUE MSG [Worker #1]: TcpSend - 598ms. Q: 0/3.
[PID:01656:033 2016.10.14 11:20:01.720 INFO TcpConnection ] ES TcpConnection closed [11:20:01.720: N40.83.183.69:61132, L10.208.34.4:1113, {1df3f5aa-60a0-4b00-9e33-4766781d3066}]:Received bytes: 6319, Sent bytes: 204669823
[PID:01656:033 2016.10.14 11:20:01.720 INFO TcpConnection ] ES TcpConnection closed [11:20:01.720: N40.83.183.69:61132, L10.208.34.4:1113, {1df3f5aa-60a0-4b00-9e33-4766781d3066}]:Send calls: 147, callbacks: 147
[PID:01656:033 2016.10.14 11:20:01.720 INFO TcpConnection ] ES TcpConnection closed [11:20:01.720: N40.83.183.69:61132, L10.208.34.4:1113, {1df3f5aa-60a0-4b00-9e33-4766781d3066}]:Receive calls: 148, callbacks: 148
[PID:01656:033 2016.10.14 11:20:01.720 INFO TcpConnection ] ES TcpConnection closed [11:20:01.720: N40.83.183.69:61132, L10.208.34.4:1113, {1df3f5aa-60a0-4b00-9e33-4766781d3066}]:Close reason: [Success] Socket closed
[PID:01656:033 2016.10.14 11:20:01.720 INFO TcpConnectionManager] Connection ‘external-normal’ [40.83.183.69:61132, {1df3f5aa-60a0-4b00-9e33-4766781d3066}] closed: Success.
[PID:01656:023 2016.10.14 11:20:01.720 DEBUG PersistentSubscripti] Lost connection from 40.83.183.69:61132
[PID:01656:038 2016.10.14 11:20:01.954 INFO TcpService ] External TCP connection accepted: [Normal, 40.83.183.69:61133, L10.208.34.4:1113, {a76905e1-f518-4d7d-a46c-1a04d79211ba}].
I also did find one of these in our node error log:
[PID:01656:005 2016.10.14 12:21:08.423 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #3]: ReadStreamEventsForward - 30258ms. Q: 0/16.
[PID:01656:033 2016.10.14 12:21:21.895 ERROR QueuedHandlerThreadP] —!!! VERY SLOW QUEUE MSG [StorageReaderQueue #2]: ReadStreamEventsForward - 12436ms. Q: 0/10.
[PID:01656:006 2016.10.14 13:42:24.997 ERROR LengthPrefixMessageF] FRAMING ERROR! Data:
000000: 47 45 54 20 2F 20 48 54 54 50 2F 31 2E 31 0D 0A | GET / HTTP/1.1…
000016: 48 6F 73 74 3A 20 31 30 34 2E 34 30 2E 32 30 2E | Host: 104.40.20.
000032: 31 37 37 3A 31 31 31 33 0D 0A 43 6F 6E 6E 65 63 | 177:1113…Connec
000048: 74 69 6F 6E 3A 20 6B 65 65 70 2D 61 6C 69 76 65 | tion: keep-alive
000064: 0D 0A 55 70 67 72 61 64 65 2D 49 6E 73 65 63 75 | …Upgrade-Insecu
000080: 72 65 2D 52 65 71 75 65 73 74 73 3A 20 31 0D 0A | re-Requests: 1…
000096: 55 73 65 72 2D 41 67 65 6E 74 3A 20 4D 6F 7A 69 | User-Agent: Mozi
000112: 6C 6C 61 2F 35 2E 30 20 28 57 69 6E 64 6F 77 73 | lla/5.0 (Windows
000128: 20 4E 54 20 31 30 2E 30 3B 20 57 4F 57 36 34 29 | NT 10.0; WOW64)
000144: 20 41 70 70 6C 65 57 65 62 4B 69 74 2F 35 33 37 | AppleWebKit/537
000160: 2E 33 36 20 28 4B 48 54 4D 4C 2C 20 6C 69 6B 65 | .36 (KHTML, like
000176: 20 47 65 63 6B 6F 29 20 43 68 72 6F 6D 65 2F 35 | Gecko) Chrome/5
000192: 33 2E 30 2E 32 37 38 35 2E 31 34 33 20 53 61 66 | 3.0.2785.143 Saf
000208: 61 72 69 2F 35 33 37 2E 33 36 0D 0A 41 63 63 65 | ari/537.36…Acce
000224: 70 74 3A 20 74 65 78 74 2F 68 74 6D 6C 2C 61 70 | pt: text/html,ap
000240: 70 6C 69 63 61 74 69 6F 6E 2F 78 68 74 6D 6C 2B | plication/xhtml+
000256: 78 6D 6C 2C 61 70 70 6C 69 63 61 74 69 6F 6E 2F | xml,application/
000272: 78 6D 6C 3B 71 3D 30 2E 39 2C 69 6D 61 67 65 2F | xml;q=0.9,image/
000288: 77 65 62 70 2C 2A 2F 2A 3B 71 3D 30 2E 38 0D 0A | webp,/;q=0.8…
000304: 41 63 63 65 70 74 2D 45 6E 63 6F 64 69 6E 67 3A | Accept-Encoding:
000320: 20 67 7A 69 70 2C 20 64 65 66 6C 61 74 65 2C 20 | gzip, deflate,
000336: 73 64 63 68 0D 0A 41 63 63 65 70 74 2D 4C 61 6E | sdch…Accept-Lan
000352: 67 75 61 67 65 3A 20 65 6E 2D 55 53 2C 65 6E 3B | guage: en-US,en;
000368: 71 3D 30 2E 38 0D 0A 0D 0A | q=0.8…
[PID:01656:006 2016.10.14 13:42:25.075 ERROR TcpConnectionManager] Closing connection ‘external-normal’ [50.93.103.27:61916, L10.208.34.4:1113, {71b61115-60cd-4963-8528-c701a65ebe34}] due to error. Reason: Invalid TCP frame received. Error: Package size is out of bounds: 542393671 (max: 67108864)…
The Node that is running is reporting this for memory usage:
Image PID Hard Faults/sec Commit (KB) Working Set (KB) Shareable (KB) Private (KB)
EventStore.ClusterNode.exe 1656 0 986,996 780,208 10,740 769,468
This seems to hold up our system and no further processing can occur. We have tried rebuilding the indexes on the nodes, rebooting the machines, and restarting our client webjob. Looking for any advice as this is a production system that has been down a few days now.
Thanks!