Hi community!
I would like to get some help from the experts here. So we have a 5 node cluster deployed in GCP (no K8S, just plain old VM instances)
The cluster nodes each run on 16 vCPUs and 16GB E2 machines.
Anyway, we have a client application (written in golang, using the official ESDB client) which is importing a moderately large amount of events into ESDB. This client is sending the events with about 500RPS.
What we noticed is that after a few hours, the leader node starts using a fairly large and unexpected size of the allocated memory, which leads to a few symptoms:
- The client starts receiving
[ErrorCodeDeadlineExceeded] gRPC deadline exceeded error: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
errors - The server starts logging a few
SLOW BUS MSG ["bus"]: "CommittedEventDistributed" - 30537ms. Handler: "EventReaderCoreService".
errors, then later even a few
---!!! VERY SLOW BUS MSG ["bus"]: "CommittedEventDistributed" - 30537ms. Handler: "EventReaderCoreService".
errors and then the nodes start the re-election process and the leader reboots itself
Also, we use ESDB to run a user-defined projection. We have millions of streams, which all send their events to this projection, which after completing it’s job emitting another event to result streams (so the input and output streams have a one-to-one correspondence). We also have 3 golang persistent subscriptions, which are connecting to the $all
stream via a persistent group, and listening to the result streams by filtering the $all
stream.
We noticed that when the re-election completes, and the leader re-boots itself, the projection also gets to reset.
Has anyone seen such behavior before? What can cause this? Not enough CPU power? Not enough disk IO throughput? Memory leak in ESDB?
In any case, here is a sample of the server logs (some messages are omitted):
199eunvf2ug8uo,"Nov 21 06:01:33 node3 bash[148776]: [148776,43,06:01:33.512,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 77ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vo,"Nov 21 06:01:33 node3 bash[148776]: [148776,19,06:01:33.982,INF] CLUSTER HAS CHANGED ""Elections Done""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vp,Nov 21 06:01:33 node3 bash[148776]: Old:,node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vq,"Nov 21 06:01:33 node3 bash[148776]: [""Priority: 0 VND {4a89ba97-fd00-440d-9928-fc66477ea5a6} <LIVE> [Follower, Unspecified/node4.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.16:1113, n/a, Unspecified/10.0.2.16:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104521893314/104521893839/104521893839/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.664"", ""Priority: 0 VND {7de5dff4-4b41-43d2-bd94-a85fe708535d} <LIVE> [Leader, Unspecified/node1.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.15:1113, n/a, Unspecified/10.0.2.15:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104522178323/104522198268/104522184998/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.958"", ""Priority: 0 VND {74264931-050d-4154-a70e-a11fe3b3c2e1} <LIVE> [Follower, Unspecified/node5.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.14:1113, n/a, Unspecified/10.0.2.14:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104518260444/104518264047/104518264047/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:32.405"", ""Priority: 0 VND {943a9344-f475-4261-93ab-657fa1b80236} <LIVE> [Leader, Unspecified/node3.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.13:1113, n/a, Unspecified/10.0.2.13:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104505007704/104505065009/104505065009/E152@98200119748:{bb7ec13f-2ff8-41eb-a6ce-85a8e80da6e0} | 2024-11-21 06:01:33.971"", ""Priority: 0 VND {739b64ea-15eb-4828-980a-e9f77a76525f} <LIVE> [Follower, Unspecified/node2.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.12:1113, n/a, Unspecified/10.0.2.12:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104517010901/104517011947/104517011947/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:30.420""]",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vr,Nov 21 06:01:33 node3 bash[148776]: New:,node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vs,"Nov 21 06:01:33 node3 bash[148776]: [""Priority: 0 VND {4a89ba97-fd00-440d-9928-fc66477ea5a6} <LIVE> [Unknown, Unspecified/node4.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.16:1113, n/a, Unspecified/10.0.2.16:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104521893314/104521893839/104521893839/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.982"", ""Priority: 0 VND {7de5dff4-4b41-43d2-bd94-a85fe708535d} <LIVE> [Leader, Unspecified/node1.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.15:1113, n/a, Unspecified/10.0.2.15:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104522178323/104522198268/104522184998/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.982"", ""Priority: 0 VND {74264931-050d-4154-a70e-a11fe3b3c2e1} <LIVE> [Unknown, Unspecified/node5.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.14:1113, n/a, Unspecified/10.0.2.14:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104518260444/104518264047/104518264047/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.982"", ""Priority: 0 VND {943a9344-f475-4261-93ab-657fa1b80236} <LIVE> [Unknown, Unspecified/node3.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.13:1113, n/a, Unspecified/10.0.2.13:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104505007704/104505065009/104505065009/E152@98200119748:{bb7ec13f-2ff8-41eb-a6ce-85a8e80da6e0} | 2024-11-21 06:01:33.982"", ""Priority: 0 VND {739b64ea-15eb-4828-980a-e9f77a76525f} <LIVE> [Unknown, Unspecified/node2.esdb.prod.******.**********:1112, n/a, Unspecified/10.0.2.12:1113, n/a, Unspecified/10.0.2.12:2113, (ADVERTISED: HTTP::0, TCP::0), Version: 23.10.1.0] 104517010901/104517011947/104517011947/E154@104505008225:{28fb022a-737c-4546-b7dc-326ffedfaac6} | 2024-11-21 06:01:33.982""]",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vt,"Nov 21 06:01:33 node3 bash[148776]: [148776,19,06:01:33.982,INF] ========== [""10.0.2.13:2113""] PRE-REPLICA STATE, WAITING FOR CHASER TO CATCH UP... LEADER IS [""Unspecified/10.0.2.15:2113"",{7de5dff4-4b41-43d2-bd94-a85fe708535d}]",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vu,"Nov 21 06:01:33 node3 bash[148776]: [148776,37,06:01:33.992,INF] Failed to write events to stream ""$et-replacement"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vv,"Nov 21 06:01:33 node3 bash[148776]: [148776,39,06:01:33.992,INF] Failed to write events to stream ""**************e2eResult-01932dd6-abaf-795e-9516-db0cecb0e13a"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vw,"Nov 21 06:01:33 node3 bash[148776]: [148776,26,06:01:33.992,INF] Failed to write events to stream ""$ce-**************e2eInput"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vx,"Nov 21 06:01:33 node3 bash[148776]: [148776, 8,06:01:33.992,INF] Failed to write projection checkpoint to stream ""$projections-$streams-checkpoint"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vy,"Nov 21 06:01:33 node3 bash[148776]: [148776,29,06:01:33.992,INF] Failed to write projection checkpoint to stream ""$projections-$stream_by_category-checkpoint"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8vz,"Nov 21 06:01:33 node3 bash[148776]: [148776,83,06:01:33.992,DBG] Error writing checkpoint for ""$persistentsubscription-$all::******-subscriber-group-checkpoint"": CommitTimeout",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8w0,"Nov 21 06:01:33 node3 bash[148776]: [148776,83,06:01:33.992,DBG] Persistent subscriptions received state change to PreReplica. Stopping listening",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8w7,"Nov 21 06:01:33 node3 bash[148776]: [148776,39,06:01:33.993,INF] Failed to write events to stream ""**************e2eResult-01932dd6-ee15-7ec2-8ec9-8c18b12ac45d"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8w8,"Nov 21 06:01:33 node3 bash[148776]: [148776,26,06:01:33.993,INF] Failed to write events to stream ""$ce-**************e2eResult"". Error: ""CommitTimeout""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8yy,"Nov 21 06:01:34 node3 bash[148776]: [148776,83,06:01:33.995,DBG] Persistent Subscriptions have been stopped.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8yz,"Nov 21 06:01:34 node3 bash[148776]: [148776,28,06:01:34.000,INF] PROJECTIONS SUBSYSTEM: Stopping components for Instance: 9c1badae-8117-4149-a2e3-d5c189111b28",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8z5,"Nov 21 06:01:34 node3 bash[148776]: [148776,19,06:01:34.001,INF] ELECTIONS: (V=81) ACCEPT FROM [""Unspecified/10.0.2.14:2113"",{74264931-050d-4154-a70e-a11fe3b3c2e1}] M=[""Unspecified/10.0.2.15:2113"",{7de5dff4-4b41-43d2-bd94-a85fe708535d}]).",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8z6,"Nov 21 06:01:34 node3 bash[148776]: [148776,19,06:01:34.001,INF] ""InaugurationManager"" in state (PreReplica, Idle): IGNORING ""ChaserCaughtUp"". Not waiting for chaser.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8z7,"Nov 21 06:01:34 node3 bash[148776]: [148776,19,06:01:34.001,INF] Subscribing at LogPosition: 104505065009 (0x1854FCBE31) to LEADER [""Unspecified/node1.esdb.prod.******.**********:1112"", {7de5dff4-4b41-43d2-bd94-a85fe708535d}] as replica with SubscriptionId: {f7ce7a11-ccad-4534-b741-31e46ccaa63f}, ConnectionId: {520e5f07-1049-437e-b877-c0d261639a3d}, LocalEndPoint: [""10.0.2.13:47464""], Epochs:",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8z8,"Nov 21 06:01:34 node3 bash[148776]: ""E152@98200119748:{bb7ec13f-2ff8-41eb-a6ce-85a8e80da6e0}\nE151@98199052087:{6376105e-a1a7-49f5-a9ba-551ed770d68d}\nE150@98199051413:{18894467-59f8-4e94-967b-919d578cf0ff}\nE149@98199050739:{a05afdcd-c41c-4b0d-af53-08bb21e51b88}\nE148@98199050065:{17bc31fa-9ed0-48e7-a4bb-d2af20e4296c}\nE147@98199049391:{4877e2b9-6cb7-49ea-90da-bb6f0f41fa83}\nE146@98192737876:{8682b5db-be04-43b9-8cc0-78de7129cd01}\nE144@90638199426:{31e214f7-ff68-4da4-9bf8-575f2b9e2b13}\nE143@90637865606:{12e5a554-b5ee-463b-aa21-01413596ddfd}\nE142@90637515215:{70cd4020-1d3f-4f01-9b7a-2f58513ae4c9}\nE141@90624286505:{f63e9a78-04cd-42df-8656-652aa5c98ba3}\nE136@84686393395:{1b848f60-ddfe-4e37-aa07-75b2ecd2f0a0}\nE135@84686220958:{460ea9c2-c57f-4c45-ae68-fa75dc1be371}\nE134@84686206920:{4b52a50c-dbea-4445-a778-81426e29c0af}\nE133@76797811139:{73bc6f65-7e68-48de-b035-9fb3fa5fd2f0}\nE132@76797810274:{940dca7f-8cfc-4322-85d4-45acdd987131}\nE129@75274979757:{da2fecb6-9180-4ad0-881f-de73ef0d2308}\nE128@75268727523:{83915836-f4d0-4c72-812c-69b7568cf464}\nE127@67575716195:{113636be-6d1a-4ae4-8cc6-8f19f00e0d4f}\nE126@67573703794:{abbf2e19-f45e-47dc-ad0a-5423cc557aee}""...",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
199eunvf2ug8z9,Nov 21 06:01:34 node3 bash[148776]: .,node3,projects/******-****/logs/syslog,,2024-11-21T06:01:34.069042838Z
1dxz9p5f7ig8nr,"Nov 21 06:01:01 node3 bash[148776]: [148776,37,06:01:01.325,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 50ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dxz9p5f7ig8ns,"Nov 21 06:01:01 node3 bash[148776]: [148776,26,06:01:01.377,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 48ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dxz9p5f7ig8nt,"Nov 21 06:01:01 node3 bash[148776]: [148776,43,06:01:01.379,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 50ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dxz9p5f7ig8nu,"Nov 21 06:01:01 node3 bash[148776]: [148776,44,06:01:01.516,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 53ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dxz9p5f7ig8nv,"Nov 21 06:01:01 node3 bash[148776]: [148776,47,06:01:01.517,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 51ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dxz9p5f7ig8nw,"Nov 21 06:01:01 node3 bash[148776]: [148776,42,06:01:01.524,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 50ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:02.055965375Z
1dz8r0af7kop1b,"Nov 21 06:01:32 node3 bash[148776]: [148776,46,06:01:32.329,DBG] SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 30536ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1c,"Nov 21 06:01:32 node3 bash[148776]: [148776,28,06:01:32.329,DBG] SLOW BUS MSG [""manager input bus""]: ""RegularTimeout"" - 30536ms. Handler: ""ProjectionCoreCoordinator"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1d,"Nov 21 06:01:32 node3 bash[148776]: [148776,46,06:01:32.329,ERR] ---!!! VERY SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 30536ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1e,"Nov 21 06:01:32 node3 bash[148776]: [148776,42,06:01:32.329,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1f,"Nov 21 06:01:32 node3 bash[148776]: [148776,42,06:01:32.329,ERR] ---!!! VERY SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1g,"Nov 21 06:01:32 node3 bash[148776]: [148776,26,06:01:32.329,DBG] SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 30537ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1h,"Nov 21 06:01:32 node3 bash[148776]: [148776,26,06:01:32.329,ERR] ---!!! VERY SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 30537ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1i,"Nov 21 06:01:32 node3 bash[148776]: [148776,28,06:01:32.329,ERR] ---!!! VERY SLOW BUS MSG [""manager input bus""]: ""RegularTimeout"" - 30536ms. Handler: ""ProjectionCoreCoordinator"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1j,"Nov 21 06:01:32 node3 bash[148776]: [148776,85,06:01:32.329,DBG] HTTP Request Received",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1k,Nov 21 06:01:32 node3 bash[148776]: 11/21/2024 06:01:32,node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1l,"Nov 21 06:01:32 node3 bash[148776]: From: ""10.0.2.15:39526""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1m,"Nov 21 06:01:32 node3 bash[148776]: ""GET"" http://10.0.2.13:2113/projection/**************Projectione2e/statistics",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1n,"Nov 21 06:01:32 node3 bash[148776]: {[""Host""]=[""10.0.2.13:2113""], [""User-Agent""]=[""Go-http-client/1.1""], [""Accept-Encoding""]=[""gzip""], [""traceparent""]=[""00-88bbb19cfd3482e40819a2bd869a1a40-36e91bb0f8c9cfd2-00""], [""X-Forwarded-Host""]=[""cluster.esdb.prod.******.**********:2113""]}",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1o,"Nov 21 06:01:32 node3 bash[148776]: """"",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1p,"Nov 21 06:01:32 node3 bash[148776]: [148776,48,06:01:32.331,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1q,"Nov 21 06:01:32 node3 bash[148776]: [148776,48,06:01:32.331,ERR] ---!!! VERY SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1r,Nov 21 06:01:32 node3 rsyslogd: action 'action-8-builtin:omfile' resumed (module 'builtin:omfile') [v8.2112.0 try https://www.rsyslog.com/e/2359 ],node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1s,"Nov 21 06:01:32 node3 bash[148776]: [148776,29,06:01:32.331,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop1t,"Nov 21 06:01:32 node3 bash[148776]: [148776,29,06:01:32.331,ERR] ---!!! VERY SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 30539ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop26,"Nov 21 06:01:32 node3 bash[148776]: [148776,51,06:01:32.333,INF] Connection '""internal-normal""""""' [""10.0.2.16:41252"", {b82e349f-e6e2-4ff1-8a87-6983e0c00689}] closed: Success.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop27,"Nov 21 06:01:32 node3 bash[148776]: [148776,62,06:01:32.333,INF] Connection '""internal-normal""""""' [""10.0.2.14:35436"", {fb24af04-d1d4-4b50-b28a-dbbbc724ed04}] closed: Success.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop28,"Nov 21 06:01:32 node3 bash[148776]: [148776,30,06:01:32.333,INF] Connection '""internal-normal""""""' [""10.0.2.12:42312"", {5db8aebc-2fef-497d-86c7-0b09c5041e53}] closed: Success.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2b,"Nov 21 06:01:32 node3 bash[148776]: [148776,107,06:01:32.333,INF] ES ""TcpConnection"" closed [06:01:32.333: N""10.0.2.15:43270"", L""10.0.2.13:1112"", {8835dd13-f844-41c0-9d5d-67d539335ac0}]:Receive calls: 1145705, callbacks: 1145705",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2c,"Nov 21 06:01:32 node3 bash[148776]: [148776,107,06:01:32.333,INF] ES ""TcpConnection"" closed [06:01:32.333: N""10.0.2.15:43270"", L""10.0.2.13:1112"", {8835dd13-f844-41c0-9d5d-67d539335ac0}]:Close reason: [Success] ""Socket closed""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2d,"Nov 21 06:01:32 node3 rsyslogd: action 'action-8-builtin:omfile' suspended (module 'builtin:omfile'), retry 0. There should be messages before this one giving the reason for suspension. [v8.2112.0 try https://www.rsyslog.com/e/2007 ]",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2e,"Nov 21 06:01:32 node3 bash[148776]: [148776,107,06:01:32.333,INF] Connection '""internal-normal""""""' [""10.0.2.15:43270"", {8835dd13-f844-41c0-9d5d-67d539335ac0}] closed: Success.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2f,"Nov 21 06:01:32 node3 bash[148776]: [148776,19,06:01:32.333,INF] Looks like node [""Unspecified/node4.esdb.prod.******.**********:1112""] is DEAD (TCP connection lost). Issuing a gossip to confirm.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2g,"Nov 21 06:01:32 node3 bash[148776]: [148776,30,06:01:32.334,DBG] HTTP Request Received",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2h,Nov 21 06:01:32 node3 bash[148776]: 11/21/2024 06:01:32,node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2i,"Nov 21 06:01:32 node3 bash[148776]: From: ""10.0.1.8:54735""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2j,"Nov 21 06:01:32 node3 bash[148776]: ""GET"" http://node3.esdb.prod.******.**********:2113/projections/all-non-transient",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2k,"Nov 21 06:01:32 node3 bash[148776]: {[""Accept""]=[""application/json""], [""Connection""]=[""keep-alive""], [""Host""]=[""node3.esdb.prod.******.**********:2113""], [""User-Agent""]=[""Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36""], [""Accept-Encoding""]=[""gzip, deflate""], [""Accept-Language""]=[""en,en-US;q=0.9,ja;q=0.8""], [""Referer""]=[""http://node3.esdb.prod.******.**********:2113/web/index.html""], [""ES-LongPoll""]=[""5""]}",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2l,"Nov 21 06:01:32 node3 bash[148776]: """"",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2m,"Nov 21 06:01:32 node3 bash[148776]: [148776,81,06:01:32.334,INF] ES ""TcpConnection"" closed [06:01:32.334: N""10.0.2.14:52410"", L""10.0.2.13:1112"", {c54ab3ce-ff59-43a8-9b14-472dcec5310a}]:Received bytes: 2197, Sent bytes: 0",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2q,"Nov 21 06:01:32 node3 bash[148776]: [148776,81,06:01:32.334,INF] Internal TCP connection accepted: [Normal, ""10.0.2.14:52410"", L""10.0.2.13:1112"", {c54ab3ce-ff59-43a8-9b14-472dcec5310a}].",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2r,"Nov 21 06:01:32 node3 bash[148776]: [148776,18,06:01:32.336,DBG] Closing connection '""internal-normal""""""' [""10.0.2.16:41252"", L""10.0.2.13:1112"", {b82e349f-e6e2-4ff1-8a87-6983e0c00689}] cleanly."" Reason: Closing replication subscription connection.""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2s,Nov 21 06:01:32 node3 rsyslogd: action 'action-8-builtin:omfile' resumed (module 'builtin:omfile') [v8.2112.0 try https://www.rsyslog.com/e/2359 ],node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2t,"Nov 21 06:01:32 node3 bash[148776]: [148776,18,06:01:32.336,DBG] Closing connection '""internal-normal""""""' [""10.0.2.12:42312"", L""10.0.2.13:1112"", {5db8aebc-2fef-497d-86c7-0b09c5041e53}] cleanly."" Reason: Closing replication subscription connection.""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2w,"Nov 21 06:01:32 node3 bash[148776]: [148776,19,06:01:32.336,INF] Looks like node [""Unspecified/node2.esdb.prod.******.**********:1112""] is DEAD (TCP connection lost). Issuing a gossip to confirm.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2x,"Nov 21 06:01:32 node3 bash[148776]: [148776,19,06:01:32.336,INF] Looks like node [""Unspecified/node5.esdb.prod.******.**********:1112""] is DEAD (TCP connection lost). Issuing a gossip to confirm.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop2y,"Nov 21 06:01:32 node3 bash[148776]: [148776,19,06:01:32.336,INF] Looks like node [""Unspecified/node1.esdb.prod.******.**********:1112""] is DEAD (TCP connection lost). Issuing a gossip to confirm.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5u,"Nov 21 06:01:32 node3 bash[148776]: [148776, 8,06:01:32.514,DBG] SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 57ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5v,"Nov 21 06:01:32 node3 bash[148776]: [148776,26,06:01:32.514,DBG] SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 57ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5w,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.515,DBG] SLOW BUS MSG [""PersistentSubscriptionsBus""]: ""UnsubscribeFromStream"" - 111ms. Handler: ""PersistentSubscriptionService`1"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5x,"Nov 21 06:01:32 node3 bash[148776]: [148776,40,06:01:32.515,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 60ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5y,"Nov 21 06:01:32 node3 bash[148776]: [148776,46,06:01:32.516,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 60ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop5z,"Nov 21 06:01:32 node3 bash[148776]: [148776,29,06:01:32.517,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 60ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop60,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.519,DBG] Retrying message ""$all::******-subscriber-group"" ""**************e2eResult-01932dd6-bea4-753f-843c-aa5a3413bf3b""/1 (stream position=""C:104504975584/P:104504975584"")",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop61,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.519,DBG] Retrying message ""$all::******-subscriber-group"" ""**************e2eResult-01932dd6-bea4-753f-843c-aa5a3413bf3b""/1 (stream position=""C:104504975584/P:104504975584"")",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop62,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.521,DBG] New connection to persistent subscription ""$all::******-subscriber-group"" by e56db625-8041-4de5-a3e4-713498783cda",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop63,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.572,DBG] SLOW BUS MSG [""PersistentSubscriptionsBus""]: ""ConnectToPersistentSubscriptionToAll"" - 53ms. Handler: ""PersistentSubscriptionService`1"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop64,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.572,DBG] New connection to persistent subscription ""$all::******-subscriber-group"" by 7c11437b-5e8e-44ef-8b44-ec9ea92e02b0",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop65,"Nov 21 06:01:32 node3 bash[148776]: [148776,91,06:01:32.573,DBG] New connection to persistent subscription ""$all::******-subscriber-group"" by 0e0dcc89-0516-4e43-8469-b22f8e7ed5a5",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop66,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read Stream Events Backward operation has expired for Stream: ""**************e2eResult-01932dd6-edec-789a-b578-4926b1739c7e"", From Event Number: -1, Max Count: 1. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop67,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read All Stream Events Forward operation has expired for C:104501508953/P:0. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop68,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read Stream Events Backward operation has expired for Stream: ""**************e2eResult-01932dd6-edec-75dc-b02c-67a083fbafe9"", From Event Number: -1, Max Count: 1. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop69,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read Stream Events Backward operation has expired for Stream: ""**************e2eResult-01932dd6-ee14-7792-afd8-03f36b82f9b9"", From Event Number: -1, Max Count: 1. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop6a,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read All Stream Events Forward operation has expired for C:104502608695/P:0. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop6b,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read All Stream Events Forward operation has expired for C:104500633199/P:0. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dz8r0af7kop6c,"Nov 21 06:01:32 node3 bash[148776]: [148776,71,06:01:32.605,DBG] Read All Stream Events Forward operation has expired for C:104503603956/P:0. Operation Expired at 11/21/2024 06:01:11",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:33.069564917Z
1dzkesbf7l1prt,"Nov 21 06:01:00 node3 bash[148776]: [148776,37,06:01:00.915,DBG] SLOW BUS MSG [""bus""]: ""UnwrapEnvelopeMessage"" - 49ms. Handler: ""UnwrapEnvelopeHandler"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:01.054265811Z
1dzpwzcf7klzp6,"Nov 21 06:01:34 node3 bash[148776]: [148776,37,06:01:34.079,DBG] SLOW BUS MSG [""bus""]: ""CommittedEventDistributed"" - 77ms. Handler: ""EventReaderCoreService"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
1dzpwzcf7klzp7,"Nov 21 06:01:34 node3 bash[148776]: [148776,28,06:01:34.080,DBG] PROJECTIONS: Stopping Projections Core Coordinator",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
1dzpwzcf7klzp8,"Nov 21 06:01:34 node3 bash[148776]: [148776,28,06:01:34.080,DBG] SLOW BUS MSG [""manager input bus""]: ""StopComponents"" - 79ms. Handler: ""ProjectionCoreCoordinator"".",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
1dzpwzcf7klzp9,"Nov 21 06:01:34 node3 bash[148776]: [148776,28,06:01:34.080,DBG] PROJECTIONS: Stopping Projections Manager. Correlation 9c1badae-8117-4149-a2e3-d5c189111b28",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
1dzpwzcf7klzpa,"Nov 21 06:01:34 node3 bash[148776]: [148776,19,06:01:34.083,INF] Offline truncation will happen, shutting down ""ReplicationTrackingService""",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
1dzpwzcf7klzpb,"Nov 21 06:01:34 node3 bash[148776]: [148776,15,06:01:34.083,INF] === SUBSCRIBED to [""Unspecified/node1.esdb.prod.******.**********:1112"",{7de5dff4-4b41-43d2-bd94-a85fe708535d}] at 104505008225 (0x1854FBE061). SubscriptionId: {f7ce7a11-ccad-4534-b741-31e46ccaa63f}.",node3,projects/******-****/logs/syslog,,2024-11-21T06:01:35.065485288Z
And here are the monitoring dashboards during the failures happening:
Some help from the community would be much appreciated!