Thanks for the response!
It seems like there is something “unhealthy” going on in general. I think the cause might be visible in these logs, if I am not mistaken:
{"@t":"2023-10-17T19:00:36.5109173+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (TCP connection lost). Issuing a gossip to confirm.","@l":"Information","@i":226695735,"nodeEndPoint":"Unspecified/10.0.1.237:1112","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5110825+00:00","@mt":"Gossip Failed, The node [{nodeEndpoint}] is being marked as DEAD. Reason: {reason}","@l":"Information","@i":355410486,"nodeEndpoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","reason":"Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5112216+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"TCP connection lost to [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432588/84432429/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.063","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432588/84432429/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} 
| 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5124149+00:00","@mt":"View Change Proof Send Failed to {Server}","@l":"Information","@i":802807059,"@x":"System.AggregateException: One or more errors occurred. (Status(StatusCode=\"DeadlineExceeded\", Detail=\"\"))\n ---> Grpc.Core.RpcException: Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")\n at EventStore.Core.Cluster.EventStoreClusterClient.SendViewChangeProofAsync(Guid serverId, EndPoint serverHttpEndPoint, Int32 installedView, DateTime deadline) in /home/runner/work/TrainStation/TrainStation/build/oss-eventstore/src/EventStore.Core/Cluster/EventStoreClusterClient.Elections.cs:line 115\n --- End of inner exception stack trace ---","Server":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Cluster.EventStoreClusterClient","ProcessId":1,"ThreadId":31}
{"@t":"2023-10-17T19:00:37.5274832+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (Gossip send failed).","@l":"Information","@i":1718796462,"nodeEndPoint":"10.0.1.164:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5280777+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip send failed to [10.0.1.164:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 
2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.527"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5281246+00:00","@mt":"Gossip Failed, The node [{nodeEndpoint}] is being marked as DEAD. Reason: {reason}","@l":"Information","@i":355410486,"nodeEndpoint":"10.0.1.164:2113/eventstore.default.svc.cluster.local","reason":"Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5307633+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip received from [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} 
| 2023-10-17 19:00:35.065","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5047687+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (Gossip send failed).","@l":"Information","@i":1718796462,"nodeEndPoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5048847+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip send failed to [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:35.065","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 
84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.504","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5074878+00:00","@mt":"Gossip Received, The node [{nodeEndpoint}] is not DEAD.","@l":"Information","@i":189044222,"nodeEndpoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5076076+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip received from [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.504","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [Leader, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84432557/84432930/84432930/E173@84432270:{16c17a2f-3fc6-430e-8b4d-a9558d00aa42} | 
2023-10-17 19:00:37.533","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5076249+00:00","@mt":"There are MULTIPLE LEADERS according to gossip, need to start elections. LEADER: [{leader}]","@l":"Debug","@i":145878891,"leader":"Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [PreReplica, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 83885134/84098115/84098115/E170@83884846:{20a32ff0-a3a7-4a44-bdff-977207c121b9} | 2023-10-17 18:58:49.392","SourceContext":"EventStore.Core.Services.VNode.ClusterVNodeController","ProcessId":1,"ThreadId":15}
So I will try increasing the gossip timeout as you suggested; that makes sense. Still, gossip should not take that long, and it was not an issue previously. After that, I will investigate why everything is running slower, including the SLOW BUS MSG logs, which I think should be rare but which I see often.
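To get a rough idea of how often these events actually occur, something like the following sketch could count them per minute. It assumes the log file holds one JSON object per line with `@t` and `@mt` fields, as in the excerpt above; the file path argument, the function name `count_by_minute`, and the choice of substrings are my own placeholders, not anything provided by EventStoreDB.

```python
import json
import sys
from collections import Counter

# Substrings searched for in the "@mt" message template. The first three are
# copied from the excerpt above; "SLOW BUS MSG" is the phrase from the post.
PATTERNS = ("is DEAD", "being marked as DEAD", "MULTIPLE LEADERS", "SLOW BUS MSG")


def count_by_minute(lines):
    """Count matching log entries, keyed by (minute, pattern)."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip wrapped or truncated lines
        if not isinstance(entry, dict):
            continue
        template = entry.get("@mt", "")
        minute = entry.get("@t", "")[:16]  # e.g. "2023-10-17T19:00"
        for pattern in PATTERNS:
            if pattern in template:
                counts[(minute, pattern)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python count_events.py path/to/node.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        for (minute, pattern), n in sorted(count_by_minute(handle).items()):
            print(f"{minute}  {pattern:<22} {n}")
```

Comparing the per-minute counts before and after changing the gossip timeout should show whether the change helps, and whether the SLOW BUS MSG entries cluster around the same times as the gossip failures.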