Offline truncation happening really often

Hi,
I recently changed some things in how my streams are built, and now in my test environment, when running some simple load tests (with a really small load at the start), I get a lot of restarts of EventStoreDB instances. I have a cluster of 3 instances.

In the logs I see entries like “Election timeout”, “SLOW BUS MSG”, “Looks like node is dead”, etc., and in the meantime an instance decides:

{"@t":"2023-10-17T16:26:21.9991546+00:00","@mt":"Offline truncation will happen, shutting down {service}","@l":"Information","@i":193379922,"service":"ReplicationTrackingService","SourceContext":"EventStore.Core.Services.Replication.ReplicationTrackingService","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T16:26:21.9992719+00:00","@mt":"=== SUBSCRIBED to [{leaderEndPoint},{leaderId:B}] at {subscriptionPosition} (0x{subscriptionPosition:X}). SubscriptionId: {subscriptionId:B}.","@r":["{38ab044d-be7c-47d1-aab7-f049c4abeea0}","1A66F41","{06d71159-9edf-4842-9fbf-08132eebdd2e}"],"@l":"Information","@i":3944923921,"leaderEndPoint":"10.0.1.237:1112/eventstore.default.svc.cluster.local","leaderId":"38ab044d-be7c-47d1-aab7-f049c4abeea0","subscriptionPosition":27684673,"subscriptionId":"06d71159-9edf-4842-9fbf-08132eebdd2e","SourceContext":"EventStore.Core.Services.ClusterStorageWriterService","ProcessId":1,"ThreadId":11}
{"@t":"2023-10-17T16:26:21.9992986+00:00","@mt":"Leader [{leaderEndPoint},{leaderId:B}] subscribed us at {subscriptionPosition} (0x{subscriptionPosition:X}), which is less than our writer checkpoint {writerCheckpoint} (0x{writerCheckpoint:X}). TRUNCATION IS NEEDED.","@r":["{38ab044d-be7c-47d1-aab7-f049c4abeea0}","1A66F41","1A66FDF"],"@l":"Information","@i":2777819662,"leaderEndPoint":"10.0.1.237:1112/eventstore.default.svc.cluster.local","leaderId":"38ab044d-be7c-47d1-aab7-f049c4abeea0","subscriptionPosition":27684673,"writerCheckpoint":27684831,"SourceContext":"EventStore.Core.Services.ClusterStorageWriterService","ProcessId":1,"ThreadId":11}
{"@t":"2023-10-17T16:26:21.9995123+00:00","@mt":"ONLINE TRUNCATION IS NEEDED. NOT IMPLEMENTED. OFFLINE TRUNCATION WILL BE PERFORMED. SHUTTING DOWN NODE.","@l":"Information","@i":244102064,"SourceContext":"EventStore.Core.Services.ClusterStorageWriterService","ProcessId":1,"ThreadId":11}

Also, my code gets quite a lot of DeadlineExceeded errors.

I am not sure what the root cause is. My streams get quite long after the change, so maybe that is the cause. Or maybe I messed something up in the environment (although I have not changed anything since the last load tests, which performed well). Resources (CPU, memory) do not seem to be the issue, judging by the usage.

Since the last load tests I have made 2 changes to the critical path I am testing. One is that the system now reads streams for aggregates by first checking the stream metadata for the snapshot position and then reading from that snapshot (roughly the pattern sketched below). The other is that the streams are quite long, and during load tests the snapshotting maybe does not happen quickly enough.
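
For context, a minimal sketch of that read path, assuming the Node gRPC client (@eventstore/db-client); the connection string and the snapshotRevision metadata field are illustrative, not taken from my actual code:

```typescript
import { EventStoreDBClient, FORWARDS, START } from "@eventstore/db-client";

// Illustrative connection string; the real one points at the cluster.
const client = EventStoreDBClient.connectionString(
  "esdb://eventstore.default.svc.cluster.local:2113?tls=false"
);

// Load an aggregate's events: check the stream metadata for the snapshot
// position first, then read the stream forwards from that revision.
async function loadAggregateEvents(streamName: string) {
  const metadataResult = await client.getStreamMetadata(streamName);
  // "snapshotRevision" is a hypothetical custom metadata field.
  const snapshotRevision = (metadataResult.metadata as any)?.snapshotRevision;
  const fromRevision =
    snapshotRevision != null ? BigInt(snapshotRevision) : START;

  const events: unknown[] = [];
  for await (const resolved of client.readStream(streamName, {
    direction: FORWARDS,
    fromRevision,
  })) {
    events.push(resolved.event);
  }
  return events;
}
```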

My question is: what could be the cause here? Where should I look for further information? I am not sure what this “offline truncation” is.

I am using EventStoreDB version 21.10.9.

Hi!

Offline truncation occurs when an EventStoreDB node needs to remove the most recently written data from the end of its log. This can be necessary if the node was the leader and wrote some data locally that it was then unable to replicate, due to a network partition or similar. When it rejoins the cluster it will realise that it has this extra data, and the procedure for removing it currently involves restarting the node. The data being removed was never fully replicated, so the client that issued the write will not have received a successful response.
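
To make the mechanics concrete, here is a rough illustration (not the actual server code) of the check visible in your second and third log lines: when the follower subscribes, the leader tells it the position it will replicate from, and if that position is behind the follower's own writer checkpoint, the follower has unreplicated data at the end of its log and has to truncate.

```typescript
// Rough illustration only – mirrors the condition reported in the logs above.
function truncationNeeded(
  subscriptionPosition: bigint, // where the leader will replicate from
  writerCheckpoint: bigint      // how far this node has written locally
): boolean {
  // Anything written beyond the subscription position was never acknowledged
  // cluster-wide, so the follower must remove it before rejoining.
  return subscriptionPosition < writerCheckpoint;
}

// Values from the log entries in the question: 27684673 < 27684831,
// hence "TRUNCATION IS NEEDED" and the node shuts down to truncate offline.
console.log(truncationNeeded(27684673n, 27684831n)); // true
```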

This shouldn’t happen often, but it is more likely to happen if you are having a lot of elections. If elections are happening often, the cause ought to be visible in the logs, but sometimes it can help to increase the GossipTimeoutMs database setting.
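
For reference, the setting can be supplied in the usual three forms; the value of 5000 ms below is only an example to illustrate raising it, so tune it for your environment:

```yaml
# Example only – raise the gossip timeout (value in milliseconds).
# In the YAML config file:
GossipTimeoutMs: 5000
# Or as an environment variable:  EVENTSTORE_GOSSIP_TIMEOUT_MS=5000
# Or as a command-line flag:      --gossip-timeout-ms=5000
```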

Thanks for the response!

It seems like something “unhealthy” is going on in general. I think the cause might be visible in these logs, if I am not mistaken:

{"@t":"2023-10-17T19:00:36.5109173+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (TCP connection lost). Issuing a gossip to confirm.","@l":"Information","@i":226695735,"nodeEndPoint":"Unspecified/10.0.1.237:1112","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5110825+00:00","@mt":"Gossip Failed, The node [{nodeEndpoint}] is being marked as DEAD. Reason: {reason}","@l":"Information","@i":355410486,"nodeEndpoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","reason":"Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5112216+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"TCP connection lost to [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432588/84432429/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.063","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432588/84432429/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:36.5124149+00:00","@mt":"View Change Proof Send Failed to {Server}","@l":"Information","@i":802807059,"@x":"System.AggregateException: One or more errors occurred. (Status(StatusCode=\"DeadlineExceeded\", Detail=\"\"))\n ---> Grpc.Core.RpcException: Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")\n   at EventStore.Core.Cluster.EventStoreClusterClient.SendViewChangeProofAsync(Guid serverId, EndPoint serverHttpEndPoint, Int32 installedView, DateTime deadline) in /home/runner/work/TrainStation/TrainStation/build/oss-eventstore/src/EventStore.Core/Cluster/EventStoreClusterClient.Elections.cs:line 115\n   --- End of inner exception stack trace ---","Server":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Cluster.EventStoreClusterClient","ProcessId":1,"ThreadId":31}
{"@t":"2023-10-17T19:00:37.5274832+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (Gossip send failed).","@l":"Information","@i":1718796462,"nodeEndPoint":"10.0.1.164:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5280777+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip send failed to [10.0.1.164:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <LIVE> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:33.676"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.527"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5281246+00:00","@mt":"Gossip Failed, The node [{nodeEndpoint}] is being marked as DEAD. Reason: {reason}","@l":"Information","@i":355410486,"nodeEndpoint":"10.0.1.164:2113/eventstore.default.svc.cluster.local","reason":"Status(StatusCode=\"DeadlineExceeded\", Detail=\"\")","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:37.5307633+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip received from [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.474","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [Follower, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:36.511","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:35.065","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5047687+00:00","@mt":"Looks like node [{nodeEndPoint}] is DEAD (Gossip send failed).","@l":"Information","@i":1718796462,"nodeEndPoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5048847+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip send failed to [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:35.065","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.504","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5074878+00:00","@mt":"Gossip Received, The node [{nodeEndpoint}] is not DEAD.","@l":"Information","@i":189044222,"nodeEndpoint":"10.0.1.237:2113/eventstore.default.svc.cluster.local","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5076076+00:00","@mt":"CLUSTER HAS CHANGED {source}\nOld:\n{oldMembers}\nNew:\n{newMembers}","@l":"Information","@i":1200958815,"source":"gossip received from [10.0.1.237:2113/eventstore.default.svc.cluster.local]","oldMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.530","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <DEAD> [CatchingUp, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.504","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"newMembers":["Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [Leader, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432906/84432906/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:38.507","Priority: 0 VND {e93e9c9c-b0fb-4ec3-8718-93692d8b16a2} <LIVE> [Leader, 10.0.1.237:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.237:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84432557/84432930/84432930/E173@84432270:{16c17a2f-3fc6-430e-8b4d-a9558d00aa42} | 2023-10-17 19:00:37.533","Priority: 0 VND {04f37cde-8387-4d69-9f27-41b6c6018bca} <DEAD> [Follower, 10.0.1.164:1112/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:1113/eventstore.default.svc.cluster.local, n/a, 10.0.1.164:2113/eventstore.default.svc.cluster.local, (ADVERTISED: HTTP::0, TCP::0)] 84106430/84432270/84432270/E172@84106142:{b376bfe1-6cc3-4cb6-8554-8e2f4dc81471} | 2023-10-17 19:00:37.528"],"SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":1,"ThreadId":15}
{"@t":"2023-10-17T19:00:38.5076249+00:00","@mt":"There are MULTIPLE LEADERS according to gossip, need to start elections. LEADER: [{leader}]","@l":"Debug","@i":145878891,"leader":"Priority: 0 VND {f2d705e7-2d22-4d0d-bc85-8fbf68f7b051} <LIVE> [PreReplica, Unspecified/10.0.1.8:1112, n/a, Unspecified/10.0.1.8:1113, n/a, Unspecified/10.0.1.8:2113, (ADVERTISED: HTTP::0, TCP::0)] 83885134/84098115/84098115/E170@83884846:{20a32ff0-a3a7-4a44-bdff-977207c121b9} | 2023-10-17 18:58:49.392","SourceContext":"EventStore.Core.Services.VNode.ClusterVNodeController","ProcessId":1,"ThreadId":15}

So I will try increasing the gossip timeout as you suggested; that makes sense. Still, gossip should not take that long, and it was not an issue previously. After that I will try to investigate why everything is running more slowly, including those SLOW BUS MSG logs, which I think should be rare but which I see often.

Hi, if you’ve started using persistent subscriptions recently, or changed the way you’re using them, then you might be running into a bug present in 21.10.9 where a slow persistent subscription consumer can prevent the server from responding to other messages, such as gossip.
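
As a rough illustration of what a “slow consumer” means here (the API name varies by client and version; this sketch assumes a recent @eventstore/db-client, and the stream and group names are made up):

```typescript
import { EventStoreDBClient } from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString("esdb://localhost:2113?tls=false");

// Hypothetical handler – the important part is that it returns quickly,
// or hands the work off to an internal queue.
async function handle(resolved: unknown): Promise<void> {
  // ... fast processing ...
}

async function consume() {
  // Illustrative names; recent client versions expose
  // subscribeToPersistentSubscriptionToStream (older ones used
  // connectToPersistentSubscription).
  const subscription = client.subscribeToPersistentSubscriptionToStream(
    "my-stream",
    "my-group"
  );

  for await (const resolved of subscription) {
    // If this loop body blocks for a long time, the consumer is "slow" and,
    // on server 21.10.9, that can starve other messages such as gossip.
    await handle(resolved);
    await subscription.ack(resolved);
  }
}

consume().catch(console.error);
```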

It would be worth trying 23.10.0. We will also be releasing a 21.10 patch with a fix in the coming weeks.