Events lost during leader election

Hello, we are having an unknown issue that causes the leader of our three-node cluster to freeze for a few seconds, triggering a leader election between the two other nodes. Nearly every time this happens, we lose a couple of events when the previous leader starts responding again. It seems to accept some events even though it is no longer the leader, and tells the client that they were persisted. Shortly afterwards, it realizes this and goes offline for truncation. Shouldn’t this be impossible by design?
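
To make the expectation behind that question concrete, here is a minimal sketch of the majority rule we assumed applies. It is a toy model, not EventStoreDB's actual implementation, and the names `quorum_size` and `may_acknowledge` are made up for illustration. The idea is that a write should only be confirmed to the client once a majority of the nodes hold it, so a former leader that can reach only itself should never be able to confirm anything.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def may_acknowledge(replica_count: int, cluster_size: int) -> bool:
    """True if a write stored on `replica_count` nodes is on a majority.

    Hypothetical helper: models the rule we assumed, not real server code.
    """
    return replica_count >= quorum_size(cluster_size)


if __name__ == "__main__":
    # Three nodes, like our cluster (the size is the only thing taken from it).
    print(may_acknowledge(1, 3))  # old leader holding the write alone -> False
    print(may_acknowledge(2, 3))  # new leader plus one follower -> True
```

Under this rule, the writes accepted by the old leader after the election should have been rejected or timed out rather than confirmed, which is why the behaviour surprised us.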

We have had this problem for quite some time, starting with version 5, as discussed in https://discuss.eventstore.com/t/lost-events-during-leader-election-truncation. We recently upgraded our clusters to version 21.10.1, but the problem still seems to be present. We have used the Akka JVM client, the gRPC Java client, and ESJC, and the problem has occurred with all of them.

The logs from all three nodes at the time of the incident can be found below. No events were accepted between 10:27:52 and 10:28:03. The events that are missing were written between 10:28:08.871 and 10:28:09.244. There were also other events written in this timespan that did not get lost.
https://gist.github.com/andersflemmen/9cf8630417b4e99f8ea1c2631306e88f <- NEW LEADER
https://gist.github.com/andersflemmen/17858e33dba5a18b305f35c45d2d1c2a <- OLD LEADER
https://gist.github.com/andersflemmen/c6d00cd67ca9f24a6c43be46762de941

Hi Anders,

Yes, this should be prevented by design.

Were the events lost from the beginning of these streams?

x-76296fea-2d6b-4d4b-8737-149b18277e73
x-ec16afd8-3814-414d-9d4a-580cb95dbba6

No, the events from those streams are still there.

The streams that lost events are not mentioned in the logs. Also, this has been happening with both new streams and streams that already have events.

Hi Anders,

To our considerable surprise, we were able to find a case where the behaviour you described could occur. I’ve filed a ticket for it here: https://github.com/EventStore/EventStore/issues/3472

Wow, relieved to hear that! Looking forward to a fix, and hope that it solves the problems we have been seeing. Thanks!