We are trying to confirm our understanding of the (eventual) consistency of data in a cluster to help inform the code we write around it. We have looked at the documentation and blogs (including https://geteventstore.com/blog/20130301/ensuring-writes-multi-node-replication/index.html and http://docs.geteventstore.com/server/3.0.0/cluster-with-manager-nodes/) and done some searches on this group looking for the same topic but could not find a match. If this has already been discussed then apologies and a pointer to an answer would be appreciated.
We are using an eventstore cluster for eventsourcing and individual or load-balanced services which maintain in-memory read models.
Our understanding of the world is as follows. Please could you identify any incorrect assumptions:
-
At any one time, any given client could be taking to any single node in the cluster. It will use that node for reading, writing and subscriptions.
-
In any cluster, there is one master which is the only one that actually processes client write requests. All other cluster members will forward client write requests to the master.
-
In a cluster of N nodes, the master will forward the write request to N/2 other nodes as a set of node-to-node write requests and will not consider the client write request to be complete unless all of these node-to-node write requests succeed.
-
Once the client write request has been successfully persisted on N/2 + 1 nodes, the master will not make any more specific efforts to persist this client write request data on the remaining nodes but will leave it up to the gossip protocol to propagate it.
Based on this:
-
Until the gossip protocol has propagated the client write request data, a given client could be talking to a node that does not have the latest data.
-
If there are two eventstore clients talking to different nodes and those clients are instances of a service that maintain the same in-memory read model, then their read models could be out of sync.
-
This means that the clients of those services could get different answers from the read model based on which instance of the service they talk to.
-
The clients of the read model services (and anything else further towards the user) just has to deal with the potential temporary inconsistency (which is just another form of eventual consistency).
There is no issue around dealing with the eventual consistency around the read models (after all, one service could be running a lot slower processing events than another one).
However, if we needed to reconstitute an aggregate based on the current state of events to determine if a CQRS command is valid, then this could be something we have to consider. Hence we wanted to make sure our understanding of how this works is right so we are not working on an incorrect assumption.
Thanks
Andy