Cluster decision: Small quorum size + clone vs bigger quorum

Hello,
I’m about to set up an Event Store cluster in our production environment. One question: which would be better - a 5-instance cluster (quorum size 3) or a 3-instance cluster (quorum size 2) with 2 added “clone” nodes?

Thanks,

Thieu

The clones are just warm replicas.

The main benefit of having the clones is that you can quickly change their config and let them join the cluster (eg they already have all the data, so after a quick change they are almost ready to join!). Another benefit is that you can dispatch reads to them to take load off the other nodes.

A further benefit is that you can geographically distribute them, allowing remote read models etc to do replays locally instead of over a longer network path (this is a really common case). Let’s imagine we are a bank and we have offices in JoBurg and in London. All of the transaction processing happens in London, but we also have a network in ZA. I might want to put a clone in JoBurg so local read models etc can build off that local node as opposed to needing to talk to London. To throw another few factors into it, we may have different teams between London and JoBurg, and a local clone will also help reduce the interactions needed between them.
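To make that concrete, here is a rough sketch of pointing a read model at a local clone (the host name, the default port 2113 and the stream name are just placeholders, and it assumes the Atom-based HTTP read API): the JoBurg read model pages the stream from the nearby clone instead of crossing the WAN to London.

```python
# Sketch only: page a stream from a nearby clone node over the Atom HTTP API
# instead of reading from the primary cluster across the WAN.
import requests

CLONE = "http://joburg-clone.internal:2113"   # hypothetical local clone address
STREAM = "transactions"                       # hypothetical stream name


def read_stream_forward(base_url, stream, start=0, page_size=20):
    """Yield event entries by paging the stream forwards from `start`."""
    position = start
    while True:
        url = f"{base_url}/streams/{stream}/{position}/forward/{page_size}?embed=body"
        resp = requests.get(url, headers={"Accept": "application/vnd.eventstore.atom+json"})
        resp.raise_for_status()
        entries = resp.json().get("entries", [])
        # The feed lists newest entries first; reverse to process oldest-first.
        for entry in reversed(entries):
            yield entry
        if len(entries) < page_size:
            return  # reached the current end of the stream
        position += page_size


for event in read_stream_forward(CLONE, STREAM):
    print(event.get("eventNumber"), event.get("eventType"))
```

The only thing that changes compared with reading against the cluster is the address: the clone serves the same replicated data (modulo a little replication lag), so reads here add no load to the London nodes.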

Does this make sense?

Cheers,

Greg

BTW, on a complete side note, another use for a clone I have quite enjoyed over the years is keeping a production clone in the development office so that we have constant access to actual production data without any risk of damaging production (it’s read-only). This is useful when writing/testing new projections etc. It is also useful for all those ad hoc questions people like to ask about production, and the ability to look into them “without risk” - the worst thing I may do is kill the clone if I do something stupid :stuck_out_tongue:

Interested in this rejoining point - you’re talking about a simple custom tool to replicate $all (on a master cluster) to a new node/cluster (a read-only clone/slave), and not something akin to this https://github.com/EventStore/EventStore/issues/263, right?

Just curious about promoting the slave/clone node to join the master cluster when using the former - would the global $all checkpoints etc all be compatible (in order to join the master cluster) without any other logic? What about $ce- streams if the projection system was switched off on the slave/clone - wouldn’t their absence make joining the cluster problematic?

Cheers!

Replication is handled via log shipping so they all have the same data. Projections write to the log so all nodes have the same data.
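A toy model of the idea (purely illustrative, not how Event Store is actually implemented): a $by_category-style projection appends its output, e.g. the link events behind $ce- streams, into the same log that replication ships, so any node holding the full log also holds the projection output - whether or not projections are running on that node.

```python
# Illustrative model only (not Event Store internals): projection output lands
# in the same append-only log that replication ships to replicas.

def run_by_category(log):
    """Toy $by_category-style projection: append link records to the log."""
    for i, record in enumerate(list(log)):
        if record["type"] == "event":
            category = record["stream"].split("-")[0]
            log.append({"type": "link", "stream": f"$ce-{category}", "points_to": i})


primary_log = [
    {"type": "event", "stream": "account-1", "data": "Deposited"},
    {"type": "event", "stream": "account-2", "data": "Withdrawn"},
]
run_by_category(primary_log)      # projection output is appended to the log itself

clone_log = list(primary_log)     # "log shipping": the clone receives the log records
ce_links = [r for r in clone_log if r["stream"].startswith("$ce-")]
print(ce_links)                   # the clone has the $ce- links without running projections
```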

a 3-instance cluster (quorum size 2) with 2 added “clone” nodes?

I would be careful with this. I’m interpreting this statement as a cluster that looks like:

  • master

  • slave

  • slave

  • clone

  • clone

In this configuration, if the master fails, it is possible for elections to temporarily elect two masters (and thus two functioning clusters). The cluster will resolve itself on the next election, but the “losing” cluster will take itself offline for truncation.
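To make the arithmetic concrete, here is a rough sketch of the quorum maths (not the actual election algorithm): with a configured cluster size of 3 the quorum is 2, so the four nodes surviving the master can split into two disjoint pairs that each see a quorum; with a cluster size of 5 (quorum 3), no two disjoint groups among five nodes can both reach quorum.

```python
# Rough sketch of the quorum arithmetic (not the actual election algorithm).

def quorum(cluster_size):
    """Majority needed for the configured cluster size."""
    return cluster_size // 2 + 1


def can_split_brain(surviving_nodes, cluster_size):
    """Can two disjoint groups of the surviving nodes both reach quorum?"""
    return surviving_nodes >= 2 * quorum(cluster_size)


# Cluster size 3 with 2 extra clones: 5 physical nodes, master dies -> 4 left.
print(quorum(3), can_split_brain(4, 3))   # 2 True  -> two pairs can each claim a quorum
# Cluster size 5: 5 physical nodes, master dies -> 4 left.
print(quorum(5), can_split_brain(4, 5))   # 3 False -> only one group of >= 3 can form
```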

Greg’s described usage of clones sounds fantastic as well, provided you don’t accidentally end up with your dev clone getting promoted. This could be avoided if clones could be configured as non-promotable.

Chris

"Greg’s described usage of clones sounds fantastic as well, provided you don’t accidentally end up with your dev clone getting promoted. This could be avoided if clones could be configured as non-promotable. "

Yes, there is an option for whether or not they can be promoted. I should have mentioned this. Though in the case where the dev clone did get promoted, you likely really wanted it to… if you stopped and had a beer after work to think about all the other things that could have happened :stuck_out_tongue:

Yes, there is an option for whether or not they can be promoted

Is there? I’d love to do this, but I can’t figure out how.

Though in the case where the dev clone did get promoted, you likely really wanted it to

Maybe. Cross-continent latency could be painful, but if you need the nodes in order to maintain quorum, then it’s certainly helpful.

As far as beer, anytime you’re in Austin, I’m buying. :slight_smile:

Gotcha. I got confused because when you run a user projection on $all you don’t get the full log, eg the $ce- streams.

Thanks Greg and Chris for your quick responses.
I would stick with the 5-node cluster then.
But I’d really love to have a local clone node in my dev environment, as Greg mentioned above. Please help me with the following concerns:

  • Is there any document describing the procedure to add a clone node to a running cluster?

  • How do we make the clone non-promotable? Will it automatically be promoted, or do we have to “quickly change their config and let them join the cluster”?

  • Is it possible to re-create the whole cluster using the data from the clone node (in case we have hardware issue with the production environment)?

I really appreciate your help.

After thinking for a while, here are my assumptions:

  • Is there any document describing the procedure to add a clone node to a running cluster?

As long as the cluster is running, adding a node will result in it becoming a clone.

  • How do we make the clone non-promotable? Will it automatically be promoted, or do we have to “quickly change their config and let them join the cluster”?

I think we need to change the config of the other nodes so they know about the clone. If we use DNS, then the node will automatically be promoted (?). Could I make the clone non-promotable by using network configuration so that the clone node can access the cluster nodes, but the cluster nodes cannot see the clone?

  • Is it possible to re-create the whole cluster using the data from the clone node (in case we have hardware issue with the production environment)?

I would think yes.

Please correct me if my assumptions are wrong. Thanks again.

Does that mean we risk the cluster being offline for just a while? And that it’ll be working again without any manual action? Thanks.

Does that mean we risk the cluster being offline for just a while?

Specifically in this (edge) case, you start with this:

  1. Master

  2. Slave

  3. Slave

  4. Clone

  5. Clone

and if the master dies, you can briefly be in a state like this following the subsequent election:

  1. Dead

  2. Master (sub-cluster 1)

  3. Master (sub-cluster 2)

  4. Slave (sub-cluster 1)

  5. Slave (sub-cluster 2)

Upon the next gossip, the nodes will discover they have a split brain and hold elections again. When we observed this, the result was:

  1. Still dead/rebooting

  2. Master

  3. Dead - offline truncation

  4. Slave

  5. Dead - offline truncation

So, technically, depending on your definition of offline, the cluster was never offline, in that it still had quorum (2 nodes). But it was at best noisy, and at worst lossy, in that (I think) the losing cluster could have written events that never made it to the other cluster - these events would be truncated, as if they never existed.

To attempt to answer your other questions:

How to make the clone non-promotable?

I’m not sure. In some local testing, once the cluster discovers the clone, it’s effectively in the cluster - which is to say no config change relating to cluster DNS or other network/cluster config would prevent the clone from being promoted.

Is it possible to re-create the whole cluster using the data from the clone node

Yes, it is possible to re-create the cluster given the data from the node. I haven’t done this before and there might be some steps around “growing” the cluster from a single node to your desired size, but there are probably details on this somewhere.

Chris

Thanks for your very detailed explanation, Chris. It’s really helpful.