Highly available resilient Event Store cluster in AWS

Hello,

We intend to run Event Store as a highly available, resilient cluster in AWS.

After reading the documentation pages http://docs.geteventstore.com/server/4.0.2/cluster-without-manager-nodes/ and http://docs.geteventstore.com/server/4.0.2/cluster-with-manager-nodes/, some important questions remained open, so we decided to post them to this forum.

  • Which clusters have higher performance: the ones with fewer nodes, due to the smaller internal communication overhead, or the ones with more nodes?
  • Can a cluster node be stopped, e.g. for maintenance, and restarted later?
  • How long is a single node allowed to be down?
  • Does Event Store replicate all events stored during a node’s downtime?
  • Can the IP address of a node in the cluster be changed when the node goes down and comes back up?
  • What happens if we restore the database of one cluster node from a backup? Does Event Store replicate all events not available in the backup data?
  • How can we back up a complete cluster?
  • Can the number of nodes in the cluster be increased, and the quorum size changed, without stopping the cluster completely? This feature was discussed in some postings in 2014, but I have not seen it mentioned in the documentation yet.
  • What advantages do manager nodes have over AWS-based autoscaling groups, which can automatically restart database nodes?
  • Which additional tools come with commercial support?
  • Are there any new options related to setting up and operating a highly available AWS cluster?

Any responses and clarifications are highly appreciated. We need a better understanding in order to make well-reasoned decisions with well-understood risks.

Best regards,

Dimitry Polivaev

Hi Dimitry,

  1. Which clusters have higher performance: the ones with fewer nodes, due to the smaller internal communication overhead, or the ones with more nodes?

Smaller clusters would have better write performance.

When a write is sent to the cluster, the master will write the events and replicate them to the other nodes in the cluster.

The master will then wait for a quorum of nodes (cluster_size/2 + 1) to acknowledge the write before considering the write successful. Therefore, the more nodes in the cluster, the more hops a write would need to make before completing.
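To make that concrete, here is the quorum arithmetic as a few lines of Python (just the formula above, not Event Store code):

```python
def quorum(cluster_size):
    # The master's own write counts towards the quorum,
    # so it waits for quorum - 1 acknowledgements from other nodes.
    return cluster_size // 2 + 1

for n in (3, 5, 7):
    print(f"cluster of {n}: {quorum(n)} acks needed, "
          f"{quorum(n) - 1} of them from other nodes")
```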

  2. Can a cluster node be stopped, e.g. for maintenance, and restarted later?

Yes, a node can be stopped and restarted. It will rejoin the cluster and continue from where it left off.

This allows you to do operations such as rolling upgrades, where you take a node down, upgrade it, and bring it back up to join the cluster before upgrading the next node.

As long as at least a quorum of nodes is alive, the cluster will still be able to process writes.

For example, in a cluster of 3 nodes the quorum would be 2, so the cluster will be able to tolerate one node going down without losing write functionality.

If there are fewer than a quorum of nodes left, the cluster will still be able to service reads, but not writes.
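The same arithmetic shows how many node failures each cluster size can absorb while still accepting writes (again a plain Python sketch):

```python
def quorum(n):
    return n // 2 + 1

for n in (3, 4, 5, 6, 7):
    print(f"cluster of {n}: quorum {quorum(n)}, "
          f"tolerates {n - quorum(n)} node failure(s) for writes")
```

Note that an even cluster size tolerates no more failures than the next smaller odd size, which is why odd cluster sizes are generally recommended.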

  3. How long is a single node allowed to be down?

As mentioned in point 2, the cluster can continue to accept and process writes if at least a quorum of nodes is available, so there is no hard time limit on a single node’s downtime.

  4. Does Event Store replicate all events stored during a node’s downtime?

Replication happens live. Event Store follows a “shared nothing” philosophy, so each node has its own database.

As mentioned in point 1, the master will replicate the events to the slaves and expect a certain number of them to acknowledge the write before the write is considered successful.

  5. Can the IP address of a node in the cluster be changed when the node goes down and comes back up?

Yes. The nodes in a cluster use either gossip seeds or a DNS record to find the other nodes in the cluster.

When a node starts up, it will attempt to gossip with a node from the provided addresses.

If existing nodes receive a gossip from an IP address they do not recognise, they will add it to their list of known nodes.

So as long as the node starting up with a new IP knows where the other nodes are in order to start the gossip, it will be able to join the cluster.
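A toy illustration of that membership rule (plain Python, not the actual gossip protocol, which also exchanges node state and liveness information):

```python
# Each node keeps a set of peer addresses it knows about.
known = {
    "10.0.0.1": {"10.0.0.2", "10.0.0.3"},
    "10.0.0.2": {"10.0.0.1", "10.0.0.3"},
    "10.0.0.3": {"10.0.0.1", "10.0.0.2"},
}

def gossip(sender, receiver):
    # Both sides learn about each other and about each other's peers.
    known.setdefault(sender, set())
    known.setdefault(receiver, set())
    known[receiver] |= {sender} | (known[sender] - {receiver})
    known[sender] |= {receiver} | (known[receiver] - {sender})

# A node comes back on a new IP; one reachable seed is enough to rejoin.
gossip("10.0.0.99", "10.0.0.1")
print(sorted(known["10.0.0.1"]))   # the seed now knows 10.0.0.99
print(sorted(known["10.0.0.99"]))  # the new node learned the whole cluster
```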

  6. What happens if we restore the database of one cluster node from a backup? Does Event Store replicate all events not available in the backup data?

You can start a node in a cluster from a backup. The node will join the cluster and subscribe to the master node to replicate any data it is missing.
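A toy illustration of that catch-up behaviour (plain Python; it assumes replication simply resumes from the restored node's last position, whereas the real mechanism works from log checkpoints):

```python
master_log   = ["ev1", "ev2", "ev3", "ev4"]  # the cluster's current event log
restored_log = ["ev1", "ev2"]                # node restored from an older backup

# The restored node subscribes to the master and copies everything
# past the last event it already has.
restored_log.extend(master_log[len(restored_log):])

assert restored_log == master_log
```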

  7. How can we back up a complete cluster?

You only need to back up one of the nodes in the cluster. Instructions for backing up a node can be found in the Event Store documentation.

  8. Can the number of nodes in the cluster be increased, and the quorum size changed, without stopping the cluster completely? This feature was discussed in some postings in 2014, but I have not seen it mentioned in the documentation yet.

The cluster size is fixed and cannot be changed on the fly.

  9. What advantages do manager nodes have over AWS-based autoscaling groups, which can automatically restart database nodes?

These are not necessarily exclusive.

Manager nodes are only available when running Event Store on Windows. These nodes ensure that the Event Store service is always running, and restart it if something happens, such as an offline truncation.

On Unix systems this functionality is covered by systemd and the like.

AWS autoscaling groups work well for ensuring a specific number of instances are always running and for recovering from one of the instances going down.
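To make the systemd point concrete, a minimal unit file that keeps a node running could look like the sketch below. The binary and config paths are assumptions, and the official Linux packages may already ship a unit like this:

```
[Unit]
Description=Event Store cluster node
After=network.target

[Service]
# Paths and binary name are assumptions; adjust to your installation.
ExecStart=/usr/bin/eventstored --config /etc/eventstore/eventstore.conf
# Restart the process automatically if it dies, which is the role
# manager nodes play on Windows.
Restart=on-failure
RestartSec=5
User=eventstore

[Install]
WantedBy=multi-user.target
```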

  10. Which additional tools come with commercial support?

Many thanks

Dimitry

  1. Which clusters have higher performance: the ones with fewer nodes, due to the smaller internal communication overhead, or the ones with more nodes?

Smaller clusters would have better write performance.

When a write is sent to the cluster, the master will write the events and replicate them to the other nodes in the cluster.

The master will then wait for a quorum of nodes (cluster_size/2 + 1) to acknowledge the write before considering the write successful. Therefore, the more nodes in the cluster, the more hops a write would need to make before completing.

And what is the story with reads? Is there a difference? Or do you always only read from a single node?

It depends on the connection settings: reads can be served by any node, or only by the master.

  8. Can the number of nodes in the cluster be increased, and the quorum size changed, without stopping the cluster completely? This feature was discussed in some postings in 2014, but I have not seen it mentioned in the documentation yet.

The cluster size is fixed and cannot be changed on the fly.

Hi Hayley-Jean,

By starting a cluster with 3 nodes (running on the same machine) and then starting a 4th node with

gossip-seed=<the running 3 nodes>

it looks like the 4th node is included in the cluster (as seen in the log output; the events are also shared with the new node).

So it seems like the cluster size is not fixed, or am I missing something here?

Cheers

Richard


I think the quorum size is fixed.

The cluster size is fixed and set as part of the configuration options.
The cluster size is used to calculate the number of nodes that would form a quorum.

If a new node joins a cluster after the configured number of nodes are up and have their roles assigned, the new node will become a clone. Clones do not participate in the quorum. If in this scenario you were to lose a slave or the master, a clone can be promoted to one of those roles.
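A simplified sketch of that rule (hypothetical Python; in reality roles are assigned by elections, not list order):

```python
def assign_roles(cluster_size, nodes):
    # Only the first `cluster_size` nodes become quorum members;
    # any extra nodes join as non-voting clones.
    members, clones = nodes[:cluster_size], nodes[cluster_size:]
    return {"master": members[0], "slaves": members[1:], "clones": clones}

print(assign_roles(3, ["node1", "node2", "node3", "node4"]))
# {'master': 'node1', 'slaves': ['node2', 'node3'], 'clones': ['node4']}
```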

When I add a 4th node with --cluster-size=4 and then restart the first 3 in turn (also with --cluster-size=4), I end up with a cluster of 1 master and 3 slaves (no clone). The cluster is always up.

You really don’t want a cluster size of 4 with 4 nodes. The quorum for a 4-node cluster is 4/2 + 1 = 3, so it tolerates only one node failure before losing write availability, the same as a 3-node cluster, while adding replication overhead. This is why odd cluster sizes are preferred.

Presumably this option exists on ConnectionSettings because there are reasons for both approaches.
Is there any guidance you can give on when you would choose one over the other?

When would we not want to distribute reads across all nodes in the cluster?

A connection is not only for reads; it is also for writes. On writes, if the connection points to a slave node, the writes will be forwarded to the master. On reads, it is possible to get a “dirty read” from the cluster, meaning that the slave node you read from may not yet have the write that you made.
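A toy illustration of such a dirty read (plain Python, not client code):

```python
master_log = ["ev1", "ev2", "ev3"]  # ev3 just acknowledged by a quorum
slave_log  = ["ev1", "ev2"]         # this slave has not replicated ev3 yet

# A read served by the master sees the acknowledged write...
assert "ev3" in master_log
# ...but a read routed to the lagging slave does not: a "dirty" read.
assert "ev3" not in slave_log
```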

OK, thanks. So, as I understand it:

With regard to writes, there seems to be no reason to choose one over the other, or at least no practical difference.
The eventual behaviour will be the same regardless of which setting is chosen, as writes are always performed by the master (either directly or forwarded from a slave).

With regard to reads, the guidance would be that we can choose to either

  1. read from any node, which will increase read throughput but introduce the possibility of “dirty reads”, or

  2. ensure full consistency by always reading from the master node (with a corresponding reduction in potential throughput)

Does that sound right?

Just to close off this thread for others’ benefit (it would be really useful if this kind of guidance could be made available via the GES docs site rather than users having to trawl through forums)…

Since there was no answer on the forum, I put this question directly to someone at GES. They advised that my understanding as described above is completely correct.