Building a self-healing Event Store cluster in AWS

Good morning,

I am in the process of building an Event Store cluster in AWS. Can somebody recommend the best machine size/hard disk combination? Also, given that I am using an autoscaling group, what is the best way to safely add a new node to the cluster if a node dies?

Kind regards


Hi Sean

We are using m4.large, but this will all depend on your load.

We are also putting nodes across availability zones for redundancy.

For dealing with a failed node we make use of a few things. We use DNS (Route53) to allow ES to find the nodes and the cluster. We also have the process completely scripted, so that when we do lose a node the scripts kick in to recreate the node and add it to the cluster.
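
To illustrate the DNS part, here is a minimal Python sketch of how a replacement script might rebuild the record that the nodes use for discovery. It is not the actual script described above. The record name, TTL and IP addresses are made up, and the function only builds the request body, so it runs without AWS credentials.

```python
def build_cluster_dns_change(record_name, node_ips, ttl=30):
    """Build a Route53 ChangeBatch that points one A record at every live node.

    record_name, ttl and the IPs passed in are hypothetical example values.
    """
    if not node_ips:
        raise ValueError("need at least one node IP")
    # Sort and de-duplicate so that repeated runs produce the same request.
    unique_ips = sorted(set(node_ips))
    return {
        "Comment": "event store cluster membership",
        "Changes": [
            {
                # UPSERT creates the record if missing, otherwise replaces it.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip} for ip in unique_ips],
                },
            }
        ],
    }


if __name__ == "__main__":
    batch = build_cluster_dns_change(
        "es-cluster.example.internal.", ["10.0.2.5", "10.0.1.5", "10.0.3.5"]
    )
    print(batch["Changes"][0]["ResourceRecordSet"]["ResourceRecords"])
```

The resulting dictionary has the shape that boto3's Route53 client expects as the ChangeBatch argument of change_resource_record_sets (together with a HostedZoneId). A short TTL is assumed here so that a replaced node shows up quickly; the right value depends on your setup.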

We do rely on ES to seed the new node so we do not do any restore to the new node.

We have tested this and it works well. On a slow AWS day we can get the cluster back up with all the nodes in about 15 minutes from the cluster port going down.
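
One way to decide when the cluster counts as "back up" is to look at the gossip information the nodes publish. The sketch below is only an assumption about how such a check could look. The field names (members, isAlive, state) and the "Master" state value are assumed to match the JSON your Event Store version returns from its gossip endpoint, so verify them before relying on this. The sample payload is invented.

```python
import json


def cluster_is_healthy(gossip_json, expected_nodes=3):
    """Return True when all expected nodes are alive and exactly one is master.

    Assumes a payload shaped like {"members": [{"isAlive": ..., "state": ...}]}.
    """
    members = json.loads(gossip_json).get("members", [])
    alive = [m for m in members if m.get("isAlive")]
    masters = [m for m in alive if m.get("state") == "Master"]
    return len(alive) >= expected_nodes and len(masters) == 1


if __name__ == "__main__":
    # Invented sample: one node has been lost and not yet replaced.
    sample = json.dumps({
        "members": [
            {"state": "Master", "isAlive": True},
            {"state": "Slave", "isAlive": True},
            {"state": "Slave", "isAlive": False},
        ]
    })
    print(cluster_is_healthy(sample))
```

A replacement script could poll a check like this after launching the new node and only report success once it returns True.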

We are now looking to make the setup resilient across regions by running a secondary cluster, but we are still working through the details of how to make sure the master does not move to the secondary cluster nodes.

Hope this helps.


Chris, what is your time frame on this?

Hi Chris!

We are using m4.large but this will all depend on your load.
We are also putting nodes across availability zones for redundancy.

Won't this impact the performance of the cluster writes, since they
are quorum based?
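
To make the concern concrete, here is a simplified Python model of quorum acknowledgement. It is a back-of-the-envelope sketch, not Event Store's actual commit protocol: it assumes the master counts as one acknowledgement at no cost and that a write completes once a majority of nodes have acknowledged it. The latency figures are invented, not measurements.

```python
def quorum_size(cluster_size):
    """Majority of the cluster, e.g. 2 of 3 or 3 of 5."""
    return cluster_size // 2 + 1


def estimated_commit_latency_ms(follower_ack_ms, cluster_size):
    """Latency of the slowest follower ack that is still needed for a majority.

    follower_ack_ms: round-trip times from the master to each follower.
    """
    needed = quorum_size(cluster_size) - 1  # the master itself is one ack
    if needed == 0:
        return 0.0
    if len(follower_ack_ms) < needed:
        raise ValueError("not enough followers to reach a quorum")
    # Only the fastest `needed` followers determine when the write completes.
    return sorted(follower_ack_ms)[needed - 1]


if __name__ == "__main__":
    # Hypothetical 3-node cluster with both followers in other zones.
    print(estimated_commit_latency_ms([1.2, 1.5], cluster_size=3))
```

Under this model a 3-node cluster waits only for the faster of the two followers, so the cost of spreading nodes across zones is roughly one round trip to the nearest other zone per write.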

For us, we have all instances in the same zone for now. We have
already written a Logstash input plugin to read events, so if we then
add a Logstash output plugin we could let that do synchronization to a
completely separate ES instance, as an "online backup". Has anyone
else played around with this kind of setup?

regards, Rickard

Hi Greg

Our current setup is already in production, with EventStore as a secondary store.

As for moving to the master store option with cross-region, we are actively working on this now.

I am just about to reach out to the support team about how to get the cluster setup done as outlined in the doc under the other option here


Hi Rickard

Impact on performance: we have not seen anything like that yet. The latency between AZs is minimal from what we have seen so far.

I should clarify that we are running across availability zones in the same region. As we are using ES as a secondary store at the moment, the amount of data is very small, so reseeding has not caused us any issues.

With a larger data set we may revisit the reseeding approach, as it may not be the best solution.

Hope that answers your question.


Hi Chris,

I too am trying to deploy an Event Store cluster in AWS.
I am using RHEL, and by running the script run-node.sh I managed to get a single node running. Would you be able to help me with how to run a cluster on Linux, please?
At the moment I have managed to get 3 EBS-backed nodes running in 3 AZs, each in its own ASG. Do you have a link to any resource that would help me set up a cluster on RHEL?

Many thanks.