Running Multi-Node on Azure

We have your EventStore running on a VM in Azure, and we're looking for guidance on your Multi-Node product.

Is there a whitepaper or setup docs that I can read?

Thanks,

@scottcate

There will be an admin guide coming out shortly. I can give you the
basic explanation.

The first version of multi-node supports quorum-based replication/failover of a single vnode group. For an Azure deployment of a reasonably small system you would probably want to deploy three nodes (possibly a fourth, for reasons to follow).

The three nodes would be set as active within a group. Every time a write occurs, at least two of the nodes must agree that it has happened (based upon a quorum). As such, any combination of two of your servers will allow a transaction to occur in a consistent fashion. In reaching high availability there is a trade-off between availability and latency. If you put two nodes next to each other in the same availability zone, writes between them will have quite low latency, but you also run the risk of Azure losing the entire availability zone. Putting nodes into multiple availability zones will increase the latency per write.
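
To make the quorum arithmetic concrete, here is a small sketch (plain Python, purely illustrative, not anything from the product) of why three nodes tolerate the loss of one node, and why a fourth active node would not increase the number of failures you can tolerate:

```python
# Illustrative majority-quorum arithmetic, not Event Store internals.

def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can be lost while writes can still be acknowledged."""
    return cluster_size - quorum_size(cluster_size)

for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 nodes: quorum=2, tolerates 1 failure(s)
# 4 nodes: quorum=3, tolerates 1 failure(s)
# 5 nodes: quorum=3, tolerates 2 failure(s)
```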

Our general recommendation, if low latency is not your primary goal (many systems can say that 50-100ms of latency is acceptable), is to place each node into a different availability zone. With this, so long as two of the three availability zones can talk to each other, the node group will be considered "up". In terms of setting up the group, it's just a matter of giving the machines either a DNS name (e.g. mycluster.mydomain.com resolving to the list of nodes) or an explicit list of nodes. The nodes will find the group and join.
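
As a hedged sketch of what DNS-based discovery amounts to (the hostname is the one from the example above; the port and the one-address-record-per-node layout are assumptions for illustration, not the product's actual configuration):

```python
# Sketch of DNS-based discovery: one address record per node behind a single
# name, as in the mycluster.mydomain.com example above. The port (2113) is
# just a placeholder, not necessarily what your nodes listen on.
import socket

def discover_nodes(cluster_dns: str, port: int = 2113) -> list[tuple[str, int]]:
    """Resolve the cluster DNS name and return the node endpoints it lists."""
    infos = socket.getaddrinfo(cluster_dns, port, proto=socket.IPPROTO_TCP)
    return sorted({(info[4][0], port) for info in infos})

# Alternatively, skip DNS and hand each machine a static seed list:
static_seeds = [("10.0.0.4", 2113), ("10.0.0.5", 2113), ("10.0.0.6", 2113)]

if __name__ == "__main__":
    print(discover_nodes("mycluster.mydomain.com"))
```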

There is another node type that can be added to the group: a node that acts as a non-consistent warm replica. For many systems, having "perfect" data is not the major concern; they really just want a fallback node. They could, for example, set up three nodes in the same availability zone, with a warm replica in another availability zone that is always, say, 200ms behind the main group. In a disaster scenario this node can be used as a fallback. This provides lower latency (the main nodes are essentially on a shared network) but lacks consistent failover. It can also be quite useful if you want to keep a warm replica in your office while keeping the data in the cloud (a common use case we have seen).
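
To illustrate what "non-consistent warm replica" means in practice, here is a toy sketch (plain Python, nothing to do with the actual replication code): the replica tails the primary's log asynchronously, so it is always a little behind and may be missing the most recent writes if you fail over to it.

```python
# Toy model of an asynchronous warm replica that trails the main group.
import threading
import time

primary_log: list[str] = []
replica_log: list[str] = []

def replicate(lag_seconds: float = 0.2) -> None:
    """Copy anything the primary has committed, roughly lag_seconds later."""
    while True:
        if len(replica_log) < len(primary_log):
            time.sleep(lag_seconds)                      # the ~200ms it trails by
            replica_log.append(primary_log[len(replica_log)])
        else:
            time.sleep(0.01)

threading.Thread(target=replicate, daemon=True).start()

for i in range(5):
    primary_log.append(f"event-{i}")   # acknowledged by the main group only

time.sleep(0.5)
print(f"primary has {len(primary_log)} events, replica has {len(replica_log)}")
```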

Overall the setup of such systems is relatively easy; it's mostly a matter of figuring out, from an admin perspective, what you want your availability/latency/consistency profiles to look like. The client API is being set up so that it should be close to transparent whether you run on a single node or with a node group.

I had mentioned consistent hashing (many node groups) and that it's not yet done. The reasoning for this is that there are very few systems (<1% would be a good number) that actually need it. Even getting up to write speeds such as processing real-time stock market data can be done with manual partitioning (say, 4 node groups) relatively easily. Consistent hashing is for systems much bigger than this (an example might be a multi-tenant ES hosted in the cloud with 5000 tenants). This is currently scheduled to be added to the system in Q1/Q2 next year.
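
For readers unfamiliar with the term, here is a minimal sketch of the general consistent-hashing technique (generic Python, not the design planned for the product): streams hash onto a ring, each node group owns slices of the ring, and adding a group only remaps a fraction of the streams.

```python
# Generic consistent-hash ring mapping stream names to node groups.
import bisect
import hashlib

def ring_position(key: str) -> int:
    """Deterministically place a key on the hash ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, groups: list[str], vnodes_per_group: int = 64):
        # Each group gets many virtual points so load spreads evenly.
        self._ring = sorted(
            (ring_position(f"{group}#{i}"), group)
            for group in groups
            for i in range(vnodes_per_group)
        )
        self._positions = [pos for pos, _ in self._ring]

    def group_for(self, stream: str) -> str:
        """Return the node group responsible for a given stream."""
        idx = bisect.bisect(self._positions, ring_position(stream)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["group-1", "group-2", "group-3", "group-4"])
print(ring.group_for("customer-123"))   # the same stream always lands on the same group
```

Manual partitioning (the four-node-group example above) is essentially the hand-rolled version of this: you choose the stream-to-group mapping yourself instead of letting a hash ring do it.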

Hope this helps,

Greg

Is multi-node support focused more on reliability or on distributing work, i.e. full store replication across multiple nodes, versus distributing the store's data across multiple nodes and converging on request to lower individual machine loads at the expense of latency?

The version we are about to ship is focused on reliability. However, we designed in consistent hashing from the start to allow the distributed scenario, so it will be part of the product at some point.

Cheers,

James

The current version is replication groups. We have been building everything since the beginning with consistent hashing in mind though, so it shouldn't be too long until it's there. Basically the idea is …

1. Get single node stable.
2. Get single replication group stable.
3. Get consistent hashing of replication groups.

We are at stage two now :slight_smile:

Cheers,

Greg