Multi-cluster setup

Thomas_Weiss · February 25, 2015, 6:54am

Hi there,

I’m progressing on my reflexion about a multi-cluster setup (previous questions in the last reply of this thread).

Quick background: The idea is to use ES across multiple regions; all regions would eventually converge to the same state but I don’t want common streams and strong consistency between regions (I need high write availability so can’t afford reaching quorums with the latency between datacenters). Keeping strict event ordering between regions is not a must; despite what I initially thought, the same state can be reached with a different ordering of events if the events are properly designed.

So my current approach would be to have a cluster of ES nodes per region and some kind of replication happening so that the streams of each regions would be available in every region…

Two questions arise from that:

What would be the best way to replicate a stream to an other region? Obviously I could just subscribe and push, but is there a “stronger” way to exactly replicate a stream to another node?
Although I mentioned that x-region ordering is not vital, would there be any way to somehow merge & reorder streams from different nodes?

Thanks

Thomas

Thomas_Weiss · February 27, 2015, 4:55am

Hi,

I was wondering why my questions about HA & multi-cluster remain unanswered here, and I just realised that these are topics typically covered by the ES paid support!

So I just wanted to check if it’s really that (in which case I can consider switching to paid support) or if it’s pure “this is just a community forum with no guaranteed support and we didn’t have time to answer (yet)”, which I can fully understand as well.

Cheers

Thomas

Greg_Young1 · February 27, 2015, 9:20am

Sorry about that, my guess is people saw the email but it required
more than a sentence or two response so got put off for later and at
least for me personally forgotten

So you can do multi-data center with synchronization between them. The
easiest way is to have separate streams that get written to in each
place and asynchronous replication via a push as you describe.

I would however ask what latency do you need to hit? You might be
surprised at the latency that you end up with (in a good way as most
writes only require a single round trip).

In terms of can you get stronger consistency without latency the
answer is no. This is not an event store thing but a distributed
systems thing. In order to get consistency you need to get at minimum
n/2+1 servers to agree that a write has occurred in order to insure
consistency. This will obviously require latency of talking to n/2+1
servers and that n/2+1 servers actually are contactable. If you want
to read more about this let me know off list and I can give you a list
of papers that cover the topic from a distributed systems point of
view.

Cheers,

Greg

Thomas_Weiss · February 27, 2015, 10:19am

Hi Greg,

Thanks a lot for your reply.

High latency for an event to eventually appear in an other location is not too important; if my regions are in sync within a window of tens of seconds, that’s very fine (users are routed through Azure’s Traffic Manager so I assume that within a “session”, a user will always talk to the same datacenter and so will consistently read what he writes).

Latency on writes is more problematic because it would impact the response time from the API I expose (unless I first queue the event in some persistent service bus which defeats the purpose).

Say I deploy a single cluster over 3 regions A, B & C, each separate from the other by 250ms latency. My understanding is that, when writing to A, I could have the following situations:

if A is the master, it just has to wait for B or C to acknowledge the write and meet the quorum (250ms) -> 250ms latency
if A is not the master, it routes the write to B or C (250ms), which will wait for one of the 2 others to acknowledge the write and meet the quorum (250ms) -> 500ms latency
Is this correct?

Thomas

Jonathan_Curtis · June 1, 2015, 4:08pm

I am also interested in this topic. Does anyone have any rough numbers about latency for multi-datacenter clusters?

Greg_Young1 · June 1, 2015, 4:21pm

Depends on the latency of the network between the data centers mostly