EventStore configuration for an app with CP requirements over a WAN

Hi!

I have an app running in two datacenters, backed by EventStore.

App traffic is routed to only one DC at a time.

However, in case of an outage of the active DC1, I want to be able to route traffic to the second DC.

And I really need consistency here (it’s a financial app), even if that means being unavailable for some period of time.

I see two possible setups:

  1. Create an EventStore cluster over the WAN. I’m not sure how it will behave, though.

  2. Configure the JVM driver somehow so that it writes to both clusters reliably.

What would you recommend?

Using 3 DCs is the best way. With 2 DCs you will always have one DC
whose failure takes out a quorum of the nodes.

I’m aware of the problem, but we have only 2 at the moment.
Even if we had 3, would you suggest creating an EventStore cluster across DCs?

As I said, my app will be working in only one DC at any time, with a manual switch to the other in case of emergency.

What would happen if I added nodes from different ES clusters to the app driver config?

What else would you suggest doing under my constraints?

And as I said: I want my EventStore data to be the same in both DCs, even if that is slower and more fragile.

No matter what CP cluster you try to run in 2 DCs, it will have
problems: downtime on a single failure, split brain, or one side having
a quorum and the other not. This is why people use an odd number of
nodes.

There is a new replication model under way that will be AP + conflict
detection. This would work in a 2-DC environment, but during partitions
conflicts are possible. These conflicts are detected and can be
managed at an application level. If you want, I can send you something
that works in a similar fashion outside of ES (e.g. most of the work is
doing the same thing but over internal protocols).

Greg

Yep, I know about the caveats of using an even number of nodes in a cluster.
What if I have 6 nodes (3 in each DC) and a quorum of 5?

It will have the same data on all nodes in normal operation, while still allowing ops work on the nodes one by one.

And it will stop completely if one DC goes down; that model is acceptable to me.

Please share your experience of how an ES cluster behaves over a WAN.

Is there any tuning we need to do, e.g. setting the quorum size or increasing the gossip timeout?

Anyway, I’m very interested in what you’re offering to send me and in how conflict management is implemented at the app level.

Thanks!

You seem to have some fundamental misunderstandings about how quorums
work. There is no reasonable way to run a quorum across two DCs as you
specify. The quorum will always be contained in one DC, and losing that
DC will leave you unable to form a quorum. The simplest
case is 3 nodes (quorum size is 2). Placing 3 nodes in 2 DCs, you will
always end up with one DC holding one node and the other holding two. If you lose
the DC with two nodes, the remaining node will not be able to form a
quorum.

You mention 3 nodes in each DC with a quorum of 5 (the quorum here would
be 4, i.e. n/2+1). With a quorum of 4, losing either DC would leave you
unable to form a quorum. If you use 7 nodes, then one side would be
able to make the quorum but the other would not (same as above with 3
nodes). There is no way around this.
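To make that arithmetic concrete, here is a minimal sketch (plain Python, illustrative only, nothing ES-specific) that enumerates the possible splits of a cluster across two DCs:

```python
# Enumerate 2-DC node placements and check whether either side can
# still form a quorum after losing the other DC entirely.
def quorum(n):
    return n // 2 + 1

for total in (3, 6, 7):
    q = quorum(total)
    print(f"{total} nodes, quorum {q}:")
    for dc1 in range(1, total):
        dc2 = total - dc1
        if dc1 >= q or dc2 >= q:
            print(f"  split {dc1}/{dc2}: one side survives a DC loss")
        else:
            print(f"  split {dc1}/{dc2}: losing either DC kills the cluster")
```

For 6 nodes split 3/3 neither side reaches the quorum of 4, and for 7 nodes one side always can while the other cannot, exactly as described above.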

Things like "can I make a cluster of 6 nodes with a quorum of 2" break
in varying cases because you don't really have a quorum (e.g. it can
lose writes). The whole point of the quorum is that if I require a majority
of nodes to ack a write and a majority to decide who is the leader, there
is guaranteed to be at least one node that is in both sets (i.e. at
least one node will know of the write).
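You can brute-force that intersection property for small clusters (again, just an illustrative sketch):

```python
# Check that any two majorities of n nodes overlap, so at least one
# node that acked a write takes part in every leader election.
from itertools import combinations

for n in range(1, 8):
    q = n // 2 + 1
    majorities = [set(c) for c in combinations(range(n), q)]
    assert all(a & b for a in majorities for b in majorities)
    print(f"n={n}, quorum={q}: every pair of quorums shares a node")
```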

Re: AP replication,
https://gist.github.com/gregoryyoung/2f8da5a4a6aa7191c4b3 was my quick
spike. It's using an embedded node and working on a given stream, but it
could be modified pretty trivially to work on $all and with a full node
instead of an embedded one.
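To give a feel for the shape of app-level conflict detection, here is a toy sketch in Python (FakeCluster and its read_stream/append methods are made-up stand-ins, not a real client API):

```python
# Toy model of AP replication with app-level conflict detection.
# FakeCluster stands in for a real EventStore connection; nothing here
# is a real client API.

class FakeCluster:
    def __init__(self):
        self.streams = {}

    def read_stream(self, stream):
        return self.streams.get(stream, [])

    def append(self, stream, data, expected_version):
        events = self.streams.setdefault(stream, [])
        assert len(events) - 1 == expected_version, "version conflict"
        events.append((len(events), data))

def replicate(source, target, stream):
    """Copy missing events source -> target; return any conflicts."""
    src, tgt = source.read_stream(stream), target.read_stream(stream)

    # Both sides wrote different events at the same version during a
    # partition: that is a conflict for the application to resolve
    # (merge, pick a winner, emit a compensating event, ...).
    conflicts = [(v, a, b) for (v, a), (_, b) in zip(src, tgt) if a != b]
    if conflicts:
        return conflicts

    # Histories agree so far: ship whatever the target is missing,
    # using expected_version as an optimistic-concurrency guard.
    for version, data in src[len(tgt):]:
        target.append(stream, data, expected_version=version - 1)
    return []

a, b = FakeCluster(), FakeCluster()
a.append("orders-1", "created", expected_version=-1)
print(replicate(a, b, "orders-1"))  # [] -- "created" copied to b
b.append("orders-1", "cancelled", expected_version=0)
a.append("orders-1", "shipped", expected_version=0)
print(replicate(a, b, "orders-1"))  # [(1, 'shipped', 'cancelled')]
```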

Greg

N/2+1 is just the minimal number of nodes to form a quorum. Nothing stops you from setting it to 5 for 6 nodes. So what about an ES cluster across a WAN? Would default settings be fine?

You likely want to raise the heartbeat timeouts and intervals to avoid
false positives and spurious elections occurring (they default to pretty
low values meant for a LAN).
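For example, in eventstore.conf (the option names are real server settings, but the values below are only an illustrative starting point for a WAN, not tested recommendations):

```yaml
# Raise heartbeat and gossip settings from their LAN-oriented defaults
# so WAN latency spikes are not mistaken for dead nodes. Values are
# illustrative; tune against your actual inter-DC latency.
ClusterSize: 3
IntTcpHeartbeatInterval: 5000   # ms, internal (replication) channel
IntTcpHeartbeatTimeout: 10000   # ms
ExtTcpHeartbeatInterval: 5000   # ms, external (client) channel
ExtTcpHeartbeatTimeout: 10000   # ms
GossipIntervalMs: 2000          # gossip less aggressively over the WAN
GossipTimeoutMs: 5000           # allow slower gossip replies
```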

"N/2+1 is just a minimal amoun of nodes to fomr a quorum. Nothing
stops you to set it to be 5 for 6 nodes."

Yes, but setting it to 6 with 6 nodes in 2 data centers would just mean
you are assured of being down if either DC is down (what's the point of 2
DCs then?!). It would actually be down with any single node down!

To be clear: the number of failures you can tolerate = cluster size - quorum
size (where quorum size is at minimum n/2+1). So 6 nodes with a quorum of 5
tolerates only one node failure: losing a whole 3-node DC stops the cluster.

Yes, I meant that :slight_smile:

Thanks for your answers, Greg.
We'll try it out and see what works for us.

TBF, if you set up a 6-node cluster with a quorum size of 6, it will be
less available than a single node :slight_smile: