Who has a sharded EventStore?

I’d have some questions, privately or publicly, about your setup, problems, etc…

I shard my event stores - typically I run 2 different stores in my lab and write to each with a simple “hash(streamId) % 2” scheme.

For production I’m planning on 3 sharded stores with 3-5 nodes in each.

Haven’t had any problems - it kinda just works.

I built the sharding feature into my library here: https://github.com/volak/Aggregates.NET/tree/master/src/Aggregates.NET.GetEventStore
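The routing itself boils down to something like this - a toy C# sketch, not the actual library code (the class name and wiring are invented for illustration):

    // Toy sketch of "hash(streamId) % N"-style routing over N stores.
    // IEventStoreConnection comes from EventStore.ClientAPI; the rest
    // (names, wiring) is made up for illustration.
    using System;
    using System.Security.Cryptography;
    using System.Text;
    using EventStore.ClientAPI;

    public class ShardRouter
    {
        private readonly IEventStoreConnection[] _stores;

        public ShardRouter(params IEventStoreConnection[] stores)
        {
            _stores = stores;
        }

        // A stable hash matters here: string.GetHashCode() isn't
        // guaranteed to match across processes, so hash the bytes.
        public IEventStoreConnection StoreFor(string streamId)
        {
            using (var md5 = MD5.Create())
            {
                var digest = md5.ComputeHash(Encoding.UTF8.GetBytes(streamId));
                var bucket = BitConverter.ToUInt32(digest, 0) % (uint)_stores.Length;
                return _stores[bucket];
            }
        }
    }

Every read and write for a given stream then goes through StoreFor(streamId), e.g. router.StoreFor("customer-123").AppendToStreamAsync(...).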

You probably want to use a consistent hash ring for this rather than a simple modulus - it will hurt a lot less in the long term!

Consistent hashing is definitely a better strategy (and not that far off from where you are).
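A bare-bones ring isn’t much code either. A sketch, assuming fixed store names and MD5 as the hash (all names are invented, and the linear scan is for clarity, not speed):

    // Consistent hash ring: each store gets several virtual points on a
    // ring of uint hashes; a stream maps to the first point at or after
    // its own hash (wrapping around). Adding or removing a store only
    // remaps the streams between the affected points.
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    public class ConsistentHashRing
    {
        private readonly SortedDictionary<uint, string> _ring =
            new SortedDictionary<uint, string>();

        public void AddStore(string storeName, int virtualNodes = 100)
        {
            for (var i = 0; i < virtualNodes; i++)
                _ring[Hash(storeName + "#" + i)] = storeName;
        }

        public string StoreFor(string streamId)
        {
            var h = Hash(streamId);
            // First virtual node clockwise from the stream's hash...
            foreach (var entry in _ring)
                if (entry.Key >= h) return entry.Value;
            // ...or wrap around to the start of the ring.
            return _ring.First().Value;
        }

        private static uint Hash(string s)
        {
            using (var md5 = MD5.Create())
                return BitConverter.ToUInt32(
                    md5.ComputeHash(Encoding.UTF8.GetBytes(s)), 0);
        }
    }

The payoff over plain modulus: a new store only claims the key ranges next to its virtual points, so a small fraction of streams move instead of nearly all of them.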

When I decided to try sharding the stores I really didn’t want to overcomplicate it by supporting adding and removing stores while running. I’ve been kind of a “there’ll be 5 shards, no more, no less; if you want to change it, replay everything into a new deployment” guy, haha.

I don’t really have any place to start working on such a feature because of the format of streaming data. If a node is added or removed, how would you detect and move streams to or from the new node?

I can understand the feature for object storage with replicas and such - but when the object is a stream of events?

the right approach is well explained in chapter 6 of this book: http://shop.oreilly.com/product/0636920032175.do

for EventStore I think you do the “data move” manually

for example: you have 1 source machine and want to move the data onto 2 destination machines

I guess I’d do it like this (correct me if I’m wrong):

I’d create a projection:

(pseudocode)

foreach (stream in the source machine) {
    streamname = “yoshi”                         // example stream name
    hash_of_the_stream_name = “cab9d8020afebbce” // e.g. sha1(streamname)
    first_character_of_the_stream_hash = “c”     // here it’s “c”

    if (first_character_of_the_stream_hash is in [0-7]) {
        put into destination1_stream
    } else if (first_character_of_the_stream_hash is in [8-f]) {
        put into destination2_stream
    }
}

then you read destination1_stream and write it into the destination1 machine

then you read destination2_stream and write it into the destination2 machine

then you can decommission the source machine
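In .NET client code the copy step might look roughly like this - a sketch using EventStore.ClientAPI (4.x-style signatures); the method shape and stream names are assumptions, and the split stream is read with resolveLinkTos so the links resolve back to the original events:

    // Sketch: copy everything the projection linked into
    // destination1_stream over to the destination1 machine, keeping
    // the original stream names. Run it once per destination.
    using System.Threading.Tasks;
    using EventStore.ClientAPI;

    public static class ShardCopier
    {
        public static async Task CopyAsync(IEventStoreConnection source,
                                           IEventStoreConnection destination,
                                           string linkStream)
        {
            long next = 0;
            while (true)
            {
                // resolveLinkTos: true follows each link to the real event
                var slice = await source.ReadStreamEventsForwardAsync(
                    linkStream, next, 200, resolveLinkTos: true);

                foreach (var resolved in slice.Events)
                {
                    var e = resolved.Event;
                    await destination.AppendToStreamAsync(
                        e.EventStreamId,      // keep the original stream name
                        ExpectedVersion.Any,
                        new EventData(e.EventId, e.EventType, e.IsJson,
                                      e.Data, e.Metadata));
                }

                if (slice.IsEndOfStream) break;
                next = slice.NextEventNumber;
            }
        }
    }

ExpectedVersion.Any keeps the sketch simple; if you need idempotent retries you’d track expected versions per stream instead.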

What a coincidence, I just bought that book last week!

But yeah, a manual process is the best I can think of too - something that reads from $all on each node and splits or merges streams into the other node(s). And I kind of figure that if I’m going to resign myself to doing this manually, I may as well spin up a whole new environment with more event stores and replay from the beginning, rather than try to simply add a store to an existing deployment.