who has a sharded EventStore?
I’d have some questions, privately or publicly, about your setup, problems, etc.
I shard my event stores - I typically run 2 different stores in my lab and route writes between them with a simple “hash(streamId) % 2” scheme.
For production I’m planning on 3 sharded stores with 3-5 nodes in each.
Haven’t had any problems - it kinda just works.
I built the sharding feature into my library here: https://github.com/volak/Aggregates.NET/tree/master/src/Aggregates.NET.GetEventStore
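For illustration, the routing is just something like this (a minimal C# sketch, not the actual Aggregates.NET code; ShardPicker and the FNV-1a hash are my own stand-ins - string.GetHashCode() isn’t stable across processes or runtimes, so you want a deterministic hash for routing):

    using System.Text;

    static class ShardPicker
    {
        // Deterministic FNV-1a hash; string.GetHashCode() is not guaranteed
        // stable across processes or runtimes, so don't use it for routing.
        static uint Fnv1a(string s)
        {
            var hash = 2166136261u;
            foreach (var b in Encoding.UTF8.GetBytes(s))
            {
                hash ^= b;
                hash *= 16777619u;
            }
            return hash;
        }

        // The "hash(streamId) % N" routing described above, with N = 2 in my lab.
        public static int ShardFor(string streamId, int shardCount)
        {
            return (int)(Fnv1a(streamId) % (uint)shardCount);
        }
    }

ShardFor(streamId, 2) then picks which store’s connection gets the write.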
You probably want to use a consistent hash ring for this rather than a simple modulus - it will hurt a lot less in the long term!
Consistent hashing is definitely a better strategy (and not that far off from where you are).
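Roughly, a consistent hash ring looks like this (a rough C# sketch, not from any particular library; MD5 and the virtual-node count are arbitrary choices). The win is that adding or removing a store only remaps the streams that fall between that store’s ring points and their predecessors, instead of nearly all streams as “% N” does:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Security.Cryptography;
    using System.Text;

    class HashRing
    {
        const int VirtualNodes = 100; // points per store, to spread load evenly
        readonly SortedDictionary<uint, string> _ring = new SortedDictionary<uint, string>();

        static uint Hash(string key)
        {
            using (var md5 = MD5.Create())
            {
                return BitConverter.ToUInt32(md5.ComputeHash(Encoding.UTF8.GetBytes(key)), 0);
            }
        }

        public void AddStore(string store)
        {
            // Place each store at many points on the ring.
            for (var i = 0; i < VirtualNodes; i++)
                _ring[Hash(store + ":" + i)] = store;
        }

        public void RemoveStore(string store)
        {
            foreach (var point in _ring.Where(kv => kv.Value == store).Select(kv => kv.Key).ToList())
                _ring.Remove(point);
        }

        public string StoreFor(string streamId)
        {
            // First ring point clockwise from the stream's hash, wrapping at the end.
            var h = Hash(streamId);
            foreach (var kv in _ring)
                if (kv.Key >= h) return kv.Value;
            return _ring.First().Value;
        }
    }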
When I decided to try sharding the stores I reallllllly didn’t want to overcomplicate it by supporting adding and removing stores while running. I’ve been kind of a “there’ll be 5 shards, no more, no less; if you want to change it, replay everything into a new deployment” guy, haha.
I don’t really have any place to start working on such a feature, given the nature of stream data: if a node is added or removed, how would you detect which streams need to move, and move them to or from the new node?
I can see how the feature works for object storage with replicas and such - but what about when the object is a stream of events?
The right approach is well explained in chapter 6 of this book (Designing Data-Intensive Applications): http://shop.oreilly.com/product/0636920032175.do
For EventStore, I think you do the “data move” manually.
For example, say you have 1 source machine and want to move the data onto 2 destination machines.
I guess I’d do it like this, correct me if I’m wrong.
I’d create a projection:
(pseudocode)
foreach (stream in the source machine) {
    streamname = “yoshi”                             // e.g.
    hash_of_the_stream_name = “cab9d8020afebbce”     // some hex digest of the name
    first_character_of_the_stream_hash = “c”         // here it’s c
    if (first_character_of_the_stream_hash in [0-7]) {
        put into destination1_stream                 // e.g. via linkTo()
    } else if (first_character_of_the_stream_hash in [8-f]) {
        put into destination2_stream
    }
}
Then you read destination1_stream and write its events into the destination1 machine,
then you read destination2_stream and write its events into the destination2 machine,
and then you can decommission the source machine.
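For the copy step, a rough sketch against the .NET TCP client (EventStore.ClientAPI; StreamMover is a made-up name, and retries/idempotency are left out). Reading with resolveLinkTos: true follows each link the projection wrote back to the original event, so the destination gets the original stream names back:

    using System.Threading.Tasks;
    using EventStore.ClientAPI;

    static class StreamMover
    {
        public static async Task CopyAsync(
            IEventStoreConnection source,
            IEventStoreConnection destination,
            string linkStream)  // e.g. "destination1_stream"
        {
            var position = 0L;
            StreamEventsSlice slice;
            do
            {
                // resolveLinkTos: true resolves each link to the original event.
                slice = await source.ReadStreamEventsForwardAsync(
                    linkStream, position, 500, resolveLinkTos: true);

                foreach (var resolved in slice.Events)
                {
                    var e = resolved.Event;
                    await destination.AppendToStreamAsync(
                        e.EventStreamId,      // the original stream, e.g. "yoshi"
                        ExpectedVersion.Any,  // no concurrency check during migration
                        new EventData(e.EventId, e.EventType, e.IsJson, e.Data, e.Metadata));
                }
                position = slice.NextEventNumber;
            } while (!slice.IsEndOfStream);
        }
    }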
What a coincidence, I just bought that book last week!
But yeah, a manual process is the best I can think of too - something that reads from $all on each node and splits or merges streams into the other node(s). And I kind of figure that if I’m going to resign myself to doing this manually, I may as well spin up a whole new environment with more event stores and replay from the beginning, rather than try to simply add a store to an existing deployment.