Sharding/multiple replica sets

There has been a lot of talk leading up to sharding/multi-tenancy with Event Store to support, say, up to 300 servers. If anyone has a particular need for this, could you please email me with some details? I am trying to prioritize when and how the work should be done. As of now, most systems can be relatively easily linearized on a single replica set, so it has been a low priority.

Do you need it for data size or throughput?

Do you need multiple replica sets for tenant isolation (e.g. multiple DBs), or do you need the ability to shard a single tenant across many servers?

What is the timeline you are looking at?

Are there any other requirements you may have?

Cheers,

Greg

We definitely have some important use cases that would require sharding. I will do some back-of-the-envelope calculations on Monday and get back to you.

We are looking at writing/storing 50,000 small events per second for about 20 years.

The events would not have to be stored in a single stream, but can be partitioned naturally into a few thousand streams.

We imagine the number of consumers being below a hundred.
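
A rough back-of-envelope on that rate (treating "a few thousand streams" as roughly 3,000, which is an assumption, not a figure from the thread):

    EVENTS_PER_SEC = 50_000
    SECONDS_PER_DAY = 86_400
    STREAM_COUNT = 3_000   # assumed stand-in for "a few thousand streams"
    YEARS = 20

    events_per_day = EVENTS_PER_SEC * SECONDS_PER_DAY    # ~4.3 billion events/day
    events_total = events_per_day * 365 * YEARS          # ~31.5 trillion events over 20 years
    events_per_stream = events_total / STREAM_COUNT      # ~10 billion events per stream

    print(f"{events_per_day:,} per day, {events_total:,} total, {events_per_stream:,.0f} per stream")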

What about elasticity: how do you want to scale out? Is a set number from the beginning good enough? Do you want to grow easily to, say, 100-500 machines without thought?

Along with this, do you need custom hash functions?

Cheers,

Greg

A set number would be fine.

Can you elaborate a bit on the custom hash function part?

E.g. do you want to handle hashing to a node yourself?

I could imagine that being very useful, yes. Then we could control the distribution of fast-growing streams.

You could do that either way; it's more along the lines of domain-specific sharding, e.g. all data for this month should have locality.
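
To make the two options concrete, here is a minimal sketch in Python of what pluggable routing could look like, assuming a fixed shard count; the function names and node count are illustrative only and not part of any Event Store API:

    import hashlib
    from datetime import datetime

    NODE_COUNT = 16  # hypothetical shard count

    def shard_by_stream(stream_id: str) -> int:
        # Generic routing: hash the stream id so a given stream always
        # lands on the same node (the kind of function you would override
        # to spread fast-growing streams yourself).
        digest = hashlib.sha1(stream_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NODE_COUNT

    def shard_by_month(event_time: datetime) -> int:
        # Domain-specific routing: every event from the same calendar month
        # maps to the same shard, so that month's data has locality.
        digest = hashlib.sha1(event_time.strftime("%Y-%m").encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NODE_COUNT

    print(shard_by_stream("orders-4711"), shard_by_month(datetime(2013, 5, 17)))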

Say an event is 300 bytes; the sum is about 1 TB per day.
Did you implement it with EventStore?
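
Working that sum through (300-byte events at a sustained 50,000 per second), it comes out closer to 1.3 TB a day and roughly 9.5 PB over 20 years, before replication or indexes:

    EVENTS_PER_SEC = 50_000
    EVENT_BYTES = 300
    SECONDS_PER_DAY = 86_400

    bytes_per_day = EVENTS_PER_SEC * EVENT_BYTES * SECONDS_PER_DAY  # ~1.3 TB/day
    bytes_total = bytes_per_day * 365 * 20                          # ~9.5 PB over 20 years

    print(f"{bytes_per_day / 1e12:.2f} TB/day, {bytes_total / 1e15:.2f} PB over 20 years")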