Application-specific stream partitioning


Has anyone implemented application-specific stream partitioning? That is, partitioning a stream over several nodes if it gets too big for a single machine…

I would also be very grateful for any pointers to relevant literature…

Thank you.

We will be releasing sharding support in Q1-ish.

In terms of sharding a single stream over n nodes, it's actually pretty
easy. The big question is what ordering assurances etc. you want
between the multiple nodes. You can do a waiting client that reorders,
no reordering, clock reordering, causal reordering etc. It depends on
what you need.
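
To make the ordering question concrete, here is a minimal sketch of one of those options: events are spread over n nodes by a hash of a routing key, and a reading client merges the shards back together by timestamp (clock reordering). Everything in it is an assumption for illustration: the `Event` type, the `key` field, the CRC32 hash and the in-memory lists standing in for nodes are hypothetical and not any product's API. It also assumes each shard is already ordered by timestamp and that clocks are comparable across nodes.

```python
import heapq
import zlib
from typing import Iterator, List, NamedTuple


class Event(NamedTuple):
    timestamp: float  # assumed comparable across nodes (synchronized clocks)
    key: str          # hypothetical routing key, e.g. an instrument id
    payload: str


def shard_for(key: str, n_nodes: int) -> int:
    """Pick a node deterministically from the routing key."""
    return zlib.crc32(key.encode("utf-8")) % n_nodes


def partition(events: List[Event], n_nodes: int) -> List[List[Event]]:
    """Spread one logical stream over n_nodes in-memory 'nodes'."""
    shards: List[List[Event]] = [[] for _ in range(n_nodes)]
    for event in events:
        shards[shard_for(event.key, n_nodes)].append(event)
    return shards


def merge_by_clock(shards: List[List[Event]]) -> Iterator[Event]:
    """Client-side reordering: lazily merge shards by timestamp.

    Requires every shard to be sorted by timestamp already.
    """
    return heapq.merge(*shards, key=lambda e: e.timestamp)


if __name__ == "__main__":
    stream = [Event(float(i), f"instrument-{i % 5}", f"tick {i}") for i in range(10)]
    for event in merge_by_clock(partition(stream, 3)):
        print(event.timestamp, event.key, event.payload)
```

With no reordering the client would simply read each shard independently; causal reordering would need extra metadata (for example vector clocks) that this sketch does not attempt.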



Very happy to hear that you are going to implement sharding! Great news :slight_smile:

What are you doing where a single stream won't fit on a single
node and there is no business-level partitioning for it?


Storing tick data from financial markets… If for some stream we have about 10 updates per second, and we are able to store int.max events in a stream, that gives us about 10 years of storage…
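
As a rough check of that estimate, the sketch below assumes that int.max means the signed 32-bit maximum (2,147,483,647) and that updates arrive around the clock at a steady 10 per second. Under those assumptions the stream fills up in a little under 7 years, the same order of magnitude as the figure above; a feed that only ticks during market hours would last correspondingly longer.

```python
INT_MAX = 2**31 - 1                    # assumed meaning of "int.max"
UPDATES_PER_SECOND = 10                # rate quoted in the post
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # continuous feed, no market closures


def years_until_full(max_events: int, updates_per_second: float) -> float:
    """Years before a stream capped at max_events fills at a steady rate."""
    return max_events / updates_per_second / SECONDS_PER_YEAR


years = years_until_full(INT_MAX, UPDATES_PER_SECOND)
print(f"{years:.1f} years")  # prints "6.8 years"
```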

But we seem to be able to find a natural, or at least almost natural, business-level partitioning in the cases we have investigated so far. Though we might lose locality of data if the data is spread too thin…

We stored one stream per session.

Is that a reply to the previous post? :slight_smile:

Yes, e.g. a market session (oct-14-google).

Ah, or specifically something like isin-mic-currency-date I guess, but that’s probably too fine-grained for sharding purposes in this case.
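
For comparison, the two granularities could be expressed as stream-naming helpers along these lines. This is only a sketch: the function names, the field order and the date format are assumptions, and the ISIN used in the example is a placeholder, not a real instrument.

```python
from datetime import date


def session_stream(symbol: str, day: date) -> str:
    """Coarse key: one stream per instrument per market session."""
    return f"{symbol}-{day.strftime('%Y%m%d')}"


def fine_grained_stream(isin: str, mic: str, currency: str, day: date) -> str:
    """Fine key: one stream per ISIN, venue (MIC), currency and day."""
    return f"{isin}-{mic}-{currency}-{day.strftime('%Y%m%d')}"


if __name__ == "__main__":
    day = date(2014, 10, 14)
    print(session_stream("google", day))
    # "US0000000000" is a placeholder ISIN for illustration only.
    print(fine_grained_stream("US0000000000", "XNAS", "USD", day))
```

The finer the key, the more streams there are and the fewer events each one holds, which is the locality trade-off mentioned earlier in the thread.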
