Application-specific stream partitioning

Fredrik_Skeel_Lokke · December 9, 2014, 10:33am

Hi

Has anyone implemented application-specific stream partitioning? That is, partitioning a stream over several nodes if it gets too big for a single machine…

I would also be very grateful for any pointers to relevant literature…

Thank you.

Greg_Young1 · December 9, 2014, 3:35pm

We will be releasing sharding support in Q1-ish.

In terms of sharding a single stream over n nodes, it's actually pretty
easy. The big question is what ordering assurances etc. you want
between the multiple nodes. You can do a waiting client that reorders,
no reordering, clock reordering, causal reordering, etc. It depends on
what you need.

Cheers,

Greg
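
To make the ordering options above concrete, here is a minimal sketch of two of them: reading shards with no reordering, and clock reordering on the client. It is not how Event Store implements sharding. The `Event` type, the round-robin placement and the function names are all assumptions made up for illustration, and it assumes each shard preserves its own append order and that writer timestamps are trustworthy.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Event:
    timestamp: float  # writer-assigned clock value (assumption)
    seq: int          # writer sequence number, used only as a tie-breaker
    payload: str


def shard_round_robin(events: Iterable[Event], n: int) -> List[List[Event]]:
    """Spread one logical stream over n shards; each shard keeps append order."""
    shards: List[List[Event]] = [[] for _ in range(n)]
    for i, event in enumerate(events):
        shards[i % n].append(event)
    return shards


def read_no_reordering(shards: List[List[Event]]) -> Iterator[Event]:
    """Read one shard after another; the cross-shard order is lost."""
    for shard in shards:
        yield from shard


def read_clock_reordered(shards: List[List[Event]]) -> Iterator[Event]:
    """Merge shards by (timestamp, seq); requires each shard to be sorted already."""
    return heapq.merge(*shards, key=lambda e: (e.timestamp, e.seq))


if __name__ == "__main__":
    ticks = [Event(float(i), i, f"tick-{i}") for i in range(7)]
    shards = shard_round_robin(ticks, 3)
    print([e.seq for e in read_no_reordering(shards)])
    print([e.seq for e in read_clock_reordered(shards)])
```

A waiting client or causal reordering would need more than this, for example a per-writer sequence to detect gaps before releasing events, which is why the choice depends on what assurances you need.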

Fredrik_Skeel_Lokke · December 9, 2014, 3:45pm

Very happy to hear that you are going to implement sharding! Great news.

Greg_Young1 · December 9, 2014, 3:47pm

What would you be doing where a single stream won't fit on a single
node and there is no business-level partitioning for it?

Greg

Fredrik_Skeel_Lokke · December 10, 2014, 7:34am

Storing tick data from financial markets… If some stream gets about 10 updates per second, and we are able to store int.max events in a stream, that gives us about 7 years of storage at a continuous rate…

But we seem to be able to find a natural, or at least almost natural, business-level partitioning in the cases we have investigated so far. Though we might lose locality of data if the data is spread too thin…
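
The storage estimate above can be checked with a quick calculation. This is a rough sketch, assuming int.max means the signed 32-bit maximum (2,147,483,647). The trading-hours figures (6.5 hours a day, 252 days a year) are illustrative assumptions, not numbers from this thread.

```python
INT32_MAX = 2**31 - 1            # assumed meaning of "int.max"
SECONDS_PER_DAY = 24 * 60 * 60


def years_until_full(
    events_per_second: float,
    active_seconds_per_day: float = SECONDS_PER_DAY,
    active_days_per_year: float = 365.0,
    max_events: int = INT32_MAX,
) -> float:
    """Years until a stream capped at max_events fills up at a steady rate."""
    events_per_year = events_per_second * active_seconds_per_day * active_days_per_year
    return max_events / events_per_year


if __name__ == "__main__":
    # Updates arriving around the clock, every day of the year.
    print(f"continuous: {years_until_full(10):.1f} years")
    # Hypothetical trading calendar: 6.5 hours a day, 252 days a year.
    print(f"trading hours only: {years_until_full(10, 6.5 * 3600, 252):.1f} years")
```

At a continuous 10 updates per second the limit is reached in just under 7 years. If updates only arrive during market hours, the same stream lasts several times longer.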

Greg_Young1 · December 10, 2014, 9:01am

We stored stream/session

Fredrik_Skeel_Lokke · December 10, 2014, 11:31am

Is that a reply to the previous post?

Greg_Young1 · December 10, 2014, 11:40am

Yes, e.g. market session (oct-14-google).

Jan_Sandquist · December 11, 2014, 4:53pm

Ah, or specifically something like isin-mic-currency-date I guess, but that’s probably too fine-grained for sharding purposes in this case.
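
A business-level partitioning like this can be sketched as a function that maps each tick to a stream name. This is only an illustration: the `Tick` fields, the date format and the placeholder identifier values are assumptions, not an Event Store convention.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Tick:
    isin: str        # instrument identifier
    mic: str         # market identifier code
    currency: str
    time: datetime
    price: float


def stream_name(tick: Tick) -> str:
    """Build an isin-mic-currency-date stream name (date format is an assumption)."""
    return f"{tick.isin}-{tick.mic}-{tick.currency}-{tick.time:%Y%m%d}"


def partition(ticks: Iterable[Tick]) -> Dict[str, List[Tick]]:
    """Group ticks by stream name, keeping arrival order within each stream."""
    streams: Dict[str, List[Tick]] = defaultdict(list)
    for tick in ticks:
        streams[stream_name(tick)].append(tick)
    return dict(streams)


if __name__ == "__main__":
    # Placeholder identifiers, not real instrument data.
    ticks = [
        Tick("US0000000000", "XNAS", "USD", datetime(2014, 12, 10, 9, 30), 1.00),
        Tick("US0000000000", "XNAS", "USD", datetime(2014, 12, 11, 9, 30), 1.01),
    ]
    for name, items in partition(ticks).items():
        print(name, len(items))
```

Dropping the date (or using a month instead of a day) gives coarser streams, which is the trade-off between stream size and locality of data discussed above.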
