Large number of streams concern

Szymon_Pobiega · December 13, 2013, 6:25am

Hey

I would like to store documents (saga state) in ES and index them on various properties (in order to look up a saga instance). I’d like to validate my approach with you, as I am a bit concerned about the number of streams it will generate.

For each Saga type I would have a projection like this one:

    fromCategory('SomeSaga')
      .when({
        SagaSaved: function (s, e) {
          // Link the saved event into an index stream named after the order number.
          linkTo('SomeSaga_ByOrderNumber-' + e.data.orderNumber, e);
          return s;
        }
      });

For each property I’d like to index, and for each active saga instance, this would create a stream whose name contains the value of the property. So let’s say I have 1M active sagas (I guess this is a very high upper bound) and 5 indexed properties: I would end up with 5M streams just for indexing sagas. Wouldn’t that kill ES performance?

Szymon

Greg_Young1 · December 13, 2013, 8:06am

These would definitely amplify your writes. That said, I would guess you would still be at an acceptable level of performance, especially if it is done in a single projection. It is worth testing/benchmarking.

Greg

Szymon_Pobiega · December 13, 2013, 9:02am

Thx. I think the number of unnecessary writes can be reduced by adding a ‘hasChanged’ flag to modified properties when saving. This way I would only emit a new index event if the indexed property has actually changed. Since saga routing is usually based on IDs that do not change frequently, there would probably be just one index event per property, emitted when a new saga instance is first stored.
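
A minimal sketch of that idea, for illustration only. It assumes the saga persister adds a flag named `orderNumberChanged` to the event data; that name and the event shape are hypothetical and not from this thread. `linkTo` is normally provided by the projection runtime, so it is replaced here by a recording stub to let the handler run outside Event Store.

```javascript
// Stub for the projection runtime's linkTo(); it only records what would be linked.
const emittedLinks = [];
function linkTo(streamId, event) {
  emittedLinks.push({ streamId: streamId, eventId: event.eventId });
}

// Handler that indexes only when the (hypothetical) change flag is set.
const flagHandlers = {
  SagaSaved: function (s, e) {
    if (e.data.orderNumberChanged) {
      linkTo('SomeSaga_ByOrderNumber-' + e.data.orderNumber, e);
    }
    return s;
  }
};

// Inside Event Store this would be registered roughly as:
//   fromCategory('SomeSaga').when(flagHandlers);

// Local simulation: one saga saved three times, order number set only once.
const savedEvents = [
  { eventId: 1, data: { orderNumber: 'A-1', orderNumberChanged: true } },
  { eventId: 2, data: { orderNumber: 'A-1', orderNumberChanged: false } },
  { eventId: 3, data: { orderNumber: 'A-1', orderNumberChanged: false } }
];
let projectionState = {};
for (const e of savedEvents) {
  projectionState = flagHandlers.SagaSaved(projectionState, e);
}
console.log(emittedLinks.length); // 1
```

With the flag in place, three saves of the same saga produce a single index link instead of three.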

Szymon

Greg_Young1 · December 13, 2013, 9:05am

You can also do that using the projection’s state instead of storing a flag on the message.
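
One way this could look, as a hedged sketch rather than a tested projection: the projection remembers the last indexed value in its state and links only when the value differs. It assumes per-stream state via `foreachStream()` and an initial state from `$init`; the `streamId` field on the simulated events and the loop that drives the handlers are stand-ins for what the runtime does, and `linkTo` is again a recording stub.

```javascript
// Stub for the projection runtime's linkTo(); it only records what would be linked.
const indexLinks = [];
function linkTo(streamId, event) {
  indexLinks.push({ streamId: streamId, eventId: event.eventId });
}

// Handlers that compare the incoming value with the last indexed value in state.
const stateHandlers = {
  $init: function () {
    return { orderNumber: null };
  },
  SagaSaved: function (s, e) {
    if (s.orderNumber !== e.data.orderNumber) {
      linkTo('SomeSaga_ByOrderNumber-' + e.data.orderNumber, e);
      s.orderNumber = e.data.orderNumber;
    }
    return s;
  }
};

// Inside Event Store this would be registered roughly as:
//   fromCategory('SomeSaga').foreachStream().when(stateHandlers);

// Local simulation of per-stream state: one state object per saga stream.
const statesByStream = {};
const incoming = [
  { eventId: 1, streamId: 'SomeSaga-1', data: { orderNumber: 'A-1' } },
  { eventId: 2, streamId: 'SomeSaga-1', data: { orderNumber: 'A-1' } },
  { eventId: 3, streamId: 'SomeSaga-2', data: { orderNumber: 'B-7' } },
  { eventId: 4, streamId: 'SomeSaga-2', data: { orderNumber: 'B-7' } }
];
for (const e of incoming) {
  const current = statesByStream[e.streamId] || stateHandlers.$init();
  statesByStream[e.streamId] = stateHandlers.SagaSaved(current, e);
}
console.log(indexLinks.length); // 2
```

Here the saga events stay unchanged; the cost moves to the projection, which has to keep one small state object per saga stream.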

Szymon_Pobiega · December 13, 2013, 9:27am

Wouldn’t storing this in state for each saga instance impact perf more than just plainly emitting events every time a saga is saved? Or, to rephrase the question, are stateful projections much more demanding than stateless ones?

Greg_Young1 · December 13, 2013, 9:30am

State does not need to be written out on every change (it’s checkpointed).

Yuri_Solodkyy · December 13, 2013, 9:32am

State is persisted only as checkpoints. When a projection is recovering after a system restart, it makes sure not to write already emitted events twice.

-yuriy