Projection with foreachStream() on large categories

I’m looking into using the foreachStream() method in a custom projection, and from what I can gather from the documentation, it should maintain an individual state object per stream. I plan to run this on a category with over 10 million streams, and I want to capture some data from the first event of every stream in state so that it can be used in the subsequent event handlers. Is my assumption correct? If so, how well does this scale, and will the projection run continuously while keeping track of all that state going forward?

Here is a basic example where I want to partition my stream of TransactionConfirmed events by UserId, but the UserId is only present in the TransactionCreated event. (I understand we could have included the UserId in the Confirmed event to make this simpler.)

fromCategory("transactions")
    .foreachStream()                // one state object per stream in the category
    .when({
        $init: function () {
            return {};              // fresh state for each stream partition
        },
        TransactionCreated: function (state, event) {
            state.userId = event.data.userId;   // capture the UserId from the first event
            return state;
        },
        TransactionConfirmed: function (state, event) {
            var streamId = "UserTransactions-" + state.userId;
            linkTo(streamId, event);            // link the event into a per-user stream
            return state;
        }
    });
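To make the per-stream partitioning concrete, here is a minimal in-memory sketch of what foreachStream() does conceptually: each source stream gets its own state object, and handlers run against the state for that stream only. This is a hypothetical simulation for illustration, not the actual EventStore runtime; `linkTo` is stubbed to record the links it would emit.

```javascript
// Hypothetical simulation of foreachStream() partitioning.
// Each source stream ID gets its own state object.

var links = [];
function linkTo(streamId, event) {
    // Stub: record the link instead of writing to EventStore.
    links.push({ stream: streamId, event: event });
}

var handlers = {
    TransactionCreated: function (state, event) {
        state.userId = event.data.userId;
        return state;
    },
    TransactionConfirmed: function (state, event) {
        linkTo("UserTransactions-" + state.userId, event);
        return state;
    }
};

// One state object per source stream, as foreachStream() would keep.
var partitions = {};
function dispatch(event) {
    var state = partitions[event.streamId] || (partitions[event.streamId] = {});
    var handler = handlers[event.eventType];
    if (handler) partitions[event.streamId] = handler(state, event);
}

// Two transactions for different users, interleaved:
[
    { streamId: "transactions-1", eventType: "TransactionCreated",   data: { userId: "u1" } },
    { streamId: "transactions-2", eventType: "TransactionCreated",   data: { userId: "u2" } },
    { streamId: "transactions-1", eventType: "TransactionConfirmed", data: {} },
    { streamId: "transactions-2", eventType: "TransactionConfirmed", data: {} }
].forEach(dispatch);
```

Because each partition's state holds the userId captured from its own Created event, the two Confirmed events are linked into "UserTransactions-u1" and "UserTransactions-u2" respectively, even though the events are interleaved.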

Roland,

We do something similar, but what I don’t know is whether there is a performance penalty with the partitioning.
Not much help, I guess, but at least you know someone else is using this technique lol

In your TransactionCreated you have userId, and judging by the TransactionConfirmed handler, userId is not available on that event.
What would you do if there were other field(s) in TransactionCreated that were important?
Those fields wouldn’t be available on the TransactionConfirmed event either, so they would require a hit to the Event Store to get the partition state.

In one of the solutions we run here, we have to include all the important fields in the events before we start delivering them (via persistent subscriptions).
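A sketch of that enrichment idea: copy the fields downstream subscribers need from the partition state onto the event body before it is delivered, so consumers never have to look anything up. The field names here are illustrative assumptions, not the actual schema from either of our solutions.

```javascript
// Hypothetical enrichment step: merge fields captured from the
// TransactionCreated event (held in partition state) into the
// event data that subscribers will receive.

function enrich(state, event) {
    // Shallow-copy the original data and add the state fields.
    var data = Object.assign({}, event.data, {
        userId: state.userId    // captured earlier from TransactionCreated
    });
    return { eventType: event.eventType, data: data };
}

var state = { userId: "u1" };  // built up from the stream's first event
var confirmed = { eventType: "TransactionConfirmed", data: { amount: 42 } };
var out = enrich(state, confirmed);
```

Here `out.data` carries both the original `amount` and the `userId`, so a persistent-subscription consumer gets everything in one event.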

Are there any subscribers to UserTransactions streams in your solution?

Hi Steven,

Thanks for your response. In our use case we are only trying to enrich subsequent events in our streams with data that we know exists in the first “Created” event of every stream. I’m not sure how this is possible without a partitioned state, given that the data is stream-specific and our dataset is very large. We know we could do this externally with a read model, or could have included this data when it was initially persisted to Event Store, but we didn’t, and we would now prefer to take this route if it will scale and is reliable.

-Roland