Projection capabilities

heiner.bunjes · April 20, 2022, 8:25pm

I am heaving a hard time understanding what’s possible with continuous projections and what’s not.

The most simple problem I want to solve looks like this (all others problems are impossible to solve if this simple one isn’t possible):

I have a million users.
Each user writes his UI interactions into the stream ui-{userId}.
Let’s say there are (amongst others) the EventTypes start and stop in ui-{userId}.

No I want to create projections that result in the following behaviour:

Each time a user stops within 4 hours after starting an event is appended to leftearly-{userId}.

Is this possible with a (or some) continuously running projections without creating a million projections (which I assume is not a good thing to do) and without a projection that has a state which contains all the last starts for a million users?

And if this is possible (which I hope very much), how?

The way I see it, when a stop is received by the projection the projection must somehow get access to the last start event in the stream ui-{userId}. But maybe I just don’t see the simple solution here.

Thanks in advance!

heiner.bunjes · April 20, 2022, 8:48pm

Ahhh, foreachStream() is the magical concept, right?

yves.lorphelin · April 21, 2022, 5:40am

foreachStream()

yes indeed it is

alexey.zimarev · April 21, 2022, 6:37am

I am not sure how you can do it because the projections runtime is reactive, it is called for each event appended to the log, but there’s no scheduler there. So, the “after 4 hours” condition implementation for me sounds quite hard.

heiner.bunjes · April 21, 2022, 8:54am

Luckily the condition is not “after 4 hours” but “when user stops earlier than 4hs”
So this pseudocode should do it:

.foreachStream()
.when(

    start: save datetime
    
    stop: if (now - {saved datetime} < 4h )
          then {emit ("to early")}

heiner.bunjes · April 21, 2022, 9:00am

To help me with architectural choices::

How is foreachStrem() implemented?
Specifically: How is the state for each stream persisted?

Is it a problem to have a million users and save a few hundred bytes per stream as the state of the projection?

Thanks for your support!

steven.blair · April 21, 2022, 3:41pm

heiner,

I think if you have a million streams needed datetime track, then you would have a partition stream per user (i.e a million new streams created)
That might not be a problem itself, but if you mutate the state (event something like s.count++) it adds a new event each time.

alexey.zimarev · May 12, 2022, 1:51pm

If I may, I’d actually get to the business case. It sounds like some sort of analytics, and tools like Google Tag Manager, or Kibana can do the work way better. Are you sure you want to build those things?

heiner.bunjes · July 6, 2022, 12:57pm

Thanks for your suggestion. Actually my example was totally imaginary and I agree: For that example I should have a look at those tools.

alexey.zimarev · July 9, 2022, 7:39am

What you’re looking for is a partitioned projection with state. I provided an example here https://stackoverflow.com/questions/72830757/eventstore-sort-by-partical-stream