Using small streams instead of read models

This is about event sourcing, not EventStore specifically.

I watched this great talk about event sourcing. One of the points I really liked was about using small streams and folding them for each read in lieu of a read model. I really like how simple this approach is and how it eliminates some of the client complexities involved with eventual consistency.
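To make the idea concrete, here is a minimal sketch (not from the talk; the event shapes and names like `UserCreated` and `BirthdayReached` are my own assumptions) of folding one small per-aggregate stream into state on each read, instead of maintaining a read model:

```python
# Hypothetical sketch: each read loads the (small) event stream for one
# aggregate and folds it into current state on demand.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class UserState:
    name: str = ""
    age: int = 0

def apply(state: UserState, event: dict) -> UserState:
    # Fold one event into the state; the event shapes are illustrative.
    kind = event["type"]
    if kind == "UserCreated":
        return UserState(name=event["name"], age=event["age"])
    if kind == "UserRenamed":
        return replace(state, name=event["name"])
    if kind == "BirthdayReached":
        return replace(state, age=state.age + 1)
    return state

def load_user(stream: list[dict]) -> UserState:
    # Rebuild current state by folding the whole stream each time.
    state = UserState()
    for event in stream:
        state = apply(state, event)
    return state

events = [
    {"type": "UserCreated", "name": "Ada", "age": 17},
    {"type": "BirthdayReached"},
]
print(load_user(events))  # UserState(name='Ada', age=18)
```

Because the fold always reads the stream as it is right now, the reader never sees a stale read model, which is the eventual-consistency simplification mentioned above.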

However, there was one concept I struggled with: projections. His suggestion was to create streaming projections, which are basically collections of smaller streams. His example was: get all the users whose age is > 18 (adult users). He showed how you can just select the streams for these users and add them to a stream projection.

My questions for this approach:

  • You’d have to keep editing the projections, ideally in a simple way through the event handler. For example, a user who was 17 is going to turn 18, so the projection is going to have to update itself to include events from that user’s stream. How is this done without first computing the state?

  • His example is a fairly persistent projection (“adult users”), but commonly a query is dynamic (e.g., get all users where age is between X and Y). It’s unclear how streaming projections get this done; do you compute the state for every aggregate, then point the events for the aggregates that match the condition to a stream?

  • Even if you have a “persistent” projection (e.g., adult users), we need a way to “remove” streams. For example, we may have a “children” projection that needs to remove events for aggregate instances that turn 18. How do you remove streams from a projection?

That sounds like the pattern I know as Ephemeral View-Models: they are created on demand by reading from the EventStore each time.

This is a great pattern and how I always start applications. As scale and performance require, I add additional caching to the generated read models, starting with “hot” models that are simply kept in-memory and check for new messages when they are retrieved. From there it is easy to move up to something like memcached, and from there to durable caches or even intermediate databases.
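A rough sketch of that first “hot” caching step (my own illustration, not a prescribed API; `store` and `fold` are assumed names): the model keeps its state and last-seen stream position in memory, and on each retrieval folds in only the events appended since.

```python
# Hypothetical sketch of a "hot" cached read model: state stays in-memory,
# and each retrieval catches up on events newer than the cached position.
class HotReadModel:
    def __init__(self, store, fold, initial):
        self.store = store        # callable: position -> events since position
        self.fold = fold          # (state, event) -> state
        self.state = initial
        self.position = 0         # last stream position folded in

    def get(self):
        # Check for new messages on retrieval, as described above.
        new_events = self.store(self.position)
        for event in new_events:
            self.state = self.fold(self.state, event)
        self.position += len(new_events)
        return self.state

stream: list[int] = []
model = HotReadModel(lambda pos: stream[pos:], lambda s, e: s + e, 0)
stream.extend([1, 2, 3])
print(model.get())  # 6
stream.append(4)
print(model.get())  # 10
```

Swapping the in-memory `state`/`position` pair for entries in memcached or a durable store is what moves this up the caching hierarchy without changing the fold itself.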

Using this approach I will next build read model hierarchies.
Given a category of User
With user streams in the format User-[UserId]
and a UserCreated event with data [UserId],[Birthday], etc.
and standard projections running
we can very simply build a [UserBirthdays] read model with 2 columns [UserId],[Birthday]
from a subscription to either $ce-User or $et-UserCreated ($et will be more efficient, but $ce will make handling correction events easier)

A query to this read model will be able to get the UserIds of users in any date range.
These [UserId]s can then be used to build/load/update targeted [UserDetail] read models on demand.

And if the application just requires something simple like a list of names, that can be added as another column on the [UserBirthdays] read model.
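The hierarchy above can be sketched roughly like this (a hypothetical illustration, assuming the event/field names from the example; the subscription wiring to $et-UserCreated is elided and the handler is called directly):

```python
# Hypothetical [UserBirthdays] read model: a projection handler keeps a
# small table of (UserId, Birthday, Name) that date-range queries scan.
from datetime import date

birthdays: dict[str, dict] = {}  # UserId -> {"birthday": ..., "name": ...}

def on_user_created(event: dict) -> None:
    # Handler for a UserCreated event (e.g. from an $et-UserCreated
    # subscription); it only copies stable source data into the table.
    birthdays[event["userId"]] = {
        "birthday": event["birthday"],
        "name": event["name"],
    }

def users_born_between(lo: date, hi: date) -> list[str]:
    # Dynamic query: the range is applied at read time, so no dedicated
    # "adults" (or "children") projection has to be maintained.
    return [uid for uid, row in birthdays.items()
            if lo <= row["birthday"] <= hi]

on_user_created({"userId": "u1", "birthday": date(2002, 5, 1), "name": "Ada"})
on_user_created({"userId": "u2", "birthday": date(2012, 5, 1), "name": "Bob"})

# Illustrative cutoff: users born before this date are over 18 "today".
adult_cutoff = date(2007, 1, 1)
print(users_born_between(date(1900, 1, 1), adult_cutoff))  # ['u1']
```

The returned [UserId]s are then the keys used to build or load the targeted [UserDetail] read models on demand, as described above.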

One data design note here:
One change I made to the problem is moving from a read model with Age to a read model with Birthday.

This is because Birthday is source data and Age is derived data.
Age == Today - Birthday and is always changing.

Therefore, a list of Ages isn’t stable and needs to be rerun to get current information.
However, a list of Birthdays will not change and is stable to reuse.
Further, adding new users will not invalidate any existing user’s entry.