Read model generation strategies

It’s probably an old topic, but I can’t find any recent discussion on this subject.

What I’d like to do is generate a read model from different streams. To build it I’ll need to make a sort of “join” between the streams, because events in one stream contain ids that refer to aggregates in the other streams.

For example let’s say I have a ranking application where each user can rank a topic and add a comment.

So I have many “Users-Guid” streams (one per user) with the following events:

  • UserCreated(userid,username)

  • UserRankedTopic(userid,topicid,rank)

  • UserCommentedTopic(userid,topicid,comment)

And many streams “Topic-Guid” with the following events:

  • TopicCreated(topicid,title)

  • TopicTitleUpdated(topicid,newTitle)
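For concreteness, the event shapes above could be sketched like this (Python dataclasses purely for illustration; the names mirror the events listed, the field types are my assumption):

```python
from dataclasses import dataclass

# Events in a "Users-Guid" stream (one stream per user)
@dataclass
class UserCreated:
    user_id: str
    username: str

@dataclass
class UserRankedTopic:
    user_id: str
    topic_id: str
    rank: int

@dataclass
class UserCommentedTopic:
    user_id: str
    topic_id: str
    comment: str

# Events in a "Topic-Guid" stream (one stream per topic)
@dataclass
class TopicCreated:
    topic_id: str
    title: str

@dataclass
class TopicTitleUpdated:
    topic_id: str
    new_title: str
```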

Now I’d like to build a read model so that a user can see all the subjects he has commented on and ranked, like:

  1. Subject “Foo” 4/5 comment “Foo is the best”

  2. Subject “Bar” 2/5 comment “Not as good as foo”

  3. …and so on

And what I want to handle properly is the scenario where the title of a topic is updated: I want that reflected in all users’ read models.

That’s why I have Ids in my events and not a copy of each topic’s title.

What I’m doing at the moment is the following:

I want to use blob storage for my read model, so I’ll have a serialized representation of that screen for each user.

When I want to display the screen for user id “Foo”, I look up the blob “foo” and deserialize it.

Here is my current implementation:

  • Subscribe to all events

  • Do a huge left fold to build an in-memory object graph of the whole state needed to generate the read model (so yes, I’ll have all users and topics in memory)

  • This is just a cache so that I don’t go back and forth with blob storage; I only send writes, which makes things faster

  • Any time I handle an event, I update the corresponding impacted blobs.
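A minimal sketch of that fold, assuming plain dicts standing in for the in-memory graph and a `blobs` dict standing in for blob storage (all names here are illustrative, not the real implementation):

```python
import json

# In-memory object graph built by folding over every event; this is the
# cache that avoids round-trips to blob storage.
users = {}   # user_id -> {"username": ..., "entries": {topic_id: {...}}}
topics = {}  # topic_id -> current title

blobs = {}   # stand-in for blob storage; only writes go here

def write_user_blob(user_id):
    # Serialize the user's screen: one entry per topic, title resolved
    # from the in-memory topics dict (data ends up denormalized).
    view = [
        {"topic_id": tid, "title": topics.get(tid, ""), **entry}
        for tid, entry in users[user_id]["entries"].items()
    ]
    blobs[user_id] = json.dumps(view)

def handle(kind, e):
    if kind == "UserCreated":
        users[e["user_id"]] = {"username": e["username"], "entries": {}}
    elif kind == "TopicCreated":
        topics[e["topic_id"]] = e["title"]
    elif kind in ("UserRankedTopic", "UserCommentedTopic"):
        entry = users[e["user_id"]]["entries"].setdefault(e["topic_id"], {})
        field = "rank" if kind == "UserRankedTopic" else "comment"
        entry[field] = e[field]
        write_user_blob(e["user_id"])
    elif kind == "TopicTitleUpdated":
        topics[e["topic_id"]] = e["new_title"]
        # The painful part: every user embedding this topic must have
        # their blob rewritten with the new title.
        for uid, u in users.items():
            if e["topic_id"] in u["entries"]:
                write_user_blob(uid)
```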

What I don’t like about this approach is that I have to deal with too much state in my fold (the real app has more streams and events).

Now I’m sure there are better ways to do it. I’ve read about the following possibilities:

  1. Enrich the events. In my case I’d need to change my user events to:

  • UserRankedTopic(userid,topicid,topicTitle,rank)

  • UserCommentedTopic(userid,topicid,topicTitle,comment)

I don’t like this much, because I’d need to know up front what my UI will be so that the events carry all the information necessary to build the read model.

Also, the additional information will need to come from somewhere, like another read model. That looks a bit spaghetti to me.

And this only works if the topic’s title doesn’t change.

  2. Don’t join the streams at all.

Generate the read models with “unresolved” ids and let the UI do the mashup.

So when a user wants to see her page I will:

  • Fetch the read model of the user

  • Extract all needed topic ids

  • Fetch the read models of all the needed topics

  • Dynamically generate a view that merges/joins the user read model’s topic ids with each topic read model
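A rough sketch of that mashup on the read side (the `store` contents, the key naming, and the `load_blob` helper are all illustrative assumptions):

```python
import json

# Stand-in for the key-value store; keys and blob shapes are assumptions.
store = {
    "user-u1": json.dumps({"entries": [
        {"topic_id": "t1", "rank": 4, "comment": "Foo is the best"}]}),
    "topic-t1": json.dumps({"title": "Foo"}),
}

def load_blob(key):
    return store[key]  # placeholder for the real blob/key-value read

def user_page(user_id):
    # 1. Fetch the user's read model, which holds unresolved topic ids
    user_view = json.loads(load_blob("user-" + user_id))
    # 2. Extract all needed topic ids
    topic_ids = {e["topic_id"] for e in user_view["entries"]}
    # 3. Fetch the read model of each needed topic
    topic_views = {tid: json.loads(load_blob("topic-" + tid))
                   for tid in topic_ids}
    # 4. Merge/join: resolve each id to the topic's *current* title
    return [
        {"title": topic_views[e["topic_id"]]["title"],
         "rank": e.get("rank"),
         "comment": e.get("comment")}
        for e in user_view["entries"]
    ]
```

With this layout a `TopicTitleUpdated` only ever rewrites the one topic blob; no user blob needs touching.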

Any thoughts?

Modeling side note: my hunch is that User and Topic are data-centric (CRUD). Your real aggregate should probably be Ranking.

I don’t understand the problem definition: “have to deal with too much state in my fold”.

Are you handling the events by touching all affected models in the same code block, as opposed to each read model having its own event handling?


Yes, the same code block generates the read model based on events coming from the user and topic streams.

But to me this is just one read model: a particular view of the system that lets users see the activity of another user, in this case all the comments and rankings made by a specific user.

I have other read models for other “views”/reports of the application, and they have their own event handling and their own internal state to build their read model.

In that case the final read model I want to build is a simple key-value store.

The key is the user id and the value is serialized JSON that will be sent down to the browser.

The JSON will contain all the necessary information for displaying all topics commented and ranked by the user.

To build the read model I start from an empty one, apply each event as it arrives to generate the next internal state, and update my key-value store (just the affected users).

The state issue I don’t like is that if, for example, I receive the event “TopicTitleUpdated”, I have to go and find all the users who commented on that topic and update their read models with the new topic title.

Because in the read model I end up with, each user has its own “copy” of each topic (the data is denormalized).

I could use a relational database for my read models to avoid this issue, but I’d prefer to stick with a key-value store.

I understand the issue now.

Since you really want the current state of the object to display in the user history report, not the state at the time the event happened, it’s best to just link to normalized data. Store the Topic in (at least partially) normalized form (it could still be in a key-value store with the id as key). Then the history report would only save the ids of the Topic and User. When the UI goes to display the report, it will have to load the linked Topic and User to get current data.
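A sketch of what that normalized layout could look like over a key-value store (`store`, the key naming, and the handler names are all my assumptions):

```python
import json

store = {"topic-t1": json.dumps({"title": "Foo"})}  # stand-in key-value store

# Write side with normalized storage: the Topic lives once under its own
# key, so a title change touches exactly one entry...
def on_topic_title_updated(topic_id, new_title):
    topic = json.loads(store["topic-" + topic_id])
    topic["title"] = new_title
    store["topic-" + topic_id] = json.dumps(topic)

# ...while the per-user history report only ever stores the Topic's id
# (plus the data the comment itself owns).
def on_user_commented_topic(user_id, topic_id, comment):
    report = json.loads(store.get("history-" + user_id, "[]"))
    report.append({"topic_id": topic_id, "comment": comment})
    store["history-" + user_id] = json.dumps(report)
```

The fan-out on title updates disappears entirely; the cost moves to display time, where the UI resolves the linked ids.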

Ah, I see: that means I’ll need to change the UI to use links instead of eagerly fetching everything. That could indeed simplify my read model generation.