For a small microservice we’re building, we’ve decided to use event sourcing and have chosen EventStore for the write model. We are now trying to decide what to use for the read model. A few details about the application:
- There is only 1 aggregate root object (representing the bounded context of the app’s domain).
- We expect each stream to have only a few events in it – probably less than 5, most likely less than 10, definitely less than 20.
- Each time we query to read a piece of data on the aggregate, we will always know the ID of the aggregate (and thus can deduce a stream name at read time).
Given this, it seems that we have at least 3 alternatives for how to implement a read model in this system:
1.) Stand up a separate document database instance like RavenDB in the app, and use domain event handlers to keep its state eventually consistent with the write model.
2.) Make the write-time aggregate root double as a read model by exposing public data properties on it.
3.) Make a separate read model object that is like the aggregate root but has no behavior methods, and instead exposes public data access properties. Load events into this read object (either directly from the write streams, or from separate streams generated by a user projection) and replay them to hydrate read model state.
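To make alternative #3 concrete, here is a minimal sketch of a read model hydrated by replaying events. It is only an illustration: the names `Event`, `MessageReadModel`, `hydrate`, `stream_name`, the `message-` stream prefix and the `MessageReceived` event type are all assumptions made for the example, and the events are passed in as a plain list rather than read through any real EventStore client.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    type: str
    data: dict


@dataclass(frozen=True)
class MessageReadModel:
    """Read-only view of one aggregate: data properties only, no behavior methods."""
    aggregate_id: str
    version: int
    forwarded: bool


def stream_name(aggregate_id: str) -> str:
    # Hypothetical naming convention: the stream name is derived from the ID.
    return f"message-{aggregate_id}"


def hydrate(aggregate_id: str, events: Iterable[Event]) -> MessageReadModel:
    """Replay the events of one stream, in order, to build the read state."""
    version = 0
    forwarded = False
    for event in events:
        if event.type == "MessageForwardedToExternalSystem":
            forwarded = True
        # Event types the read model does not care about still advance the version.
        version += 1
    return MessageReadModel(aggregate_id, version, forwarded)
```

With fewer than 20 events per stream, the replay loop runs a handful of iterations per query. Whether the events come from the write stream or from a projected stream does not change `hydrate`.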
It was my initial inclination to go with alternative #1. One of my team members has objected to this due to the additional overhead associated with it, and has suggested alternative #2. After initially arguing against that, I’ve put some thought into it and he does have some valid points (i.e. we will always know the aggregate root id, and there are not very many events to replay, so it should be fast). It still feels wrong though to expose public properties on the write model, since that is where the behavioral methods should be – I don’t want a read client to have access to those. (One solution to that problem could be to mark the behavior methods as internal, so that only command handlers in the same assembly can access them, which still feels wrong.)
Before I propose alternative #3, I wanted to pass it by this group to ask for opinions. Is alternative #3 a viable one? Are there any other alternatives I am not considering? Is it wrong for the read model to hydrate from the same streams that the aggregate writes to, and if so, why? (For some reason I could’ve sworn I read somewhere to be cautious about running user projections in a production system, but I can’t remember where, and now I am wondering whether I imagined it.) Keep in mind this is a rather small service-to-service application that has no user interface and is currently not very complex. Most of the application’s complexity comes from our goals to make it continuously deliverable to production, and to compensate for cascading downstream failures when other external services it depends on go down.
Thanks,
What kinds of queries do you need to do? ES doesn't support many of
the query types a document db or sql does.
For example, I need to query an aggregate root object by its ID, to find out whether or not its data has been forwarded to external system X yet (after forwarded, there will be another MessageForwardedToExternalSystem event on the stream). I understand ES is not a good query database. Assume we will always know the ID of the aggregate we want to read data about for all of our queries. Our streams are named based on ID, so we would always be able to deduce a stream name to read from at query time.
If you're just using it as key/value then sure, it's no problem.
Does that mean alternative #2 or #3? If #3, should the read model be hydrated from the same write streams, or from user projection streams?
#2 or #3 would be the most likely.
We have found #3 works well for us in cases similar to yours.
We’ve used entities projected off of an entity’s event stream as a read model. It was done as an expediency under duress though, and not as something we find desirable. We would rather have separate stores for reading that are denormalized to favor specific use cases (i.e., materialized views).
It’s a slippery slope. It invites unwanted coupling.
If you do this, you might consider using a query object so that you can replace the implementation later with something that does retrieve from another store rather than the in-memory list of entities.
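A rough sketch of what such a query object could look like, assuming a single "has this been forwarded?" query. Every name here (`ForwardedQuery`, `ReplayingForwardedQuery`, `StoreBackedForwardedQuery`, `needs_forwarding`) is made up for illustration, and a plain mapping stands in for the separate read store:

```python
from typing import Callable, Iterable, Mapping, Protocol


class ForwardedQuery(Protocol):
    """The only thing callers depend on."""

    def is_forwarded(self, aggregate_id: str) -> bool: ...


class ReplayingForwardedQuery:
    """Answers by scanning the aggregate's events on every call."""

    def __init__(self, read_events: Callable[[str], Iterable[str]]):
        # Hypothetical reader: returns the event type names for an aggregate ID.
        self._read_events = read_events

    def is_forwarded(self, aggregate_id: str) -> bool:
        return "MessageForwardedToExternalSystem" in self._read_events(aggregate_id)


class StoreBackedForwardedQuery:
    """Answers from a separate denormalized store (a mapping stands in for it)."""

    def __init__(self, store: Mapping[str, bool]):
        self._store = store

    def is_forwarded(self, aggregate_id: str) -> bool:
        return self._store.get(aggregate_id, False)


def needs_forwarding(query: ForwardedQuery, aggregate_id: str) -> bool:
    # Client code sees only the query interface, not how the answer is produced.
    return not query.is_forwarded(aggregate_id)
```

Swapping the replaying implementation for the store-backed one later would then be a wiring change, with `needs_forwarding` and other callers left untouched.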
As an aside, we keep those entities free of any behavior that doesn’t directly have to do with the entity’s attributes, i.e.: methods that return something derived from the attributes. We keep more coarse-grained business logic off of the entity itself.
I think there’s a risk in seeing Aggregate Root as something that needs to be physically represented in the implementation as a class, rather than as a logical thing that doesn’t have any direct representation in the implementation.
Our aggregate root pattern implementations are there in the collection of handler objects, command objects, and the root entity itself, along with any child entities (aggregate members). We don’t really have to worry about exposing a “special” kind of entity that doesn’t allow any of the coarse-grained logic to be executed upon it that would typically be executed as the result of a command message. That business logic, for us, doesn’t co-exist in the same scope as the entity’s attributes themselves.
-Scott
Some further thoughts.
As in the case of any other read model, the state of the read model will always be built off of the streams your aggregates/domain write to, either directly or indirectly, so don’t worry about that.
Also the data model for the reads has no reason to match the domain aggregate even in this case.
For example, you mentioned mainly needing to know if data is forwarded.
So the read model might take a category stream subscription on your aggregate type and just update an in memory list of the ids of aggregates that have not forwarded data.
Then it could provide a simple boolean query method HasUnsentData(id).
If you use a catchup subscription you can recreate it at will.
If there are too many events to replay quickly, just use a persisted checkpoint with the list of IDs and the stream position.
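As a sketch of that idea, with the subscription itself left out: something would call `handle` once per event delivered from the category stream. The class name, the `MessageReceived` event type, and the JSON checkpoint format are assumptions for illustration only.

```python
import json
from dataclasses import dataclass, field
from typing import Set


@dataclass
class UnsentDataReadModel:
    unsent_ids: Set[str] = field(default_factory=set)
    position: int = -1  # position of the last event handled in the category stream

    def handle(self, position: int, event_type: str, aggregate_id: str) -> None:
        if position <= self.position:
            return  # already handled; makes redelivery after a restart harmless
        if event_type == "MessageReceived":  # hypothetical "data arrived" event
            self.unsent_ids.add(aggregate_id)
        elif event_type == "MessageForwardedToExternalSystem":
            self.unsent_ids.discard(aggregate_id)
        self.position = position

    def has_unsent_data(self, aggregate_id: str) -> bool:
        return aggregate_id in self.unsent_ids

    def to_checkpoint(self) -> str:
        # Persist the ID list together with the stream position it corresponds to.
        return json.dumps({"position": self.position, "unsent_ids": sorted(self.unsent_ids)})

    @classmethod
    def from_checkpoint(cls, raw: str) -> "UnsentDataReadModel":
        state = json.loads(raw)
        return cls(set(state["unsent_ids"]), state["position"])
```

On startup you would either begin from an empty model and replay everything, or load the checkpoint and resume the subscription from `position`.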
-Chris
I have a concept I call ‘transient read models’ (might not be the best name, but helps me navigate the code).
They are on-demand read models projected from a single stream/aggregate root id.
I use them if I can’t have eventual consistency, or if they’re used so rarely there’s no point in persisting them in a read database.
Kept in a time bound cache, every time they’re accessed I check if there are any new events. No problem with performance up to a few thousand events.
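Roughly, in sketch form (this is an illustration and not my actual code; `TransientReadModels`, the `read_from` callback and the TTL default are assumed names and values, and the model is reduced to a single "forwarded" flag):

```python
import time
from typing import Callable, Dict, List, Tuple


class TransientReadModels:
    def __init__(
        self,
        read_from: Callable[[str, int], List[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        # Hypothetical reader: (aggregate_id, from_version) -> event type names.
        self._read_from = read_from
        self._ttl = ttl_seconds
        self._clock = clock
        # aggregate_id -> (expires_at, version, forwarded)
        self._cache: Dict[str, Tuple[float, int, bool]] = {}

    def is_forwarded(self, aggregate_id: str) -> bool:
        now = self._clock()
        entry = self._cache.get(aggregate_id)
        if entry is None or entry[0] <= now:
            version, forwarded = 0, False  # missing or expired: rebuild from the start
        else:
            _, version, forwarded = entry
        # Always ask for events newer than what we hold, even on a cache hit.
        for event_type in self._read_from(aggregate_id, version):
            forwarded = forwarded or event_type == "MessageForwardedToExternalSystem"
            version += 1
        self._cache[aggregate_id] = (now + self._ttl, version, forwarded)
        return forwarded
```

A cache hit still costs one read against the stream, but only from the cached version onward, so it usually returns nothing.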
/Peter