ES + CQRS Microservice Setup + Read Model Generation

Hi all

We’re exploring the idea of moving our architecture to an event-sourced + CQRS model. It sounds like a very interesting approach and I’m working on a POC currently. Our SaaS is in the productivity space and we have around 15 microservices that we are refactoring/rewriting.

We like the idea of having a built-in history to analyze user behavior and especially to have one source of truth to generate multiple read models.

1. EventStoreDB per Microservice
Is the common setup to have an EventStoreDB instance per microservice? This would allow for loose coupling, but I assume that for HA we would need 15 x 3 = 45 instances of EventStoreDB?

2. Multi-Tenant Stream Naming Convention
I read on the forum that a “stream per instance” is the recommended way. In our soft multi-tenant infrastructure, would this mean that our stream naming convention could look something like {backend-id}-{service-id}-{tenant-id}-{instance-id}, e.g. todoApp-tenantService-tenant1-task2345 (we could use the business domain instead of the microservice id)? And for each new todo task we would create a new stream?
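For what it’s worth, a convention like the one above could be captured in a small helper, purely as an illustration (all names here are hypothetical, not an established API):

```typescript
// Hypothetical helper for the proposed stream naming convention.
interface StreamParts {
  backendId: string;
  serviceId: string; // or the business domain instead of the microservice id
  tenantId: string;
  instanceId: string;
}

function streamName(p: StreamParts): string {
  // Join the parts with "-" to form the stream name.
  return [p.backendId, p.serviceId, p.tenantId, p.instanceId].join("-");
}

const exampleName = streamName({
  backendId: "todoApp",
  serviceId: "tenantService",
  tenantId: "tenant1",
  instanceId: "task2345",
});
// exampleName === "todoApp-tenantService-tenant1-task2345"
```

Centralizing the convention in one helper also makes it easy to change later (e.g. dropping the tenant id, as discussed below in this thread).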

3. Generating Read Model
While I’m able to create events in streams, I am struggling with creating a read model out of a stream. What is the way to go? We use TypeScript, and I think that I need to write some sort of event handler that takes the event, parses it, and writes it into the read model. Do I need to create new handlers for each read model?

4. Event Source everywhere?
It sounds great to have the benefits of event sourcing, but is it recommended to use that model in the entire architecture in favor of “regular” microservice approaches (without CQRS / ES)?

Thanks and kind regards


My opinion on your points:

  1. I would be trying to keep the running instances of EventStoreDB down to the bare minimum (for various reasons). Do you really want to be rehydrating, projecting, and doing general maintenance on multiple EventStoreDB instances? That’s before you even consider clusters.

  2. We run a multi-tenant system, and rather than having all our streams include the tenant id, we simply include a tenantId property in the event. We then use a projection to create a “tenant” stream which we can then work from. It makes writing to streams a whole lot easier (and remembering the names).

  3. I guess it depends what you need your read model to do. You can subscribe to various streams, consume the events, then populate your read model accordingly. Back on our tenant-based system, we simply create a subscription to each tenant stream, consume each event, and dump the results into a SQL database.
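A minimal sketch of that consume-and-populate handler in TypeScript, with assumed event shapes (not the poster’s actual schema) and a Map standing in for the SQL database:

```typescript
// Assumed event shapes -- illustrative only.
type TaskEvent =
  | { type: "TaskCreated"; taskId: string; tenantId: string; title: string }
  | { type: "TaskCompleted"; taskId: string };

interface TaskRow {
  taskId: string;
  tenantId: string;
  title: string;
  completed: boolean;
}

// In-memory stand-in for the SQL read model table.
const taskReadModel = new Map<string, TaskRow>();

// One handler per read model: parse the event and apply it to the model.
function applyTaskEvent(e: TaskEvent): void {
  switch (e.type) {
    case "TaskCreated":
      taskReadModel.set(e.taskId, {
        taskId: e.taskId,
        tenantId: e.tenantId,
        title: e.title,
        completed: false,
      });
      break;
    case "TaskCompleted": {
      const row = taskReadModel.get(e.taskId);
      if (row) row.completed = true;
      break;
    }
  }
}

// Feed it events as they arrive from a subscription.
applyTaskEvent({ type: "TaskCreated", taskId: "task2345", tenantId: "tenant1", title: "Write POC" });
applyTaskEvent({ type: "TaskCompleted", taskId: "task2345" });
```

And yes, in this style each read model gets its own handler (or set of handlers), each consuming the same stream of events but projecting them differently.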

Hi Steven

Many thanks for your answer!

I was considering having a centralized event store for all microservices, but one faulty upgrade to it would knock out the write side for all services. But in terms of maintenance you’re right, maintaining one cluster would be a lot less overhead. Is there that much maintenance involved with rehydrating/projecting/maintaining an event store?

Good point, we could just store the tenant id in the event. Could you elaborate on the “tenant” stream part? Does the stream contain all events from a domain? Assuming it’s about task management, would this stream contain all tasks from all tenants/users?
I thought that one would create a stream for every single task id and read from that. If we had 1 million tasks, we would have 1 million streams. Further, I thought that projections are mainly used for temporal queries; assuming we have 1 million tasks, having 1 million projections might be a bit much.

Thanks! Can I ask how you dump them into the database? Is it an upsert per instance id? For example: if the task id is 23495 => extract the task id from the event => upsert into the database? Would the service that extracts the event somehow need to track which events have already been inserted into the database?


  1. Not much maintenance in my experience, but I would not fancy scaling that up to multiple Eventstores.

  2. Exactly, this generated tenant stream would have copies (linkTo) of each event. It would not be used for rehydrating, though, and probably not for querying.
    An example we have for tenants is one big stream per tenant, with a projection subscribing to the streams. In the projection state we keep track of things like “numberOfSales”, “lastSaleDate”, etc., which is quite handy for enriching events (and for support).
    Our expensive querying (for example, “let me see all sales from 2021”) would not hit the Eventstore; that is a SQL database query.

  3. Yeah, as our events arrive into the query model we have two paths. The first is just about getting the event into the database as quickly as possible, so we do an INSERT (and ignore duplicate keys - this handles retries easily). This is for events where we are receiving high volumes.
    For other events (enriched events) we will do an upsert operation.
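The two paths above could be sketched like this in TypeScript (the row shapes are assumptions, and Maps stand in for the SQL tables):

```typescript
// In-memory stand-ins for the SQL tables; shapes are illustrative.
interface RawEvent { eventId: number; description: string }
interface TaskState { taskId: string; status: string }

const rawEvents = new Map<number, RawEvent>();   // primary key: eventId
const taskStates = new Map<string, TaskState>(); // primary key: taskId

// Path 1: blind INSERT, silently ignoring duplicate keys.
// Replaying the same event is a no-op, so retries are safe.
function insertIgnoringDuplicates(e: RawEvent): boolean {
  if (rawEvents.has(e.eventId)) return false; // duplicate key -> ignore
  rawEvents.set(e.eventId, e);
  return true;
}

// Path 2: upsert keyed on the instance id (here, the task id).
function upsertTaskState(row: TaskState): void {
  taskStates.set(row.taskId, row); // insert or overwrite
}

insertIgnoringDuplicates({ eventId: 1, description: "test event 1" });
insertIgnoringDuplicates({ eventId: 1, description: "retried delivery" }); // ignored
upsertTaskState({ taskId: "23495", status: "open" });
upsertTaskState({ taskId: "23495", status: "done" }); // overwrites
```

In a real SQL database the first path would be an INSERT that swallows the duplicate-key error, and the second an UPSERT/MERGE statement; the point is the same split between high-volume idempotent inserts and state-changing upserts.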

In terms of tracking events, it depends on what your events are.

Take these examples:

{ "tenantId": 1, "eventId": 1, "description": "test event 1" }

You can easily INSERT this into your read model, and if it is repeated (via a retry or a reset) you can silently handle the duplicate key (make the event id your key).

If you have an event that might need to make changes to something in the read model, you need a mechanism for “knowing” whether you want to persist the event.

Take this example:

{ "tenantId": 1, "eventId": 1, "saleBasketTotal": 10.00, "date": "2021-06-01 11:13:04" }

You can easily check whether this event has been persisted (using the eventId), but if you are keeping a running total, you could use the date as a way of “knowing” whether it should be processed.
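A sketch of that idea in TypeScript, assuming the event shape from the example above (the in-memory eventId set and date guard are illustrative; in practice you would persist both alongside the read model):

```typescript
// Assumed shape, matching the sale event example above.
interface SaleEvent {
  tenantId: number;
  eventId: number;
  saleBasketTotal: number;
  date: string; // "YYYY-MM-DD hh:mm:ss" compares correctly as a string
}

const saleState = {
  runningTotal: 0,
  lastSaleDate: "",
  processed: new Set<number>(), // eventIds already applied
};

function applySaleEvent(e: SaleEvent): void {
  if (saleState.processed.has(e.eventId)) return; // retry or replay: skip
  saleState.processed.add(e.eventId);
  saleState.runningTotal += e.saleBasketTotal;
  // Only move the date forward, so late-arriving events don't rewind it.
  if (e.date > saleState.lastSaleDate) saleState.lastSaleDate = e.date;
}

applySaleEvent({ tenantId: 1, eventId: 1, saleBasketTotal: 10.0, date: "2021-06-01 11:13:04" });
applySaleEvent({ tenantId: 1, eventId: 1, saleBasketTotal: 10.0, date: "2021-06-01 11:13:04" }); // skipped
```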

*As a disclaimer here: it could get a bit more complicated depending on how you want to consume your events. They might not be played in order (depending on the choice you make for subscribing to streams).

On one of our systems we created two routes. One where we can blindly always insert events, and one that needs some logic to decide whether we persist.

Hope that helps!

I’ve built a system that included the tenant id in the stream name. It allowed us to clearly distinguish tenants, so, for example, each tenant might have their own read models in a separate database/collection. By doing so, we effectively eliminated the risk of unintentionally exposing one tenant’s data to another tenant.

It would probably only be feasible for systems with a low number of tenants, or when the read models use a database where creating a new database/collection is lightweight (as it is in MongoDB, compared to MS SQL Server).

Another possible advantage of putting the tenant id prefix in stream names is the ability to partition subscriptions. I don’t think you have any need to maintain the order of events across tenants. So, by partitioning your subscriptions by tenant id, you can increase the number of parallel read model updates from one (global order) to X, where X <= tenant count.
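The partitioning idea could be sketched like this (the event shape is assumed; in practice each partition would map to its own subscription worker):

```typescript
// Split one globally ordered feed into per-tenant partitions.
// Order is preserved within each tenant; partitions can be consumed in parallel.
interface TenantEvent { tenantId: string; seq: number }

function partitionByTenant(events: TenantEvent[]): Map<string, TenantEvent[]> {
  const partitions = new Map<string, TenantEvent[]>();
  for (const e of events) {
    const queue = partitions.get(e.tenantId) ?? [];
    queue.push(e); // append keeps per-tenant order
    partitions.set(e.tenantId, queue);
  }
  return partitions;
}

const partitions = partitionByTenant([
  { tenantId: "tenant1", seq: 1 },
  { tenantId: "tenant2", seq: 2 },
  { tenantId: "tenant1", seq: 3 },
]);
// partitions.get("tenant1") holds events 1 and 3, in order
```

With tenant-prefixed stream names, EventStoreDB can do this split for you server-side (one filtered subscription per tenant), which is the advantage being described.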


I think we end up with the same result (I hope, anyway).
We have a projection which does something like this:

if (isAnOrganisationCreatedEvent(e)) {
    var name = (e.data.name || "").replace(/\s/g, "");
    s.organisations.push(name);
    linkTo(name, e);
}
That gives us a stream per tenant, which we then subscribe to.

Have we missed a trick by not adding the tenant id into the stream name when we write the original events?

Well, it depends on what the goal is. You have to use a custom projection to link tenant events to the tenant stream, which adds write and CPU load to the server. With the tenant id as part of the stream name, you’d get it in the category stream. Alternatively, you could have used a server-side filter by stream prefix in each tenant subscription, which would eliminate the need to run any projections, including the system ones. Server-side filtering cannot use the event body, though, so that won’t work with your current setup.

Thanks. Something for us to consider.