Read Model - Denormalize without temp table

Hi,

We’re about to write a denormalizer that needs to write to something like this:

Events:

Employee stream:

EmployeeCreated {
    Id,
    FullName,
    EmployeeTypeId,
    DepartmentId
}

EmployeeType stream:

EmployeeTypeCreated {
    Id,
    Description
}

Department stream:

DepartmentCreated {
    Id,
    Description
}

We would like to have something like this:

Read Side (SQL table):

Employee
    Id
    FullName
    EmployeeTypeId
    EmployeeTypeDescription
    DepartmentId
    DepartmentDescription

We couldn’t find a way to get this without temp tables, so we think we need something like a star schema. Are we correct?

Can anyone suggest the best way to get this, with or without a star schema?

Cheers,

Nicolas

Write Department[CRUD]ed to a Department table, EmployeeType[CRUD]ed to an EmployeeType table. (You’ll need these anyway for drop-down lists in the UI, right?)

Then on EmployeeCreated do an INSERT … SELECT (or whatever equivalent your SQL dialect uses) with the select joining in the data from the other tables.
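For example, a sketch of that INSERT … SELECT in a SQL Server style dialect, assuming lookup tables Department(Id, Description) and EmployeeType(Id, Description) and parameters taken from the EmployeeCreated payload:

-- On EmployeeCreated: insert the employee row and pull the current
-- descriptions from the lookup tables maintained by the other subscriptions.
INSERT INTO Employee (Id, FullName, EmployeeTypeId, EmployeeTypeDescription,
                      DepartmentId, DepartmentDescription)
SELECT @Id, @FullName, et.Id, et.Description, d.Id, d.Description
FROM EmployeeType et
JOIN Department d ON d.Id = @DepartmentId
WHERE et.Id = @EmployeeTypeId;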

The Employee updater will also now have to listen for DepartmentUpdated and EmployeeTypeUpdated to update those descriptions, assuming your intention is to keep them in sync with the current values.
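A rough sketch of those handlers, using the column names from the example above:

-- On DepartmentUpdated: patch every denormalized employee row that references it.
UPDATE Employee SET DepartmentDescription = @Description WHERE DepartmentId = @Id;

-- On EmployeeTypeUpdated: same idea for the employee type description.
UPDATE Employee SET EmployeeTypeDescription = @Description WHERE EmployeeTypeId = @Id;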

These events are obviously designed around normalized entities. So maybe it’s not worth denormalizing this CRUD view (depending on query load), and you could just do JOINs instead.
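For comparison, a sketch of that normalized alternative: keep the three tables narrow and join at query time (column names as in the example above):

-- Read query against normalized tables (no description columns on Employee).
SELECT e.Id, e.FullName,
       e.EmployeeTypeId, et.Description AS EmployeeTypeDescription,
       e.DepartmentId,   d.Description  AS DepartmentDescription
FROM Employee e
JOIN EmployeeType et ON et.Id = e.EmployeeTypeId
JOIN Department  d   ON d.Id  = e.DepartmentId;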

Some would advocate not making this event sourced at all. But I assume you need history for another part of your system which has behavior?

Thanks Kasey,

I didn’t mention that our idea is to build microservices; I tried to simplify with a small example. There will be an Employee microservice and then a Catalog service with all the types (Department, EmployeeType). But the Catalog service will publish events to GetEventStore after saving to its DB, so it won’t be event sourced.

The Employee aggregate is in a different context than the Department or the EmployeeType.

We want to do more or less what Netflix or Spotify do: a team could build a new read model, for example for the “Top 10 songs”, with no strong (or synchronous) calls, just by subscribing to events.

But the only way I can see to get this model populated on the read side:

Employee
    Id
    FullName
    EmployeeTypeId
    EmployeeTypeDescription
    DepartmentId
    DepartmentDescription

is having something like a star schema. As you mentioned: subscribing to all Department, EmployeeType and Employee events, populating three different tables, and joining them.

Who’s the consumer of this read model?

If you want to expose this read model from your employee service, I think you’re mixing concerns, since it does not own the descriptions.

We’ve started using GraphQL for our front-end APIs, which has made it very natural to combine data from multiple sources. Read/view models are specific to the application; if your write-side context is responsible for aggregating, it becomes very coupled to its consumers.

/Peter

This read model won’t be part of the employee service.
The idea is to have read models specific to each UI component (mostly grids).

The problem with not having the data in the same place starts when you want to allow users, for example, to filter employees by department name.

And in my opinion, by taking advantage of the event-driven architecture and subscribing to events, you can build a read side that answers your queries better than pulling data from different sources at query time.

I’m not familiar with GraphQL; I’ve just had a quick look. I’ll check whether it suits us. Thanks Peter

I’ve got to give a standard disclaimer on microservices. I would very carefully consider whether microservices are the right approach for a new project. What you gain in isolation, you lose in debugging (which now amounts to reading logs), operational costs, and infrastructure code (automated deployment, gracefully handling dependent-service failures, an API gateway, etc.). Most of the large providers like Amazon started as monoliths and then fragmented into microservices as the organization grew. I would suggest that microservices are not a place to start unless you already have team expertise with them. Once you have scaling problems (great problems to have), you can carve parts of the monolith off into microservices, and by that time you have a better handle on your domain and therefore on how it should be organized as microservices.

https://gigaom.com/2015/11/06/why-monolithic-apps-are-often-better-than-microservices/

Hi Kasey, thanks for your suggestions. I think you assumed that we’re starting a project from scratch; that’s not the case. We already have a monolithic app with millions of lines of code and 12 developers working on it.

Ok, wasn’t sure so I thought I would mention it. :slight_smile:

Your denormalized view needs to marshal data from other BCs. The way to do that is to keep your own purpose-built copy of the required data in your BC, updated from the other BCs’ messages. Then consult your own copy when generating denormalized views. And as you receive updates to that data, update your copy and any denormalized views that use it. You could also exercise the option to keep it normalized and join it in for views. (All as previously discussed.)
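A minimal sketch of such a local copy, assuming a SQL Server dialect and GUID identifiers (both assumptions, not given in the thread):

-- Purpose-built copy of another BC's reference data, maintained from its events.
CREATE TABLE Department (
    Id          uniqueidentifier NOT NULL PRIMARY KEY,
    Description nvarchar(200)    NOT NULL
);

-- A DepartmentUpdated handler keeps it current, e.g.:
UPDATE Department SET Description = @Description WHERE Id = @Id;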

If the services involved are designed to be (usually) responsive to queries, you could go with the API gateway approach commonly used in microservices. The API gateway is responsible for marshaling data from all the services to fulfill the view. If a service is unavailable or slow to respond, the API gateway should answer from a cache. If the data is not in the cache either, return some default value, e.g. “loading”, and a return code that indicates the client should retry.

It seems like your division of microservices (presented so far) makes normalization a better fit than denormalization.

I thought I would update this post, since I had to delve into this decision for a new project. I have some data which is inherently normalized (i.e. Student and Course data), and another entity which links to it (i.e. Registration) that has behavior.

So to write a view for Registration:

Denormalize (burden on writer)

++ faster reads, no joins

+ logical failures (e.g. missing data) already handled by writer (faster reads)

- inserts require reading initial data from other tables

  - what if data in other tables are not populated?

- extra updates on table when related data changes

Keep normalized (burden on reader)

+ easy insert/update statements

- logical failures must be handled on read (slower reads)

-- joins (slower reads)

Since the purpose of these views is reading, I decided to denormalize and pay the extra write cost for the fastest reads.

For me, there’s not yet any danger of data from other tables not being populated. But in your scenario, you don’t have perfect ordering because everything is distributed. I would probably just default to “Loading…” for unpopulated strings.
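One way to express that default, assuming a Student table with a FullName column (my names; the thread only shows Registration.StudentName and StudentId):

-- Insert for the Registration view: tolerate a missing Student row by defaulting
-- the name until the corresponding student data arrives.
INSERT INTO Registration (Id, StudentId, StudentName)
SELECT @Id, @StudentId,
       COALESCE((SELECT FullName FROM Student WHERE Id = @StudentId), 'Loading…');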

The other thing I learned in this exercise is that denormalized views require patch updates to multiple rows. E.g. StudentUpdated -> “UPDATE Registration SET StudentName = @StudentName WHERE StudentId = @StudentId”.

My initial strategy (with a very “normalized” mindset) was to keep a Registration (or any other view) object in memory, batch updates to it (during replay), and sync to the DB periodically or when the subscription went live. But with denormalized views, there’s no way to know that I have all the affected objects in memory without asking the DB, so there are no real time savings.

Patch updates are batchable in some DBs… e.g. in SQL, keep the statement and its parameters in memory until a certain threshold, then run them all as one statement.
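As a sketch, the flushed batch could simply be the accumulated parameterised patch statements sent in one round trip (parameter names are illustrative):

-- Accumulated patch updates flushed as a single batch.
UPDATE Registration SET StudentName = @Name1 WHERE StudentId = @StudentId1;
UPDATE Registration SET StudentName = @Name2 WHERE StudentId = @StudentId2;
UPDATE Registration SET StudentName = @Name3 WHERE StudentId = @StudentId3;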

Another alternative is to set up a projection in EventStore, either to fold events down to the current state of the denormalized view or to filter them down to only the ones the view needs. I might explore this option further. It is difficult for me to pick it, because there is more flexibility (but more initial work) in building the views myself.

Just a lesson learned about denormalization from my own experience that may be useful:

  • You do not need to read from other tables when denormalizing. Just do plain inserts from the payload of your events, and do not worry about other data when writing. Embrace eventual consistency on the read side: do not be afraid of a few joins there. If you are joining with a table that does not have the data yet… well, THAT’S eventual consistency. It will be there, sooner or later :slight_smile: