Migrating events to new streams (remodelled aggregate)

urbanhusky · June 20, 2016, 7:16am

Hi,
I’m using Event Store for our DDD/CQRS/ES architecture. Each aggregate is sourced from/to an event stream like {aggregate type}-{id}.
Since remodelling of the domain model could result in new aggregate boundaries (merging aggregates or splitting into multiple aggregates), this would break the 1:1 mapping between aggregates and streams.
What options do I have available?
Merged aggregate: I could read from multiple streams, but doing so incrementally (streaming/batching the read) is a tough problem, considering that I somehow have to retain the relative order between the streams as well. I can’t just read from $all because that would be very slow considering the billions of events we expect to have. A projection that links these streams would, as far as I understand it, only be eventually consistent…
Split aggregate: Each aggregate could read from the shared stream and only use the events that apply to it.

Afterwards, each aggregate (merged or split) would emit events to its own stream.
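
A minimal sketch of the split-aggregate read described above, assuming the split can be decided by event type alone. The `Event` and `Aggregate` classes, the `handles` set, and the event names are all hypothetical and only illustrate the filtering; they are not part of any Event Store client.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    type: str
    data: dict


@dataclass
class Aggregate:
    # Hypothetical: the event types this aggregate owns after the split.
    handles: frozenset
    applied: list = field(default_factory=list)

    def apply(self, event: Event) -> None:
        self.applied.append(event)


def rehydrate(aggregate: Aggregate, shared_stream) -> Aggregate:
    """Replay the old shared stream, skipping events owned by the other aggregate."""
    for event in shared_stream:
        if event.type in aggregate.handles:
            aggregate.apply(event)
    return aggregate


# The old, pre-split stream still contains events for both new aggregates.
old_stream = [
    Event("AccountOpened", {"id": 1}),
    Event("AddressChanged", {"id": 1, "city": "Vienna"}),
    Event("MoneyDeposited", {"id": 1, "amount": 50}),
]
account = rehydrate(Aggregate(frozenset({"AccountOpened", "MoneyDeposited"})), old_stream)
contact = rehydrate(Aggregate(frozenset({"AddressChanged"})), old_stream)
```

Both aggregates replay the same history but each only applies its own events. New events would then go to each aggregate's own stream.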

Another alternative might be to somehow migrate the events - but since an event store is append-only (for good reasons), we can’t just replace the existing events.
If we migrate the events to new streams, we obfuscate any temporal correlation that we might have had - not to mention that the read side projections will see both the old and migrated events…

What strategies could I use here?

Greg_Young1 · June 20, 2016, 8:49am

This isn't really Event Store specific however:

"I could read from multiple streams, but doing so incrementally
(streaming/batching the read) is a tough problem considering that I
somehow have to retain the relative order between the streams as
well."

Yes this works per stream quite well.

" I can't just read from $all because that will be very considering
the billions of events we expect to have."

Migrating via $all is likely what you want if you also need relative
ordering (very hard to provide without). Another option here is
by-category if you have many different types of streams.

"If we migrate the events to new streams, we obfuscate any temporal
correlation that we might have had - not to mention that the read side
projections will see both the old and migrated events..."

What you normally do is migrate to *new* streams, then remove the old
ones. This will cause projections in the future to only see the new
streams.
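
As a rough illustration of "migrate to new streams, then remove the old ones", here is a toy in-memory sketch. `ToyStore`, `migrate`, and the `route` callback are made up for this example and are not the Event Store client API; a real migration would read and write through whatever client you use.

```python
class ToyStore:
    """In-memory stand-in for an append-only store (not the Event Store client API)."""

    def __init__(self):
        self.log = []         # global log of (stream, event); list index = position
        self.deleted = set()  # names of deleted streams

    def append(self, stream, event):
        self.log.append((stream, event))

    def read_all(self):
        """Events of all live streams, in global order."""
        return [(s, e) for s, e in self.log if s not in self.deleted]

    def read_stream(self, stream):
        return [e for s, e in self.read_all() if s == stream]

    def delete_stream(self, stream):
        self.deleted.add(stream)


def migrate(store, old_streams, route):
    """Copy events from old_streams into new streams in global order, then delete the old ones."""
    # Take a snapshot first so the loop never reads its own appends.
    to_copy = [(s, e) for s, e in store.read_all() if s in old_streams]
    for stream, event in to_copy:
        store.append(route(stream, event), event)
    for stream in old_streams:
        store.delete_stream(stream)


store = ToyStore()
store.append("customer-1", "CustomerRegistered")
store.append("address-1", "AddressAdded")
store.append("customer-1", "CustomerRenamed")
# Hypothetical routing: both old aggregates merge into one new stream.
migrate(store, {"customer-1", "address-1"}, lambda stream, event: "party-1")
```

Reading in global order keeps the relative order of the migrated events among themselves. The copies still land at the end of the log, so their order relative to events in other, unmigrated streams is not preserved.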

urbanhusky · June 20, 2016, 9:07am

Is the position of an event related to the absolute order? I.e. if I have multiple streams to read from, could I read from each in parallel and reorder between the streams on the fly according to position?
Example: Stream A {0,1,3,6}, B {2,4,5,7,8}
Read A and B (streaming), 0 < 2 => use event from A, read next event from A; 1 < 2 => use event from A, read next from A; 2 < 3 => use event from B, read next from B…
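
The merge I have in mind, as a small sketch. It assumes each stream is already ordered by position and that the position is a single comparable value giving a total order across streams (which is exactly what I'm asking about). `Recorded` and `merge_by_position` are hypothetical names, not client API.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Recorded(NamedTuple):
    position: int  # assumed: one comparable global position per event
    stream: str


def merge_by_position(*streams: Iterable[Recorded]) -> Iterator[Recorded]:
    """Lazily merge streams that are each sorted by position."""
    # heapq.merge only pulls the next event from a stream when it needs it,
    # so the inputs can be batched or streaming reads.
    return heapq.merge(*streams, key=lambda event: event.position)


stream_a = (Recorded(p, "A") for p in (0, 1, 3, 6))
stream_b = (Recorded(p, "B") for p in (2, 4, 5, 7, 8))
merged = list(merge_by_position(stream_a, stream_b))
```

For the example streams this yields positions 0 through 8 in order, alternating between A and B as needed.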

Migration of events would still mess up the temporal correlation to all other events - the migrated events would happen “now”. This could ruin projections.
Especially if projections rely on data from multiple aggregates (e.g. for denormalisation of the customer name in an order I might need to load the customer - but now the customer events never happened before an order for that customer was placed).

So I guess the only option is to handle this with reading from multiple streams or interpreting a single stream in multiple ways. I’m not sold on replacing entire streams with new events. Hm.

P.S. I think that this is very related to Event Store, because other stores might need different approaches.

Greg_Young1 · June 20, 2016, 9:09am

"Migration of events would still mess up the temporal correlation to
all other events - the migrated events would happen "now". This could
ruin projections."

And if you changed historical ones you would be just as screwed (how
would you notify projections of those changes?)

urbanhusky · June 20, 2016, 9:19am

I’m sorry but I don’t follow what you mean?

Do you mean what would happen if I had to change the event type?
If I have to “change” the events, I would emit new event types (AccountOpened --> AccountOpened2 - or use some other magic so that AccountOpened is always the newest version). When reading, I could try to translate to the new version (directly during/after the deserialisation step), or I have to support the old version in my aggregate and projections.
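
A sketch of the "translate during deserialisation" idea, assuming JSON payloads. The `currency` field and its default are invented purely for illustration; only the AccountOpened / AccountOpened2 names come from my example above.

```python
import json


def upcast_account_opened(data: dict):
    """Translate an old AccountOpened payload into the AccountOpened2 shape."""
    # Hypothetical difference between versions: v2 carries a currency.
    return "AccountOpened2", {**data, "currency": data.get("currency", "EUR")}


# Maps an old event type to the function that produces the next version.
UPCASTERS = {"AccountOpened": upcast_account_opened}


def deserialize(event_type: str, raw: str):
    """Parse the payload, then upcast until the newest known version is reached."""
    data = json.loads(raw)
    while event_type in UPCASTERS:
        event_type, data = UPCASTERS[event_type](data)
    return event_type, data


old = deserialize("AccountOpened", '{"owner": "alice"}')
new = deserialize("AccountOpened2", '{"owner": "bob", "currency": "USD"}')
```

With this in place the aggregate and the projections only ever see the newest version; the stored events are left untouched.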

Greg_Young1 · June 20, 2016, 9:24am

You miss my point. If you were allowed to *update events* your
projections would still be screwed. Just as if you write new events it
will need some level of understanding. The second is the easier of the
two problems.

EG: if you updated an event, how would your projections know about this?

urbanhusky · June 20, 2016, 9:44am

Well, your point was/is made with an example that entails a whole other set of issues and assumptions.
I don’t want to be able to update events, that’s not even the underlying issue here - so your example of “assuming that you would be able to update events, you would have problems too”, while valid, is merely tangential (to me).
Is the position of an event an indicator of absolute order over all streams?

I don’t know if it is. Assuming it is, then this whole “migration” scenario could most likely be solved merely by how I read the events whenever I have to reconstitute my aggregates. The projections would have to run from the existing events - which should still work, since they are not directly bound to how the aggregate boundaries are designed. The underlying process that the events describe should be more stable than the aggregate boundaries.
If it isn’t, and there isn’t any other way to identify which event came first between two streams, then migrating to new events would be the only option left.
Considering all the issues that migrations cause, I’d prefer to solve this while reading the old events, if possible. Which is why I’m not yet responding to the migration idea.

Greg_Young1 · June 20, 2016, 9:45am

"Is the position of an event an indicator of absolute order over all streams?"

Yes, provided it is a single replica set.

Greg_Young1 · June 20, 2016, 9:46am

Also there is a lot of prior discussion (event sourcing in general)
around this topic.

urbanhusky · June 20, 2016, 9:52am

Could you perhaps give me a few pointers or keywords that would help me in my search? I only found two topics on the DDD/CQRS group, and Google is too greedy with the search results for the terms that I used.

urbanhusky · June 20, 2016, 10:06am

Is this interpretation correct then:

Within a replicated cluster of Event Store nodes, the position corresponds to the order in the $all stream.
Multiple replication sets (i.e. clusters) would have different data and different $all streams anyway, and there is no overall ordering between replica sets.

Greg_Young1 · June 20, 2016, 10:07am

Yes.

urbanhusky · June 20, 2016, 10:09am

Thank you