Migrating events to new streams (remodelled aggregate)

Hi,
I’m using Event Store for our DDD/CQRS/ES architecture. Each aggregate is sourced from/to an event stream like {aggregate type}-{id}.
Since remodelling the domain model could result in new aggregate boundaries (merging aggregates or splitting one into several), this would break the 1:1 mapping between aggregates and streams.
What options do I have available?
  • Merged aggregate: I could read from multiple streams, but doing so incrementally (streaming/batching the read) is a tough problem, considering that I somehow have to retain the relative order between the streams as well. I can’t just read from $all, because that will be very slow considering the billions of events we expect to have. A projection that links these streams would, as far as I understand it, only be eventually consistent…
  • Split aggregate: Each aggregate could read from the shared stream and only use the events that apply to it.
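A minimal sketch of the split-aggregate option in Python: each new aggregate reads the old shared stream and keeps only the events that apply to it. The `RecordedEvent` shape, the event types and the `belongs_to` predicate are hypothetical, chosen for illustration; this is not the Event Store client API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class RecordedEvent:
    stream: str
    position: int  # global position (hypothetical field name)
    event_type: str
    data: dict


def events_for(shared: Iterable[RecordedEvent],
               belongs_to: Callable[[RecordedEvent], bool]) -> Iterator[RecordedEvent]:
    """Yield only the events of the shared (pre-split) stream that apply to one new aggregate."""
    return (e for e in shared if belongs_to(e))


# Hypothetical: the old "order-42" stream is split into an Order and a Shipment aggregate.
shared = [
    RecordedEvent("order-42", 10, "OrderPlaced", {"customer": "c-1"}),
    RecordedEvent("order-42", 11, "ItemAdded", {"sku": "A"}),
    RecordedEvent("order-42", 12, "ShippingAddressChanged", {"city": "Oslo"}),
]
SHIPMENT_TYPES = {"ShippingAddressChanged"}

shipment_events = list(events_for(shared, lambda e: e.event_type in SHIPMENT_TYPES))
order_events = list(events_for(shared, lambda e: e.event_type not in SHIPMENT_TYPES))
```

Each aggregate still pays for reading the whole shared stream; the filter only decides which events it applies.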

Afterwards, each aggregate (merged or split) would emit events to its own stream.

Another alternative might be to somehow migrate the events - but since an event store is append-only (for good reasons), we can’t just replace the existing events.
If we migrate the events to new streams, we obfuscate any temporal correlation that we might have had - not to mention that the read side projections will see both the old and migrated events…

What strategies could I use here?

This isn't really Event Store-specific, however:

"I could read from multiple streams, but doing so incrementally
(streaming/batching the read) is a tough problem considering that I
somehow have to retain the relative order between the streams as
well."

Yes, this works quite well per stream.

"I can't just read from $all because that will be very slow considering
the billions of events we expect to have."

Migrating via $all is likely what you want if you also need relative
ordering (very hard to provide without it). Another option here is
reading by category (the $ce-{category} streams) if you have many
different types of streams.

"If we migrate the events to new streams, we obfuscate any temporal
correlation that we might have had - not to mention that the read side
projections will see both the old and migrated events..."

What you normally do is migrate to *new* streams, then remove the old
ones. This will cause projections in the future to only see the new
streams.
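A rough sketch of that migration against a toy in-memory, append-only log (not the Event Store client API): events from the old streams are copied to new streams in global order, and the old streams are then deleted. `InMemoryStore`, `migrate` and the `route` function that decides which new stream an event belongs to are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    stream: str
    event_type: str
    data: dict


class InMemoryStore:
    """Toy append-only store: a global log plus per-stream indexes (hypothetical)."""

    def __init__(self) -> None:
        self.log: list[Optional[Event]] = []  # index == global position; None == deleted
        self.streams: dict[str, list[int]] = defaultdict(list)

    def append(self, stream: str, event_type: str, data: dict) -> int:
        pos = len(self.log)
        self.log.append(Event(stream, event_type, data))
        self.streams[stream].append(pos)
        return pos

    def read_all(self):
        # Equivalent of reading $all: every live event in global position order.
        for pos, ev in enumerate(self.log):
            if ev is not None:
                yield pos, ev

    def delete_stream(self, stream: str) -> None:
        for pos in self.streams.pop(stream, []):
            self.log[pos] = None


def migrate(store: InMemoryStore, old_streams: set[str],
            route: Callable[[Event], str]) -> None:
    """Copy events from the old streams to new ones in global order, then delete the old streams."""
    # Snapshot first: appending while iterating the log would re-read the copies.
    to_copy = [ev for _, ev in store.read_all() if ev.stream in old_streams]
    for ev in to_copy:
        store.append(route(ev), ev.event_type, ev.data)
    for stream in old_streams:
        store.delete_stream(stream)


store = InMemoryStore()
store.append("customer-1", "CustomerRegistered", {"name": "Ada"})
store.append("address-1", "AddressChanged", {"city": "Oslo"})
store.append("customer-1", "CustomerRenamed", {"name": "Ada L."})
# Hypothetical remodel: customer-N and address-N are merged into profile-N.
migrate(store, {"customer-1", "address-1"},
        lambda ev: "profile-" + ev.stream.split("-", 1)[1])
print([(pos, ev.event_type) for pos, ev in store.read_all()])
```

The relative order of the copied events is preserved, but the copies receive new positions at the end of the log (3 to 5 here) rather than their original ones.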

Is the position of an event related to the absolute order? I.e. if I have multiple streams to read from, could I read from each in parallel and reorder between the streams on the fly according to position?
Example: Stream A {0,1,3,6}, B {2,4,5,7,8}
Read A and B (streaming), 0 < 2 => use event from A, read next event from A; 1 < 2 => use event from A, read next from A; 2 < 3 => use event from B, read next from B…
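The interleaving described above is a k-way merge. A minimal Python sketch using `heapq.merge`, assuming each stream is read in ascending position order and that positions are comparable across streams; `RecordedEvent` and `read_stream` are hypothetical stand-ins for a real paged read.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RecordedEvent:
    stream: str
    position: int  # global (log) position; hypothetical field name


def merge_by_position(*streams: Iterable[RecordedEvent]) -> Iterator[RecordedEvent]:
    """Lazily interleave several streams, each already sorted by position.

    Only the head event of each stream is held in memory, so the streams can be
    read incrementally (e.g. page by page) instead of loaded up front.
    """
    return heapq.merge(*streams, key=lambda e: e.position)


def read_stream(name: str, positions: Iterable[int]) -> Iterator[RecordedEvent]:
    # Stand-in for a streaming read of one stream (hypothetical).
    for p in positions:
        yield RecordedEvent(name, p)


merged = merge_by_position(read_stream("A", [0, 1, 3, 6]),
                           read_stream("B", [2, 4, 5, 7, 8]))
print([(e.stream, e.position) for e in merged])
```

This reproduces the A/B walk-through above: A0, A1, B2, A3, B4, B5, A6, B7, B8.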

Migration of events would still mess up the temporal correlation to all other events - the migrated events would happen “now”. This could ruin projections.
Especially if projections rely on data from multiple aggregates (e.g. to denormalise the customer name into an order I might need to load the customer - but after the migration, the customer events would appear to have happened after the order for that customer was placed).
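A toy illustration of that concern, assuming a denormalising projection that processes events in global position order and looks up the customer when an order is placed. The event types, fields and positions are made up for the example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str, dict]  # (global position, event type, data)


def order_view(log: List[Event]) -> Dict[str, Optional[str]]:
    """Denormalising projection: order id -> customer name known when the order was placed."""
    customers: Dict[str, str] = {}
    orders: Dict[str, Optional[str]] = {}
    for _, event_type, data in sorted(log, key=lambda e: e[0]):
        if event_type == "CustomerRegistered":
            customers[data["customer"]] = data["name"]
        elif event_type == "OrderPlaced":
            # Only what has been seen *so far* is available for the lookup.
            orders[data["order"]] = customers.get(data["customer"])
    return orders


original = [
    (1, "CustomerRegistered", {"customer": "c-1", "name": "Ada"}),
    (2, "OrderPlaced", {"order": "o-1", "customer": "c-1"}),
]
# After migration the customer event is re-appended at a later position (old copy deleted).
migrated = [
    (2, "OrderPlaced", {"order": "o-1", "customer": "c-1"}),
    (3, "CustomerRegistered", {"customer": "c-1", "name": "Ada"}),
]
print(order_view(original))  # {'o-1': 'Ada'}
print(order_view(migrated))  # {'o-1': None}
```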

So I guess the only option is to handle this by reading from multiple streams or by interpreting a single stream in multiple ways. I’m not sold on replacing entire streams with new events. Hm.

P.S. I think that this is very much related to Event Store, because other stores might need different approaches.

"Migration of events would still mess up the temporal correlation to
all other events - the migrated events would happen "now". This could
ruin projections."

And if you changed historical ones you would be just as screwed (how
would you notify projections of those changes?)

I’m sorry, but I don’t follow what you mean.

Do you mean what would happen if I had to change the event type?
If I have to “change” the events, I would emit new event types (AccountOpened --> AccountOpened2 - or use some other magic so that AccountOpened is always the newest version). When reading, I could try to translate to the new version (directly during/after the deserialisation step), or I have to support the old version in my aggregate and projections.
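A minimal sketch of translating old versions during deserialisation (often called upcasting), assuming JSON payloads. The field names, the `UPCASTERS` table and the `"EUR"` default for events written before the field existed are hypothetical.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical schemas: AccountOpened{owner} was superseded by AccountOpened2{owner, currency}.
Upcaster = Callable[[dict], Tuple[str, dict]]

UPCASTERS: Dict[str, Upcaster] = {
    # Assumed default for events written before the currency field existed.
    "AccountOpened": lambda d: ("AccountOpened2", {**d, "currency": "EUR"}),
}


def deserialize(event_type: str, payload: bytes) -> Tuple[str, dict]:
    """Decode a stored event and upcast it, step by step, to its newest version."""
    data = json.loads(payload)
    while event_type in UPCASTERS:
        event_type, data = UPCASTERS[event_type](data)
    return event_type, data


print(deserialize("AccountOpened", b'{"owner": "alice"}'))
# ('AccountOpened2', {'owner': 'alice', 'currency': 'EUR'})
```

Because the loop keeps applying upcasters, a later AccountOpened3 only needs one more entry mapping AccountOpened2 to it; aggregates and projections then handle the newest version only.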

You miss my point. If you were allowed to *update events*, your
projections would still be screwed. Just as when you write new events,
they will need some level of understanding. The second is the easier of
the two problems.

E.g. if you updated an event, how would your projections know about this?
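A toy illustration of the point, assuming a projection that tracks a checkpoint (the last global position it processed) and only reads forward from it. `CheckpointedProjection` and the deposit events are hypothetical.

```python
from typing import Dict, List, Tuple


class CheckpointedProjection:
    """Toy projection that remembers the last global position it processed."""

    def __init__(self) -> None:
        self.checkpoint = -1
        self.balance: Dict[str, int] = {}

    def catch_up(self, log: List[Tuple[str, int]]) -> None:
        # log[i] is the event at global position i: (account, amount deposited).
        for pos in range(self.checkpoint + 1, len(log)):
            account, amount = log[pos]
            self.balance[account] = self.balance.get(account, 0) + amount
            self.checkpoint = pos


log = [("acc-1", 100), ("acc-1", 50)]
projection = CheckpointedProjection()
projection.catch_up(log)
log[0] = ("acc-1", 10)      # hypothetical in-place "update" of an old event
projection.catch_up(log)    # nothing past the checkpoint, so the change goes unnoticed
print(projection.balance)   # {'acc-1': 150}
log.append(("acc-1", -40))  # a newly appended event, by contrast, is picked up
projection.catch_up(log)
print(projection.balance)   # {'acc-1': 110}
```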

Well, your point was/is made with an example that entails a whole other set of issues and assumptions.
I don’t want to be able to update events, that’s not even the underlying issue here - so your example of “assuming that you would be able to update events, you would have problems too”, while valid, is merely tangential (to me).
Is the position of an event an indicator of absolute order over all streams?

  • I don’t know if it is. Assuming it is, then this whole “migration” scenario could most likely be solved merely by how I read the events whenever I have to reconstitute my aggregates. The projections would have to run from the existing events - which should still work, since they are not directly bound to how the aggregate boundaries are designed. The underlying process that the events describe should be more stable than the aggregate boundaries.
  • If it isn’t, and there isn’t any other way to identify which event came first between two streams, then migrating to new events would be the only option left.
    Considering all the issues that migrations cause, I’d prefer to solve this while reading the old events, if possible. Which is why I’m not yet responding to the migration idea.

"Is the position of an event an indicator of absolute order over all streams?"

Yes, provided there is a single replica set.

Also there is a lot of prior discussion (event sourcing in general)
around this topic.

Could you perhaps give me a few pointers or keywords that would help me in my search? I only found two topics on the DDD/CQRS group, and Google is too greedy with the search results for the terms that I used.

Is this interpretation correct then:

  • Within a replicated cluster of Event Store nodes, the position corresponds to the order in the $all stream
  • Multiple replication sets (i.e. clusters) would have different data and different $all streams anyway and there is no overall ordering between replica sets.

Yes.

Thank you