"Archiving" select events in a stream (linkTo and scavenging?)

Hi all,

I have a scenario where we have an aggregate stream with a mix of important events and events that don’t need to be kept forever. There are many, many such streams. Unfortunately we did not have enough foresight to split this into a “permanent” stream and a “transient” stream, and now we have many millions of events that we do not want to keep around.

I want to scavenge the “transient” events but retain the important ones. In an attempt to achieve this, I have created a projection that processes a stream and links any important events into a new stream (sketched after the example below):

originalStream-123

  0 - BoringEvent
  1 - BoringEvent
  2 - BoringEvent
  3 - ImportantEventXYZ  <=== this one needs to be kept: linkTo(originalStreamArchive-123, ImportantEventXYZ)
  4 - BoringEvent
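For reference, the projection looks roughly like this (a minimal sketch; the stream names are hard-coded to match the example, whereas in reality the archive stream name would be derived per stream, e.g. from event.streamId, so one projection can cover all the streams):

    fromStream('originalStream-123')
      .when({
        ImportantEventXYZ: function (state, event) {
          // Write a link (not a copy) into the archive stream.
          linkTo('originalStreamArchive-123', event);
        }
      });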

Once the projection is finished I have the original stream and a new stream containing only the important events:

originalStreamArchive-123

  0 - ImportantEventXYZ (this is a link to originalStream-123/3)

I then soft delete originalStream-123.
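The delete itself is just a plain client call. A minimal sketch, assuming the Node client (@eventstore/db-client) and a local, insecure node:

    const { EventStoreDBClient } = require('@eventstore/db-client');

    const client = EventStoreDBClient.connectionString('esdb://localhost:2113?tls=false');

    async function retireOriginalStream() {
      // deleteStream() performs a soft delete; tombstoneStream() would be a hard delete.
      await client.deleteStream('originalStream-123');
    }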

This seems to work well until a scavenge occurs. At that point ImportantEventXYZ is no longer resolvable via originalStreamArchive-123/0.
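You can see this by reading the archive stream with link resolution (reusing the client from the sketch above; the comment describes what I observe, not documented behaviour):

    async function checkArchive() {
      const events = client.readStream('originalStreamArchive-123', {
        resolveLinkTos: true,
      });
      for await (const resolved of events) {
        // Before the scavenge both lines print true; afterwards the link
        // record is still there, but the event it points at no longer resolves.
        console.log('link present:   ', resolved.link !== undefined);
        console.log('target resolved:', resolved.event !== undefined);
      }
    }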

Questions/Observations

  • Is my observation correct, i.e. that a linkTo will not prevent an event from being scavenged if the original stream is soft deleted?

  • If I use emit() rather than linkTo(), is there any way to retain the original timestamps?

** This is a concern because we use the $et-ImportantEvent streams to rebuild read models across many streams, and the events would then be out of order.

  • Is there an alternative approach to consider?

Many thanks!

Cameron

Cameron,

Could you use emit() rather than linkTo()?

Hi Steven,

If I use emit() rather than linkTo(), is there any way to retain the original event timestamps?

We use the $et-EventType projections fairly extensively, and I am concerned that this would cause our read models to be built with events out of order.

Thanks for the response!

Cameron

Cameron,

Not as far as I know.

If the problem is now located in your read model, would it be possible to handle this differently?

So: accept the events in the order they arrive (which might be the wrong order), but check your own datetime field on the emitted event, and if it is newer than the last datetime the read model has seen, accept it.
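Something like this on the read-model side (a minimal sketch of the idea; occurredAt, lastAppliedAt and applyToReadModel are hypothetical names, and it assumes your events already carry their own business datetime):

    function handleImportantEventXYZ(readModel, event) {
      // 'occurredAt' is your own datetime field in the event data, not the
      // system timestamp (which the emitted copy does not preserve).
      var eventTime = new Date(event.data.occurredAt);
      if (eventTime >= readModel.lastAppliedAt) {
        applyToReadModel(readModel, event); // hypothetical projection step
        readModel.lastAppliedAt = eventTime;
      }
      // An older event means the read model already reflects newer state,
      // so it is skipped rather than applied out of order.
    }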

As a side note, could the fact that you have “important” and “boring” events be part of the problem?

If all these events belong to an aggregate, surely they are all required to guarantee its state; and if not, perhaps they don’t belong in the stream in the first place?

Steven