Event Store Rollback Procedures.

Has anyone had experience with deployments where 100,000+ events are placed into the Event Store in quick succession, and where you need to make provisions in case there are issues with those events, i.e. providing ‘rollback procedures’?

Currently I am migrating from a monolithic architecture to an event-sourced architecture, where data is being added periodically and at increasingly frequent intervals, so I need an efficient process for this if possible. There are two ways I can think of dealing with it.

Idea 1:

  1. Before adding these events, shut down the Event Store and all microservices that subscribe to it, and take a backup of the Event Store and of the databases that back the microservices.

  2. Start everything back up and add the events to the Event Store.

  3. If everything goes fine, great. However, if there are issues with the hundreds of thousands of events placed into the Event Store, shut everything down and restore the backups of the Event Store and the microservices’ databases.

The plus here is that we can react quickly to issues with the events, but there is a lot of downtime, and it doesn’t seem to fit well with what microservices are meant for.
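For what it’s worth, a minimal sketch of the backup/restore step in Idea 1, assuming everything has already been shut down and each store keeps its data in a plain directory. The paths and names here are hypothetical, not the actual Event Store layout:

```python
import shutil
from pathlib import Path

# Hypothetical locations - adjust to wherever your Event Store and
# read-model databases actually keep their data.
DATA_DIRS = {
    "eventstore": Path("/var/lib/eventstore"),
    "orders-db": Path("/var/lib/orders-db"),
}
BACKUP_ROOT = Path("/backups/pre-import")

def take_backup() -> None:
    """Copy each data directory while all services are stopped."""
    for name, src in DATA_DIRS.items():
        shutil.copytree(src, BACKUP_ROOT / name)

def restore_backup() -> None:
    """Throw away the bad state and put the snapshot back."""
    for name, src in DATA_DIRS.items():
        shutil.rmtree(src)
        shutil.copytree(BACKUP_ROOT / name, src)

if __name__ == "__main__":
    take_backup()
    # ... start services, run the import, and if it goes wrong:
    # stop services again and call restore_backup()
```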

Idea 2:

  1. Just let the events be added to the Event Store, and if there are issues with them, add correction events afterwards that fix those issues.

The plus here is that there is no downtime, but you risk ‘data’ issues until the correction events have been sourced. I guess that’s a price to pay with eventual consistency?
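To make Idea 2 concrete, here is a rough sketch of what a correction (compensating) event might look like; the event names and fields are made up for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

# Hypothetical events - names and fields are illustrative only.
@dataclass(frozen=True)
class PriceImported:
    product_id: str
    price: float
    event_id: str = field(default_factory=lambda: str(uuid4()))

@dataclass(frozen=True)
class PriceCorrected:
    """Compensating event: does not delete the bad event, it supersedes it."""
    product_id: str
    corrected_price: float
    corrects_event_id: str          # points back at the faulty event
    reason: str = "bad bulk import"
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Projections simply apply both events in order, so the read model is
# only wrong for the window between the import and the correction.
def apply(state: dict, event) -> dict:
    if isinstance(event, PriceImported):
        state[event.product_id] = event.price
    elif isinstance(event, PriceCorrected):
        state[event.product_id] = event.corrected_price
    return state
```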

Any preference of approaches that other people have seen?

Kind regards,

Mark

What about pushing the events to a temporary stream, then running a corrective process manager that subscribes to that stream, makes corrections (or not), and publishes the events to the final stream?

100,000 events is not that many; you can even keep that old stream around (just to compare). Alternatively, set it to expire within an hour.

Unless I’m misunderstanding some business requirement…
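A rough sketch of that shape, using an in-memory stand-in for the store; a real implementation would use your Event Store client’s subscribe and append calls, so treat the names and the correction rule here as placeholders:

```python
from typing import Callable, Optional

# In-memory stand-in for two streams; a real process manager would use
# the Event Store client's subscription and append APIs instead.
streams: dict[str, list[dict]] = {"import-temp": [], "orders": []}

def append(stream: str, event: dict) -> None:
    streams[stream].append(event)

def correct(event: dict) -> Optional[dict]:
    """Fix up (or drop) a raw imported event before it reaches the final stream."""
    if event.get("price", 0) < 0:
        return None                      # reject obviously broken data
    return {**event, "price": round(event["price"], 2)}

def process_manager(source: str, target: str,
                    fix: Callable[[dict], Optional[dict]]) -> None:
    """Read the temporary stream, correct each event, publish to the final stream."""
    for event in streams[source]:        # a real subscription would be push-based
        corrected = fix(event)
        if corrected is not None:
            append(target, corrected)

# Bulk import lands in the temporary stream first...
append("import-temp", {"product_id": "sku-1", "price": 10.999})
append("import-temp", {"product_id": "sku-2", "price": -1})
# ...and only corrected events make it to the final stream.
process_manager("import-temp", "orders", correct)
```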

Your Idea #2 is, however, how it would be done in a real system (from what I hear from Gregg’s lectures).

By the way, one should not just “push” events to a stream - you should have some sort of business logic (an aggregate) that accepts commands (or rejects them if they are invalid) and emits events as a result. That way, the events emitted will always be “correct”.
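Something like this, in a made-up domain; the command, aggregate, and event names are only illustrative:

```python
from dataclasses import dataclass

# Hypothetical command and event for illustration.
@dataclass(frozen=True)
class ImportPrice:          # command
    product_id: str
    price: float

@dataclass(frozen=True)
class PriceImported:        # event
    product_id: str
    price: float

class ProductAggregate:
    def __init__(self, product_id: str):
        self.product_id = product_id
        self.price = None   # not yet set

    def handle(self, cmd: ImportPrice) -> list:
        # Business rules live here: invalid commands are rejected,
        # so only valid events ever reach the store.
        if cmd.price <= 0:
            raise ValueError("price must be positive")
        return [PriceImported(cmd.product_id, cmd.price)]

    def apply(self, event: PriceImported) -> None:
        self.price = event.price
```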

Sorry if I’m stating what you already know.