Apologies if this has been covered elsewhere. I took a quick look round the group but couldn’t spot anything.
I wondered if anyone could give guidance on good patterns for handling a failure in event processing. Assuming there is already a retry strategy in place for processing an event, once that has been exhausted you seem to face a choice between:
- Stop processing any events until the problem is fixed. This could be a long pause if the fix is not straightforward.
- Carry on processing and compensate for the failure downstream (potentially rebuilding any state by replaying events).
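To make the two options concrete, here’s a minimal single-threaded projector sketch in Python. The `Event`, `Projector` and `OnFailure` names are hypothetical and not from any particular framework. It assumes each event carries a global position, and it keeps the checkpoint in memory; a real projector would persist it alongside the read model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class OnFailure(Enum):
    STOP = "stop"          # option 1: halt until the problem is fixed
    CONTINUE = "continue"  # option 2: skip the event, record it, compensate later


@dataclass
class Event:
    position: int  # global position in the event store (hypothetical)
    type: str
    data: dict


@dataclass
class Projector:
    handler: Callable[[Event], None]
    policy: OnFailure
    max_attempts: int = 3
    checkpoint: int = 0  # position of the next event to process
    skipped: List[Event] = field(default_factory=list)
    halted: bool = False

    def _apply_with_retry(self, event: Event) -> bool:
        """Try the handler up to max_attempts times; return True on success."""
        for _ in range(self.max_attempts):
            try:
                self.handler(event)
                return True
            except Exception:
                pass  # a real system would log and back off between attempts
        return False

    def run(self, events: List[Event]) -> None:
        self.halted = False
        for event in events:
            if event.position < self.checkpoint:
                continue  # already handled on a previous run
            if self._apply_with_retry(event):
                self.checkpoint = event.position + 1
            elif self.policy is OnFailure.STOP:
                # Leave the checkpoint on the failing event so the next run retries it.
                self.halted = True
                return
            else:
                # Move past the event but remember it for downstream compensation.
                self.skipped.append(event)
                self.checkpoint = event.position + 1
```

With `STOP` the checkpoint stays on the failing event, so once a fix is deployed a restart resumes from exactly there. With `CONTINUE` the `skipped` list is what you’d need to compensate for, or you’d reset the checkpoint to 0 and rebuild the read model by replaying everything.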
I realise this is very context-dependent, varying with the impact of the processing action (e.g. updating a read model vs sending emails), which is why I’m looking for patterns around this choice (or an equivalent) to help weigh up the forces at play. I know of various patterns from message processing, such as dead letter queues, so I wondered whether there is an equivalent body of work for event sourcing, even if only for a specific area such as maintaining read models.
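One possible middle ground, borrowed from dead letter queues, is to “park” a failing event rather than stop everything. Below is a rough Python sketch of that idea for a read model; the names are hypothetical. It assumes ordering only matters within a single stream (e.g. one aggregate), so once a stream has a parked event, every later event for that stream is parked behind it while other streams carry on. It also assumes the retry strategy has already been exhausted inside `handler` before it raises.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    stream_id: str  # e.g. the aggregate id (hypothetical)
    position: int
    data: dict


@dataclass
class ParkingProjector:
    handler: Callable[[Event], None]  # assumed to do its own retries before raising
    parked: Dict[str, List[Event]] = field(default_factory=dict)

    def handle(self, event: Event) -> None:
        # If this stream is already blocked, queue the event behind the failure
        # so the stream's events are never applied out of order.
        if event.stream_id in self.parked:
            self.parked[event.stream_id].append(event)
            return
        try:
            self.handler(event)
        except Exception:
            # Park the event; other streams keep flowing.
            self.parked[event.stream_id] = [event]

    def replay_parked(self, stream_id: str) -> bool:
        """Re-apply a stream's parked events in order after a fix.

        Returns True if the stream was fully cleared; on a new failure the
        remaining events stay parked.
        """
        events = self.parked.pop(stream_id, [])
        for i, event in enumerate(events):
            try:
                self.handler(event)
            except Exception:
                self.parked[stream_id] = events[i:]
                return False
        return True
```

This only really suits projections where streams are independent; a read model that joins across streams, or a side effect like sending emails, might still need the stop-the-world option.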
Thanks
Andy