Projections - Dealing with idempotency

Hi,

I have a question related to ensuring that an event is only projected once.

I’ve read this blog post by Oskar Dudycz: https://event-driven.io/en/dealing_with_eventual_consistency_and_idempotency_in_mongodb_projections/ which addresses this.

If I’ve understood correctly, the trick is basically to make use of the event revision within the stream the event belongs to (e.g. Order-155). Even though your subscription subscribes to the $all stream, when receiving an event it knows the original stream id and revision. It’s then quite easy to use this to make projections idempotent - when projecting a single stream. But often - in our systems - we also want to make cross-aggregate projections.
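To make the trick concrete, here is a minimal sketch of that idea (an in-memory dict stands in for the document store, and all names are illustrative, not a real EventStoreDB or MongoDB API): each read-model document remembers the revision of the last event it applied from its stream, and an incoming event is skipped if its revision is not ahead of that.

```python
# Each read-model document stores the stream revision of the last event
# applied to it; replays and redeliveries are then no-ops.
read_model = {}  # stream_id -> {"revision": int, "state": dict}

def project(stream_id: str, revision: int, data: dict) -> bool:
    doc = read_model.get(stream_id, {"revision": -1, "state": {}})
    if revision <= doc["revision"]:
        return False  # already projected: skip (idempotent replay)
    doc["state"].update(data)
    doc["revision"] = revision
    read_model[stream_id] = doc
    return True
```

Projecting the same event twice changes nothing the second time, which is exactly the property the blog post is after.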

So my question is really whether you have any recommendations for making these projections idempotent.

Oskar mentions in his blog post: “Of course, what I showed here is the handling of the single stream. For multiple, things get more complex. We either have to maintain various revisions for each stream or store the event’s global position in a separate collection.”

I see that storing the event’s global position in a separate collection could be a viable solution. Then this maybe could (or maybe must?) be a separate projection.

Just use the built-in category stream, or a custom projection that includes the streams for the read model. The projected stream will have all the events and an atomic increment, which is what you need.

In future versions we will be rolling out the data stream for the entire log and a unified checkpoint with all of the events’ positions in its hierarchy. For now this will get you there.
-Chris
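A sketch of what Chris describes (names assumed for illustration): because a single projected stream - for example the $ce-Order category stream - gives every event one monotonically increasing sequence number, one stored number per read model is enough to deduplicate cross-aggregate events.

```python
# Cross-aggregate read model keyed by the event's position within a single
# projected stream (e.g. a category stream). The position is the
# "atomic increment" used to detect duplicates.
summary = {"position": -1, "orders_total": 0}

def project_category_event(position: int, event: dict) -> None:
    if position <= summary["position"]:
        return  # duplicate delivery: ignore
    if event["type"] == "OrderPlaced":
        summary["orders_total"] += event["amount"]
    summary["position"] = position
```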


Thanks for the reply.

Follow-up issue:

For a cross-aggregate projection using this approach, doesn’t this mean one would always have to update the revision number on all documents the projection is responsible for - even the documents that don’t need to react to / project the event? If not, the documents that don’t react to an event will (or may) end up in a state where their revision number is far behind.

Update:

This would of course also apply to single-stream projections where the projection / read model does not actually need to react to an event, but must still update the revision number.

There are a couple of options.
The simplest is to only update when you change data in the model.
In some cases there may be large gaps of “dropped” events; in those edge cases an intermittent update every few 1000 events might make sense to avoid long replays.
Mostly, though, it’s about aligning subscriptions and read models so they aren’t dealing with large numbers of irrelevant events.
I find designing the recovery model - and how to deal with missed / out-of-order events - gives the best and simplest design.
This is a rows-vs-columns problem; one isn’t inherently better than the other, it’s all about what’s correct for your situation.
-Chris
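A rough sketch of the “intermittent update” option above (the threshold and event types are illustrative assumptions): skip irrelevant events entirely, but force a position write every so often so a crash during a long run of irrelevant events doesn’t cause a long replay.

```python
CHECKPOINT_EVERY = 1000  # illustrative; tune to bound replay length
doc = {"position": -1, "state": {}}
events_since_write = 0

def handle(position: int, event: dict) -> None:
    global events_since_write
    if position <= doc["position"]:
        return  # already seen (redelivery): skip
    relevant = event["type"] in ("OrderPlaced", "OrderShipped")
    events_since_write += 1
    if relevant:
        doc["state"][event["type"]] = event.get("data")
    # Write the position on every real change, or intermittently during
    # long runs of irrelevant events, so replays after a crash stay bounded.
    if relevant or events_since_write >= CHECKPOINT_EVERY:
        doc["position"] = position
        events_since_write = 0
```

Replaying an irrelevant event is harmless (it changes nothing), so the position only has to be accurate enough to keep recovery cheap - which is the rows-vs-columns trade-off Chris mentions.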

Remember that Oskar’s post is about methods of tackling idempotence issues when it is required. From my PoV, if you need to use methods like this, your projections suffer from not being idempotent by nature, and I personally have an issue with those.

If your event has all the necessary data (including aggregated data) to project, you don’t really care how many times the projection processes the event. If you update the field “Price” with the value “10” several times, the result will still be the same.
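The contrast is easy to show in code (a toy sketch, names made up): a projection that sets the full value converges no matter how often the event is replayed, while one that applies a delta does not.

```python
state = {"price": 0}

def on_price_changed(event: dict) -> None:
    # Event carries the full new value: idempotent.
    state["price"] = event["price"]

def on_price_increased(event: dict) -> None:
    # Event only carries the delta: NOT idempotent,
    # every replay changes the result again.
    state["price"] += event["delta"]
```

Replaying PriceChanged twice still leaves the price at 10; replaying PriceIncreased twice doubles its effect.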

I see two cases where you’d want to care about not projecting events that have already been projected:

  • You don’t have all the information in the event, so you update the read model state using its previous state. For example, you say “Price.increase(10)”. Naturally, this operation is not idempotent. I would suggest avoiding this type of projection, as it’s very easy to fix by adding the necessary information to the event.
  • You want to avoid re-projecting in case of failure as an optimisation. There you can use the event commit position to skip the updates that have already been executed. In my experience, it only happens when you accidentally lose the checkpoint (that’s why I always recommend using the same database for both read models and checkpoints), or the checkpoint hasn’t been committed in time. In the second case, the gap will be very small, so I am not sure if it makes sense to optimise there.

I can suggest keeping the global log position of the last projected event in the read model as a good practice for other concerns, like stale data issues.