Other Idempotency questions

Bernard_Odoy · January 28, 2015, 10:54pm

In order to guard against processing the same input event twice, we are producing deterministic event ids. We were hoping to take advantage of the Event Store and its ability to do idempotent writes (as documented here), meaning that if an event with the same event id is produced, it will not be written to the stream a second time.
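One common way to get deterministic event ids is a name-based UUID derived from the input event. The sketch below is only an illustration of that idea, not the poster's actual code; the namespace value, the `deterministic_event_id` helper, and the `input-stream-b/1` naming are all made up for the example.

```python
import uuid

# Hypothetical, application-specific namespace. Any fixed UUID works,
# as long as it never changes between runs.
OUTPUT_EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/output-events")


def deterministic_event_id(input_event_id: str, output_index: int = 0) -> uuid.UUID:
    """Derive the same output event id every time the same input is processed."""
    # output_index tells apart several output events produced by one input.
    return uuid.uuid5(OUTPUT_EVENT_NAMESPACE, f"{input_event_id}:{output_index}")


first = deterministic_event_id("input-stream-b/1")
again = deterministic_event_id("input-stream-b/1")  # reprocessing the same input
other = deterministic_event_id("input-stream-b/2")  # a different input
```

Here `first` and `again` are equal, while `other` differs, so a reprocessed input always yields the same event id.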

In order to attain this behavior, we have been using ExpectedVersion.Any. This worked well until recently, when we started seeing some odd behavior where duplicate events are being written. Reading the documentation more closely, it does say:

"Idempotence is not guaranteed if ExpectedVersion.Any is used. The chance of a duplicate event being written is small, but it does exist."

Very well, but using the other values for expected version does not seem to provide idempotent writes either. If an input event comes in a second time, we produce resulting events with consistent event ids. But when writing to the stream the second time, we set the expected version equal to the current version (obtained by reading the stream) and the events are written again.

I have several questions:

1. Can I use the idempotent features in Event Store to accomplish this?
2. If not, what do others do in this situation? Does my application code need to check for duplicate event ids and skip the write?
3. What is the purpose of the idempotence capability within Event Store? Based on the documentation, I am not sure how I would take advantage of it.

Thanks much,

Bernie

Greg_Young1 · January 28, 2015, 11:03pm

The expected version is the one at the time of the write, not the one now.

Bernard_Odoy · January 29, 2015, 8:12pm

Yes, understood, thank you. I was trying to prevent duplicate processing of an input event. What I had misunderstood is that, to guarantee idempotence, both the event id and the expected version must be the same.

What I was doing:

1. Output Stream A exists with a single event.
2. Event 1 from Input Stream B is processed. Event with id 2 is produced and written to Output Stream A using expected version 0. (All is well.)
3. Event 1 from Input Stream B is received again and reprocessed. Event with id 2 will be produced; however, the expected version would be 1. (This will write a second event with event id 2, which is not desired.)

At step 3, since the client produced the same event id, I was hoping that the event would not get written. It seems you do get this behavior using ExpectedVersion.Any; however, there seems to be a limit based on the number of messages processed. At some point, say after 55,000 messages, the repeated event will get written again.
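The thread does not say how duplicates are detected under ExpectedVersion.Any. One way a limit tied to the number of messages could arise is a bounded memory of recently seen event ids. The toy below only illustrates that general effect; `RecentIdWindow` and its capacity are hypothetical and are not a description of Event Store's internals.

```python
from collections import OrderedDict


class RecentIdWindow:
    """Toy duplicate detector that remembers only the last `capacity` event ids."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def should_write(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False  # still inside the window: the duplicate is suppressed
        self._seen[event_id] = None
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)  # forget the oldest id
        return True


window = RecentIdWindow(capacity=3)
first_write = window.should_write("id-2")       # new id, so it is written
immediate_repeat = window.should_write("id-2")  # still remembered, so it is skipped
for n in range(3):
    window.should_write(f"filler-{n}")          # enough traffic to push id-2 out
late_repeat = window.should_write("id-2")       # forgotten, so it is written again
```

With any bounded memory of this kind, a repeat that arrives soon enough is suppressed, while one that arrives after enough other writes is treated as new.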

How it works:

1. Output Stream A exists with a single event.
2. Event 1 from Input Stream B is processed. Event with id 2 is produced and written to Output Stream A using expected version 0. (All is well.)
3. Event 1 from Input Stream B is received again and reprocessed. Event with id 2 will be produced with expected version 0. (Event 2 would not be written again.)
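The two sequences above can be replayed against a small in-memory model. The rule it encodes (same event id plus same expected version means no second write) is the one described in this thread; `ToyStream` and this `WrongExpectedVersion` class are stand-ins written for the sketch, not the Event Store client API.

```python
from dataclasses import dataclass, field
from typing import List


class WrongExpectedVersion(Exception):
    """Stand-in for the error a real server would return on a version conflict."""


@dataclass
class ToyStream:
    event_ids: List[str] = field(default_factory=list)

    @property
    def current_version(self) -> int:
        # Versions are zero-based: a stream with one event is at version 0.
        return len(self.event_ids) - 1

    def append(self, event_id: str, expected_version: int) -> str:
        if expected_version == self.current_version:
            self.event_ids.append(event_id)
            return "written"
        slot = expected_version + 1  # where this write would have landed
        if 0 <= slot < len(self.event_ids) and self.event_ids[slot] == event_id:
            return "already written"  # same id, same expected version: idempotent
        raise WrongExpectedVersion(
            f"expected {expected_version}, stream is at {self.current_version}"
        )


stream_a = ToyStream(["id-1"])                               # step 1: one existing event
first = stream_a.append("id-2", expected_version=0)          # step 2: normal write
retry = stream_a.append("id-2", expected_version=0)          # "How it works", step 3
reprocessed = stream_a.append("id-2", stream_a.current_version)  # "What I was doing", step 3
```

In this model the retry with expected version 0 is recognised and skipped, while the write that uses the freshly read current version appends a second event with id `id-2`.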

I’m not sure when I could take advantage of Event Store's idempotence. When would a client produce the same event id and the same ExpectedVersion? Would it be for competing consumers?

It seems if I don’t want a repeated event written to an output stream in the case of a reprocess, it is up to the application to ensure it does not happen.

Greg_Young1 · January 30, 2015, 9:58am

"I'm not sure when I could take advantage of the event store
idempotence. When would a client produce the same event id and the
same ExpectedVersion."

On a retry is the most common use case!

Bernard_Odoy · January 30, 2015, 12:50pm

I’m being dense or missing something fundamental. I would have expected a retry to be the common case, but I don’t get how a subscriber can generate the required expected version on a retry. Given the situation:

1. Output Stream A exists with a single event.
2. Event 1 from Input Stream B is processed. Event with id 2 is produced and written to Output Stream A using expected version 0.
3. Event 1 from Input Stream B is received again (retry). Event with id 2 is produced, but written to Output Stream A with expected version 1 (therefore it will be written again).

I assume the subscriber must read the current version from the stream before writing (unless it is the only writer and it caches it).

Thanks

Greg_Young1 · January 30, 2015, 1:11pm

I'm sending a write. My TCP socket breaks. I reconnect and retry the same request. This is the normal case.

I think it's mostly that you are misunderstanding what expected version is. If I just set it to current every time, what's the point of setting it? (It would be the same as .Any.)
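A minimal sketch of the retry case described here, assuming the client fixes the event id and the expected version once and resends exactly the same pair after reconnecting. `append_with_retry` and `flaky_send` are hypothetical names; the fake server applies the first write and then drops the connection before the acknowledgement.

```python
from typing import Callable, List


def append_with_retry(
    send: Callable[[str, int], None],
    event_id: str,
    expected_version: int,
    attempts: int = 3,
) -> int:
    """Resend the same (event_id, expected_version) pair until one attempt succeeds."""
    for attempt in range(1, attempts + 1):
        try:
            send(event_id, expected_version)  # never re-read the version here
            return attempt
        except ConnectionError:
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")


stored: List[str] = ["id-1"]  # the stream already holds one event (version 0)
calls = {"count": 0}


def flaky_send(event_id: str, expected_version: int) -> None:
    """Fake server: writes idempotently, but loses the first acknowledgement."""
    calls["count"] += 1
    slot = expected_version + 1
    if slot == len(stored):
        stored.append(event_id)  # normal append
    elif not (slot < len(stored) and stored[slot] == event_id):
        raise RuntimeError("wrong expected version")
    if calls["count"] == 1:
        raise ConnectionError("socket broke before the ack arrived")


attempts_used = append_with_retry(flaky_send, "id-2", expected_version=0)
```

The second attempt carries the same expected version as the first, so the fake server sees that `id-2` is already at that position and the stream ends up with a single copy.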

Joao_Braganca · January 30, 2015, 1:43pm

Where is the input event coming from? An external system or eventstore?

Greg_Young1 · January 30, 2015, 1:49pm

"I'm being dense or missing something fundamental. I would have
expected a retry to be the common case, but I don't get how a
subscriber can generate the required expected version on a retry.
Given the situation "

A retry would use the same expected version (by definition, it's a retry).

Bernard_Odoy · January 30, 2015, 9:13pm

My understanding is that it is the expected version of the stream when you write to it. It provides a mechanism for implementing optimistic concurrency. If you don’t need to worry about concurrency, you use .Any.

If the client is making changes and needs to consider the current state (of an aggregate, for example), you get the current version of the stream, and when you write, the expected version should match the current version; otherwise you get a wrong expected version error.

Correct?

Bernard_Odoy · January 30, 2015, 9:15pm

The input events in our situation come from Event Store. I may receive an event a second time (for example, the event was written but the checkpoint was not), since I don’t have guaranteed-once semantics.

We were hoping to “cheat”: just as if the input had come from an external system, I was hoping that if I produce the same event id, Event Store would not write it again.

Joao_Braganca · January 30, 2015, 10:31pm

You can write the checkpoint (position) of the message that caused your event in its metadata. Also, I wanted to know what the consistency boundary is for the messages you are writing out.
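A rough sketch of that suggestion, assuming a single writer that handles inputs in order and produces an output event for every input it handles. The `caused_by_position` field and the `handle` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OutputEvent:
    event_id: str
    caused_by_position: int  # hypothetical metadata: position of the causing input


def handle(input_position: int, output_stream: List[OutputEvent]) -> bool:
    """Write an output event unless this input position was already handled."""
    last_handled = output_stream[-1].caused_by_position if output_stream else -1
    if input_position <= last_handled:
        return False  # redelivered input: skip the write
    output_stream.append(OutputEvent(f"out-{input_position}", input_position))
    return True


output_stream: List[OutputEvent] = []
results = [
    handle(0, output_stream),
    handle(1, output_stream),
    handle(1, output_stream),  # same input delivered again after a lost checkpoint
]
```

Because the position travels with the output event itself, the check still works when the subscriber's own checkpoint write was lost.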

Greg_Young1 · February 1, 2015, 5:49pm

@all Should we look at bringing in bloomburger (a Bloom filter) and accepting a higher startup cost? Anyone with thoughts?
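For readers unfamiliar with the data structure, here is a minimal Bloom filter sketch. It shows the general technique only, not how it would be wired into Event Store; the sizes, the hashing scheme, and the sample ids are arbitrary choices for the example. The loop that fills the filter from existing ids is where a startup cost would come from.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Answers "definitely not seen" or "possibly seen" using a fixed bit array."""

    def __init__(self, size_bits: int = 1024, hash_count: int = 3) -> None:
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive hash_count bit positions by salting SHA-256 with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


existing_ids = ["id-1", "id-2", "id-3"]  # made-up ids standing in for a full scan
seen = BloomFilter()
for event_id in existing_ids:
    seen.add(event_id)  # rebuilding the filter: work proportional to the data size
```

A `False` from `might_contain` means the id was never added, so the write can proceed; a `True` can be a false positive and would still need to be confirmed some other way.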

Scott_Cate · February 1, 2015, 6:00pm

Maybe as an option? I think it would mostly depend on the DB size. Larger DBs already mean slower startup times, so for anything that makes startup slower it would be nice to have an on|off flag.

#brainstorming