RetriesLimitReachedException: Item Operation AppendToStreamOperation

We are seeing this sometimes when we write to Event Store. It looks like the event(s) have been written to Event Store, so I don't understand why we are experiencing this.

The complete message is:

EventStore.ClientAPI.Exceptions.RetriesLimitReachedException: Item Operation AppendToStreamOperation (b33c645b-b436-4c72-83b6-fc319a5a9312): Stream: FT-Konsesjon-026dd87a-0d7d-4ac0-a9fd-6207b8bacc0d, ExpectedVersion: -2, retry count: 10, created: 08:20:55.872, last updated: 08:21:27.088 reached retries limit : 10

Any ideas?

It means it retried 10 times without getting an explicit "write events succeeded" message back. This can happen due to timeouts, dropped connections, etc. The write may or may not have succeeded (it may just be that the client got a timeout instead of a result).
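On the client side, one way to cope with this is to catch the exception and retry the append with the same event ID, since Event Store deduplicates on event ID (best-effort when appending with ExpectedVersion.Any, which is the -2 shown in the error above). A minimal sketch with the .NET EventStore.ClientAPI; the stream name, event type, and payload are just placeholders:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using EventStore.ClientAPI;
using EventStore.ClientAPI.Exceptions;

static class AppendWithRetry
{
    // Re-append with the *same* event ID: if the earlier, timed-out write
    // actually made it to disk, Event Store can deduplicate it.
    public static async Task AppendAsync(IEventStoreConnection connection, string stream)
    {
        var eventId = Guid.NewGuid(); // keep this stable across retries
        var payload = Encoding.UTF8.GetBytes("{\"example\":true}");
        var eventData = new EventData(eventId, "ExampleEvent", true, payload, null);

        for (var attempt = 0; attempt < 3; attempt++)
        {
            try
            {
                await connection.AppendToStreamAsync(stream, ExpectedVersion.Any, eventData);
                return;
            }
            catch (RetriesLimitReachedException)
            {
                // Outcome unknown: the write may or may not have been persisted,
                // so retrying with the unchanged event ID is the safe option.
            }
        }
        throw new InvalidOperationException($"Gave up appending to {stream}");
    }
}
```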

Thanks. That's basically what I expected. We have to run over a SAN that has some issues, so that is probably what's causing it.

We are seeing some "---!!! VERY SLOW QUEUE MSG" and "SLOW QUEUE MSG" entries, so that's probably a good indication that a timeout is the problem, I guess.

–Tomas

What is the

---!!! VERY SLOW QUEUE MSG

?

Also what environment is this?

Greg

This is actually our prod environment, and the message is from the Event Store log file.

I mean, what is the actual log message, and what does the environment look like? Is it a networked disk, etc.?

Yes.

It is a network disk; that's what we get at our customer's site. There were a couple of slow queue messages at the same time:

[PID:02516:009 2016.04.04 08:21:14.469 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 36083ms. Q: 0/8.

[PID:02516:009 2016.04.04 08:21:14.469 ERROR QueuedHandlerMRES ] ---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 36083ms. Q: 0/8.

[PID:02516:009 2016.04.04 08:21:48.781 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 34309ms. Q: 0/5.

[PID:02516:009 2016.04.04 08:21:48.781 ERROR QueuedHandlerMRES ] ---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 34309ms. Q: 0/5.

[PID:02516:009 2016.04.04 08:21:50.847 TRACE QueuedHandlerMRES ] SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 2066ms. Q: 0/4.

[PID:02516:015 2016.04.04 08:21:51.439 INFO CoreProjectionCheckp] Failed to write events to stream $ce-FT. Error: CommitTimeout

[PID:02516:015 2016.04.04 08:21:51.439 INFO CoreProjectionCheckp] Retrying write to $ce-FT

[PID:02516:013 2016.04.04 08:21:51.439 INFO CoreProjectionCheckp] Failed to write events to stream $et-EndretKonsesjonKlasse. Error: CommitTimeout

[PID:02516:013 2016.04.04 08:21:51.439 INFO CoreProjectionCheckp] Retrying write to $et-EndretKonsesjonKlasse

It looked good in the end, though; in Event Store, that is.

[PID:02516:009 2016.04.04 08:21:14.469 ERROR QueuedHandlerMRES ] ---!!! VERY SLOW QUEUE MSG [StorageWriterQueue]: WritePrepares - 36083ms. Q: 0/8.

Will certainly cause a timeout :slight_smile: It doesn't mean that the write failed; it means the outcome is non-deterministic. Usually a retry or two will resolve things, but in a case like this my guess is that you have probably worked through your retries (10 x 5 seconds) for quite a few writes.
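If the storage cannot be made faster, another option is to give the client more headroom when building the connection, i.e. a longer per-operation timeout and/or a higher retry limit. A minimal sketch with the .NET client; the address and the specific numbers are only placeholders to illustrate the builder methods:

```csharp
using System;
using EventStore.ClientAPI;

static class ConnectionFactory
{
    public static IEventStoreConnection Create()
    {
        // More headroom for slow storage: a longer per-operation timeout and
        // more retries before RetriesLimitReachedException is thrown.
        var settings = ConnectionSettings.Create()
            .SetOperationTimeoutTo(TimeSpan.FromSeconds(30)) // placeholder value
            .LimitRetriesForOperationTo(20)                  // placeholder value
            .KeepReconnecting();

        return EventStoreConnection.Create(settings, new Uri("tcp://admin:changeit@localhost:1113"));
    }
}
```

Raising these limits only hides the latency, of course; if the SAN regularly takes 30+ seconds to flush, the writes will still be slow.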

I know… 36 s is a long time. Such fun working with this amazing SAN :confused:

If you want better metrics (I would do this on only a single node as it does have a performance impact), try running with --enable-histograms for a day; then you can check out http://yournode:port/histograms/writer-flush. This will give you a histogram of all writes to the disk. For reads there are also reader-streamrange and reader-allrange.
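Once a node has been restarted with --enable-histograms, the histogram pages are plain HTTP resources, so anything that can issue a GET can read them. A small sketch; the host and the default HTTP port 2113 are assumptions, so substitute your node's address:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class HistogramCheck
{
    public static async Task Main()
    {
        using (var client = new HttpClient())
        {
            // Write-side flush latency histogram mentioned above.
            Console.WriteLine(await client.GetStringAsync("http://localhost:2113/histograms/writer-flush"));

            // Read-side histograms.
            Console.WriteLine(await client.GetStringAsync("http://localhost:2113/histograms/reader-streamrange"));
            Console.WriteLine(await client.GetStringAsync("http://localhost:2113/histograms/reader-allrange"));
        }
    }
}
```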