We’re seeing occasional EventStore.ClientAPI.Exceptions.RetriesLimitReachedException errors, à la:
EventStore.ClientAPI.Exceptions.RetriesLimitReachedException: Item Operation ReadStreamEventsForwardOperation ({00aad7dd-4d0e-4b23-94ea-dc8657ccb60b}): Stream: Redacted_ffbb23ce-c221-46af-a676-000000000000, FromEventNumber: 0, MaxCount: 500, ResolveLinkTos: False, RequireMaster: True, retry count: 10, created: 00:58:56.944, last updated: 01:00:08.178 reached retries limit : 10
The particular circumstance under which we’re seeing this is a maintenance operation that rebuilds read models: we iterate through many streams and read all the events in each stream, distributed over multiple threads on multiple machines, so the read load is higher than average for our environment.
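To give a sense of the shape of the rebuild, each machine does something roughly like the sketch below; the concurrency limit, the stream list, and RebuildStreamAsync are placeholders for illustration rather than our actual code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using EventStore.ClientAPI;

class RebuildWorker
{
    // Rebuilds read models by fanning out over streams with a bounded degree of
    // parallelism per machine. maxParallelReads and RebuildStreamAsync are placeholders.
    static async Task RebuildAsync(IEventStoreConnection connection,
                                   IEnumerable<string> streamIds,
                                   int maxParallelReads = 8)
    {
        using (var throttle = new SemaphoreSlim(maxParallelReads))
        {
            var tasks = streamIds.Select(async streamId =>
            {
                await throttle.WaitAsync();
                try
                {
                    // Placeholder: pages through the stream and replays it into the read model.
                    await RebuildStreamAsync(connection, streamId);
                }
                finally
                {
                    throttle.Release();
                }
            });
            await Task.WhenAll(tasks);
        }
    }

    static Task RebuildStreamAsync(IEventStoreConnection connection, string streamId)
    {
        return Task.FromResult(0); // placeholder
    }
}
```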
This is an Azure environment: Windows Server 2012 R2 machines, Event Store HA 3.0.1, and the 3.0.0 client (which we’re now updating to 3.0.1). We’re observing this in a test environment with only one machine running ES.
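For reference, our connection setup is more or less the stock builder; the values, credentials, and endpoint below are illustrative placeholders rather than our exact production settings (and this assumes the async connect call in the 3.0.x client):

```csharp
using System;
using System.Net;
using EventStore.ClientAPI;
using EventStore.ClientAPI.SystemData;

class ConnectionFactory
{
    static IEventStoreConnection Connect()
    {
        // Illustrative values only; the retry limit of 10 matches the limit reported in the exception.
        var settings = ConnectionSettings.Create()
            .SetDefaultUserCredentials(new UserCredentials("admin", "changeit"))
            .SetOperationTimeoutTo(TimeSpan.FromSeconds(30))
            .LimitRetriesForOperationTo(10)
            .LimitConcurrentOperationsTo(100);

        var connection = EventStoreConnection.Create(settings, new IPEndPoint(IPAddress.Loopback, 1113));
        connection.ConnectAsync().Wait();
        return connection;
    }
}
```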
Aside from “get faster disks” and “add retries”, are there other things we can do along the lines of tuning or optimization?
For example, we’re reading streams in chunks of 500 events, and our events are probably large relative to some use cases (they frequently contain encrypted data, which tends to bloat things). The 500 is arbitrary to us; we probably got it from sample code somewhere.
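Concretely, the per-stream read loop looks roughly like this; ProcessEvent is a placeholder for our projection code, and the sketch assumes the async read API in the 3.0.x client:

```csharp
using System.Threading.Tasks;
using EventStore.ClientAPI;

class StreamRebuilder
{
    // Reads a stream from the beginning in fixed-size pages (currently 500 events per page).
    static async Task ReadAllEventsAsync(IEventStoreConnection connection, string stream, int pageSize = 500)
    {
        var nextEventNumber = 0;
        StreamEventsSlice slice;
        do
        {
            // false = don't resolve link events; matches ResolveLinkTos: False in the exception.
            slice = await connection.ReadStreamEventsForwardAsync(stream, nextEventNumber, pageSize, false);
            if (slice.Status != SliceReadStatus.Success)
                break; // stream not found or deleted

            foreach (var resolvedEvent in slice.Events)
                ProcessEvent(resolvedEvent); // placeholder: apply the event to the read model

            nextEventNumber = slice.NextEventNumber;
        } while (!slice.IsEndOfStream);
    }

    static void ProcessEvent(ResolvedEvent e)
    {
        // placeholder
    }
}
```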
As a second example, we’ve also occasionally read messages from users concerned about the amount of memory used by the file system cache on Windows, and we’ve read discussions where people are advised to lower the maximum allowable cache percentage. These machines are dedicated to ES, and we’re happy to do whatever the combination of Windows + ES determines is “The Right Thing” on these machines. Does it matter that Windows is mapping so many chunk files into memory? ES itself seems to consume a stable amount of memory, and on a (say) 14 GB machine we see most of the memory consumed by the file cache, but intuitively this seems like a good thing. Perhaps it’s not, or perhaps it’s irrelevant.
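For what it’s worth, the knob people refer to in those discussions appears to be the Win32 system file cache limit; a minimal sketch for just inspecting the current limits, assuming the standard kernel32 GetSystemFileCacheSize call (lowering the limit would use the companion SetSystemFileCacheSize and requires elevated privileges), would look something like:

```csharp
using System;
using System.Runtime.InteropServices;

static class FileCacheLimits
{
    // Flag indicating the maximum-size limit is actually enforced.
    const uint FILE_CACHE_MAX_HARD_ENABLE = 0x1;

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetSystemFileCacheSize(out UIntPtr minimumFileCacheSize,
                                              out UIntPtr maximumFileCacheSize,
                                              out uint flags);

    static void Main()
    {
        UIntPtr min, max;
        uint flags;
        if (GetSystemFileCacheSize(out min, out max, out flags))
        {
            Console.WriteLine("File cache min: {0} bytes, max: {1} bytes, flags: 0x{2:X}", min, max, flags);
            Console.WriteLine("Max limit enforced: {0}", (flags & FILE_CACHE_MAX_HARD_ENABLE) != 0);
        }
        else
        {
            Console.WriteLine("GetSystemFileCacheSize failed: {0}", Marshal.GetLastWin32Error());
        }
    }
}
```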
Any thoughts appreciated. Note that from an Azure storage perspective we’re doing the optimal things in terms of multiple disks, storage account management, etc. (short of moving to the shiny new SSD-backed storage or machines with SSD temp disks).
Brian