Projection Faulted - What value to use for 'Maximum Number Of Allowed Writes In Flight'?

**TL;DR:**

Our projections keep failing with "Failed to write events to XXXXXX. Retry limit of 5 reached. Reason: CommitTimeout." The docs say that specifying a value for the projection config parameter AllowedInFlightMessages might help.

What value would be appropriate to specify: 2, 10, 100, 1000 …?

Full Question:

We have been seeing various projections fault a few times a week, and sometimes multiple times a day. The error is always a variation of: "Failed to write events to XXXXXX. Retry limit of 5 reached. Reason: CommitTimeout."

Projection Details:

The projection that faults is usually the built-in $by_category projection. When it faults, at least one of our four custom projections usually faults at the same time. The stream that the events could not be written to appears to be random.

Our custom projections are all basic emit-enabled projections that look for the existence of a property value within the event, or look at the event type, and emit to a stream.

System Usage:

Our EventStore is used by a variety of services to read and write data. Writes are heavier than reads, because a read model handles most read requests.

Additionally, we have about 25 persistent subscriptions, each with about two consumers.

EventStore Info:

Our EventStore is running version, and consists of a cluster of 3 servers, each running on an m5.xlarge instance.

Below is the relevant part of the server config.

ClusterSize: 3
ConnectionPendingSendBytesThreshold: 150000000
IntTcpHeartbeatInterval: 10000
IntTcpHeartbeatTimeout: 8000
ExtTcpHeartbeatInterval: 10000
ExtTcpHeartbeatTimeout: 8000
GossipIntervalMs: 2500
GossipTimeoutMs: 2000
PrepareTimeoutMs: 3000
CommitTimeoutMs: 3000
ProjectionThreads: 5
WorkerThreads: 8


What we need help on:

Upon researching the issue, we came across the following article:

This appears to address the exact issue we are having. However, there are no guidelines as to what an appropriate value would be.

Is there any recommended value for this? If not, is there a recommended way to determine what value should be used?
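For context, this is roughly how we would expect to apply a candidate value once we have one. It is a minimal sketch, assuming the projection config is exposed over HTTP at /projection/{name}/config and that the setting is named maxAllowedWritesInFlight there; both should be checked against the docs for the version in use. The host, the sample config, and the value 10 are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def build_config_update(base_url, projection, current_config, max_writes_in_flight):
    """Build (but do not send) a PUT request that changes only the write limit."""
    # Copy the existing config so every other setting is sent back unchanged.
    new_config = dict(current_config)
    # Assumed field name; verify it against the config returned by the server.
    new_config["maxAllowedWritesInFlight"] = max_writes_in_flight
    return urllib.request.Request(
        url=f"{base_url}/projection/{projection}/config",
        data=json.dumps(new_config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical values for illustration only.
request = build_config_update(
    "http://localhost:2113", "$by_category", {"emitEnabled": True}, 10
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Actually sending this would also need admin credentials, and we have not verified whether the projection has to be stopped before its config can be changed.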

Also, is there a way we can determine what our system averages for in-flight writes, or is there a way to get the maximum in-flight writes?
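Failing a built-in metric, one rough estimate we have considered is Little's law: average writes in flight is approximately the write rate multiplied by the average time each write takes to commit. The sketch below only shows the arithmetic. The function name and the input numbers are hypothetical, not measurements from our cluster.

```python
def estimate_in_flight(writes_per_second, avg_commit_latency_ms):
    """Little's law: average concurrency = arrival rate * average time in system."""
    return writes_per_second * avg_commit_latency_ms / 1000


# Hypothetical numbers, not measurements from our cluster.
print(estimate_in_flight(200, 50))
```

This gives an average only. It says nothing about bursts, which are presumably what push writes past the commit timeout.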

Alternatively, have any others experienced this problem? If so, what did you do to resolve it?