We've had an issue the past couple of days with the system projections in EventStore 4.1.0.
We stream events to the event store, sometimes at quite a high volume, with a large event every now and then. The reason the projections give for failing is that their writes time out and they give up after 5 retries. Looking through the server logs, I can't really glean anything about why the writes are timing out, only that they are and that this is what fails the projections.
As a second point, when we tried to re-enable the projections after a failure, they would occasionally get stuck in a Prepared/Initial state.
These are the errors that we get when the projections fail:
Failed to write events to $et-VehicleTaxonomySupplied. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:60617520740/P:60617520740
Failed to write events to $et-LineImageStatsReported. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:60056375316/P:60056375316.
After retrying 5 times, we failed to write the checkpoint for $by_event_type to $projections-$by_event_type-checkpoint due to a CommitTimeout
The taxonomy event is our big one (12 KB); there is a plan to split it up into smaller events in the future.
LineImageStats is a very small event (just a couple of ints). The last error is the projection failing to write its own checkpoint.
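
To check whether the timeouts cluster on particular streams, the "Failed to write events" lines can be pulled apart and counted. This is only a rough sketch in Python: the regular expression is inferred from the two messages quoted above, not from any documented EventStore log format, and the function names are made up for illustration. I am also assuming the C:/P: values are the commit and prepare positions of the checkpoint.

```python
import re
from collections import Counter

# Pattern inferred only from the error messages quoted in this post.
# It is an assumption, not an official EventStore log format.
WRITE_FAILURE = re.compile(
    r"Failed to write events to (?P<stream>\S+)\. "
    r"Retry limit of (?P<retries>\d+) reached\. "
    r"Reason: (?P<reason>\w+)\. "
    r"Checkpoint: C:(?P<commit>\d+)/P:(?P<prepare>\d+)"
)


def parse_write_failure(line):
    """Return the fields of a write-failure line, or None if it does not match."""
    match = WRITE_FAILURE.search(line)
    if match is None:
        return None
    return {
        "stream": match.group("stream"),
        "retries": int(match.group("retries")),
        "reason": match.group("reason"),
        # Assumed meaning: commit and prepare positions of the checkpoint.
        "commit": int(match.group("commit")),
        "prepare": int(match.group("prepare")),
    }


def count_failures_by_stream(lines):
    """Count matching write failures per target stream, skipping other lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_write_failure(line)
        if parsed is not None:
            counts[parsed["stream"]] += 1
    return counts
```

Feeding the server log through count_failures_by_stream should show whether the failures follow the large taxonomy events or are spread evenly across the $et- streams. The checkpoint error for $by_event_type has a different shape and is deliberately skipped here.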