System projections keep faulting

Hi, I have been experiencing system projections stopping because of a commit timeout after 5 retries. I can re-enable the projections as a workaround, but I am wondering why this is happening so I can work on a more permanent solution.

Here are the error logs from when the projections stopped writing:

[PID:25022:032 2018.05.11 01:52:01.651 ERROR StorageReaderWorker ] Error during processing ReadAllEventsBackward request.
System.ArgumentException: Log record that ends at actual pos 112290514 has too large length: 841162853 bytes, while limit is 16777216 bytes. In chunk #367-367 (chunk-000367.000000).
at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk+TFChunkReadSide.TryReadBackwardInternal (EventStore.Core.TransactionLog.Chunks.TFChunk.ReaderWorkItem workItem, System.Int64 actualPosition, System.Int32& length, EventStore.Core.TransactionLog.LogRecords.LogRecord& record) [0x000a4] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk+TFChunkReadSideUnscavenged.TryReadClosestBackward (System.Int64 logicalPosition) [0x00028] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk.TryReadClosestBackward (System.Int64 logicalPosition) [0x00000] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.TransactionLog.Chunks.TFChunkReader.TryReadPrevInternal (System.Int32 retries) [0x0009e] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.TransactionLog.Chunks.TFChunkReader.TryReadPrev () [0x00000] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.TransactionLog.TFReaderLease.TryReadPrev () [0x00000] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.Services.Storage.ReaderIndex.AllReader.ReadAllEventsBackward (EventStore.Core.Data.TFPos pos, System.Int32 maxCount) [0x00047] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.Services.Storage.ReaderIndex.ReadIndex.EventStore.Core.Services.Storage.ReaderIndex.IReadIndex.ReadAllEventsBackward (EventStore.Core.Data.TFPos pos, System.Int32 maxCount) [0x00000] in <403b76bd05054d6ca039b6b4eeed8216>:0
at EventStore.Core.Services.Storage.StorageReaderWorker.ReadAllEventsBackward (EventStore.Core.Messages.ClientMessage+ReadAllEventsBackward msg) [0x00115] in <403b76bd05054d6ca039b6b4eeed8216>:0
[PID:25022:019 2018.05.11 02:54:05.613 ERROR TcpConnectionManager] Closing connection ‘external-normal’ [52.221.226.203:42165, L172.31.31.156:1113, {de0c9c09-1526-4f45-9643-ff946af62370}] due to error. Reason: Connection pending send bytes is too large: 10783496.
[PID:25022:064 2018.05.11 02:54:06.405 ERROR TcpConnectionManager] Closing connection ‘external-normal’ [52.221.226.203:49434, L172.31.31.156:1113, {ec15eac5-9fe7-4549-8f83-9ddba8459623}] due to error. Reason: Connection pending send bytes is too large: 10886560.
[PID:25022:039 2018.05.11 02:54:07.147 ERROR TcpConnectionManager] Closing connection ‘external-normal’ [52.221.226.203:49437, L172.31.31.156:1113, {e241cf05-9879-4521-9d3f-448fcea9afbf}] due to error. Reason: Connection pending send bytes is too large: 10978855.
[PID:25022:033 2018.05.11 03:49:50.926 ERROR ProjectionManager ] The ‘$by_category’ projection faulted due to ‘Failed to write events to $ce-ProductGroup. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:98658503500/P:98658503500.’
[PID:25022:033 2018.05.11 03:49:51.923 ERROR ProjectionManager ] The ‘$stream_by_category’ projection faulted due to ‘After retrying 5 times, we failed to write the checkpoint for $stream_by_category to $projections-$stream_by_category-checkpoint due to a CommitTimeout’
[PID:25022:033 2018.05.11 03:49:51.941 ERROR ProjectionManager ] The ‘$by_event_type’ projection faulted due to ‘Failed to write events to $et-co.styletheory.context.userAndAccount.domain.contracts.user.BoxStatusOverridden. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:98664496673/P:98664496673.’
[PID:25022:047 2018.05.11 04:02:24.185 INFO MiniWeb ] Error while replying from MiniWeb
System.IO.IOException: Unable to write data to the transport connection: The socket has been shut down. ---> System.Net.Sockets.SocketException: The socket has been shut down
at System.Net.Sockets.Socket.EndSend (System.IAsyncResult result) [0x00033] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
at System.Net.Sockets.NetworkStream.EndWrite (System.IAsyncResult asyncResult) [0x0005f] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.EndWrite (System.IAsyncResult asyncResult) [0x000af] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
at System.Net.ResponseStream.EndWrite (System.IAsyncResult ares) [0x00065] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
at EventStore.Transport.Http.AsyncQueuedBufferWriter.EndWrite (System.IAsyncResult ar) [0x00000] in <5d159d279691452ebf0be4b084a990ec>:0
System.Net.Sockets.SocketException (0x80004005): The socket has been shut down
at System.Net.Sockets.Socket.EndSend (System.IAsyncResult result) [0x00033] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
at System.Net.Sockets.NetworkStream.EndWrite (System.IAsyncResult asyncResult) [0x0005f] in <5071a6e4a4564e19a2eda0f53e42f9bd>:0
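
To make the order of events in a log like this easier to follow, here is a minimal Python sketch that counts entries per level and component and records when each was first seen. The regular expression is only inferred from the line format shown above, the `summarize` helper and the `SAMPLE` list are hypothetical names, and the sample messages are abbreviated with `...`; none of this is an Event Store tool.

```python
import re
from collections import Counter

# Header format inferred from the lines above (an assumption, not an official spec):
# [PID:<pid>:<thread> <date> <time>.<ms> <LEVEL> <Component> ] message
HEADER = re.compile(
    r"^\[PID:\d+:\d+ (?P<date>\d{4}\.\d{2}\.\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2})\.\d+ "
    r"(?P<level>[A-Z]+)\s+(?P<component>\w+)\s*\]"
)

def summarize(lines):
    """Count entries per (level, component) and note the first time each appeared."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = HEADER.match(line)
        if match is None:
            continue  # exception text and stack-trace lines carry no header
        key = (match["level"], match["component"])
        counts[key] += 1
        first_seen.setdefault(key, match["time"])
    return counts, first_seen

# Abbreviated sample lines taken from the log above.
SAMPLE = [
    "[PID:25022:032 2018.05.11 01:52:01.651 ERROR StorageReaderWorker ] Error during processing ReadAllEventsBackward request.",
    "System.ArgumentException: Log record that ends at actual pos 112290514 has too large length: 841162853 bytes, while limit is 16777216 bytes.",
    "[PID:25022:019 2018.05.11 02:54:05.613 ERROR TcpConnectionManager] Closing connection ... Reason: Connection pending send bytes is too large: 10783496.",
    "[PID:25022:064 2018.05.11 02:54:06.405 ERROR TcpConnectionManager] Closing connection ... Reason: Connection pending send bytes is too large: 10886560.",
    "[PID:25022:033 2018.05.11 03:49:50.926 ERROR ProjectionManager ] The '$by_category' projection faulted ...",
    "[PID:25022:047 2018.05.11 04:02:24.185 INFO MiniWeb ] Error while replying from MiniWeb",
]

if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE)
    # Print components in the order they first appeared in the log.
    for key in sorted(counts, key=lambda k: first_seen[k]):
        level, component = key
        print(f"{first_seen[key]}  {level:<5}  {component:<22}  x{counts[key]}")
```

Run against the full log file, this kind of summary shows the read error, the closed connections and the projection faults as separate groups with their first timestamps, which helps when checking whether they are related.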

Event Store 4.0.3.3 on Ubuntu 14.04, running on an EC2 c4.2xlarge with a 200 GB disk at 5000 IOPS.

The error is pretty clear:

[PID:25022:033 2018.05.11 03:49:50.926 ERROR ProjectionManager ] The ‘$by_category’ projection faulted due to ‘Failed to write events to $ce-ProductGroup. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:98658503500/P:98658503500.’
[PID:25022:033 2018.05.11 03:49:51.923 ERROR ProjectionManager ] The ‘$stream_by_category’ projection faulted due to ‘After retrying 5 times, we failed to write the checkpoint for $stream_by_category to $projections-$stream_by_category-checkpoint due to a CommitTimeout’
[PID:25022:033 2018.05.11 03:49:51.941 ERROR ProjectionManager ] The ‘$by_event_type’ projection faulted due to 'Failed to write events to $et-co.styletheory.context.userAndAccount.domain.contracts.user.BoxStatusOverridden. Retry limit of 5 reached. Reason: CommitTimeout. Check

It also looks like you have some other networking issues going on (losing connections, etc.).
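
As a quick way to confirm which projections faulted and for what stated reason, the following is a rough Python sketch that pulls the projection name and reason out of `ProjectionManager` lines like the ones quoted above. The patterns are guessed from these three messages only, `faulted_projections` is a hypothetical helper, and the sample messages are shortened with `...`; other Event Store versions may word the messages differently.

```python
import re

# Projection name sits between quotes (curly or straight) before "projection faulted".
NAME = re.compile(r"The [\u2018'](?P<name>[^\u2019']+)[\u2019'] projection faulted")

# The two message shapes seen in this thread state the reason either as
# "Reason: CommitTimeout" or as "due to a CommitTimeout".
REASON = re.compile(r"(?:Reason: |due to a )(?P<reason>\w+)")

def faulted_projections(lines):
    """Map each faulted projection name to the reason given in its log line."""
    result = {}
    for line in lines:
        name = NAME.search(line)
        if name is None:
            continue  # not a projection fault line
        reason = REASON.search(line)
        result[name["name"]] = reason["reason"] if reason else "unknown"
    return result

# Shortened versions of the three fault lines from the log.
SAMPLE_FAULTS = [
    "[PID:25022:033 2018.05.11 03:49:50.926 ERROR ProjectionManager ] The ‘$by_category’ projection faulted due to ‘Failed to write events to $ce-ProductGroup. Retry limit of 5 reached. Reason: CommitTimeout. Checkpoint: C:98658503500/P:98658503500.’",
    "[PID:25022:033 2018.05.11 03:49:51.923 ERROR ProjectionManager ] The ‘$stream_by_category’ projection faulted due to ‘After retrying 5 times, we failed to write the checkpoint for $stream_by_category to $projections-$stream_by_category-checkpoint due to a CommitTimeout’",
    "[PID:25022:033 2018.05.11 03:49:51.941 ERROR ProjectionManager ] The ‘$by_event_type’ projection faulted due to 'Failed to write events to ... Retry limit of 5 reached. Reason: CommitTimeout. Check",
]

if __name__ == "__main__":
    for projection, reason in faulted_projections(SAMPLE_FAULTS).items():
        print(f"{projection}: {reason}")
```

All three system projections report the same reason, `CommitTimeout`, so the writes themselves are timing out rather than one projection misbehaving on its own.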