CommitTimeout issues

I noticed in our server logs that we are getting a lot of CommitTimeouts. CommitTimeout default was upped from 2000 to 5000, although that doesn’t seem to change much. Is this normal, and if not, what is the recommended action to take? This is the master’s logs for a 3 unmanaged node cluster setup on 3 AWS c4.larges. I guess my problem is that I don’t know why this occurs so much.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream1

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream2. Error: CommitTimeout

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $ce-stream3. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream2

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $ce-stream3

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream4. Error: CommitTimeout

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $ce-stream5. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream4

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $ce-stream5

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream6. Error: CommitTimeout

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $ce-stream7. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream6

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $ce-stream7

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream8. Error: CommitTimeout

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $ce-stream9. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream8

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $ce-stream9

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream10. Error: CommitTimeout

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $ce-stream11. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream10

[PID:02184:024 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $ce-stream11

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream7Progress. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream7Progress

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream12. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream12

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream13. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream13

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream7. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream7

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID fd0b5a39-2ac8-4021-8d40-b7fc88e60251, EventStreamId: $en-stream1, CorrelationId: fd0b5a39-2ac8-4021-8d40-b7fc88e60251, FirstEventNumber: 117, LastEventNumber: 117.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream4Progress. Error: CommitTimeout

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream4Progress

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream5. Error: CommitTimeout

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID df554849-000f-4887-a1aa-b20b1eaf1105, EventStreamId: $ce-stream3, CorrelationId: df554849-000f-4887-a1aa-b20b1eaf1105, FirstEventNumber: 320, LastEventNumber: 320.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream5

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream2Progress. Error: CommitTimeout

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID 617d6287-f855-4736-84cd-d038280a4d2b, EventStreamId: $category-stream4, CorrelationId: 617d6287-f855-4736-84cd-d038280a4d2b, FirstEventNumber: 40, LastEventNumber: 40.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $en-stream2Progress

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID 934c2033-84c1-433f-a60b-3e8ba5cb2ec9, EventStreamId: $ce-stream5, CorrelationId: 934c2033-84c1-433f-a60b-3e8ba5cb2ec9, FirstEventNumber: 314, LastEventNumber: 314.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $category-stream15. Error: CommitTimeout

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID f516b772-8b6f-4d80-8c69-a99b254c5591, EventStreamId: $ce-stream7, CorrelationId: f516b772-8b6f-4d80-8c69-a99b254c5591, FirstEventNumber: 314, LastEventNumber: 314.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Retrying write to $category-stream15

[PID:02184:013 2016.12.02 16:25:21.558 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID 0620ee41-5890-469a-9106-0bec65504e13, EventStreamId: $ce-stream9, CorrelationId: 0620ee41-5890-469a-9106-0bec65504e13, FirstEventNumber: 314, LastEventNumber: 314.

[PID:02184:023 2016.12.02 16:25:21.558 INFO CoreProjectionCheckp] Failed to write events to stream $en-stream14. Error: CommitTimeout

[PID:02184:013 2016.12.02 16:25:21.560 TRACE TwoPhaseRequestManag] IDEMPOTENT WRITE TO STREAM ClientCorrelationID e562eb24-6b8e-4cc5-80e1-a9c8451a3ada, EventStreamId: $en-stream10, CorrelationId: e562eb24-6b8e-4cc5-80e1-a9c8451a3ada, FirstEventNumber: 117, LastEventNumber: 117.

[PID:02184:023 2016.12.02 16:25:21.560 INFO CoreProjectionCheckp] Retrying write to $en-stream14

``

What version of Event Store are you running and is this a new issue or have you just never reported it?

How often is this occuring?

This is for EventStore 3.9.2. This occurs every 1-10 minutes in bursts, getting worse when more load occurs. I can send some full logs if an e-mail is provided.

Is this on cloud storage or local storage? What is provisioned if
cloud storage or what kind of hardware if running local.

3 AWS EC2 instances. Each node is running on a c4.large with Windows. The only other service running on each EC2 is a consul agent.

I should also mention that this has been occurring for a few weeks now on our staging environment. It has about 60 services using EventStore so it handles quite a few events.

Basically its just saying that it sent a write and the write did not
complete within the specified timeout.

I just wanted to check if this normal/acceptable behavior, or if this impacted performance enough to require a higher timeout or even a beefier instance.

What is your write throughput like?

Here is a snapshot of our dashboard during spikes in events: http://i.imgur.com/A84yTE7.png and our stats: https://www.dropbox.com/s/lxbux9vu2f7sfow/0.0.0.0-2114-cluster-node-stats.csv?dl=0

Occasionally a process will kick off that triggers a lot of event writes and stream projections (we use projections a lot), perhaps it’s occurring there and all I can do is up the commit timeout a little higher and accept the error. The CPU/memory utilization on the instance always seems low, the only thing that spikes is bandwidth usage up to 10mbps occasionally. Almost all our events are at most a few kilobytes large.