Built-in projections rarely create a checkpoint

sven.eppler · October 5, 2020, 1:56pm

Hello!
We are running EventStore 5.0.2.0 in a Docker container and make heavy use of the built-in projections (especially by-category).

Recently I noticed that it takes about 2-3 minutes after restarting EventStore before we get new events streamed via a subscription to a $ce-Category stream. After debugging a bit, I noticed that after restarting EventStore all of the built-in projections were being rebuilt. Once the rebuild was done, I received the events as expected (even the missed ones).

Some of the built-in projections create a checkpoint after some time (by some rule I don’t understand), while others just show “Last Checkpoint: C:0/P:-1” even 10 minutes after the projection has been rebuilt. So restarting in this state leads to a full rebuild again and again. This hurts us because we have an unstable deployment setup where we redeploy on every commit to master, so restarting the whole Docker setup 10 times a day is “normal” for us. The system tests after deployment then regularly fail, because EventStore is still rebuilding projections and our projections do not catch up within the expected timespan.

Is there something fishy on our side I can have a look into? When should EventStore write a checkpoint for the built-in projections?

Thanks in advance!

hayley.campbell · October 6, 2020, 8:38am

Hi @sven.eppler,

It sounds like your projections aren’t processing enough events to checkpoint before the server is restarted. Projections checkpoint individually, and their checkpointing behaviour can be changed per projection.

You could try decreasing the number of events, or the number of unhandled bytes, that must be processed before a checkpoint is written, and see if that helps.

For details about the settings available to you, please check the docs here: https://eventstore.com/docs/projections/projections-config/index.html?tabs=tabid-5#checkpoint-options

sven.eppler · October 6, 2020, 9:25am

Hi @hayley.campbell,
thanks for that info! I think I understand it better now. Since CheckpointAfterMs defaults to 0, I was expecting it to checkpoint quite fast/often. But it looks like the algorithm is more complicated, with CheckpointHandledThreshold and CheckpointUnhandledBytesThreshold thrown into the mix. And since CheckpointHandledThreshold defaults to 4000, we may replay a surprising number of events on our “rapid development” system.

So basically the CheckpointAfterMs value just acts as some kind of rate-limiter when a lot of events are processed and the CheckpointHandledThreshold would be hit (too) often, right?

alexey.zimarev · October 27, 2020, 12:25pm

The timeout setting is secondary to the count setting. When the timeout is set to zero and the count to 4000, a checkpoint will be written after every 4K events processed by the projection. The timeout setting is relevant if the projection manages to push through tens of thousands of events per second, in which case you might want to tell it to write checkpoints less frequently.
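
To make the interaction of the two settings concrete, here is a minimal sketch in Python of the rule as described in this thread. It is a simplified model, not EventStore's actual implementation: the class and function names are hypothetical, CheckpointUnhandledBytesThreshold is left out entirely, and the assumption that both conditions must hold at once is only my reading of the explanation above.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Hypothetical model of the two settings discussed in this thread."""
    checkpoint_after_ms: int = 0              # timeout, default per the thread
    checkpoint_handled_threshold: int = 4000  # count, default per the thread

    def should_checkpoint(self, handled_since_last: int, ms_since_last: int) -> bool:
        # Assumed rule: the count threshold must be reached, and at least
        # checkpoint_after_ms must have passed since the last checkpoint.
        enough_events = handled_since_last >= self.checkpoint_handled_threshold
        enough_time = ms_since_last >= self.checkpoint_after_ms
        return enough_events and enough_time


def count_checkpoints(policy: CheckpointPolicy, total_events: int, ms_per_event: int) -> int:
    """Simulate processing total_events and count the checkpoints written."""
    checkpoints = 0
    handled = 0
    elapsed_ms = 0
    for _ in range(total_events):
        handled += 1
        elapsed_ms += ms_per_event
        if policy.should_checkpoint(handled, elapsed_ms):
            checkpoints += 1
            handled = 0
            elapsed_ms = 0
    return checkpoints
```

Under this model, a projection that has handled only 3999 events with the default settings never writes a checkpoint, which matches the “Last Checkpoint: C:0/P:-1” symptom on a system with few events. Raising the timeout only reduces how often a fast projection checkpoints; it never makes a slow one checkpoint earlier.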

If you have a small number of events and restart the server frequently, you can adjust the count setting to a lower number. Keep the timeout to zero.
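
As a rough sketch of what that adjustment could look like, the Python below builds (but does not send) an HTTP request that lowers the count setting and keeps the timeout at zero. Several things here are assumptions to check against the projections config docs linked above: the `PUT /projection/{name}/config` endpoint, the camelCase JSON field names, the `$by_category` projection name, and the `localhost:2113` address. The helper names are hypothetical, and authentication is omitted.

```python
import json
import urllib.request
from urllib.parse import quote


def with_lower_checkpoint_threshold(current_config: dict, handled_threshold: int) -> dict:
    """Return a copy of the config with a lower count and a zero timeout."""
    updated = dict(current_config)
    # Field names are assumed to mirror the setting names used in this thread.
    updated["checkpointHandledThreshold"] = handled_threshold
    updated["checkpointAfterMs"] = 0
    return updated


def build_config_put(base_url: str, projection_name: str, config: dict) -> urllib.request.Request:
    """Build a PUT request for the (assumed) projection config endpoint."""
    # Keep "$" unescaped so system projection names stay readable in the URL.
    url = f"{base_url}/projection/{quote(projection_name, safe='$')}/config"
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Example: start from the defaults mentioned in the thread and lower the count.
current = {"checkpointAfterMs": 0, "checkpointHandledThreshold": 4000}
new_config = with_lower_checkpoint_threshold(current, 100)
request = build_config_put("http://localhost:2113", "$by_category", new_config)
# Sending it would be urllib.request.urlopen(request), with credentials added.
```

Starting from the projection's current config and changing only the two fields avoids accidentally resetting other settings, in case the server treats omitted fields as defaults.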