Where to find number of events per chunk file for OptimizeIndexMerge

Vilmantas_J · November 29, 2018, 11:51am

Hi,

With version 4.1.1, EventStore introduced OptimizeIndexMerge.

We would like to try this new feature, but we are not sure how much additional memory we will need. There is this formula in the documentation:

You can estimate the maximum amount of additional memory (in GB) you will need with the following formula:

Avg. number of events per chunk file / 40000
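Applied literally, the rule of thumb quoted above is a single division. The sketch below just encodes it; the 200,000-events figure is a hypothetical input, not a measured value.

```python
def extra_memory_gb(avg_events_per_chunk: float) -> float:
    """Rule of thumb quoted above: avg. number of events per chunk file / 40000 (result in GB)."""
    if avg_events_per_chunk < 0:
        raise ValueError("average event count cannot be negative")
    return avg_events_per_chunk / 40000


# Hypothetical average of 200,000 events per chunk file
print(extra_memory_gb(200_000))  # 5.0
```

The hard part, as the question below says, is obtaining the average event count in the first place.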

My question is: where can I find this number of events per chunk file?

Greg_Young1 · November 29, 2018, 4:12pm

You can find an exact count by looking in the mapping of the chunk (assuming it's a scavenged chunk) … we should expose this.

There are also some tools that dump this (I don't remember whether the tool is public; I will put the question on the internal list). If it's not, I will make sure this tool becomes public.

James_Connor · November 29, 2018, 8:56pm

Just FYI, we just had a PR merged that significantly changes this area of ES. I believe it is earmarked for 4.2, but we have been running a custom build in production under high load for almost 6 months now.

In summary, when an index merge occurs, ES does two things: 1) it merges the indexes into a single file using a merge sort algorithm, and 2) while doing so, it skips any records whose data is no longer available. Step 1) is fine, but 2) is reasonably heavyweight (it has to check that the relevant chunk of the data is still there). The OptimizeIndexMerge flag you mention is an optimisation for that part only (using a bloom filter).
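A simplified model of those two steps, as a sketch only: the index entry layout `(stream_hash, event_number, log_position)`, the helper names and the toy data are all hypothetical, and this is not EventStore's actual code. Step 1) is a k-way merge of already-sorted tables via `heapq.merge`; step 2) is an existence check per entry. The generic Bloom filter trick shown here answers "definitely gone" without touching the chunk, so only "maybe present" entries pay for the expensive chunk read; how ES builds and consults its own filter may differ.

```python
import hashlib
import heapq
from typing import Callable, Iterator, List, Tuple

# Hypothetical index entry layout: (stream_hash, event_number, log_position)
Entry = Tuple[int, int, int]


class BloomFilter:
    """Minimal Bloom filter over ints: false positives possible, never false negatives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _bit_positions(self, item: int) -> Iterator[int]:
        # Derive k bit positions from salted SHA-256 digests of the item
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: int) -> None:
        for p in self._bit_positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: int) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._bit_positions(item))


def merge_indexes(tables: List[List[Entry]], record_exists: Callable[[int], bool]) -> List[Entry]:
    """1) k-way merge sort of already-sorted tables; 2) drop entries whose data is gone."""
    return [entry for entry in heapq.merge(*tables) if record_exists(entry[2])]


# Toy data: positions 100 and 300 are still in the chunks; 200 and 400 were scavenged
live_positions = {100, 300}
chunk_reads: List[int] = []  # records every position that forced a chunk read


def check_chunk(position: int) -> bool:
    """Stands in for the expensive step: opening the chunk to see if the record is there."""
    chunk_reads.append(position)
    return position in live_positions


bloom = BloomFilter()
for pos in live_positions:
    bloom.add(pos)


def check_with_bloom(position: int) -> bool:
    # A negative filter answer means "definitely gone", so the chunk read is skipped
    return bloom.might_contain(position) and check_chunk(position)


tables = [[(1, 0, 100), (2, 0, 200)], [(1, 1, 300), (3, 0, 400)]]
print(merge_indexes(tables, check_with_bloom))  # [(1, 0, 100), (1, 1, 300)]
print(len(chunk_reads))  # 2 unless the filter returns a false positive
```

The output is the same with or without the filter; only the number of chunk reads changes.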

With the aforementioned merged PR, 2) will be completely removed from the index merge process and instead be performed during a node scavenge operation (which is when data is cleaned up anyway, so it makes sense to pair them).

This has numerous benefits (you’ll see them more in large clusters):

1) merges are faster (as they now only perform a merge sort between the files)

2) merges have less impact on the cluster's performance (as they don't have to check whether the data exists)

Both of these matter because merges occur at a time of ES's choosing, not yours.

Finally, 3) the index entries are scavenged immediately after the data is scavenged, at a time of your choosing (and the scavenge can be stopped at will, etc.).
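A simplified model of the split described above, as a sketch only and not the PR's code: the automatic merge becomes a pure merge sort, and index cleanup becomes a separate step run right after a data scavenge. The entry layout and the assumption that the data scavenge can report which log positions it removed are hypothetical.

```python
import heapq
from typing import Iterable, List, Set, Tuple

# Hypothetical index entry layout: (stream_hash, event_number, log_position)
Entry = Tuple[int, int, int]


def merge_tables(tables: Iterable[List[Entry]]) -> List[Entry]:
    """Automatic merge (ES's timing): a pure k-way merge sort, no per-record existence checks."""
    return list(heapq.merge(*tables))


def scavenge_index(entries: List[Entry], removed_positions: Set[int]) -> List[Entry]:
    """Index scavenge (operator's timing): drop entries pointing at records the data scavenge removed."""
    return [e for e in entries if e[2] not in removed_positions]


merged = merge_tables([[(1, 0, 100), (2, 0, 200)], [(1, 1, 300), (3, 0, 400)]])
print(merged)  # [(1, 0, 100), (1, 1, 300), (2, 0, 200), (3, 0, 400)]

# Later, when the operator runs a scavenge, it reports what it removed
print(scavenge_index(merged, removed_positions={200, 400}))  # [(1, 0, 100), (1, 1, 300)]
```

The cheap step stays on ES's schedule, while the step that touches the data moves to the scavenge you trigger.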

I can't remember the config option for it off the top of my head, but it should be simple to find.

James_Connor · November 29, 2018, 9:28pm

I should add that OptimizeIndexMerge will still speed up index scavenges, but that is less critical now that you can control their timing (e.g. scavenge overnight or during off-peak periods).

Greg_Young1 · November 30, 2018, 1:14am

Interestingly, this is how index merges originally worked.

Riccardo_Di_Nuzzo · December 6, 2018, 11:07am

I also opened a new PR in that area that adds a new option, AutomergeIndexes (default=true), and an admin/mergeindexes endpoint to trigger the index merge manually.
Together with your Scavenge_Indexes PR, this adds further control over the Merge Indexes operation.
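A minimal sketch of calling the manual-merge endpoint from that PR, using only the Python standard library. The `admin/mergeindexes` path comes from the post; the HTTP method (POST), the default HTTP port 2113 and the `admin`/`changeit` default credentials are assumptions about a typical EventStore node, so check them against the version you actually run. The request is only built here; sending it is left commented out.

```python
import base64
import urllib.request


def build_merge_indexes_request(host: str = "localhost", port: int = 2113,
                                user: str = "admin",
                                password: str = "changeit") -> urllib.request.Request:
    """Build (but do not send) a request to the admin/mergeindexes endpoint.
    Assumptions: POST method, port 2113, default admin credentials."""
    url = f"http://{host}:{port}/admin/mergeindexes"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, method="POST",
                                  headers={"Authorization": f"Basic {token}"})


req = build_merge_indexes_request()
print(req.get_method(), req.full_url)  # POST http://localhost:2113/admin/mergeindexes
# urllib.request.urlopen(req)  # uncomment to trigger a merge on a running node
```

With AutomergeIndexes set to false, a call like this could be scheduled for off-peak hours, in the same spirit as scheduling scavenges.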

Chris_Ward1 · December 6, 2018, 11:25am

Just out of interest, Vilmantas, where did you find this in the docs? I can't find it myself! Work is in progress to update everything regarding scavenging, and I wondered where this information was.

Vilmantas_J · December 11, 2018, 8:01am

Hi, I found it on the GitHub releases page, under "Index Merge Improvements": https://github.com/EventStore/EventStore/releases

Chris_Ward1 · December 13, 2018, 9:05am

Ah right, makes sense. It's now in the docs, scheduled to be deployed with the next release.