I am looking to automate scavenging across a 3-node cluster. Our installation is relatively large, so running a full scavenge will take many hours - close to a day - per node.
The plan was to run this in a staggered way - e.g. node 1 on Monday, node 2 on Wednesday, node 3 on Friday, or similar - to make sure not to bog down the cluster as a whole.
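For reference, the staggering I have in mind would look roughly like the sketch below. The node hostnames are placeholders, not our real ones:

```python
from datetime import date
from typing import Optional

# Hypothetical node hostnames - substitute your own cluster members.
SCAVENGE_SCHEDULE = {
    0: "esdb-node1.internal",  # Monday
    2: "esdb-node2.internal",  # Wednesday
    4: "esdb-node3.internal",  # Friday
}

def node_to_scavenge(today: date) -> Optional[str]:
    """Return the node due for a scavenge today, or None on off-days."""
    return SCAVENGE_SCHEDULE.get(today.weekday())
```

A cron job (or similar scheduler) would call this daily and trigger the scavenge on whichever node is returned, skipping off-days.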
- Does this approach make sense (or is it better to "rip off the band-aid" and scavenge the full cluster at once)?
- If so, it would seem reasonable to run the scavenge operation on a non-master node (our writers connect directly to the master node, so it is under more load).
- If so, is there a way to trigger failover - i.e. ensure that a particular node is not the master - via the API or some HTTP endpoint (other than brutally bringing the node down)?
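To make the question concrete, here is the shape of the automation I'd like to end up with, as a minimal stdlib-only sketch. I am assuming an EventStoreDB-style admin API here: `POST /admin/scavenge` on port 2113 with the default `admin`/`changeit` credentials, plus a hypothetical `POST /admin/node/resign` to give up leadership - both paths should be verified against the docs for your server version:

```python
import base64
import urllib.request
from typing import List

def admin_post(node: str, path: str, user: str = "admin",
               password: str = "changeit", dry_run: bool = False) -> str:
    """POST to an admin endpoint with basic auth.

    Assumes an EventStoreDB-style HTTP API on port 2113; the endpoint
    paths passed in are assumptions to verify against your version.
    """
    url = f"https://{node}:2113{path}"
    if dry_run:
        return url  # just report what would be called
    req = urllib.request.Request(url, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()

def scavenge_non_master(node: str, dry_run: bool = False) -> List[str]:
    """Ask the node to give up leadership (if it holds it), then scavenge it."""
    calls = [admin_post(node, "/admin/node/resign", dry_run=dry_run),
             admin_post(node, "/admin/scavenge", dry_run=dry_run)]
    return calls
```

The `dry_run` flag is only there so the sequencing can be exercised without a live cluster; the open question above is whether something like the resign step actually exists.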
Another approach we have considered, though I hope we will not need it, is to keep 4 nodes but only have 3 in active use at any point, and run the scavenge on the node that is temporarily taken out of the cluster. This would, however, require that node to catch up again when re-joining.