The .chk files are part of the database; this could be a bit clearer in the docs. If you think of the chunk files as representing an append-only log, the .chk files represent positions within that log (almost always increment-only!). They are essentially just pointers.
truncate.chk says where a truncate should occur to, if one were to occur. Think of a node in a cluster saying "I know I am OK to here! So in weird failures etc. let's truncate to this point and redo stuff, not redo everything."
chaser.chk says how far the chaser process has committed itself.
writer.chk says the last position in the log that the writer considers good.
Obviously there are some interactions between these.
With the chaser, there is a process following the log (much like a catchupallsubscription!) that is going through and doing things. chaser.chk just marks its last known position.
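To make the "just a pointer" idea concrete, here is a minimal sketch of reading such a checkpoint. The file format is an assumption on my part (a single 64-bit little-endian position into the log), not something the post or docs guarantee, so verify before relying on it:

```python
import os
import struct
import tempfile

def read_checkpoint(path):
    # Assumption: a .chk file holds one 64-bit little-endian
    # byte position into the append-only chunk log.
    with open(path, "rb") as f:
        (pos,) = struct.unpack("<q", f.read(8))
    return pos

# Demo with a fake checkpoint file (not a real node's writer.chk).
demo = os.path.join(tempfile.mkdtemp(), "writer.chk")
with open(demo, "wb") as f:
    f.write(struct.pack("<q", 123456789))
print(read_checkpoint(demo))  # 123456789
```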
There has been documentation on them at varying points…
What you seem to want though is step by step instructions for your configuration/situation.
Providing steps for a default setup is fairly reasonable; however, for example:
- Not everyone runs as a "service", and on Linux people also run under varying forms of process management, so a large number of these steps can vary from install to install.
- The data is not necessarily in those places; those are defaults and it depends on configuration, e.g. it becomes "wherever you configured your data directory to be".
- For large databases you absolutely do not want to do this (re-replicating can take a day or more; you want to come from backup first and then re-replicate. As an example, a 1 TB database will take quite some time over the network, which in some cases could be downtime if, say, a node is lost!). I want to restore first, then come up; catch-up starts from the backup position.
- Sometimes (most commonly) you only want to restore a single node, not all nodes in a cluster (node A has an issue; put in node A1, possibly restored from backup … bring it up).
- Incremental backups can also be done (faster, but worth discussion) … most of those files are immutable.
- Tape-based backups can be done.
- Often running one or more offsite clones which act as backups is preferred (near real-time). File-copy backups are literally for "we had bombs go off in two data centers" type scenarios. Remember you already have N copies of this data running live!
- Often (with sustained traffic etc.) you don't want backups on primary nodes, but prefer to run a clone which is then backed up.
- Many run disk/hardware-level backups instead of software backups as their primary strategy.
- This is obviously a bit different in the cloud.
- Many environments are backed up at a lower level.
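On the incremental point, since completed chunk files are immutable, a pass only needs to copy chunks the backup does not already have. The sketch below is my own illustration (the chunk naming, and copying the checkpoints before the chunks, are assumptions; check the current docs for the safe copy order on a live node):

```python
import os
import shutil
import tempfile

def incremental_backup(data_dir, dest_dir):
    # Assumption: copying the .chk files before the chunks is the safer
    # order on a live node (checkpoints then lag the data, never lead it);
    # verify against the current documentation.
    for name in os.listdir(data_dir):
        if name.endswith(".chk"):
            shutil.copy2(os.path.join(data_dir, name), dest_dir)
    chunks = sorted(f for f in os.listdir(data_dir) if f.startswith("chunk-"))
    for name in chunks[:-1]:  # completed chunks are immutable: copy only if missing
        if not os.path.exists(os.path.join(dest_dir, name)):
            shutil.copy2(os.path.join(data_dir, name), dest_dir)
    for name in chunks[-1:]:  # the active chunk is still being written: always refresh
        shutil.copy2(os.path.join(data_dir, name), dest_dir)

# Demo with fake directories standing in for the configured data
# directory and the backup target.
data = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
for name in ["chunk-000000.000000", "chunk-000001.000000",
             "chunk-000002.000000", "writer.chk", "chaser.chk",
             "truncate.chk"]:
    with open(os.path.join(data, name), "wb") as f:
        f.write(b"x")
incremental_backup(data, dest)
print(sorted(os.listdir(dest)))
```

Run it again later and only newly completed chunks (plus the checkpoints and the active chunk) get copied.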
Quite quickly there become a lot of "buts" and "ifs" involving situational information beyond "this is how you back up".
I believe there is some tooling coming around this as well.
…
This also seems like a place where a document covering "a few common scenarios" would likely be useful, as opposed to just "copy these files".
Backups are actually a good example of where what is likely needed is more of a holistic doc: "so you want to set up like XYZ1 … here is the basic layout … here is how deploys work … here is how backups work … here are monitoring ideas … here is how minor/major upgrades work … here is how restores work …", in other words tying things together in a nice package as opposed to "here are your 27 options".
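On the restore side, a minimal sketch of the single-node "restore first, then catch up" path mentioned above. The directory layout and the step of copying chaser.chk over truncate.chk are assumptions on my part, so verify them against the current documentation before using anything like this:

```python
import os
import shutil
import tempfile

def restore_node(backup_dir, data_dir):
    # Copy the backed-up files into the replacement node's data directory,
    # then set truncate.chk from chaser.chk so the node truncates back to
    # the last position the chaser had confirmed and redoes the rest via
    # replication. (Assumption: check this step against the current docs.)
    for name in os.listdir(backup_dir):
        shutil.copy2(os.path.join(backup_dir, name), data_dir)
    shutil.copy2(os.path.join(data_dir, "chaser.chk"),
                 os.path.join(data_dir, "truncate.chk"))

backup = tempfile.mkdtemp()  # stands in for your backup location
data = tempfile.mkdtemp()    # stands in for the replacement node's data dir
for name in ["chunk-000000.000000", "writer.chk", "chaser.chk", "truncate.chk"]:
    with open(os.path.join(backup, name), "wb") as f:
        f.write(name.encode())  # marker content so the demo is checkable
restore_node(backup, data)
```

After this the node comes up, truncates to the checkpoint, and catches up from the other cluster members from there.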