Best practice for backup and restore of an Event Store?

Hi All,

I would like to know what are the best practices recommended for the backup/restore of a production Event Store.

Currently, we are using file-based backup tools. However, considering the nature of the Event Store, it would make a lot of sense to have a “secondary” Event Store, remotely located, with read-only access, that mirrors in near real-time everything that happens on the “primary” Event Store.

If the “primary” Event Store dies, then the “secondary” Event Store could take over.

What do you think? Is there any tooling available to do this live mirroring?

Thanks in advance,

Joannes Vermorel

This is part of what the clustering does. Running two nodes as master/slave with a read-only slave has some issues, though: how will you switch who can write? How will you determine when the other node is dead?

Hi Greg, Thanks a lot for your follow-up! Yes, clustering is probably the way to go.

My intent is not to cover a scenario with auto-recovery (desirable indeed, but that’s another angle), merely to cover the disaster scenario with manual recovery. In the past, we had a cloud provider delete all our subscriptions by mistake, all availability zones included :)

In terms of ingredients: the “backup” node should have only read-only access to production, and production should have no access to the backup (ideally the same applies to the people operating backup and production respectively). My challenge here is not to solve availability problems, but rather disaster recovery.

How would you suggest using clustering for this scenario? Thanks!

If you don’t need automatic recovery, just drop in a little program called a shovel that reads from $all as a catch-up subscription and writes to the other node :slight_smile:
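To make the shovel idea concrete, here is a minimal sketch in Python. It is not the Event Store client API: `InMemoryStore`, `read_all_from`, `append` and `shovel` are hypothetical names standing in for "read the all-events stream from a position" and "write to the other node". The only point is the shape of the loop: read forward from a checkpoint, write each event to the target, and remember how far you got.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Event:
    stream: str
    event_type: str
    data: bytes


class InMemoryStore:
    """Hypothetical stand-in for an event store node (not a real client)."""

    def __init__(self) -> None:
        self._all: List[Event] = []

    def append(self, event: Event) -> int:
        """Append an event and return its position in the global log."""
        self._all.append(event)
        return len(self._all) - 1

    def read_all_from(self, position: int) -> Iterator[Tuple[int, Event]]:
        """Yield (position, event) for every event at or after `position`."""
        for pos in range(position, len(self._all)):
            yield pos, self._all[pos]


def shovel(source: InMemoryStore, target: InMemoryStore, checkpoint: int = 0) -> int:
    """Copy events from source to target, starting at `checkpoint`.

    Returns the next position to read, so the caller can store it and
    resume later without copying the same events twice.
    """
    for pos, event in source.read_all_from(checkpoint):
        target.append(event)
        checkpoint = pos + 1
    return checkpoint


primary = InMemoryStore()
backup = InMemoryStore()
primary.append(Event("orders-1", "OrderPlaced", b"{}"))
primary.append(Event("orders-1", "OrderPaid", b"{}"))

checkpoint = shovel(primary, backup)              # first pass copies both events
primary.append(Event("orders-2", "OrderPlaced", b"{}"))
checkpoint = shovel(primary, backup, checkpoint)  # second pass copies only the new one
```

A real shovel would also need to persist the checkpoint somewhere durable and decide what to do when a write to the target fails partway through; both are left out of this sketch.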

Yes, indeed. That’s what I was thinking of. I wasn’t sure if there was a more packaged way. Thanks again.

There will be a more optimized version coming out soon. IIRC it’s done and just needs testing.

Is shovel something included, or is that just a term?

We wrote a small util we call replicator that serves this purpose of offline backup. It’s not meant to be an active node, just to write data for disaster recovery.

If it’ll help we can share the code.

@ScottCate

Sure, why not. The one I had in mind, though, is < 20 LOC. There is support built in for doing this, btw (over replication); I just need to check where it is testing-wise.

Hi Scott, Yes, a “replicator” is exactly what I am seeking. Don’t hesitate to share the code. :)

Our (Scott and I) code is a bit specialized as we moved from v1 to v3. I read using the old TCP client and write via HTTP. It was easier than running through DB upgrades. :wink:

I can clean it up a bit and share tomorrow or Tuesday.
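For readers wondering what the "write via HTTP" half of such a replicator might look like, below is a rough sketch that only builds the request and does not send it. It assumes the Event Store HTTP API accepts a POST to `/streams/{stream}` with the `application/vnd.eventstore.events+json` media type and a JSON array of events carrying `eventId`, `eventType` and `data`; please check that against the documentation for your server version. `build_append_request` is a hypothetical helper name, and the base URL and stream name are made up for the example.

```python
import json
import uuid
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

# Assumed media type for posting a batch of events over HTTP.
EVENTS_MEDIA_TYPE = "application/vnd.eventstore.events+json"


def build_append_request(
    base_url: str,
    stream: str,
    event_type: str,
    data: dict,
    event_id: Optional[str] = None,
) -> Request:
    """Build (but do not send) a POST that appends one event to `stream`."""
    body = [
        {
            # Reuse the source event's ID when one is given; otherwise make a new one.
            "eventId": event_id or str(uuid.uuid4()),
            "eventType": event_type,
            "data": data,
        }
    ]
    return Request(
        url=f"{base_url.rstrip('/')}/streams/{quote(stream, safe='')}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": EVENTS_MEDIA_TYPE},
        method="POST",
    )


req = build_append_request(
    "http://127.0.0.1:2113/",
    "orders-42",
    "OrderPlaced",
    {"total": 10},
    event_id="fbf4a1a1-b4a3-4dfe-a01f-ec52c34e16e4",
)
# urllib.request.urlopen(req) would send it to a running node.
```

Passing the original event ID through instead of generating a fresh one is a design choice of this sketch, not something taken from Scott's replicator.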

Hi Greg,

Do you have some ETA on this? This looks very interesting :slight_smile:

Best regards,

Joannes

Did this ever make it?