Best practice for backup and restore of an Event Store?

Joannes_Vermorel · February 1, 2015, 9:43am

Hi All,

I would like to know what are the best practices recommended for the backup/restore of a production Event Store.

Currently, we are using file-based backup tools. However, considering the nature of the Event Store, it would make a lot of sense to have a remotely located “secondary” Event Store, with read-only access, that mirrors in near real-time everything that happens on the “primary” Event Store.

If the “primary” Event Store dies, then the “secondary” Event Store could take over.

What do you think? Is there any tooling available to do this live mirroring?

Thanks in advance,

Joannes Vermorel

Greg_Young1 · February 1, 2015, 10:32am

This is part of what the clustering does. Running two nodes as master/slave with read-only has some issues though: how will you switch who can write? How will you determine when the other node is dead?

Joannes_Vermorel · February 1, 2015, 11:17am

Hi Greg, Thanks a lot for your follow-up! Yes, clustering is probably the way to go.

My intent is not to cover a scenario with auto-recovery (desirable indeed, but that’s another angle), merely to cover the disaster scenario along with manual recovery. In the past, we had a cloud provider delete all our subscriptions, all availability zones included, by mistake :)

In terms of ingredients: the “backup” node should have only read-only access to production, and production should have no access to the backup (ideally the same applies to the people operating backup and production respectively). My challenge here is not to solve availability problems, but rather disaster recovery.

How would you suggest using clustering for this scenario? Thanks!

Greg_Young1 · February 1, 2015, 11:23am

If you don't need automatic recovery, just drop in a little program called shovel that reads from $all as a catch-up subscription and writes to the other node.
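To make the shovel idea concrete, here is a minimal sketch in Python. It is not Greg's program and does not use the real Event Store client: `InMemoryStore`, `Event`, `read_all_from` and `append` are hypothetical stand-ins for "read the global log from a position" and "write to the other node". It covers only the catch-up part (copy everything after a checkpoint); a real shovel would then stay subscribed for live events.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class Event:
    event_id: str    # unique id, reused on the target to de-duplicate
    stream: str
    event_type: str
    data: bytes


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for an Event Store node (not the real client API)."""
    all_events: List[Event] = field(default_factory=list)
    seen_ids: Set[str] = field(default_factory=set)

    def append(self, event: Event) -> bool:
        """Append idempotently; return False if this event id was already written."""
        if event.event_id in self.seen_ids:
            return False
        self.seen_ids.add(event.event_id)
        self.all_events.append(event)
        return True

    def read_all_from(self, position: int, max_count: int) -> List[Event]:
        """Return one page of the global log starting at `position`."""
        return self.all_events[position:position + max_count]


def shovel(source: InMemoryStore, target: InMemoryStore,
           checkpoint: int = 0, page_size: int = 100) -> int:
    """Copy every event after `checkpoint` from source to target.

    Returns the new checkpoint, which the caller should persist so the
    next run resumes where this one stopped.
    """
    while True:
        page = source.read_all_from(checkpoint, page_size)
        if not page:
            return checkpoint
        for event in page:
            target.append(event)   # safe to repeat: duplicates are ignored
            checkpoint += 1
```

Because the write is keyed on the event id, re-running the shovel from an older checkpoint after a crash does not duplicate events on the backup node.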

Joannes_Vermorel · February 1, 2015, 11:29am

Yes, indeed. That’s what I was thinking of. I wasn’t sure if there was a more packaged way. Thanks again.

Greg_Young1 · February 1, 2015, 11:30am

There will be a more optimized version coming out soon. IIRC it's done and just needs testing.

Scott_Cate · February 1, 2015, 2:53pm

Is shovel something included, or is that just a term?

We wrote a small util we call replicator that serves this purpose of offline backup. It’s not meant to be an active node, just writing data for disaster recovery.

If it’ll help we can share the code.

@ScottCate

Greg_Young1 · February 1, 2015, 3:26pm

Sure, why not. The one I had in mind though is < 20 LOC. There is support built in for doing this btw (over replication); I just need to check where it is testing-wise.

Joannes_Vermorel · February 1, 2015, 3:39pm

Hi Scott, Yes, a “replicator” is exactly what I am seeking. Don’t hesitate to share the code. :)

Chris_Martin · February 1, 2015, 9:31pm

Our (Scott and I) code is a bit specialized as we moved from v1 to v3. I read using the old TCP client and write via HTTP. It was easier than running through DB upgrades.

I can clean it up a bit and share tomorrow or Tuesday.
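Until the real code is posted, here is a rough sketch of what the HTTP write side of such a replicator might look like. It only builds the request and does not send it. The URL shape (`/streams/{stream}`), the `application/vnd.eventstore.events+json` media type and the `eventId`/`eventType`/`data` fields are what I recall from the Event Store HTTP API, so treat them as assumptions and check them against the docs for your version. `build_append_request` is a hypothetical helper, not part of any client library.

```python
import json
import uuid
from typing import Optional
from urllib.request import Request


def build_append_request(base_url: str, stream: str, event_type: str,
                         data: dict, event_id: Optional[str] = None) -> Request:
    """Build (but do not send) a POST that appends one event to `stream`.

    Passing the source event's id as `event_id` lets the same event be
    re-sent after a failure under the same id; a fresh UUID is generated
    only when no id is given.
    """
    body = [{
        "eventId": event_id or str(uuid.uuid4()),
        "eventType": event_type,
        "data": data,
    }]
    return Request(
        url=f"{base_url}/streams/{stream}",
        data=json.dumps(body).encode("utf-8"),
        # Assumed media type for a batch of events; verify for your version.
        headers={"Content-Type": "application/vnd.eventstore.events+json"},
        method="POST",
    )
```

Sending it is then a matter of `urllib.request.urlopen(request)` against the backup node, with whatever authentication that node requires.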

Joannes_Vermorel · February 18, 2015, 7:27am

Hi Greg,

Do you have an ETA on this? This looks very interesting.

Best regards,

Joannes

David_Cumps · December 9, 2015, 9:52pm

Did this ever make it?