Archiving some of the events in the stream for backup

Hi

I have an EventStore machine that’s been running in production for a while and, no surprise, it’s running out of disk space. I would like to archive events older than 3 months and move that older archived data somewhere else. So all my streams in this production instance would end up containing just the events emitted in the past 3 months, and the db folder should contain just that.

Any hints on how to do that? It seems I can move the chunk files elsewhere, but I’m expecting EventStore to throw some sort of error if it can’t find all the data in the db folder?

Thank you.

So the easiest way of doing this now would be to read from $all and then write the events off some place else (up until 3 months ago). You can then put a max-age on the streams and those events will disappear. I will look at copying files; that would possibly be an easier way. (With the one I mention you can have varying retention strategies for different streams, e.g. I am pretty sure you don't want to delete your user information.)

Thank you. I am in the lucky case where all of my streams contain just user-product associations, so I can segment this data in any way and not lose metadata; user characteristics and product characteristics are stored elsewhere.

May I just reiterate to confirm I understood what you explained:

  • set up a different instance of ES, read from my streams up until 3 months ago (minus, say, a 1-day buffer - see last step) and publish everything to this new instance; this will be the backup. I will have to live with the events’ timestamps being different in this new instance;

  • set up max age on my current live instance to 3 months and wait for the db folder to shrink down (see the metadata sketch after this list);

  • set up max age back to ‘forever’, otherwise I will keep losing the data older than max-age each day? (I am assuming max age works as a sliding window; this is why I was thinking about the 1-day buffer in the first step).
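On the max-age step: it is just a piece of stream metadata. A hedged sketch with the same TypeScript client (the stream name is a placeholder; maxAge is in seconds):

    import { EventStoreDBClient } from "@eventstore/db-client";

    const client = EventStoreDBClient.connectionString("esdb://live-node:2113?tls=false");

    // $maxAge is expressed in seconds; roughly 3 months here.
    // Older events stop being readable straight away, but the disk space
    // only comes back after a scavenge (discussed later in the thread).
    const THREE_MONTHS = 60 * 60 * 24 * 90;

    await client.setStreamMetadata("user-42-products", { maxAge: THREE_MONTHS });

    // Metadata writes replace the previous metadata, so setting max age back
    // to "forever" later just means writing metadata without the maxAge field.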

About just moving the files: it seems I am also lucky in that regard, as they seem to be split up roughly by day. I’m wishfully thinking I could just move them and everything would still work. Sure, this could work fine provided ES does not crash on the next reboot. I presume it checks for some consistency between the data in the index, the .chk files, and the chunk files; I don’t know what the internals are. Or maybe it would die catastrophically if I try to retrieve an event whose chunk is gone, etc.

You can even make this a long-running process.

As you read from $all moving events, remember where you were
(checkpoint). Keep maxAge at, say, 4 weeks (the events disappear then) and every
night move that day's events over to the other system.

And sorry - that should be 4 months, not 4 weeks.
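A rough sketch of that long-running copier, again assuming the gRPC TypeScript client; the checkpoint file location is made up, and persisting the $all position is what lets the process stop and resume each night:

    import { EventStoreDBClient, START } from "@eventstore/db-client";
    import { promises as fs } from "fs";

    const source = EventStoreDBClient.connectionString("esdb://live-node:2113?tls=false");
    const CHECKPOINT_FILE = "./archive-checkpoint.json"; // made-up location

    // Load the last $all position we processed, or start from the beginning.
    async function loadCheckpoint() {
      try {
        const raw = JSON.parse(await fs.readFile(CHECKPOINT_FILE, "utf8"));
        return { commit: BigInt(raw.commit), prepare: BigInt(raw.prepare) };
      } catch {
        return START;
      }
    }

    const subscription = source.subscribeToAll({ fromPosition: await loadCheckpoint() });

    for await (const resolved of subscription) {
      const event = resolved.event;
      if (!event || event.type.startsWith("$")) continue; // skip system/metadata events

      // ... write the event to the other system here ...

      // Persist the position so the copier can be stopped and resumed any time.
      await fs.writeFile(
        CHECKPOINT_FILE,
        JSON.stringify({
          commit: event.position.commit.toString(),
          prepare: event.position.prepare.toString(),
        })
      );
    }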

Oh, I see, sounds good.

Are you talking about the scenario where I write the component to read from one ES instance and write to the other, or about just moving the files? What you explained could work for both :).

The component that reads from $all and writes to the other.

Thank you.

I just had an idea that’s a mix of both:

  • copy all of the files elsewhere, like in the normal backup procedure

  • set up max age

  • set up the automatic file backup system

When needed, this backup location will have all of the files that make up the complete streams (as if I had never set max age), so I can just spin up an instance that has everything. So when I need older data it will just be a matter of getting an Amazon instance with enough disk space to run it for a short while.

Could this work or does setting max age do something to the .chk/index files so it won’t realize that there’s data older than max age available?

That will work so long as you disable chunk merging. If you ever restore
from it you would only see the last 3 months' worth of data, but if you
took the maxAge off you would see more.

Following $all with ReadAllForward is probably the simplest option.

Got it, thank you.

I'm thinking of serializing the resolved events into JSON and continuously appending them to file storage.
If I ever need them back in an ES db again, what would be the best way to reconstruct the db from the resolved events?
The order is not guaranteed (competing consumers), so first of all I would order them by stream and then by position, I assume.
As above, the new timestamps would not have any useful meaning.
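A hedged sketch of reading such a JSON archive back in and rebuilding the streams, ordering by the original per-stream revision before re-appending; the file layout and field names here are just an assumed example of what the archiver might have written:

    import { EventStoreDBClient, jsonEvent } from "@eventstore/db-client";
    import { promises as fs } from "fs";

    // One JSON object per line, written out earlier by the archiver.
    // The shape is whatever was chosen when serialising; this is an assumption.
    interface ArchivedEvent {
      streamId: string;
      revision: string;   // original per-stream position, stored as a string
      id: string;
      type: string;
      data: unknown;
      metadata?: unknown;
    }

    const client = EventStoreDBClient.connectionString("esdb://restore-node:2113?tls=false");

    const lines = (await fs.readFile("./archive.ndjson", "utf8")).split("\n").filter(Boolean);
    const events = lines.map((line) => JSON.parse(line) as ArchivedEvent);

    // Group by stream, then restore each stream in its original order.
    const byStream = new Map<string, ArchivedEvent[]>();
    for (const e of events) {
      const bucket = byStream.get(e.streamId) ?? [];
      bucket.push(e);
      byStream.set(e.streamId, bucket);
    }

    for (const [streamId, streamEvents] of byStream) {
      streamEvents.sort((a, b) => Number(BigInt(a.revision) - BigInt(b.revision)));
      await client.appendToStream(
        streamId,
        // Reusing the original event ids makes accidental duplicate appends idempotent.
        streamEvents.map((e) =>
          jsonEvent({ id: e.id, type: e.type, data: e.data as any, metadata: e.metadata as any })
        )
      );
    }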

If you use a catch-up subscription the order would be assured.

Also: https://github.com/EventStore/EventStore/issues/629

I imagined this was maybe a good use case for competing consumers, in that it would be a durable and lightweight continuous backup.

A service runs 1-n writers, and a monitor that reads the subscription stats to see whether to add/remove writers. With autoAck = false the server keeps track of what has been processed.

Catch-up with a checkpoint does almost the same, except with only one writer. You described a multi-catch-up scenario with a stream for coordination ("I lead") - would that be applicable?
Writing the same event more than once isn't a problem though, as duplicates can be filtered out when (if) reading them back. Maybe that's why competing consumers aren't necessary here.
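For what it's worth, the competing-consumers variant being described would look roughly like the following with the TypeScript client's persistent subscriptions; the stream and group names are placeholders, and the manual acks are what lets the server track progress. This is a sketch, not a recommendation over the catch-up approach:

    import {
      EventStoreDBClient,
      persistentSubscriptionToStreamSettingsFromDefaults,
    } from "@eventstore/db-client";

    const client = EventStoreDBClient.connectionString("esdb://live-node:2113?tls=false");

    // Create the group once (placeholder stream and group names).
    await client.createPersistentSubscriptionToStream(
      "user-42-products",
      "archive-writers",
      persistentSubscriptionToStreamSettingsFromDefaults()
    );

    // Each writer process connects to the same group; the server shares the
    // events out between them and remembers which ones were acknowledged.
    const subscription = client.subscribeToPersistentSubscriptionToStream(
      "user-42-products",
      "archive-writers"
    );

    for await (const resolved of subscription) {
      try {
        // ... write the event to the archive store here ...
        await subscription.ack(resolved);
      } catch (err) {
        await subscription.nack("retry", String(err), resolved);
      }
    }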

I would still stay on a catch-up subscription for this use case (better to have 2
catch-up subscriptions in total than to try to round-robin consumers).

Hi

Have I misunderstood that setting maxAge to something will erase some of the files from the db folder? I have set it to 95 days, checked the metadata of the stream and it has been registered, and the stream also appears to no longer start at 0 - which is correct.

However, I still have all of the files on disk, including those for the old dates. I made the change around 16h ago. Do they ever get erased or should I do it manually? Or do I just have to wait 24h for this to be picked up?

Don’t delete those files! After a scavenge operation, chunks get compacted, not erased.

As Joao said. Run a scavenge ...
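For reference, a scavenge can be started from the UI or over the HTTP admin API; the latter makes it easy to schedule for off-peak hours. A minimal sketch (Node 18+ for the built-in fetch; default admin credentials assumed):

    // POST /admin/scavenge starts a scavenge on that node and returns its id.
    const response = await fetch("http://localhost:2113/admin/scavenge", {
      method: "POST",
      headers: {
        Authorization: "Basic " + Buffer.from("admin:changeit").toString("base64"),
      },
    });

    console.log(response.status, await response.text());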

Thank you; I’m reading on this group that scavenge is an online operation and all I have to do is click the Scavenge button in the UI. There were some older messages saying something about chunk merging options being set - is that still the case? I have the default setting for chunk merging.

The default setting is to merge chunks.

Yes, this is an online operation. Note that scavenge does do disk-based operations and on a system with heavy load it can cause a performance drop while it's running. The operation is best scheduled at times when heavy load is not expected (it is writing to disk and as such might cause contention on the disk with active writes). For systems not under heavy load (1000s of writes/second) you will likely only see a small increase in latency. If you are running in a cluster, scavenging a single node that is a slave will normally not show any performance difference.

Cheers,

Greg