What would best practice be for handling about 1 GB of data a day?

I appreciate the standard answer to this problem is "don't event source everything / as much". The problem is that the goal is 10x growth, which takes us up to 10 GB a day.

This isn’t a storage problem per se - it’s more a cost × time problem. In this situation old streams are probably never going to be read again - really, once a stream is over 5 days old it will not be read again. I.e. the investment in storing them over time will never give a return after 5 days (except for a few cases with audit requirements - but keeping those is fine).

There must be many systems that operate at this level but are cost-sensitive.
What is the best practice?

I could just delete them, but Feels Bad Man.
Is there a way of moving part of the store onto cold storage (and then back?!)?

I feel like I’ve backed myself into a simple logical contradiction - I just want someone to confirm my options are simply:
a) delete it,
or b) pay for it.

Or even better, give me more options - make this a really interesting bit of pain :smiley:

Hi @john_nicholas,

I’m relatively new to Event Sourcing, so take this with a grain of salt. But poking around the DDD/CQRS forums, I’ve seen the advice to follow the example of accountants, who have a year-end process to close out the old books and start new ones. In that process, the final account balance is calculated, and new books are started with the previous year’s balance as the first entry. The old books can be consulted if need be, but they don’t need to be actively carried around. (And, similarly to database sizes, ledger books only have so many pages.)

If your data really isn’t useful after 5 days or so, perhaps you can have some automatic process of archiving old streams (copy to S3 or something) so that you don’t need to keep carrying them around in your event store?
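To make the archiving idea concrete, here’s a minimal sketch of that copy-then-delete flow. All the names here are hypothetical, and a local gzip file stands in for S3 (in production the write would be an S3 PUT and you’d only delete the hot copy after verifying the upload):

```python
import gzip
import json

# Hypothetical in-memory "event store": stream name -> list of events.
store = {
    "order-1": [{"type": "OrderPlaced", "ts": 0}, {"type": "OrderShipped", "ts": 1}],
    "order-2": [{"type": "OrderPlaced", "ts": 2}],
}

def archive_stream(events, path):
    """Copy a stream's events to compressed cold storage (a local gzip
    file here, standing in for an S3 object), one JSON event per line."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for e in events:
            f.write(json.dumps(e) + "\n")

def restore_stream(path):
    """Read an archived stream back, e.g. for an audit request."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]

archive_stream(store["order-1"], "order-1.jsonl.gz")
restored = restore_stream("order-1.jsonl.gz")
assert restored == store["order-1"]  # verify the copy before deleting
del store["order-1"]                 # free the hot storage
```

The key property is that the delete only happens after the archived copy has been read back and verified, so the worst case is paying for both copies briefly rather than losing data.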

There are multiple strategies that come to mind.

  • When there’s a guarantee that the stream won’t be updated, it can be snapshotted to the latest state and closed with an explicit event saying “you can’t write here anymore”. The snapshot would represent the latest entity state, which can no longer change. Then truncate the stream before that event. The drawback is that you lose the history.

  • Similarly, offload the state to another storage, like a blob. The downside is the same, plus if you need to load the entity state, you’d need to query two storages.

  • Keep regular snapshots for each stream (the usual pattern when you have lots of events in one stream), and decide how much history you need to keep in addition to the snapshots.
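The first strategy above (snapshot, close, truncate) can be sketched roughly like this. The event types and the fold function are made up for illustration; the point is just that the final state plus a terminal event replaces the full history:

```python
def apply(state, event):
    """Toy fold: events adjust an account balance."""
    if event["type"] == "Deposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {"balance": state["balance"] - event["amount"]}
    return state

def close_and_truncate(events):
    """Fold the stream into its latest state, then return the truncated
    stream: a snapshot carrying that state plus a terminal event that
    forbids further writes. Everything before the snapshot is dropped."""
    state = {"balance": 0}
    for e in events:
        state = apply(state, e)
    return [
        {"type": "Snapshot", "state": state},
        {"type": "StreamClosed"},
    ]

stream = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
]
truncated = close_and_truncate(stream)
# The two-event truncated stream replaces the full history;
# truncated[0] holds the final balance of 70.
```

This is exactly the accountant analogy from earlier in the thread: the snapshot is the opening balance of the new books, and the old books can live in cold storage (or be deleted) because the closed stream can never change again.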