We have an EventStoreDB running that’s taking up 100GB, and we didn’t understand where that space was going, so we wrote a small Python tool to scan the $all stream and count up events per stream. A $projections-ProjectionName-order stream is taking up one of the top spots, even though the projection's output stream itself is truncated.
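The tool itself isn't shown here. A minimal sketch of just the counting step might look like the following, assuming the events read from $all are already available as an iterable of records. The "stream_id" key and the sample data are hypothetical, and the actual read from the server is left out because it depends on the client library in use.

```python
from collections import Counter

def count_events_per_stream(events):
    """Count events per stream name.

    `events` is any iterable of dicts with a "stream_id" key (a hypothetical
    shape; adapt it to whatever your client library returns for $all).
    """
    counts = Counter()
    for event in events:
        counts[event["stream_id"]] += 1
    return counts

def top_streams(counts, n=10):
    """Return the n streams with the most events, largest first."""
    return counts.most_common(n)

# Made-up sample standing in for a real $all scan.
sample = (
    [{"stream_id": "$projections-Projection-order"}] * 5
    + [{"stream_id": "Projection"}] * 2
    + [{"stream_id": "Test1"}] * 3
)
counts = count_events_per_stream(sample)
```

Counting events is only a proxy for disk usage; summing the payload sizes per stream instead would give a closer picture.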
I’ve performed the following reproduction steps:
- Start a new EventStoreDB (5.0.11 in my test)
- Create a projection that links two streams, Test1 and Test2, into a stream Projection
- Dump 5000 events in Test1 and Test2
- Observe a lot of $projections-Projection-order events being created detailing the offsets in Test1 and Test2 for every event in Projection.
- Set a $maxCount of 1000 in Projection
- Dump events into a filler stream until the chunk is written
- Scavenge.
After these steps, only 1000 events from Projection remain in $all, but all 10000 events from $projections-Projection-order are still present.
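For reference, the $maxCount step in the list above amounts to writing stream metadata for Projection. A rough sketch of building that metadata body is below; the helper names are made up for illustration, and actually writing the body to the server (via a client library or the HTTP API) is not shown.

```python
import json

def metadata_stream_name(stream):
    """Name of the metadata stream for `stream` (the "$$" prefix convention)."""
    return "$$" + stream

def retention_metadata(max_count=None, max_age_seconds=None):
    """Build a stream-metadata body with $maxCount and/or $maxAge (seconds)."""
    metadata = {}
    if max_count is not None:
        metadata["$maxCount"] = max_count
    if max_age_seconds is not None:
        metadata["$maxAge"] = max_age_seconds
    return metadata

# Metadata body for the repro: keep only the last 1000 events of Projection.
body = json.dumps(retention_metadata(max_count=1000))
target = metadata_stream_name("Projection")
```

The metadata only marks older events as eligible for removal; the space is reclaimed by the scavenge, which is why the repro fills the chunk first.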
What are we doing wrong? Also: how do we clean those up too?
First: I strongly encourage you to plan an upgrade to the 20.10.x or 21.10.x LTS versions.
The 5.0.x versions are out of support.
Let me try this out and get back to you.
Some quick math on the repro you gave tells me that the $projections-…-order stream in this case will take approximately 1 MB of space in the data files. The space used in the indexes is negligible in this case.
The total size due to the events themselves is probably a few orders of magnitude bigger.
How are you concluding they’re negligible? Using the “highly advanced debug technique” of strings -n 10 chunk-000000.000001, they just look like normal JSON blobs of {streamName: position, ...}. And while the regular streams can be discarded and truncated regularly via $maxAge/$maxCount, the order streams will (apparently?) grow forever. Assuming a size of 100 bytes per event, even at 100 million events we’re already talking about 10GB per projection. Do the newer versions use a compressed format here?
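To make the arithmetic explicit, here is a back-of-the-envelope sketch. The 100 bytes per event is the same assumption as above, not a measured value, and the function name is made up for illustration.

```python
def order_stream_bytes(event_count, bytes_per_event=100):
    """Rough size of an -order stream, ignoring per-record overhead and indexes."""
    return event_count * bytes_per_event

# The repro: 10000 order events, roughly 1 MB.
repro_bytes = order_stream_bytes(10_000)

# A long-running projection: 100 million order events, roughly 10 GB.
large_bytes = order_stream_bytes(100_000_000)

repro_mb = repro_bytes / 10**6
large_gb = large_bytes / 10**9
```

Both figures line up: the 1 MB estimate holds for the repro, while the 10GB figure is what the same per-event cost adds up to at production volumes.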
edit: Reproduced the issue on 21.10.2-bionic. The format is unchanged.
edit: Especially for maxAge, it seems like it should be safe to truncate -order automatically? Since the projection will never be able to have events before that point even if it’s reset?
How are you concluding they’re negligible
The space used in the indexes is negligible
I’m talking about the space in the index files, not the data files.
The 1 MB I came up with was based on the repro you provided, not the real number of events in your production system, which was unknown at the time.
it seems like it should be safe to truncate -order automatically?
Probably. It needs to be tested though, and since this stream exists for the purpose of reordering, how many events to keep in it needs to be thought about.
Ah okay.
Well, for our purposes, it would be entirely sufficient to have something like an assurance that setting maxAge or maxCount on an -order stream would not break things, and that a rebuild would simply not generate events before that point. Then we could configure cleanup on our end. But when we tried that, there were some weird projection issues - and it’s not like the $projections-Foo-order streams are documented anywhere, anyways.
edit: Or even a way to turn them off if we don’t care about faithful recreation on reset.