Anonymisation of EventStore data.

Why do the stopping part? You can do this without any stopping of nodes…


Quick note to let you know this is still on my horizon and I’m still intending to make it, if nobody else beats me to it.

Hi,

I am interested in this subject. I tried a very simple implementation on my side as well: a catch-up subscription on $all from the beginning (SubscribeToAllFrom(Position.Start), right?). It takes a lot of time, no? I let it run on a workstation overnight (~14h) and it had only processed around 260k events out of the 15M.
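For reference, a minimal sketch of that approach (a catch-up subscription on $all from Position.Start), assuming the classic EventStore.ClientAPI TCP client; the endpoint, credentials and handler body are placeholders, and the exact eventAppeared signature varies a little between client versions (the v5 client expects an async handler):

```csharp
using System;
using System.Net;
using System.Threading.Tasks;
using EventStore.ClientAPI;
using EventStore.ClientAPI.SystemData;

class CatchUpAllSketch
{
    static async Task Main()
    {
        // Placeholder endpoint and credentials - adjust for your cluster.
        var conn = EventStoreConnection.Create(new IPEndPoint(IPAddress.Loopback, 1113));
        await conn.ConnectAsync();

        long seen = 0;

        // Catch-up subscription over $all, starting from the very beginning of the log.
        conn.SubscribeToAllFrom(
            Position.Start,
            CatchUpSubscriptionSettings.Default,
            eventAppeared: (sub, resolved) =>
            {
                // Skip system/projection events (their types start with '$').
                if (!resolved.Event.EventType.StartsWith("$"))
                    seen++; // anonymise / copy the event here
                return Task.CompletedTask;
            },
            liveProcessingStarted: sub => Console.WriteLine($"Caught up after {seen} events"),
            userCredentials: new UserCredentials("admin", "changeit"));

        Console.ReadLine(); // keep the process (and the subscription) alive
    }
}
```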

Is that what you intend to do? Does it take that long on your side as well? Did you try another strategy?

NB: we also tried reading $streams and rebuilding each stream in parallel, which takes around 1h30 on the same computer. The drawback is that it does not preserve ordering between events of different streams, but we should not be relying on that anyway, right?
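A rough sketch of that parallel variant, again assuming the classic .NET client; $streams is a system projection stream (so the system projections must be running and reading it needs admin credentials), and the anonymise hook and target connection are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using EventStore.ClientAPI;
using EventStore.ClientAPI.SystemData;

static class StreamCopier
{
    // Enumerate stream names via the $streams system projection.
    public static async Task<List<string>> ListStreamsAsync(IEventStoreConnection conn, UserCredentials creds)
    {
        var names = new List<string>();
        long next = 0;
        StreamEventsSlice slice;
        do
        {
            // resolveLinkTos: false - we want the link events themselves ("0@<streamName>").
            slice = await conn.ReadStreamEventsForwardAsync("$streams", next, 500, false, creds);
            names.AddRange(slice.Events.Select(e =>
            {
                var body = Encoding.UTF8.GetString(e.Event.Data);
                return body.Substring(body.IndexOf('@') + 1);
            }));
            next = slice.NextEventNumber;
        } while (!slice.IsEndOfStream);
        return names;
    }

    // Copy one stream page by page; 'anonymise' is the hypothetical transform hook.
    public static async Task CopyStreamAsync(IEventStoreConnection source, IEventStoreConnection target,
                                             string stream, Func<ResolvedEvent, EventData> anonymise)
    {
        long next = 0;
        StreamEventsSlice slice;
        do
        {
            slice = await source.ReadStreamEventsForwardAsync(stream, next, 500, false);
            if (slice.Events.Length > 0)
                await target.AppendToStreamAsync(stream, ExpectedVersion.Any, slice.Events.Select(anonymise));
            next = slice.NextEventNumber;
        } while (!slice.IsEndOfStream);
    }
}
```

Usage would be roughly Task.WhenAll over the stream names, e.g. awaiting CopyStreamAsync for each stream, throttled to whatever degree of parallelism the node can comfortably serve.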

Thanks for your feedback.
Clément

Hi Greg,

Did anyone get around to writing such a tool?

If not, we are planning to try out a subscription setup as you described, pushing data from the production event store to a ‘staging’ one with anonymisation applied in between for certain events containing personal data.

Kind regards,

Mark

Hi Mark, I haven’t got to a place at work yet where I need to start on it, but I intend to do a really good job of it when the time comes. Would it be possible to start an open source project that allows injection of event modifiers?

To be fair, such a system would take all of about an hour to write at a basic level, and maybe a day for something a bit more ready for use (e.g. being able to specify base/custom transformations on fields for anonymisation, etc.).

As a starting point, just take a catch-up subscription -> transform through pipeline -> write.

Then it’s a matter of filling out the possible transformations.
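A rough sketch of that shape, assuming the classic EventStore.ClientAPI client (v5-style async handlers); IEventModifier, MaskEmail and AnonymisingCopier are hypothetical names invented for this example, not an existing API, and real field-level transformations would be plugged in the same way:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using EventStore.ClientAPI;

// Hypothetical extension point: each modifier returns a (possibly rewritten) copy of an event.
public interface IEventModifier
{
    EventData Modify(string stream, EventData evt);
}

// Example modifier, illustration only: crude masking of an "email" field in JSON payloads.
// A real implementation would parse the JSON rather than do a string replace.
public class MaskEmail : IEventModifier
{
    public EventData Modify(string stream, EventData evt)
    {
        var json = Encoding.UTF8.GetString(evt.Data).Replace("\"email\":", "\"email_removed\":");
        return new EventData(evt.EventId, evt.Type, evt.IsJson, Encoding.UTF8.GetBytes(json), evt.Metadata);
    }
}

public class AnonymisingCopier
{
    private readonly IEventStoreConnection _source;
    private readonly IEventStoreConnection _target;
    private readonly List<IEventModifier> _pipeline;

    public AnonymisingCopier(IEventStoreConnection source, IEventStoreConnection target,
                             IEnumerable<IEventModifier> pipeline)
    {
        _source = source;
        _target = target;
        _pipeline = pipeline.ToList();
    }

    public void Start()
    {
        // catch-up subscription -> transform through pipeline -> write
        _source.SubscribeToAllFrom(
            Position.Start,
            CatchUpSubscriptionSettings.Default,
            async (sub, resolved) =>
            {
                var e = resolved.Event;
                if (e == null || e.EventType.StartsWith("$"))
                    return; // skip system/projection events

                var data = new EventData(e.EventId, e.EventType, e.IsJson, e.Data, e.Metadata);
                foreach (var modifier in _pipeline)
                    data = modifier.Modify(e.EventStreamId, data);

                await _target.AppendToStreamAsync(e.EventStreamId, ExpectedVersion.Any, data);
            });
    }
}
```

Checkpointing the last processed Position somewhere durable would be an obvious next addition, so a long-running copy can resume instead of restarting from the beginning of $all.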

Hi Greg,

You say it takes a few hours, but don’t you run into the very long durations reading $all, as described below? Or am I missing something?

Thanks