Hi all,
My organisation is looking at how we can meet the new GDPR requirements with the event store. Given that the events in the event store are immutable, which approaches have worked well for those who have needed to ‘delete’ customer data?
Are encryption strategies (crypto-trash) the only way to delete data? That is the only one I can think of.
Thoughts?
Kind regards,
Mark
For all events up to a certain point, truncateBefore.
For specific events, rewrite the stream as a new stream.
You can delete a stream then scavenge the database and it will be physically removed from disk.
Right now it will only actually rewrite the chunk if the result is smaller than the original, but we could trivially add a flag to do it anyway.
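As a rough sketch of the delete-then-scavenge route over Event Store's HTTP API, the snippet below builds (but does not send) a hard delete for one stream and a scavenge request. The base URL, the helper names and the `send` function are assumptions made for illustration; the `ES-HardDelete` header, the `/admin/scavenge` endpoint and the `$tb` (truncate before) metadata key are the parts taken from Event Store itself. Check them against the version you run before relying on this.

```python
import base64
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:2113"  # assumption: local node on the default HTTP port


def _basic_auth(user: str, password: str) -> str:
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def hard_delete_request(stream: str, user: str, password: str) -> urllib.request.Request:
    """Build a hard delete for one stream (the stream is tombstoned and cannot be recreated)."""
    return urllib.request.Request(
        f"{BASE_URL}/streams/{urllib.parse.quote(stream, safe='')}",
        method="DELETE",
        headers={"ES-HardDelete": "true", "Authorization": _basic_auth(user, password)},
    )


def scavenge_request(user: str, password: str) -> urllib.request.Request:
    """Build the request that starts a scavenge; needs an admin or ops user."""
    return urllib.request.Request(
        f"{BASE_URL}/admin/scavenge",
        data=b"",
        method="POST",
        headers={"Authorization": _basic_auth(user, password)},
    )


def truncate_before_metadata(event_number: int) -> dict:
    """Stream metadata that hides every event before event_number until the next scavenge removes it."""
    return {"$tb": event_number}


def send(request: urllib.request.Request) -> int:
    """Hypothetical helper: actually perform the request and return the HTTP status."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send(hard_delete_request(...))` followed by `send(scavenge_request(...))` would be the two steps described above; whether the chunk is physically rewritten still depends on the size check mentioned in the post.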
Hi Greg,
Thanks for the response. I think that would work for us: a customer’s data is on its own stream, so deleting that stream would be useful. However, if that data is then projected onto other streams, I’d guess we’d have a problem on those streams. I’ll look into prototyping this to see what it looks like.
Kind regards,
Mark
Is it projected with a linkTo? If so, the linkTo will stop working when the underlying event is deleted, and it will later be scavenged away.
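To illustrate why a link stops working, here is a tiny in-memory model (not Event Store code). It assumes a link event stores only a pointer of the form `<eventNumber>@<stream>`, so once the target stream is gone there is nothing left to resolve. The `streams` dictionary, the stream names and `resolve_link` are made up for the example.

```python
from typing import Optional

# In-memory stand-in for an event store: stream name -> list of events.
streams = {
    "customer-42": [{"type": "NameChanged", "name": "Alice"}],
    # A link holds no customer data itself, only a pointer to the original event.
    "customers-by-region": ["0@customer-42"],
}


def resolve_link(link: str) -> Optional[dict]:
    """Follow a '<eventNumber>@<stream>' pointer; return None if the target no longer exists."""
    number, _, target = link.partition("@")
    events = streams.get(target)
    if events is None or int(number) >= len(events):
        return None
    return events[int(number)]
```

This only covers links. A projection that copies the data into a new event on another stream would leave that copy behind, so that stream would need to be handled separately.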
This fits with what we are seeing when deleting and scavenging. Are there plans to add this flag?
It would be pretty trivial to add, just a bit #WTF (we are making your data bigger by removing some).
So removing a stream and scavenging can result in a larger chunk?
Unfortunately we are trying to find a way to ensure the data is removed in order to comply, even if this results in the chunk being larger.
Yes, it can. In order to remove data, a map needs to be written adjusting positions. The map can be larger than the data saved by scavenging. It is trivial enough to disable this check, but by default it looks to see if it will actually save space by removing things, given the cost of the map.
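The check described here can be pictured roughly as below. This is a sketch of the reasoning only, not the actual scavenge implementation; the function name, its parameters and the `always_rewrite` flag (standing in for the option being requested in this thread) are all hypothetical.

```python
def should_rewrite_chunk(
    original_size: int,
    kept_data_size: int,
    position_map_size: int,
    always_rewrite: bool = False,
) -> bool:
    """Decide whether a scavenged chunk replaces the original one.

    The scavenged chunk holds the surviving data plus a map that adjusts positions.
    By default it is only written when that total is smaller than the original.
    """
    scavenged_size = kept_data_size + position_map_size
    return always_rewrite or scavenged_size < original_size
```

With illustrative numbers: a 1000-byte chunk that keeps 950 bytes and needs a 100-byte map would come out at 1050 bytes, so by default it is left alone and the deleted data stays on disk. Forcing the rewrite removes the data at the cost of a larger chunk.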
Thanks Greg, I understand.
Sorry to repeat my initial question: are there any plans to implement the ability to disable this check? Being sure that the data is removed matters more to the business I am working for than the extra space it might take.
It's a pretty trivial thing to do. Want to put up a GitHub issue? My guess is it's a few hours of work, mostly to add the command line option.
Thanks Greg, I have done that. #1570
Hi guys,
Reviving this thread, as our org is looking into this right now as
well (last minute, but still).
For me, a small fraction of user/customer events would be considered
private data (change name/email/etc.), and the rest are non-private.
The easiest way I can think of to be GDPR-compliant is to use
symmetric AES encryption for just the private data. So encrypt those
event fields (name, email) with a user-specific key (generated on user
creation). That way the event store and any read models are considered
"anonymised", and therefore don't contain private data. Same with
backups(!). You only decrypt the encrypted fields on viewing in the UI,
using the user-specific key (probably along with some other keys as
well, we will use three keys total).
When a user invokes Article 17 ("right to erasure") you throw away
that user's key. Your system, as a whole, is now considered purged of
that private data, and you didn't have to change any databases or
event store.
Seems like a fairly straightforward way to go about it.
/Rickard
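A minimal sketch of this per-user key approach follows, assuming the third-party `cryptography` package and AES-256-GCM as the symmetric cipher (the post only says "symmetric AES"). The in-memory `user_keys` store and all function names are hypothetical, and the sketch uses a single key per user rather than the three keys mentioned above.

```python
import os
from typing import Dict, Optional

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Hypothetical key store: user id -> AES key. It must live outside the event
# store and its backups, otherwise deleting a key erases nothing.
user_keys: Dict[str, bytes] = {}


def create_user_key(user_id: str) -> None:
    """Generate the user's key once, on user creation."""
    user_keys[user_id] = AESGCM.generate_key(bit_length=256)


def encrypt_field(user_id: str, value: str) -> str:
    """Encrypt one private field; the random nonce is stored in front of the ciphertext."""
    nonce = os.urandom(12)
    ciphertext = AESGCM(user_keys[user_id]).encrypt(nonce, value.encode("utf-8"), None)
    return (nonce + ciphertext).hex()


def decrypt_field(user_id: str, stored: str) -> Optional[str]:
    """Decrypt a field for display; return None once the user's key has been erased."""
    key = user_keys.get(user_id)
    if key is None:
        return None
    raw = bytes.fromhex(stored)
    return AESGCM(key).decrypt(raw[:12], raw[12:], None).decode("utf-8")


def erase_user(user_id: str) -> None:
    """Right to erasure: throw away the key; stored events are left untouched."""
    user_keys.pop(user_id, None)
```

An event would then carry something like `{"type": "EmailChanged", "userId": "u1", "email": encrypt_field("u1", "alice@example.com")}`, with non-private fields left in the clear. After `erase_user("u1")` the stored value can no longer be decrypted, in the event store, the read models or the backups.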
Hi Andrew,
Good observations. For us, invoking Right to Erasure means you’re no longer a customer to us, so we’ll just delete everything. We have no internal needs for erasure, so this is just for compliance.
We do full blue/green deployments all the time, which involves importing events from an existing cluster into a fresh one. That gives us a place to encrypt existing events, as well as to re-encrypt them in the future. YMMV of course.
Hi,
We are taking the same approach, anonymising fields where necessary whilst providing access controls around who can see the data.
What encryption are you planning to use?
Kind regards
Sean.