Handling GDPR data requests

bernard.oflynn · December 9, 2020, 6:50pm

Hi
We’d like to use EventStoreDB to store insurance-type data. We’d have multiple EventStoreDB instances (one per microservice).
Under GDPR we need to be able to delete a particular customer’s data.
One engineer has told me that the only way to do this is to split the PII out into a separate MongoDB database so that it can be deleted from there, because ‘EventStore doesn’t allow data to be deleted’.
Another proposed solution was to encrypt the PII in the event, though this makes aggregations etc. a bit harder. The encryption key for that customer could then be deleted, which would essentially make the data inaccessible.
However, a quick read through the docs suggests that we should probably create a stream per customer (or per any other entity that stores PII) in each microservice, then delete that stream and run a scavenge. That should remove all of the customer’s events containing PII, and thereby the data.
Is this the best way to deal with GDPR? Are there downsides to this approach?
Does it have an impact on any aggregations, projections, etc.?
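To make the stream-per-customer idea concrete, here is a toy in-memory model in Go. It is not the EventStoreDB client API: Store, DeleteStream, Scavenge, RawContains and the "customer-" stream prefix are hypothetical names. It only mimics the two-phase behaviour described in the question: deleting a stream hides it from readers, and the data physically disappears at the next scavenge.

```go
package main

import "strings"

// Event is a simplified record in the shared physical log.
type Event struct {
	Stream string
	Type   string
	Data   string
}

// Store is a toy model of an event log (hypothetical, not the client API).
// Deleting a stream only marks it; Scavenge does the physical removal.
type Store struct {
	log     []Event
	deleted map[string]bool
}

func NewStore() *Store { return &Store{deleted: map[string]bool{}} }

// customerStream builds a per-customer stream name (hypothetical convention).
func customerStream(id string) string { return "customer-" + id }

// Append adds an event to the end of the physical log.
func (s *Store) Append(stream, eventType, data string) {
	s.log = append(s.log, Event{Stream: stream, Type: eventType, Data: data})
}

// Read returns a stream's events unless the stream has been deleted.
func (s *Store) Read(stream string) []Event {
	if s.deleted[stream] {
		return nil
	}
	var out []Event
	for _, e := range s.log {
		if e.Stream == stream {
			out = append(out, e)
		}
	}
	return out
}

// DeleteStream hides the stream from readers; its events are still stored.
func (s *Store) DeleteStream(stream string) { s.deleted[stream] = true }

// Scavenge physically drops every event that belongs to a deleted stream.
func (s *Store) Scavenge() {
	kept := s.log[:0]
	for _, e := range s.log {
		if !s.deleted[e.Stream] {
			kept = append(kept, e)
		}
	}
	s.log = kept
}

// RawContains reports whether any stored record still includes the text,
// regardless of what readers are allowed to see.
func (s *Store) RawContains(text string) bool {
	for _, e := range s.log {
		if strings.Contains(e.Data, text) {
			return true
		}
	}
	return false
}
```

The model only covers the stream itself; if projections have emitted copies of those events into other streams, or read models hold the data, those need cleaning up separately.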

Thanks
Bernard

steven.blair · December 9, 2020, 7:50pm

People with more experience might have a better answer, but I would be tempted to create a separate customer stream for the sensitive data. That way you can freely truncate it, without worrying about losing the non-sensitive events from your Customer stream.
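A minimal Go sketch of that split, assuming each registration produces one event per stream. Registration, SplitRegistration, the event type names and the "customer-"/"customerPii-" stream prefixes are all hypothetical. Because the default $by_category projection takes the part of the stream name before the first hyphen, the two prefixes also end up in separate categories.

```go
package main

// Registration is a hypothetical input mixing personal and non-personal data.
type Registration struct {
	CustomerID string
	Name       string // personal data
	Email      string // personal data
	PolicyType string // not personal data
}

// Event is a simplified event addressed to a stream.
type Event struct {
	Stream string
	Type   string
	Data   map[string]string
}

// SplitRegistration writes the non-sensitive facts to "customer-<id>" and the
// personal data to "customerPii-<id>" (both hypothetical names), so the PII
// stream can be truncated or deleted without touching the rest of the history.
func SplitRegistration(r Registration) []Event {
	return []Event{
		{
			Stream: "customer-" + r.CustomerID,
			Type:   "CustomerRegistered",
			Data:   map[string]string{"policyType": r.PolicyType},
		},
		{
			Stream: "customerPii-" + r.CustomerID,
			Type:   "CustomerPersonalDataRecorded",
			Data:   map[string]string{"name": r.Name, "email": r.Email},
		},
	}
}
```

Projections that need both halves can join them on the customer id, and only the PII stream ever needs truncating.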

alexey.zimarev · December 10, 2020, 4:49pm

Separating streams is definitely one way to do it. Writing a deletion event and truncating the stream at that event will remove the personal data at the next scavenge, and will also tell projections to delete any read models that contain personal data for that person. Crypto-shredding is another option: deleting the key makes the personal data unreadable. We have an article on that in the pipeline, but it needs some work before we publish it.
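A toy in-memory Go model of the "deletion event plus truncate" flow, assuming a single stream per customer. Only the $tb (truncate-before) stream metadata is a real EventStoreDB concept here; Stream, ForgetPersonalData, Project and the event type names are hypothetical. In the real database you would append the deletion event and then write that event's number to the stream's $tb metadata through your client.

```go
package main

// Event is one record in a single stream; Number is its position.
type Event struct {
	Number int64
	Type   string
	Data   string
}

// Stream is a toy model of one stream plus its truncate-before ($tb) metadata.
type Stream struct {
	events         []Event
	next           int64 // next event number to assign
	truncateBefore int64 // mirrors the stream's $tb metadata value
}

// Append stores an event and returns the number it was assigned.
func (s *Stream) Append(eventType, data string) int64 {
	n := s.next
	s.events = append(s.events, Event{Number: n, Type: eventType, Data: data})
	s.next++
	return n
}

// Read returns only the events at or after $tb, as readers would see them.
func (s *Stream) Read() []Event {
	var out []Event
	for _, e := range s.events {
		if e.Number >= s.truncateBefore {
			out = append(out, e)
		}
	}
	return out
}

// ForgetPersonalData appends the deletion event and sets $tb to its number,
// so every earlier event is hidden now and removed at the next scavenge.
func (s *Stream) ForgetPersonalData() {
	s.truncateBefore = s.Append("PersonalDataDeleted", "")
}

// Scavenge physically removes the events that $tb already hides.
func (s *Stream) Scavenge() {
	kept := s.events[:0]
	for _, e := range s.events {
		if e.Number >= s.truncateBefore {
			kept = append(kept, e)
		}
	}
	s.events = kept
}

// StoredCount is the number of events still physically stored.
func (s *Stream) StoredCount() int { return len(s.events) }

// Project updates a read model keyed by customer id. The deletion event tells
// the projection to drop whatever personal data it holds for that customer.
func Project(readModel map[string]string, customerID string, e Event) {
	switch e.Type {
	case "CustomerRegistered", "AddressChanged":
		readModel[customerID] = e.Data
	case "PersonalDataDeleted":
		delete(readModel, customerID)
	}
}
```

Because the deletion event itself stays readable, a projection replayed from the start of the stream still ends up holding no personal data for that customer.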

michael1 · December 11, 2020, 2:05pm

Hello Bernard,

As GDPR also requires sensitive data to be encrypted in general, we chose to encrypt those parts of the messages and all other personal data stored in the ES. This makes the data in the streams more secure and also lets us “delete” something by throwing away the person’s individual encryption key.
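A minimal crypto-shredding sketch in Go, using only the standard library's AES-256-GCM. KeyStore, Encrypt, Decrypt, Shred and ErrKeyShredded are hypothetical names, and the in-memory map stands in for an external secret store such as HashiCorp Vault. Each data subject gets their own key, only the ciphertext would be written into events, and deleting the key leaves those events permanently unreadable.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// ErrKeyShredded is returned when a subject's key has been deleted.
var ErrKeyShredded = errors.New("encryption key deleted: data is unrecoverable")

// KeyStore holds one AES-256 key per data subject (hypothetical in-memory
// stand-in for a real secret store).
type KeyStore struct {
	keys map[string][]byte
}

func NewKeyStore() *KeyStore { return &KeyStore{keys: map[string][]byte{}} }

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt seals a PII field with the subject's key, creating the key on first
// use. The random nonce is prepended to the returned ciphertext.
func (ks *KeyStore) Encrypt(subject string, plaintext []byte) ([]byte, error) {
	key, ok := ks.keys[subject]
	if !ok {
		key = make([]byte, 32)
		if _, err := rand.Read(key); err != nil {
			return nil, err
		}
		ks.keys[subject] = key
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return append(nonce, gcm.Seal(nil, nonce, plaintext, nil)...), nil
}

// Decrypt opens a sealed field; it never creates keys, so a shredded
// subject yields ErrKeyShredded.
func (ks *KeyStore) Decrypt(subject string, sealed []byte) ([]byte, error) {
	key, ok := ks.keys[subject]
	if !ok {
		return nil, ErrKeyShredded
	}
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// Shred deletes the subject's key; everything encrypted with it is now unreadable.
func (ks *KeyStore) Shred(subject string) { delete(ks.keys, subject) }
```

A projection running outside the database would call Decrypt and could treat ErrKeyShredded as "personal data erased" rather than as a failure.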

Any projection (outside the ES) will of course need access to the currently used encryption keys. HashiCorp Vault is a nice way to share this information between your different software layers or components. Projections inside the ES will unfortunately not work for such encrypted data, but that was not a problem in our case.

There are also other scenarios where encryption is quite handy. Think about sharing streams with a restricted audience: everyone can read the stream, but not everyone is allowed to decode every part of a message. You can easily publish symmetric encryption keys, encrypted with the public keys of the targeted audience. (The symmetric encryption key should be changed quite often, so that access can be revoked for later messages.)
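A sketch of that key-sharing scheme (usually called envelope encryption) with Go's standard library: the message is encrypted once with a fresh AES-256-GCM key, and that key is wrapped with RSA-OAEP (SHA-256) for each recipient's public key. Recipients, Envelope, GenerateRecipientKey, SealForRecipients and OpenAsRecipient are hypothetical names.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
)

// Recipients maps a recipient id to their RSA public key.
type Recipients map[string]*rsa.PublicKey

// Envelope holds a message encrypted once with a random AES key, plus that
// key wrapped separately for every authorised recipient.
type Envelope struct {
	Nonce       []byte
	Ciphertext  []byte
	WrappedKeys map[string][]byte // recipient id -> RSA-OAEP-encrypted AES key
}

// GenerateRecipientKey creates a 2048-bit RSA key pair for a recipient.
func GenerateRecipientKey() (*rsa.PrivateKey, error) {
	return rsa.GenerateKey(rand.Reader, 2048)
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// SealForRecipients encrypts plaintext with a fresh data key and wraps that
// key for each recipient, so only they can recover it.
func SealForRecipients(plaintext []byte, to Recipients) (*Envelope, error) {
	dataKey := make([]byte, 32)
	if _, err := rand.Read(dataKey); err != nil {
		return nil, err
	}
	gcm, err := newGCM(dataKey)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	env := &Envelope{
		Nonce:       nonce,
		Ciphertext:  gcm.Seal(nil, nonce, plaintext, nil),
		WrappedKeys: make(map[string][]byte, len(to)),
	}
	for id, pub := range to {
		wrapped, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, dataKey, nil)
		if err != nil {
			return nil, err
		}
		env.WrappedKeys[id] = wrapped
	}
	return env, nil
}

// OpenAsRecipient unwraps the data key with the recipient's private key and
// decrypts the message; it fails for anyone without a matching wrapped key.
func OpenAsRecipient(env *Envelope, id string, priv *rsa.PrivateKey) ([]byte, error) {
	wrapped, ok := env.WrappedKeys[id]
	if !ok {
		return nil, errors.New("no key wrapped for this recipient")
	}
	dataKey, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, wrapped, nil)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(dataKey)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, env.Nonce, env.Ciphertext, nil)
}
```

Generating a new data key per message is the extreme end of "change the key often": a recipient removed from the list simply gets no wrapped key for later messages, while anything they could already decrypt stays readable to them.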

Cheers,
Michael

alexey.zimarev · January 6, 2021, 9:26pm

I think GDPR suggests encryption, but I don’t think it requires encrypting sensitive data in general, although it’s a good thing to do, as you suggest. GDPR says you should protect personal data and prevent it from leaking outside the organisation or being accessed by people who aren’t authorised to access it. But I might be wrong, in the sense that some industries can have stronger requirements.

Lucas · May 14, 2021, 4:08pm

We would be very interested in using crypto-shredding. Any news regarding the article?