I’m currently working on a solution where I need to scramble data stored in EventStoreDB. I understand that one approach is to publish new events reflecting updates to an aggregate’s current state. However, I would also like to explore ways to effectively scramble the data in the original events themselves.
Here are a few specific questions I have:
- Scrambling Existing Events:
- What is the best practice for scrambling or obfuscating data in events that have already been stored in EventStoreDB? Can past events be modified while maintaining the integrity of the event log, or should I append new events that represent the changes and ignore or delete the originals? (A sketch of the redaction-and-truncation approach I'm considering follows this list.)
- Data Compliance and Auditing:
- In scenarios where scrambling is required for compliance (e.g., GDPR's right to erasure, CCPA), how do you keep the historical audit trail intact while also meeting the data scrambling or anonymization requirements?
- Reprocessing vs. New Events:
- Is there a preferred method for handling scrambling at the event level, such as reprocessing old events or emitting new “scramble” events? How do these approaches align with event sourcing principles?
- Performance Implications:
- Are there known performance impacts when applying a scrambling mechanism across large datasets (e.g., rewriting long streams)? If so, how can they be mitigated?
- Scrambling Sensitive Data:
- For sensitive data fields in the events (e.g., PII), what encryption or scrambling techniques would you recommend that work well with EventStoreDB’s architecture? (The second sketch after this list shows the crypto-shredding approach I've been reading about.)
- Tools & Libraries:
- Are there any existing tools or libraries specifically designed for scrambling or anonymizing event data in EventStoreDB that you could recommend?
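To make the first question concrete, here's the redaction-and-truncation idea I mentioned: append a redacted copy of each event to the stream, then truncate the stream so the plaintext originals become unreadable. This is a sketch, not a working solution — the `email` field is a placeholder for whatever PII the events carry, and I'm assuming `setStreamMetadata` accepts a `truncateBefore` revision the way other EventStoreDB clients do:

```typescript
import { EventStoreDBClient, jsonEvent } from "@eventstore/db-client";

// Sketch: append redacted copies of a stream's events, then truncate the
// stream so the plaintext originals become unreadable and are eligible
// for removal at the next scavenge. Field names are placeholders.
async function redactStream(client: EventStoreDBClient, streamName: string) {
  // Snapshot existing events first, so the loop below does not pick up
  // the redacted copies as we append them.
  const originals: Array<{ type: string; data: any; revision: bigint }> = [];
  for await (const resolved of client.readStream(streamName)) {
    if (resolved.event) {
      originals.push({
        type: resolved.event.type,
        data: resolved.event.data,
        revision: resolved.event.revision,
      });
    }
  }
  if (originals.length === 0) return;

  // Append a redacted copy of each original event.
  for (const original of originals) {
    await client.appendToStream(
      streamName,
      jsonEvent({
        type: original.type,
        data: { ...original.data, email: "<redacted>" }, // placeholder PII field
      })
    );
  }

  // Truncate before the first redacted copy; reads now see only the
  // redacted history, and the originals are removed by a later scavenge.
  const firstRedactedRevision = originals[originals.length - 1].revision + 1n;
  await client.setStreamMetadata(streamName, {
    truncateBefore: firstRedactedRevision,
  });
}
```

My understanding is that truncated events only disappear from disk after a scavenge runs, so this would not be immediate erasure — corrections welcome if I have that wrong.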
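And for the PII question, the technique I keep coming across is crypto-shredding: encrypt sensitive fields with a per-subject key before the event is ever appended, then “scramble” retroactively by deleting the key. A minimal sketch of what I mean, using Node's built-in crypto module — the in-memory `keyStore` is a hypothetical stand-in for a real key-management service:

```typescript
import { createCipheriv, randomBytes } from "crypto";
import { EventStoreDBClient, jsonEvent } from "@eventstore/db-client";

// Hypothetical key store: one AES-256 key per data subject. In production
// this would be a key-management service, not an in-memory map.
const keyStore = new Map<string, Buffer>();

function getOrCreateKey(subjectId: string): Buffer {
  let key = keyStore.get(subjectId);
  if (!key) {
    key = randomBytes(32);
    keyStore.set(subjectId, key);
  }
  return key;
}

// Encrypt a single PII field with AES-256-GCM before it enters an event.
// The output packs IV + auth tag + ciphertext into one base64 string.
function encryptField(subjectId: string, plaintext: string): string {
  const key = getOrCreateKey(subjectId);
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString("base64");
}

// Events are appended with PII already encrypted; non-PII stays readable.
async function appendCustomerRegistered(
  client: EventStoreDBClient,
  customerId: string,
  email: string
) {
  await client.appendToStream(
    `customer-${customerId}`,
    jsonEvent({
      type: "CustomerRegistered",
      data: {
        customerId, // non-PII, stored in the clear
        email: encryptField(customerId, email), // PII, stored encrypted
      },
    })
  );
}

// An erasure request then becomes a key deletion: every historical event
// for that subject becomes unreadable at once.
function forgetSubject(subjectId: string) {
  keyStore.delete(subjectId);
}
```

The appeal is that the event log itself stays immutable: deleting the key anonymizes every historical event for that subject in one step. Does this hold up in practice with EventStoreDB, or are there pitfalls (projections, read models, backups) I should be aware of?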
I’m looking for a solution that balances compliance with data privacy laws and best practices within event-sourced systems. Any insights, strategies, or experiences from the community would be greatly appreciated!
Thanks in advance!