I have seen this topic discussed in this forum in the past, but I’d like to know what the current recommendation is. Which option is the preferred one, and why?
Context: a multinational organization wants to isolate its data per site (up to 15 sites, about 10^6-10^7 events per site). Normal users should only ever have access to the data of their own site. Some power users can aggregate (some) data from one or many sites (depending on their configured rights).
1. Having a single event store in which the streams are prefixed by the site name (or, more generally, the tenant name)
2. Having an event store per site
Number 1 is easier to manage and data aggregation is super easy
Number 2 offers more isolation of the data. E.g. backups could be done per site. Also the data sets would be smaller
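To make option 1 a bit more concrete, here is a minimal sketch of tenant-prefixed stream names plus a per-site read check. Everything in it is an assumption made for illustration: the "-" separator, the helper names (stream_name, tenant_of, can_read) and the User type are hypothetical and not part of any Event Store API.

```python
from dataclasses import dataclass

SEPARATOR = "-"  # assumed delimiter between the tenant and the rest of the name


def stream_name(tenant: str, aggregate: str, aggregate_id: str) -> str:
    """Build a tenant-prefixed stream name, e.g. 'berlin-order-42'."""
    if SEPARATOR in tenant:
        # Otherwise the tenant could not be recovered from the stream name.
        raise ValueError("tenant name must not contain the separator")
    return f"{tenant}{SEPARATOR}{aggregate}{SEPARATOR}{aggregate_id}"


def tenant_of(stream: str) -> str:
    """Return the tenant prefix of a stream name."""
    return stream.split(SEPARATOR, 1)[0]


@dataclass(frozen=True)
class User:
    name: str
    sites: frozenset  # sites (tenants) this user is allowed to read


def can_read(user: User, stream: str) -> bool:
    """A normal user has one site; a power user may have several."""
    return tenant_of(stream) in user.sites
```

With this layout the isolation is enforced by the application, not by the store itself, which is the main trade-off against option 2.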
This is one of the questions where the answer is “it depends”.
From what you’ve described, I think currently the best option would be to have an event store per site (assuming these are physically separate sites and there’s high latency between them, for example), and to aggregate to some central reporting ES asynchronously.
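A rough sketch of what "aggregate asynchronously" could look like: a forwarder that copies events it has not seen yet from a site log to a central reporting log and remembers a per-site checkpoint. The in-memory lists stand in for real stores, and forward_new_events and the checkpoint dictionary are hypothetical names, not Event Store features.

```python
from typing import Dict, List, Tuple

Event = dict
CentralRecord = Tuple[str, int, Event]  # (site, offset in site log, event)


def forward_new_events(
    site: str,
    site_log: List[Event],
    central_log: List[CentralRecord],
    checkpoints: Dict[str, int],
) -> int:
    """Copy events past the site's checkpoint into the central log.

    Returns the number of events forwarded in this run.
    """
    start = checkpoints.get(site, 0)
    new_events = site_log[start:]
    for offset, event in enumerate(new_events, start=start):
        # Keep the origin site and offset so reports can tell sites apart.
        central_log.append((site, offset, event))
    checkpoints[site] = start + len(new_events)
    return len(new_events)
```

Running the forwarder again without new site events copies nothing, so it can be scheduled repeatedly. In a real deployment the checkpoint would need to be stored durably, which this sketch leaves out.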
Furthermore, it’s worth considering at what level in your architecture you want to enforce this - there are plans for ES to support Access Control Lists for streams soon (not yet certain but that’s likely to remain in the closed source version).
I have a related question. Given the first option, which is having a single event store and prefixing streams with the tenant name: if you want to delete (wipe) a tenant, what is the most efficient way to do it? Do you have to run a projection that lists all streams by prefix and then delete them one by one? Is there any better way?
What if you want to replay all stream events of a given tenant to rebuild views without affecting other tenants?
Assuming a distributed environment (we always assume this), yes, your best bet would be to write a projection. There are a few ways of handling this, depending on how large the data is. We are in the process of adding a fromStreamsMatching(function(s){}) (a generalization of fromCategory) that would match the streams selected by the function. It works very easily for historical data but not so much for real-time data, which is what we are trying to figure out.
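To illustrate the idea of selecting streams with a predicate, here is a small in-memory sketch covering the historical case only. It is not the projections API: the store is a plain dictionary of stream name to event list, and streams_matching, replay and delete_matching are hypothetical helpers.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Store = Dict[str, List[dict]]      # stream name -> events in stream order
Predicate = Callable[[str], bool]  # decides whether a stream is selected


def streams_matching(store: Store, predicate: Predicate) -> List[str]:
    """List the names of all streams selected by the predicate."""
    return sorted(name for name in store if predicate(name))


def replay(store: Store, predicate: Predicate) -> Iterator[Tuple[str, dict]]:
    """Yield (stream, event) for every selected stream.

    Order is preserved within a stream only, not across streams.
    """
    for name in streams_matching(store, predicate):
        for event in store[name]:
            yield name, event


def delete_matching(store: Store, predicate: Predicate) -> int:
    """Delete the selected streams one by one; return how many were deleted."""
    names = streams_matching(store, predicate)
    for name in names:
        del store[name]
    return len(names)
```

A tenant predicate could be as simple as lambda s: s.startswith("acme-"), assuming tenant-prefixed stream names. Rebuilding the views of one tenant then only touches that tenant's streams.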
OK, I have a question along these lines. As I understand it, when I delete a stream, the stream is still in the event store, just unreachable unless I use a projection. In the case of a multi-tenancy application, what happens when a client does not want to use my company's service any more and we are contractually obligated to delete anything we know about them? Is it possible to physically delete all of their events from the event store?
You have to scavenge the DB after you have deleted the stream to physically remove the data. All events from deleted streams are removed during scavenging, except in the rare cases where the prepare and the corresponding commit records are in two different physical chunks.
Actually, I just checked the scavenging algorithm: all prepare records (which contain the events) will be removed; some commits may stay if they are in a different physical chunk. So scavenging completely solves your problem.
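The difference between deleting a stream and scavenging can be pictured with a toy model. This is only an illustration of the two steps, not Event Store's actual scavenging algorithm: it ignores chunks and the prepare/commit record split entirely, and delete_stream, read_stream and scavenge are hypothetical names.

```python
from typing import List, Set, Tuple

Record = Tuple[str, dict]  # (stream name, event payload)


def delete_stream(deleted: Set[str], stream: str) -> None:
    """Mark a stream as deleted; its records stay in the log for now."""
    deleted.add(stream)


def read_stream(log: List[Record], deleted: Set[str], stream: str) -> List[dict]:
    """Normal reads no longer see the events of a deleted stream."""
    if stream in deleted:
        return []
    return [event for name, event in log if name == stream]


def scavenge(log: List[Record], deleted: Set[str]) -> List[Record]:
    """Rewrite the log without any record that belongs to a deleted stream."""
    return [(name, event) for name, event in log if name not in deleted]
```

In this model the tenant's data is physically gone only in the log returned by scavenge; until then it is hidden from reads but still present.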