Rock-hard deletes in non-production environments

Okay, so I understand the reasons for soft deletes (caching), and that normally the tombstone left behind by a hard delete prevents re-creating that stream (caching again).

Like a few other questioners, my use-case is resetting a db in QA scenarios. Rather than stopping the service, deleting the entire /data directory, and firing it up again (which would require re-setting the admin password etc.), I’d like to use the new “unsafe-ignore-hard-deletes” option (see https://groups.google.com/d/msg/event-store/dF48Hk2QOhY/SYaqlB_vGAAJ and https://github.com/EventStore/EventStore/pull/853/commits/e36d9797764a99a19412bd2f8e7c9914dbf5cefb for more information).

However, I have a few questions about its usage:

  1. If I hard-delete a stream before enabling this option, will subsequent scavenging operations still go back and delete the tombstones of those hard-deleted streams?
  2. Having hard-deleted a stream and run the scavenging operation, do I really need to stop the node, delete its index directory, and restart the node, or can I just re-create the stream straightaway after the scavenging is complete?
  3. If so, then assuming a clustered environment, is it okay to repeat this for each node rather than bringing them all down at once?

Cheers, ears!

I would not really recommend using ignore-hard-deletes like this. It's
mostly there for cases where you accidentally used hard deletes and
later want to recreate the streams.

Also, the scavenge only affects the chunk files (not including the
current chunk), so this won't delete everything.
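
For reference, a scavenge can be kicked off over the HTTP admin API. A
rough sketch in Python; the host, port, and credentials below are just
the defaults for a local node and may differ in your setup:

# Rough sketch: trigger a scavenge over the HTTP admin API.
# Host, port, and credentials are assumptions (defaults for a local node).
import requests

resp = requests.post(
    "http://localhost:2113/admin/scavenge",   # default external HTTP port
    auth=("admin", "changeit"),               # default admin credentials
)
resp.raise_for_status()
print("scavenge accepted:", resp.status_code)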

But to answer your questions:

If I hard-delete a stream *before* enabling this option, will
subsequent scavenging operations still go back and delete the
tombstones of those hard-deleted streams?

With the option enabled, yes, it will go back and remove those hard
deletes. Once you turn the option off, it won't remove hard deletes any
more.
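
For reference, the sort of hard delete that leaves such a tombstone can
be issued over the HTTP API. A rough sketch; the host, stream name, and
credentials here are just placeholders:

# Rough sketch: issue a hard delete over the HTTP API (this writes the tombstone).
# Host, stream name, and credentials are placeholders.
import requests

resp = requests.delete(
    "http://localhost:2113/streams/myappqaorder-00000000-0000-0000-0000-000000000001",
    headers={"ES-HardDelete": "true"},  # omit this header and the delete is a soft delete
    auth=("admin", "changeit"),
)
resp.raise_for_status()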

Having hard-deleted a stream and run the scavenging operation, do I
really need to stop the node, delete its index directory, and
re-start the node, or can I just re-create the stream straightaway
after the scavenging is complete?

Yes, you do, as the index needs to be recreated.
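
So the sequence is roughly: stop the node, delete the index directory,
start it again, and let the index rebuild from the transaction log. A
sketch of that below; the service name and data path are assumptions
for illustration and will depend on how the node is installed:

# Sketch of the reset sequence: stop node, drop index, restart so it rebuilds.
# Service name and paths are assumptions for illustration.
import shutil
import subprocess

DATA_DIR = "/var/lib/eventstore"   # assumed data directory
INDEX_DIR = DATA_DIR + "/index"    # the index normally lives under the data directory

subprocess.run(["systemctl", "stop", "eventstore"], check=True)
shutil.rmtree(INDEX_DIR)           # index is rebuilt from the transaction log on startup
subprocess.run(["systemctl", "start", "eventstore"], check=True)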

If so, then assuming a clustered environment, is it okay to repeat
this for each node rather than bringing them all down at once?

As I said, it would be easier to just start with fresh dbs.

Fair enough. My motivation for not just deleting the dbs entirely is that they are shared. In other words, if we have two apps in QA that use GES, they’ll both be using the QA “instance” of GES. We don’t have a GES instance per-environment-per-application-per-clusternode; we just have a GES instance per-environment-per-clusternode.

So in other words, if I delete all my data for a fresh start, I delete the test data for the other apps using that environment at the same time. (Side note: Our stream ID format is [app][environment][entityname]-[guid]).
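
One thing I'm considering instead is resetting just one app's data by
hard-deleting only the streams that carry that app's prefix, leaving the
other apps' streams alone. A rough sketch; the prefix, stream IDs, host,
and credentials below are just placeholders:

# Sketch: reset one app's QA data by hard-deleting only its streams.
# The prefix, stream IDs, host, and credentials are hypothetical.
import requests

ES = "http://localhost:2113"
AUTH = ("admin", "changeit")
PREFIX = "myappqa"  # hypothetical [app][environment] part of the stream ID

def hard_delete(stream_id):
    """Hard-delete one stream via the HTTP API."""
    r = requests.delete(ES + "/streams/" + stream_id,
                        headers={"ES-HardDelete": "true"},
                        auth=AUTH)
    r.raise_for_status()

# Stream IDs would come from wherever the tests record what they created,
# e.g. a manifest written by the test run.
for stream in ["myappqaorder-00000000-0000-0000-0000-000000000001",
               "myappqacustomer-00000000-0000-0000-0000-000000000002"]:
    if stream.startswith(PREFIX):
        hard_delete(stream)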