ERROR QueuedHandlerMRES] Error EventStore.Core.Messages.ClientMessage+ReadAllEventsForward in queued handler 'StorageReaderQueue #2'.

Hi James,
I sent you the file last week (5/15) using ShareFile. I hope you received it.

Thanks

Bindiya

Hi Bindiya,

I’ve looked through the chunk file you sent over. The hashes verify as OK. Can you confirm whether you were running with database verification turned off?

James

It’s turned off.

Hey James,
Do we know anything else about the chunk? Is there a way to prevent this from happening in the future, or a way to troubleshoot this at our end?

Thanks

"Is there a way to prevent this from happening in the future, or a way to troubleshoot this at our end?"

Store on a disk that is configured to be durable.

Hi Bindiya,

It looks like the following sequence of events has occurred:

  1. The Event Store was misconfigured: it was set to skip database verification (dangerous), and it was running on a disk that was not configured to be durable.

  2. At some point while the chunk in question was being written, the Event Store restarted, leaving a corrupt file on disk.

  3. Because database verification was skipped, the Event Store immediately resumed writing from the last checkpoint, which, because of the disk configuration, was not necessarily consistent with the actual contents of the chunk file.

  4. The file is therefore corrupt, but it still passes hash verification, because the hash is calculated from the data actually in the file.
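To make step 4 concrete, here is a minimal Python sketch. It assumes a hypothetical chunk layout (a small zero-filled, preallocated buffer holding records framed as a 4-byte length prefix plus payload) and uses SHA-256 as a stand-in for the chunk hash; neither is meant to describe Event Store’s real on-disk format. It shows that a hash computed over the bytes actually present in the file verifies fine even though one record was silently damaged.

```python
import hashlib
import struct

# Hypothetical layout for illustration only (not Event Store's real chunk format):
# a zero-filled, preallocated chunk holding records framed as a 4-byte
# little-endian length prefix followed by the payload.
CHUNK_SIZE = 64


def encode_record(payload: bytes) -> bytes:
    return struct.pack("<I", len(payload)) + payload


def parse_records(data: bytes) -> list[bytes]:
    """Walk length-prefixed records, raising ValueError if one does not fit."""
    records = []
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated length prefix at offset {pos}")
        (length,) = struct.unpack_from("<I", data, pos)
        end = pos + 4 + length
        if end > len(data):
            raise ValueError(f"record at offset {pos} runs past the data")
        records.append(data[pos + 4:end])
        pos = end
    return records


def verify_chunk(data: bytes, expected_hash: bytes) -> bool:
    # SHA-256 stands in for the chunk hash. It only proves the bytes are unchanged
    # since the hash was computed, not that they are the bytes that were intended.
    return hashlib.sha256(data).digest() == expected_hash


chunk = bytearray(CHUNK_SIZE)
rec_a = encode_record(b"event-A")
rec_b = encode_record(b"event-B")
rec_c = encode_record(b"event-C")

# Write A and B; the checkpoint is persisted as pointing just past B.
chunk[0:len(rec_a)] = rec_a
chunk[len(rec_a):len(rec_a) + len(rec_b)] = rec_b
checkpoint = len(rec_a) + len(rec_b)

# Step 2: on a non-durable disk, the tail of B is lost before the restart.
lost_from = len(rec_a) + 6
chunk[lost_from:checkpoint] = bytes(checkpoint - lost_from)

# Step 3: with verification skipped, writing resumes at the checkpoint.
chunk[checkpoint:checkpoint + len(rec_c)] = rec_c
written = checkpoint + len(rec_c)

# Step 4: the hash is computed over whatever bytes are actually in the file.
chunk_hash = hashlib.sha256(bytes(chunk)).digest()

print(verify_chunk(bytes(chunk), chunk_hash))  # True
print(parse_records(bytes(chunk[:written])))
```

Running it prints True, then [b'event-A', b'ev\x00\x00\x00\x00\x00', b'event-C']: the chunk-level hash raises no alarm, and the records still parse, yet event B has lost data. That mirrors the point that the stream containing the corrupt record has likely lost some data.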

The morals of this story are:

  1. Don’t run with database verification turned off. If memory use by the Windows file cache is that much of a problem (though why are you even running other stuff on a database server?), use the Sysinternals utilities to limit it.

  2. Don’t run a non-replicated database on drives which are not actually durable (as Greg suggests). If you have Windows write caching turned on and your disk controller does not have a battery backup, you’re likely Doing It Wrong™.

  3. Don’t run databases without understanding how to configure them for your actual operational requirements.

  4. A corollary of (3): don’t expect complex issues caused by misconfiguration to be resolved quickly by community support. Commercial support exists for a reason and is comparatively cheap.

It may be possible to extract some data from the chunk file, but at minimum the stream where the corrupt record resides is likely to have lost some data. If you want us to investigate this for you, please contact us off-list, and we’ll be happy to discuss the rates for this work.

Regards,

James

Thank you so much, Greg and James, for replying. I’ll find out what we want to do and get in touch outside of this thread.

Thanks for all the help,

Bindiya