Verifying Hash

Ryan_Dunn · April 22, 2013, 11:43am

We are experiencing very long hash verification times when running this in Windows Azure using an attached data disk:

[PID:1268 2013.04.22 11:06:55.499 Trace TFChunk 1]

Verifying hash for TFChunk ‘f:\eventdb\chunk-000000.000000’…

[PID:1268 2013.04.22 11:14:20.712 Trace TFChunk 1]

Verifying hash for TFChunk ‘f:\eventdb\chunk-000001.000000’…

[PID:1268 2013.04.22 11:22:11.615 Trace TFChunk 1]

Verifying hash for TFChunk ‘f:\eventdb\chunk-000002.000000’…

[PID:1268 2013.04.22 11:29:14.411 Trace TFChunk 1]

Verifying hash for TFChunk ‘f:\eventdb\chunk-000003.000000’…

[PID:1268 2013.04.22 11:36:18.104 Trace TFChunk 1]

Verifying hash for TFChunk ‘f:\eventdb\chunk-000004.000000’…

This is causing the instance to not respond to requests for quite a bit of time (which brings everything crashing down). Any idea on (a) how to speed this up, or (b) how to skip this step?

Not sure if skipping this step would be safe, but the verification is taking way too long right now.

Greg_Young1 · April 22, 2013, 11:47am

Wow, minutes to compute a hash on a 255 MB file?! Are you sure this is a
local disk? What is the processor?
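
One way to tell whether the disk or the processor is the bottleneck is to time the reads and the hashing separately on a single chunk file. The following is only a rough sketch: the function name `time_read_and_hash` and the 1 MB block size are made up for illustration, and MD5 is an assumption, since nothing in this thread says which algorithm the server uses for chunk verification.

```python
import hashlib
import time


def time_read_and_hash(path, block_size=1024 * 1024):
    """Read `path` sequentially and time disk reads and hashing separately.

    MD5 is an assumption for illustration; swap in whichever algorithm
    you want to compare against.
    Returns (total_bytes, read_seconds, hash_seconds, hex_digest).
    """
    digest = hashlib.md5()
    total_bytes = 0
    read_seconds = 0.0
    hash_seconds = 0.0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            block = f.read(block_size)   # time spent waiting on the disk
            t1 = time.perf_counter()
            read_seconds += t1 - t0
            if not block:
                break
            digest.update(block)         # time spent on the CPU
            t2 = time.perf_counter()
            hash_seconds += t2 - t1
            total_bytes += len(block)
    return total_bytes, read_seconds, hash_seconds, digest.hexdigest()


# Example call (path taken from the log lines in the first post):
#   size, read_s, hash_s, _ = time_read_and_hash(r"f:\eventdb\chunk-000000.000000")
#   print(f"read {read_s:.1f}s, hash {hash_s:.1f}s for {size / 1e6:.0f} MB")
```

If nearly all of the time lands in `read_seconds`, the disk is the problem rather than the CPU. Run it against a file that has not been read since boot; a second run may be served from the OS file cache and look much faster.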

This step can be skipped; see --skip-db-verify
https://github.com/EventStore/EventStore/wiki/Running-the-Event-Store#on-windows-and-net

Greg_Young1 · April 22, 2013, 11:54am

Are these sequential messages in the log? These are all for the same chunk.

Ryan_Dunn · April 22, 2013, 12:13pm

Those are each different files (e.g. chunk-00000x.000000, where x is different). They are 255MB each and they are on an attached data disk, which is a pass-through disk to blob storage. I am sure the performance is related to this, but it is particularly bad here.

Machine size is S(mall) - so 1.7 GHz and ~2GB RAM.
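
To put numbers on how slow this is, here is a small sketch that turns the trace timestamps from the first post into per-chunk durations and an effective read rate. It assumes the gap between two consecutive "Verifying hash" lines is the time spent on the earlier chunk, and it uses the 255 MB chunk size mentioned in this thread; the function name `verification_stats` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the "Verifying hash" trace lines in the first post.
STARTS = [
    "2013.04.22 11:06:55.499",
    "2013.04.22 11:14:20.712",
    "2013.04.22 11:22:11.615",
    "2013.04.22 11:29:14.411",
    "2013.04.22 11:36:18.104",
]
CHUNK_MB = 255  # chunk size as stated in this thread


def verification_stats(starts, chunk_mb=CHUNK_MB):
    """Return a list of (seconds, MB_per_second), one entry per finished chunk.

    Assumption: the gap between consecutive log lines is the time spent
    verifying the earlier chunk, so N timestamps give N - 1 entries.
    """
    times = [datetime.strptime(s, "%Y.%m.%d %H:%M:%S.%f") for s in starts]
    stats = []
    for begin, end in zip(times, times[1:]):
        seconds = (end - begin).total_seconds()
        stats.append((seconds, chunk_mb / seconds))
    return stats


if __name__ == "__main__":
    for i, (seconds, rate) in enumerate(verification_stats(STARTS)):
        print(f"chunk {i}: {seconds / 60:.1f} min, {rate:.2f} MB/s")
```

For the timestamps above this works out to roughly 7 to 8 minutes per chunk, or about 0.5 to 0.6 MB/s of effective throughput.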

Greg_Young1 · April 22, 2013, 12:16pm

I have not tried with blob storage before but it sounds quite slow
even if it's pulling over 255 MB at a time.

I might also guess that since the files are immutable, if you don't
pull them over during the checksumming (they are likely cached after)
you will probably see slowdowns someplace else. 2+ minutes to download
255 MB seems a bit on the high side though.

Cheers,

Greg

Yevhen_Bobrov · April 22, 2013, 1:22pm

Try creating a striped volume (4 disks or more); that may give you additional performance.

Yevhen_Bobrov · April 22, 2013, 1:25pm

Also make sure your Storage Account is Gen 2.

Ryan_Dunn · April 25, 2013, 12:58pm

It’s a newer account with the higher thresholds. I don’t want to stripe disks at this point because it just means more complexity and more disks that could get corrupted.

From what I can tell, it looks like the event store db was corrupted and we ended up losing all our data. I still have our entire folder of files that is dead now if anyone wants to investigate.

Greg_Young1 · April 25, 2013, 1:01pm

How did this go from long verification times to the event store db
being corrupted? Did I miss an email somewhere?

Even on a corrupt db we can extract probably 99% of the information
from the db using internal utilities we have.

I am a bit confused about what happened in this thread, can you clarify?

Greg

Ryan_Dunn · April 25, 2013, 1:16pm

You didn’t miss anything. You answered the question. My response was more for Yevhen regarding why we are not going to stripe disks in Azure. The perf is more than adequate when not doing a hash verification.

The original intent here was to figure out how we could avoid these super long verification times. Every time our VM was rebooted, our service was down for a long time while this hash verification occurred. You answered that we could skip it - so, I did. That showed me that something else was wrong - the eventstore server no longer responded, everything just timed out. We ended up having to create a new db and rebuild the entire eventstore again from our views (held elsewhere luckily). That was the only way that the eventstore became responsive again. I am just assuming at this point it is because the db was corrupted. That might be a bad assumption.

Greg_Young1 · April 25, 2013, 2:22pm

What do the logs say?

Ryan_Dunn · April 26, 2013, 5:24am

Nothing that interesting that I can find. For that particular day, the only thing I found that was not in other logs was:

ReadIndex Rebuilding: processed 100000 records.

Perhaps it was stuck rebuilding the index when I gave up and just created a new eventstore db.

Greg_Young1 · April 26, 2013, 6:28am

It does that every time you restart.