Restoring Backup to New Cluster

Hi,

I’m having some trouble restoring a backup to EventStore.

I have an empty cluster running Commercial version 3.0.1; the backup is
from Open Source version 3.0.0_RC2.

I’ve taken the following steps in order to restore the
backup:

· Stopped a node on the new cluster

· Copied chaser.chk from the backup to the db location of the stopped node and renamed it as truncate.chk

· Copied all other files to the db location of the stopped node (including chaser.chk again – should I be doing this? And I’ve obviously not copied truncate.chk from the backup location as I’ve already written it to the new db location)

· Restarted the stopped node

All seems to start off OK, I can see messages from the
IndexCommitter and the index appears to be rebuilding, but as it approaches
completion I get the following error:

[PID:00336:017 2014.11.20 12:02:56.215 DEBUG IndexCommitter ] ReadIndex Rebuilding: processed 674807 records (92.1%).

[PID:00336:017 2014.11.20 12:03:01.222 DEBUG IndexCommitter ] ReadIndex Rebuilding: processed 682258 records (94.9%).

[PID:00336:017 2014.11.20 12:03:01.425 DEBUG IndexCommitter ] ReadIndex rebuilding done: total processed 695584 records, time elapsed: 00:12:48.3055175.

[PID:00336:013 2014.11.20 12:03:01.456 INFO ClusterVNodeControll] ========== [xx.xx.xx.xx:8080] Service ‘StorageChaser’ initialized.

[PID:00336:013 2014.11.20 12:03:01.456 INFO ClusterVNodeControll] ========== [xx.xx.xx.xx:8080] SYSTEM START…

[PID:00336:013 2014.11.20 12:03:01.472 INFO ClusterVNodeControll] ========== [xx.xx.xx.xx:8080] IS UNKNOWN!!! WHOA!!!

[PID:00336:013 2014.11.20 12:03:01.503 DEBUG ElectionsService ] ELECTIONS: STARTING ELECTIONS.

[PID:00336:013 2014.11.20 12:03:01.519 DEBUG ElectionsService ] ELECTIONS: (V=0) SHIFT TO LEADER ELECTION.

[PID:00336:013 2014.11.20 12:03:01.519 DEBUG ElectionsService ] ELECTIONS: (V=0) VIEWCHANGE FROM [xx.xx.xx.xx:8080,{5a2bcf36-5d8a-47af-9fb1-3e45c6aacf64}].

[PID:03548:017 2014.11.20 12:50:02.534 FATAL StorageChaser ] Error in StorageChaser. SOMETHING VERY BAD HAPPENED.
Terminating…
EventStore.Core.TransactionLog.Chunks.TFChunk.InvalidReadException: Log record at actual pos 52550725 has non-positive length: 0. in chunk.
at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk.TFChunkReadSide.TryReadForwardInternal(ReaderWorkItem workItem, Int64 actualPosition, Int32& length, LogRecord& record) in c:\EventStore.CommercialHA\oss\src\EventStore.Core\TransactionLog\Chunks\TFChunk\TFChunkReadSide.cs:line 465
at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk.TFChunkReadSideUnscavenged.TryReadClosestForward(Int64 logicalPosition) in c:\EventStore.CommercialHA\oss\src\EventStore.Core\TransactionLog\Chunks\TFChunk\TFChunkReadSide.cs:line82
at EventStore.Core.TransactionLog.Chunks.TFChunkReader.TryReadNextInternal(Int32 retries) in c:\EventStore.CommercialHA\oss\src\EventStore.Core\TransactionLog\Chunks\TFChunkReader.cs:line 57
at EventStore.Core.Services.Storage.ReaderIndex.IndexCommitter.Init(Int64 buildToPosition) in c:\EventStore.CommercialHA\oss\src\EventStore.Core\Services\Storage\ReaderIndex\IndexCommitter.cs:line 81
at EventStore.Core.Services.Storage.StorageChaser.ChaseTransactionLog() in c:\EventStore.CommercialHA\oss\src\EventStore.Core\Services\Storage\StorageChaser.cs:line 99
[PID:03548:017 2014.11.20 12:50:02.596 ERROR Application ] Exiting with exit code: 1.
Exit reason: Error in StorageChaser. SOMETHING VERY BAD HAPPENED. Terminating…
Error: Log record at actual pos 52550725 has non-positive length: 0. in chunk.

Any ideas as to what is happening? I’m assuming that either I’m
getting the process wrong, the two versions are not compatible, or there is a
problem with the files from the backup.

There shouldn’t be any compatibility issues.

…OK. Can I get some sort of explanation of the stack trace? What does it actually mean? Was it an error building the indexes? Is the backup corrupt in some way? What steps should I be taking to resolve this?

Looks like the files are corrupt in some way. Basically, it’s panicking while trying to read from them.

Almost certainly, when you see this, the chunk files were copied before the checkpoints, so the checkpoints refer to data which is just zeros in the chunk file. This is likely fixable, though it is not straightforward.
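To illustrate why the copy order matters, here is a toy model in Python. It is not the real EventStore file format: the "chunk" is just a pre-allocated, zero-filled buffer and the "checkpoint" is a plain integer, and all the names are made up for the example. It only shows how a write landing between the two copies leaves the copied checkpoint pointing at zeros.

```python
# Toy model only (assumption): a "chunk" is a pre-allocated, zero-filled
# buffer and a "checkpoint" is an integer saying how many bytes are valid.
CHUNK_SIZE = 64


def append(chunk: bytearray, checkpoint: int, record: bytes) -> int:
    """Write a record at the checkpoint and return the advanced checkpoint."""
    chunk[checkpoint:checkpoint + len(record)] = record
    return checkpoint + len(record)


def checkpoint_is_valid(chunk_copy: bytes, checkpoint_copy: int) -> bool:
    """True if every byte below the copied checkpoint holds written data."""
    return all(b != 0 for b in chunk_copy[:checkpoint_copy])


def backup(chunk_first: bool):
    """Copy a live 'database' while one more write arrives mid-backup."""
    chunk = bytearray(CHUNK_SIZE)
    checkpoint = append(chunk, 0, b"AAAA")
    if chunk_first:
        chunk_copy = bytes(chunk)                        # data copied first
        checkpoint = append(chunk, checkpoint, b"BBBB")  # live write in between
        checkpoint_copy = checkpoint                     # checkpoint copied last
    else:
        checkpoint_copy = checkpoint                     # checkpoint copied first
        checkpoint = append(chunk, checkpoint, b"BBBB")  # live write in between
        chunk_copy = bytes(chunk)                        # data copied last
    return chunk_copy, checkpoint_copy
```

With the chunk copied first, the copied checkpoint is ahead of the copied data, which is the situation the reader trips over. With the checkpoint copied first, the copy may contain extra data past the checkpoint, but everything the checkpoint refers to is present.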

If you have a commercial contract feel free to open a support ticket in Zendesk and we’ll see what we can do to fix this.

Cheers,

James

And in general then, to back up an ES directory, so long as we make sure to copy the checkpoints first as per https://eventstore.org/docs/server/database-backup/, all should be well?

Yes, copying the checkpoint files first ensures that they will correspond to an existing position in the chunk files.
When restoring, you also need to copy chaser.chk to truncate.chk to force the database to truncate itself to the chaser position.
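As a rough sketch of that ordering in Python, under some assumptions: the function names and directory arguments are hypothetical, only top-level files in the db directory are handled (any subdirectories, such as an index folder, are not covered), and the node being restored is stopped. Check the linked backup documentation for the authoritative procedure.

```python
import shutil
from pathlib import Path


def backup_db(db_dir: Path, backup_dir: Path) -> None:
    """Copy checkpoint files first, then everything else (hypothetical helper)."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # 1. Checkpoints first, so they never point past the data copied later.
    for chk in sorted(db_dir.glob("*.chk")):
        shutil.copy2(chk, backup_dir / chk.name)
    # 2. Then the remaining top-level files (chunk files and so on).
    for f in sorted(db_dir.iterdir()):
        if f.is_file() and f.suffix != ".chk":
            shutil.copy2(f, backup_dir / f.name)


def restore_db(backup_dir: Path, db_dir: Path) -> None:
    """Copy the backup into the db location of a stopped node (hypothetical helper)."""
    db_dir.mkdir(parents=True, exist_ok=True)
    for f in sorted(backup_dir.iterdir()):
        if f.is_file():
            shutil.copy2(f, db_dir / f.name)
    # Copy chaser.chk over truncate.chk so the database truncates itself
    # to the chaser position on startup. chaser.chk itself is kept as well.
    shutil.copy2(backup_dir / "chaser.chk", db_dir / "truncate.chk")
```

Usage would be something like `backup_db(Path("/path/to/db"), Path("/path/to/backup"))` on the source, then `restore_db(Path("/path/to/backup"), Path("/path/to/new-db"))` on the stopped node before restarting it; both paths are placeholders.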

Regards,

Shaan

Yes. This is a reference issue, just like pushing some git commits without first pushing your submodule changes.