Backing up EventStore (2.0.1)

We currently have Event Store 2.0.1.0 and we back it up by simply making a copy of the entire database directory (chunk files, *.chk files, index directory).

We do not stop EventStore.SingleNode.exe while the copy is made.

At the time of our backup there are no connections.

One particular database has three chunk files, 0 through 2. When we restore it and start EventStore.SingleNode.exe, we get the error below about unexpected chunk files 1 and 2.

Our workaround was to stop EventStore before making the copy. The error does not occur when we restore any of our other backups.

Can anyone shed some light as to why we are forced to stop EventStore.SingleNode before making a copy of this one database?

Configuration Result:

[Success] Name GetEventStore

[Success] DisplayName GetEventStore

[Success] Description GetEventStore Service

[Success] ServiceName GetEventStore

Topshelf v3.1.122.0, .NET Framework v4.0.30319.18063

The GetEventStore service is now running, press Control+C to exit.

[04176,01,21:14:11.036]

ES VERSION: 2.0.1.0 (master/549d96219418572625b2f68d46ec809d9e86f7df, Thu, 1 Aug 2013 18:26:54 +0100)

OS: Windows (Microsoft Windows NT 6.1.7601 Service Pack 1)

RUNTIME: .NET 4.0.30319.18063 (64-bit)

GC: 3 GENERATIONS

LOGS: C:\EventStoreDb\CONN1-logs

SHOW HELP: False ()

SHOW VERSION: False ()

LOGS DIR: ()

CONFIGS: ()

DEFINES: ()

IP: 172.10.16.132 (--ip from command line)

TCP PORT: 1113 (--tcp-port from command line)

SECURE TCP PORT: 0 ()

HTTP PORT: 2113 (--http-port from command line)

STATS PERIOD SEC: 30 (--stats-period-sec from command line)

CACHED CHUNKS: 1 (--cached-chunks from command line)

CHUNKS CACHE SIZE: 256 (--chunks-cache-size from command line)

MIN FLUSH DELAY MS: 2 ()

DB PATH: …\EventStoreDb\CONN1 (--db from command line)

SKIP DB VERIFY: False ()

RUN PROJECTIONS: All (--run-projections from command line)

PROJECTION THREADS: 3 ()

WORKER THREADS: 5 ()

HTTP PREFIXES: ()

ENABLE TRUSTED AUTH: False ()

CERTIFICATE STORE: ()

CERTIFICATE NAME: ()

CERTIFICATE FILE: ()

CERTIFICATE PASSWORD: ()

PREPARE TIMEOUT MS: 2000 ()

COMMIT TIMEOUT MS: 2000 ()

FORCE: False ()

[04176,01,21:14:11.098]

DATABASE: C:\EventStoreDb\CONN1

WRITER CHECKPOINT: 192759454 (0xB7D469E)

CHASER CHECKPOINT: 192759454 (0xB7D469E)

EPOCH CHECKPOINT: 167096955 (0x9F5B27B)

TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

[04176,01,21:14:11.223] MessageHierarchy initialization took 00:00:00.0716099.

Exiting with exit code: 1.

Exit reason: Corrupt database detected.

Unexpected files: C:\EventStoreDb\CONN1\chunk-000001.000000,

                C:\EventStoreDb\CONN1\chunk-000002.000000, 

[04176,01,21:14:11.925] Unhandled exception while starting application:

Corrupt database detected.

Unexpected files: C:\EventStoreDb\CONN1\chunk-000001.000000,

                C:\EventStoreDb\CONN1\chunk-000002.000000, 

Corrupt database detected.

[04176,01,21:14:11.941] Exiting with exit code: 1.

Exit reason: Corrupt database detected.

Thanks

The .chk files appear to be from an older version of the data. Was there an error during the copy? They point into chunk 0:

WRITER CHECKPOINT: 192759454 (0xB7D469E)

CHASER CHECKPOINT: 192759454 (0xB7D469E)

EPOCH CHECKPOINT: 167096955 (0x9F5B27B)

TRUNCATE

Thanks for replying Greg.

As we stated, the database was running (ES-SingleNode running), and RoboCopy /MIR (mirror) was used to grab all the files.

Is there a specific copy order to follow while the engine is running (such as what we have read for GES 3.0/1)?

Also, out of curiosity, how do you know that the .chk files are pointing at chunk 0?
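One way to check the "chunk 0" reading is plain arithmetic on the values in the log. This is a rough sketch, not something confirmed in this thread: it assumes a checkpoint is a byte position in the transaction log and that each chunk covers 256 MiB. Dividing the position by the chunk size then gives the chunk index. The helper names `chunk_of` and `chunks_beyond` are made up for illustration, and the chunk file naming is taken from the error output above.

```python
import re

# Assumption: 256 MiB of log per chunk file (believed to be the default).
CHUNK_SIZE = 256 * 1024 * 1024


def chunk_of(position, chunk_size=CHUNK_SIZE):
    """Return the index of the chunk that contains a log position."""
    return position // chunk_size


def chunks_beyond(writer_checkpoint, chunk_file_names, chunk_size=CHUNK_SIZE):
    """List chunk files numbered higher than the writer checkpoint's chunk."""
    last_chunk = chunk_of(writer_checkpoint, chunk_size)
    # File names look like chunk-000001.000000 (chunk number, then version).
    pattern = re.compile(r"chunk-(\d{6})\.(\d{6})$")
    beyond = []
    for name in chunk_file_names:
        match = pattern.search(name)
        if match and int(match.group(1)) > last_chunk:
            beyond.append(name)
    return beyond


writer = 0xB7D469E  # WRITER CHECKPOINT from the log, 192759454
print(chunk_of(writer))  # 0
print(chunks_beyond(writer, [
    "chunk-000000.000000",
    "chunk-000001.000000",
    "chunk-000002.000000",
]))  # ['chunk-000001.000000', 'chunk-000002.000000']
```

Under these assumptions all three logged checkpoints are below 268435456, so they fall inside chunk 0, and chunk files 1 and 2 lie past the writer checkpoint. Those are the two files the startup check reports as unexpected.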

You copy the .chk files first, then everything else. On the backup copy you then do the rename to truncate, the same as specified in the documentation.

I have half a dozen Event Store 2.0.1 deployments with as many backup patterns, and our sysadmins have asked why the Event Store database needs a two-step backup. At the time we back up, our application pool has been stopped and there are no connections. They are pushing back and want a reason before they begin rebuilding and testing their backup processes. With no database connections, why must the .chk files be copied before the chunk files? They have also asked whether the two-step backup (.chk files, then chunk files) would still be required if they stopped EventStore.SingleNode. Finally, does this also apply to version 3.0 (again with no connections at the time of backup)?

Thank you for your kind assistance.

There is no need to stop the database. It is just a matter of the order in which the files are copied while the db is running.

If you prefer, do a real-time incremental backup.

If the node is totally stopped, you can of course just copy the directory. This does not hold for a running node with no connections, as there are things internally that change the db at various points. The .chk files will probably go away in the next 3-6 months, at which point it will become “just copy the directory”.

The “correct” backup process is documented here:

http://docs.geteventstore.com/server/3.0.1/database-backup/

This applies to ALL versions of Event Store regardless of whether they are running in single or multi node configuration. I’ll add a note to make it clear that the nodes do not need to be stopped.

Even if there are no connections, the Event Store writes some internal metadata to its database (statistics etc.), so you still need to follow this sequence.

James
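The copy order discussed in this thread can be sketched as a small script. This is an illustration under stated assumptions, not the documented procedure. The function names are hypothetical. The restore step (copying chaser.chk to truncate.chk) is one reading of "the rename to truncate" mentioned above and should be checked against the linked documentation. The apparent rationale for the order is that the log only grows, so checkpoints copied first should never point past the data in chunk files copied afterwards.

```python
import shutil
from pathlib import Path


def backup_running_db(db_dir, backup_dir):
    """Copy *.chk files first, then every other file and directory."""
    src, dst = Path(db_dir), Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Step 1: checkpoint files first.
    for chk in sorted(src.glob("*.chk")):
        if chk.is_file():
            shutil.copy2(chk, dst / chk.name)

    # Step 2: everything else (chunk files, index directory, ...).
    for entry in sorted(src.iterdir()):
        if entry.is_file() and entry.suffix == ".chk":
            continue  # already copied in step 1
        target = dst / entry.name
        if entry.is_dir():
            shutil.copytree(entry, target, dirs_exist_ok=True)  # Python 3.8+
        else:
            shutil.copy2(entry, target)


def prepare_restore(restored_dir):
    """Assumed restore step: copy chaser.chk to truncate.chk before startup."""
    restored = Path(restored_dir)
    shutil.copy2(restored / "chaser.chk", restored / "truncate.chk")
```

A call such as `backup_running_db(db_path, backup_path)` would be run against the live database directory, and `prepare_restore(restored_path)` against the restored copy while the node is stopped. Both path arguments are placeholders here.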