chunk gets corrupted after restart

Hi,

Can anyone explain the following error popping up when running Eventstore 1.0.1 on Mono?

(We can reproduce it on Ubuntu or Debian.)

It happens every now and then, usually after eventstore process is restarted after being killed.

After that, the process is running but does not accept client connections.

For now we fix this by clearing the db directory, but it would be nice to come up with a better solution before we go live.

Here’s the tail of logs at the time of failure:

OS: Linux (Unix 3.2.0.23)

RUNTIME: 3.0.3 ((no/39c48d5 Wed Mar 13 15:05:01 UTC 2013) (EventStore build) (64-bit)

GC: 2 GENERATIONS

LOGS: /home/eventstore/db-logs

SHOW HELP: False ()

SHOW VERSION: False ()

LOGS DIR: ()

CONFIGS: ()

DEFINES: ()

IP: 192.168.33.10 (–ip from command line)

TCP PORT: 1113 ()

HTTP PORT: 2113 ()

STATS PERIOD SEC: 30 ()

CACHED CHUNKS: 2 ()

DB PATH: /home/eventstore/db (–db from command line)

SKIP DB VERIFY: False ()

RUN PROJECTIONS: False ()

PROJECTION THREADS: 3 ()

TCP SEND THREADS: 3 ()

HTTP RECEIVE THREADS: 5 ()

HTTP SEND THREADS: 3 ()

HTTP PREFIXES: ()

[PID:20713 2013.07.05 10:38:52.416 Info ProgramBase`1 1]

DATABASE: /home/eventstore/db

[PID:20713 2013.07.05 10:38:52.719 Trace TFChunk 1]

CACHED TFChunk #0 at chunk-000000.000000 in 00:00:00.0187851.

[PID:20713 2013.07.05 10:38:52.847 Info MiniWeb 1]

Starting MiniWeb for /web ==> /home/eventstore/eventstore/singlenode-web

[PID:20713 2013.07.05 10:38:52.848 Info MiniWeb 1]

Starting MiniWeb for /web/es ==> /home/eventstore/eventstore/es-common-web

[PID:20713 2013.07.05 10:38:52.848 Info MiniWeb 1]

Starting MiniWeb for /web/es/js/projections ==> /home/eventstore/eventstore/singlenode-web/js/projections

[PID:20713 2013.07.05 10:38:52.848 Trace MiniWeb 1]

Binding Mini Web to /web/{*remaining_path}

[PID:20713 2013.07.05 10:38:52.848 Trace MiniWeb 1]

Binding Mini Web to /web/es/{*remaining_path}

[PID:20713 2013.07.05 10:38:52.849 Trace MiniWeb 1]

Binding Mini Web to /web/es/js/projections/{*remaining_path}

[PID:20713 2013.07.05 10:38:52.902 Info SingleVNodeController 14]

========= SystemInit: SingleVNodeController =========

[PID:20713 2013.07.05 10:38:52.971 Info HttpAsyncServer 14]

Starting HTTP server on [http://192.168.33.10:2113/]…

[PID:20713 2013.07.05 10:38:52.997 Info HttpAsyncServer 14]

HTTP server is up and listening on [http://192.168.33.10:2113/]

[PID:20713 2013.07.05 10:38:53.037 Trace InMemoryBus 14]

SLOW BUS MSG [MainBus]: SystemInit - 91ms. Handler: HttpService.

[PID:20713 2013.07.05 10:38:53.066 Trace QueuedHandlerAutoReset 14]

SLOW QUEUE MSG [MainQueue]: SystemInit - 158ms. Q: 0/802.

[PID:20713 2013.07.05 10:38:53.067 Error QueuedHandlerAutoReset 7]

Error while processing message EventStore.Core.Messages.SystemMessage+SystemInit in queued handler ‘StorageWriterQueue’.

System.ArgumentException: Log record at actual pos 15382101 has non-positive length: 0. Something is seriously wrong in chunk 0-0 (/home/eventstore/db/chunk-000000.000000).

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk+TFChunkReadSide.TryReadForwardInternal (EventStore.Core.TransactionLog.Chunks.TFChunk.ReaderWorkItem workItem, Int32 actualPosition, System.Int32& length, EventStore.Core.TransactionLog.LogRecords.LogRecord& record) [0x00000] in :0

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk+TFChunkReadSideUnscavenged.TryReadClosestForward (Int32 logicalPosition) [0x00000] in :0

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk.TryReadClosestForward (Int32 logicalPosition) [0x00000] in :0

at EventStore.Core.TransactionLog.Chunks.TFChunkSequentialReader.TryReadNextInternal (Int64 position, Int32 retries, Boolean allowNonFlushed) [0x00000] in :0

at EventStore.Core.TransactionLog.Chunks.TFChunkSequentialReader.TryReadNext () [0x00000] in :0

at EventStore.Core.Services.Storage.ReaderIndex.ReadIndex.Build () [0x00000] in :0

at EventStore.Core.Services.Storage.StorageWriterService.EventStore.Core.Bus.IHandle<EventStore.Core.Messages.SystemMessage.SystemInit>.Handle (EventStore.Core.Messages.SystemInit message) [0x00000] in :0

at EventStore.Core.Bus.MessageHandler`1[EventStore.Core.Messages.SystemMessage+SystemInit].TryHandle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.PublishByType (EventStore.Core.Messaging.Message message, System.Type type) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.DispatchByType (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.QueuedHandlerAutoReset.ReadFromQueue (System.Object o) [0x00000] in :0

So, could anyone shed some light on why this happens or how to solve it without clearing all of the data?

regards,

Osvaldas

Can you upgrade to the dev branch? Its about to move to production
(literally probably early next week) and I believe this is a
correction of a known issue that was worked around by us here is a
link to the bug https://bugzilla.xamarin.com/show_bug.cgi?id=11699.
Actually a bug in mono's filestream implementation. Otherwise we can
fix that db for you but if you can throw it away it would save some
time :slight_smile: If you get the same issue please let us know as we believe we
have worked around that bug.

Cheers,

Greg