EventStore crashes with StorageReaderQueue at 100%

Hi everyone,

I am doing some experiments looking forward to a task of Stream renaming discussed here.

Today I have started my day with a really simple projection just to warm myself up after a week on other tasks, the projection is a basic raw counter:

fromAll()

** .when({**

** $init: function() {**

** return { counter: 0 };**

** },**

** $any: function(state, event) {**

** state.counter++;**

** }**

** })**

When projection was at 81% EventStore crashed. I was not logged via SSH but the web interface was giving me a 503. I ssh into the machine and I have restarted the EventStore, but now it goes down after 4-5 minutes.

There are no strange warnings, the only two signs are:

[PID:09092:013 2014.04.03 08:17:58.735 ERROR QueuedHandlerAutoRes] Error while processing message EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage in queued handler ‘Projection Core #0’.

System.NullReferenceException: Object reference not set to an instance of an object

at System.Collections.Generic.InternalStringComparer.GetHashCode (System.String obj) [0x00000] in :0

at System.Collections.Generic.HashSet`1[System.String].GetItemHashCode (System.String item) [0x00000] in :0

at System.Collections.Generic.HashSet`1[System.String].Contains (System.String item) [0x00000] in :0

at EventStore.Projections.Core.Services.Processing.EventByTypeIndexEventReader+TfBased.Handle (EventStore.Core.Messages.ReadAllEventsForwardCompleted message) [0x00000] in :0

at EventStore.Core.Messaging.SendToThisEnvelope.ReplyWith[ReadAllEventsForwardCompleted] (EventStore.Core.Messages.ReadAllEventsForwardCompleted message) [0x00000] in :0

at EventStore.Projections.Core.Messaging.PublishToWrapEnvelop+c__AnonStorey2`1[EventStore.Core.Messages.ClientMessage+ReadAllEventsForwardCompleted].<>m__0 () [0x00000] in :0

at EventStore.Projections.Core.Messaging.UnwrapEnvelopeHandler.Handle (EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage message) [0x00000] in :0

at EventStore.Core.Bus.MessageHandler`1[EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage].TryHandle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.QueuedHandlerAutoReset.ReadFromQueue (System.Object o) [0x00000] in :0

And

This is not version 3 but 2.0.1 for Mono. How can I understand what is happening?

Cheers,

Mirko

Mirko,

  1. Do you have any deleted stream in the system?

  2. Do you have any event expiration/truncation configured on your streams?

-yuriy

Hi Yuriy, yes I have some deleted streams in my system but no expiration/truncation configured on streams.

Mirko

Mirko,

when you start the ES do you see a position where the projection fails on UI? or is it inaccessible completely?

It has just crashed also with “custom” projections disabled and only the SYSTEM projections running.

Let me restart it and check the log for other info, I also keep 2 chrome tab to post the position you are asking.

Mirko

Yuriy this is the situation of the positions from the UI.

Thanks

Mirko,

the screenshot doe snot show any custom projections. If the system crashes under load with just system projections running
it is likely not to be related to projections at all.

What mono version do you use? We had some mono versions crashing in GC under load.

If you start ES from the terminal window, do you see any output after the crash?

-yuriy

Yes, it is because I have disabled custom projections to avoid the problem but it happened again. Here are the info about Mono.

ES VERSION: 2.0.1.0 (master/549d96219418572625b2f68d46ec809d9e86f7df, Thu, 1 Aug 2013 18:26:54 +0100)

OS: Unknown (Unix 3.2.0.49)

RUNTIME: 3.0.12 ((no/d6c5db8 Thu Mar 6 11:54:33 CET 2014) (EventStore build) (64-bit)

GC: 2 GENERATIONS

If the output from the terminal is reported into ES logs I can’t see anything but this (on the error.log):

[PID:10991:012 2014.04.03 09:14:21.528 ERROR QueuedHandlerAutoRes] Error while processing message EventStore.Core.Messages.ClientMessage+ReadStreamEventsBackwardCompleted in queu

ed handler ‘Projections Master’.

System.Exception: The projection cannot be loaded as stopped as it was stored in the old format. Update the projection query text to force prepare

at EventStore.Projections.Core.Services.Management.ManagedProjection.BeginCreatePrepared (EventStore.Projections.Core.Services.ProjectionConfig config, System.Action onPrepared

) [0x00000] in :0

at EventStore.Projections.Core.Services.Management.ManagedProjection.CreatePrepared (System.Action onPrepared) [0x00000] in :0

at EventStore.Projections.Core.Services.Management.ManagedProjection.LoadCompleted (EventStore.Core.Messages.ReadStreamEventsBackwardCompleted completed) [0x00000] in <filename

:0

at EventStore.Projections.Core.Services.Management.ManagedProjection.m__10 (EventStore.Core.Messages.ReadStreamEventsBackwardCompleted completed) [0x00000] in <filen

ame unknown>:0

at EventStore.Core.Messaging.RequestResponseDispatcher`2[EventStore.Core.Messages.ClientMessage+ReadStreamEventsBackward,EventStore.Core.Messages.ClientMessage+ReadStreamEvents

BackwardCompleted].Handle (EventStore.Core.Messages.ReadStreamEventsBackwardCompleted message) [0x00000] in :0

at EventStore.Projections.Core.Services.Management.ProjectionManager.Handle (EventStore.Core.Messages.ReadStreamEventsBackwardCompleted message) [0x00000] in

:0

at EventStore.Core.Bus.MessageHandler`1[EventStore.Core.Messages.ClientMessage+ReadStreamEventsBackwardCompleted].TryHandle (EventStore.Core.Messaging.Message message) [0x00000

] in :0

at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.QueuedHandlerAutoReset.ReadFromQueue (System.Object o) [0x00000] in :0

and this from the application.log:

[PID:11651:013 2014.04.03 09:30:07.761 ERROR QueuedHandlerAutoRes] Error while processing message EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage in queued handler 'P

rojection Core #0’.

System.NullReferenceException: Object reference not set to an instance of an object

at System.Collections.Generic.InternalStringComparer.GetHashCode (System.String obj) [0x00000] in :0

at System.Collections.Generic.HashSet`1[System.String].GetItemHashCode (System.String item) [0x00000] in :0

at System.Collections.Generic.HashSet`1[System.String].Contains (System.String item) [0x00000] in :0

at EventStore.Projections.Core.Services.Processing.EventByTypeIndexEventReader+TfBased.Handle (EventStore.Core.Messages.ReadAllEventsForwardCompleted message) [0x00000] in <fil

ename unknown>:0

at EventStore.Core.Messaging.SendToThisEnvelope.ReplyWith[ReadAllEventsForwardCompleted] (EventStore.Core.Messages.ReadAllEventsForwardCompleted message) [0x00000] in <filename

:0

at EventStore.Projections.Core.Messaging.PublishToWrapEnvelop+c__AnonStorey2`1[EventStore.Core.Messages.ClientMessage+ReadAllEventsForwardCompleted].<>m__0 () [0x000

00] in :0

at EventStore.Projections.Core.Messaging.UnwrapEnvelopeHandler.Handle (EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage message) [0x00000] in :0

at EventStore.Core.Bus.MessageHandler`1[EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage].TryHandle (EventStore.Core.Messaging.Message message) [0x00000] in <filenam

e unknown>:0

at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message message) [0x00000] in :0

at EventStore.Core.Bus.QueuedHandlerAutoReset.ReadFromQueue (System.Object o) [0x00000] in :0

Mirko

If you run the ES with custom projections, could you make the same screenshot with failed projection?

Unfortunately I had to restart the integration pipeline and I have at the moment done a backup of the db folder so I can try it during the lunch break.

But I can paste the custom projections status when they were stopped. I don’t know if it helps but later today I can reload the backup and simulate it again. Let me know.

Thanks,

What I try to find out is what event caused the problem. The screenshot you sent i likely showing the last checkpoint position. If you start the projection it will advance. If you have exact position you can try reading the event from the db. Having details about the event will help me identify the problem. I suspect some things like: linkTo, linkTo to a deleted stream or expired event, tombstone event etc.

-yuriy

It is clear and I am really interested in understand what is happening. So give me an hour and I can load the previous DB files so we can go and get it.

So it could be related to a non scavenged event or link?

Mirko

Here I am. Is this the information you were looking for?

You say crashes here but it doesn’t looked like it “crashed” do you have something running and doing reads at this time? It’s only doing about 700/sec. Could the projection etc still be running or are there clients querying?

Hi Greg, I say crashed because if I refresh the UI I get a 503 and a “ps faxww | grep event” shows that the process is not running anymore.

Mirko

If running in a console it must say something about the crash? This is on mono I assume do you get something like a segfault if you run directly in the console? Also what version of mono?

Mono version is 3.0.12, I will try to run it from the console in few minutes.

Mirko

or it would be written in syslog but console is the easiest. my guess is you are getting a sigseg etc.

Gerg this is the Stack Trace from console:

[17300,11,14:00:44.708] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 103ms. Q: 0/0.

[17300,11,14:00:47.843] SLOW QUEUE MSG [MonitoringQueue]: GetFreshStats - 108ms. Q: 0/0.

[17300,14,14:01:42.413] Error while processing message EventStore.Projections.Core.Messaging.UnwrapEnvelopeMessage in queued handler ‘Projection Core #1’.

Object reference not set to an instance of an object

no object of size 451461952

Stacktrace:

at <0xffffffff>

at (wrapper managed-to-native) object.__icall_wrapper_mono_gc_alloc_vector (intptr,intptr,intptr) <0xffffffff>

at (wrapper alloc) object.AllocVector (intptr,intptr) <0xffffffff>

at System.IO.BinaryReader.ReadBytes (int) <0x0005f>

at EventStore.Core.TransactionLog.LogRecords.PrepareLogRecord…ctor (System.IO.BinaryReader,byte,long) <0x00266>

at EventStore.Core.TransactionLog.LogRecords.LogRecord.ReadFrom (System.IO.BinaryReader) <0x000b9>

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk/TFChunkReadSide.TryReadForwardInternal (EventStore.Core.TransactionLog.Chunks.TFChunk.ReaderWorkItem,long,int&,EventStore.Core.TransactionLog.LogRecords.LogRecord&) <0x000e0>

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk/TFChunkReadSideScavenged.TryReadClosestForward (long) <0x001e0>

at EventStore.Core.TransactionLog.Chunks.TFChunk.TFChunk.TryReadClosestForward (long) <0x00031>

at EventStore.Core.TransactionLog.Chunks.TFChunkReader.TryReadNextInternal (int) <0x001b7>

at EventStore.Core.TransactionLog.Chunks.TFChunkReader.TryReadNext () <0x00025>

at EventStore.Core.Services.Storage.ReaderIndex.ReadIndex.EventStore.Core.Services.Storage.ReaderIndex.IReadIndex.ReadAllEventsForward (EventStore.Core.Data.TFPos,int) <0x0029f>

at EventStore.Core.Services.Storage.StorageReaderWorker.ReadAllEventsForward (EventStore.Core.Messages.ClientMessage/ReadAllEventsForward) <0x00391>

at EventStore.Core.Services.Storage.StorageReaderWorker.EventStore.Core.Bus.IHandle<EventStore.Core.Messages.ClientMessage.ReadAllEventsForward>.Handle (EventStore.Core.Messages.ClientMessage/ReadAllEventsForward) <0x0002d>

at EventStore.Core.Bus.MessageHandler`1.TryHandle (EventStore.Core.Messaging.Message) <0x000a9>

at EventStore.Core.Bus.InMemoryBus.Publish (EventStore.Core.Messaging.Message) <0x0051b>

at EventStore.Core.Bus.InMemoryBus.Handle (EventStore.Core.Messaging.Message) <0x00019>

at EventStore.Core.Bus.QueuedHandlerThreadPool.ReadFromQueue (object) <0x00155>

at (wrapper runtime-invoke) .runtime_invoke_void__this___object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

mono-sgen() [0x4a69cd]

/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f500e85ecb0]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f500e4c6425]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f500e4c9b8b]

mono-sgen() [0x627e4d]

mono-sgen() [0x627ee2]

mono-sgen() [0x5cf5a2]

mono-sgen() [0x5d35de]

mono-sgen() [0x5eb19d]

mono-sgen() [0x5ec427]

mono-sgen() [0x5c833f]

mono-sgen() [0x5ce1de]

mono-sgen() [0x5ce45e]

mono-sgen() [0x5e4868]

mono-sgen() [0x5e495c]

[0x41324322]

Debug info from gdb:

This looks like the mono gc problem

can you run gdb and provide a back trace? Here is instructions :slight_smile:

http://goodenoughsoftware.net/2014/03/01/debugging-segmentation-faults-in-mono/