Constantly restarting ES - Object reference not set to an instance of an object

I have an Event Store instance on a brand new VM that’s doing this constantly:

[PID:07608:021 2016.11.19 01:59:11.512 FATAL GLOBAL-LOGGER ] Global Unhandled Exception occurred.

System.NullReferenceException: Object reference not set to an instance of an object

at EventStore.Core.Bus.QueuedHandlerAutoReset.ReadFromQueue (System.Object o) [0x00000] in :0

at System.Threading.Thread.StartInternal () [0x00000] in :0

I have stopped the ES process, deleted the entire data directory, and restarted it, and I still get these errors constantly - like every 5 seconds the ES process restarts.

Here are the stats from startup:

[PID:07608:001 2016.11.19 01:59:10.779 INFO ProgramBase`1 ]

ES VERSION: 3.9.2.0 (HEAD/791b51f1a3c82e4e22d43c3574b964b9b789abee, Mon, 10 Oct 2016 10:22:54 +0200)

OS: Linux (Unix 4.4.0.45)

RUNTIME: 3.12.1 (es-mono-3.12.1/463d5dd) (64-bit)

GC: 2 GENERATIONS

LOGS: /var/log/eventstore

MODIFIED OPTIONS:

CONFIG: /etc/eventstore/eventstore-message-bus.conf (Command Line)

DB: /var/lib/event-store-message-bus (Config File)

CLUSTER SIZE: 1 (Config File)

CLUSTER DNS: eventstore-message-bus.service.consul (Config File)

EXT IP: 10.130.227.16 (Config File)

EXT TCP PORT: 1213 (Config File)

EXT HTTP PREFIXES: http://*:2213/ (Config File)

EXT HTTP PORT: 2213 (Config File)

INT IP: 10.130.227.16 (Config File)

INT TCP PORT: 1212 (Config File)

INT HTTP PREFIXES: http://*:2212/ (Config File)

INT TCP HEARTBEAT TIMEOUT: 2000 (Config File)

INT HTTP PORT: 2212 (Config File)

CLUSTER GOSSIP PORT: 2212 (Config File)

RUN PROJECTIONS: All (Config File)

ADD INTERFACE PREFIXES: false (Config File)

Something to note that may be odd about this: I am running three instances of ES on a small VM (1 core, 1 GB RAM). Memory usage is tiny (all three instances are basically empty), but CPU usage is quite high (though I suspect that is because the ‘message-bus’ instance is constantly restarting).

Is this to be expected?
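For anyone comparing against their own logs, the crash signature above can be picked out mechanically. The following is a minimal Python sketch, assuming the log line layout matches the lines pasted in this post; the function names are made up for illustration and the second sample line is hypothetical, added only to show the interval calculation.

```python
import re
from datetime import datetime

# Matches the prefix of lines such as:
# [PID:07608:021 2016.11.19 01:59:11.512 FATAL GLOBAL-LOGGER ] ...
# The layout is inferred from the lines quoted in this thread.
LINE_RE = re.compile(
    r"^\[PID:(?P<pid>\d+):(?P<thread>\d+)\s+"
    r"(?P<ts>\d{4}\.\d{2}\.\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s*\]"
)


def fatal_timestamps(lines):
    """Return the timestamps of every FATAL log line in `lines`."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "FATAL":
            stamps.append(
                datetime.strptime(m.group("ts"), "%Y.%m.%d %H:%M:%S.%f")
            )
    return stamps


def seconds_between(stamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    sample = [
        "[PID:07608:001 2016.11.19 01:59:10.779 INFO ProgramBase`1 ]",
        "[PID:07608:021 2016.11.19 01:59:11.512 FATAL GLOBAL-LOGGER ] "
        "Global Unhandled Exception occurred.",
        # Hypothetical second crash, not taken from the real log.
        "[PID:07700:021 2016.11.19 01:59:16.512 FATAL GLOBAL-LOGGER ] "
        "Global Unhandled Exception occurred.",
    ]
    print(seconds_between(fatal_timestamps(sample)))
```

Short, regular gaps between FATAL lines would be consistent with the restart loop described above, though they say nothing about the cause.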

Hi Justin,
Apologies for not replying sooner. This is a symptom of a known regression with Mono on certain versions of the Linux kernel, as noted here.

The issue has been reported a couple of times on the Event Store GitHub page, here and here.

The solution is to either revert or upgrade to a known good version of the kernel.

I will make a note of it in the docs and the GitHub README.

Thank you, Pieter.

I noticed that this remains an issue in the latest couple of 14.04 kernels:

4.4.0-45
4.4.0-51

You said you were going to update something in the docs/README, but I don’t see that anywhere. Is there a list of compatible kernel versions somewhere? Or is there some way to diagnose whether a kernel will work, other than starting up an ES instance and making requests until it starts failing?

Hi Justin,
Here is a note about the known good versions of the kernel: https://github.com/EventStore/EventStore/pull/1116

There are a couple of test programs from the Mono list (https://bugzilla.xamarin.com/show_bug.cgi?id=29299) that attempt to reproduce the issue consistently, but none of them has been really reliable. I haven’t seen a case where a bad kernel has managed to survive a write flood of 1 million requests from 10 clients (the write flood is run from the Event Store Test Client). It generally crashes within a couple of seconds.
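As a quick pre-check before running the write flood, the running kernel can be compared against the builds reported as failing in this thread. This is only a rough Python sketch: the `REPORTED_BAD` set holds just the two builds mentioned above (4.4.0-45 and 4.4.0-51), it is not an official compatibility list, and a kernel that is absent from it is not thereby known to be good. The helper names are made up for illustration.

```python
import platform
import re

# Kernel builds reported as failing in this thread (not an official list).
REPORTED_BAD = {(4, 4, 0, 45), (4, 4, 0, 51)}


def parse_release(release):
    """Turn '4.4.0-45-generic' or '4.4.0.45' into (4, 4, 0, 45).

    Returns None when the string does not look like an Ubuntu-style
    kernel release with a build number.
    """
    m = re.match(r"(\d+)\.(\d+)\.(\d+)[-.](\d+)", release)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())


def is_reported_bad(release, bad=REPORTED_BAD):
    """True if the release matches a build reported as failing."""
    return parse_release(release) in bad


if __name__ == "__main__":
    release = platform.release()
    if is_reported_bad(release):
        print(release + ": reported as failing in this thread")
    else:
        print(release + ": not in the reported-bad set (this proves nothing)")
```

The write flood remains the real test; a check like this only saves time when the kernel is already one of the reported builds.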