Problems starting cluster and crashes on SLES 12

Hi

I am sorry to have to ask this, but I can't seem to get my cluster to work (first time trying).

I have 3 nodes on VirtualBox machines running SLES 12, started as follows:

./run-node.sh --log /tmp/eslog --db /tmp/db --cluster-dns escluster.mooo.com --run-projections All --cluster-size 3 --int-ip 192.168.33.11 --ext-ip 192.168.33.11 --cluster-gossip-port 3077

./run-node.sh --log /tmp/eslog --db /tmp/db --cluster-dns escluster.mooo.com --run-projections All --cluster-size 3 --int-ip 192.168.33.13 --ext-ip 192.168.33.13 --cluster-gossip-port 3077

./run-node.sh --log /tmp/eslog --db /tmp/db --cluster-dns escluster.mooo.com --run-projections All --cluster-size 3 --int-ip 192.168.33.12 --ext-ip 192.168.33.12 --cluster-gossip-port 3077

They can't seem to find each other. Telnet between the nodes on the ports seems to work, but not on 3077.
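The manual telnet check can also be scripted. What follows is a minimal Python sketch, not part of Event Store: `port_open` and `check_nodes` are helper names made up for illustration, and the list of ports to probe is an assumption that should be adjusted to whatever ports the nodes are actually started with.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all end up here.
        return False

def check_nodes(hosts, ports, probe=port_open):
    """Map every (host, port) pair to True/False using the given probe."""
    return {(host, port): probe(host, port) for host in hosts for port in ports}

# Example (IPs taken from the commands above; the port list is an assumption):
#   nodes = ["192.168.33.11", "192.168.33.12", "192.168.33.13"]
#   for (host, port), ok in sorted(check_nodes(nodes, [3077]).items()):
#       print(host, port, "open" if ok else "closed/unreachable")
```

A closed result only says that nothing accepted a TCP connection on that port from this machine; it does not distinguish a firewall from a service that is simply not listening there.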

I also get lots of these errors after a few minutes of running:
[ERROR] FATAL UNHANDLED EXCEPTION: System.NullReferenceException: Object reference not set to an instance of an object

at System.Threading.Timer+Scheduler.SchedulerThread () [0x00000] in :0

at System.Threading.Thread.StartInternal () [0x00000] in :0

And

Stacktrace:

Native stacktrace:

./clusternode() [0x612962]

./clusternode() [0x5beb0b]

./clusternode() [0x4584f3]

/lib64/libpthread.so.0(+0xf890) [0x7fae3cbcf890]

Debug info from gdb:

warning: /etc/gdbinit.d/gdb-heap.py: No such file or directory

[New LWP 21830]

[New LWP 21826]

[New LWP 21825]

[New LWP 21824]

[New LWP 21823]

[New LWP 21822]

[New LWP 21821]

[New LWP 21820]

[New LWP 21819]

[New LWP 21818]

[New LWP 21817]

[New LWP 21816]

[New LWP 21815]

[New LWP 21813]

[New LWP 21812]

[New LWP 21811]

[New LWP 21809]

[Thread debugging using libthread_db enabled]

Using host libthread_db library “/lib64/libthread_db.so.1”.

0x00007fae3cbcc05f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Id Target Id Frame

18 Thread 0x7fae3c733700 (LWP 21809) “Finalizer” 0x00007fae3cbce010 in sem_wait () from /lib64/libpthread.so.0

17 Thread 0x7fae3adff700 (LWP 21811) “clusternode” 0x00007fae3c8f0d2d in read () from /lib64/libc.so.6

16 Thread 0x7fae3abfe700 (LWP 21812) “Timer-Scheduler” 0x00007fae3cbcf489 in waitpid () from /lib64/libpthread.so.0

15 Thread 0x7fae3afff700 (LWP 21813) “Threadpool moni” 0x00007fae3c909de4 in clock_nanosleep () from /lib64/libc.so.6

14 Thread 0x7fae3a5ff700 (LWP 21815) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

13 Thread 0x7fae3a3fe700 (LWP 21816) “clusternode” 0x00007fae3c909de4 in clock_nanosleep () from /lib64/libc.so.6

12 Thread 0x7fae3a1fd700 (LWP 21817) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

11 Thread 0x7fae39ffc700 (LWP 21818) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

10 Thread 0x7fae39dfb700 (LWP 21819) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

9 Thread 0x7fae39bfa700 (LWP 21820) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

8 Thread 0x7fae399f9700 (LWP 21821) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

7 Thread 0x7fae397f8700 (LWP 21822) “clusternode” 0x00007fae3cbcc408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

6 Thread 0x7fae3a77b700 (LWP 21823) “clusternode” 0x00007fae3c8fd663 in epoll_wait () from /lib64/libc.so.6

5 Thread 0x7fae394ef700 (LWP 21824) “IO Threadpool w” 0x00007fae3cbce0f0 in sem_timedwait () from /lib64/libpthread.so.0

4 Thread 0x7fae394ae700 (LWP 21825) “clusternode” 0x00007fae3c909de4 in clock_nanosleep () from /lib64/libc.so.6

3 Thread 0x7fae392ad700 (LWP 21826) “Threadpool work” 0x00007fae3cbce0f0 in sem_timedwait () from /lib64/libpthread.so.0

2 Thread 0x7fae38aa9700 (LWP 21830) “Threadpool work” 0x00007fae3cbce0f0 in sem_timedwait () from /lib64/libpthread.so.0

* 1 Thread 0x7fae3d6f9780 (LWP 21808) “clusternode” 0x00007fae3cbcc05f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Thread 18 (Thread 0x7fae3c733700 (LWP 21809)):

#0 0x00007fae3cbce010 in sem_wait () from /lib64/libpthread.so.0

#1 0x0000000000564fa7 in mono_sem_wait (sem=sem@entry=0x1d193c0 <finalizer_sem>, alertable=alertable@entry=1) at mono-semaphore.c:101

#2 0x0000000000494a95 in finalizer_thread (unused=) at gc.c:1077

#3 0x0000000000532577 in start_wrapper_internal (data=) at threads.c:663

#4 start_wrapper (data=) at threads.c:710

#5 0x0000000000566a1e in inner_start_thread (arg=0x7ffc2fa107f0) at mono-threads-posix.c:88

#6 0x00007fae3cbc80a4 in start_thread () from /lib64/libpthread.so.0

#7 0x00007fae3c8fd08d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7fae3adff700 (LWP 21811)):

#0 0x00007fae3c8f0d2d in read () from /lib64/libc.so.6

#1 0x0000000040e4dcbd in ?? ()

#2 0x00007fae2c002650 in ?? ()

#3 0x00007fae3adfee40 in ?? ()

#4 0x00007fae3c343f18 in ?? ()

#5 0x00007fae3c342d00 in ?? ()

#6 0x00007fae3c342af8 in ?? ()

#7 0x00007fae2c0025f0 in ?? ()

#8 0x0000000040e4dc30 in ?? ()

#9 0x00007fae3adfebc0 in ?? ()

#10 0x00007fae3adfeb00 in ?? ()

#11 0x0000000040e4dc30 in ?? ()

#12 0x00007fae3c343f38 in ?? ()

#13 0x0000000040e4db20 in ?? ()

#14 0x0000000000000000 in ?? ()

Any ideas on what is wrong?

The nodes gossip on the internal HTTP port, which defaults to 2112.
In your case, the last argument needs to be set to --cluster-gossip-port 2112.

This information is available via the documentation.

http://docs.geteventstore.com/server/3.0.5/cluster-without-manager-nodes/
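Because the nodes discover each other through the name given to --cluster-dns and then gossip on the gossip port, one quick sanity check is whether that name resolves to all three node addresses. Here is a minimal Python sketch, assuming the name is expected to resolve to one IPv4 address per node; `resolve_ipv4` and `missing_nodes` are hypothetical helper names, not part of Event Store.

```python
import socket

def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses the name resolves to."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})

def missing_nodes(expected, resolved):
    """Return the expected node IPs that are absent from the resolved list."""
    return sorted(set(expected) - set(resolved))

# Example (needs working DNS; name and IPs are the ones used in this thread):
#   resolved = resolve_ipv4("escluster.mooo.com")
#   print(missing_nodes(["192.168.33.11", "192.168.33.12", "192.168.33.13"], resolved))
```

An empty result from `missing_nodes` means every node address is present in the DNS answer seen from the machine running the check.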

The other error has come up on the list before; IIRC it is a glibc mismatch for the binaries.

Thanks, I don't know where I got gossip port 3077 from :)

What version of glibc do I need to run the tar.gz distribution? It fails on SLES 12, and also in Docker (https://registry.hub.docker.com/u/wkruse/eventstore/dockerfile/) on SLES 12.

Or do I need to compile it myself?
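For the "which glibc do I have" half of that question, the version of the C library on a given box can be read directly. This is a minimal Python sketch; `parse_glibc` and `local_glibc` are hypothetical helpers. The glibc version the tar.gz binaries were built against is not stated in this thread, so the sketch only reports the local version.

```python
import os

def parse_glibc(text):
    """Parse a string such as 'glibc 2.19' into a tuple like (2, 19)."""
    name, _, version = text.partition(" ")
    if name != "glibc" or not version:
        raise ValueError(f"unexpected libc string: {text!r}")
    return tuple(int(part) for part in version.split("."))

def local_glibc():
    """Return the glibc version of the running system (glibc-based Linux only)."""
    return parse_glibc(os.confstr("CS_GNU_LIBC_VERSION"))

# Example:
#   print(local_glibc())
```

Comparing the tuple from the build machine with the one from the target machine shows whether the two are on different glibc versions.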

My guess is that compiling on the box will make the issue go away.

I am now trying to compile and package Event Store on SLES 12.

Which wasn't that easy.

When I get to the packaging part I get this error from mkbundle:

cc -o clusternode -Wall -D_REENTRANT -I/usr/lib64/pkgconfig/../../include/mono-2.0 clusternode.c -L/usr/lib64/pkgconfig/../../lib64 -Wl,-Bstatic -lmonosgen-2.0 -Wl,-Bdynamic -lmonosgen-2.0 -lm -lrt -ldl -lpthread clusternode.a

/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: cannot find -lmonosgen-2.0

collect2: error: ld returned 1 exit status

That's because there is no static libmonosgen-2.0 library installed.
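The `-Wl,-Bstatic -lmonosgen-2.0` part of that link line asks the linker for a static archive, which by the usual naming convention is a file called libmonosgen-2.0.a in one of the -L search directories. Below is a minimal Python sketch to look for it; `find_static_lib` is a hypothetical helper and the directory list in the example is an assumption.

```python
import os

def find_static_lib(name, dirs):
    """Return the paths of lib<name>.a found directly in the given directories."""
    filename = f"lib{name}.a"
    return [os.path.join(d, filename) for d in dirs
            if os.path.isfile(os.path.join(d, filename))]

# Example (directories are assumptions; use the -L paths from your link line):
#   print(find_static_lib("monosgen-2.0", ["/usr/lib64", "/usr/lib", "/usr/local/lib"]))
```

An empty list is consistent with the "cannot find -lmonosgen-2.0" error for the static half of the link.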

OK, so then I tried to run the exe file directly and got:

[12223,01,11:26:35.213] Exiting with exit code: 4.

Exit reason: Appears that we are running in linux with a version 2 build of mono. This is generally not a good idea.We recommend running with 3.0 or higher (3.2 especially). If you really want to run with this version of mono use --force to override this error.

mono --version gives: Mono JIT compiler version 4.0.2 (Stable 4.0.2.5/c99aa0c Wed Jun 24 05:31:11 EDT 2015)

I am unfortunately new to using Mono :(

Ah, that's been updated. You are running version 4 of mono and it was
checking for == 3. Just run with --force as it says; it should be OK.

I wouldn't worry about static linking for now (not needed to test).

Ah, OK :)

It's been updated to check >= 3 :) There are quite a few old checks like
these from the time mono was transitioning between the Boehm GC and sgen
(Boehm was a really, really bad idea to run with!)
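To illustrate why an `== 3` test rejects Mono 4 while `>= 3` accepts it, here is a minimal Python sketch. It is not Event Store's actual check (that code is not shown in this thread); `mono_major` is a hypothetical helper that parses the `mono --version` line quoted earlier.

```python
import re

def mono_major(version_output):
    """Extract the major version number from `mono --version` style output."""
    match = re.search(r"version (\d+)\.", version_output)
    if match is None:
        raise ValueError("no version number found")
    return int(match.group(1))

line = "Mono JIT compiler version 4.0.2 (Stable 4.0.2.5/c99aa0c Wed Jun 24 05:31:11 EDT 2015)"
major = mono_major(line)
print(major == 3)  # equality-style check: False for Mono 4
print(major >= 3)  # minimum-version check: True for Mono 4
```

An equality test against one known-good major version breaks as soon as the next major release appears, which is why a minimum-version comparison is the safer form.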

ah

mono EventStore.ClusterNode.exe --force --log /tmp/eslog --db /tmp/db --cluster-dns escluster.mooo.com --run-projections All --cluster-size 3 --int-ip 192.168.33.11 --ext-ip 192.168.33.11 --cluster-gossip-port 2112

Seems to still crash

[28344,10,11:44:13.063] ELECTIONS: (V=79) VIEWCHANGE FROM [192.168.33.13:2112, {0885d0a1-5788-481b-ad94-c4807203188c}].

Stacktrace:

Native stacktrace:

mono() [0x4b90c2]

mono() [0x5101be]

mono() [0x428e19]

/lib64/libpthread.so.0(+0xf890) [0x7fb320177890]

Debug info from gdb:

warning: /etc/gdbinit.d/gdb-heap.py: No such file or directory

warning: File “/usr/bin/mono-sgen-gdb.py” auto-loading has been declined by your `auto-load safe-path’ set to “$debugdir:$datadir/auto-load”.

To enable execution of this file add

add-auto-load-safe-path /usr/bin/mono-sgen-gdb.py

line to your configuration file “/home/vagrant/.gdbinit”.

To completely disable this security protection add

set auto-load safe-path /

line to your configuration file “/home/vagrant/.gdbinit”.

For more information about this security protection see the

“Auto-loading safe path” section in the GDB manual. E.g., run from the shell:

info “(gdb)Auto-loading safe path”

[New LWP 28366]

[New LWP 28365]

[New LWP 28361]

[New LWP 28360]

[New LWP 28359]

[New LWP 28358]

[New LWP 28357]

[New LWP 28356]

[New LWP 28355]

[New LWP 28354]

[New LWP 28353]

[New LWP 28352]

[New LWP 28351]

[New LWP 28349]

[New LWP 28348]

[New LWP 28347]

[New LWP 28345]

[Thread debugging using libthread_db enabled]

Using host libthread_db library “/lib64/libthread_db.so.1”.

0x00007fb32017405f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Id Target Id Frame

18 Thread 0x7fb31fa93700 (LWP 28345) “Finalizer” 0x00007fb320176010 in sem_wait () from /lib64/libpthread.so.0

17 Thread 0x7fb31d043700 (LWP 28347) “mono” 0x00007fb31fc81d2d in read () from /lib64/libc.so.6

16 Thread 0x7fb31ce42700 (LWP 28348) “Timer-Scheduler” 0x00007fb320177489 in waitpid () from /lib64/libpthread.so.0

15 Thread 0x7fb31cb05700 (LWP 28349) “Threadpool moni” 0x00007fb31fc9ade4 in clock_nanosleep () from /lib64/libc.so.6

14 Thread 0x7fb2f7ffe700 (LWP 28351) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

13 Thread 0x7fb2f7bf1700 (LWP 28352) “mono” 0x00007fb31fc9ade4 in clock_nanosleep () from /lib64/libc.so.6

12 Thread 0x7fb2f79f0700 (LWP 28353) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

11 Thread 0x7fb2f76e7700 (LWP 28354) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

10 Thread 0x7fb2f74e6700 (LWP 28355) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

9 Thread 0x7fb2f72e5700 (LWP 28356) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

8 Thread 0x7fb2f70e4700 (LWP 28357) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

7 Thread 0x7fb2f6ee3700 (LWP 28358) “mono” 0x00007fb320174408 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

6 Thread 0x7fb2f6ce2700 (LWP 28359) “mono” 0x00007fb31fc8e663 in epoll_wait () from /lib64/libc.so.6

5 Thread 0x7fb2f6c1f700 (LWP 28360) “IO Threadpool w” 0x00007fb3201760f0 in sem_timedwait () from /lib64/libpthread.so.0

4 Thread 0x7fb2f6ba9700 (LWP 28361) “mono” 0x00007fb31fc9ade4 in clock_nanosleep () from /lib64/libc.so.6

3 Thread 0x7fb2f6367700 (LWP 28365) “Threadpool work” 0x00007fb3201760f0 in sem_timedwait () from /lib64/libpthread.so.0

2 Thread 0x7fb2f6771700 (LWP 28366) “Threadpool work” 0x00007fb3201760f0 in sem_timedwait () from /lib64/libpthread.so.0

* 1 Thread 0x7fb320ca1780 (LWP 28344) “mono” 0x00007fb32017405f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Thread 18 (Thread 0x7fb31fa93700 (LWP 28345)):

#0 0x00007fb320176010 in sem_wait () from /lib64/libpthread.so.0

#1 0x000000000062a2f7 in mono_sem_wait ()

#2 0x00000000005ac4de in finalizer_thread ()

#3 0x0000000000590614 in start_wrapper ()

#4 0x000000000062f196 in inner_start_thread ()

#5 0x00007fb3201700a4 in start_thread () from /lib64/libpthread.so.0

#6 0x00007fb31fc8e08d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7fb31d043700 (LWP 28347)):

#0 0x00007fb31fc81d2d in read () from /lib64/libc.so.6

#1 0x0000000041c349e5 in ?? ()

#2 0x00007fb310001450 in ?? ()

#3 0x00007fb31d042e30 in ?? ()

#4 0x00007fb31f747fa0 in ?? ()

#5 0x00007fb31f747070 in ?? ()

#6 0x00007fb31f746e28 in ?? ()

#7 0x00007fb310000bd0 in ?? ()

#8 0x0000000041c34960 in ?? ()

#9 0x00007fb31d042aa0 in ?? ()

#10 0x00007fb31d0429e0 in ?? ()

#11 0x0000000041c34960 in ?? ()

#12 0x00007fb31f747fc0 in ?? ()

#13 0x0000000041c34858 in ?? ()

#14 0x0000000000000000 in ?? ()

Thread 16 (Thread 0x7fb31ce42700 (LWP 28348)):

#0 0x00007fb320177489 in waitpid () from /lib64/libpthread.so.0

#1 0x00000000004b914f in mono_handle_native_sigsegv ()

#2 0x00000000005101be in mono_arch_handle_altstack_exception ()

#3 0x0000000000428e19 in mono_sigsegv_signal_handler ()

#4

#5 0x0000000000000000 in ?? ()

../../gdb/dwarf2-frame.c:692: internal-error: Unknown CFI encountered.

A problem internal to GDB has been detected,

further debugging may prove unreliable.

Quit this debugging session? (y or n) [answered Y; input not from terminal]

../../gdb/dwarf2-frame.c:692: internal-error: Unknown CFI encountered.

A problem internal to GDB has been detected,

further debugging may prove unreliable.

Create a core file of GDB? (y or n) [answered Y; input not from terminal]

Can you try running it like this and send a full backtrace?

http://goodenoughsoftware.net/2014/03/01/debugging-segmentation-faults-in-mono/

This way we will have symbols etc. as opposed to just memory positions :)
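The general shape of running Mono under gdb can be sketched as follows. This is a Python sketch under stated assumptions, not a transcript of the linked post: it builds a `gdb --args mono --debug ...` command line plus a one-line gdb command file telling gdb not to stop on signals the Mono runtime uses internally (the `handle ... nostop noprint` line follows the Mono project's gdb guidance). `build_gdb_command`, `shell_line` and the file name `mono.gdb` are hypothetical.

```python
import shlex

# gdb setup line: do not stop on signals the Mono runtime uses internally.
GDB_SETUP = "handle SIGXCPU SIG33 SIG35 SIGPWR nostop noprint\n"

def build_gdb_command(exe_args, command_file="mono.gdb"):
    """Return the argv list for running `mono --debug <exe_args>` under gdb."""
    return ["gdb", "-x", command_file, "--args", "mono", "--debug", *exe_args]

def shell_line(argv):
    """Render an argv list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in argv)

# Example:
#   with open("mono.gdb", "w") as fh:
#       fh.write(GDB_SETUP)
#   print(shell_line(build_gdb_command(["EventStore.ClusterNode.exe", "--force"])))
```

Inside gdb, `run` starts the node and `backtrace` is typed once the crash stops the process. Frames printed as `?? ()` typically belong to JIT-compiled managed code, for which gdb has no native symbols.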

Hum… not sure this helps that much:

VND {20c47a2c-a849-4acf-be8d-a8b0bf054297} [Unknown, 192.168.33.12:1112, n/a, 192.168.33.12:1113, n/a, 192.168.33.12:2112, 192.168.33.12:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2015-07-03 12:09:18.476

New:

MAN {00000000-0000-0000-0000-000000000000} [Manager, 192.168.33.13:2112, 192.168.33.13:2112] | 2015-07-03 12:09:18.496

VND {20c47a2c-a849-4acf-be8d-a8b0bf054297} [Unknown, 192.168.33.12:1112, n/a, 192.168.33.12:1113, n/a, 192.168.33.12:2112, 192.168.33.12:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2015-07-03 12:09:18.476

MAN {00000000-0000-0000-0000-000000000000} [Manager, 192.168.33.11:2112, 192.168.33.11:2112] | 2015-07-03 12:09:18.496

And you are at a gdb prompt, yes? Type backtrace.

(gdb) backtrace

#0 0x000000004019ee75 in ?? ()

#1 0x0000000000000000 in ?? ()

It's random when it happens: sometimes it runs for a few minutes, and sometimes it crashes right after starting.

This one was at boot:

[20140,10,12:09:18.523] ========== [192.168.33.12:2112] IS UNKNOWN!!! WHOA!!!

[20140,10,12:09:18.575] ELECTIONS: STARTING ELECTIONS.

[20140,10,12:09:18.576] ELECTIONS: (V=0) SHIFT TO LEADER ELECTION.

[20140,10,12:09:18.578] ELECTIONS: (V=0) VIEWCHANGE FROM [192.168.33.12:2112, {20c47a2c-a849-4acf-be8d-a8b0bf054297}].

[20140,10,12:09:18.581] SLOW BUS MSG [MainBus]: StartElections - 51ms. Handler: ElectionsService.

[20140,10,12:09:18.581] SLOW QUEUE MSG [MainQueue]: StartElections - 51ms. Q: 0/2.

Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread 0x7fffceea1700 (LWP 20151)]

0x000000004019ee75 in ?? ()

(gdb) backtrace

#0 0x000000004019ee75 in ?? ()

#1 0x0000000000000000 in ?? ()