eventstore cluster does not work under mono

Hi!

I’m trying to run eventstore in cluster mode under mono.

My environment is:

  1. CentOS6 x86_64 (up to date);

  2. mono-3.2.5;

  3. eventstore build from git master branch (5433fdc82895c181e0325a5730d36424de8eb28f);

  4. all cluster nodes (3 instances) running on the same machine.

There are some additional facts:

  1. the nodes are started in this order: instance1 first, then instance2 after a 10-second wait, then instance3 after another 10-second wait;

  2. the database is empty (all directories are cleaned before start);

  3. each instance is started with the following command lines:

/opt/mono/bin/mono-sgen /opt/eventstore/current/EventStore.ClusterNode.exe --enable-trusted-auth --stats-period-sec 300 --db /srv/eventsore/db-cluster1 --log /srv/eventsore/db-cluster1-logs --run-projections=ALL --int-ip 10.80.10.11 --ext-ip 10.80.10.11 --int-tcp-port=11001 --ext-tcp-port=11002 --int-http-port=11003 --ext-http-port=11004 --nodes-count 3 --use-dns-discovery- --gossip-seed 10.80.10.11:12003 --gossip-seed 10.80.10.11:13003

/opt/mono/bin/mono-sgen /opt/eventstore/current/EventStore.ClusterNode.exe --enable-trusted-auth --stats-period-sec 300 --db /srv/eventsore/db-cluster2 --log /srv/eventsore/db-cluster2-logs --run-projections=ALL --int-ip 10.80.10.11 --ext-ip 10.80.10.11 --int-tcp-port=12001 --ext-tcp-port=12002 --int-http-port=12003 --ext-http-port=12004 --nodes-count 3 --use-dns-discovery- --gossip-seed 10.80.10.11:11003 --gossip-seed 10.80.10.11:13003

/opt/mono/bin/mono-sgen /opt/eventstore/current/EventStore.ClusterNode.exe --enable-trusted-auth --stats-period-sec 300 --db /srv/eventsore/db-cluster3 --log /srv/eventsore/db-cluster3-logs --run-projections=ALL --int-ip 10.80.10.11 --ext-ip 10.80.10.11 --int-tcp-port=13001 --ext-tcp-port=13002 --int-http-port=13003 --ext-http-port=13004 --nodes-count 3 --use-dns-discovery- --gossip-seed 10.80.10.11:11003 --gossip-seed 10.80.10.11:12003

  1. when any 1 or 2 instances are running, everything looks OK;

  2. when the 3rd instance is started, the cluster falls into an election loop (every few seconds the cluster tries to elect a new master);

  3. no processes failed during this experiment (no exits, no segfaults, no core dumps);

  4. no writes came into the cluster from outside during the whole experiment.
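For anyone who wants to repeat the start-up procedure above, here is a minimal bash sketch that rebuilds the three command lines from the port scheme used in this post (instance N uses ports 1N001 to 1N004, and the other two instances' internal HTTP ports are the gossip seeds). The function name `node_cmd` is made up for illustration; the paths, IP and flags are copied from the commands above. It only prints the commands and does not start anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints (does not run) the command lines quoted in this post.
IP=10.80.10.11
MONO=/opt/mono/bin/mono-sgen
EXE=/opt/eventstore/current/EventStore.ClusterNode.exe

# node_cmd N: print the command line for instance N (1..3).
# Assumed port scheme, taken from the post: instance N uses 1N001..1N004.
node_cmd() {
  local n=$1
  local base=$((10000 + n * 1000))
  local seeds="" m
  for m in 1 2 3; do
    # Every other instance's internal HTTP port becomes a gossip seed.
    if [ "$m" -ne "$n" ]; then
      seeds="$seeds --gossip-seed $IP:$((10000 + m * 1000 + 3))"
    fi
  done
  echo "$MONO $EXE --enable-trusted-auth --stats-period-sec 300" \
    "--db /srv/eventsore/db-cluster$n --log /srv/eventsore/db-cluster$n-logs" \
    "--run-projections=ALL --int-ip $IP --ext-ip $IP" \
    "--int-tcp-port=$((base + 1)) --ext-tcp-port=$((base + 2))" \
    "--int-http-port=$((base + 3)) --ext-http-port=$((base + 4))" \
    "--nodes-count 3 --use-dns-discovery-$seeds"
}
```

To reproduce the 10-second stagger, something like `for n in 1 2 3; do $(node_cmd "$n") & sleep 10; done` should work, assuming the paths exist on your machine.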

Logs for each instance follow.

Instance1:

cat /srv/eventsore/db-cluster1-logs/2014-01-14/10.80.10.11-11004-cluster-node-err.log

[PID:21080:019 2014.01.14 11:00:16.106 ERROR GossipController ] Received as POST invalid ClusterInfo from [http://10.80.10.11:11003/gossip]. Content-Type: application/json, Body:

.

[PID:21080:011 2014.01.14 11:00:21.884 FATAL ProjectionManager ] Cannot initialize projections subsystem. Cannot write a fake projection

cat /srv/eventsore/db-cluster1-logs/2014-01-14/10.80.10.11-11004-cluster-node.log

[PID:21080:001 2014.01.14 11:00:02.697 INFO ProgramBase`1 ]

ES VERSION: no-werror (master/5433fdc82895c181e0325a5730d36424de8eb28f, Fri, 10 Jan 2014 13:48:40 +0000)

OS: Unknown (Unix 2.6.32.431)

RUNTIME: 3.2.5 (tarball Thu Dec 12 12:56:24 MSK 2013) (64-bit)

GC: 2 GENERATIONS

LOGS: /srv/eventsore/db-cluster1-logs

SHOW HELP: False ()

SHOW VERSION: False ()

LOGS DIR: /srv/eventsore/db-cluster1-logs (–logsdir from command line)

CONFIGS: ()

DEFINES: ()

INTERNAL IP: 10.80.10.11 (–internal-ip from command line)

EXTERNAL IP: 10.80.10.11 (–external-ip from command line)

INTERNAL HTTP PORT: 11003 (–internal-http-port from command line)

EXTERNAL HTTP PORT: 11004 (–external-http-port from command line)

INTERNAL TCP PORT: 11001 (–internal-tcp-port from command line)

INTERNAL SECURE TCP PORT: 0 ()

EXTERNAL TCP PORT: 11002 (–external-tcp-port from command line)

EXTERNAL SECURE TCP PORT: 0 ()

FORCE: False ()

CLUSTER SIZE: 3 (–cluster-size from command line)

MIN FLUSH DELAY MS: 2 ()

NODE PRIORITY: 0 ()

COMMIT COUNT: 2 ()

PREPARE COUNT: 2 ()

DISCOVER VIA DNS: False (–use-dns-discovery from command line)

CLUSTER DNS: fake.dns ()

CLUSTER GOSSIP PORT: 30777 ()

GOSSIP SEEDS: 10.80.10.11:12003, 10.80.10.11:13003 (–gossip-seed from command line)

STATS PERIOD SEC: 300 (–stats-period-sec from command line)

CACHED CHUNKS: -1 ()

CHUNKS CACHE SIZE: 536871424 ()

DB PATH: /srv/eventsore/db-cluster1 (–db from command line)

IN MEM DB: False ()

SKIP DB VERIFY: False ()

RUN PROJECTIONS: All (–run-projections from command line)

PROJECTION THREADS: 3 ()

WORKER THREADS: 5 ()

HTTP PREFIXES: ()

ENABLE TRUSTED AUTH: True (–enable-trusted-auth from command line)

CERTIFICATE STORE: ()

CERTIFICATE NAME: ()

CERTIFICATE FILE: ()

CERTIFICATE PASSWORD: ()

USE INTERNAL SSL: False ()

SSL TARGET HOST: n/a ()

SSL VALIDATE SERVER: True ()

AUTHENTICATION TYPE: internal ()

AUTHENTICATION CONFIG FILE: ()

PREPARE TIMEOUT MS: 2000 ()

COMMIT TIMEOUT MS: 2000 ()

DISABLE SCAVENGE MERGING: False ()

[PID:21080:001 2014.01.14 11:00:02.782 INFO ProgramBase`1 ] Quorum size set to 2

[PID:21080:001 2014.01.14 11:00:02.798 INFO ProgramBase`1 ] Can’t find plugins path: /opt/eventstore/eventstore-mono-20140114.1110/plugins

[PID:21080:001 2014.01.14 11:00:02.874 INFO ProgramBase`1 ]

INSTANCE ID: 46577878-07ac-4f88-9a42-8889a62e649e

DATABASE: /srv/eventsore/db-cluster1

WRITER CHECKPOINT: 0 (0x0)

CHASER CHECKPOINT: 0 (0x0)

EPOCH CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

[PID:21080:001 2014.01.14 11:00:03.165 TRACE MessageHierarchy ] MessageHierarchy initialization took 00:00:00.2243466.

[PID:21080:001 2014.01.14 11:00:03.325 TRACE TFChunk ] CACHED TFChunk #0-0 (chunk-000000.000000) in 00:00:00.0040912.

[PID:21080:001 2014.01.14 11:00:03.563 INFO MiniWeb ] Starting MiniWeb for /web/es/js/projections ==> /opt/eventstore/current/singlenode-web/js/projections

[PID:21080:001 2014.01.14 11:00:03.565 INFO MiniWeb ] Starting MiniWeb for /web/es/js/projections/v8/Prelude ==> /opt/eventstore/current/Prelude

[PID:21080:001 2014.01.14 11:00:03.565 INFO MiniWeb ] Starting MiniWeb for /web/es/js/projections/resources ==> /opt/eventstore/current/web-resources/js

[PID:21080:001 2014.01.14 11:00:03.567 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.567 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.568 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.569 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.569 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.569 TRACE MiniWeb ] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.575 INFO MiniWeb ] Starting MiniWeb for /web ==> /opt/eventstore/current/clusternode-web

[PID:21080:001 2014.01.14 11:00:03.575 INFO MiniWeb ] Starting MiniWeb for /web/es ==> /opt/eventstore/current/es-common-web

[PID:21080:001 2014.01.14 11:00:03.575 TRACE MiniWeb ] Binding MiniWeb to /web/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.575 TRACE MiniWeb ] Binding MiniWeb to /web/es/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.576 INFO MiniWeb ] Starting MiniWeb for /web ==> /opt/eventstore/current/clusternode-web

[PID:21080:001 2014.01.14 11:00:03.576 INFO MiniWeb ] Starting MiniWeb for /web/es ==> /opt/eventstore/current/es-common-web

[PID:21080:001 2014.01.14 11:00:03.576 TRACE MiniWeb ] Binding MiniWeb to /web/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.576 TRACE MiniWeb ] Binding MiniWeb to /web/es/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.577 INFO MiniWeb ] Starting MiniWeb for /web/users ==> /opt/eventstore/current/Users/web

[PID:21080:001 2014.01.14 11:00:03.577 TRACE MiniWeb ] Binding MiniWeb to /web/users/{*remaining_path}

[PID:21080:001 2014.01.14 11:00:03.577 INFO MiniWeb ] Starting MiniWeb for /web/users ==> /opt/eventstore/current/Users/web

[PID:21080:001 2014.01.14 11:00:03.577 TRACE MiniWeb ] Binding MiniWeb to /web/users/{*remaining_path}

[PID:21080:010 2014.01.14 11:00:03.623 INFO ClusterVNodeControll] ========== [10.80.10.11:11003] SYSTEM INIT…

[PID:21080:010 2014.01.14 11:00:03.664 INFO TcpServerListener ] Starting Normal TCP listening on TCP endpoint: 10.80.10.11:11002.

[PID:21080:015 2014.01.14 11:00:03.703 INFO IndexCommitter ] TableIndex initialization…

[PID:21080:015 2014.01.14 11:00:03.728 INFO IndexCommitter ] ReadIndex building…

[PID:21080:015 2014.01.14 11:00:03.732 DEBUG IndexCommitter ] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0034520.

[PID:21080:010 2014.01.14 11:00:03.749 TRACE InMemoryBus ] SLOW BUS MSG [MainBus]: SystemInit - 53ms. Handler: TcpService.

[PID:21080:010 2014.01.14 11:00:03.749 INFO TcpServerListener ] Starting Normal TCP listening on TCP endpoint: 10.80.10.11:11001.

[PID:21080:010 2014.01.14 11:00:03.783 INFO HttpAsyncServer ] Starting HTTP server on [http://10.80.10.11:11004/]…

[PID:21080:010 2014.01.14 11:00:03.789 INFO HttpAsyncServer ] HTTP server is up and listening on [http://10.80.10.11:11004/]

[PID:21080:010 2014.01.14 11:00:03.789 INFO HttpAsyncServer ] Starting HTTP server on [http://10.80.10.11:11003/]…

[PID:21080:010 2014.01.14 11:00:03.790 INFO HttpAsyncServer ] HTTP server is up and listening on [http://10.80.10.11:11003/]

[PID:21080:010 2014.01.14 11:00:03.830 TRACE QueuedHandlerAutoRes] SLOW QUEUE MSG [MainQueue]: SystemInit - 214ms. Q: 0/7.

[PID:21080:010 2014.01.14 11:00:03.832 INFO ClusterVNodeControll] ========== [10.80.10.11:11003] Service ‘StorageReader’ initialized.

[PID:21080:010 2014.01.14 11:00:03.832 INFO ClusterVNodeControll] ========== [10.80.10.11:11003] Service ‘StorageWriter’ initialized.

[PID:21080:010 2014.01.14 11:00:03.832 INFO ClusterVNodeControll] ========== [10.80.10.11:11003] Service ‘StorageChaser’ initialized.

[PID:21080:010 2014.01.14 11:00:03.895 TRACE GossipServiceBase ] CLUSTER HAS CHANGED

Old:

VND {46577878-07ac-4f88-9a42-8889a62e649e} [Unknown, 10.80.10.11:11001, n/a, 10.80.10.11:11002, n/a, 10.80.10.11:11003, 10.80.10.11:11004] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-01-14 11:00:03.791

New:

MAN {00000000-0000-0000-0000-000000000000} [Manager, 10.80.10.11:13003, 10.80.10.11:13003] | 2014-01-14 11:00:03.836

MAN {00000000-0000-0000-0000-000000000000} [Manager, 10.80.10.11:12003, 10.80.10.11:12003] | 2014-01-14 11:00:03.836

VND {46577878-07ac-4f88-9a42-8889a62e649e} [Unknown, 10.80.10.11:11001, n/a, 10.80.10.11:11002, n/a, 10.80.10.11:11003, 10.80.10.11:11004] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-01-14 11:00:03.791

I can’t spot anything obviously wrong with the commands. I’ll give it a try locally during lunch today and see if I can reproduce it.

Cheers,

James

Dear James,

Have you been able to reproduce our bug?

Do you need any additional details?

Thank you.

Hi,

Sorry, I haven’t had a chance to look at it yet.

Cheers,

James

Hi,

I can reproduce it and am investigating at the moment.

Cheers,

James

James, did you repro on CentOS or Ubuntu?

Does it still happen without separating internal/external on command line?

Hi,

This seems to be machine-dependent. I have some machines which exhibit the same behaviour as you’re seeing, and some that don’t. I haven’t managed to narrow down the specific config options causing issues though.

It doesn’t appear to be a problem when running a cluster across multiple machines, even including ones which fail running all three on one box.

Will continue to investigate, but can you think of anything (maybe firewalls?) that might be having an impact?

Cheers,

James

Hi!

No firewall rules are applied on the test machine. I specifically checked that all configured TCP ports on the internal and external IP address (it is the same address) are available, and telnet connects to them successfully.
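The telnet check described above can be scripted. The following is a rough bash sketch, assuming the IP and port scheme from the command lines earlier in this thread; the function names are made up for illustration. It relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Sketch only: IP and port scheme are taken from the commands in this thread.
IP=10.80.10.11

# Print every configured port: instances 1..3, ports 1N001..1N004.
cluster_ports() {
  local n p
  for n in 1 2 3; do
    for p in 1 2 3 4; do
      echo $((10000 + n * 1000 + p))
    done
  done
}

# Succeeds if a TCP connection to HOST ($1) : PORT ($2) opens within 2 seconds.
port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Report the state of each configured port on $IP.
check_cluster_ports() {
  local port
  for port in $(cluster_ports); do
    if port_open "$IP" "$port"; then
      echo "$port open"
    else
      echo "$port CLOSED"
    fi
  done
}
```

Running `check_cluster_ports` while all three instances are up should list twelve ports. Note that an open port only shows that something is listening, not that gossip or heartbeats get through.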

I know that my colleague tested a cluster in the same manner (3 instances on 1 node) on Windows 8 and everything works fine.

Did you reproduce the problem on Windows too?

Do you have a cluster that works fine on one node under Linux? If yes, which Linux distribution is it?

Hi!

Have you had a chance to investigate the problem? Do you know how to solve it?

At the moment I run both the production and testing environments in single-node mode, and I want to move to cluster mode. For the testing environment there is no reason to run 3 instances on different machines, so I want to run a 3-instance cluster on one machine.

If this configuration (multiple instances of a single cluster on one machine) is not supported (although it is documented at https://github.com/EventStore/EventStore/wiki/Setting-Up-OSS-Cluster), please say so.

Thank you.

Hi,

I can reproduce the problem on one box only, but currently don’t have time to find the config difference. As in your case, a cluster is formed but then the heartbeats time out. I suspect this may be a mono-related TCP regression.

Running three nodes on one box buys you very little and is not recommended for production - it works in the majority of cases however.

Right now this issue is in the queue for investigation but has a fairly low priority associated with it (with no ETA). If you want to escalate it please get in touch with sales about commercial support.

Thanks,

James

Hi James,

I’m unfortunately having the same issue, although for me it’s enough to try to join up two nodes and they pretty much lose each other on the second heartbeat:

Jul 24 12:46:53 core2 bash[17853]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 12:46:51.567

Jul 24 12:46:53 core2 bash[17853]: VND {f5c9ba85-acdf-43e9-b684-f6286fe20073} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 12:46:52.222

Jul 24 12:46:53 core2 bash[17853]: New:

Jul 24 12:46:53 core2 bash[17853]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 12:46:52.237

Jul 24 12:46:53 core2 bash[17853]: VND {f5c9ba85-acdf-43e9-b684-f6286fe20073} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 12:46:52.222

I’ve got ES set up on different machines (CoreOS) in Docker containers that run Debian Wheezy and Mono 3.2.8.

This is how I initialize ES:

mono-sgen EventStore.ClusterNode.exe \
--log=/var/log/eventstore \
--ip=0.0.0.0 \
--tcp-port=1113 \
--http-port=2113 \
--cluster-size=3 \
--use-dns-discovery- \
--db=/data/eventstore \
--http-prefix=http://*:2113/ \
--gossip-seed=172.17.8.101:2113 --gossip-seed=172.17.8.102:2113 --gossip-seed=172.17.8.103:2113

I’m happy to share the Dockerfile, systemd services (fleet) and Vagrantfile to easily reproduce this environment.

Thanks,

Mattias

Can you provide more of the log?

Also, what version are you running? There are some slight config changes between versions.

I’m running version 3.0.0 RC2.

Here’s more of the log:

Jul 24 14:16:55 core2 bash[19959]: ES VERSION: 3.0.0.0 (master/30f7fa64d73ba65028e5a6ca720639985cac1458, Tue, 18 Mar 2014 20:03:41 +0000)

Jul 24 14:16:55 core2 bash[19959]: OS: Unknown (Unix 3.15.2.0)

Jul 24 14:16:55 core2 bash[19959]: RUNTIME: 3.2.8 (Debian 3.2.8+dfsg-7) (64-bit)

Jul 24 14:16:55 core2 bash[19959]: GC: 2 GENERATIONS

Jul 24 14:16:55 core2 bash[19959]: LOGS: /var/log/eventstore

Jul 24 14:16:55 core2 bash[19959]: SHOW HELP: False ()

Jul 24 14:16:55 core2 bash[19959]: SHOW VERSION: False ()

Jul 24 14:16:55 core2 bash[19959]: LOGS DIR: /var/log/eventstore (–logsdir from command line)

Jul 24 14:16:55 core2 bash[19959]: CONFIGS: ()

Jul 24 14:16:55 core2 bash[19959]: DEFINES: ()

Jul 24 14:16:55 core2 bash[19959]: INTERNAL IP: 127.0.0.1 ()

Jul 24 14:16:55 core2 bash[19959]: EXTERNAL IP: 127.0.0.1 ()

Jul 24 14:16:55 core2 bash[19959]: INTERNAL HTTP PORT: 2112 ()

Jul 24 14:16:55 core2 bash[19959]: EXTERNAL HTTP PORT: 2113 ()

Jul 24 14:16:55 core2 bash[19959]: INTERNAL TCP PORT: 1112 ()

Jul 24 14:16:55 core2 bash[19959]: INTERNAL SECURE TCP PORT: 0 ()

Jul 24 14:16:55 core2 bash[19959]: EXTERNAL TCP PORT: 1113 ()

Jul 24 14:16:55 core2 bash[19959]: EXTERNAL SECURE TCP PORT: 0 ()

Jul 24 14:16:55 core2 bash[19959]: FORCE: False ()

Jul 24 14:16:55 core2 bash[19959]: CLUSTER SIZE: 3 (–cluster-size from command line)

Jul 24 14:16:55 core2 bash[19959]: MIN FLUSH DELAY MS: 2 ()

Jul 24 14:16:55 core2 bash[19959]: NODE PRIORITY: 0 ()

Jul 24 14:16:55 core2 bash[19959]: COMMIT COUNT: 2 ()

Jul 24 14:16:55 core2 bash[19959]: PREPARE COUNT: 2 ()

Jul 24 14:16:55 core2 bash[19959]: MAX MEM TABLE SIZE: 1000000 ()

Jul 24 14:16:55 core2 bash[19959]: DISCOVER VIA DNS: False (–use-dns-discovery from command line)

Jul 24 14:16:55 core2 bash[19959]: CLUSTER DNS: fake.dns ()

Jul 24 14:16:55 core2 bash[19959]: CLUSTER GOSSIP PORT: 30777 ()

Jul 24 14:16:55 core2 bash[19959]: GOSSIP SEEDS: 172.17.8.103:2113 (–gossip-seed from command line)

Jul 24 14:16:55 core2 bash[19959]: STATS PERIOD SEC: 30 ()

Jul 24 14:16:55 core2 bash[19959]: CACHED CHUNKS: -1 ()

Jul 24 14:16:55 core2 bash[19959]: CHUNKS CACHE SIZE: 536871424 ()

Jul 24 14:16:55 core2 bash[19959]: DB PATH: /data/eventstore (–db from command line)

Jul 24 14:16:55 core2 bash[19959]: IN MEM DB: False ()

Jul 24 14:16:55 core2 bash[19959]: SKIP DB VERIFY: False ()

Jul 24 14:16:55 core2 bash[19959]: RUN PROJECTIONS: System ()

Jul 24 14:16:55 core2 bash[19959]: PROJECTION THREADS: 3 ()

Jul 24 14:16:55 core2 bash[19959]: WORKER THREADS: 5 ()

Jul 24 14:16:55 core2 bash[19959]: HTTP PREFIXES: http://*:2113/ (–http-prefix from command line)

Jul 24 14:16:55 core2 bash[19959]: ENABLE TRUSTED AUTH: False ()

Jul 24 14:16:55 core2 bash[19959]: CERTIFICATE STORE: ()

Jul 24 14:16:55 core2 bash[19959]: CERTIFICATE NAME: ()

Jul 24 14:16:55 core2 bash[19959]: CERTIFICATE FILE: ()

Jul 24 14:16:55 core2 bash[19959]: CERTIFICATE PASSWORD: ()

Jul 24 14:16:55 core2 bash[19959]: USE INTERNAL SSL: False ()

Jul 24 14:16:55 core2 bash[19959]: SSL TARGET HOST: n/a ()

Jul 24 14:16:55 core2 bash[19959]: SSL VALIDATE SERVER: True ()

Jul 24 14:16:55 core2 bash[19959]: AUTHENTICATION TYPE: internal ()

Jul 24 14:16:55 core2 bash[19959]: AUTHENTICATION CONFIG FILE: ()

Jul 24 14:16:55 core2 bash[19959]: PREPARE TIMEOUT MS: 2000 ()

Jul 24 14:16:55 core2 bash[19959]: COMMIT TIMEOUT MS: 2000 ()

Jul 24 14:16:55 core2 bash[19959]: DISABLE SCAVENGE MERGING: False ()

Jul 24 14:16:55 core2 bash[19959]: GOSSIP ON EXT: True ()

Jul 24 14:16:55 core2 bash[19959]: STATS ON EXT: True ()

Jul 24 14:16:55 core2 bash[19959]: ADMIN ON EXT: True ()

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:54.980] Quorum size set to 2

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:54.989] Can’t find plugins path: /opt/eventstore/plugins

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.062]

Jul 24 14:16:55 core2 bash[19959]: INSTANCE ID: 5bba350f-ccdf-42be-a6b9-ef7d9966019f

Jul 24 14:16:55 core2 bash[19959]: DATABASE: /data/eventstore

Jul 24 14:16:55 core2 bash[19959]: WRITER CHECKPOINT: 0 (0x0)

Jul 24 14:16:55 core2 bash[19959]: CHASER CHECKPOINT: 0 (0x0)

Jul 24 14:16:55 core2 bash[19959]: EPOCH CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 14:16:55 core2 bash[19959]: TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.224] MessageHierarchy initialization took 00:00:00.1409734.

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.372] CACHED TFChunk #0-0 (chunk-000000.000000) in 00:00:00.0044950.

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.481] Starting MiniWeb for /web/es/js/projections ==> /opt/eventstore/singlenode-web/js/projections

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.482] Starting MiniWeb for /web/es/js/projections/v8/Prelude ==> /opt/eventstore/Prelude

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.482] Starting MiniWeb for /web/es/js/projections/resources ==> /opt/eventstore/web-resources/js

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.482] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.482] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.483] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.483] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.483] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.483] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 14:16:55 core2 bash[19959]: [00001,01,14:16:55.488] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 14:16:55 core2 bash[19959]: [00001,10,14:16:55.505] ========== [127.0.0.1:2112] SYSTEM INIT…

Jul 24 14:16:55 core2 bash[19959]: [00001,10,14:16:55.539] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1113.

Jul 24 14:16:55 core2 bash[19959]: [00001,13,14:16:55.560] TableIndex initialization…

Jul 24 14:16:55 core2 bash[19959]: [00001,13,14:16:55.574] ReadIndex building…

Jul 24 14:16:55 core2 bash[19959]: [00001,13,14:16:55.578] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0031960.

Jul 24 14:16:55 core2 bash[19959]: [00001,10,14:16:55.580] SLOW BUS MSG [MainBus]: SystemInit - 49ms. Handler: TcpService.

Jul 24 14:16:55 core2 bash[19959]: [00001,10,14:16:55.580] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1112.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.609] Starting HTTP server on [http://*:2113/]…

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.613] HTTP server is up and listening on [http://*:2113/]

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.613] Starting HTTP server on [http://127.0.0.1:2112/]…

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.613] HTTP server is up and listening on [http://127.0.0.1:2112/]

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.625] SLOW QUEUE MSG [MainQueue]: SystemInit - 126ms. Q: 0/8.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.625] ========== [127.0.0.1:2112] Service ‘StorageWriter’ initialized.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.625] ========== [127.0.0.1:2112] Service ‘StorageReader’ initialized.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.625] ========== [127.0.0.1:2112] Service ‘StorageChaser’ initialized.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.646] CLUSTER HAS CHANGED

Jul 24 14:16:56 core2 bash[19959]: Old:

Jul 24 14:16:56 core2 bash[19959]: VND {5bba350f-ccdf-42be-a6b9-ef7d9966019f} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 14:16:55.613

Jul 24 14:16:56 core2 bash[19959]: New:

Jul 24 14:16:56 core2 bash[19959]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 14:16:55.626

Jul 24 14:16:56 core2 bash[19959]: VND {5bba350f-ccdf-42be-a6b9-ef7d9966019f} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 14:16:55.613

Jul 24 14:16:56 core2 bash[19959]: --------------------------------------------------------------------------------

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.648] ========== [127.0.0.1:2112] SYSTEM START…

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.653] ========== [127.0.0.1:2112] IS UNKNOWN!!! WHOA!!!

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.682] ELECTIONS: STARTING ELECTIONS.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.682] ELECTIONS: (V=0) SHIFT TO LEADER ELECTION.

Jul 24 14:16:56 core2 bash[19959]: [00001,10,14:16:55.683] ELECTIONS: (V=0) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:56.286] Looks like node [172.17.8.103:2113] is DEAD (Gossip send failed).

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:56.286] CLUSTER HAS CHANGED (gossip send failed to [172.17.8.103:2113])

Jul 24 14:16:57 core2 bash[19959]: Old:

Jul 24 14:16:57 core2 bash[19959]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 14:16:55.626

Jul 24 14:16:57 core2 bash[19959]: VND {5bba350f-ccdf-42be-a6b9-ef7d9966019f} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 14:16:56.260

Jul 24 14:16:57 core2 bash[19959]: New:

Jul 24 14:16:57 core2 bash[19959]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 14:16:56.287

Jul 24 14:16:57 core2 bash[19959]: VND {5bba350f-ccdf-42be-a6b9-ef7d9966019f} [Unknown, 127.0.0.1:1112, n/a, 127.0.0.1:1113, n/a, 127.0.0.1:2112, 127.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 14:16:56.260

Jul 24 14:16:57 core2 bash[19959]: --------------------------------------------------------------------------------

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:56.684] ELECTIONS: (V=0) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:56.684] ELECTIONS: (V=1) SHIFT TO LEADER ELECTION.

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:56.684] ELECTIONS: (V=1) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:57.684] ELECTIONS: (V=1) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:57.684] ELECTIONS: (V=2) SHIFT TO LEADER ELECTION.

Jul 24 14:16:57 core2 bash[19959]: [00001,10,14:16:57.684] ELECTIONS: (V=2) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:16:58 core2 bash[19959]: [00001,10,14:16:58.685] ELECTIONS: (V=2) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:16:58 core2 bash[19959]: [00001,10,14:16:58.685] ELECTIONS: (V=3) SHIFT TO LEADER ELECTION.

Jul 24 14:16:58 core2 bash[19959]: [00001,10,14:16:58.685] ELECTIONS: (V=3) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:00 core2 bash[19959]: [00001,10,14:16:59.686] ELECTIONS: (V=3) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:00 core2 bash[19959]: [00001,10,14:16:59.686] ELECTIONS: (V=4) SHIFT TO LEADER ELECTION.

Jul 24 14:17:00 core2 bash[19959]: [00001,10,14:16:59.686] ELECTIONS: (V=4) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:00.687] ELECTIONS: (V=4) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:00.687] ELECTIONS: (V=5) SHIFT TO LEADER ELECTION.

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:00.687] ELECTIONS: (V=5) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:01.687] ELECTIONS: (V=5) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:01.687] ELECTIONS: (V=6) SHIFT TO LEADER ELECTION.

Jul 24 14:17:01 core2 bash[19959]: [00001,10,14:17:01.687] ELECTIONS: (V=6) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:03 core2 bash[19959]: [00001,10,14:17:02.687] ELECTIONS: (V=6) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:03 core2 bash[19959]: [00001,10,14:17:02.687] ELECTIONS: (V=7) SHIFT TO LEADER ELECTION.

Jul 24 14:17:03 core2 bash[19959]: [00001,10,14:17:02.687] ELECTIONS: (V=7) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:04 core2 bash[19959]: [00001,10,14:17:03.688] ELECTIONS: (V=7) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:04 core2 bash[19959]: [00001,10,14:17:03.688] ELECTIONS: (V=8) SHIFT TO LEADER ELECTION.

Jul 24 14:17:04 core2 bash[19959]: [00001,10,14:17:03.688] ELECTIONS: (V=8) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:05 core2 bash[19959]: [00001,10,14:17:04.689] ELECTIONS: (V=8) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 14:17:05 core2 bash[19959]: [00001,10,14:17:04.689] ELECTIONS: (V=9) SHIFT TO LEADER ELECTION.

Jul 24 14:17:05 core2 bash[19959]: [00001,10,14:17:04.689] ELECTIONS: (V=9) VIEWCHANGE FROM [127.0.0.1:2112, {5bba350f-ccdf-42be-a6b9-ef7d9966019f}].

Jul 24 14:17:06 core2 bash[19959]: [00001,10,14:17:05.689] ELECTIONS: (V=9) TIMED OUT! (S=ElectingLeader, M=).

… this continues repeatedly after that without change.
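A log this long is easier to read with a couple of filters. The sketch below is only an illustration based on the line formats visible in the logs pasted in this thread; the function names are made up. It assumes that a banner line ending in `()` means the option was not picked up from the command line, which is inferred from the logs above rather than confirmed.

```shell
#!/usr/bin/env bash
# Sketch only: patterns are based on the log lines pasted in this thread.
# Each function reads a log from stdin.

# Count election rounds that ended with "TIMED OUT!".
election_timeouts() {
  grep -c 'ELECTIONS: (V=[0-9]*) TIMED OUT!'
}

# Banner lines for options that were taken from the command line.
options_from_cmdline() {
  grep 'from command line)$'
}

# Banner lines ending in "()": assumed to be options left at their defaults.
options_left_default() {
  grep ' ()$'
}
```

For example, `journalctl -u eventstore | options_left_default` (the unit name is a guess) lists the options the node did not take from the command line, which makes it easy to compare the banner against the flags that were actually passed.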

Can you try with setting the internal vs external http and tcp options? An example is here:


start EventStore.ClusterNode.exe --mem-db --log .\logs\log1 --int-ip 127.0.0.1 --ext-ip 127.0.0.1 --int-tcp-port=1111 --ext-tcp-port=1112 --int-http-port=1113 --ext-http-port=1114 --nodes-count 3 --use-dns-discovery- --gossip-seed 127.0.0.1:2113 --gossip-seed 127.0.0.1:3113

https://github.com/eventstore/eventstore/wiki/Setting-up-an-OSS-cluster

It shows the commands for all 3 nodes, by the way (so you know which ones to put in the gossip seeds).

Another thing I’ve seen success with: try binding to your externally facing IP instead of loopback.

Setting the TCP and HTTP port options unfortunately didn’t make a difference.

My initial suspicion was the loopback interface binding so I experimented with binding to the external ip, but that just resulted in ES crashing (logs below). Binding to 0.0.0.0 does however expose ES to the public interface so I can reach it from all 3 nodes in the cluster without problems… it’s just ES not accepting them.
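One way to check reachability from each node is to request the gossip URL of every seed with curl. This is a rough bash sketch: the `/gossip` path appears in the error log at the top of this thread, but that the endpoint answers a plain GET is an assumption here, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: assumes each seed answers a GET on /gossip.

# Build the gossip URL for a HOST:PORT seed.
gossip_url() {
  echo "http://$1/gossip"
}

# Try each seed given as an argument and report whether curl got a response.
check_gossip() {
  local seed
  for seed in "$@"; do
    if curl -s --max-time 2 -o /dev/null "$(gossip_url "$seed")"; then
      echo "$seed reachable"
    else
      echo "$seed unreachable"
    fi
  done
}
```

Usage would be along the lines of `check_gossip 172.17.8.101:2113 172.17.8.102:2113 172.17.8.103:2113`, run from inside each container. A reachable seed here only shows that HTTP gets through, not that the node accepts the gossip.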

Here are the logs from the crash:

Jul 24 15:08:10 core2 bash[22692]: ES VERSION: 3.0.0.0 (master/30f7fa64d73ba65028e5a6ca720639985cac1458, Tue, 18 Mar 2014 20:03:41 +0000)

Jul 24 15:08:10 core2 bash[22692]: OS: Unknown (Unix 3.15.2.0)

Jul 24 15:08:10 core2 bash[22692]: RUNTIME: 3.2.8 (Debian 3.2.8+dfsg-7) (64-bit)

Jul 24 15:08:10 core2 bash[22692]: GC: 2 GENERATIONS

Jul 24 15:08:10 core2 bash[22692]: LOGS: /var/log/eventstore

Jul 24 15:08:10 core2 bash[22692]: SHOW HELP: False ()

Jul 24 15:08:10 core2 bash[22692]: SHOW VERSION: False ()

Jul 24 15:08:10 core2 bash[22692]: LOGS DIR: /var/log/eventstore (–logsdir from command line)

Jul 24 15:08:10 core2 bash[22692]: CONFIGS: ()

Jul 24 15:08:10 core2 bash[22692]: DEFINES: ()

Jul 24 15:08:10 core2 bash[22692]: INTERNAL IP: 127.0.0.1 ()

Jul 24 15:08:10 core2 bash[22692]: EXTERNAL IP: 127.0.0.1 ()

Jul 24 15:08:10 core2 bash[22692]: INTERNAL HTTP PORT: 2112 ()

Jul 24 15:08:10 core2 bash[22692]: EXTERNAL HTTP PORT: 2113 ()

Jul 24 15:08:10 core2 bash[22692]: INTERNAL TCP PORT: 1112 ()

Jul 24 15:08:10 core2 bash[22692]: INTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:08:10 core2 bash[22692]: EXTERNAL TCP PORT: 1113 ()

Jul 24 15:08:10 core2 bash[22692]: EXTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:08:10 core2 bash[22692]: FORCE: False ()

Jul 24 15:08:10 core2 bash[22692]: CLUSTER SIZE: 3 (--cluster-size from command line)

Jul 24 15:08:10 core2 bash[22692]: MIN FLUSH DELAY MS: 2 ()

Jul 24 15:08:10 core2 bash[22692]: NODE PRIORITY: 0 ()

Jul 24 15:08:10 core2 bash[22692]: COMMIT COUNT: 2 ()

Jul 24 15:08:10 core2 bash[22692]: PREPARE COUNT: 2 ()

Jul 24 15:08:10 core2 bash[22692]: MAX MEM TABLE SIZE: 1000000 ()

Jul 24 15:08:10 core2 bash[22692]: DISCOVER VIA DNS: False (--use-dns-discovery from command line)

Jul 24 15:08:10 core2 bash[22692]: CLUSTER DNS: fake.dns ()

Jul 24 15:08:10 core2 bash[22692]: CLUSTER GOSSIP PORT: 30777 ()

Jul 24 15:08:10 core2 bash[22692]: GOSSIP SEEDS: ()

Jul 24 15:08:10 core2 bash[22692]: STATS PERIOD SEC: 30 ()

Jul 24 15:08:10 core2 bash[22692]: CACHED CHUNKS: -1 ()

Jul 24 15:08:10 core2 bash[22692]: CHUNKS CACHE SIZE: 536871424 ()

Jul 24 15:08:10 core2 bash[22692]: DB PATH: /data/eventstore (--db from command line)

Jul 24 15:08:10 core2 bash[22692]: IN MEM DB: False ()

Jul 24 15:08:10 core2 bash[22692]: SKIP DB VERIFY: False ()

Jul 24 15:08:10 core2 bash[22692]: RUN PROJECTIONS: System ()

Jul 24 15:08:10 core2 bash[22692]: PROJECTION THREADS: 3 ()

Jul 24 15:08:10 core2 bash[22692]: WORKER THREADS: 5 ()

Jul 24 15:08:10 core2 bash[22692]: HTTP PREFIXES: http://172.17.8.102:2113/ (--http-prefix from command line)

Jul 24 15:08:10 core2 bash[22692]: ENABLE TRUSTED AUTH: False ()

Jul 24 15:08:10 core2 bash[22692]: CERTIFICATE STORE: ()

Jul 24 15:08:10 core2 bash[22692]: CERTIFICATE NAME: ()

Jul 24 15:08:10 core2 bash[22692]: CERTIFICATE FILE: ()

Jul 24 15:08:10 core2 bash[22692]: CERTIFICATE PASSWORD: ()

Jul 24 15:08:10 core2 bash[22692]: USE INTERNAL SSL: False ()

Jul 24 15:08:10 core2 bash[22692]: SSL TARGET HOST: n/a ()

Jul 24 15:08:10 core2 bash[22692]: SSL VALIDATE SERVER: True ()

Jul 24 15:08:10 core2 bash[22692]: AUTHENTICATION TYPE: internal ()

Jul 24 15:08:10 core2 bash[22692]: AUTHENTICATION CONFIG FILE: ()

Jul 24 15:08:10 core2 bash[22692]: PREPARE TIMEOUT MS: 2000 ()

Jul 24 15:08:10 core2 bash[22692]: COMMIT TIMEOUT MS: 2000 ()

Jul 24 15:08:10 core2 bash[22692]: DISABLE SCAVENGE MERGING: False ()

Jul 24 15:08:10 core2 bash[22692]: GOSSIP ON EXT: True ()

Jul 24 15:08:10 core2 bash[22692]: STATS ON EXT: True ()

Jul 24 15:08:10 core2 bash[22692]: ADMIN ON EXT: True ()

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.078] Quorum size set to 2

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.083] Can’t find plugins path: /opt/eventstore/plugins

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.157] DNS discovery is disabled, but no gossip seed endpoints have been specified. Specify gossip seeds using the --gossip-seed= command line option.

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.157]

Jul 24 15:08:10 core2 bash[22692]: INSTANCE ID: b580fb5a-5a6d-4379-9b39-ca8d4f80230d

Jul 24 15:08:10 core2 bash[22692]: DATABASE: /data/eventstore

Jul 24 15:08:10 core2 bash[22692]: WRITER CHECKPOINT: 0 (0x0)

Jul 24 15:08:10 core2 bash[22692]: CHASER CHECKPOINT: 0 (0x0)

Jul 24 15:08:10 core2 bash[22692]: EPOCH CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:08:10 core2 bash[22692]: TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.293] MessageHierarchy initialization took 00:00:00.1001182.

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.365] CACHED TFChunk #0-0 (chunk-000000.000000) in 00:00:00.0013540.

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.469] Starting MiniWeb for /web/es/js/projections ==> /opt/eventstore/singlenode-web/js/projections

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.469] Starting MiniWeb for /web/es/js/projections/v8/Prelude ==> /opt/eventstore/Prelude

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.469] Starting MiniWeb for /web/es/js/projections/resources ==> /opt/eventstore/web-resources/js

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.470] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.470] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.470] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.471] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.471] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.471] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.474] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.505] ========== [127.0.0.1:2112] SYSTEM INIT…

Jul 24 15:08:10 core2 bash[22692]: Exiting with exit code: 1.

Jul 24 15:08:10 core2 bash[22692]: Exit reason: Http async server failed to start listening at [http://172.17.8.102:2113/].

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.542] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1113.

Jul 24 15:08:10 core2 bash[22692]: [00001,13,15:08:10.553] TableIndex initialization…

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.554] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1112.

Jul 24 15:08:10 core2 bash[22692]: [00001,13,15:08:10.570] ReadIndex building…

Jul 24 15:08:10 core2 bash[22692]: [00001,13,15:08:10.573] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0005270.

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.577] Starting HTTP server on [http://172.17.8.102:2113/]…

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.582] Failed to start http server

Jul 24 15:08:10 core2 bash[22692]: The requested address is not valid in this context

Jul 24 15:08:10 core2 bash[22692]: [00001,10,15:08:10.608] Exiting with exit code: 1.

Jul 24 15:08:10 core2 bash[22692]: Exit reason: Http async server failed to start listening at [http://172.17.8.102:2113/].

I’m not sure whether this is a Docker-influenced issue, since Docker does poke around with networking.
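For what it's worth, "The requested address is not valid in this context" is typically what a bind call reports when the address is not assigned to any interface the process can see, which can happen inside a container. A minimal Python sketch to check this from the same environment ES runs in (the `can_bind` helper is a made-up name, and port 0 simply asks the OS for any free port):

```python
import socket


def can_bind(ip, port=0):
    """Return True if a TCP socket can be bound to ip:port on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
            return True
        except OSError:
            # Covers "address not available" as well as other bind failures.
            return False


# Loopback and the wildcard address should normally be bindable;
# replace the last entry with the node's real IP when running the check.
for address in ("127.0.0.1", "0.0.0.0", "172.17.8.102"):
    print(address, can_bind(address))
```

If the node's real IP comes back False inside the container but True on the host, that would point at the container's network setup rather than at ES itself.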

You didn’t include your command line :)

and

Jul 24 15:08:10 core2 bash[22692]: [00001,01,15:08:10.157] DNS discovery is disabled, but no gossip seed endpoints have been specified. Specify gossip seeds using the --gossip-seed= command line option.
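The startup banner records where each option came from: empty parentheses when nothing was supplied, and "(--option from command line)" otherwise. As a rough sketch, assuming only those two forms appear (other sources are simply skipped), a few lines of Python can list which options the node did not pick up from the command line. `parse_banner` is a hypothetical helper, and the sample lines are taken from the log above.

```python
import re

# NAME: value (source), where source is empty or "<option> from command line".
BANNER = re.compile(
    r"^(?P<name>[A-Z][A-Z ]*): (?P<value>.*?) ?"
    r"\((?P<source>(?:\S+ from command line)?)\)$"
)


def parse_banner(lines):
    """Map option name -> (value, was_given_on_command_line)."""
    options = {}
    for line in lines:
        # Drop the syslog prefix ("Jul 24 ... bash[22692]: ") if present.
        text = line.split("]: ", 1)[-1].strip()
        match = BANNER.match(text)
        if match:
            options[match.group("name")] = (
                match.group("value"),
                bool(match.group("source")),
            )
    return options


sample = [
    "Jul 24 15:08:10 core2 bash[22692]: INTERNAL IP: 127.0.0.1 ()",
    "Jul 24 15:08:10 core2 bash[22692]: CLUSTER SIZE: 3 (--cluster-size from command line)",
    "Jul 24 15:08:10 core2 bash[22692]: GOSSIP SEEDS: ()",
]
options = parse_banner(sample)
not_from_cli = sorted(name for name, (_, given) in options.items() if not given)
print(not_from_cli)
```

Run against the full banner, this makes it easy to spot an option such as the IP or the gossip seeds that was passed on the command line but shows up with empty parentheses.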

Command with tcp/http options:

mono-sgen EventStore.ClusterNode.exe --log=/var/log/eventstore --int-ip=0.0.0.0 --ext-ip=0.0.0.0 --int-tcp-port=1112 --ext-tcp-port=1113 --int-http-port=2112 --ext-http-port=2113 --tcp-port=1113 --http-port=2113 --cluster-size=3 --use-dns-discovery- --db=/data/eventstore --http-prefix=http://*:2113/ --gossip-seed=172.17.8.101:2113 --gossip-seed=172.17.8.102:2113 --gossip-seed=172.17.8.103:2113

The real IP of the node is 172.17.8.103 but I have to set the IP to 0.0.0.0 or 127.0.0.1 or I get the crash further down.

This is the log from the third node which I started after the other two were already running:

Jul 24 15:44:05 core1 bash[14718]: ES VERSION: 3.0.0.0 (master/30f7fa64d73ba65028e5a6ca720639985cac1458, Tue, 18 Mar 2014 20:03:41 +0000)

Jul 24 15:44:05 core1 bash[14718]: OS: Unknown (Unix 3.15.2.0)

Jul 24 15:44:05 core1 bash[14718]: RUNTIME: 3.2.8 (Debian 3.2.8+dfsg-7) (64-bit)

Jul 24 15:44:05 core1 bash[14718]: GC: 2 GENERATIONS

Jul 24 15:44:05 core1 bash[14718]: LOGS: /var/log/eventstore

Jul 24 15:44:05 core1 bash[14718]: SHOW HELP: False ()

Jul 24 15:44:05 core1 bash[14718]: SHOW VERSION: False ()

Jul 24 15:44:05 core1 bash[14718]: LOGS DIR: /var/log/eventstore (--logsdir from command line)

Jul 24 15:44:05 core1 bash[14718]: CONFIGS: ()

Jul 24 15:44:05 core1 bash[14718]: DEFINES: ()

Jul 24 15:44:05 core1 bash[14718]: INTERNAL IP: 0.0.0.0 (--internal-ip from command line)

Jul 24 15:44:05 core1 bash[14718]: EXTERNAL IP: 0.0.0.0 (--external-ip from command line)

Jul 24 15:44:05 core1 bash[14718]: INTERNAL HTTP PORT: 2112 (--internal-http-port from command line)

Jul 24 15:44:05 core1 bash[14718]: EXTERNAL HTTP PORT: 2113 (--external-http-port from command line)

Jul 24 15:44:05 core1 bash[14718]: INTERNAL TCP PORT: 1112 (--internal-tcp-port from command line)

Jul 24 15:44:05 core1 bash[14718]: INTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:44:05 core1 bash[14718]: EXTERNAL TCP PORT: 1113 (--external-tcp-port from command line)

Jul 24 15:44:05 core1 bash[14718]: EXTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:44:05 core1 bash[14718]: FORCE: False ()

Jul 24 15:44:05 core1 bash[14718]: CLUSTER SIZE: 3 (--cluster-size from command line)

Jul 24 15:44:05 core1 bash[14718]: MIN FLUSH DELAY MS: 2 ()

Jul 24 15:44:05 core1 bash[14718]: NODE PRIORITY: 0 ()

Jul 24 15:44:05 core1 bash[14718]: COMMIT COUNT: 2 ()

Jul 24 15:44:05 core1 bash[14718]: PREPARE COUNT: 2 ()

Jul 24 15:44:05 core1 bash[14718]: MAX MEM TABLE SIZE: 1000000 ()

Jul 24 15:44:05 core1 bash[14718]: DISCOVER VIA DNS: False (--use-dns-discovery from command line)

Jul 24 15:44:05 core1 bash[14718]: CLUSTER DNS: fake.dns ()

Jul 24 15:44:05 core1 bash[14718]: CLUSTER GOSSIP PORT: 30777 ()

Jul 24 15:44:05 core1 bash[14718]: GOSSIP SEEDS: 172.17.8.101:2113, 172.17.8.102:2113, 172.17.8.103:2113 (--gossip-seed from command line)

Jul 24 15:44:05 core1 bash[14718]: STATS PERIOD SEC: 30 ()

Jul 24 15:44:05 core1 bash[14718]: CACHED CHUNKS: -1 ()

Jul 24 15:44:05 core1 bash[14718]: CHUNKS CACHE SIZE: 536871424 ()

Jul 24 15:44:05 core1 bash[14718]: DB PATH: /data/eventstore (--db from command line)

Jul 24 15:44:05 core1 bash[14718]: IN MEM DB: False ()

Jul 24 15:44:05 core1 bash[14718]: SKIP DB VERIFY: False ()

Jul 24 15:44:05 core1 bash[14718]: RUN PROJECTIONS: System ()

Jul 24 15:44:05 core1 bash[14718]: PROJECTION THREADS: 3 ()

Jul 24 15:44:05 core1 bash[14718]: WORKER THREADS: 5 ()

Jul 24 15:44:05 core1 bash[14718]: HTTP PREFIXES: http://*:2113/ (--http-prefix from command line)

Jul 24 15:44:05 core1 bash[14718]: ENABLE TRUSTED AUTH: False ()

Jul 24 15:44:05 core1 bash[14718]: CERTIFICATE STORE: ()

Jul 24 15:44:05 core1 bash[14718]: CERTIFICATE NAME: ()

Jul 24 15:44:05 core1 bash[14718]: CERTIFICATE FILE: ()

Jul 24 15:44:05 core1 bash[14718]: CERTIFICATE PASSWORD: ()

Jul 24 15:44:05 core1 bash[14718]: USE INTERNAL SSL: False ()

Jul 24 15:44:05 core1 bash[14718]: SSL TARGET HOST: n/a ()

Jul 24 15:44:05 core1 bash[14718]: SSL VALIDATE SERVER: True ()

Jul 24 15:44:05 core1 bash[14718]: AUTHENTICATION TYPE: internal ()

Jul 24 15:44:05 core1 bash[14718]: AUTHENTICATION CONFIG FILE: ()

Jul 24 15:44:05 core1 bash[14718]: PREPARE TIMEOUT MS: 2000 ()

Jul 24 15:44:05 core1 bash[14718]: COMMIT TIMEOUT MS: 2000 ()

Jul 24 15:44:05 core1 bash[14718]: DISABLE SCAVENGE MERGING: False ()

Jul 24 15:44:05 core1 bash[14718]: GOSSIP ON EXT: True ()

Jul 24 15:44:05 core1 bash[14718]: STATS ON EXT: True ()

Jul 24 15:44:05 core1 bash[14718]: ADMIN ON EXT: True ()

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.220] Quorum size set to 2

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.225] Can’t find plugins path: /opt/eventstore/plugins

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.292]

Jul 24 15:44:05 core1 bash[14718]: INSTANCE ID: a796c3bf-f72b-46d0-bb90-7c34bb989ae7

Jul 24 15:44:05 core1 bash[14718]: DATABASE: /data/eventstore

Jul 24 15:44:05 core1 bash[14718]: WRITER CHECKPOINT: 0 (0x0)

Jul 24 15:44:05 core1 bash[14718]: CHASER CHECKPOINT: 0 (0x0)

Jul 24 15:44:05 core1 bash[14718]: EPOCH CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:44:05 core1 bash[14718]: TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.460] MessageHierarchy initialization took 00:00:00.1409839.

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.516] CACHED TFChunk #0-0 (chunk-000000.000000) in 00:00:00.0013381.

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.618] Starting MiniWeb for /web/es/js/projections ==> /opt/eventstore/singlenode-web/js/projections

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.618] Starting MiniWeb for /web/es/js/projections/v8/Prelude ==> /opt/eventstore/Prelude

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.618] Starting MiniWeb for /web/es/js/projections/resources ==> /opt/eventstore/web-resources/js

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.622] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.622] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.622] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.626] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.626] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.626] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.629] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.630] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.630] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.630] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:44:05 core1 bash[14718]: [00001,01,15:44:05.631] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.647] ========== [0.0.0.0:2112] SYSTEM INIT…

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.684] Starting Normal TCP listening on TCP endpoint: 0.0.0.0:1113.

Jul 24 15:44:05 core1 bash[14718]: [00001,13,15:44:05.708] TableIndex initialization…

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.726] SLOW BUS MSG [MainBus]: SystemInit - 51ms. Handler: TcpService.

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.726] Starting Normal TCP listening on TCP endpoint: 0.0.0.0:1112.

Jul 24 15:44:05 core1 bash[14718]: [00001,13,15:44:05.729] ReadIndex building…

Jul 24 15:44:05 core1 bash[14718]: [00001,13,15:44:05.731] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0010600.

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.747] Starting HTTP server on [http://*:2113/]…

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.750] HTTP server is up and listening on [http://*:2113/]

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.750] Starting HTTP server on [http://0.0.0.0:2112/]…

Jul 24 15:44:05 core1 bash[14718]: [00001,10,15:44:05.751] HTTP server is up and listening on [http://0.0.0.0:2112/]

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.771] SLOW QUEUE MSG [MainQueue]: SystemInit - 121ms. Q: 0/8.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.773] ========== [0.0.0.0:2112] Service ‘StorageWriter’ initialized.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.773] ========== [0.0.0.0:2112] Service ‘StorageReader’ initialized.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.774] ========== [0.0.0.0:2112] Service ‘StorageChaser’ initialized.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.794] CLUSTER HAS CHANGED

Jul 24 15:44:06 core1 bash[14718]: Old:

Jul 24 15:44:06 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:05.751

Jul 24 15:44:06 core1 bash[14718]: New:

Jul 24 15:44:06 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:06 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:06 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:06 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:05.751

Jul 24 15:44:06 core1 bash[14718]: --------------------------------------------------------------------------------

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.796] ========== [0.0.0.0:2112] SYSTEM START…

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.800] ========== [0.0.0.0:2112] IS UNKNOWN!!! WHOA!!!

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.834] ELECTIONS: STARTING ELECTIONS.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.834] ELECTIONS: (V=0) SHIFT TO LEADER ELECTION.

Jul 24 15:44:06 core1 bash[14718]: [00001,10,15:44:05.834] ELECTIONS: (V=0) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:06.431] Looks like node [172.17.8.101:2113] is DEAD (Gossip send failed).

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:06.431] CLUSTER HAS CHANGED (gossip send failed to [172.17.8.101:2113])

Jul 24 15:44:07 core1 bash[14718]: Old:

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:06.405

Jul 24 15:44:07 core1 bash[14718]: New:

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:06.431

Jul 24 15:44:07 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:06.405

Jul 24 15:44:07 core1 bash[14718]: --------------------------------------------------------------------------------

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:06.837] ELECTIONS: (V=0) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:06.837] ELECTIONS: (V=1) SHIFT TO LEADER ELECTION.

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:06.837] ELECTIONS: (V=1) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:07.240] Looks like node [172.17.8.103:2113] is DEAD (Gossip send failed).

Jul 24 15:44:07 core1 bash[14718]: [00001,10,15:44:07.240] CLUSTER HAS CHANGED (gossip send failed to [172.17.8.103:2113])

Jul 24 15:44:07 core1 bash[14718]: Old:

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:06.431

Jul 24 15:44:07 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:07.213

Jul 24 15:44:07 core1 bash[14718]: New:

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:07.240

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:07 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:06.431

Jul 24 15:44:07 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:07.213

Jul 24 15:44:07 core1 bash[14718]: --------------------------------------------------------------------------------

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:07.771] Looks like node [172.17.8.102:2113] is DEAD (Gossip send failed).

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:07.771] CLUSTER HAS CHANGED (gossip send failed to [172.17.8.102:2113])

Jul 24 15:44:08 core1 bash[14718]: Old:

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:07.240

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:05.775

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:06.431

Jul 24 15:44:08 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:07.717

Jul 24 15:44:08 core1 bash[14718]: New:

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.103:2113, 172.17.8.103:2113] | 2014-07-24 15:44:07.240

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.102:2113, 172.17.8.102:2113] | 2014-07-24 15:44:07.771

Jul 24 15:44:08 core1 bash[14718]: MAN {00000000-0000-0000-0000-000000000000} [Manager, 172.17.8.101:2113, 172.17.8.101:2113] | 2014-07-24 15:44:06.431

Jul 24 15:44:08 core1 bash[14718]: VND {a796c3bf-f72b-46d0-bb90-7c34bb989ae7} [Unknown, 0.0.0.0:1112, n/a, 0.0.0.0:1113, n/a, 0.0.0.0:2112, 0.0.0.0:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2014-07-24 15:44:07.717

Jul 24 15:44:08 core1 bash[14718]: --------------------------------------------------------------------------------

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:07.837] ELECTIONS: (V=1) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:07.837] ELECTIONS: (V=2) SHIFT TO LEADER ELECTION.

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:07.837] ELECTIONS: (V=2) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:08.838] ELECTIONS: (V=2) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:08.838] ELECTIONS: (V=3) SHIFT TO LEADER ELECTION.

Jul 24 15:44:08 core1 bash[14718]: [00001,10,15:44:08.838] ELECTIONS: (V=3) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:10 core1 bash[14718]: [00001,10,15:44:09.839] ELECTIONS: (V=3) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:10 core1 bash[14718]: [00001,10,15:44:09.839] ELECTIONS: (V=4) SHIFT TO LEADER ELECTION.

Jul 24 15:44:10 core1 bash[14718]: [00001,10,15:44:09.839] ELECTIONS: (V=4) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:11 core1 bash[14718]: [00001,10,15:44:10.840] ELECTIONS: (V=4) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:11 core1 bash[14718]: [00001,10,15:44:10.840] ELECTIONS: (V=5) SHIFT TO LEADER ELECTION.

Jul 24 15:44:11 core1 bash[14718]: [00001,10,15:44:10.840] ELECTIONS: (V=5) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:12 core1 bash[14718]: [00001,10,15:44:11.841] ELECTIONS: (V=5) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:12 core1 bash[14718]: [00001,10,15:44:11.841] ELECTIONS: (V=6) SHIFT TO LEADER ELECTION.

Jul 24 15:44:12 core1 bash[14718]: [00001,10,15:44:11.841] ELECTIONS: (V=6) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:13 core1 bash[14718]: [00001,10,15:44:12.843] ELECTIONS: (V=6) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:13 core1 bash[14718]: [00001,10,15:44:12.843] ELECTIONS: (V=7) SHIFT TO LEADER ELECTION.

Jul 24 15:44:13 core1 bash[14718]: [00001,10,15:44:12.843] ELECTIONS: (V=7) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:14 core1 bash[14718]: [00001,10,15:44:13.844] ELECTIONS: (V=7) TIMED OUT! (S=ElectingLeader, M=).

Jul 24 15:44:14 core1 bash[14718]: [00001,10,15:44:13.844] ELECTIONS: (V=8) SHIFT TO LEADER ELECTION.

Jul 24 15:44:14 core1 bash[14718]: [00001,10,15:44:13.844] ELECTIONS: (V=8) VIEWCHANGE FROM [0.0.0.0:2112, {a796c3bf-f72b-46d0-bb90-7c34bb989ae7}].

Jul 24 15:44:15 core1 bash[14718]: [00001,10,15:44:14.846] ELECTIONS: (V=8) TIMED OUT! (S=ElectingLeader, M=).
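Since the interesting part of this log is buried in the repeated cluster dumps, here is a small Python sketch that boils it down to which seeds were marked DEAD and how many election views timed out. It matches only on the message texts visible above; `summarize` is a hypothetical helper and the sample lines are copied from this log.

```python
import re
from collections import Counter

DEAD = re.compile(r"Looks like node \[(?P<node>[^\]]+)\] is DEAD")
TIMED_OUT = re.compile(r"ELECTIONS: \(V=(?P<view>\d+)\) TIMED OUT!")


def summarize(lines):
    """Return (dead-node counts, number of election timeouts, last timed-out view)."""
    dead = Counter()
    timeouts = 0
    last_view = None
    for line in lines:
        match = DEAD.search(line)
        if match:
            dead[match.group("node")] += 1
            continue
        match = TIMED_OUT.search(line)
        if match:
            timeouts += 1
            last_view = int(match.group("view"))
    return dead, timeouts, last_view


sample = [
    "[00001,10,15:44:06.431] Looks like node [172.17.8.101:2113] is DEAD (Gossip send failed).",
    "[00001,10,15:44:06.837] ELECTIONS: (V=0) TIMED OUT! (S=ElectingLeader, M=).",
    "[00001,10,15:44:07.240] Looks like node [172.17.8.103:2113] is DEAD (Gossip send failed).",
    "[00001,10,15:44:07.837] ELECTIONS: (V=1) TIMED OUT! (S=ElectingLeader, M=).",
]
dead, timeouts, last_view = summarize(sample)
print(dict(dead), timeouts, last_view)
```

In the full log above, all three seeds, including the node's own address 172.17.8.103:2113, end up marked DEAD before the election timeouts start repeating.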

The crash:

The command you requested earlier, binding to the external interface:

mono-sgen EventStore.ClusterNode.exe --log=/var/log/eventstore --ip=172.17.8.103 --tcp-port=1113 --http-port=2113 --cluster-size=3 --use-dns-discovery- --db=/data/eventstore --http-prefix=http://172.17.8.103:2113/ --gossip-seed=172.17.8.101:2113 --gossip-seed=172.17.8.102:2113 --gossip-seed=172.17.8.103:2113

The crash log:

Jul 24 15:28:24 core3 bash[20310]: ES VERSION: 3.0.0.0 (master/30f7fa64d73ba65028e5a6ca720639985cac1458, Tue, 18 Mar 2014 20:03:41 +0000)

Jul 24 15:28:24 core3 bash[20310]: OS: Unknown (Unix 3.15.2.0)

Jul 24 15:28:24 core3 bash[20310]: RUNTIME: 3.2.8 (Debian 3.2.8+dfsg-7) (64-bit)

Jul 24 15:28:24 core3 bash[20310]: GC: 2 GENERATIONS

Jul 24 15:28:24 core3 bash[20310]: LOGS: /var/log/eventstore

Jul 24 15:28:24 core3 bash[20310]: SHOW HELP: False ()

Jul 24 15:28:24 core3 bash[20310]: SHOW VERSION: False ()

Jul 24 15:28:24 core3 bash[20310]: LOGS DIR: /var/log/eventstore (--logsdir from command line)

Jul 24 15:28:24 core3 bash[20310]: CONFIGS: ()

Jul 24 15:28:24 core3 bash[20310]: DEFINES: ()

Jul 24 15:28:24 core3 bash[20310]: INTERNAL IP: 127.0.0.1 ()

Jul 24 15:28:24 core3 bash[20310]: EXTERNAL IP: 127.0.0.1 ()

Jul 24 15:28:24 core3 bash[20310]: INTERNAL HTTP PORT: 2112 ()

Jul 24 15:28:24 core3 bash[20310]: EXTERNAL HTTP PORT: 2113 ()

Jul 24 15:28:24 core3 bash[20310]: INTERNAL TCP PORT: 1112 ()

Jul 24 15:28:24 core3 bash[20310]: INTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:28:24 core3 bash[20310]: EXTERNAL TCP PORT: 1113 ()

Jul 24 15:28:24 core3 bash[20310]: EXTERNAL SECURE TCP PORT: 0 ()

Jul 24 15:28:24 core3 bash[20310]: FORCE: False ()

Jul 24 15:28:24 core3 bash[20310]: CLUSTER SIZE: 3 (--cluster-size from command line)

Jul 24 15:28:24 core3 bash[20310]: MIN FLUSH DELAY MS: 2 ()

Jul 24 15:28:24 core3 bash[20310]: NODE PRIORITY: 0 ()

Jul 24 15:28:24 core3 bash[20310]: COMMIT COUNT: 2 ()

Jul 24 15:28:24 core3 bash[20310]: PREPARE COUNT: 2 ()

Jul 24 15:28:24 core3 bash[20310]: MAX MEM TABLE SIZE: 1000000 ()

Jul 24 15:28:24 core3 bash[20310]: DISCOVER VIA DNS: False (--use-dns-discovery from command line)

Jul 24 15:28:24 core3 bash[20310]: CLUSTER DNS: fake.dns ()

Jul 24 15:28:24 core3 bash[20310]: CLUSTER GOSSIP PORT: 30777 ()

Jul 24 15:28:24 core3 bash[20310]: GOSSIP SEEDS: 172.17.8.101:2113, 172.17.8.102:2113, 172.17.8.103:2113 (--gossip-seed from command line)

Jul 24 15:28:24 core3 bash[20310]: STATS PERIOD SEC: 30 ()

Jul 24 15:28:24 core3 bash[20310]: CACHED CHUNKS: -1 ()

Jul 24 15:28:24 core3 bash[20310]: CHUNKS CACHE SIZE: 536871424 ()

Jul 24 15:28:24 core3 bash[20310]: DB PATH: /data/eventstore (--db from command line)

Jul 24 15:28:24 core3 bash[20310]: IN MEM DB: False ()

Jul 24 15:28:24 core3 bash[20310]: SKIP DB VERIFY: False ()

Jul 24 15:28:24 core3 bash[20310]: RUN PROJECTIONS: System ()

Jul 24 15:28:24 core3 bash[20310]: PROJECTION THREADS: 3 ()

Jul 24 15:28:24 core3 bash[20310]: WORKER THREADS: 5 ()

Jul 24 15:28:24 core3 bash[20310]: HTTP PREFIXES: http://172.17.8.103:2113/ (--http-prefix from command line)

Jul 24 15:28:24 core3 bash[20310]: ENABLE TRUSTED AUTH: False ()

Jul 24 15:28:24 core3 bash[20310]: CERTIFICATE STORE: ()

Jul 24 15:28:24 core3 bash[20310]: CERTIFICATE NAME: ()

Jul 24 15:28:24 core3 bash[20310]: CERTIFICATE FILE: ()

Jul 24 15:28:24 core3 bash[20310]: CERTIFICATE PASSWORD: ()

Jul 24 15:28:24 core3 bash[20310]: USE INTERNAL SSL: False ()

Jul 24 15:28:24 core3 bash[20310]: SSL TARGET HOST: n/a ()

Jul 24 15:28:24 core3 bash[20310]: SSL VALIDATE SERVER: True ()

Jul 24 15:28:24 core3 bash[20310]: AUTHENTICATION TYPE: internal ()

Jul 24 15:28:24 core3 bash[20310]: AUTHENTICATION CONFIG FILE: ()

Jul 24 15:28:24 core3 bash[20310]: PREPARE TIMEOUT MS: 2000 ()

Jul 24 15:28:24 core3 bash[20310]: COMMIT TIMEOUT MS: 2000 ()

Jul 24 15:28:24 core3 bash[20310]: DISABLE SCAVENGE MERGING: False ()

Jul 24 15:28:24 core3 bash[20310]: GOSSIP ON EXT: True ()

Jul 24 15:28:24 core3 bash[20310]: STATS ON EXT: True ()

Jul 24 15:28:24 core3 bash[20310]: ADMIN ON EXT: True ()

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.279] Quorum size set to 2

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.285] Can't find plugins path: /opt/eventstore/plugins

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.378]

Jul 24 15:28:24 core3 bash[20310]: INSTANCE ID: 7e9b8298-b798-4821-9b80-9a872223014c

Jul 24 15:28:24 core3 bash[20310]: DATABASE: /data/eventstore

Jul 24 15:28:24 core3 bash[20310]: WRITER CHECKPOINT: 0 (0x0)

Jul 24 15:28:24 core3 bash[20310]: CHASER CHECKPOINT: 0 (0x0)

Jul 24 15:28:24 core3 bash[20310]: EPOCH CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:28:24 core3 bash[20310]: TRUNCATE CHECKPOINT: -1 (0xFFFFFFFFFFFFFFFF)

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.515] MessageHierarchy initialization took 00:00:00.1121425.

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.618] CACHED TFChunk #0-0 (chunk-000000.000000) in 00:00:00.0012573.

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.720] Starting MiniWeb for /web/es/js/projections ==> /opt/eventstore/singlenode-web/js/projections

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.720] Starting MiniWeb for /web/es/js/projections/v8/Prelude ==> /opt/eventstore/Prelude

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.720] Starting MiniWeb for /web/es/js/projections/resources ==> /opt/eventstore/web-resources/js

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.721] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.721] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.721] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.722] Binding MiniWeb to /web/es/js/projections/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.722] Binding MiniWeb to /web/es/js/projections/v8/Prelude/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.722] Binding MiniWeb to /web/es/js/projections/resources/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web ==> /opt/eventstore/clusternode-web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web/es ==> /opt/eventstore/es-common-web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/es/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Starting MiniWeb for /web/users ==> /opt/eventstore/Users/web

Jul 24 15:28:24 core3 bash[20310]: [00001,01,15:28:24.725] Binding MiniWeb to /web/users/{*remaining_path}

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.758] ========== [127.0.0.1:2112] SYSTEM INIT...

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.796] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1113.

Jul 24 15:28:24 core3 bash[20310]: Exiting with exit code: 1.

Jul 24 15:28:24 core3 bash[20310]: Exit reason: Http async server failed to start listening at [http://172.17.8.103:2113/].

Jul 24 15:28:24 core3 bash[20310]: [00001,13,15:28:24.808] TableIndex initialization...

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.812] Starting Normal TCP listening on TCP endpoint: 127.0.0.1:1112.

Jul 24 15:28:24 core3 bash[20310]: [00001,13,15:28:24.823] ReadIndex building...

Jul 24 15:28:24 core3 bash[20310]: [00001,13,15:28:24.825] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0015060.

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.830] Starting HTTP server on [http://172.17.8.103:2113/]...

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.833] Failed to start http server

Jul 24 15:28:24 core3 bash[20310]: The requested address is not valid in this context

Jul 24 15:28:24 core3 bash[20310]: [00001,10,15:28:24.876] Exiting with exit code: 1.

Jul 24 15:28:24 core3 bash[20310]: Exit reason: Http async server failed to start listening at [http://172.17.8.103:2113/].