Unable to connect to a cluster

Hi,

We are having issues connecting to an EventStore cluster. The client seems to try to connect to an internal IP, even though we specify “ExtIpAdvertiseAs”.

Any help is welcome.

Thanks

Guillaume

Setup:

  • Event Store V3.8.1

  • 2-node cluster running in Docker Swarm mode

  • JVM client V2.0.3

Logs:

2016-09-12 15:42:10,089 INFO e.cluster.ClusterDiscovererActor akka://perfimmo-eventstore/user/$a/cluster - Discovering cluster: attempt 1/10 successful: best candidate is MemberInfo(cf132853-5555-4177-a410-aa91d7a92bb0,2016-09-12T13:42:09.326Z,Master,true,/10.0.0.4:1112,/10.0.0.4:1116,/10.0.0.4:0,/10.0.0.4:0,/10.0.0.4:2112,/10.0.0.4:2116,2977893586,2977915544,2977915544,2977743091,19,8f8f1c2d-71e1-430f-87e2-06e7de2ad931,10)

2016-09-12 15:42:11,144 WARN eventstore.tcp.ConnectionActor akka://perfimmo-eventstore/user/$a - Connection failed to /10.0.0.4:1116

2016-09-12 15:42:21,113 INFO e.cluster.ClusterDiscovererActor akka://perfimmo-eventstore/user/$a/cluster - Failed to reach cluster best node MemberInfo(cf132853-5555-4177-a410-aa91d7a92bb0,2016-09-12T13:42:09.326Z,Master,true,/10.0.0.4:1112,/10.0.0.4:1116,/10.0.0.4:0,/10.0.0.4:0,/10.0.0.4:2112,/10.0.0.4:2116,2977893586,2977915544,2977915544,2977743091,19,8f8f1c2d-71e1-430f-87e2-06e7de2ad931,10) with error: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]

2016-09-12 15:42:21,129 WARN s.can.client.HttpClientConnection akka://perfimmo-eventstore/user/IO-HTTP/group-0/2 - Configured connecting timeout of 10 seconds expired, stopping

2016-09-12 15:42:21,132 WARN s.can.client.HttpHostConnectionSlot akka://perfimmo-eventstore/user/IO-HTTP/host-connector-2/0 - Connection attempt to 10.0.0.4:2116 failed in response to GET request to /gossip?format=json with 5 retries left, retrying…


EventStore Options

MODIFIED OPTIONS:

    DB:                       /data/db (Command Line)

    LOG:                      /data/logs (Command Line)

    RUN PROJECTIONS:          all (Command Line)

    EXT IP:                   0.0.0.0 (Command Line)

    INT IP:                   0.0.0.0 (Command Line)

    CLUSTER SIZE:             2 (Command Line)

    CLUSTER DNS:              eventstore-test (Command Line)

    CLUSTER GOSSIP PORT:      2112 (Command Line)

    CONFIG:                   /data/eventstore.yml (Command Line)

    MAX MEM TABLE SIZE:       100000 (Environment Variable)

    WORKER THREADS:           12 (Environment Variable)

    EXT IP ADVERTISE AS:      X.X.X.X (Config File)

    EXT TCP PORT ADVERTISE AS: 1116 (Config File)

    EXT HTTP PORT ADVERTISE AS: 2116 (Config File)


Connection code

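// Imports assumed for this snippet (EventStore JVM client 2.x and scalaz):
import java.net.InetSocketAddress
import scala.concurrent.duration._
import eventstore.{ Settings, UserCredentials }
import eventstore.cluster.{ ClusterSettings, GossipSeedsOrDns }
import scalaz._, Scalaz._

// login and password are assumed to be Option[String] values defined elsewhere.
lazy val settings = Settings(
  address = new InetSocketAddress("X.X.X.X", 1116),
  defaultCredentials = (login |@| password) { (l, p) => UserCredentials(l, p) },
  maxReconnections = -1,
  operationTimeout = 20.seconds,
  operationMaxRetries = 5,
  requireMaster = false,
  cluster = ClusterSettings(
    // Gossip seeds point at the external HTTP endpoints of both nodes.
    gossipSeedsOrDns = GossipSeedsOrDns.GossipSeeds(
      new InetSocketAddress("X.X.X.X", 2116),
      new InetSocketAddress("Y.Y.Y.Y", 2116)),
    gossipTimeout = 10.seconds
  ).some
)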


X.X.X.X and Y.Y.Y.Y are our external IPs.

Can you curl your public X.X.X.X:2116/gossip from the outside?

I am guessing the JVM client doesn't support advertise-as (it looks like
it's trying to connect to internal IPs).

Yes, I can curl X.X.X.X:2116/gossip:

{

"members": [

{

  "instanceId": "cf132853-5555-4177-a410-aa91d7a92bb0",

  "timeStamp": "2016-09-13T11:55:51.571563Z",

  "state": "Master",

  "isAlive": true,

  "internalTcpIp": "10.0.0.4",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "10.0.0.4",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.4",

  "internalHttpPort": 2112,

  "externalHttpIp": "10.0.0.4",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3086388329,

  "writerCheckpoint": 3086407147,

  "chaserCheckpoint": 3086407147,

  "epochPosition": 2977743091,

  "epochNumber": 19,

  "epochId": "8f8f1c2d-71e1-430f-87e2-06e7de2ad931",

  "nodePriority": 10

},

{

  "instanceId": "9a6d3596-f993-48a9-98f8-a2dd1c98d5b4",

  "timeStamp": "2016-09-13T11:55:50.729389Z",

  "state": "Slave",

  "isAlive": true,

  "internalTcpIp": "10.0.0.3",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "10.0.0.3",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.3",

  "internalHttpPort": 2112,

  "externalHttpIp": "10.0.0.3",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3086388329,

  "writerCheckpoint": 3086407147,

  "chaserCheckpoint": 3086407147,

  "epochPosition": 2977743091,

  "epochNumber": 19,

  "epochId": "8f8f1c2d-71e1-430f-87e2-06e7de2ad931",

  "nodePriority": 0

}

],

"serverIp": "10.0.0.4",

"serverPort": 2112

}


I would expect the external IP to appear in the above CURL response. Is the
ExtIpAdvertiseAs option not taken into account?
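
The same check can be done programmatically rather than by eye. Below is a minimal Scala sketch, using only the JDK, that pulls the `externalTcpIp` values out of a gossip response and reports those that are internal addresses. The object name `GossipCheck`, its method names, and the regex-based extraction are assumptions made for illustration; none of this is part of the Event Store JVM client.

```scala
import java.net.InetAddress

object GossipCheck {
  // Captures the dotted-quad value of every "externalTcpIp" field (IPv4 literals only).
  private val ExternalTcpIp =
    "\"externalTcpIp\"\\s*:\\s*\"(\\d{1,3}(?:\\.\\d{1,3}){3})\"".r

  // All externalTcpIp values found in the gossip body, in document order.
  def externalTcpIps(gossipJson: String): List[String] =
    ExternalTcpIp.findAllMatchIn(gossipJson).map(_.group(1)).toList

  // True for site-local (10/8, 172.16/12, 192.168/16), loopback,
  // or wildcard (0.0.0.0) addresses.
  def isInternal(ip: String): Boolean = {
    val addr = InetAddress.getByName(ip) // no DNS lookup for a valid IPv4 literal
    addr.isSiteLocalAddress || addr.isLoopbackAddress || addr.isAnyLocalAddress
  }

  // External TCP addresses that an outside client would not be able to reach.
  def internalExternalIps(gossipJson: String): List[String] =
    externalTcpIps(gossipJson).filter(isInternal)
}
```

Applied to the gossip response above, `GossipCheck.internalExternalIps` would report both 10.0.0.4 and 10.0.0.3, which matches the address the client fails to reach in the logs.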

It should be ... we just tested here with:

mono bin/clusternode/EventStore.ClusterNode.exe --int-http-port=2112
--ext-http-port=2113 --ext-tcp-port=2114 --int-tcp-port=2115
--cluster-size=2 --discover-via-dns=false --gossip-seed=127.0.0.1:1112
--mem-db --ext-ip-advertise-as=10.0.0.8

{
  "members": [
    {
      "instanceId": "3ba107df-e8e4-421e-9829-1f566a64e549",
      "timeStamp": "2016-09-13T12:07:07.887315Z",
      "state": "Slave",
      "isAlive": true,
      "internalTcpIp": "127.0.0.1",
      "internalTcpPort": 2115,
      "internalSecureTcpPort": 0,
      "externalTcpIp": "10.0.0.8",
      "externalTcpPort": 2114,
      "externalSecureTcpPort": 0,
      "internalHttpIp": "127.0.0.1",
      "internalHttpPort": 2112,
      "externalHttpIp": "10.0.0.8",
      "externalHttpPort": 2113,
      "lastCommitPosition": 1109098,
      "writerCheckpoint": 1124220,
      "chaserCheckpoint": 1124220,
      "epochPosition": 0,
      "epochNumber": 0,
      "epochId": "707eb79f-d30d-46f4-a399-31714318f3a7",
      "nodePriority": 0
    },
    {
      "instanceId": "954cb52a-fc4e-4422-a24a-9c2794d32545",
      "timeStamp": "2016-09-13T12:07:07.100636Z",
      "state": "Master",
      "isAlive": true,
      "internalTcpIp": "127.0.0.1",
      "internalTcpPort": 1115,
      "internalSecureTcpPort": 0,
      "externalTcpIp": "10.0.0.8",
      "externalTcpPort": 1114,
      "externalSecureTcpPort": 0,
      "internalHttpIp": "127.0.0.1",
      "internalHttpPort": 1112,
      "externalHttpIp": "10.0.0.8",
      "externalHttpPort": 1113,
      "lastCommitPosition": 1109098,
      "writerCheckpoint": 1124220,
      "chaserCheckpoint": 1124220,
      "epochPosition": 0,
      "epochNumber": 0,
      "epochId": "707eb79f-d30d-46f4-a399-31714318f3a7",
      "nodePriority": 0
    }
  ],
  "serverIp": "127.0.0.1",
  "serverPort": 2112
}

Trying to reproduce locally.

Can you possibly amend your config to the following?

--add-interface-prefixes=false --ext-http-prefixes=http://*:{ext-http-port}/ --int-http-prefixes=http://*:{int-http-port}/


and let us know if that works?

With this configuration, the external IP seems to be advertised, but the cluster can’t form properly: 4 members are reported when there are only 2 nodes.

MODIFIED OPTIONS:

    DB:                       /data/db (Command Line)

    LOG:                      /data/logs (Command Line)

    RUN PROJECTIONS:          all (Command Line)

    INT IP:                   0.0.0.0 (Command Line)

    EXT IP:                   0.0.0.0 (Command Line)

    CLUSTER SIZE:             2 (Command Line)

    CLUSTER DNS:              eventstore-test (Command Line)

    CLUSTER GOSSIP PORT:      2112 (Command Line)

    CONFIG:                   /data/eventstore.yml (Command Line)

    ADD INTERFACE PREFIXES:   false (Command Line)

    EXT HTTP PREFIXES:        http://*:2113/ (Command Line)

    INT HTTP PREFIXES:        http://*:2112/ (Command Line)

    MAX MEM TABLE SIZE:       100000 (Environment Variable)

    WORKER THREADS:           12 (Environment Variable)

    EXT IP ADVERTISE AS:      X.X.X.X (Config File)

    EXT TCP PORT ADVERTISE AS: 1116 (Config File)

    EXT HTTP PORT ADVERTISE AS: 2116 (Config File)


{

"members": [

{

  "instanceId": "577fa156-fb32-48ab-a844-432cd4529290",

  "timeStamp": "2016-09-13T12:46:46.309721Z",

  "state": "Unknown",

  "isAlive": true,

  "internalTcpIp": "X.X.X.X",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "X.X.X.X",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "X.X.X.X",

  "internalHttpPort": 2112,

  "externalHttpIp": "X.X.X.X",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3089881682,

  "writerCheckpoint": 3089900467,

  "chaserCheckpoint": 3089900467,

  "epochPosition": 3088994776,

  "epochNumber": 21,

  "epochId": "b6a0d549-1386-4f9a-a2c7-8e52057f7d22",

  "nodePriority": 0

},

{

  "instanceId": "00000000-0000-0000-0000-000000000000",

  "timeStamp": "2016-09-13T12:46:24.729475Z",

  "state": "Manager",

  "isAlive": true,

  "internalTcpIp": "10.0.0.4",

  "internalTcpPort": 2112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "10.0.0.4",

  "externalTcpPort": 2112,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.4",

  "internalHttpPort": 2112,

  "externalHttpIp": "10.0.0.4",

  "externalHttpPort": 2112,

  "lastCommitPosition": -1,

  "writerCheckpoint": -1,

  "chaserCheckpoint": -1,

  "epochPosition": -1,

  "epochNumber": -1,

  "epochId": "00000000-0000-0000-0000-000000000000",

  "nodePriority": 0

},

{

  "instanceId": "00000000-0000-0000-0000-000000000000",

  "timeStamp": "2016-09-13T12:46:24.729475Z",

  "state": "Manager",

  "isAlive": true,

  "internalTcpIp": "10.0.0.3",

  "internalTcpPort": 2112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "10.0.0.3",

  "externalTcpPort": 2112,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.3",

  "internalHttpPort": 2112,

  "externalHttpIp": "10.0.0.3",

  "externalHttpPort": 2112,

  "lastCommitPosition": -1,

  "writerCheckpoint": -1,

  "chaserCheckpoint": -1,

  "epochPosition": -1,

  "epochNumber": -1,

  "epochId": "00000000-0000-0000-0000-000000000000",

  "nodePriority": 0

},

{

  "instanceId": "9fa1e80c-a919-46ab-81ce-6a079d26e4a9",

  "timeStamp": "2016-09-13T12:46:45.803612Z",

  "state": "Unknown",

  "isAlive": true,

  "internalTcpIp": "0.0.0.0",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "Y.Y.Y.Y",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "0.0.0.0",

  "internalHttpPort": 2112,

  "externalHttpIp": "Y.Y.Y.Y",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3089881682,

  "writerCheckpoint": 3089900467,

  "chaserCheckpoint": 3089900467,

  "epochPosition": 3088994776,

  "epochNumber": 21,

  "epochId": "b6a0d549-1386-4f9a-a2c7-8e52057f7d22",

  "nodePriority": 10

}

],

"serverIp": "X.X.X.X",

"serverPort": 2112

}


Apologies for not mentioning that you have to set --int-ip-advertise-as as well.

Since we deploy on Docker, the internal IP is not fixed, which makes it difficult to set the int-ip-advertise-as option.

I tried, and the result is as follows. External IPs are properly advertised and the cluster has 2 nodes, but one of them stays in state ‘PreReplica’. Node election goes on forever.

Is there a workaround without specifying the internal IP?

Thanks

{

"members": [

{

  "instanceId": "63e8cf8d-f35c-4a13-9c10-9345bd886fab",

  "timeStamp": "2016-09-13T13:34:56.058208Z",

  "state": "Master",

  "isAlive": true,

  "internalTcpIp": "10.0.0.4",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "Y.Y.Y.Y",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.4",

  "internalHttpPort": 2112,

  "externalHttpIp": "Y.Y.Y.Y",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3090672137,

  "writerCheckpoint": 3090672296,

  "chaserCheckpoint": 3090672296,

  "epochPosition": 3090533749,

  "epochNumber": 100,

  "epochId": "3f793fc4-2935-406a-b5fe-04fde7d15cd7",

  "nodePriority": 0

},

{

  "instanceId": "901da740-a6f2-4bbb-849d-5187dcb8815a",

  "timeStamp": "2016-09-13T13:34:56.3736Z",

  "state": "PreReplica",

  "isAlive": true,

  "internalTcpIp": "10.0.0.3",

  "internalTcpPort": 1112,

  "internalSecureTcpPort": 0,

  "externalTcpIp": "X.X.X.X",

  "externalTcpPort": 1116,

  "externalSecureTcpPort": 0,

  "internalHttpIp": "10.0.0.3",

  "internalHttpPort": 2112,

  "externalHttpIp": "X.X.X.X",

  "externalHttpPort": 2116,

  "lastCommitPosition": 3089881682,

  "writerCheckpoint": 3089900467,

  "chaserCheckpoint": 3089900467,

  "epochPosition": 3088994776,

  "epochNumber": 21,

  "epochId": "b6a0d549-1386-4f9a-a2c7-8e52057f7d22",

  "nodePriority": 0

}

],

"serverIp": "10.0.0.3",

"serverPort": 2112

}


Apologies for not getting back to you yesterday. I don’t believe there is a workaround that you can apply from an Event Store point of view. There should be a patch to ensure that the advertise-as value that is set is not overridden internally when the IP addresses are set to 0.0.0.0 and add-interface-prefixes is true. Are you able to build Event Store locally? If so, I can try to get a branch with the patch to you.
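
The behaviour the patch is meant to guarantee can be read as a simple precedence rule: an explicitly configured advertise-as value wins over the bind address. The Scala sketch below is only an illustrative model of that rule; `AdvertiseAs`, `NodeAddress` and `advertisedIp` are hypothetical names and this is not Event Store's actual implementation.

```scala
object AdvertiseAs {
  // Hypothetical model of one node's address settings (not an Event Store type).
  final case class NodeAddress(bindIp: String, advertiseAs: Option[String])

  // The IP that gossip should publish: the advertise-as value when one is set,
  // otherwise the address the node is bound to.
  def advertisedIp(node: NodeAddress): String =
    node.advertiseAs.getOrElse(node.bindIp)
}
```

Under this rule, a node bound to 0.0.0.0 with advertise-as set to X.X.X.X would publish X.X.X.X, regardless of how the HTTP prefixes are generated.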

Thanks for your reply Pieter.
I might try to build from source for testing purposes. Will the fix make it into the next 3.9.x release?

The patch will definitely make it into the 4.0.0 (current release branch), but it might be a couple of weeks before a 3.9.2 is considered.

Would you mind trying out the following branch? https://github.com/EventStore/EventStore/commits/advertise_as_patch

Just wanted to check with you whether you’ve had a chance to look at that branch yet?

Actually, I don’t have an Ubuntu 14.04 box (we run on a Docker image), so I suppose I can’t build from source.
We’ll wait for the fix to be released.

That’s a shame. I do, however, think I can help you out: I will see if I can get you a binary that you can try out. Are you using the official Event Store Docker image or a custom one?

Thanks. We use a custom one, but I could try with some other image.