Cluster is not able to decide whether it is alive or not.

Hello.

We have been running ES for some time; the database is currently 16 GB. Today I changed the configuration and ES is no longer operating as it did before.
BEFORE

```
IntTcpPort: 1111
IntHttpPort: 2113
IntHttpPrefixes: "http://*:2113/"
ExtTcpPort: 1112
ExtHttpPort: 2114
ExtHttpPrefixes: "http://*:2114/"
ClusterSize: 3
ClusterDns: "eventstore.xxx.net"
ClusterGossipPort: 2113
GossipOnExt: True
```

AFTER

```
IntTcpPort: 1111
IntHttpPort: 2114
IntHttpPrefixes: "http://*:2114/"
ExtTcpPort: 1112
ExtHttpPort: 2113
ExtHttpPrefixes: "http://*:2113/"
ClusterSize: 3
ClusterDns: "eventstore.xxx.net"
ClusterGossipPort: 2114
GossipOnExt: False
```

Somehow ES knows that the port has changed, instead of just using the current settings:

CLUSTER HAS CHANGED (gossip send failed to [xxx.xxx.xx.xxx:2113])

At the same time, it receives gossip on the new port:

CLUSTER HAS CHANGED (gossip received from [xxx.xxx.xx.xx:2114])

I cannot go back to the previous configuration.

Another problem: with GossipOnExt: False, I no longer see cluster statistics on the web page …/web/index.html#/clusterstatus

When GossipOnExt was True, I could see that the nodes were flapping all the time (Dead/Alive).

Why has ES gone crazy? Any idea how to fix this? As I said before, I cannot revert the configuration.

Thank you.

Hi,

Did you deploy these configurations as a rolling upgrade, that is, upgrading each node one at a time without bringing down the whole cluster?
Since the ports you have updated the cluster to use were previously in use by the cluster, you may need to take down the whole cluster in order to clear the previous gossip information.

As for not seeing the cluster status, that is expected. Setting GossipOnExt to False prevents Event Store from sending gossip information over the external port, and that is the port the UI uses to get the cluster status information.

This is also why you are seeing gossip failing on the external port.
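If you need the cluster state while external gossip is disabled, one option is to query a node's internal HTTP port directly (for example, `curl http://node:2114/gossip?format=json`) and read the member list yourself. Below is a minimal sketch that summarizes such a response; the sample payload and its field names are assumptions modeled on the shape of Event Store's gossip output, so verify them against your version:

```python
import json

# Hypothetical sample of the JSON a node's /gossip endpoint returns.
# Field names are assumptions -- check the actual response from your cluster.
sample_gossip = """
{
  "members": [
    {"internalHttpIp": "10.0.0.1", "internalHttpPort": 2114, "state": "Master", "isAlive": true},
    {"internalHttpIp": "10.0.0.2", "internalHttpPort": 2114, "state": "Slave",  "isAlive": true},
    {"internalHttpIp": "10.0.0.3", "internalHttpPort": 2114, "state": "Slave",  "isAlive": false}
  ]
}
"""

def summarize(gossip_json: str):
    """Return (ip, state, alive) tuples for each cluster member."""
    members = json.loads(gossip_json)["members"]
    return [(m["internalHttpIp"], m["state"], m["isAlive"]) for m in members]

for ip, state, alive in summarize(sample_gossip):
    print(f"{ip:12} {state:8} {'Alive' if alive else 'Dead'}")
```

Polling this on each node and comparing the `isAlive` flags over time is also a quick way to confirm the Dead/Alive flapping you observed earlier.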

Hi,

Did you deploy these configurations as a rolling upgrade - that is upgrading each node one at a time without bringing down the whole cluster?

I stopped the entire cluster (3 nodes) after making the change.

Upgrading to the latest version solved the problem.