Hello,
I migrated a test cluster from v5 to v20.06. The migration itself went smoothly, but I am now unable to restart the cluster.
My config:
"ES VERSION:" "20.6.0" ("master"/"7d8855961882b044f20c031773e729aa73734d67", "Tue, 9 Jun 2020 08:43:44 +0200")
"OS:" Linux ("Unix 4.9.0.13")
"RUNTIME:" ".NET 3.1.4" (64-bit)
"GC:" "3 GENERATIONS"
"LOGS:" "/eventstore/logs"
MODIFIED OPTIONS:
RUN PROJECTIONS: All (Config File)
CLUSTER SIZE: 2 (Config File)
DB: /eventstore/db (Config File)
DISCOVER VIA DNS: false (Config File)
GOSSIP SEED: 10.0.0.2:2113 (Config File)
LOG: /eventstore/logs (Config File)
INDEX: /eventstore/indexes (Config File)
INT IP: 10.0.0.1 (Config File)
EXT IP: 10.0.0.1 (Config File)
ENABLE ATOM PUB OVER HTTP: true (Config File)
DISABLE INTERNAL TCP TLS: true (Config File)
TRUSTED ROOT CERTIFICATES PATH: /etc/eventstore/tls/ca/ (Config File)
CERTIFICATE FILE: /etc/eventstore/tls/node/thegood.pem (Config File)
CERTIFICATE PRIVATE KEY FILE: /etc/eventstore/tls/node/thegood.key (Config File)
DEFAULT OPTIONS:
CONFIG: /etc/eventstore/eventstore.conf (<DEFAULT>)
HELP: False (<DEFAULT>)
VERSION: False (<DEFAULT>)
WHAT IF: False (<DEFAULT>)
START STANDARD PROJECTIONS: False (<DEFAULT>)
DISABLE HTTP CACHING: False (<DEFAULT>)
HTTP PORT: 2113 (<DEFAULT>)
ENABLE EXTERNAL TCP: False (<DEFAULT>)
INT TCP PORT: 1112 (<DEFAULT>)
EXT TCP PORT: 1113 (<DEFAULT>)
EXT HOST ADVERTISE AS: <empty> (<DEFAULT>)
EXT TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
HTTP PORT ADVERTISE AS: 0 (<DEFAULT>)
INT HOST ADVERTISE AS: <empty> (<DEFAULT>)
INT TCP PORT ADVERTISE AS: 0 (<DEFAULT>)
INT TCP HEARTBEAT TIMEOUT: 700 (<DEFAULT>)
EXT TCP HEARTBEAT TIMEOUT: 1000 (<DEFAULT>)
INT TCP HEARTBEAT INTERVAL: 700 (<DEFAULT>)
EXT TCP HEARTBEAT INTERVAL: 2000 (<DEFAULT>)
GOSSIP ON SINGLE NODE: False (<DEFAULT>)
CONNECTION PENDING SEND BYTES THRESHOLD: 10485760 (<DEFAULT>)
CONNECTION QUEUE SIZE THRESHOLD: 50000 (<DEFAULT>)
NODE PRIORITY: 0 (<DEFAULT>)
MIN FLUSH DELAY MS: 2 (<DEFAULT>)
COMMIT COUNT: -1 (<DEFAULT>)
PREPARE COUNT: -1 (<DEFAULT>)
DISABLE ADMIN UI: False (<DEFAULT>)
DISABLE STATS ON HTTP: False (<DEFAULT>)
DISABLE GOSSIP ON HTTP: False (<DEFAULT>)
DISABLE SCAVENGE MERGING: False (<DEFAULT>)
SCAVENGE HISTORY MAX AGE: 30 (<DEFAULT>)
CLUSTER DNS: fake.dns (<DEFAULT>)
CLUSTER GOSSIP PORT: 30777 (<DEFAULT>)
STATS PERIOD SEC: 30 (<DEFAULT>)
CACHED CHUNKS: -1 (<DEFAULT>)
READER THREADS COUNT: 4 (<DEFAULT>)
CHUNKS CACHE SIZE: 536871424 (<DEFAULT>)
MAX MEM TABLE SIZE: 1000000 (<DEFAULT>)
HASH COLLISION READ LIMIT: 100 (<DEFAULT>)
MEM DB: False (<DEFAULT>)
SKIP DB VERIFY: False (<DEFAULT>)
WRITE THROUGH: False (<DEFAULT>)
UNBUFFERED: False (<DEFAULT>)
CHUNK INITIAL READER COUNT: 5 (<DEFAULT>)
PROJECTION THREADS: 3 (<DEFAULT>)
WORKER THREADS: 5 (<DEFAULT>)
PROJECTIONS QUERY EXPIRY: 5 (<DEFAULT>)
FAULT OUT OF ORDER PROJECTIONS: False (<DEFAULT>)
ENABLE TRUSTED AUTH: False (<DEFAULT>)
CERTIFICATE PASSWORD: <empty> (<DEFAULT>)
CERTIFICATE STORE LOCATION: <empty> (<DEFAULT>)
CERTIFICATE STORE NAME: <empty> (<DEFAULT>)
CERTIFICATE SUBJECT NAME: <empty> (<DEFAULT>)
CERTIFICATE THUMBPRINT: <empty> (<DEFAULT>)
DISABLE EXTERNAL TCP TLS: False (<DEFAULT>)
AUTHORIZATION TYPE: internal (<DEFAULT>)
AUTHENTICATION TYPE: internal (<DEFAULT>)
AUTHORIZATION CONFIG: <empty> (<DEFAULT>)
AUTHENTICATION CONFIG: <empty> (<DEFAULT>)
DISABLE FIRST LEVEL HTTP AUTHORIZATION: False (<DEFAULT>)
PREPARE TIMEOUT MS: 2000 (<DEFAULT>)
COMMIT TIMEOUT MS: 2000 (<DEFAULT>)
WRITE TIMEOUT MS: 2000 (<DEFAULT>)
UNSAFE DISABLE FLUSH TO DISK: False (<DEFAULT>)
UNSAFE IGNORE HARD DELETE: False (<DEFAULT>)
SKIP INDEX VERIFY: False (<DEFAULT>)
INDEX CACHE DEPTH: 16 (<DEFAULT>)
OPTIMIZE INDEX MERGE: False (<DEFAULT>)
GOSSIP INTERVAL MS: 2000 (<DEFAULT>)
GOSSIP ALLOWED DIFFERENCE MS: 60000 (<DEFAULT>)
GOSSIP TIMEOUT MS: 2500 (<DEFAULT>)
READ ONLY REPLICA: False (<DEFAULT>)
UNSAFE ALLOW SURPLUS NODES: False (<DEFAULT>)
ENABLE HISTOGRAMS: False (<DEFAULT>)
LOG HTTP REQUESTS: False (<DEFAULT>)
LOG FAILED AUTHENTICATION ATTEMPTS: False (<DEFAULT>)
ALWAYS KEEP SCAVENGED: False (<DEFAULT>)
SKIP INDEX SCAN ON READS: False (<DEFAULT>)
REDUCE FILE CACHE PRESSURE: False (<DEFAULT>)
INITIALIZATION THREADS: 1 (<DEFAULT>)
MAX AUTO MERGE INDEX LEVEL: 2147483647 (<DEFAULT>)
WRITE STATS TO DB: False (<DEFAULT>)
MAX TRUNCATION: 268435456 (<DEFAULT>)
MAX APPEND SIZE: 1048576 (<DEFAULT>)
DEV: False (<DEFAULT>)
DEAD MEMBER REMOVAL PERIOD SEC: 1800 (<DEFAULT>)
Startup log:
[31663, 1,17:17:43.011,INF]
INTERFACES
External TCP (Protobuf)
Enabled : False
Port : 1113
HTTP (AtomPub)
Enabled : True
Port : 2113
[31663, 1,17:17:43.011,WRN]
DEPRECATION WARNING: AtomPub over HTTP Interface has been deprecated as of version 20.02. It is recommended to use gRPC instead.
[31663, 1,17:17:43.031,INF] Quorum size set to 2
[31663, 1,17:17:43.119,INF] Trusted root certificate file loaded: "ca.pem"
[31663, 1,17:17:43.132,INF] Cannot find plugins path: "/usr/share/eventstore/plugins"
[31663, 1,17:17:43.329,DBG] MessageHierarchy initialization took 00:00:00.1505396.
[31663, 1,17:17:43.336,INF] "INSTANCE ID:" 2b22cac1-57ac-4c16-9eee-8e217887b8dd
[31663, 1,17:17:43.336,INF] "DATABASE:" "/eventstore/db"
[31663, 1,17:17:43.337,INF] "WRITER CHECKPOINT:" 0 (0x0)
[31663, 1,17:17:43.337,INF] "CHASER CHECKPOINT:" 0 (0x0)
[31663, 1,17:17:43.337,INF] "EPOCH CHECKPOINT:" -1 (0xFFFFFFFFFFFFFFFF)
[31663, 1,17:17:43.337,INF] "TRUNCATE CHECKPOINT:" -1 (0xFFFFFFFFFFFFFFFF)
[31663, 1,17:17:43.400,DBG] Could not create performance counter: category='"Processor"', counter='"% Processor Time"', instance='"_Total"'. Error: "Performance Counters are not supported on this platform."
[31663, 1,17:17:43.400,DBG] Could not create performance counter: category='"Memory"', counter='"Available Bytes"', instance='""'. Error: "Performance Counters are not supported on this platform."
[31663, 1,17:17:43.421,DBG] Opened ongoing "/eventstore/db/chunk-000000.000000" as version 3
[31663, 1,17:17:43.431,DBG] CACHED TFChunk "#0-0 (chunk-000000.000000)" in 00:00:00.0015253.
[31663, 1,17:17:43.579,INF] Starting MiniWeb for "/web/es/js/projections" ==> "/usr/share/eventstore/projections"
[31663, 1,17:17:43.579,INF] Starting MiniWeb for "/web/es/js/projections/v8/Prelude" ==> "/usr/share/eventstore/Prelude"
[31663, 1,17:17:43.593,INF] Starting MiniWeb for "/web" ==> "/usr/share/eventstore/clusternode-web"
[31663,13,17:17:43.667,INF] ========== ["10.0.0.1:2113"] SYSTEM INIT...
[31663,13,17:17:43.672,INF] Starting "Normal" TCP listening on TCP endpoint: "10.0.0.1:1112".
[31663,15,17:17:43.675,INF] TableIndex initialization...
[31663,15,17:17:43.681,INF] ReadIndex building...
[31663,15,17:17:43.682,DBG] ReadIndex rebuilding done: total processed 0 records, time elapsed: 00:00:00.0009569.
[31663,13,17:17:43.683,INF] ========== ["10.0.0.1:2113"] Service '"StorageReader"' initialized.
[31663,13,17:17:43.683,INF] ========== ["10.0.0.1:2113"] Service '"StorageWriter"' initialized.
[31663,13,17:17:43.692,INF] CLUSTER HAS CHANGED ""
Old:
["Priority: 0 VND {2b22cac1-57ac-4c16-9eee-8e217887b8dd} <LIVE> [Unknown, Unspecified/10.0.0.1:1112, n/a, n/a, n/a, Unspecified/10.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2020-09-22 15:17:43.681"]
New:
["MAN {00000000-0000-0000-0000-000000000000} <LIVE> [Manager, 10.0.0.2:2113] | 2020-09-22 15:17:43.685", "Priority: 0 VND {2b22cac1-57ac-4c16-9eee-8e217887b8dd} <LIVE> [Unknown, Unspecified/10.0.0.1:1112, n/a, n/a, n/a, Unspecified/10.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2020-09-22 15:17:43.681"]
[31663,13,17:17:43.697,INF] ========== ["10.0.0.1:2113"] Service '"StorageChaser"' initialized.
[31663,13,17:17:43.698,INF] ========== ["10.0.0.1:2113"] SYSTEM START...
[31663,13,17:17:43.702,INF] ========== ["10.0.0.1:2113"] IS ATTEMPTING TO DISCOVER EXISTING LEADER...
[31663, 4,17:17:43.704,DBG] Persistent subscriptions received state change to DiscoverLeader. Stopping listening
[31663,13,17:17:43.715,DBG] "NO LEADER" found during LEADER DISCOVERY stage, making further attempts.
**[31663,13,17:17:43.981,INF] Looks like node ["10.0.0.2:2113"] is DEAD (Gossip send failed).
[31663,13,17:17:43.982,INF] CLUSTER HAS CHANGED "gossip send failed to [10.0.0.2:2113]"**
Old:
["MAN {00000000-0000-0000-0000-000000000000} <LIVE> [Manager, 10.0.0.2:2113] | 2020-09-22 15:17:43.685", "Priority: 0 VND {2b22cac1-57ac-4c16-9eee-8e217887b8dd} <LIVE> [DiscoverLeader, Unspecified/10.0.0.1:1112, n/a, n/a, n/a, Unspecified/10.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2020-09-22 15:17:43.919"]
New:
["MAN {00000000-0000-0000-0000-000000000000} <DEAD> [Manager, 10.0.0.2:2113] | 2020-09-22 15:17:43.981", "Priority: 0 VND {2b22cac1-57ac-4c16-9eee-8e217887b8dd} <LIVE> [DiscoverLeader, Unspecified/10.0.0.1:1112, n/a, n/a, n/a, Unspecified/10.0.0.1:2113] -1/0/0/E-1@-1:{00000000-0000-0000-0000-000000000000} | 2020-09-22 15:17:43.919"]
[31663,13,17:17:43.982,DBG] "NO LEADER" found during LEADER DISCOVERY stage, making further attempts.
[31663,13,17:17:46.724,INF] LEADER DISCOVERY timed out. Proceeding to UNKNOWN state.
[31663,13,17:17:46.725,INF] ========== ["10.0.0.1:2113"] IS UNKNOWN...
[31663, 6,17:17:46.725,DBG] Persistent subscriptions received state change to Unknown. Stopping listening
[31663,13,17:17:46.727,INF] ELECTIONS: STARTING ELECTIONS.
[31663,13,17:17:46.727,INF] ELECTIONS: (V=0) SHIFT TO LEADER ELECTION.
[31663,13,17:17:46.728,INF] ELECTIONS: (V=0) VIEWCHANGE FROM ["Unspecified/10.0.0.1:2113", {2b22cac1-57ac-4c16-9eee-8e217887b8dd}].
[31663,13,17:17:47.733,INF] ELECTIONS: (V=0) TIMED OUT! (S=ElectingLeader, M=null).
[31663,13,17:17:47.733,INF] ELECTIONS: (V=1) SHIFT TO LEADER ELECTION.
[31663,13,17:17:47.733,INF] ELECTIONS: (V=1) VIEWCHANGE FROM ["Unspecified/10.0.0.1:2113", {2b22cac1-57ac-4c16-9eee-8e217887b8dd}].
[31663,13,17:17:48.741,INF] ELECTIONS: (V=1) TIMED OUT! (S=ElectingLeader, M=null).
[31663,13,17:17:48.741,INF] ELECTIONS: (V=2) SHIFT TO LEADER ELECTION.
[31663,13,17:17:48.741,INF] ELECTIONS: (V=2) VIEWCHANGE FROM ["Unspecified/10.0.0.1:2113", {2b22cac1-57ac-4c16-9eee-8e217887b8dd}].
If I query the gossip endpoint of node 1 (10.0.0.1), it reports node 2 (10.0.0.2) as Manager (but not alive) and itself as Unknown (but alive). The same is true in reverse if I query node 2: node 1 is reported as Manager and node 2 as Unknown.
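To make this check repeatable, here is a minimal Python sketch that summarizes a gossip response. The helper names `summarize_gossip` and `has_leader` are my own, the sample is trimmed from the response I get from node 1, and I am assuming that an elected leader would show up with the state string "Leader" in this version.

```python
import json

# Trimmed sample of the gossip response from node 1 (only the fields used below).
SAMPLE = """
{
  "members": [
    {"state": "Manager", "isAlive": false,
     "httpEndPointIp": "10.0.0.2", "httpEndPointPort": 2113},
    {"state": "Unknown", "isAlive": true,
     "httpEndPointIp": "10.0.0.1", "httpEndPointPort": 2113}
  ],
  "serverIp": "10.0.0.1",
  "serverPort": 2113
}
"""

def summarize_gossip(payload: str):
    """Return a (state, http address, is_alive) tuple for each gossip member."""
    doc = json.loads(payload)
    summary = []
    for member in doc.get("members", []):
        address = f'{member.get("httpEndPointIp")}:{member.get("httpEndPointPort")}'
        summary.append((member.get("state"), address, member.get("isAlive")))
    return summary

def has_leader(payload: str) -> bool:
    """True if any live member reports the (assumed) state string "Leader"."""
    return any(state == "Leader" and alive
               for state, _, alive in summarize_gossip(payload))

if __name__ == "__main__":
    for state, address, alive in summarize_gossip(SAMPLE):
        print(f"{address}: state={state} alive={alive}")
    print("leader present:", has_leader(SAMPLE))
```

The same functions can be fed the raw body returned by the curl call, as long as it is valid JSON.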
curl -k https://10.0.0.1:2113/gossip?format=json
{
"members": [
{
"instanceId": "00000000-0000-0000-0000-000000000000",
"timeStamp": "2020-09-23T08:46:58.7751141Z",
"state": "**Manager**",
"isAlive": false,
"internalTcpIp": "**10.0.0.2**",
"internalTcpPort": 2113,
"internalSecureTcpPort": 0,
"externalTcpIp": "10.0.0.2",
"externalTcpPort": 2113,
"externalSecureTcpPort": 0,
"httpEndPointIp": "10.0.0.2",
"httpEndPointPort": 2113,
"lastCommitPosition": -1,
"writerCheckpoint": -1,
"chaserCheckpoint": -1,
"epochPosition": -1,
"epochNumber": -1,
"epochId": "00000000-0000-0000-0000-000000000000",
"nodePriority": 0,
"isReadOnlyReplica": false
},
{
"instanceId": "9fe142e2-e2d3-4648-a119-135a736ea61f",
"timeStamp": "2020-09-23T08:47:52.7326297Z",
"state": "**Unknown**",
"isAlive": true,
"internalTcpIp": "**10.0.0.1**",
"internalTcpPort": 1112,
"internalSecureTcpPort": 0,
"externalTcpPort": 0,
"externalSecureTcpPort": 0,
"httpEndPointIp": "10.0.0.1",
"httpEndPointPort": 2113,
"lastCommitPosition": -1,
"writerCheckpoint": 0,
"chaserCheckpoint": 0,
"epochPosition": -1,
"epochNumber": -1,
"epochId": "00000000-0000-0000-0000-000000000000",
"nodePriority": 0,
"isReadOnlyReplica": false
}
],
"serverIp": "10.0.0.1",
"serverPort": 2113
}
I have checked my node certificates (which are signed by the CA in TRUSTED ROOT CERTIFICATES PATH) and the certificate chain is OK.
openssl s_client -connect 10.0.0.2:2113 -CAfile /etc/eventstore/tls/ca/ca.pem
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 C = FR, ST = Some-State, O = mycompany, OU = D2, CN = Eventstore
verify return:1
depth=0 C = FR, ST = Some-State, O = mycompany, CN = eventstoredb-node
verify return:1
---
Certificate chain
0 s:C = FR, ST = Some-State, O = mycompany, CN = eventstoredb-node
i:C = FR, ST = Some-State, O = mycompany, OU = D2, CN = Eventstore
Here is my certificate information:
##### **Common Name:** eventstoredb-node
##### **Subject Alternative Names:** IP Address:10.0.0.2
##### **Organization:** mycompany
##### **State:** Some-State
##### **Country:** FR
##### **Valid From:** September 22, 2020
##### **Valid To:** September 20, 2030
##### **Issuer:** Eventstore
##### **Serial Number:** 364582e0603c538f168fab301c4a58a75a36d573
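The listing above is for the certificate served by 10.0.0.2. I have not confirmed whether the SAN matters for my problem, but a check that each node's certificate lists that node's own IP can be sketched in Python as below. The helper names (`fetch_peer_cert`, `san_ips`, `covers_ip`) are hypothetical, and the sample dict only mimics the shape returned by `ssl.SSLSocket.getpeercert()`, filled with the values from the listing.

```python
import ipaddress
import socket
import ssl

def fetch_peer_cert(host: str, port: int, cafile: str) -> dict:
    """Connect over TLS, verify against the given CA file, return the decoded peer cert."""
    context = ssl.create_default_context(cafile=cafile)
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()

def san_ips(cert: dict) -> set:
    """Collect the IP entries from the certificate's subjectAltName field."""
    return {ipaddress.ip_address(value)
            for kind, value in cert.get("subjectAltName", ())
            if kind == "IP Address"}

def covers_ip(cert: dict, ip: str) -> bool:
    """True if the given IP appears among the certificate's IP SANs."""
    return ipaddress.ip_address(ip) in san_ips(cert)

# Sample in the getpeercert() shape, using the values from the listing above.
SAMPLE_CERT = {
    "subject": ((("commonName", "eventstoredb-node"),),),
    "subjectAltName": (("IP Address", "10.0.0.2"),),
}

if __name__ == "__main__":
    for ip in ("10.0.0.1", "10.0.0.2"):
        print(ip, "covered by SAN:", covers_ip(SAMPLE_CERT, ip))
```

Against the live nodes, `fetch_peer_cert("10.0.0.2", 2113, "/etc/eventstore/tls/ca/ca.pem")` would supply the dict instead of `SAMPLE_CERT`.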
I also tried the dev option, but it seems to have no effect: TLS is still required.
Thank you.