Hi there,
I have just got a 3-node 20.6.1 cluster working. The certificates do not have IP address SANs; instead I have configured the “Advertise” settings to point at DNS A records (cluster0,1,2 [group record ‘clusterdns’] and clustergossip0,1,2; we have a two-interface setup with internal gossip going via a separate interface). I want to document a migration path for our developers (typically all running 5.x client libraries), but I am struggling to find one. Ideally the migration is just a connection string change, since updating libraries and releasing on top of changing connection strings makes things much harder for all involved.
1. Is it true that not having IP SANs means DNS gossip discovery is effectively broken in 20.x (e.g. “ConnectTo:discover://clusterdns:httpport”)? Our current thinking is that certificates with IP SANs are generally discouraged, so we have avoided them here. The IPs a client gets back from the DNS discovery A records do not match any name in the certificates, so clients cannot read the gossip. (This isn’t just a client issue; I also had to disable DNS discovery in my cluster node config.)
2. Without working DNS gossip discovery, we can fall back to putting a load balancer (nginx in our case) in front of our clusters and using it as a fixed gossip seed (e.g. “GossipSeeds=https://cluster-loadbalancer:443;”). Ideally we would avoid adding an extra component to the critical path for ES connections/reconnections, but this does a similar job to DNS discovery and is “ok”.
3. Whilst I can get #2 working with the latest 20.6.1 .NET client, it doesn’t work with 5.0.9, and I’m not sure why. Perhaps the gossip JSON is different? If so, is the only way to connect a 5.0.9 client to a 20.6.1 server a direct tcp:// address (e.g. “ConnectTo:tcp://clusterdns:exttcpport”)? In that case the client will pick a random node regardless of leader or follower status, which isn’t workable.
4. If the answer to 3 is yes, does that mean we need to update all our clients to the 20.6.1 client package, roll that out, and only then update the clusters to 20.6.1? Does the 20.6.1 .NET client work fine with older v5 clusters? (I am about to test this myself.)
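To make the first question above concrete, here is a minimal Python sketch of the mismatch I think is happening. It assumes the certificate is available in the dict shape that Python's `ssl.SSLSocket.getpeercert()` returns; the `san_covers` helper and the `10.0.0.5` address are hypothetical, and wildcard names are deliberately not handled. It only illustrates the name check, not what the EventStore client or server does internally.

```python
import ipaddress


def san_covers(cert, target):
    """Return True if target (a hostname or an IP literal) is listed in the cert's SANs.

    cert is a dict shaped like the result of ssl.SSLSocket.getpeercert().
    Wildcard DNS entries are not handled in this sketch.
    """
    try:
        target_ip = ipaddress.ip_address(target)
    except ValueError:
        target_ip = None  # not an IP literal, treat it as a DNS name

    for kind, value in cert.get("subjectAltName", ()):
        if target_ip is not None and kind == "IP Address":
            if ipaddress.ip_address(value) == target_ip:
                return True
        elif target_ip is None and kind == "DNS":
            if value.lower() == target.lower():
                return True
    return False


# Hypothetical certificate with DNS SANs only, as in our setup.
cert = {
    "subjectAltName": (
        ("DNS", "cluster0"),
        ("DNS", "cluster1"),
        ("DNS", "cluster2"),
    )
}

print(san_covers(cert, "cluster0"))   # name the node advertises
print(san_covers(cert, "10.0.0.5"))   # hypothetical IP returned by the A record lookup
```

Connecting by advertised name passes this check, while connecting to a bare IP resolved from the group record does not, which is why I suspect discovery fails before any gossip is read.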
My gossip JSON looks something like this:
{
  "members": [
    {
      "instanceId": "f1281999-fc07-43ea-839a-d380bf80cc24",
      "timeStamp": "2020-10-05T21:02:11.3277271Z",
      "state": "Leader",
      "isAlive": true,
      "internalTcpIp": "test-uat-eventstoregossip2",
      "internalTcpPort": 0,
      "internalSecureTcpPort": 11112,
      "externalTcpIp": "test-uat-eventstore2",
      "externalTcpPort": 0,
      "externalSecureTcpPort": 11113,
      "httpEndPointIp": "test-uat-eventstore2",
      "httpEndPointPort": 12113,
      "lastCommitPosition": 12833,
      "writerCheckpoint": 12985,
      "chaserCheckpoint": 12985,
      "epochPosition": 12546,
      "epochNumber": 4,
      "epochId": "c35c4784-672b-4f75-a956-b40b706bae60",
      "nodePriority": 1,
      "isReadOnlyReplica": false
    },
    {
      "instanceId": "b3095846-6f21-49ed-9974-a0fe08cb7100",
      "timeStamp": "2020-10-05T21:02:11.2709331Z",
      "state": "Follower",
      "isAlive": true,
      "internalTcpIp": "test-uat-eventstoregossip1",
      "internalTcpPort": 0,
      "internalSecureTcpPort": 11112,
      "externalTcpIp": "test-uat-eventstore1",
      "externalTcpPort": 0,
      "externalSecureTcpPort": 11113,
      "httpEndPointIp": "test-uat-eventstore1",
      "httpEndPointPort": 12113,
      "lastCommitPosition": 12833,
      "writerCheckpoint": 12985,
      "chaserCheckpoint": 12985,
      "epochPosition": 12546,
      "epochNumber": 4,
      "epochId": "c35c4784-672b-4f75-a956-b40b706bae60",
      "nodePriority": 1,
      "isReadOnlyReplica": false
    },
    {
      "instanceId": "d7b5ae1f-a59b-4b5f-8af4-472939bbf8b3",
      "timeStamp": "2020-10-05T21:02:11.332983Z",
      "state": "Follower",
      "isAlive": true,
      "internalTcpIp": "test-uat-eventstoregossip0",
      "internalTcpPort": 0,
      "internalSecureTcpPort": 11112,
      "externalTcpIp": "test-uat-eventstore0",
      "externalTcpPort": 0,
      "externalSecureTcpPort": 11113,
      "httpEndPointIp": "test-uat-eventstore0",
      "httpEndPointPort": 12113,
      "lastCommitPosition": 12833,
      "writerCheckpoint": 12985,
      "chaserCheckpoint": 12985,
      "epochPosition": 12546,
      "epochNumber": 4,
      "epochId": "c35c4784-672b-4f75-a956-b40b706bae60",
      "nodePriority": 0,
      "isReadOnlyReplica": false
    }
  ],
  "serverIp": "test-uat-eventstore0",
  "serverPort": 12113
}
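For anyone comparing gossip formats, this is roughly what I would expect a client to do with that JSON in order to find the leader. It is a minimal Python sketch, not the actual client logic: the `leader_endpoint` helper is hypothetical, the sample is trimmed to the fields it reads, and falling back to `externalTcpPort` when the secure port is 0 is my own assumption.

```python
import json


def leader_endpoint(gossip):
    """Return (host, port) of the first live Leader in a gossip document, or None."""
    for member in gossip.get("members", []):
        if member.get("isAlive") and member.get("state") == "Leader":
            # Assumption: prefer the secure TCP port, fall back to the plain
            # one if the secure port is 0 (unset).
            port = member.get("externalSecureTcpPort") or member.get("externalTcpPort")
            return member["externalTcpIp"], port
    return None


# Trimmed copy of the gossip JSON above (only the fields used here).
sample = json.loads("""
{
  "members": [
    {"state": "Leader", "isAlive": true,
     "externalTcpIp": "test-uat-eventstore2",
     "externalTcpPort": 0, "externalSecureTcpPort": 11113},
    {"state": "Follower", "isAlive": true,
     "externalTcpIp": "test-uat-eventstore1",
     "externalTcpPort": 0, "externalSecureTcpPort": 11113},
    {"state": "Follower", "isAlive": true,
     "externalTcpIp": "test-uat-eventstore0",
     "externalTcpPort": 0, "externalSecureTcpPort": 11113}
  ]
}
""")

print(leader_endpoint(sample))
```

Note that `externalTcpPort` is 0 on every node here because only the secure port is enabled. A client that reads only the plain port would get nothing usable, which may or may not be related to what I am seeing with 5.0.9.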