I have three VMs (Windows) each in a different region in Azure. (East US, West US, Central US)
They are running 3.0.1 of Event Store.
The regions are connected by a VNET to VNET VPN connections.
On the Central US node TestClient’s WRFL command has been running against the cluster in a loop every 60 second.
After having an issue with the connection between East US and West US (VPN gateway disconnected and wouldn’t reconnect), the cluster landed in this state:
According to East US (/gossip)
-
East US is Master
-
Central US is Slave
-
West US isn’t on the list
According to Central US (/gossip)
-
East US is Master
-
Central US is Slave
-
West US isn’t on the list
According to West US (/gossip)
-
West US is Slave
-
East US isn’t on the list
-
Central US isn’t on the list
West US’s logs show that it is in a loop trying to elect a leader…
I am able to verify that all servers can communicate (at least enough to get a /gossip.json response)
It looks like East US was restarted during the network segment (and while the cluster was split).
I have collected the log/config files as well. (3x ~15MB zip files)
I have collected the log file of the looped run of WRFL.
I have verified that the DNS entry shows all three nodes from all three of the boxes.
The cluster is still in this state. I can leave it as is for a while. It is a dev/test environment.
Any ideas on what is going on?
Thanks, Ryan