UPDATE: This issue was resolved by adding a new DNS entry that contains all the cluster IP addresses. Please see the reply to this post for the solution and the new issue.
I’m aware that I lack a lot of networking knowledge, and despite Greg’s clarification on the ways to discover nodes in a cluster (https://github.com/EventStore/EventStore/issues/1878), here goes my question.
I am unable to run an ESDB cluster on EC2 instances in AWS.
The nodes start, but they log this exception:
{"@t":"2021-04-29T11:42:18.1281960Z","@mt":"Error while retrieving cluster members through DNS.","@l":"Error","@x":"System.Net.Internals.SocketExceptionFactory+ExtendedSocketException (00000005, 0xFFFDFFFF): Name or service not known\n at System.Net.Dns.GetHostEntryOrAddressesCore(String hostName, Boolean justAddresses)\n at System.Net.Dns.<>c.<GetHostEntryOrAddressesCoreAsync>b__27_2(Object s)\n at System.Threading.Tasks.Task`1.InnerInvoke()\n at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.TaskToApm.End[TResult](IAsyncResult asyncResult)\n at EventStore.Core.Services.Gossip.DnsGossipSeedSource.EndGetHostEndpoints(IAsyncResult asyncResult) in /home/runner/work/TrainStation/TrainStation/build/oss-eventstore/src/EventStore.Core/Services/Gossip/DnsGossipSeedSource.cs:line 20\n at EventStore.Core.Services.Gossip.GossipServiceBase.OnGotGossipSeedSources(IAsyncResult ar) in /home/runner/work/TrainStation/TrainStation/build/oss-eventstore/src/EventStore.Core/Services/Gossip/GossipServiceBase.cs:line 109","SourceContext":"EventStore.Core.Services.Gossip.GossipServiceBase","ProcessId":4260,"ThreadId":4}
I suspect this is caused by my DNS setup.
Node A: 10.0.10.188
Node B: 10.0.20.25
Node C: 10.0.30.95
I’ve created a private hosted zone called saswesdb.io in AWS Route 53, associated with my VPC, with these entries:
esdb-a.saswesdb.io  A    Simple  -  10.0.10.188
esdb-b.saswesdb.io  A    Simple  -  10.0.20.25
esdb-c.saswesdb.io  A    Simple  -  10.0.30.95
saswesdb.io         NS   Simple  -  ns-1536.awsdns-00.co.uk.
                                    ns-0.awsdns-00.com.
                                    ns-1024.awsdns-00.org.
                                    ns-512.awsdns-00.net.
saswesdb.io         SOA  Simple  -  ns-1536.awsdns-00.co.uk. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
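The stack trace shows the failure inside DnsGossipSeedSource, in a System.Net.Dns host lookup, so presumably each node looks up the ClusterDns name itself and expects addresses back. A minimal sketch of that idea, modelling the zone as a plain Python dict: ZONE, a_records and with_cluster_record are hypothetical names used only for illustration (they are not part of ESDB or Route 53), and placing the combined record on the zone name itself is an assumption.

```python
# Records copied from the zone listing in this post.
ZONE = {
    ("esdb-a.saswesdb.io", "A"): ["10.0.10.188"],
    ("esdb-b.saswesdb.io", "A"): ["10.0.20.25"],
    ("esdb-c.saswesdb.io", "A"): ["10.0.30.95"],
    ("saswesdb.io", "NS"): [
        "ns-1536.awsdns-00.co.uk.",
        "ns-0.awsdns-00.com.",
        "ns-1024.awsdns-00.org.",
        "ns-512.awsdns-00.net.",
    ],
    ("saswesdb.io", "SOA"): [
        "ns-1536.awsdns-00.co.uk. awsdns-hostmaster.amazon.com. "
        "1 7200 900 1209600 86400"
    ],
}


def a_records(zone, name):
    """Return the A record values stored for name, or [] if there are none."""
    return zone.get((name, "A"), [])


def with_cluster_record(zone, cluster_dns):
    """Return a copy of zone with one A record for cluster_dns that lists
    the IPs of every other A record (assumed here to be the cluster nodes)."""
    node_ips = [
        ip
        for (name, rtype), values in zone.items()
        if rtype == "A" and name != cluster_dns
        for ip in values
    ]
    updated = dict(zone)
    updated[(cluster_dns, "A")] = node_ips
    return updated
```

With the zone as listed, a_records(ZONE, "saswesdb.io") is empty, because that name only carries NS and SOA records. That would be consistent with "Name or service not known", and with the resolution mentioned in the update at the top.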
My configuration is:
# Cluster
ClusterSize: 3
ClusterDns: saswesdb.io
DiscoverViaDns: true
# Paths
Db: "/home/ubuntu/my-data"
Log: "/home/ubuntu/my-logs"
Index: "/home/ubuntu/my-index"
# Security
Insecure: true
# Network
IntIp: 10.0.10.188 # first node; each of the other nodes uses its own IP
ExtIp: 10.0.10.188 # first node; each of the other nodes uses its own IP
HttpPort: 2113
IntTcpPort: 1113
ExtTcpPort: 1113
EnableExternalTcp: false
EnableAtomPubOverHttp: true
# Projections
RunProjections: None
From each EC2 instance I can ping the others. I can also ping them by DNS name:
ping esdb-b.saswesdb.io
PING esdb-b.saswesdb.io (10.0.20.25) 56(84) bytes of data.
64 bytes from ip-10-0-20-25.eu-west-1.compute.internal (10.0.20.25): icmp_seq=1 ttl=64 time=0.628 ms
64 bytes from ip-10-0-20-25.eu-west-1.compute.internal (10.0.20.25): icmp_seq=2 ttl=64 time=0.636 ms
64 bytes from ip-10-0-20-25.eu-west-1.compute.internal (10.0.20.25): icmp_seq=3 ttl=64 time=0.654 ms
64 bytes from ip-10-0-20-25.eu-west-1.compute.internal (10.0.20.25): icmp_seq=4 ttl=64 time=0.662 ms
I cannot ping saswesdb.io itself, though. I'm not sure whether I should be able to.
Am I missing anything related to the DNS?
I must admit I don’t know how to troubleshoot this.
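One way to narrow this down is to check, from one of the instances, which names resolve to IPv4 addresses at all. The following is a minimal sketch using Python's standard socket.getaddrinfo. The helper names resolve_ipv4 and report are hypothetical, and it only tells you what the operating system's resolver returns, which may not match exactly what ESDB does internally.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses for hostname.

    Returns [] when the lookup fails (socket.gaierror), which is the Python
    counterpart of the "Name or service not known" error in the log.
    The resolver argument exists only so the lookup can be swapped out.
    """
    try:
        infos = resolver(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def report(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to the list of IPv4 addresses it resolves to."""
    return {name: resolve_ipv4(name, resolver) for name in hostnames}


# Example call, to be run on one of the EC2 instances:
# print(report(["esdb-a.saswesdb.io", "esdb-b.saswesdb.io",
#               "esdb-c.saswesdb.io", "saswesdb.io"]))
```

If the per-node names return addresses but the name used for ClusterDns returns an empty list, the lookup of that name is what fails, not the connectivity between the nodes.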
PS: The EC2 instances have a NACL that allows all traffic and a security group that allows all TCP traffic from anywhere (i.e. 0.0.0.0/0).
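Since ping only exercises ICMP, it does not confirm that the TCP ports from the configuration are reachable between nodes. A minimal sketch of a TCP check with Python's standard library follows. The helper name port_open is hypothetical, and the check only shows whether a connection can be opened, not whether ESDB gossip itself works.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened
    within timeout seconds, False on refusal, timeout, or lookup failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, unreachable hosts and
        # name resolution errors, which are all OSError subclasses.
        return False


# Example calls, to be run from one node against another,
# using the HttpPort and IntTcpPort values from the configuration:
# print(port_open("10.0.20.25", 2113))
# print(port_open("10.0.20.25", 1113))
```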