Self Healing Cluster in AWS

Hi Community,

I’m trying to implement a self-healing cluster in AWS. So far I can get the cluster running by starting all the nodes with the following config:

./run-node.sh --db /opt/eventstore/ESData --log /tmp/cluster/log --cluster-size=3 --int-ip=0.0.0.0 --ext-ip=0.0.0.0 --int-tcp-port=1111 --ext-tcp-port=1112 --int-http-port=2113 --ext-http-port=2114 --cluster-dns=cluster.somedomain.com --cluster-gossip-port=2113

For this to work, I’ve manually created an A record for cluster.somedomain.com with the IPs of the nodes. I need some advice on how to update the A record in Route53 when one instance terminates and another takes its place.

I’ve tried using the --gossip-seed option with a DNS name instead of an IP, but it doesn’t seem to like it, as it’s expecting an IP address.

Thanks,
Syed

cluster-dns and discover-via-dns are what you need, not gossip-seed.

–ab

Thanks Andy,

I am using the cluster-dns option. The problem with that is I need to manually update the Route53 A record with the IPs of the nodes. I am looking for a self-healing solution.

Is there an option in the config to use the DNS names of the individual nodes instead of the IPs, for example:

--gossip-seed=node-a.something.com:2113,node-b.something:2113

You want to use the cluster-dns option and then have the node register
itself when it starts up. Note that these DNS entries are only used
for initial discovery of the cluster; clients and the nodes
themselves use a gossip protocol to maintain the actual cluster layout.
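
As a rough illustration of a node registering itself at startup, here is a minimal Python sketch. It assumes boto3 is installed and that the instance has permission to read and change the hosted zone. The function names, the 60-second TTL, and the way stale IPs are supplied are all hypothetical and not something from this thread. It rewrites the whole multi-value A record, so two nodes starting at the same moment could overwrite each other's change.

```python
"""Sketch: a starting node puts its own IP into the shared cluster A record."""


def updated_ips(current_ips, my_ip, stale_ips=()):
    """Return the sorted IP list for the record: drop stale IPs, add this node's IP."""
    keep = set(current_ips) - set(stale_ips)
    keep.add(my_ip)
    return sorted(keep)


def build_change_batch(record_name, ips, ttl=60):
    """Build a Route 53 ChangeBatch that UPSERTs one multi-value A record."""
    return {
        "Comment": "cluster node self-registration",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,  # assumption: a short TTL so changes are seen quickly
                    "ResourceRecords": [{"Value": ip} for ip in ips],
                },
            }
        ],
    }


def register_self(hosted_zone_id, record_name, my_ip, stale_ips=()):
    """Read the current A record, rewrite it with this node's IP, and push it."""
    import boto3  # imported lazily so the pure helpers above work without it

    route53 = boto3.client("route53")
    resp = route53.list_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        StartRecordName=record_name,
        StartRecordType="A",
        MaxItems="1",
    )
    current = []
    for rrset in resp["ResourceRecordSets"]:
        # Route 53 returns names with a trailing dot, so compare without it.
        same_name = rrset["Name"].rstrip(".") == record_name.rstrip(".")
        if same_name and rrset["Type"] == "A":
            current = [r["Value"] for r in rrset.get("ResourceRecords", [])]
    ips = updated_ips(current, my_ip, stale_ips)
    route53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=build_change_batch(record_name, ips),
    )
    return ips
```

A startup script would call something like register_self(zone_id, "cluster.somedomain.com", my_ip, stale_ips=[old_ip]) before launching run-node.sh. How the node learns the terminated instance's old IP is left open here.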

Thanks Greg,

To give you some background on what I’ve achieved and what I’m trying:

I have 3 nodes running in 3 AZs in AWS, each in its own ASG. This way, when a node goes down, its replacement node in the same AZ comes up with the same attached EBS volume and the same hostname/DNS. Each node therefore always has the same state, apart from the IP, which changes when a node terminates and another replaces it.

So, when an instance goes down, how would I be able to add the new node (with its new IP) back into the cluster automatically?

Thanks
Syed

Hi Greg,

Any ideas on how I’d be able to use DNS names for individual nodes instead of IPs for the gossip?

Thanks,
Syed

DNS is meant to be used to contain *all* the nodes; this is how
discovery works. You can also use IPs as gossip seeds. There is no
option to use DNS names as gossip seeds, nor can I understand why someone
would want to, as this situation is already covered by DNS-based discovery.

OK, thanks. Is there a way to implement self-healing for my situation then?

Yes, you can associate more than one IP with a DNS entry, which is how
DNS-based gossip works.

I think I didn’t make my question clear.

When an instance goes down, how would I be able to add the new node that replaces the terminated node (with its new IP) back into the cluster automatically?

This would be part of your node configuration; you can add/remove IPs
to a DNS entry dynamically. You can also do such things as
https://aws.amazon.com/blogs/compute/building-a-dynamic-dns-for-route-53-using-cloudwatch-events-and-lambda/
Googling for Route53 with autoscaling groups will give you some
answers. For commercial customers, one thing we provide is Terraform
scripts which set this up.

This is not a good idea. You want three instances in one autoscaling group, and to deal with replacement by introspecting the ASG API in a wrapping startup script rather than by using DNS.
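
A minimal sketch of what such a wrapper could compute, in Python and assuming boto3 is available on the instance. The function names are hypothetical. Leaving the node's own IP out of the seed list is an assumption, and so is the gossip port default of 2113, which is taken from the command earlier in the thread. Whether DNS discovery has to be switched off when seeds are supplied is not covered here.

```python
"""Sketch: derive a --gossip-seed value from the current members of one ASG."""


def gossip_seed_arg(peer_ips, my_ip, gossip_port=2113):
    """Format the other nodes' IPs as a --gossip-seed value (ip:port, comma separated)."""
    # Assumption: a node should not list itself as a seed.
    peers = sorted(ip for ip in set(peer_ips) if ip != my_ip)
    return "--gossip-seed=" + ",".join(f"{ip}:{gossip_port}" for ip in peers)


def asg_private_ips(asg_name):
    """Look up the private IPs of the in-service instances of one autoscaling group."""
    import boto3  # imported lazily so gossip_seed_arg works without it

    autoscaling = boto3.client("autoscaling")
    ec2 = boto3.client("ec2")
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    instance_ids = [
        inst["InstanceId"]
        for group in groups["AutoScalingGroups"]
        for inst in group["Instances"]
        if inst["LifecycleState"] == "InService"
    ]
    if not instance_ids:
        return []
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return [
        inst["PrivateIpAddress"]
        for reservation in reservations
        for inst in reservation["Instances"]
        if "PrivateIpAddress" in inst
    ]
```

The wrapper would then append gossip_seed_arg(asg_private_ips(name), my_ip) to the run-node.sh command line in place of --cluster-dns.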

Thanks Greg, I was going down the route of using a Lambda to update the A record in Route53 until you mentioned earlier:

'Note that these DNS entries are only used
for initial discovery of the cluster, clients and the nodes
themselves use a gossip protocol to maintain actual cluster layout'

Pardon me if I have misunderstood, but my understanding is that DNS is used only to get the IPs of the nodes initially, and that afterwards all gossip happens using the IP addresses. If that’s true, then even if the new node comes up and a Lambda updates the Route53 entry, the other 2 nodes would still try to reach the terminated node at its old IP, unless the service is restarted so that all nodes pick up the new IPs from Route53.

Am I correct in this?

@James, I’m extremely sorry, I didn’t get your point.

S

Nodes expire from gossip as well.