Can Not Get Cluster Up

Valentin_Kasas · February 26, 2015, 4:24pm

On Thu, Feb 26, 2015 at 16:56, James Nugent wrote:

Valentin_Kasas · February 26, 2015, 5:17pm

As I said, this is quite dirty for the moment.

So far I use the following two recipes:

install: downloads and unzips the Event Store binaries, creates folders for the database and logs, and installs the daemonize package. Nothing fancy here.

configure_cluster: this was the “tricky” part (mainly due to my total lack of experience with Chef, OpsWorks, and even Ruby). Since DNS discovery wasn’t an option, I had to build the gossip seed explicitly, which means each node must know the IPs of (some of) the other nodes in the cluster. To achieve that, I put this recipe in the “Configure” step of the OpsWorks lifecycle (this step runs every time a machine comes up or goes down anywhere in the stack). The recipe creates a simple script (from the startup_node.sh.erb template) that starts the node with all the required configuration options, but leaves the gossip seed as an environment variable. Once enough machines in the layer are ready, the recipe builds the gossip seed from the IPs of the other machines and launches the startup script. Guards ensure that later “Configure” events don’t trigger a needless restart of a node that is already running.
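The logic of the configure_cluster step (wait for enough machines, build the seed from the other nodes' IPs, never restart a running node) can be sketched outside Chef. This is a minimal Python sketch, not the actual recipe, under these assumptions: the seed uses Event Store's `--gossip-seed` format of comma-separated `ip:port` entries, the port is the default internal HTTP port 2112, and `GOSSIP_SEED` and `CLUSTER_SIZE` are hypothetical names chosen for illustration.

```python
import os

INT_HTTP_PORT = 2112  # assumption: default internal HTTP port used for gossip
CLUSTER_SIZE = 3      # hypothetical: number of nodes expected in the layer


def build_gossip_seed(own_ip, layer_ips, port=INT_HTTP_PORT):
    """Comma-separated ip:port list of every *other* node in the layer."""
    peers = sorted(ip for ip in set(layer_ips) if ip != own_ip)
    return ",".join(f"{ip}:{port}" for ip in peers)


def should_start(own_ip, layer_ips, node_running, cluster_size=CLUSTER_SIZE):
    """Guard evaluated on every "Configure" event."""
    if node_running:
        return False  # never restart a node that is already up
    # Only start once enough machines (including this one) are online.
    return len(set(layer_ips) | {own_ip}) >= cluster_size


def startup_env(own_ip, layer_ips):
    """Environment for the startup script; GOSSIP_SEED is a hypothetical
    variable the script would expand into --gossip-seed."""
    env = dict(os.environ)
    env["GOSSIP_SEED"] = build_gossip_seed(own_ip, layer_ips)
    return env


if __name__ == "__main__":
    ips = ["10.0.0.12", "10.0.0.11", "10.0.0.13"]
    print(should_start("10.0.0.11", ips, node_running=False))  # True
    print(build_gossip_seed("10.0.0.11", ips))  # 10.0.0.12:2112,10.0.0.13:2112
```

Sorting the peers keeps the generated seed stable across "Configure" events, so the output only changes when the set of machines actually changes.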

This is a rather early draft: there are still many hardcoded bits that need to be cleaned up, restarts after failure are not handled yet, etc. I’ll publish a full cookbook repository when I have something more suitable for production use.

Cheers

On Thu, Feb 26, 2015 at 17:24, Valentin Kasas wrote:

Phil_Bolduc · March 1, 2015, 8:45am

Don't forget to make sure all of your resources (storage accounts, cloud services, VMs) are in the same affinity group. Affinity groups ensure your resources are provisioned close to each other in the data center.

You also want to use Availability Sets so that Azure does not put all of your VMs in the same rack or on the same host, where a single point of failure (a top-of-rack router or a single host) could take them all down.
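Rack placement matters because an Event Store cluster needs a majority of its nodes (cluster_size // 2 + 1) to elect a master and commit writes. The sketch below, assuming a hypothetical mapping of node names to fault domains (racks), checks whether losing any single rack still leaves a majority. It does not model how Azure actually assigns VMs within an availability set.

```python
def majority(cluster_size):
    """Smallest number of nodes that forms a quorum."""
    return cluster_size // 2 + 1


def survives_single_domain_loss(placement):
    """placement: hypothetical dict of node name -> fault domain (rack)."""
    size = len(placement)
    for domain in set(placement.values()):
        # Nodes left standing if this whole rack goes down.
        remaining = sum(1 for d in placement.values() if d != domain)
        if remaining < majority(size):
            return False
    return True


if __name__ == "__main__":
    print(survives_single_domain_loss({"n1": 0, "n2": 0, "n3": 0}))  # False
    print(survives_single_domain_loss({"n1": 0, "n2": 1, "n3": 2}))  # True
    print(survives_single_domain_loss({"n1": 0, "n2": 1, "n3": 0}))  # False
```

The third case shows that spreading nodes is not enough on its own: with three nodes over only two racks, losing the rack that holds two of them still breaks quorum.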