Very cool! I just went through the same exercise myself, except on Azure AKS rather than EKS. It was neat to be able to double-check what I had come up with against what you had done (though it probably would have been better if I had seen yours first!). We independently ended up with almost identical configurations, so I suppose that’s a good sign. FWIW, if anyone would find a Helm chart that deploys these services helpful, let me know and I’ll publish it somewhere.
One thing I’m curious about is liveness and readiness checks. Does anyone have good ideas about whether to set either kind of check for the ES database nodes, and if so, what they should check? For liveness, is there a particular executable or HTTP endpoint that could act as a good canary in the coal mine, signaling that a node is actually dead enough to need a restart? What about readiness? I’m less certain readiness matters in the same way for ES, since ES may already handle unavailable nodes appropriately at the application layer.
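To make the question a bit more concrete, here is a rough Python sketch of the kind of exec-probe script I have in mind. It is only a guess, not a tested recommendation. The script name `es_probe.py`, the assumption that ES listens on plain HTTP at localhost:9200 with no authentication, the assumption that python3 exists in the container image, and the rule that "yellow counts as ready" are all my own. The liveness check only verifies that `GET /` answers; the readiness check asks the local node for `/_cluster/health?local=true` so the answer doesn't depend on reaching the master.

```python
"""Hypothetical es_probe.py: exec-probe sketch for an Elasticsearch pod.

Assumptions (not verified best practice): plain HTTP on localhost:9200,
no authentication, and python3 available inside the container image.
"""
import json
import sys
import urllib.error
import urllib.request

# Assumed address of the ES node inside the pod; adjust for TLS/auth setups.
ES_URL = "http://localhost:9200"


def node_responds(timeout: float = 3.0) -> bool:
    """Liveness idea: the node's HTTP layer answers GET / with 200 at all."""
    try:
        with urllib.request.urlopen(ES_URL + "/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error.
        return False


def health_is_acceptable(health: dict) -> bool:
    """Readiness rule (my assumption): green or yellow is ready, red is not."""
    return health.get("status") in ("green", "yellow")


def node_ready(timeout: float = 3.0) -> bool:
    """Readiness idea: ask the local node (local=true) for cluster health."""
    url = ES_URL + "/_cluster/health?local=true"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return health_is_acceptable(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # ValueError covers a response body that isn't valid JSON.
        return False


def main(argv: list) -> int:
    """Return a probe exit code: 0 = pass, 1 = fail, 2 = bad usage."""
    if len(argv) != 1 or argv[0] not in ("liveness", "readiness"):
        print("usage: es_probe.py liveness|readiness", file=sys.stderr)
        return 2
    ok = node_responds() if argv[0] == "liveness" else node_ready()
    return 0 if ok else 1


# Run as a probe only when a mode argument is given,
# e.g. `python3 es_probe.py readiness`.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

The idea would be to point the liveness and readiness checks at `python3 es_probe.py liveness` and `python3 es_probe.py readiness` respectively. One concern I can’t resolve myself: because cluster health is a cluster-wide status, a red cluster would mark every pod unready at once, which may be exactly the wrong behavior.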
Any suggestions are welcome.
Thank you!