I keep hearing how bad Azure is for clustering ES. If the instances were all on an Azure VNET, and the only API endpoints exposed to the outside world were other instances that are members of that VNET, then that should work, or am I missing something?
Before I go setting up a test cluster for development purposes I was hoping to save some headaches by asking the group.
It’s not bad for clustering ES specifically, it’s bad for clustering any quorum system, including e.g. ZooKeeper, Consul etc. The reason for this is that it only has two fault domains (availability zones in Amazon-speak). Consequently, if the wrong side goes down you will find yourself with only a minority of nodes available.
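To make the quorum point concrete, here is a minimal sketch of the arithmetic. The function names and node layouts are made up for illustration and are not part of any ES or Azure API; it only assumes the cluster needs a strict majority of nodes to stay available.

```python
def keeps_quorum(nodes_per_domain, failed_domain):
    """True if the nodes outside the failed fault domain are still a strict majority."""
    total = sum(nodes_per_domain)
    majority = total // 2 + 1
    survivors = total - nodes_per_domain[failed_domain]
    return survivors >= majority


def survives_any_single_failure(nodes_per_domain):
    """True if losing any one whole fault domain still leaves a majority."""
    return all(
        keeps_quorum(nodes_per_domain, d) for d in range(len(nodes_per_domain))
    )


# Hypothetical 3-node clusters: one spread over two fault domains, one over three.
two_domains = [2, 1]
three_domains = [1, 1, 1]

print(survives_any_single_failure(two_domains))    # False
print(survives_any_single_failure(three_domains))  # True
```

With two fault domains, one side always holds at least half of the nodes, so losing that side costs you the majority no matter how you place them.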
Within a virtual network things are just IP-addressable, so there are no issues beyond this. External access is a mess because of the DNS and HTTP prefix requirements.
I have a stream of events coming in from a game, somewhere in the range of one per second per user, in 20 to 30 minute intervals per game session. Right now those events are going into MongoDB through a web service, then they are read off asynchronously and the results are ultimately stored in SQL. What I am looking to demonstrate is that I can replace everything related to the game events after the web service with ES. I want to prove that ES projections can handle the load and offer a superior scaling solution to the current method.
I believe that with ES it will also be easier to switch out other pieces of our stack, like SQL Server, if we properly apply projections for handling the stats generated from the events.
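For sizing a test, the numbers above work out roughly as follows. This is only a back-of-the-envelope sketch: the concurrent user count is a made-up placeholder, not a figure from the game, and the function names are hypothetical.

```python
def events_per_session(events_per_second, session_minutes):
    """Events one user generates over a single game session."""
    return events_per_second * session_minutes * 60


def cluster_write_rate(concurrent_users, events_per_second=1):
    """Aggregate events per second arriving at the web service."""
    return concurrent_users * events_per_second


low = events_per_session(1, 20)   # 20-minute session
high = events_per_session(1, 30)  # 30-minute session
print(low, high)  # 1200 1800

# ASSUMPTION: 5000 concurrent users is a placeholder, substitute the real peak.
print(cluster_write_rate(5000))  # 5000
```

The per-session count gives a feel for stream length, while the aggregate rate is the number a load test against the projections would need to sustain.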
Hey, I was researching a different issue and just happened upon these links that I thought might help you out. It’s all a bit over my head (being new to Azure), but it appears there might be some headway made in forcing a 3 FD cluster. In this link a guy (?from Microsoft?) mentions that they just released this feature within the last few weeks:
The second link in his comment, which is supposed to point to an example of a 3 FD setup, doesn’t work anymore. However, I was able to find the document using Google’s cached page:
I’ve never used ARM or its templates, and a commenter on the second page above was having issues getting it working. But I thought I would send it your way in case you could make heads or tails of the additional information.
Until disk latency is sorted out, any kind of solution that wouldn’t involve Azure I guess.
On what kind of setup do you have the most customers/best experiences?
It’s possible to get reasonable performance out of Azure, but the tail latencies always seem to be a mess. Since they’ve finally made three fault domains available you can try running on instance storage and things are mostly OK.
But as usual with Azure, the optimal route to success is to just use AWS.
I have been reading the free ebook Microsoft Azure Essentials - Fundamentals of Azure. It references a white paper that was originally written for SQL Server, but the author of the ebook says that the disk configuration discussed in it should be followed for any system needing excellent performance. Are there any pitfalls in following its advice?
Not sure on the specifics, but apparently the best throughput can be obtained with DS-type instances and a ton of disks striped together. Personally I’d just use the instance store now that there are three availability zones (well, not really AZs, but fault domains - not quite as good, but I guess we have to take what we can get when it comes to Azure).