Is there a possibility of running GES on shared storage (standard Windows Fail-over cluster) instead of the paxos cluster? Maybe by running it embedded into a Windows service? (Custom services can be configured to run as highly available in Windows clustering – I’d just be concerned about how to transition GES over in case of failure).
We have pre-existing shared storage investment and Windows clustering knowledge, a small deployment (current legacy app is just a 2-node app cluster), and a small IT department. We’d rather not add another dimension of cluster admin, but retain our availability.
You won’t retain availability anyway since Windows failover clustering takes ages to actually work - you would have significant downtime during a failover while the new active node verifies the database checksums and checkpoints, rebuilds the mem-table index etc.
There’s no support for this in Event Store right now and we never anticipate adding it unless a commercial customer wants to pay for the necessary work - we don’t especially like adding Windows specific stuff as none of us use Windows for the most part.
Ages as in what? seconds? minutes? The current cluster takes something like 30 seconds to fail-over (including SQL server). The way I look at Windows clustering is that it’s basically a manual-failover that’s reasonably automated for me. It’s certainly not perfect, but it does the fail-over for me and it’s much better instrumented than something custom using scripts.
Your latter sentence kind of informs the experience I’ve had so far with trying to get Event Store set up in Windows for development purposes. It’s not as though I can just wrap it in a run script like in Linux. Kind of ironic that you don’t like adding Windows-specific stuff since it’s written in .NET. Seems a bit like a man without a country, considering that Mono runs a subset of .NET. (Maybe that will change with Roslyn.) Anyway, something for me to think about.
I believe James was alluding to the manual part and how long the manual part goes until it fails over…
“It’s not as though I can just wrap it in a run script like in Linux.”
Overall I’m really confused, since wrapping ES in a service to do this is totally trivial? Aside from understanding all their APIs, it should be relatively simple to support something like this.
We aren’t adding this particular Windows-specific stuff because:
- no one has ever asked for it aside from you
- it’s worse than what we already support
- it exposes us support-wise to a huge API from Microsoft
- no one is paying for it and we have paid work to do
I agree, it’s feature-wise worse than the 3-node paxos, but I asked because it’s where our investment and knowledge already sits. No is fine and understandable. My time is bounded just like yours, and I don’t think writing this code (and the research required) fits into our budget either. I was picking on James a little with the “we don’t use Windows” comment.
Despite being written in .NET, Event Store (along with anything else primarily doing non-direct file IO) runs substantially better under Mono in Linux than .NET in Windows on equivalent hardware, mostly due to the abysmal file cache handling in Windows. If you aren’t using HFS on MacOS the same applies there too. There’s a reason SQL Server implements their own filesystem etc, and I strongly suspect it has everything to do with performance… I’d expect to see even further improvements when the Microsoft GC (which is admittedly better than the Mono one) is integrated following the OSS .NET announcements.
Getting it running in Windows is straightforward though, especially for development - download the zip file, unzip and run EventStore.ClusterNode.exe --mem-db - or am I missing something different you’re trying to do here (perhaps building from source, which again can be difficult on Windows because of the general derp surrounding Visual C++)?
If you want it to run as a Windows service you can either wrap it using something like nssm, srvany or the like (I believe someone even wrote a wrapper using Topshelf and open sourced it), or buy the commercial tooling which does that natively and is supported.
I finally got around to installing GES as a Windows service. I had previously tried it while wrongly assuming that it had service hooks and trying to install it with installutil (first Google results). Anyway, it was easy with NSSM. I made a post about it in case I forget.
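For anyone following along, here is a rough sketch of what the NSSM route might look like, written as a small Python helper that only builds the command lines rather than running them. The service name, install path and the `--db` argument in the example are assumptions for illustration, not something taken from this thread; adjust them to your own layout. The `nssm install`, `nssm set ... AppDirectory` and `nssm start` subcommands are the ones I understand NSSM to provide.

```python
from typing import List


def nssm_install_commands(
    service: str, exe_path: str, args: List[str], work_dir: str
) -> List[List[str]]:
    """Build (but do not run) the nssm command lines that register a service.

    All values passed in are illustrative; nothing here talks to Windows.
    """
    return [
        # Register the executable and its arguments under the service name.
        ["nssm", "install", service, exe_path, *args],
        # Set the working directory the process starts in.
        ["nssm", "set", service, "AppDirectory", work_dir],
        # Start the newly registered service.
        ["nssm", "start", service],
    ]


if __name__ == "__main__":
    # Hypothetical service name, paths and arguments.
    commands = nssm_install_commands(
        service="EventStore",
        exe_path=r"C:\EventStore\EventStore.ClusterNode.exe",
        args=["--db", r"C:\EventStore\db"],
        work_dir=r"C:\EventStore",
    )
    for cmd in commands:
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review them before pasting them into an elevated prompt on the target machine.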
We have our existing shared storage that will still be used for other purposes, but after doing some reconfiguration, we will be able to eliminate it from the application servers. We’re going to give Windows NLB with priority a shot for failover amongst the command services on the nodes. At least until we can figure out how to hook into Event Store’s current elected node or request forwarding or whatever. For now we just want to make sure only 1 service receives all commands (until/if we need to partition) and that it fails over.
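The "only 1 service receives all commands, with failover" idea can be sketched as a simple priority pick among healthy nodes. This is only an illustration of the selection rule, not how NLB is implemented; the `Node` type, the health flag and the convention that a lower number means higher priority are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    name: str
    priority: int  # assumption: lower number = preferred host
    healthy: bool


def pick_active(nodes: Iterable[Node]) -> Optional[Node]:
    """Return the single node that should receive all commands.

    The preferred healthy node wins; if it goes unhealthy, the next
    one in priority order takes over. Returns None if nothing is healthy.
    """
    candidates = [n for n in nodes if n.healthy]
    return min(candidates, key=lambda n: n.priority, default=None)


if __name__ == "__main__":
    cluster = [Node("app1", 1, True), Node("app2", 2, True)]
    print(pick_active(cluster))
    # Simulate app1 failing: app2 should now be selected.
    cluster = [Node("app1", 1, False), Node("app2", 2, True)]
    print(pick_active(cluster))
```

Note that this only decides where commands go; it says nothing about which Event Store node is currently elected master, which is the part still to be figured out.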
The shared storage case really wants some consideration - on Tuesday when Azure storage just disappeared many people ended up with clusters of identically corrupt databases since it appeared that page blobs were truncated to page boundaries after write acknowledgement.
Also, the resilience of using NSSM (I’ve seen this before) is less than ideal, especially under error conditions.
Yeesh, that hurts. I hope everybody had their backups going.
Yeah. When I started NSSM on the build machine, ES was crashing and NSSM was restarting it over and over. I only knew because my tests failed and I checked the logs (which were rapidly growing). It was because I copied the ES directory (including the db) to the build server without thinking that it was still running (chaser file error). I haven’t investigated the options on NSSM’s error handling, but I might need to just let the EXE’s errors in fact stop the service.
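On letting the EXE's errors stop the service: as far as I understand NSSM, its `AppExit` setting controls what happens when the wrapped process exits, with `Restart`, `Ignore`, `Exit` and `Suicide` as the available actions. A minimal sketch of building that command follows; the service name is hypothetical, and you should check the NSSM documentation for your version before relying on it.

```python
from typing import List

# Exit actions as I understand NSSM to define them.
VALID_ACTIONS = {"Restart", "Ignore", "Exit", "Suicide"}


def nssm_exit_policy_command(service: str, action: str = "Exit") -> List[str]:
    """Build (but do not run) the nssm command that sets the default exit action.

    With "Exit", a crash of the wrapped process should stop the service
    instead of triggering an endless restart loop.
    """
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown NSSM exit action: {action!r}")
    return ["nssm", "set", service, "AppExit", "Default", action]


if __name__ == "__main__":
    # "EventStore" is a hypothetical service name.
    print(" ".join(nssm_exit_policy_command("EventStore")))
```

With the default action set to `Exit`, a failure such as the chaser file error above should leave the service stopped and visible as such, rather than filling the logs with restart attempts.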