Unable to write to cluster when DNS routes to a slave node

I have a set of three cluster nodes running 3.0.3 on linux (Ubuntu 14.04) on AWS.

I start them with:

/usr/lib/event-store/clusternode --config config.yml --http-prefixes=http://eventstore.staging.my.local:2113/


$ cat /usr/lib/event-store/config.yml

Error forwarding requests generally means that the nodes can’t talk to each other using the combination of addressing and accepted HTTP prefixes. Can you try this using * for the hostname instead of eventstore.staging.my.local too?
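For concreteness, the wildcard variant of the start command from the question might look like the sketch below. The `http_prefix` helper is hypothetical and exists only for illustration; the binary path, config file, and port are copied from the original command. The sketch prints the command rather than running it.

```shell
# Hypothetical helper: build an --http-prefixes value from a host and a port.
http_prefix() {
  printf 'http://%s:%s/\n' "$1" "$2"
}

# Use the wildcard host instead of eventstore.staging.my.local.
prefix=$(http_prefix '*' 2113)

# Print the resulting start command (not executed here).
echo "/usr/lib/event-store/clusternode --config config.yml --http-prefixes=$prefix"
```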

This HTTP prefixes stuff is a Windowsism which is crossing into the Unix world and provides a lot of pain for no benefit (to my mind).

James

I’ve explored using * for the hostname as well (in fact that’s what I generally use), but the result was exactly the same.

I think this is answered in another post (maybe on the list).

I believe this is the forwarding bug I fixed; the fix is in dev already.

Can you run dev and verify it does not have this same issue?

No, I was never able to get dev built (I’ve never used any Microsoft family development tools so struggled for too many hours to get mono set up right for the actual mono packaging). So I just sized the cluster down to one node and was waiting for the next release.

What environment are you in? Setting up mono is pretty easy for most
(brew install/installer for Mac, apt-get for Ubuntu, etc.)

Ubuntu. I could build ES fine - well, I thought so, since the exit code of the ./build.sh full process was 0 - but I guess I got burned on the next step (package-mono) by the same assumption.

When I ran the run-tests.sh script, it seems that most of the tests pass and most of the log output is not obviously erroneous, so SOMETHING built. I just am not sure what.

But for the package-mono, I got some hints about what was missing, but then couldn’t figure out what to set my PKG_CONFIG_PATH to. Like I said, total ignorance of mono.

And today looking at it again, it looks like I have mono 3.2.8, and it is broken:


$ /usr/bin/mono-test-install

Active Mono: /usr/bin/mono

Failed to compile sample System.Drawing program, your installation is broken

So I don’t know. Just a mess, I think.
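A quick way to see which mono is active, and whether it is older than some minimum, is to parse the first line of `mono --version`. This is only a sketch: the function names are hypothetical, and the `MIN_MONO` value is a placeholder assumption, not a documented Event Store requirement. It relies on GNU `sort -V` for version ordering.

```shell
# Extract the version number from the first line of `mono --version` output (stdin).
parse_mono_version() {
  sed -n '1s/^Mono JIT compiler version \([0-9][0-9.]*\).*/\1/p'
}

# Succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholder minimum; replace with whatever your Event Store build needs.
MIN_MONO=${MIN_MONO:-3.12.1}

if command -v mono >/dev/null 2>&1; then
  installed=$(mono --version | parse_mono_version)
  if version_at_least "$installed" "$MIN_MONO"; then
    echo "mono $installed is at least $MIN_MONO"
  else
    echo "mono $installed is older than $MIN_MONO"
  fi
else
  echo "mono not found on PATH"
fi
```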

You don't need to package mono; you can run the node after it's built.

from root

mono ./bin/clusternode/EventStore.ClusterNode.exe

if you want to use projections you also need to export LD_LIBRARY_PATH
to include the above folder.
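Put together, that could look like the sketch below. It assumes "the above folder" means `bin/clusternode` under the source root, and `ES_ROOT` and `with_lib_path` are hypothetical names used only for illustration.

```shell
# Print an LD_LIBRARY_PATH value with directory $1 prepended to the current one.
with_lib_path() {
  if [ -n "${LD_LIBRARY_PATH:-}" ]; then
    printf '%s:%s\n' "$1" "$LD_LIBRARY_PATH"
  else
    printf '%s\n' "$1"
  fi
}

# Assumed location of the built source tree; defaults to the current directory.
ES_ROOT=${ES_ROOT:-$PWD}
node_dir="$ES_ROOT/bin/clusternode"

if [ -f "$node_dir/EventStore.ClusterNode.exe" ] && command -v mono >/dev/null 2>&1; then
  # Needed for projections, per the note above.
  LD_LIBRARY_PATH=$(with_lib_path "$node_dir")
  export LD_LIBRARY_PATH
  mono "$node_dir/EventStore.ClusterNode.exe"
else
  echo "mono or $node_dir/EventStore.ClusterNode.exe not found; nothing started"
fi
```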

Cheers,

Greg

Right. I don’t have my build system in a cluster, and I don’t have build tools on my cluster machines, so it would not be trivial to do that experiment.

I’m optimistic that you got it fixed and the next release will be out before I need to work on clustering again.

I don’t have my build system in a cluster

You don't need them in a cluster.

But this was being caused by AWS networking/DNS resolution, I thought. That would suggest that I would need to exercise that part of the system in order to confirm whether or not this is resolved. I’ve never seen this happen on my vagrant VM on my laptop that I use as a build environment.

But you don't need a cluster to build… build on your laptop, use the binaries in the cluster.

I can’t figure out how to get the packaging to work, so there is not a well-formed binary to push out…

./bin/clusternode: just install mono regularly.

No joy - I copied the whole 300 MB (zipped) built EventStore directory to the AWS node and it won’t run. Not sure why, and I don’t really think that adding a bunch of build environment stuff is a good use of my time on the AWS nodes that are set up to only pull down fully-formed packages and run them. Especially since I’d have to do the discovery/work in triplicate (these boxes don’t even have git - I have put a lot of work into automation to allow me to separate the “factory” from the “car”). Note, the mono clusternode/EventStore.ClusterNode.exe almost works on my vagrant VM (it can’t bind the port, but it otherwise comes up).

litch@ip-10-0-0-97:~/eventstore-bin$ mono clusternode

The assembly mscorlib.dll was not found or could not be loaded.

It should have been installed in the `/usr/lib/mono/2.0/mscorlib.dll' directory.

litch@ip-10-0-0-97:~/eventstore-bin$ ./bin/clusternode

-bash: ./bin/clusternode: No such file or directory

litch@ip-10-0-0-97:~/eventstore-bin$ mono clusternode/EventStore.ClusterNode.exe

Missing method .ctor in assembly /home/litch/eventstore-bin/clusternode/EventStore.Core.dll, type System.Runtime.CompilerServices.ExtensionAttribute

Can’t find custom attr constructor image: /home/litch/eventstore-bin/clusternode/EventStore.Core.dll mtoken: 0x0a000050

* Assertion at class.c:5597, condition `!mono_loader_get_last_error ()' not met

Stacktrace:

at <0xffffffff>

at EventStore.ClusterNode.Program..ctor () <0x00038>

at EventStore.ClusterNode.Program.Main (string[]) <0x0002e>

at (wrapper runtime-invoke) .runtime_invoke_int_object (object,intptr,intptr,intptr) <0xffffffff>

Native stacktrace:

mono() [0x4b73d8]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f08924c9340]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f089212acc9]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f089212e0d8]

mono() [0x638575]

mono() [0x6386b6]

mono() [0x5298e6]

mono(mono_class_get_full+0xe2) [0x52a0c2]

mono(mono_class_from_name+0xfe) [0x52a4ae]

mono(mono_class_from_typeref+0x190) [0x529d00]

mono() [0x54c44d]

mono() [0x54c54d]

mono(mono_get_method_full+0x9d) [0x54ca7d]

mono() [0x464b9c]

mono() [0x421e2e]

mono() [0x4253cd]

mono() [0x425ecb]

mono() [0x4b845b]

[0x418dc166]

Debug info from gdb:

Looks like mono isn't really installed or is a corrupt install.

Right, I had just done a sudo apt-get install mono-runtime on that AWS instance. It doesn’t seem to have worked very well.

IIRC that also installs a very old version, like 2.6?

Alright, I uploaded the 300-meg half-built binary to each AWS node, added some Xamarin apt repositories to my sources, installed mono-complete (since mono-runtime can’t actually start ES but the error message is not clear), and started bringing up my cluster. I had this happen on two nodes in the course of starting it (when I brought the third node online), which was weird:

[29815,26,20:07:39.950] CLUSTER HAS CHANGED (TCP connection lost to [10.0.0.112:1112])

Old:

VND {e55df42b-2688-4d2a-8a24-dfaffe9bb901} [Master, 10.0.0.97:1112, n/a, 10.0.0.97:1113, n/a, 10.0.0.97:2112, 10.0.0.97:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.779

VND {58c60bc6-7cba-45e2-a28c-f8813f5cd524} [ShuttingDown, 10.0.0.38:1112, n/a, 10.0.0.38:1113, n/a, 10.0.0.38:2112, 10.0.0.38:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.950

VND {48560ff5-60d0-441c-aa1b-12a931115de8} [Master, 10.0.0.112:1112, n/a, 10.0.0.112:1113, n/a, 10.0.0.112:2112, 10.0.0.112:2113] 1098063316/1098079880/1098079651/E1372@729963254:{9db43a27-ce24-42a2-99c0-cdb966e2d8af} | 2015-04-22 20:07:39.788

New:

VND {e55df42b-2688-4d2a-8a24-dfaffe9bb901} [Master, 10.0.0.97:1112, n/a, 10.0.0.97:1113, n/a, 10.0.0.97:2112, 10.0.0.97:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.779

VND {58c60bc6-7cba-45e2-a28c-f8813f5cd524} [ShuttingDown, 10.0.0.38:1112, n/a, 10.0.0.38:1113, n/a, 10.0.0.38:2112, 10.0.0.38:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.950

VND {48560ff5-60d0-441c-aa1b-12a931115de8} [Master, 10.0.0.112:1112, n/a, 10.0.0.112:1113, n/a, 10.0.0.112:2112, 10.0.0.112:2113] 1098063316/1098079880/1098079651/E1372@729963254:{9db43a27-ce24-42a2-99c0-cdb966e2d8af} | 2015-04-22 20:07:39.951
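The dump above is easier to read once it is reduced to one state and endpoint per node. The sketch below does that with `sed`; the `summarize_gossip` name is hypothetical, and it assumes each member line has the `VND {id} [State, endpoint, ...]` shape shown in the log. The sample input is the "New:" block copied from above.

```shell
# Read log lines on stdin; print "<state> <first endpoint>" for every VND line.
summarize_gossip() {
  sed -n 's/^VND {[^}]*} \[\([A-Za-z]*\), \([0-9.]*:[0-9]*\),.*/\1 \2/p'
}

summarize_gossip <<'EOF'
New:
VND {e55df42b-2688-4d2a-8a24-dfaffe9bb901} [Master, 10.0.0.97:1112, n/a, 10.0.0.97:1113, n/a, 10.0.0.97:2112, 10.0.0.97:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.779
VND {58c60bc6-7cba-45e2-a28c-f8813f5cd524} [ShuttingDown, 10.0.0.38:1112, n/a, 10.0.0.38:1113, n/a, 10.0.0.38:2112, 10.0.0.38:2113] 737777847/737778396/737778396/E1370@737677222:{5edc6cc8-fd0b-4f3a-aa83-b63d2abc10ef} | 2015-04-22 20:07:39.950
VND {48560ff5-60d0-441c-aa1b-12a931115de8} [Master, 10.0.0.112:1112, n/a, 10.0.0.112:1113, n/a, 10.0.0.112:2112, 10.0.0.112:2113] 1098063316/1098079880/1098079651/E1372@729963254:{9db43a27-ce24-42a2-99c0-cdb966e2d8af} | 2015-04-22 20:07:39.951
EOF
```

Reduced this way, the view shows two nodes reporting Master (10.0.0.97 and 10.0.0.112) and one ShuttingDown (10.0.0.38), which is the oddity described above.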