.Net client subscription/connection issues

Hope someone can help here. So we’re writing a tool to replicate streams across event store clusters. Getting it working with the .Net client was pretty straight forward for the happy case, however if the source cluster drops out we want our application to recover automatically and continue. For some reason I’m struggling to get it working, I’ve tried a couple different things but currently I’m basically just reconnecting whenever the connection is closed and resubbing whenever a subscription is dropped.

I’ve attached the class that manages our connections and also the output from a unit test I’ve written around this (which includes the verbose event store connection logs as well as logs from our code so you can see when nodes are killed/restarted).

The section that jumps out at me from the logs is around line 359:

[26,14:50:39.661,DEBUG] Subscription {ca96555a-a082-4b13-badd-c05ce55b057f} to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 0, test @ 875/875).

[26,14:50:39.661,DEBUG] EventStoreConnection ‘ES-c7c3e629-85a7-440a-a4e2-1a1ad36c44a2’: HandleTcpPackage SUBSCRIPTION DECISION DoNothing (StreamEventAppeared), Subscription VolatileSubscriptionOperation (ca96555a-a082-4b13-badd-c05ce55b057f): EventStore.ClientAPI.ClientOperations.VolatileSubscriptionOperation, is subscribed: True, retry count: 0, created: 14:50:37.836, last updated: 14:50:37.836.

[26,14:50:39.661,DEBUG] EventStoreConnection ‘ES-c7c3e629-85a7-440a-a4e2-1a1ad36c44a2’: HandleTcpPackage connId 748e5a8c-2469-4a93-a6f9-35f93592533e, package StreamEventAppeared, ca96555a-a082-4b13-badd-c05ce55b057f…

[26,14:50:39.661,DEBUG] Subscription {ca96555a-a082-4b13-badd-c05ce55b057f} to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 1, test @ 1060/1060).

[26,14:50:39.661,DEBUG] EventStoreConnection ‘ES-c7c3e629-85a7-440a-a4e2-1a1ad36c44a2’: HandleTcpPackage SUBSCRIPTION DECISION DoNothing (StreamEventAppeared), Subscription VolatileSubscriptionOperation (ca96555a-a082-4b13-badd-c05ce55b057f): EventStore.ClientAPI.ClientOperations.VolatileSubscriptionOperation, is subscribed: True, retry count: 0, created: 14:50:37.836, last updated: 14:50:37.836.

[26,14:50:39.661,DEBUG] EventStoreConnection ‘ES-c7c3e629-85a7-440a-a4e2-1a1ad36c44a2’: HandleTcpPackage connId 748e5a8c-2469-4a93-a6f9-35f93592533e, package StreamEventAppeared, ca96555a-a082-4b13-badd-c05ce55b057f…

[26,14:50:39.661,DEBUG] Subscription {ca96555a-a082-4b13-badd-c05ce55b057f} to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 2, test @ 1245/1245).

So are these the new events I’m posting? If so then they are seen but they get ignored and don’t trigger the StreamEventAppeared event?

Any help would be appreciated.

test output.txt (80.9 KB)

EventStoreConnectionManager.cs (10.6 KB)

Hi

Are you seeing any idempotent writes in your server logs?

The reason I ask is because your logs show the 10 new events that you write having event numbers 0 to 9 :

(lines 387 to 404)

[04,14:50:39.662,DEBUG] Catch-up Subscription to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 0, test @ 875/875).

[04,14:50:39.662,DEBUG] Catch-up Subscription to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 1, test @ 1060/1060).

[04,14:50:39.662,DEBUG] Catch-up Subscription to 288984f0-1f52-4482-a575-d11242cc2014: event appeared (288984f0-1f52-4482-a575-d11242cc2014, 9, test @ 2540/2540).

When you resubscribe, you subscribe from the last processed event number, which in this case is 9. It will ignore any event with an event number lower than that.

This would result in your subscription ignoring the new writes because it believes it has already processed them

Ok, so I just reran the test and grepped the server logs for idempotent writes. No luck. I also checked the stream in the source through the dashboard and it all looked fine (20 events as expected, see attached image 1). However when I’ve rerun the test again (a lot of times) I’m now only getting 10 events, from 0 to 9, on all 3 nodes. On the latest run two of the nodes have the 2nd set of events I posted and the other node has the first set of events I posted. All the events have unique guids when posted, I’m wondering if this is an issue with us killing the master too quickly, and then changing the master after the cluster restarts?

I’ve also uploaded the server logs for this latest run, if that helps. The source cluster is running on nodes 29114, 30114, 31114 and the target of replication is one node on 33114.

Thanks for your help!

Image 1 - All 20 events on cluster.PNG

image 2 - first 10 events on one node.PNG

image 3 - second 10 events on other 2 nodes.PNG

127.0.0.1-29114-cluster-node-stats.csv (97.6 KB)

127.0.0.1-30114-cluster-node.log (83.9 KB)

127.0.0.1-30114-cluster-node-stats.csv (97.2 KB)

127.0.0.1-31114-cluster-node.log (74.5 KB)

127.0.0.1-31114-cluster-node-stats.csv (98.3 KB)

127.0.0.1-33114-cluster-node.log (16.4 KB)

127.0.0.1-33114-cluster-node-stats.csv (84.3 KB)

127.0.0.1-29114-cluster-node.log (78.8 KB)

Ok got to the bottom of it, was pretty much as you said - the new events were replacing the old ones. Turned out to be a configuration issue with the database causing the original events to be lost. I’ve resolved my issue, and also made sure we now use ExpectedVersion when posting to catch this if it happens again.

Thanks!