So in this case you cannot connect a brand-new client either, correct?
Yep. Once the subscription I’m creating gets into the “subscription never got confirmation from server” state, it stays stuck there forever. The client recovers just fine for event write/read operations.
I guess creating a new client (for each subscription) instead of reusing the same one would solve it, but it feels like that shouldn’t be necessary.
I don't understand what you mean by "creating a new client".
What I meant was: if it's not responding... there is nothing special
about a "reconnection" vs. a "connection". So I would guess that
making a new subscription fails at this point as well?
And the only way to get it unstuck from the “subscription never got confirmation from server” state is to kill the entire cluster (to trigger the cluster discovery process in the client) and bring it back up.
What I mean is, let's say you have a command-line tool that does the
subscription. You run one instance, kill the cluster, bring it back... then it's
reconnecting... then it's in this state. What happens if you run
another instance (that does a new subscription) at this point?
I meant to say "a new connection".
Yep, launching a new process that does the subscription connects just fine.
Can you send a full verbose client log?
Here’s the client log. Again, the subscription doesn’t get into this state every time. As a workaround, if I use a brand-new connection to re-create the subscription once it gets dropped, instead of reusing the same connection, the subscription connects every time as expected.
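For illustration, here is a minimal sketch of that workaround pattern: when a subscription is dropped, close the old connection and re-create the subscription on a fresh one rather than reusing it. The thread doesn't show the actual client API, so `FakeConnection`, `subscribe`, `drop`, and `ResubscribingClient` are all hypothetical stand-ins. The fake connection's `stuck` flag only mimics the "never got confirmation" state; it is not how the real client behaves internally.

```python
from typing import Callable, Optional


class FakeConnection:
    """Hypothetical stand-in for a client connection (not a real client API).

    When `stuck` is True, new subscriptions are never confirmed, mimicking the
    "subscription never got confirmation from server" state.
    """

    def __init__(self) -> None:
        self.stuck = False
        self.closed = False
        self.on_dropped: Optional[Callable[[str], None]] = None

    def subscribe(self, stream: str, on_dropped: Callable[[str], None]) -> bool:
        # Returns True when the (simulated) server confirms the subscription.
        self.on_dropped = on_dropped
        return not self.closed and not self.stuck

    def drop(self, reason: str) -> None:
        # Simulates the subscription being dropped (e.g. the cluster went away).
        if self.on_dropped is not None:
            self.on_dropped(reason)

    def close(self) -> None:
        self.closed = True


class ResubscribingClient:
    """Re-creates a dropped subscription on a brand-new connection."""

    def __init__(self, connect: Callable[[], FakeConnection], stream: str) -> None:
        self._connect = connect  # hypothetical factory returning a new connection
        self._stream = stream
        self.connection: Optional[FakeConnection] = None
        self.confirmed = False

    def start(self) -> None:
        self.connection = self._connect()
        self.confirmed = self.connection.subscribe(self._stream, self._on_dropped)

    def _on_dropped(self, reason: str) -> None:
        # Workaround: discard the old connection instead of resubscribing on it.
        if self.connection is not None:
            self.connection.close()
        self.start()
```

Usage: pass a factory such as `lambda: FakeConnection()` to `ResubscribingClient`, call `start()`, and any later drop closes the old connection and subscribes again on a new one. The trade-off is an extra connection setup per drop, which is why it reads as a workaround rather than a fix.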
log.txt (46.8 KB)
When you shut down the cluster and bring it back up, is it the same db?
Also, can you add a print in your subscriptiondropped handler? I'm curious whether
it's being called multiple times.
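One way to do that print is to wrap the drop callback so every invocation is logged with a running count, which makes duplicate calls easy to spot in the client log. This is a minimal sketch: `DropCounter` is a hypothetical helper, the wrapped callback is assumed to take a single `reason` string, and the lock is a precaution on the assumption that the client may invoke the callback from more than one thread.

```python
import logging
import threading
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("subscription")


class DropCounter:
    """Wraps a hypothetical subscription-dropped callback and counts its calls."""

    def __init__(self, inner: Callable[[str], None]) -> None:
        self._inner = inner
        self._lock = threading.Lock()
        self.count = 0

    def __call__(self, reason: str) -> None:
        with self._lock:
            self.count += 1
            n = self.count
        log.info("subscription dropped (call #%d): %s", n, reason)
        if n > 1:
            # More than one drop per subscription may mean duplicate resubscribes.
            log.warning("drop handler invoked %d times", n)
        self._inner(reason)
```

Usage: register `DropCounter(original_handler)` wherever the original drop handler was passed, then check the log for a `call #2` line after the cluster comes back.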
Yep, the db is exactly the same.
Also, are there any associated server logs? After the cluster comes back,
are you connecting to the master or to a slave?
Here are the logs for the 3 nodes and the client. After the cluster comes back, it fails when it tries to connect to a slave node.
127.0.0.1-1114-cluster-node.log (174 KB)
127.0.0.1-2114-cluster-node.log (157 KB)
127.0.0.1-1114-cluster-node.log (174 KB)
client.log (79.8 KB)
So RequireMaster: true fixes it?
Sorry, I uploaded the log for node 1114 twice. Here’s the log for node 3114:
127.0.0.1-3114-cluster-node.log (185 KB)
It doesn’t work every time. What is guaranteed to work is using a brand-new connection when re-creating the subscription that was dropped.
OK, so this is a ClientAPI issue, not a server issue. We will work on a
reproduction and get back to you. Can you open a ticket for this?