What is the correct way to disconnect from a persistent subscription? And does the Event Store .NET API attempt to automatically connect to the persistent subscription under any circumstances?
After long periods of inactivity (greater than 24 hours), we see multiple open connections to the subscription group. I see some subscriptions with around 100 connections!
To disconnect from a persistent subscription you need to call Stop() on the subscription. And Event Store should not automatically connect to a persistent subscription.
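For example, a minimal sketch against the 4.x .NET ClientAPI (the stream and group names here are placeholders, not anything from your setup):

    using System;
    using EventStore.ClientAPI;

    class StopExample
    {
        static void Main()
        {
            var conn = EventStoreConnection.Create(new Uri("tcp://admin:changeit@localhost:1113"));
            conn.ConnectAsync().Wait();

            // "my-stream" / "my-group" are placeholders for your stream and subscription group.
            var subscription = conn.ConnectToPersistentSubscription(
                "my-stream", "my-group",
                eventAppeared: (sub, evt) => { /* process evt; Ack manually if autoAck is off */ },
                subscriptionDropped: (sub, reason, ex) => Console.WriteLine($"Dropped: {reason}"),
                autoAck: true);

            // When shutting down, stop the subscription before closing the connection.
            subscription.Stop(TimeSpan.FromSeconds(5));
            conn.Close();
        }
    }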
How are you measuring the number of connections to the subscription group?
And how are your clients connecting to your persistent subscription?
Also, is it possible to disconnect all connections to a particular subscriber group? We have 100 connections for one client, and stopping the client only reduces the count by 1. What causes that? The only way we have found to get rid of the other 99 is to restart Event Store.
I am looking at the connections column on the Competing Consumers page.
I did not check the source of the connections today, but when it occurred previously they all came from the machine where the client was running (we had no other client running on that machine). The client did not receive any events until we restarted it. I also suspect this is related to virtual machines, since the subscription groups connecting from the base machine do not have this issue.
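For reference, the same counts can also be pulled from the HTTP API rather than the UI; a rough sketch, assuming the default port and credentials and placeholder stream/group names:

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    class CheckConnections
    {
        static async Task Main()
        {
            using var client = new HttpClient { BaseAddress = new Uri("http://localhost:2113") };
            var creds = Convert.ToBase64String(Encoding.ASCII.GetBytes("admin:changeit"));
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", creds);

            // GET /subscriptions/{stream}/{group}/info returns the stats shown on the
            // Competing Consumers page, including the current connections.
            var json = await client.GetStringAsync("/subscriptions/my-stream/my-group/info");
            Console.WriteLine(json);
        }
    }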
The issue occurred again and I grabbed some screenshots
CC1.PNG - shows the main connections page. There is only one connection.
CC2.PNG - shows the Competing Consumers page. We connect to each subscriber group from a separate machine. As you can see, the connection counts are all over the place. They should all be 1, yet some show 100, which corresponds to the maximum subscribers setting.
CC100.txt - shows the connection info for a subscriber group that shows 100 on the Competing Consumers page. I have replaced the actual IP with 0.0.0.0 (I can send it through email if needed).
The IP in CC100.txt is different from the IP in CC1.PNG, so some subscriber group connections are not being cleaned up when the connection drops.
Thoughts on why this is happening and how to avoid it? As I said before, I have a feeling this is limited to VMs.
Hey, I was able to replicate the issue with the attached test service. Steps to reproduce: install the service and run it over the weekend. More than 24 hours of inactivity results in orphan connections showing up, and event delivery has issues. The only solution we have found so far is to restart Event Store every Monday.
The service has been running since the 14th. The problem should have occurred on the evening of the 15th or on the 16th. I checked on the 15th (around noon) and there were no orphan connections.
We are running a single instance of Event Store 4.0 (EventStore.ClusterNode.exe --db ./db --log ./logs --ExtIp 0.0.0.0). We are also using the open-source version (we are in the process of acquiring a commercial license).
I will run the test with Stop() and see what happens.
Based on our test program's logs, the subscriptions dropped and reconnected around 15 times between the 13th and 17th of July, and the connection dropped a similar number of times. The connection drops may be related to heartbeats timing out; we used to have frequent connection drops, and Greg recommended we increase the heartbeat timeout.
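For reference, these are the client-side settings involved; the values below are illustrative, not our actual configuration:

    using System;
    using EventStore.ClientAPI;

    static class ConnectionFactory
    {
        // Illustrative values only; tune for your environment.
        public static IEventStoreConnection Create()
        {
            var settings = ConnectionSettings.Create()
                .SetHeartbeatInterval(TimeSpan.FromSeconds(5))   // how often the client pings the server
                .SetHeartbeatTimeout(TimeSpan.FromSeconds(10))   // how long to wait for a response before dropping
                .KeepReconnecting();                             // re-establish the TCP connection after a drop

            var connection = EventStoreConnection.Create(settings, new Uri("tcp://admin:changeit@localhost:1113"));
            connection.ConnectAsync().Wait();
            return connection;
        }
    }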
After more testing today, we believe we have found the bug causing your issue.
When the client attempts to create a connection to a persistent subscription, it will send off a request and wait for a response. If it does not receive a response within the OperationTimeout, it will retry the connection operation.
Since you have specified the KeepRetrying setting, the client will retry this connection operation until it eventually succeeds.
In addition to this, the Persistent Subscription Service on the Event Store server does not track which client operation caused a connection to be opened, so the server creates a new connection for each retry.
From the logs you provided, your Event Store server was experiencing a lot of slow queues during the time you saw the extra connections being created.
This means that many of the connection operations were likely timing out and being retried, which would create multiple connections at the same time.
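For anyone else hitting this, a sketch of the two client-side settings involved (the timeout value here is illustrative):

    using System;
    using EventStore.ClientAPI;

    static class RetrySettings
    {
        public static ConnectionSettings Build()
        {
            // If an operation (including connecting to a persistent subscription) gets no
            // response within the operation timeout, the client retries it. KeepRetrying()
            // removes the retry cap, so a slow server can end up seeing many connect attempts
            // for what the client considers a single connection. LimitRetriesForOperationTo(n)
            // caps the retries instead, at the cost of the operation eventually failing.
            return ConnectionSettings.Create()
                .SetOperationTimeoutTo(TimeSpan.FromSeconds(7))  // illustrative value
                .KeepRetrying()
                .Build();
        }
    }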
An issue for this has been added to GitHub; you can track it here.