Maximum recommended number of connections

Our system is multi-tenant and has been running in production for a couple of years. We are re-evaluating our current strategy to make sure it scales, as we expect our number of tenants to increase drastically.

This is mainly about subscriptions for the read model.

Options considered:
Option 1. A catch-up subscription per tenant. We have an Event Store projection that creates a stream per tenant, and we use one catch-up subscription per tenant, each subscribing to that tenant’s stream. This is the current approach and it works fine with the current number of tenants (about 100), but that number could increase to thousands (potentially 10,000 soonish), so our concern is the number of open connections/subscriptions. Is it safe to assume Event Store will handle this number of connections/subscriptions without a major performance impact? We prefer this approach because we can horizontally scale the subscribers (we are using Akka.NET).
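
To make that concrete, here is a minimal sketch of the per-tenant layout. The client and its `subscribe_to_stream_from` method, the `tenant-{id}` stream naming and the checkpoint store are illustrative assumptions, not our actual Akka.NET code; the point is simply one catch-up subscription per tenant stream, each with its own checkpoint.

```python
# Illustrative sketch only: the client and subscribe_to_stream_from are hypothetical
# stand-ins for a real Event Store client; our production code uses Akka.NET actors.

class TenantReadModelSubscriber:
    """One catch-up subscription per tenant, each tracking its own checkpoint."""

    def __init__(self, client, tenant_id, checkpoints, project):
        self.client = client
        self.tenant_id = tenant_id
        self.checkpoints = checkpoints            # persists the last handled event number
        self.project = project                    # read-model handler for this tenant
        self.stream = f"tenant-{tenant_id}"       # stream written by the per-tenant projection

    def start(self):
        last = self.checkpoints.load(self.stream)  # None -> start from the beginning
        # Catch-up subscription: replays history after `last`, then switches to live events.
        self.client.subscribe_to_stream_from(
            stream=self.stream,
            from_event_number=last,
            on_event=self.handle,
        )

    def handle(self, event):
        self.project(self.tenant_id, event)                 # ordered within the tenant stream
        self.checkpoints.save(self.stream, event.number)

# 100 tenants -> ~100 open subscriptions; 10,000 tenants -> ~10,000, which is the
# connection/subscription count we are worried about.
def start_all(client, tenant_ids, checkpoints, project):
    for tenant_id in tenant_ids:
        TenantReadModelSubscriber(client, tenant_id, checkpoints, project).start()
```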

Option 2. One catch-up subscription for the entire app, using one connection to Event Store. That means we need to push the events out in a scalable way instead of processing all events for all tenants in one place. In that option we would probably need some sort of bus/queue to push each event to its relevant queue/topic and fan out to different subscribers.
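
A rough sketch of what that fan-out could look like, assuming the tenant id is available in the event metadata and the bus offers ordered per-tenant topics (all names here are made up for illustration; Kafka, Pulsar or a service bus would each look different):

```python
# Illustrative sketch only: `bus` is a hypothetical message bus with ordered,
# per-tenant topics (e.g. one Kafka partition or one Pulsar topic per tenant).

def fan_out(event, bus):
    """Handler for the single app-wide catch-up subscription: route each event by tenant."""
    tenant_id = event.metadata["tenantId"]        # assumes the tenant id travels in metadata
    topic = f"readmodel-tenant-{tenant_id}"
    # Keying/partitioning by tenant keeps per-tenant ordering, while separate
    # consumers per topic let the read-model handlers scale out horizontally.
    bus.publish(topic, key=tenant_id, payload=event.data)
```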

Note that this concerns running the read model handlers, so we need to guarantee message order.

What are your recommendations / thoughts?

We have a similar setup (although nowhere near as many tenants as you), with a couple of differences:

  1. Not all our tenants reside in one Event Store. We don’t really have cross-tenant domain rules, so nothing really forces us to keep the data in the same place.

  2. We have grouped a lot of our read models on the same server, so we can mass-deliver from one persistent subscription to a single endpoint and then write to whatever database we need to, based on information in the event (roughly as sketched below).
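
As an illustration of point 2, here is a rough sketch of that routing with made-up names; where the routing information lives (event metadata, a mapping to databases) is an assumption:

```python
# Illustrative sketch only: one persistent-subscription endpoint serving several
# read models, choosing the target database from information carried by the event.

DATABASES = {                      # hypothetical read-model -> connection mapping
    "orders": "orders_db",
    "invoices": "invoices_db",
}

def handle(event, get_database):
    """Invoked for each event delivered by the single persistent subscription."""
    read_model = event.metadata["readModel"]     # assumes events carry routing information
    database = get_database(DATABASES[read_model])
    database.apply(event)                        # hypothetical projection write
```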

Just as a side question: why do you use catch-up subscriptions over persistent subscriptions? Just curious. For us, the parked message queues are essential for our event delivery.

Thanks …
We are using catch-up subscriptions to guarantee message order.

We don’t have cross-tenant domain rules, and we did consider running multiple Event Stores, but the cost of running and operating them would increase. We wish Event Store had the notion of multiple “databases” to handle multiple tenants in a better way.

We are using one subscription per tenant to be able to handle a large number of tenants/events and to scale horizontally. We also don’t want tenants with very heavy load to throttle/saturate a single main subscription.


“Just as a side question: why do you use catch-up subscriptions over persistent subscriptions? Just curious. For us, the parked message queues are essential for our event delivery.”

Replays?

Do you mean replaying the whole stream?
The persistent subscription gives us a parked message queue that we can easily replay once we’ve fixed the problem.
I don’t think you can do this with catch-ups, and ordering isn’t a big problem for us.

No, I meant dropping the read model and replaying it from scratch. Pretty common with bigger updates etc.

Yeah, we do that as well.
From the UI, just hit Delete, and the persistent subscription gets recreated and starts from -1.

There are multiple strategies to solve this. Let’s take the ones you already described in the first message. Read-only replicas, which will be available in the next major version, will allow you to replicate all the data to nodes that aren’t really part of the cluster, so you can partition your tenant subscriptions across those nodes. Say you have 10K tenants and 10 read-only replicas: you’d have 1K subscriptions per node.
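A back-of-the-envelope sketch of that partitioning, assuming a stable hash of the tenant id (the replica addresses are placeholders):

```python
import hashlib

# Placeholder read-only replica endpoints; with 10 replicas and 10,000 tenants a
# stable hash spreads the load to roughly 1,000 subscriptions per node.
REPLICA_NODES = [f"replica-{i}.example.internal:1113" for i in range(10)]

def replica_for_tenant(tenant_id: str) -> str:
    """Deterministically pin a tenant's catch-up subscription to one replica node."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return REPLICA_NODES[int.from_bytes(digest[:4], "big") % len(REPLICA_NODES)]
```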
Another approach that you proposed is to push events from Event Store somewhere else. Kafka or Pulsar could be candidates for that; you could create a topic per tenant, for example. It might be easier with Pulsar, since you can have any number of broker nodes and keep a relatively small number of bookie (BookKeeper storage) nodes, given that your aim is not an unlimited TTL but scaling out.
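For the topic-per-tenant idea, a small sketch using the Apache Pulsar Python client (`pulsar-client`); the broker URL and the tenant/namespace in the topic name are assumptions for illustration:

```python
import pulsar  # pip install pulsar-client

# Sketch of publishing each tenant's events to its own Pulsar topic. The broker URL
# and the public/default tenant/namespace below are assumptions for illustration.
client = pulsar.Client("pulsar://localhost:6650")
producers = {}  # cache one producer per tenant topic

def publish_for_tenant(tenant_id: str, payload: bytes) -> None:
    topic = f"persistent://public/default/tenant-{tenant_id}"
    if topic not in producers:
        producers[topic] = client.create_producer(topic)
    # One topic per tenant preserves per-tenant ordering, while brokers scale out
    # independently of the BookKeeper (bookie) storage nodes.
    producers[topic].send(payload)
```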
I haven’t tried having thousands of subscriptions on a single cluster, but you could run a synthetic test against a test cluster. Be sure to set your catch-up subscriptions to use the slave (follower) nodes so you don’t overload the master node.

Thanks, that makes sense …

I tried a similar approach with the current version by using clones (replica nodes that are not part of the cluster) and tried to separate subscriptions between master and slaves (the main repositories access the master and the subscriptions use the slaves).

I also tried two approaches when subscribing (catch-up): one where you have a dedicated connection per subscription, and another where multiple subscriptions share the same connection.

Sharing a connection should theoretically be better, as the Event Store documentation suggests sharing a connection where possible. This worked, but in initial load testing, having a dedicated connection per subscription seemed to perform better for some reason.
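
For reference, the two wirings I compared, sketched with a hypothetical `connect()` factory and `subscribe_to_stream_from` call (stand-ins for the real client API, not actual method names):

```python
# Illustrative sketch only: `connect` and `subscribe_to_stream_from` are hypothetical
# stand-ins for a real Event Store client, just to contrast the two wirings tested.

def shared_connection(connect, tenant_streams, on_event):
    """All catch-up subscriptions multiplexed over one connection (the documented advice)."""
    conn = connect()
    return [conn.subscribe_to_stream_from(stream, from_event_number=None, on_event=on_event)
            for stream in tenant_streams]

def connection_per_subscription(connect, tenant_streams, on_event):
    """One dedicated connection per subscription (what performed better in our load test)."""
    return [connect().subscribe_to_stream_from(stream, from_event_number=None, on_event=on_event)
            for stream in tenant_streams]
```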