Hi,
we have already used Event Store in a small CQRS application and are now considering it for an IoT project. Is there a limit on how many subscriptions we can create per consuming server? The idea is to open one subscription per customer. We have around 2000 customers at the moment. Is there a limit we have to be aware of?
I am guessing that, with 2000 customers, these notifications are quite rare and distributed. Why not use HTTP at this point?
What do you mean by notifications?
Each customer gets at most 1 message per second from the sensor, very often fewer than that. We use a rule engine (drools) to evaluate the sensor events (not a fan of it) and the knowledge base is relatively expensive to create (sometimes several seconds). Therefore we use an actor framework (orbit) to distribute the knowledge bases across multiple servers. We also have to query the events to recreate the knowledge base after a deployment. So the idea was to keep each customer independent, to subscribe to the customer’s event stream inside the actor, and to store the position somewhere. Catch-up subscriptions seem to match our use case very well. Of course we could also use a single subscription per server or poll over HTTP to get the events.
Have you considered subscribing to a projection (such as the Category projection, where the category is Customer) and routing the customer events to the appropriate actor once you receive them? Then you only need 1 catch-up subscription for all customer events.
Sure, that is why I am asking whether I have to do it…
You can also do subscriptions over HTTP in a case like yours, which will also use fewer resources (especially when you have a rate of < 1 message/second). This is what I was alluding to.
I am sure there is also some domain-specific logic where you might be able to trade off the time it takes for a notification to arrive against the overall resources used. Given the low volume of messages I would seriously consider using HTTP as opposed to opening, say, 2000 subscriptions here. This option also has some other benefits, such as other things being able to look at the events over HTTP as well.
I would also consider a single subscription that then dispatches the messages internally. This again will be far more efficient (and easier to manage! in terms of checkpointing etc) than 1000s of subscriptions.
Does this make sense?
I would lean heavily away from making 1000s of subscriptions over the TCP API.
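The single subscription with internal dispatch mentioned above could look roughly like the sketch below. It assumes each received event carries a streamId such as "customer-123" and a position. Dispatcher, register and onEvent are made-up names for illustration and are not part of any Event Store client API.

```javascript
// Hypothetical in-process dispatcher: one subscription feeds onEvent(),
// which routes each event to the handler registered for its stream.
class Dispatcher {
  constructor() {
    this.handlers = new Map(); // streamId -> handler function
    this.checkpoint = null;    // position of the last event handed off
  }

  register(streamId, handler) {
    this.handlers.set(streamId, handler);
  }

  // Call this from the single subscription's event callback.
  onEvent(event) {
    const handler = this.handlers.get(event.streamId);
    if (handler) {
      handler(event);
    }
    // One checkpoint for the whole server instead of one per customer.
    this.checkpoint = event.position;
  }
}

const dispatcher = new Dispatcher();
const seen = [];
dispatcher.register("customer-123", (e) => seen.push(e.data));
dispatcher.onEvent({ streamId: "customer-123", position: 0, data: "temp=21" });
dispatcher.onEvent({ streamId: "customer-999", position: 1, data: "temp=19" });
console.log(seen, dispatcher.checkpoint);
```

Events for streams without a registered handler are skipped here, but the checkpoint still advances, so a replay would start after them.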
The code would be simpler if I created one subscription per customer, because I could just use catch-up subscriptions to recreate the customer’s state after a deployment. But if it can cause performance issues I can also create a single subscription per server. But the question is: why? What resources are allocated per subscription?
To start with, a socket, as well as internal resources tracking what is occurring internally. If it’s 10 this is obviously no issue, but creating 2000 sounds like quite a few. There are likely better ways of handling this. Have you considered, for example, creating an equivalent of $all per customer via a simple projection, where all of a customer’s events end up in a single stream that you can subscribe to?
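For example:
fromAll()
.when({
    $any: function (s, e) {
        // Assumes customerId is readable directly on e, as written here;
        // depending on your events it may live in the event body instead.
        if (e.customerId) {
            linkTo("customer-" + e.customerId, e);
        }
    }
});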
This will create a stream customer-1234 (etc.) containing all events associated with customer 1234, allowing for much easier replays of all data for a given customer. It can also be useful for lots of other purposes.
Does this make sense?
Cheers,
Greg
I think I do not need projections, because I would have a single stream per customer anyway.
I had a look at the code, and my understanding is that when a subscription is created the client sends a message to the server: “I am interested in the new events for stream X, please send them to me and use correlation id Y”. When a new event is added to a stream, the server sends the event over a single socket to the client that declared interest. So there is only a single socket per node and not one per subscription, and a subscription only requires a few entries in dictionaries here and there.
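To illustrate that reading of the code, here is a toy model of several subscriptions sharing one connection. It is not the actual client implementation. Connection, subscribe and receive are hypothetical names, and the only per-subscription state is one map entry keyed by correlation id.

```javascript
// Toy model only: many subscriptions multiplexed over one connection.
class Connection {
  constructor() {
    this.nextId = 0;
    this.subscriptions = new Map(); // correlationId -> { streamId, onEvent }
  }

  // "I am interested in stream X, use correlation id Y."
  subscribe(streamId, onEvent) {
    const correlationId = "corr-" + this.nextId++;
    this.subscriptions.set(correlationId, { streamId, onEvent });
    return correlationId;
  }

  // Simulates a message arriving on the single shared socket.
  receive(correlationId, event) {
    const sub = this.subscriptions.get(correlationId);
    if (sub) {
      sub.onEvent(event);
    }
  }
}

const conn = new Connection();
const received = [];
const idA = conn.subscribe("customer-1", (e) => received.push("A:" + e));
const idB = conn.subscribe("customer-2", (e) => received.push("B:" + e));
conn.receive(idB, "reading-7");
console.log(received, conn.subscriptions.size);
```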
Or do we talk about different subscriptions?
Is there a misunderstanding happening here, whereby Sebastian is suggesting 2000 subscriptions on a single socket connection, and others are interpreting that as 2000 socket connections?
I can see many advantages to Sebastian for having 2000 subscriptions instead of one. For example, it’s easier to keep checkpoints while also concurrently handling the events. If there was just one subscription to an aggregated stream such as $ce-customer, each event would have to be handled by Sebastian’s application one at a time before the checkpoint can be updated, or more complicated checkpoint logic would have to be used.
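As one possible shape for that more complicated checkpoint logic, the sketch below records which positions have finished processing and only advances the checkpoint over a gap-free prefix. It assumes positions are consecutive integers starting at 0, which is a simplification, and CheckpointTracker is a made-up name.

```javascript
// Sketch: events may complete out of order; the checkpoint only moves
// forward once every earlier position has completed too.
class CheckpointTracker {
  constructor() {
    this.done = new Set(); // completed positions above the checkpoint
    this.checkpoint = -1;  // highest position with no gaps below it
  }

  complete(position) {
    this.done.add(position);
    while (this.done.has(this.checkpoint + 1)) {
      this.checkpoint += 1;
      this.done.delete(this.checkpoint);
    }
    return this.checkpoint;
  }
}

const tracker = new CheckpointTracker();
const afterOne = tracker.complete(1);  // position 0 still pending
const afterTwo = tracker.complete(2);  // position 0 still pending
const afterZero = tracker.complete(0); // gap closed, checkpoint jumps ahead
console.log(afterOne, afterTwo, afterZero);
```

With one subscription per customer stream none of this is needed, since each actor only has to remember the last position it handled.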
I am not suggesting anything, I am just trying to understand it.
I also think that multiple subscriptions are easier to manage for me, because our actors are very independent. We use orbit (https://github.com/orbit/orbit) and actors are managed by the runtime and can be moved from one node to another in some scenarios.
I ran a simple test and created 10,000 subscriptions, and with Sysinternals I can see only a single socket for my process.
Subscriptions created over the same connection operate over the same connection.
Then it should be fine to create one subscription per actor.
If you have lots, I would probably introduce a single subscription and then have that subscription dispatch internally to your system (there are a whole ton of reasons this becomes simpler!)
Even though there is only one socket open (one per connection), you will receive the same event 10,000 times in your test.
/Peter
There is a reason for this. We don’t know how your internal dispatching etc. works, so we dispatch a subscribed event on every subscription. If you have 10k subscriptions to the same stream on the same connection we will send it 10k times.
As a counterexample of why we do this, imagine you have a quick little plugin system and 5 plugins which each subscribe to a few events. Would you prefer to write your own internal dispatching or pay the network cost of the same event being delivered multiple times? You can obviously build the internal dispatching, but for many systems this is overkill, no?
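A toy model of that behaviour (not server code; sendsFor and the stream names are invented for illustration): an event is sent once per subscription whose stream matches, so the network cost depends on how many subscriptions watch the same stream.

```javascript
// Toy model: count how many sends a single appended event causes.
function sendsFor(subscribedStreams, eventStreamId) {
  return subscribedStreams.filter((s) => s === eventStreamId).length;
}

// Five plugins all subscribed to the same (hypothetical) stream.
const plugins = ["orders", "orders", "orders", "orders", "orders"];
console.log(sendsFor(plugins, "orders")); // prints 5

// One subscription per customer stream: each event matches only one.
const perCustomer = ["customer-1", "customer-2", "customer-3"];
console.log(sendsFor(perCustomer, "customer-2")); // prints 1
```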
I am so confused, probably I was not clear enough. I have one stream per customer, and the events of a customer are totally separate from the events of other customers. So basically 1 stream + 1 actor + 1 subscription per customer. The actor for customer “123” would create a catch-up subscription for stream “customer-123” using a single shared connection per server.
If I create a single subscription per server I also have network costs, because server A could receive an event for a customer whose actor sits on server B, so it also has to forward the event to the other server.
You’d be amazed how cheap those costs are vs maintaining 10000 subscriptions.