Best approach for subscribers that should compete

diego.martin · June 23, 2021, 1:07pm

If I’m not mistaken, EventStoreDB subscriptions using the gRPC client will open a dedicated bidirectional channel between the client and EventStoreDB and start consuming events from the given position in a catch-up manner.

What’s the best approach when the service that subscribes to the EventStoreDB $all stream, and transforms and projects events into a “query side” database, scales horizontally?

In other words, if I have a client app that uses gRPC to subscribe to $all and filters for only the events it is interested in, what happens when that app scales out and I suddenly have two instances reading the same streams and processing the same events twice?

Ideally they should compete, but since this is not a message broker but an event stream that can be read independently, do we have options other than these?:

1. Ensuring there is just one client app, at all times, subscribed and creating projections or fanning out events to a shared queue.
2. Ensuring multiple instances behave in an idempotent way, so the slower app won’t write what has already been written (although it will read the event from the stream again and “waste” resources).
3. ???
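
To make the idempotent option concrete, here is a minimal Python sketch. It assumes the read model stores the position of the last event it applied, so a second instance replaying the same events skips the write. `ReadModel`, `apply`, and the sample events are hypothetical names for illustration, not part of any EventStoreDB client.

```python
from dataclasses import dataclass, field


@dataclass
class ReadModel:
    """Hypothetical in-memory stand-in for a query-side database."""
    rows: dict = field(default_factory=dict)
    checkpoint: int = -1  # position of the last event that was applied

    def apply(self, position: int, key: str, value: str) -> bool:
        """Apply an event once; return False if it was already applied."""
        if position <= self.checkpoint:
            return False  # duplicate delivery from another instance: skip the write
        self.rows[key] = value
        self.checkpoint = position
        return True


model = ReadModel()
events = [(0, "order-1", "created"), (1, "order-1", "paid")]

# Two instances replay the same events against the shared read model.
applied_first = [model.apply(*event) for event in events]
applied_second = [model.apply(*event) for event in events]
```

In a real database the row update and the checkpoint update would need to happen in the same transaction for this to hold. The second instance still reads every event, so this removes the duplicate writes but not the duplicate reads.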

PS: I’m interested in $all because my stream names follow an AggregateName-Guid pattern, so I don’t know them in advance, and subscribing to each one individually wouldn’t be practical.

diego.martin · June 23, 2021, 1:14pm

I must have missed this section of the documentation (or maybe it’s new):
https://developers.eventstore.com/clients/dotnet/20.10/subscriptions/#persistent-subscriptions

Consumer groups seem to be the feature I was looking for, but the documentation implies that they apply to persistent subscriptions, not to catch-up subscriptions:

Catch-up subscriptions serve the purpose of receiving events from a stream for a single subscriber. Subscribers for catch-up subscriptions get events in order and, therefore, are able to process events sequentially. There is nothing on the server that gets stored for such a subscriber.

You can have multiple subscribers for the same stream and all those subscribers will get all the events from that stream. Subscriptions have no influence on each other and can run on their own pace.

Will keep investigating there though.

This explains it very well
https://developers.eventstore.com/clients/dotnet/21.2/subscriptions/persistent-subscriptions.html#concepts

My only question left, for now, is: is it possible to use persistent subscriptions with the $all stream? The UI does not allow me to enter $all as the stream name and I don’t see anything in the documentation, so I’m guessing I can’t subscribe to all streams using a persistent subscription and groups for my consumers to compete.

Back to my original question

yves.lorphelin · June 25, 2021, 8:45am

Persistent subscriptions to $all are coming in the next release, 21.6 (very soon).
This will also require an upgrade of the client libraries, which will be released shortly afterwards.
(Note: creating a subscription group on $all will only be available through the client libraries, not through the UI.)

Joao_Braganca · June 25, 2021, 8:51am

If you’re just talking about projecting into a read model, you could treat the application + the read model database as a single unit, i.e., by running the database embedded on the same machine as your app. Then your app simply talks to its local database and isn’t concerned with anything else.

alexey.zimarev · June 25, 2021, 8:52am

Diego, we normally don’t recommend using persistent subscriptions for read-model projections, as we cannot guarantee that events will be consumed in order. That concern is especially valid when you use consumer groups with multiple consumers. Although there are different strategies for distributing events between the consumers in a group, there’s no guarantee that the chosen strategy will be honoured 100% of the time.

alexey.zimarev · June 25, 2021, 9:09am

The most obvious solution for scaling read model projections horizontally is event partitioning. Every streaming system that guarantees ordered message delivery has the same constraint: you can have at most one consumer per partition, otherwise you’d never be able to guarantee ordered event processing.
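
As a rough illustration of the "one consumer per partition" rule, the Python sketch below assigns each stream name to exactly one instance using a stable hash. This is a client-side assumption for illustration only, not an EventStoreDB feature; `partition_for`, `should_handle`, and the sample stream names are hypothetical.

```python
import zlib


def partition_for(stream_name: str, partition_count: int) -> int:
    """Map a stream name to a partition using a stable hash (CRC32)."""
    return zlib.crc32(stream_name.encode("utf-8")) % partition_count


def should_handle(stream_name: str, instance_index: int, instance_count: int) -> bool:
    """True if this instance owns the partition the stream falls into."""
    return partition_for(stream_name, instance_count) == instance_index


instance_count = 3
streams = ["Order-1", "Order-2", "Customer-7", "Invoice-42"]

# For each stream, list the instances that would process its events.
owners = {
    name: [i for i in range(instance_count) if should_handle(name, i, instance_count)]
    for name in streams
}
```

Every stream ends up with exactly one owner, so events within a stream are still processed sequentially by a single instance. Ordering across streams is not preserved, and each instance still reads the whole subscription and discards what it does not own.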

EventStoreDB doesn’t have the “partition” concept at the moment. We value the “single log” idea as it solves a lot of issues when it comes to building event-sourced systems.

Still, we know that users create virtual partitions using streams that contain link events. The $ce stream is an example. I know systems where the stream name is prefixed with the tenant id, which allows using the category stream as a tenant partition. Custom projections can use the event payload, or metadata, to implement more advanced partitioning by emitting links. The concern with this approach, however, is that it amplifies the number of writes on the server. Scaling out subscriptions is a necessity caused by performance issues with a single subscription (otherwise, why would you care?), which means you already have a large number of events being ingested. By partitioning using streams with links, the number of events will increase (double at the very least), so you risk moving the performance issue to the transaction side instead of solving it.
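
The tenant-prefix idea and the write amplification can be sketched in Python as follows. The sketch assumes the category is everything before the first dash in the stream name and that each event gets one link per partitioning stream; the helper names and the sample stream name are hypothetical.

```python
def category_of(stream_name: str, separator: str = "-") -> str:
    """Return the part of the stream name before the first separator."""
    head, _, _ = stream_name.partition(separator)
    return head


def category_stream(stream_name: str) -> str:
    """Name of the category stream that would hold links for this stream."""
    return "$ce-" + category_of(stream_name)


def total_writes(event_count: int, links_per_event: int = 1) -> int:
    """Original events plus one link event per partitioning stream."""
    return event_count * (1 + links_per_event)


tenant_partition = category_stream("tenant42-Order-1b4f")
writes_for_million_events = total_writes(1_000_000)
```

With one link per event, a million ingested events become two million writes, which is the "double at the very least" cost described above.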

In high-throughput systems we see customers using an intermediary approach. You can find a post by Jet from a few years back, where they describe how they publish all the events from the ESDB cluster to Kafka, where the events can be partitioned. On the ESDB-egress and Kafka-ingress side, it’s still one sequential subscription, but it doesn’t normally struggle with performance. We have a Kafka sink for the Replicator, which allows you to partition events based on event data, among other things. The sink is not very fast because it uses confirmed writes to Kafka. Still, it might give you some ideas.

We are working on a set of impactful changes for our product, which are expected to mitigate or entirely remove a lot of concerns like this one. At the end of the day, you should be able to do what ESDB does now, what you would do with Kafka, and even more than that, using ESDB only. It’s all in active development, and we aim to publish a product roadmap soon.

diego.martin · June 25, 2021, 10:18am

Excellent. Thanks everyone for your replies and the good comments and ideas.

Order of events is indeed a concern for my read models, so persistent subscriptions with subscription groups would definitely not be an option for me. For the same reason, I don’t use an SNS/SQS message broker for projecting read models.

For now the best option for my scenario is to have a single subscriber projecting into read models, which should be enough (I think).

I guess there are some stream naming tricks worth considering, similar to the single-table modeling patterns some people use with DynamoDB.

Thanks again, plenty of info there to consider some good options.

alexey.zimarev · July 5, 2021, 6:22pm

There are different kinds of brokers. Kafka is a partitioned log: every partitioned topic maintains order within a single partition, so each consumer in a consumer group is guaranteed to receive the events of its partitions in order. You just need to partition them correctly. Google PubSub Lite is very similar to Kafka, as is Azure Event Hub. Not sure about AWS though. Google PubSub now has an ordering key, which is not a partition key but can be used similarly. It’s just that choosing the ordering key is a completely different story compared with choosing the partition key, given the implementation specifics.

I mean, Jet used Kafka for event partitioning for read models, where events come from ESDB, for years. It’s a valid pattern, at least until we introduce a similar concept inside our product, which will happen, it’s just a matter of time.