Load Balancing using persistent subscriptions

But ESDB persistent subscriptions do not guarantee message order. If you check that page, it mentions Kafka: "Consumers can see the messages in the order they were stored in the log."

@sinoohe.mkh, please read my whole comment… :wink:

From the Kafka docs:

Kafka guarantees that a message is only ever read by a single consumer in the group.

This means a single consumer for the group, so there is no automatic scaling out there either: you need to configure / repartition as well.


We want scaling only across aggregate IDs. For example, in the case I explained: with Kafka we can define one consumer group, partition the streams profile-user_id_1, …, profile-user_id_N across 10 partitions, and then each consumer would consume the events of only N/10 users.
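For illustration, a minimal sketch of that setup with kafkajs (the broker address, topic, and group names are made up for the example):

```ts
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "profiles", brokers: ["localhost:9092"] });

// Producer side: keying by user id makes Kafka hash all of one user's
// events onto the same partition, which preserves per-user ordering.
async function publishProfileEvent(userId: string, event: object) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: "profile-events", // hypothetical topic with 10 partitions
    messages: [{ key: userId, value: JSON.stringify(event) }],
  });
  await producer.disconnect();
}

// Consumer side: each consumer in the group is assigned a subset of the
// 10 partitions, so each instance handles roughly N/10 users.
async function consumeProfileEvents() {
  const consumer = kafka.consumer({ groupId: "profile-projector" });
  await consumer.connect();
  await consumer.subscribe({ topic: "profile-events", fromBeginning: true });
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      console.log(partition, message.key?.toString(), message.value?.toString());
    },
  });
}
```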

If ESDB guaranteed ordering for persistent subscriptions, we could use that.

As we stated, Kafka also won't give you any ordering guarantee between partitions, so for your scenario it won't give you ordering guarantees across the stream as a whole.
Plus, Kafka has a hard limit of around 200k partitions: https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions. Unless you hash stream names and put events from different streams into the same partition, you won't be able to process all your aggregates.
I suggest rethinking your design, considering our suggestions, and answering our questions.

I agree completely, but they do guarantee ordering within a partition. Does ESDB guarantee ordering within a partition when using a persistent subscription?

from: https://developers.eventstore.com/server/v20.10/persistent-subscriptions/#ordering-is-not-guaranteed

I conclude from this sentence that there is no guarantee I get events in order, even within a single consumer (in one consumer group).

You read that correctly.
It's best effort, precisely because you need to ACK / NACK the messages you handle.
I.e., if you NACK a message, it will be redelivered to the subscribers (and, depending on the configuration, it's going to end up in a parked message stream).
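To make the ACK / NACK flow concrete, here is a minimal sketch assuming the Node.js @eventstore/db-client (the stream and group names are made up):

```ts
import { EventStoreDBClient } from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString(
  "esdb://localhost:2113?tls=false"
);

async function consume() {
  const subscription = client.subscribeToPersistentSubscriptionToStream(
    "profile-user_id_1", // hypothetical stream
    "profile-group"      // hypothetical consumer group
  );

  for await (const event of subscription) {
    try {
      // ... handle the event ...
      await subscription.ack(event);
    } catch (error) {
      // "retry" redelivers the event; after too many retries it ends up
      // parked in the subscription's parked-message stream.
      await subscription.nack("retry", String(error), event);
    }
  }
}
```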

Now, you could subscribe / catch up to everything in each process, and each consuming process can just ignore what it's not interested in.
Adding new consumers is then only a matter of adding a new process and reconfiguring the existing ones.
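A rough sketch of that idea, again assuming @eventstore/db-client (the consumer count and the hash-based ownership rule are just one possible convention):

```ts
import { createHash } from "node:crypto";
import { EventStoreDBClient, START } from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString(
  "esdb://localhost:2113?tls=false"
);

const TOTAL_CONSUMERS = 10; // reconfigure this when adding processes
const MY_INDEX = Number(process.env.CONSUMER_INDEX ?? "0");

// Deterministically map each stream to exactly one consumer process.
function owns(streamId: string): boolean {
  const hash = createHash("md5").update(streamId).digest().readUInt32BE(0);
  return hash % TOTAL_CONSUMERS === MY_INDEX;
}

async function run() {
  const subscription = client.subscribeToAll({ fromPosition: START });
  for await (const { event } of subscription) {
    if (!event || !owns(event.streamId)) continue; // not ours: ignore it
    // ... project the event ...
  }
}
```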

@sinoohe.mkh, well, let's clarify again. If you partition your stream in Kafka, then you won't have an ordering guarantee across the whole stream. You also won't be able to parallelise processing within a partition, as only a single consumer can process a single partition.

You can mimic this “pattern” from Kafka by creating streams such that a stream in ESDB = a partition in Kafka, and then running filtered catch-up subscriptions by stream prefix.
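A sketch of such a prefix-filtered catch-up subscription, assuming @eventstore/db-client (the "profile-" prefix matches the stream-per-user naming from the discussion):

```ts
import {
  EventStoreDBClient,
  streamNameFilter,
  START,
} from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString(
  "esdb://localhost:2113?tls=false"
);

async function run() {
  const subscription = client.subscribeToAll({
    fromPosition: START,
    filter: streamNameFilter({ prefixes: ["profile-"] }), // server-side filter
  });

  for await (const { event } of subscription) {
    // Events arrive in $all order, so per-stream ordering is preserved.
    console.log(event?.streamId, event?.type);
  }
}
```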

That would give you exactly the same mechanism as you have in Kafka. However, it doesn't solve your scaling issue, and in my opinion it's not what you should be doing anyway: focus on solving the root cause instead of working around it at the technical-solution level. That will give you the real scaling benefits.

Thanks for the response. Now I see under which conditions out-of-order delivery might happen, and we can design our system accordingly.

There is a bit more, and it has to do with how fast you handle & ACK/NACK messages, and with the batch size.

If you wait too long to ACK, the server is going to assume the message was not processed and will retry it (those are configurable timeouts: https://developers.eventstore.com/clients/grpc/persistent-subscriptions.html#persistent-subscription-settings).

It mainly boils down to trying out different settings (timeouts / batch size) to minimize the chances of out-of-order or duplicate delivery given your setup.
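As a sketch, the tunable settings look roughly like this with the Node.js @eventstore/db-client (the option names follow that client; the stream, group, and concrete values are only illustrative):

```ts
import {
  EventStoreDBClient,
  persistentSubscriptionToStreamSettingsFromDefaults,
} from "@eventstore/db-client";

const client = EventStoreDBClient.connectionString(
  "esdb://localhost:2113?tls=false"
);

async function createSubscription() {
  await client.createPersistentSubscriptionToStream(
    "profile-user_id_1", // hypothetical stream
    "profile-group",     // hypothetical group
    persistentSubscriptionToStreamSettingsFromDefaults({
      messageTimeout: 30_000, // ms before an unACKed message is retried
      maxRetryCount: 5,       // retries before the message is parked
      readBatchSize: 20,      // smaller batches narrow the reordering window
    })
  );
}
```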

Kafka could be a good option to scale projections. However, as Yves pointed out, when one consumer in a group gets stuck, the whole partition is stuck. Unless you rigorously monitor each consumer, there's a risk that part of your read models is fine while the other part is very stale.

We also have good experience using Google Pub/Sub as a distribution medium between a catch-up subscription to $all and the projections.

It is possible to use other methods to avoid a partitioned broker, but it is not easy. Eventuous (Yves posted some links above) implements in-proc partitioning with ordering per partition, but it requires your projections to be strictly idempotent. That's because you must store the checkpoint with regard to the overall processing, while the partitions are consumed in no deterministic order relative to each other.
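For intuition, here is a toy sketch of in-proc partitioning in that spirit (this is not Eventuous's actual API): events are hashed into N promise chains, so ordering holds within a partition but not across partitions, which is why the shared checkpoint forces idempotency.

```ts
// Hypothetical event shape for the sketch.
type PartitionedEvent = { streamId: string; handle: () => Promise<void> };

class PartitionedProcessor {
  // One promise chain per partition: chaining serializes that partition.
  private tails: Promise<void>[];

  constructor(private readonly count: number) {
    this.tails = Array.from({ length: count }, () => Promise.resolve());
  }

  submit(event: PartitionedEvent): void {
    const p = this.hash(event.streamId) % this.count;
    // Strictly sequential within partition p; partitions run concurrently,
    // so the overall checkpoint can only be stored conservatively and the
    // projections must be idempotent to survive replays.
    this.tails[p] = this.tails[p].then(() => event.handle());
  }

  private hash(s: string): number {
    let h = 0;
    for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    return h;
  }
}
```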

Another option is to use the Actor Model and distribute the load by treating each individual document/row in the read model as an actor. It's something I have been entertaining for a while, but I won't be able to implement it before February next year (I think).

Thanks for the response, we'll consider Google Pub/Sub.