Competing consumers and PreferDispatchToSingle

I’ve been thinking about how I might use EventStore for command messaging with a multi-tenant application.

Given a stream per tenant, I’d like to have one consumer so that I have one writer for derived streams and I can utilize in-memory caches. At the same time I don’t want to have a single point of failure in the system for a given tenant.

The PreferDispatchToSingle option looks ideal for setting up fall-back consumers. The only question left in my mind for this usage is how do I define an order for fall back consumers?

I’m thinking I’d like to have 3 tenant streams and 3 consumers with fall-back redundancy distributed among them (eg. consumer-2 subscribed to stream-2 and also fallback for stream-1, etc.). The potential problem that I see here is – depending on which consumer is the preferred single – that I might be left with an uneven distribution of streams to consumers, in the extreme case a consumer that doesn’t see any messages.

Is there a simple way to achieve this, or any other approaches that I could consider?

You can actually build your own named strategies into the system if
you want to. Right now prefer single probably is not exactly what you
want. It doesn't assure which one gets it just that it will prefer
writing it to the first one. Round Robin is similar its does its best
to do round robin but it may not be round robin depending on if
consumers are keeping up or not.

Here is the relevant area of code:

https://github.com/EventStore/EventStore/blob/release-v3.2.0/src/EventStore.Core/Services/PersistentSubscription/IPersistentSubscriptionConsumerStrategy.cs

These are then wired through:

https://github.com/EventStore/EventStore/blob/release-v3.2.0/src/EventStore.Core/Services/PersistentSubscription/PersistentSubscriptionConsumerStrategyRegistry.cs

Thanks. I was originally thinking about combining multiple tenant streams using consistent hashing, and then having prefer-single competing consumers on the resulting streams. I want to partition by tenant, but not necessarily have one consumer per tenant stream.

Now I’m wondering if I might use a category projection on the tenant streams (tenant_input-a, tenant_input-b) and then do competing consumers on the projected (single) stream. A custom strategy could then do consistent hashing on the original stream name to dispatch events to consumers. This would dispatch to the next recipient in the stream (as if the node removed from the hash ring) if a consumer is down.

Will there be support for doing this kind of thing baked into EventStore at some point? I think this might be really useful in my case, thoughts?

For baked in I kind of doubt it because this is fairly unique to your
scenario and it could be done from the outside if needed. There are
also a lot of other things to consider like what happens when your
consumer is currently slow?

Btw if slow consumer is not clear this is why "prefer single" and
"round robin" are best effort not assured. Keeping a preference when a
consumer is slow can get messy. I think what might be better would be
more something based on a catchup subscription (who handles this)
combined with a bit of coordination between the subscribers from an
outside perspective. What SLA do you need to handle fail overs on vs
what is the messages/second

I think I need ~50 messages/second (across all tenants). The consumers are slow in general and limit the throughput in this case, which is something that I’m hoping to address by using EventStore along with partitioning streams (i.e. consistent hashing) to reduce DB concurrency conflicts and reduce DB interaction with local caches.

SLA on fail overs is not extremely important assuming that it is a rare occurrence, this would be more of a safety net.

Given this I would look at coordinating the consumers. Event store can
help with this. Bring up 3 catchupsubscriptions and use a stream to
coordinate them. Basically all subscribe to stream. One will win to
write saying "I lead". The others get expected version failures. Those
two sit idle and try this same process every say 10 seconds. The one
who is leader writes at 9 seconds instead of 10 to make him the more
often than not winner.

That’s an interesting approach, going to try that for fit in some areas.

I have a point where I’m essentially dealing with events from an external system. I’d need to treat these more as commands, where I’d like retries and parking/dead-letter (probably more of explicit nack vs implicit). I’d also like to preserve ordering as best as I can (forgo for retries). The streams at this point are per client, as I don’t have enough information at this point to easily partition the stream further.

I suppose the equivalent to nacking a message with the catchup approach would just be appending a “failed” (tombstone?) type of event, which might work for me.

I will try to get up a document/example on it in the next week.

Oh, maybe I understood on first read. So, use of an additional stream that’s used to determine the leader (and keep one). From there, what type of subscription would be used against the main stream. I suppose it could be a single competing consumer. Is this basically what you’re describing?