Architectural question


I’m in the process of working out how to interact with Event Store in a system I’m designing. In a nutshell, I have accounts, and each account can have a set of projects, and each project can have a set of entities, with further entities nested inside parent entities, do some arbitrary hierarchical depth. I will have a global event stream representing changes to my system as a whole, and there will be a decision process that looks at each event in the stream, as it is received, and if the entity that the event relates to is “active”, the event will be passed to that entity’s individual event stream for processing by a worker process. Now, whether or not an entity is active will depend on a hierarchical list of lookups; i.e. is that parent’s entity active? Is the project that the entity belongs to active? Is the account to which the account belongs to active? This is pretty trivially handled by an external subscribing process, as it can make that determination using appropriate queries to either the Event Store or perhaps to the database where the current data is flattened to. So in that sense, I can get away without using projections at all as I’m just looking at events and deciding what to do with them.

Here’s my question. I have good reason to expect that my global event stream could get very, very busy, which means I start looking at distributing the workload, either in terms of Event Store’s clustering features, and/or in adding additional decider processes to parallelise processing of the global event stream. My concern is that if I do parallelise the global event stream processing, I’m not sure how I ensure that events forwarded to other streams stay in the same order in which they were received. My first thought is that a clustered Event Store probably handles correct ordering even though projections may be working in parallel on different machines, though that’s just an assumption. The problem is that, correct me if I’m wrong, I wouldn’t be able to use projections for my use case, because they only operate on data they’re fed, forwarding projected data to other streams. Trying to determine where to project based on the state of a different stream isn’t going to work because there is no querying mechanism… again this is only what I’ve gathered from looking at Rob Ashton’s blog posts and so forth.

So, if I am correct and I can’t really use projections to ensure correctly-ordered event forwarding in a clustered setup, does Event Store provide any other mechanism whereby two parallel processes can be forwarding events from the same event stream source and yet ensuring that they are forwarded in the correct order? Perhaps this is an architectural problem that is outside the scope of what Event Store can do for me… any advice would be greatly appreciated.

This always happens. I spend large amounts of time pondering a question and not coming up with a good answer, and only once I’ve posed the question to someone else does the answer start to materialise. So, to answer my own question, I could possibly use an incremental version numbering system on a per-entity basis, which means even if events are forwarded out of sequence, I can have the consuming process notice that the events are out of order by looking at the version number and simply waiting for the correct version to appear. I can possibly also project implicit active state as well in some sort of cascaded fashion, allowing me to handle this process entirely with projections. I will have to write some test code to crystallise this idea and work out if it can be done the way I’m thinking.

A question for me still remains though. If I have projection operating on a global event stream and the projection is simply forwarding event instances to other streams, and my event store is clustered, does Event Store ensure that two different events from the same source stream are forwarded in the correct order to the target stream, even though the projection for the two different events may not have even occurred on the same machine?

Can you define “very busy” how many messages/scene of what sizes?

I have a few scenarios that could generate a few hundred thousand messages per second during times of heavy load. I don’t know how likely it is that those will be hit, and I don’t even necessarily need to waste time coding for that level right now (YAGNI / you don’t have a scaling problem until you have a scaling problem), but I’d like to have an idea in the back of my head about how I will solve that issue if and when I have to, so that I don’t code myself into a corner at this stage and necessitate later migration of production data to a new format/architecture that could have been avoided if I’d coded with such an eventuality in mind from the start.

For dealing with parallel projections (say reading an external database) there are some fairly well known patterns. You discuss having one single large feed. What if you partitioned this? ex:

Instead of having /streams/incoming you were to have /streams/incoming1 /streams/incoming2 /streams/incoming3 /streams/incoming4 etc. You would then parition your data (based on your tree sounds like a reasonable idea where you find say the first level of the tree and partition everything below each node to its own stream as they should be relatively independent then). You then have your subscribers listening to a given patrition.



Sorry for the delayed reply, and thanks for the suggestion. Partitioning the global feed sounds like a good idea; I’ll look at doing that.