Matryoshka dolls

tamas.gabor.zoltan · February 18, 2021, 5:45pm

Hi, I’d like to ask your opinions about modeling the domain of Matryoshka dolls in a DDD + CQRS + ES fashion.

The domain: we have dolls of different sizes. They are hollow and one can put Stuff in them - or other, smaller dolls. Our software can handle dolls of all sizes. If we put a smaller doll in a bigger doll, all of the smaller doll’s content also becomes content of the bigger doll. If a doll contains a smaller doll or Stuff, then it’s considered full. If it contains neither, then it’s empty. All dolls and all instances of Stuff are identified. We always want to be able to query where a specific piece of Stuff or a specific doll is.
There’s no defined maximum depth of recursion. A Huuuge doll can contain thousands of smaller dolls. And actually, if we take a tiny little doll out of a bigger doll, interaction-wise they’re perfectly equal - we can do the exact same things with them.

The problem: defining the consistency boundaries. If we treat dolls as the aggregates, then putting one doll in another (or taking one out, making it a “top-level” citizen) always affects multiple aggregates, which would mean modifying multiple streams. For that I would need some kind of process manager, but if I can avoid it, I’d rather not build one into the system.

How would you approach this problem?

yves.lorphelin · February 18, 2021, 7:08pm

It’s always difficult to answer with fake domains.
In this case the doll is not the aggregate;
the “state” of which doll is inside another is a projection of all the “put doll x into doll y” and “removed doll x from doll y” events.
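
A minimal sketch of that idea in Python, offered as an illustration rather than a definitive design. The event classes `DollPutInto` and `DollRemovedFrom`, the `ContainmentProjection` class, and its method names are all hypothetical; the thread does not define them. The projection folds the events into a "who is directly inside whom" map and answers the location query by walking that map outwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DollPutInto:
    """Hypothetical event: `inner` (a doll or a piece of Stuff) was put into `outer`."""
    inner: str
    outer: str


@dataclass(frozen=True)
class DollRemovedFrom:
    """Hypothetical event: `inner` was taken out of `outer`."""
    inner: str
    outer: str


class ContainmentProjection:
    """Read model: maps each contained item to its direct container."""

    def __init__(self):
        self.parent = {}

    def apply(self, event):
        if isinstance(event, DollPutInto):
            self.parent[event.inner] = event.outer
        elif isinstance(event, DollRemovedFrom):
            # Only forget the link if it matches what we currently know.
            if self.parent.get(event.inner) == event.outer:
                del self.parent[event.inner]

    def containers_of(self, item_id):
        """Return the enclosing dolls, innermost first."""
        chain = []
        seen = {item_id}
        current = item_id
        while current in self.parent:
            current = self.parent[current]
            if current in seen:  # guard against a malformed, cyclic history
                break
            seen.add(current)
            chain.append(current)
        return chain

    def is_full(self, doll_id):
        """A doll is full if anything is directly inside it."""
        return doll_id in self.parent.values()


projection = ContainmentProjection()
for event in [DollPutInto("tiny", "medium"), DollPutInto("medium", "huge")]:
    projection.apply(event)
print(projection.containers_of("tiny"))
```

Because the transitive content is derived at query time, putting one doll into another is a single event; nothing needs to be rewritten for the dolls nested inside it.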

tamas.gabor.zoltan · February 19, 2021, 3:43pm

Thanks for your response.
Unfortunately it did not get me much closer to crying eureka. About the fake domains, of course you are right. But the real domain would take an hour of reading to get familiar with, so even if I were at liberty to share it, it would be too scary and boring at the same time to get a response.
But okay, let me make another attempt, keeping it simple enough while still describing the problem I face well enough.

Let’s imagine a very simple automated warehouse. There are pallets and shelves. All pallets have a unique ID. All shelves have a unique address. We are building a system that always keeps track of where the pallets are, and can respond to queries for locating the pallets, or shelves with enough space left for additional pallets.

Between shelves the pallets are moved by forklifts. Client systems decide what pallets need to be moved and by which forklift, based on the information we provide (where’s a specific bin, where’s enough space to put it).
Each forklift is identified as well, and whenever a pallet is loaded on a forklift or unloaded onto a shelf, our software gets a location update for this pallet.

So any pallet that’s in the system is considered to be either on a shelf or on a forklift, and at any moment the system must be able to tell where a specific pallet is, and which are the shelves where a pallet can be put.
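
The query side of that requirement could be sketched roughly as below. This is an illustration under assumptions: the `PalletLocationUpdated` event, the `WarehouseReadModel` class, and the idea that each shelf has a fixed capacity counted in pallets are all hypothetical and not taken from the real system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PalletLocationUpdated:
    """Hypothetical event: a pallet is now on a shelf or on a forklift."""
    pallet_id: str
    location_kind: str  # "shelf" or "forklift"
    location_id: str    # shelf address or forklift ID


class WarehouseReadModel:
    """Read model answering 'where is pallet X' and 'which shelves have space'."""

    def __init__(self, shelf_capacity):
        # Assumption: capacity is a number of pallets per shelf address.
        self.capacity = dict(shelf_capacity)
        self.location = {}
        self.used = Counter()

    def apply(self, event):
        previous = self.location.get(event.pallet_id)
        if previous is not None and previous[0] == "shelf":
            self.used[previous[1]] -= 1  # free space on the source shelf
        if event.location_kind == "shelf":
            self.used[event.location_id] += 1  # occupy the target shelf
        self.location[event.pallet_id] = (event.location_kind, event.location_id)

    def where_is(self, pallet_id):
        return self.location.get(pallet_id)

    def shelves_with_space(self):
        return sorted(a for a, cap in self.capacity.items() if self.used[a] < cap)


model = WarehouseReadModel({"A1": 1, "A2": 1})
model.apply(PalletLocationUpdated("P1", "shelf", "A1"))
print(model.where_is("P1"), model.shelves_with_space())
```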
The final concept to introduce is that any time the other system selects a shelf as the target for a pallet, it tries to reserve that shelf, to stop other processes from sending a pallet to the same shelf. So from the client perspective, when they’re trying to store a pallet, it’s:

1. Query: shelf with enough space
2. Select target shelf from result set
3. Reserve target shelf
4. Send a forklift (external)
5. (after pallet has been loaded) Update pallet location -> forklift (and increase free space on source shelf)
6. (after pallet has been unloaded on the shelf) Update pallet location -> shelf (also, reduce free space on target shelf!)
7. Release target shelf

(Obviously, only one client can reserve a shelf at any time, and it remains reserved until released by that client. So one shelf can have zero or one client reservations).
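
The reservation rule on its own can be sketched as follows. Purely for illustration, this assumes that each shelf guards its own reservation; the `Shelf` class, the exception types, and the event tuples are hypothetical names, and the sketch does not settle the aggregate question discussed below.

```python
from dataclasses import dataclass, field
from typing import Optional


class ShelfAlreadyReserved(Exception):
    """Raised when a second client tries to reserve a reserved shelf."""


class NotReservedByClient(Exception):
    """Raised when a client releases a shelf it does not hold."""


@dataclass
class Shelf:
    address: str
    reserved_by: Optional[str] = None
    events: list = field(default_factory=list)  # recorded decisions, oldest first

    def reserve(self, client_id):
        if self.reserved_by == client_id:
            return  # already held by this client; nothing new to record
        if self.reserved_by is not None:
            raise ShelfAlreadyReserved(self.address)
        self.reserved_by = client_id
        self.events.append(("ShelfReserved", self.address, client_id))

    def release(self, client_id):
        # Only the client holding the reservation may release it.
        if self.reserved_by != client_id:
            raise NotReservedByClient(self.address)
        self.reserved_by = None
        self.events.append(("ShelfReleased", self.address, client_id))


shelf = Shelf("A1")
shelf.reserve("client-1")
try:
    shelf.reserve("client-2")
except ShelfAlreadyReserved:
    print("A1 is taken")
shelf.release("client-1")
```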

So what I find is that there are events that affect:

- a shelf
- a pallet and a shelf and a forklift

No matter how I think about it, I see these options:

1. introduce a process manager and work with multiple aggregates
2. have a single aggregate that represents the entire warehouse and let it do everything (processing location updates and making reservations)
3. my aggregates are not the shelves, nor the pallets, nor the warehouse, but… something I cannot think of. The reservations? The location updates?

I feel that with the first approach I am trying to force a domain model that’s more natural to me, but is not well suited for the scenario.
But with the second approach I’m also unhappy, considering I have one “god” aggregate that enforces consistency all right, but every command goes to it, every event happens to it, and so it has to load Encyclopedia Britannica a hundred times a second. And I also feel that if I end up using EventStore and writing everything into a single stream, I’m doing something wrong.
And the third approach is not really an option until I manage to understand it. Hence this topic.

chris.condron · March 3, 2021, 7:48pm

Hi Tamas,
Take a look at the pattern I lay out over here:
Cross aggregate events
You can use the same approach.
-Chris