Handling large amounts of references

Peter · May 11, 2022, 3:30pm

Hi,

When designing a simple product manager for our website, we would like to organize our products in categories.

Both product and category are aggregate roots. Each category has a product references property with a list of product identities. Products are unaware of the categories they belong to. Categories can be deleted without having to update products.

However, some categories can have a large number of references (over 50k). The events in the domain are fine-grained, so every reference being added, changed or deleted results in an event.

Currently we use a snapshot strategy to prevent having to read all events when loading a category, but we are wondering if there are better strategies to manage (large amounts of) references.

Curious about your ideas!

Cheers,

Peter

steven.blair · May 11, 2022, 6:50pm

Peter,

We had a similar situation, and wanted to avoid rehydrating a monster stream into an aggregate to enforce a rule.
What we ended up doing was building a stream per relationship.
So in your example, you could use the category and product combination to build a stream name and rehydrate from there.
If you have an event, it tells you the relationship is there (maybe you tell the user it already exists / error / silently handle). If the stream doesn’t exist, you are free to create the relationship.
From the read model side, you can simply subscribe to “$et-RelationshipCreated” or even the category of the relationship.
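
A minimal sketch of the stream-per-relationship idea, in Python with an in-memory stand-in for the event store. The `InMemoryEventStore` class, the `CategoryProduct-...` stream naming and the `RelationshipRemoved` event type are assumptions made for illustration; only `RelationshipCreated` is named in the post above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryEventStore:
    """Hypothetical stand-in for an event store: stream name -> events."""
    streams: Dict[str, List[dict]] = field(default_factory=dict)

    def read(self, stream: str) -> List[dict]:
        # A missing stream simply reads as an empty list of events.
        return list(self.streams.get(stream, []))

    def append(self, stream: str, event: dict) -> None:
        self.streams.setdefault(stream, []).append(event)


def relationship_stream(category_id: str, product_id: str) -> str:
    # Assumed naming convention: one small stream per (category, product) pair.
    return f"CategoryProduct-{category_id}_{product_id}"


def is_linked(store: InMemoryEventStore, category_id: str, product_id: str) -> bool:
    # Rehydrate only the tiny relationship stream, never the whole category.
    linked = False
    for event in store.read(relationship_stream(category_id, product_id)):
        if event["type"] == "RelationshipCreated":
            linked = True
        elif event["type"] == "RelationshipRemoved":  # assumed event type
            linked = False
    return linked


def add_product_to_category(store: InMemoryEventStore, category_id: str, product_id: str) -> bool:
    """Create the relationship unless it already exists; returns True if created."""
    if is_linked(store, category_id, product_id):
        return False  # here: silently handle the duplicate
    store.append(
        relationship_stream(category_id, product_id),
        {"type": "RelationshipCreated", "categoryId": category_id, "productId": product_id},
    )
    return True
```

Each check reads at most a handful of events, regardless of how many products the category holds. This sketch leaves out any concurrency check on append, which a real store would need to stop two writers from creating the same relationship at once.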

Peter · May 11, 2022, 8:20pm

Hi Steven,

Thank you for sharing this solution!

How did you handle deletion for either entity of the relationship? Did you use the read model to determine the relationships for the entity being deleted?

Cheers,

Peter

steven.blair · May 12, 2022, 7:31am

Possibly both.

So take this example:

Product gets deleted and you want to remove all references in a category for this product.
Rather than scan through all the relationships, you rehydrate the Product, Category and the CategoryProduct stream.
Assuming CategoryProduct exists, you would then delete it (so your domain remains valid).
Your read model could then be used for displaying the Category / Products (but not enforcing business rules).

To take this a bit further, if the product is deleted and you don’t know the Categories it belongs to, then in my head you have two options:

1. Query to get a list and work through it to remove the relationships.
2. Simply mark the Product as deleted. Any future update will then rehydrate the Product first and check whether it is deleted, or your read model picks up that the product was deleted and simply doesn’t show it in Categories:

select * from CategoryRelationships where Product is not disabled
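
The read-model side of the second option could be sketched roughly as below. This is only an illustration: the `CategoryProductsReadModel` class and the `RelationshipRemoved` and `ProductDeleted` event types are assumed names, and the projection is kept in memory rather than in a database table.

```python
from collections import defaultdict


class CategoryProductsReadModel:
    """Hypothetical projection: which products to show per category."""

    def __init__(self):
        self.links = defaultdict(set)   # category id -> product ids
        self.deleted_products = set()   # products marked as deleted

    def apply(self, event: dict) -> None:
        kind = event["type"]
        if kind == "RelationshipCreated":
            self.links[event["categoryId"]].add(event["productId"])
        elif kind == "RelationshipRemoved":      # assumed event type
            self.links[event["categoryId"]].discard(event["productId"])
        elif kind == "ProductDeleted":           # assumed event type
            # No fan-out: the relationships stay, the product is just flagged.
            self.deleted_products.add(event["productId"])

    def products_in(self, category_id: str) -> list:
        # Equivalent of filtering out disabled products in the query above.
        visible = self.links.get(category_id, set()) - self.deleted_products
        return sorted(visible)
```

The relationship rows for a deleted product are still there, so the model can be cleaned up later in a separate pass without affecting what is displayed.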

alexey.zimarev · May 12, 2022, 1:49pm

I think it should be the opposite. Products might belong to more than one category, so the list is always small. It can change. Categories are static. They are descriptive, but might not even have any business meaning.

A category might not even be an aggregate.

Peter · May 12, 2022, 4:03pm

Hi Alexey,

Thanks for the feedback.

In our use case the category needs to be an aggregate. Considering it as some sort of tag doesn’t work for us.

We’ve chosen to reference products from categories, as products must be able to exist without any form of categorization. Deleting a category then requires only deleting the category itself. Deleting a product would require changing only the categories involved.

Referencing the categories from the product does make the list small. But what happens when a category is deleted? Then all products referencing the category should be changed, right?

Another way could be, as Steven suggested, to keep some sort of state in the read model and exclude entities based on this state. Of course the read model can also be “scavenged” to keep it clean.

Thanks again so far for your ideas guys!

Cheers,

Peter

alexey.zimarev · May 13, 2022, 7:00am

It’s a normal reference between aggregates, and, indeed, keeping the category ids in the Product aggregate would improve the design. Using the query side to find all the products for a given category (also for removing the category id after a category was removed) is something I would do in this case. You will need to fan out commands to each product that belongs to a deleted category, and doing it reliably is the hardest part. At the same time, this particular case is not that hard as the command to remove the category from a product is easy in terms of ensuring idempotence. If you send it twice nothing bad will happen.
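
A rough sketch of why the fan-out is safe to retry, assuming the Product aggregate keeps its category ids. The `Product` class, the `CategoryRemovedFromProduct` event type and the `products_by_category` lookup (standing in for the query side) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Product:
    product_id: str
    category_ids: Set[str] = field(default_factory=set)
    events: List[dict] = field(default_factory=list)  # newly emitted events

    def remove_category(self, category_id: str) -> bool:
        """Idempotent command handler: a repeated call emits nothing."""
        if category_id not in self.category_ids:
            return False  # already removed, nothing to do
        self.category_ids.discard(category_id)
        self.events.append(
            {"type": "CategoryRemovedFromProduct",  # assumed event type
             "productId": self.product_id, "categoryId": category_id}
        )
        return True


def fan_out_category_deleted(
    category_id: str,
    products_by_category: Dict[str, List[str]],  # stands in for the read model
    products: Dict[str, Product],
) -> int:
    """Send a remove-category command to every product found by the query side."""
    changed = 0
    for product_id in products_by_category.get(category_id, []):
        if products[product_id].remove_category(category_id):
            changed += 1
    return changed
```

If the process crashes halfway through and the whole fan-out is run again, products that were already updated ignore the command and only the remaining ones change.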