Is this a valid approach for snapshots and read models?

I am sorry about the newbie questions; this is my first time attempting to implement an event store. Can someone please either clarify or correct my design for both snapshots and updating read models?

Read-models:

Is this a valid/recommended strategy?

  1. Subscribe to all streams (in a background service)

  2. For each event, check the stream id and ignore events whose stream id does not start with a given prefix

  3. For matching events, store the ids of the changed streams in a concurrent set (e.g. a ConcurrentDictionary used as a hash set; a ConcurrentBag would allow duplicates)

  4. Every 30 seconds or so, for each changed aggregate, load the aggregate, replaying any events since the previous snapshot

  5. Save the aggregate (probably in mongodb) replacing the existing version

Is this reasonable?
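Roughly, I'm imagining something like the sketch below. This is only to show the shape of the idea: IEventSubscription and IReadModelUpdater are placeholder interfaces of mine rather than a real EventStore client API, and "stock-" is an example prefix.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public interface IEventSubscription
{
    // Invokes the callback with the stream id of every event appended to any stream.
    void SubscribeToAll(Action<string> onEvent);
}

public interface IReadModelUpdater
{
    // Loads the aggregate (last snapshot + newer events) and upserts it into MongoDB.
    Task RebuildAsync(string streamId);
}

public sealed class ReadModelProjectionService
{
    private readonly ConcurrentDictionary<string, byte> _dirtyStreams = new();
    private readonly IReadModelUpdater _updater;

    public ReadModelProjectionService(IEventSubscription subscription, IReadModelUpdater updater)
    {
        _updater = updater;
        // Steps 1-3: subscribe to all streams, keep only matching stream ids,
        // de-duplicated in a concurrent set.
        subscription.SubscribeToAll(streamId =>
        {
            if (streamId.StartsWith("stock-"))
                _dirtyStreams.TryAdd(streamId, 0);
        });
    }

    public async Task RunAsync(CancellationToken ct)
    {
        // Steps 4-5: every ~30 seconds, rebuild the read model for each changed stream.
        while (!ct.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromSeconds(30), ct);
            foreach (var streamId in _dirtyStreams.Keys)
            {
                _dirtyStreams.TryRemove(streamId, out _);
                await _updater.RebuildAsync(streamId);
            }
        }
    }
}
```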

Snapshots:

I think I understand how to store and retrieve snapshots based on information gleaned from previous posts. What I am not sure about is the mechanism for initiating a request to take a snapshot, given that I want the snapshotting itself performed within a background service. Looking at NEventStore, they have a mechanism for retrieving streams due for snapshotting (GetStreamsToSnapshot). My understanding is that EventStore does not provide a mechanism for querying streams, so a different approach is required? Does the following approach therefore sound reasonable? (A rough sketch in code follows the list.)

  1. When loading an aggregate, check whether the number of events since the last snapshot exceeds a given threshold.

  2. If the threshold is exceeded, create a TakeSnapshot message and store it on a Snapshots stream.

  3. Subscribe to the snapshots stream in a background service.

  4. Take the snapshots.
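In code, I picture steps 1 and 2 looking something like this (a sketch only; IEventStore is a stand-in for whatever client is actually used, and the threshold is arbitrary):

```csharp
using System.Threading.Tasks;

public record TakeSnapshotRequested(string StreamId, long AtVersion);

public interface IEventStore
{
    Task AppendAsync(string streamId, object @event);
}

public sealed class AggregateLoader
{
    private const int SnapshotThreshold = 500;   // arbitrary threshold
    private readonly IEventStore _store;

    public AggregateLoader(IEventStore store) => _store = store;

    // Called after loading an aggregate (step 1). Steps 3 and 4 happen in the
    // background service that subscribes to the "snapshots" stream.
    public async Task RequestSnapshotIfDueAsync(string streamId, long version, long lastSnapshotVersion)
    {
        if (version - lastSnapshotVersion > SnapshotThreshold)
            await _store.AppendAsync("snapshots", new TakeSnapshotRequested(streamId, version));
    }
}
```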

I don’t particularly like this approach for the following reasons:

a) A snapshot request will only be initiated on loading an aggregate. I would prefer this to happen when writing the event, but that is currently an append-only operation for performance reasons (i.e. we are not loading previous events).

b) I would rather avoid initiating snapshot requests when loading the aggregate, preferring to make all snapshotting concerns the responsibility of the snapshot service.

The other idea I had is to use the read model to provide the information on which streams exist. But this seems a bit wrong?

I am probably missing something here?

Thanks,

Dan

How many events per stream do you have that you have such a focus on snapshots?

In another thread on here, 15k events loaded in about 200ms.

I’m working for a large online retailer and want to record the tracking of stock movements. Unfortunately I do not have the figures for the best-selling product to hand (as I am out of the office), but sales have grown exponentially over the last few years.

Given this, I would estimate that the event counts for the best-selling products could potentially move into the hundreds of thousands, although most SKUs will be nowhere near this.

Currently, reservation and allocation are done synchronously on the website, and it is important that the stock aggregate is fully loaded before accepting orders, hence why performance is a concern.

How fast are the streams growing? I’d tend towards snapshotting into another stream every n events on write, or queueing them for snapshotting, or something along those lines.
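Very roughly, the on-write variant could look like this (the store interface and the stream naming are made up for illustration):

```csharp
using System.Threading.Tasks;

public interface IVersionedAppender
{
    Task<long> AppendAsync(string streamId, object @event);   // returns the new version
}

public sealed class SnapshottingWriter
{
    private const int SnapshotEvery = 1000;   // "n", picked arbitrarily
    private readonly IVersionedAppender _store;

    public SnapshottingWriter(IVersionedAppender store) => _store = store;

    public async Task AppendAsync(string streamId, object @event, object currentState)
    {
        long newVersion = await _store.AppendAsync(streamId, @event);

        // Every n-th event, write the current state into a companion stream
        // (or enqueue the stream id for a background snapshotter instead).
        if (newVersion % SnapshotEvery == 0)
            await _store.AppendAsync("snapshot-" + streamId, currentState);
    }
}
```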

If you’re not loading previous events on write, how would you ensure consistency anyway (which seems to be a concern, if you’re doing everything synchronously)?

James

So to clarify, we have two different styles of events: stock-movement events, and allocations and reservations. Stock movements represent changes to stock levels in the warehouse; these are processed asynchronously and can just be appended to the stream without worrying about concurrency. Allocations and reservations are done from the website and must be made in real time against an up-to-date aggregate so that we are not selling stock we do not have; concurrency is important for these event types. It is critical that this is done without any delay to the customer.

So I guess the snapshots could be taken whilst writing the reservation/allocation style events, but it seems wrong to create snapshots only for these event types. Also, some guidelines I have read recommend that snapshot creation be done as a background process, hence this aspect of my proposed design. Am I wrong to worry about this?

Do you have a comment on my proposed design for creating the read models? It might just be me, but there do not appear to be many concrete examples of implementing a realistic event-sourcing application, hence why I am keen to make sure my design makes sense.

For aggregates that sustain a high rate of events, we have been successful caching them in memory. This obviously requires a reasonable routing infrastructure, but if allocations and reservations are coming in many times per second, you may find this a little easier to deal with, and it mostly obviates the need for snapshots (no big deal if the object takes 5 seconds to load, as that happens rarely). You’d be surprised how many aggregates you can process with reasonable async-focused code, processing against in-memory objects.
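As a sketch of the shape (the types and the per-aggregate locking here are mine, just to illustrate):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// One in-memory aggregate per stream, loaded once (the slow replay is fine
// because it is rare), then all commands run against the cached instance,
// serialized per aggregate. Assumes a router sends all commands for a given
// aggregate to the same node.
public sealed class InMemoryAggregateCache<TAggregate>
{
    private sealed class Entry
    {
        public readonly Lazy<Task<TAggregate>> Aggregate;
        public readonly SemaphoreSlim Gate = new(1, 1);
        public Entry(Func<Task<TAggregate>> load)
            => Aggregate = new Lazy<Task<TAggregate>>(load);
    }

    private readonly ConcurrentDictionary<string, Entry> _cache = new();
    private readonly Func<string, Task<TAggregate>> _loadFromStore;

    public InMemoryAggregateCache(Func<string, Task<TAggregate>> loadFromStore)
        => _loadFromStore = loadFromStore;

    public async Task ExecuteAsync(string streamId, Func<TAggregate, Task> command)
    {
        var entry = _cache.GetOrAdd(streamId, id => new Entry(() => _loadFromStore(id)));
        var aggregate = await entry.Aggregate.Value;   // replay happens at most once
        await entry.Gate.WaitAsync();
        try { await command(aggregate); }              // process against in-memory state
        finally { entry.Gate.Release(); }
    }
}
```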

But it’s hard to imagine that, between the website and the warehouse inventory system, you have a guaranteed order of delivery that absolutely forestalls the possibility of overselling your inventory. In that case you must have a system of compensating actions anyway. (“Sorry, we sold you something we don’t have.”) Would a standard optimistic concurrency/competing consumer approach then handle your problem?
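By that I mean something like the usual retry loop (a sketch; IVersionedStore and the exception type are stand-ins for whatever the client provides):

```csharp
using System;
using System.Threading.Tasks;

// Standard optimistic-concurrency loop: read the stream version, decide,
// append with that expected version, and retry on a conflict.
public interface IVersionedStore
{
    Task<(object State, long Version)> LoadAsync(string streamId);
    Task AppendAsync(string streamId, object @event, long expectedVersion);
}

public sealed class WrongExpectedVersionException : Exception { }

public sealed class ReservationHandler
{
    private readonly IVersionedStore _store;
    public ReservationHandler(IVersionedStore store) => _store = store;

    public async Task ReserveAsync(string streamId, object reservationEvent, int maxRetries = 3)
    {
        for (int attempt = 0; ; attempt++)
        {
            var (state, version) = await _store.LoadAsync(streamId);

            // ... check 'state' here: is there enough stock to reserve? ...

            try
            {
                // Succeeds only if no one else appended since 'version'.
                await _store.AppendAsync(streamId, reservationEvent, expectedVersion: version);
                return;
            }
            catch (WrongExpectedVersionException) when (attempt < maxRetries)
            {
                // Someone got there first: reload and re-check before retrying.
            }
        }
    }
}
```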

Thanks for the suggestions.

I was hoping to expose the allocations/reservations through a stateless RESTful service, so I am reluctant to go down the in-memory route.

I haven’t really got to the point of thinking through the concurrency for allocations and reservations. You are correct that we cannot guarantee against overselling inventory, but when it happens it costs money and hurts customer experience, so I need to guard against it as much as possible. Thinking this through (as I write this post), I think I could just disable optimistic concurrency (ExpectedVersion.Any) for allocations/reservations, provided I am reading from an up-to-date aggregate.

Thanks for the thought provoking post. Have you got any comments regarding my design for read-models? Does this seem reasonable?

Dan

This is funny, as it’s an example I actually use in my class for eventual consistency.

On such a system I would normally have all events with no consistency. For many systems like this I would not even have a domain model, as the business logic is minimal. Instead I would push events in, then query off the (eventually consistent) read model to see if things are allowed.

A big part of this, especially for things like stock movements, is that the goal of the system should not be to say “yes or no” to being allowed, but instead to track the fact that the movement occurred and generate an exception report of things that may be fishy (something moved from A-B and from A-C).

From a consistency perspective, the focus changes from making everything perfect to lowering the cost of failures (you mention very real costs). Failures will happen no matter what in a warehousing system, as at the end of the day your system is not the book of record (the warehouse is). There is even domain-specific terminology that covers these concerns, such as “lossage”. Instead of trying to proactively prevent all the issues, focus on detecting them quickly after they happen.

I know what is being discussed is flipping the problem over on its head and thinking about it in a different way, but having worked on a warehousing system, this model ended up being much closer to the way the domain experts actually thought about the problem. It also offers much higher availability than trying to make the system fully consistent. In class we go through how, with devices, the system could appear to “never be down”; it’s just that the probability of failure gets higher and the time to detection gets longer (i.e. more business risk as opposed to downtime; punt the problem to the business people :))

Greg

Apologies for attempting to revive this thread, but I still have some questions that I hope someone can help me with.

I agree with the point regarding eventual consistency; however, I do think it would be prudent to do as much as possible to use the most up-to-date stock levels before performing stock allocation. To this end, would it be a reasonable approach to store the latest processed eventId as part of the read model, and process any new events as part of the stock allocation process?
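Something like this is what I have in mind (a sketch; IStreamReader, StockReadModel, and the field names are all mine):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// The read model stores the position of the last event it processed; at
// allocation time, read any events after that position and apply them
// before deciding.
public sealed class StockReadModel
{
    public string Sku { get; set; }
    public int StockLevel { get; set; }
    public long LastProcessedEventNumber { get; set; }   // checkpoint
}

public interface IStreamReader
{
    // Events in the stream after the given event number, in order.
    Task<IReadOnlyList<(long Number, int Delta)>> ReadAfterAsync(string streamId, long afterNumber);
}

public sealed class AllocationChecker
{
    private readonly IStreamReader _reader;
    public AllocationChecker(IStreamReader reader) => _reader = reader;

    public async Task<int> CurrentLevelAsync(StockReadModel model)
    {
        // Catch up from the checkpoint so the allocation decision also sees
        // events the projection has not processed yet.
        var newer = await _reader.ReadAfterAsync("stock-" + model.Sku, model.LastProcessedEventNumber);
        int level = model.StockLevel;
        foreach (var (_, delta) in newer)
            level += delta;
        return level;
    }
}
```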

Also, I am having trouble comprehending how to perform stock reconciliation. Currently we receive a report from the warehouse of book stock levels. We take the latest stock levels (in our database), grab any events received since the report was produced, exclude any events whose warehouse-created date is greater than the date the report was produced, and reverse the stock levels for the remaining events. We can do all this easily because we can query on the metadata. How might this be achieved using EventStore? I have seen other threads where querying could be reframed as stream re-partitioning, but I don’t think that helps in this case?
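For reference, what we do today against the database looks roughly like this in code (simplified, with made-up type names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Take the report's level, then reverse any events we received after the
// report was produced whose warehouse-created date is not after the report
// date (i.e. events the report should already reflect).
public sealed class StockEvent
{
    public DateTime ReceivedAt { get; set; }          // when we stored it
    public DateTime WarehouseCreatedAt { get; set; }  // metadata from the warehouse
    public int Delta { get; set; }                    // signed stock movement
}

public static class Reconciliation
{
    public static int ReconciledLevel(int currentLevel, DateTime reportProducedAt,
                                      IEnumerable<StockEvent> events)
    {
        var toReverse = events
            .Where(e => e.ReceivedAt > reportProducedAt)
            .Where(e => e.WarehouseCreatedAt <= reportProducedAt);

        return currentLevel - toReverse.Sum(e => e.Delta);
    }
}
```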

Achieving things using an EventStore is really easy when you do not have many events and can recreate state by replaying each time. When this is not possible due to performance, there is a lot of additional complexity and not much information on how people solve common problems.

Thanks for any assistance.