(Another) question on the appropriate usage of ES projections

Hi all,

I have been combing through the topic history and found lots of similar questions, but most drop off before reaching a conclusion (at least one I understand as a newbie!).

If I wish to fold over some events to derive a state, I can do this by either

  • pulling all events from ES to a client and projecting them into a model there (let's call it client-side projection) - a minimal sketch of what I mean follows this list
  • creating a projection using ES, either over the API or in the web portal (let's call that server-side projection), then pulling the model from the resulting URL

I understand that ultimately the derived state would probably want to be cached in a queryable store on the client, but that could be done using either technique.
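To be concrete, by client-side projection I just mean a fold over the pulled events, something like this (the event and state shapes are made up purely for illustration):

```typescript
// A minimal client-side projection: fold events into state.
// The event/state shapes here are illustrative, not from any real schema.
interface Event {
  type: string;
  data: Record<string, unknown>;
}

interface OrderState {
  placed: number;
  cancelled: number;
}

const initialState: OrderState = { placed: 0, cancelled: 0 };

// Pure fold: apply one event to the current state.
function apply(state: OrderState, event: Event): OrderState {
  switch (event.type) {
    case "OrderPlaced":
      return { ...state, placed: state.placed + 1 };
    case "OrderCancelled":
      return { ...state, cancelled: state.cancelled + 1 };
    default:
      return state; // ignore events we don't care about
  }
}

// Pull all events from the store (however you do that) and derive state.
function project(events: Event[]): OrderState {
  return events.reduce(apply, initialState);
}
```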

My instinct would have been to let ES deal with partitioning and filtering the streams and deriving state, with the client pulling this and updating its local read store.

However, I have read many posts saying ‘don’t create lots of ES projections’.

This is presumably because the load of maintaining them falls entirely on the ES instance, and there may be many consumers with different needs, so you are better off letting them control how they interpret the events and keeping things pretty plain on the ES side, just serving up events in broad buckets like $all, $category, by type, etc. Is that the logic behind this advice?

But then, on the contrary, I see articles on how ES projections are great for temporal queries, stream partitioning, and building state from many aggregates (streams) and caching it.

Most of the posts I find on this topic either end with another question like ‘why / what is this for?’, which doesn’t get answered, or with a ‘you can do x, y, or z, but I wouldn’t’, with little explanation of what you would do instead and, most importantly, why.

I guess you could summarise it like so:

"When would I use server projections vs client projections and why?’

Cheers :slight_smile:

I should probably clarify - by ‘server’ I mean ES itself, and by ‘client’ I mean any downstream consumer, be it another web server or a mobile app or anything

You could implement a CQRS pattern where you save readable models in a NoSQL database. The subscribers on that stream check your “STATE” change and write it to a DB. Or is this not an option?

Hi marc,

Thanks for your reply :). Unfortunately this is another case of (at least to me) the answer going off topic. I want to know, plainly and simply, the reasons why I would use the server projections vs locally derived projections as the source of my read model, however that model is used or stored afterwards.

Hi Ryan,
Well explained question. I believe I’ve asked such a question myself in the past.

I too read the advice “don’t use ES Projections to materialise your aggregates” and I also questioned “why not?”

And I still don’t have a definitive answer.

However, a projection will NOT give you:

  • control over how you ‘snapshot’ your state.

  • a means of skipping an event if a subsequent event has marked it as cancelled (which is one way of dealing with correcting history; see the client-side sketch below).

  • control of what history to replay in the event of resetting a projection (especially if it emits to other streams).

  • any choice as to what language to use - just JavaScript.

  • any assist in reusing projection code, as there is no import/load feature that seems to work.

Also, a projection is very unforgiving if its emitted streams are written to by anything else, or if its emitted stream is deleted. So recovering from some issues may require streams to be soft-deleted and projections reset, or even to use new stream names!

That will not be an exhaustive list as I still consider myself an event-sourcing newbie.
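To illustrate the cancellation point: when you fold client-side you are free to pre-scan the history first, something like this (the event shapes here are hypothetical):

```typescript
// Two-pass client-side fold that skips events a later event has cancelled.
// Event shapes are hypothetical; an ES projection can't easily do this
// because it sees events strictly in order, once.
interface StoredEvent {
  eventId: string;
  type: string;
  data: Record<string, unknown>;
}

function foldSkippingCancelled<S>(
  events: StoredEvent[],
  apply: (state: S, e: StoredEvent) => S,
  initial: S
): S {
  // Pass 1: collect the ids of events marked as cancelled later on.
  const cancelled = new Set<string>(
    events
      .filter((e) => e.type === "EventCancelled")
      .map((e) => e.data["cancelledEventId"] as string)
  );

  // Pass 2: fold, skipping cancelled events and the cancellation markers.
  return events
    .filter((e) => !cancelled.has(e.eventId) && e.type !== "EventCancelled")
    .reduce(apply, initial);
}
```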

There is no single right answer as to how to build a solution from a set of tools and platforms. I think the most important thing is that you have an understanding of the strengths, weaknesses, and risks - i.e. a good awareness of what you don’t know. Then you can make informed decisions whether other people like them or not, and whether or not they turn out to be a ‘good’ decision - which might even be a bad fit but which leads to some positive learning (clouds and silver linings).

So, with all that in mind, the path I took was…

  • I AM using ES Projections to materialise my aggregates.

  • categorised event stream per aggregate.

  • projection per category.

  • my projections emit the materialised aggregate to new streams (roughly sketched after this list).

  • I subscribe to my aggregated streams and load the emitted state into MongoDB, overwriting any previous state.

  • when correcting history I have to take that aggregate offline, rewrite the entire event stream ‘disabling’ (not deleting) any previous erroneous events, and inserting corrected ones as necessary.

  • my downstream aggregate materialisation subscribers have to be idempotent enough to cope with “Groundhog Day” (projection resets).
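For reference, the shape of one of those category projections is roughly this (the stream and event names are made up, and the declares just stand in for the projection runtime's globals):

```typescript
// Sketch of a category projection that re-emits materialised state.
// The declared globals are provided by the ES projection runtime;
// signatures here are from memory, so treat them as approximate.
declare function fromCategory(category: string): {
  foreachStream(): { when(handlers: Record<string, Function>): void };
};
declare function emit(stream: string, type: string, data: object): void;

interface OrderAggregate {
  id: string;
  status: string;
  lines: string[];
}

fromCategory("order")
  .foreachStream()
  .when({
    $init: (): OrderAggregate => ({ id: "", status: "new", lines: [] }),

    OrderPlaced: (state: OrderAggregate, ev: any) => {
      state.id = ev.data.orderId;
      state.status = "placed";
      // Emit the whole materialised aggregate to a per-aggregate stream.
      emit(`order_agg-${state.id}`, "OrderAggregated", state);
      return state;
    },

    OrderCancelled: (state: OrderAggregate, _ev: any) => {
      state.status = "cancelled";
      emit(`order_agg-${state.id}`, "OrderAggregated", state);
      return state;
    },
  });
```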

Furthermore, I have come up with a (suits me) TypeScript approach which lets me…

  • specify versions of Commands and Events.

  • provide revisions of handlers which materialise my evolving events (multiple handlers may be required to cover the full history of how an event has evolved) - see the sketch after this list.

  • combine revisions of handlers to form a versioned aggregate.

  • allow for multiple materialisations of the same aggregate event stream, into multiple emitted aggregate streams of course.
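As a flavour of it, here is the shape of the idea (illustrative names only, not my actual rig):

```typescript
// Illustrative sketch of versioned events and handler revisions.
interface ItemAddedV1 { version: 1; type: "ItemAdded"; sku: string }
interface ItemAddedV2 { version: 2; type: "ItemAdded"; sku: string; qty: number }
type ItemAdded = ItemAddedV1 | ItemAddedV2;

interface BasketState { items: { sku: string; qty: number }[] }

// One handler revision per historical shape of the event.
const itemAddedV1 = (s: BasketState, ev: ItemAddedV1): BasketState =>
  ({ items: [...s.items, { sku: ev.sku, qty: 1 }] }); // V1 had no qty field

const itemAddedV2 = (s: BasketState, ev: ItemAddedV2): BasketState =>
  ({ items: [...s.items, { sku: ev.sku, qty: ev.qty }] });

// Combining revisions of handlers forms the versioned aggregate fold;
// the V1 handler soaks up the breaking change so downstream code only
// ever sees the current aggregate shape.
const itemAdded = (s: BasketState, ev: ItemAdded): BasketState =>
  ev.version === 1 ? itemAddedV1(s, ev) : itemAddedV2(s, ev);

function materialise(events: ItemAdded[]): BasketState {
  return events.reduce(itemAdded, { items: [] });
}
```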

This TypeScript approach has meant that I can…

  • create type-safe event handling code.

  • re-use projection code (my handlers) because tsc (TypeScript compiler) will produce me a single JavaScript file irrespective of the sources.

  • load my resultant JS projection (single JS block from multiple TS sources) into ES via HTTP API.

  • support API versioning at both ends of my Materialisation pipeline (versioned events lead to versioned aggregates - but the handlers in-between can either translate breaking changes or soak them up, thereby protecting one end from change).

  • feed into templated generation of further server-side (Node) or client-side (browser) code.

So far, I have found this to work well for me, but my work has been largely R&D and I have not produced a production system with this yet.

And whether or not others would agree with this approach remains to be heard. I suspect that many would shoot this down in flames. But does that mean it wouldn’t work for me? Or just that it wouldn’t work for them?? Or just that they think it wouldn’t work for them???

I wish that my TypeScript-driven projection generator was further along so that I could share it with you, but it is still only a rig for lab work.

This is still not an answer to your/our question, but I hope it fuels the discussion in a helpful way. I for one would be very happy to discuss further.

Cheers,

Raith

Hi Raith,

That is a most excellent answer, thank you so much!

I totally agree that there is no absolute right / wrong technique which is why I really appreciate you rationalising your approach, I can see not only what you have settled on but why you made those decisions.

I had a similar model to yours in mind for my application, although you are much further along. The main thing I haven’t even begun to really consider properly is the versioning of projections and events, which you have clearly thought about and it sounds like you have a great little system going on there.

I’m also really interested in your TypeScript approach. I am making a mobile app (and eventually a supporting website) which will share a common back end, all written in F#. The web side of it will be using the SAFE stack, which employs Fable to transpile F# into JavaScript on the fly; I wonder if I could do something similar to write my projections in F#. Hmmm.

I have stitched myself up a bit really as I am an experienced mobile dev trying to learn web dev and event sourcing at the same time which is challenging to say the least! I have a bit of downtime at work though so this is mainly a testbed for now to stretch myself a bit and try new things. I have a couple of other related questions which I have posted recently if you fancy chipping in there too haha :wink:

Thanks again for the great input, cheers!

Ryan

I too read the advice “don’t use ES Projections to materialise your aggregates” and I also questioned “why not?”
And whether or not others would agree with this approach remains to be heard. I suspect that many would shoot this down in flames. But does that mean it wouldn’t work for me? Or just that it wouldn’t work for them?? Or just that they think it wouldn’t work for them???

Hate to be that guy but… how do you deal with eventual consistency here? ES projections are not fully consistent, but your aggregates must be.

That is a good point. I was asking in the context of read models rather than aggregates personally.

Although, thinking about it, even if you pull the events straight from the store and materialise them yourself, someone else may commit an event in the meantime whilst you are processing, and cause your optimistic write to fail afterwards.

Wouldn’t this be the same if you used ES projections for aggregates? You would just try to write your events and get an optimistic concurrency exception.
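i.e. something along these lines (the client interface here is assumed for illustration, not a real API):

```typescript
// Sketch of an optimistic append: the write states which stream version
// it was based on, and fails if someone else got there first.
// The EventStoreClient interface is assumed, not a real client API.
interface EventData { type: string; data: object }

interface EventStoreClient {
  readStream(stream: string): Promise<{ version: number; events: EventData[] }>;
  appendToStream(
    stream: string,
    expectedVersion: number,
    events: EventData[]
  ): Promise<void>; // rejects on a version conflict
}

async function handleCommand(
  client: EventStoreClient,
  stream: string,
  decide: (history: EventData[]) => EventData[]
): Promise<void> {
  // Fold the current history into a decision...
  const { version, events } = await client.readStream(stream);
  const newEvents = decide(events);
  // ...then append, expecting nothing has changed meanwhile.
  // If another writer committed first, this throws and we can retry.
  await client.appendToStream(stream, version, newEvents);
}
```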

Our event sourcing is the back-end to a system that is intended to support offline working, so we are embracing eventual consistency in the same bitter-sweet way that we will accept that multiple users can legitimately make conflicting requests, and the computer cannot reasonably say “no” to any of them.

Therefore (almost) all of our commands are expected to be processed asynchronously, and possibly over an extended time. Our UI has to cope with this, which also allows us to take our command processing offline whilst commands continue to be queued, without changing (other than slowing) the overall flow. The UI uses a fake-it-til-you-make-it approach whereby it assumes its data will all be accepted and augments cached server data with generated data, until the server confirms either way.

This is of course a problem for state validations throughout a workflow, which depend on the materialised aggregates. That can cause workflows to fail (invalid states), or to ‘succeed’ when they shouldn’t have. We retry cases of failed state to accommodate the ’eventual’ness of consistency, and simply have to deal with the conflicts arising from commands being processed when the state should not have allowed it.

It sounds bad compared to the always-connected, always-live, always-known, always-reliable statefulness of what is assumed from apps using a strict consistency model. But in an app which allows for offline working and eventual-integration with 3rd party systems, we have accepted the consistency weaknesses as enablers for other features.

If it can go wrong, then it WILL go wrong. And you can’t realistically stop it going wrong. So make sure you can contend with wrong-ness. Or even better, change how you look at things, so it’s not really “wrong” after all - just different from what you’d prefer… :slight_smile:

I’d like to say that it’s working for us, but this is still in development, so it’s still somewhat academic. I’ll let you know when I know!

Raith

Hate to be that guy but… how do you deal with eventual consistency here? ES projections are not fully consistent, but your aggregates must be.

Thanks Joao. I guess that’s the answer as to why not to use ES projections if you demand immediate consistency from event to aggregate.

Due to other considerations (posted separately) we are currently of the mind that we can cope with our aggregates being behind. It’s not without pain, and we may need to change our minds about it, but for now we’re trying to be as accepting of complications as possible rather than trying to rule them out.

The less that kills us, the more that makes us stronger :slight_smile:

@Raith, that’s really interesting, as I am making an occasionally connected mobile app which shares many of the same design concerns.

I have a thread going here about it. Essentially though I did a similar thing to you and also Greg in his presentation on offline working.

I allow commands to be executed whilst offline from the master book, but cache the commands and pending events they generate. When reconnecting I pull the up-to-date master, try to re-execute all the commands, and deal with any conflicts as they come up.

I say ‘do’, that is the plan… I’m getting there slowly but surely haha.
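The reconnect replay is roughly this shape, anyway (everything named here is hypothetical):

```typescript
// Hypothetical sketch of replaying queued offline commands on reconnect.
interface Command { type: string; payload: object }

interface Sync {
  pullMaster(): Promise<void>; // fetch the up-to-date master from the server
  execute(cmd: Command): Promise<void>; // may throw on conflict
  resolveConflict(cmd: Command, err: unknown): Promise<void>;
}

async function replayQueued(sync: Sync, queued: Command[]): Promise<void> {
  await sync.pullMaster();
  for (const cmd of queued) {
    try {
      await sync.execute(cmd); // re-run against the fresh master state
    } catch (err) {
      await sync.resolveConflict(cmd, err); // deal with conflicts as they come up
    }
  }
}
```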

Somedays the only way to cope is to imagine myself…
in a post-apocalyptic wasteland,

with the fallacies of modern computing crumbling around us,

surrounded by broken devs nursing their broken systems,

whilst I stand atop my rickety tower of barely-coupled services which is being swarmed over by an army of nano-scripts weeding out inconsistencies and raising remedial tasks,

cackling like a madman

as a vortex forms above me and threatens to consume all of reality as we know it.

That’s not quite in line with my job description, but it helps me stay sane.

How do you describe your job?

:smiley: hahaha that’s great.

I’m not sure I can compete with the lucidity of your description but I can certainly relate :slight_smile:

I am regularly trying to convince people that the world isn’t actually a linear sequence of operations and we are fooling ourselves and creating fragile systems by pretending it is.

It is another reason I mainly use F# these days with actors communicating async and managing their own state - it is a much closer model to the real world in my eyes, and much safer.

Every time I see my colleagues dealing with a null ref or tracing some weird chain of mutable effects I die a little inside. They are coming around though, if only because I am always enjoying myself and they are always stressed.

The main thing I am having to learn quickly is to keep some sanity to the message passing, and since finding out about ES recently and the concepts of process managers / sagas I feel I am on the right track.

Sagas…

“it’s easy”, they said, “if the saga collapses subsequently, then create compensatory events to revert the previous one”

Which is fine when your example is a bank balance or other purely quantitative property. How many of those do you have in your solution?

If service B invalidates the saga, then service A (having already accepted its part) may have since moved on such that it’s no longer acceptable to revert its previous event. How to deal with a failed compensation?

add it to the list…

Nothing is easy :wink: At least you are solving an actual business problem (as in, how do we sync our departments) rather than some artificial problem you have created for yourself.

https://www.youtube.com/watch?v=axEK3x5KIYc

Hi Ryan,

The tl;dr answer on why not to use too many internal projections has two parts.

First: write throughput/load aka Write Amplification

ES benchmarks at something well north of 30k EPS, but … every new event emitted by an internal projection is a new write as well.

So, with 1 projection that emits on every event written (close to the worst case), you're down to 15k EPS; with 2 projections, 7.5k, and so on - each such projection roughly halves your effective write throughput.

Now, most projections don't write on every event, but we have seen systems with a 17-to-1 write amplification or more.

Using projections for something like CEP is fine - where something like a sliding window over a stream is evaluated and new events are raised when certain conditions are met - but cases where transient state is generated in a projection and written back to the event store generally don't go well.
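For example, a sliding-window style projection that raises a genuinely new business event (the stream and event names are invented; the declares stand in for the projection runtime's globals):

```typescript
// Sketch of a CEP-style projection: keep a small sliding window in
// projection state and raise a new event when a condition trips.
declare function fromStream(stream: string): {
  when(handlers: Record<string, Function>): void;
};
declare function emit(stream: string, type: string, data: object): void;

interface WindowState { readings: number[] }

fromStream("sensor-42").when({
  $init: (): WindowState => ({ readings: [] }),

  TemperatureRead: (state: WindowState, ev: any) => {
    state.readings.push(ev.data.celsius);
    if (state.readings.length > 5) state.readings.shift(); // window of 5

    const avg =
      state.readings.reduce((a, b) => a + b, 0) / state.readings.length;
    if (avg > 80) {
      // Raise a new business event - not transient read state.
      emit("alerts", "SensorOverheating", { sensor: "sensor-42", avg });
    }
    return state;
  },
});
```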

Second: Primary vs Transient state

The business events of the system are what I refer to as primary state. They happened and are not going to change. If a change is needed, you raise a new event for the change. The lifetime of this data is very long-term, and it is the primary data the event store is meant to hold.

Something like a read model is transient state. Transient state is generated by applying some logic/functions to a stream of primary data. It can be dropped, corrected, or restated at any time. The lifetime here is very different, and there is a lot more of it. The state of an aggregate is transient, for example.

Event Store is designed to hold streams of primary data. If transient data is placed in the event store, it is by definition going to need to be cleaned up at some point. The scavenging and index merge processes are effective, but as systems get very large these can take quite some time, just due to the amount of time to read the required data.

This all goes back to the core concept of CQRS and optimizing the data storage for the use case. Follow that pattern and things will generally work well.

Notes:

As all of the transient data can be re-created, in-memory and simple file caches often work very well for these transient data stores.

Also, projections that emit link-to events are much more efficient - they are really creating new indexes/streams of pointers to events - and are generally outside of the above.
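For example, a pure re-partitioning projection that only writes links (stream names invented; declares stand in for the projection runtime):

```typescript
// Sketch of a projection that only emits link-to events: it builds a new
// index/stream of pointers without copying event data.
declare function fromStream(stream: string): {
  when(handlers: Record<string, Function>): void;
};
declare function linkTo(stream: string, event: object): void;

fromStream("orders").when({
  OrderPlaced: (_state: unknown, ev: any) => {
    // Partition orders by country into per-country link streams.
    linkTo(`orders-${ev.data.country}`, ev);
  },
});
```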

-Chris

The crux on sagas for me is writing the process as a set of potentially failing requests (think of a write-ahead log): PlacementMade becomes PlacementRequested, and some time later PlacementAccepted, which matches the actual business activity.

Or, with Greg's plow delivery example:

  • OrderComplete
  • DeliveryRequested
  • NewTruckRouting
  • OrderCanceled
  • DeliveryCanceled
  • TruckRoutingChanged
This is an area where DDD can really help.
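To make the request/accept shape concrete, a process manager for the cancellation leg might look roughly like this (only the event names above come from the example; the Requested/Rejected ones are assumed for illustration):

```typescript
// Sketch of a process manager for the cancellation leg of the plow
// delivery example. Each step is a request that may fail, rather than
// a fact we assume has already happened (the write-ahead-log idea).
type SagaEvent =
  | { type: "OrderCanceled"; orderId: string }
  | { type: "DeliveryCancellationRequested"; orderId: string }
  | { type: "DeliveryCanceled"; orderId: string }
  | { type: "DeliveryCancellationRejected"; orderId: string; reason: string };

// Decide what to request next in response to each event.
function react(ev: SagaEvent): SagaEvent[] {
  switch (ev.type) {
    case "OrderCanceled":
      // Ask, don't assume: the delivery may already be on the truck.
      return [{ type: "DeliveryCancellationRequested", orderId: ev.orderId }];
    case "DeliveryCanceled":
      return []; // downstream, TruckRoutingChanged follows from this
    case "DeliveryCancellationRejected":
      // The compensation itself failed; what happens next (e.g. a
      // collection workflow) is a domain decision, not a technical one.
      return [];
    default:
      return [];
  }
}
```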

Hi Chris!

Thanks for another great answer. So glad I asked this question now.

I had considered the performance impact of relying on ES to handle lots of projection logic; it stands to reason, really, if you centralise the computation for potentially many different consumers in one place.

Regarding the transient vs primary state - I think I see what you mean. I am considering things like the ‘xbox one count in all orders’ example given on the ES website.

It is maintaining a read state by spanning multiple aggregates, and I guess represents something like a sales reporting feed in a production system.

Would you still avoid that kind of thing in favour of pulling all the sales events to your client and projecting there?

Thanks Chris, all good points.

[sorry, wandering off-topic here]

Regarding the saga example, the delivery could potentially be made whilst there is a pending request to cancel it. That service could receive the DeliveryCanceled command/event after another event from the delivery driver has transitioned the state to Delivered. The state of the delivery no longer supports cancellation.

At this point perhaps a new workflow needs kicking off to collect the unwanted delivery. Or perhaps, having been delivered, the company’s policy is that the order cancellation is no longer valid - although if that were the case then I expect they would be the other way round in the saga (cancel the delivery first, and only allow the order to be cancelled if successful).

Either way, it still seems to me that this saga might not be able to execute satisfactorily.

As I lack practical experience in this I initially thought that the saga would just need further extension to cope with these other what-if scenarios. But resolving those could also go wrong. In your experience, is it better to have sagas which try to cover every eventuality, or to keep the saga simple (just the intended transaction and best-case compensations), and to bow out when these (hopefully rare) fail conditions crop up?

Absolutely, and kinda what I was getting at when I said at least Raith is solving business problems when creating sagas, including the error handling, as the way you handle contexts getting out of sync is a domain problem through and through.