Benefits of EventStoreDB projections over projections written in the application

I am new to CQRS and the event sourcing pattern. Currently, I am working on a POC to demonstrate event sourcing using EventStoreDB. The primary objective is to check how this can help us solve some of the technical challenges related to a business requirement.
I have some questions about projections.
What are the benefits of using the projection feature offered by EventStoreDB over projections written in the application code? (The application would be listening to the events to maintain materialized views.)
Are there any benefits related to the consistency of the views?

The primary use of the projection feature is to create new streams based on existing ones: essentially indexes, with the ability to transform data.

In terms of consistency, they will lag behind as well.

Now, in your case it seems better to have a catch-up subscription and then build the read model in the store of your choice (mind, this can even be in memory at first!).

Those are a few examples:
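For illustration, here is a minimal sketch of that catch-up-subscription approach in Java. It assumes the com.eventstore.dbclient gRPC client (exact class and method names may differ between client versions) and uses an in-memory map as the read model; the "StateChanged" event type is a made-up example:

```java
import com.eventstore.dbclient.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Application-side projection sketch: a catch-up subscription feeding an
// in-memory read model (a plain map), which can later be replaced by any store.
public class InMemoryReadModel {
    // latest known state per aggregate, keyed by stream id
    private final Map<String, String> stateByAggregate = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        EventStoreDBClient client = EventStoreDBClient.create(
                EventStoreDBConnectionString.parseOrThrow("esdb://localhost:2113?tls=false"));

        InMemoryReadModel readModel = new InMemoryReadModel();

        SubscriptionListener listener = new SubscriptionListener() {
            @Override
            public void onEvent(Subscription subscription, ResolvedEvent resolved) {
                RecordedEvent event = resolved.getOriginalEvent();
                // project only the event types this read model cares about
                if (event.getEventType().equals("StateChanged")) {   // hypothetical event type
                    readModel.stateByAggregate.put(
                            event.getStreamId(),
                            new String(event.getEventData()));       // parse the JSON payload in real code
                }
            }
        };

        // catch up from the start of $all, then keep receiving live events
        client.subscribeToAll(listener, SubscribeToAllOptions.get().fromStart()).get();

        Thread.currentThread().join();   // keep the process alive; a real service has its own lifecycle
    }
}
```

Once the in-memory version works, the map can be swapped for whatever store suits the read side (SQL, a document store, ...) without touching the write side.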

@yves.lorphelin Thank you very much for the clarification, it helped a lot.
It would be great if you could comment on the following queries as well.

The following is my use case.
I have an aggregate. A stream in the event store represents an aggregate. Whenever it changes state, an event is created and saved into the event store. To get the latest state of the aggregate, I replay all the events and apply them to the aggregate.
Now I want to calculate the average time taken by a given aggregate to change its state from state 1 to state 4, or state 1 to state 2, or state 3 to state 4, etc. This is for generating a report.

Should I use eventstore projections or application projections for this use case? Which one will you suggest?

Note: There will be millions of aggregates. Each aggregate will have at least 5 or 6 state-change events.

I have one more question.
If I store state in a projection, how costly will the memory consumption be? For example: I have created a projection that listens to a certain type of event and saves one of the event's fields to an array in the projection state. The event will be received for every aggregate. My aim is to run this projection continuously and query the projection state whenever I need the list of that field for all the aggregates.
The following is a sample state.
{
  "matched_ids": [
    "8475487564",
    "5548545545",
    "7876876676",
    "4545646555",
    "4545455455",
    "0898987777"
  ]
}
This state is only for a few aggregates. The list will grow as I have more aggregates. Is it practical to keep such a projection and query its state if I have millions of aggregates (one stream represents one aggregate)?

memory consumption

It's going to be approximately equal to the state size in bytes.
Keep in mind that since the state is kept as an event in the database, there is a limit on how big it can be (~16 MB).
You also need to consider that this will grow your database size: the state is kept in a system-managed stream and appended to that stream on a regular basis.
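To put rough, back-of-the-envelope numbers on it, assuming ids of about 10 characters as in the sample state above: each array entry costs roughly 13 bytes of JSON (the id plus two quotes and a comma), so one million aggregates already puts the state around 13 MB, uncomfortably close to the ~16 MB limit, and several million aggregates would not fit in a single projection state at all.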

This is for generating a report.

Reports tend to be point-in-time, not really needing a near-real-time update.
Also, the SLA for reports tends to be different from the rest of the system (i.e. they can be down for a few hours, while other processing cannot).
They also tend to evolve at a different pace than the rest of the system.
This makes me tend to put the report-generation features of the system in separate processes, with their own SLA.
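Such a separate reporting process could compute the state-transition averages from the events with plain application code. A rough sketch, where the StateChange shape, the field names, and the chronological ordering of the input are assumptions for illustration:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Report sketch: average time aggregates take to move from one state to another.
// StateChange is a hypothetical shape for the state-change events fed into one report run.
record StateChange(String aggregateId, String state, Instant timestamp) {}

class TransitionReport {
    /** Average duration from fromState to toState, over all aggregates that made the transition. */
    static Duration averageTransition(List<StateChange> events, String fromState, String toState) {
        Map<String, Instant> enteredAt = new HashMap<>();   // when each aggregate entered fromState
        List<Duration> durations = new ArrayList<>();
        for (StateChange e : events) {                       // events assumed in chronological order
            if (e.state().equals(fromState)) {
                enteredAt.putIfAbsent(e.aggregateId(), e.timestamp());
            } else if (e.state().equals(toState) && enteredAt.containsKey(e.aggregateId())) {
                durations.add(Duration.between(enteredAt.remove(e.aggregateId()), e.timestamp()));
            }
        }
        if (durations.isEmpty()) return Duration.ZERO;
        long avgMillis = (long) durations.stream().mapToLong(Duration::toMillis).average().orElse(0);
        return Duration.ofMillis(avgMillis);
    }
}
```

Feeding it is the same catch-up-subscription mechanism sketched earlier; the point is that nothing in the report calculation needs to run inside the event store itself.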

@yves.lorphelin Thanks for the clarifications.
One more question.
I am using one stream per aggregate. The stream name is the aggregate's unique identifier. The aggregate has one more unique field. The value of that field is updated through a command. When the command is received at the write component, I need to validate the unique field's value across the entire aggregate space. No two aggregates can have the same value for that field.

What could be the best way to do such a validation?

The following are some of the options I figured out.

  1. Validate at the write component: Maintain a small read store at the write component which holds the values of that unique field for all the aggregates. When the command is received, validate the value against that read store. If the validation is successful, create the respective event and save it into the event store.

  2. Validate at the write component: Maintain a separate stream (by using an event-type projection) holding all the events that change the value of that unique field. When the command is received, replay all the events from the event store and check the new value against the existing values.
    This method can have a performance impact when there are millions of events. (Can we use snapshotting to resolve the performance issue here? If yes, could you please share good documentation on snapshotting in Java?)

  3. Validate at the read component: Once the command is received, continue with the event creation and save it into the event store. There will be a read component listening to the events and maintaining a materialized view. If the value of that unique field is invalid, the read component will send an exception event to the write component to invalidate that action. But this approach is very complicated and can cause many problems due to eventual consistency.

Are there better methods?

Just to be sure I understand your stream naming convention: will it be [AggregateType]-[Identifier] or just [Identifier]?
(Most of the time, stream names follow the convention [AggregateType]-[Identifier].)

Now, uniqueness on a secondary value is a huge topic and is very domain-related.
There are ways to achieve this: the reservation pattern (see the sketch below), stream naming, …
I get this type of requirement a lot, and more often than not it turns out that the value does not actually need to be unique, and if it does, not necessarily right away.
That's where the requirements need to be challenged:
Why does that second field need to be unique? What does it represent?
If it's unique, why is it not the identifier?
If it's unique, why don't we have that value at the creation of the entity?
Sometimes just a post check is enough, i.e. detect duplicate values and have some compensating action.
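One common way to do the reservation pattern with EventStoreDB is to append a reservation event to a stream named after the value itself, with an expected revision of "no stream", so the database itself rejects a second reservation of the same value. A rough sketch, assuming the com.eventstore.dbclient Java client; the stream and event names are made up, and exact builder/option names may differ between client versions:

```java
import com.eventstore.dbclient.*;
import java.util.Map;

// Reservation-pattern sketch: one stream per unique value.
// The first writer to create the stream wins; a second append with an
// expected revision of "no stream" is rejected, signalling the value is taken.
class UniqueValueReservation {
    private final EventStoreDBClient client;

    UniqueValueReservation(EventStoreDBClient client) {
        this.client = client;
    }

    /** Returns true if the value was reserved, false if another aggregate already holds it. */
    boolean tryReserve(String uniqueValue, String aggregateId) {
        EventData reservation = EventData.builderAsJson(
                "UniqueValueReserved",
                Map.of("value", uniqueValue, "aggregateId", aggregateId)).build();
        try {
            client.appendToStream(
                    "uniqueValue-" + uniqueValue,   // hypothetical naming convention
                    AppendToStreamOptions.get().expectedRevision(ExpectedRevision.noStream()),
                    reservation)
                  .get();
            return true;                            // stream created: the value is ours
        } catch (Exception e) {
            // a WrongExpectedVersionException here means the value is already taken;
            // real code should separate that case from transient failures
            return false;
        }
    }
}
```

In real code you would only append the aggregate's own event once the reservation succeeds, and release or expire reservations that are never followed up.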

Thanks again @yves.lorphelin. Your comments are really insightful.
It is just [Identifier]. Currently, I am storing only one type of aggregate in the event store, so I thought we don't have to add the aggregate type to the stream name.

By post check, do you mean that something like the 3rd approach in my previous comment is enough?

It is just [Identifier]. Currently, I am storing only one type of aggregate

I would revise that; before you know it you'll have more :smile:

By post check, do you mean that so

No, blocking the append on a read model is going to hurt throughput.
What I'm suggesting is having a different process building a (private) read model that is dedicated to checking for duplication and that can trigger corrective actions, either automatically or through some manual intervention: essentially a watchdog of some sort.
The best way to achieve what you want is highly dependent on your domain, throughput, …
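A sketch of such a watchdog read model, fed by the same kind of catch-up subscription as above; the event shape and the corrective action are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Watchdog sketch: a private read model that remembers which aggregate first
// claimed each unique value and flags later claimants as duplicates.
class DuplicateValueWatchdog {
    private final Map<String, String> ownerByValue = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> onDuplicate;   // (offendingAggregateId, value)

    DuplicateValueWatchdog(BiConsumer<String, String> onDuplicate) {
        this.onDuplicate = onDuplicate;
    }

    /** Call this for every value-assignment event delivered by the catch-up subscription. */
    void onValueAssigned(String aggregateId, String value) {
        String existingOwner = ownerByValue.putIfAbsent(value, aggregateId);
        if (existingOwner != null && !existingOwner.equals(aggregateId)) {
            // the value is already owned by another aggregate: trigger the compensating action
            onDuplicate.accept(aggregateId, value);
        }
    }
}
```

The onDuplicate callback is where the compensating action lives: anything from appending a hypothetical ValueRejected event for the offending aggregate to raising a ticket for manual intervention.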

Where does that second unique field come from?
If it comes from some external system, could you just trust it?

Here are some thoughts on the matter:


@yves.lorphelin Thanks. It helps.
In my case, the second unique field can either come from an external system or be produced by one of our internal applications. It depends on the customer.

I would revise that; before you know it you'll have more :smile:

:sweat_smile:
As of now, I am planning to convert only one microservice to event sourcing. It deals with only one type of aggregate.
We also have other microservices which interact through events and maintain their own aggregates. Those services will continue with the traditional approach.
But do you think that the real benefits of event sourcing are achieved only when we migrate all the major services to event sourcing? Am I missing any fundamental concepts here?

But do you think that the real benefits of event sourcing are achieved only when we migrate all the major services to event sourcing? Am I missing any fundamental concepts here?

Well, in terms of coordination between services, the ones that are event sourced will benefit from the timeline aspect, and most consumers get the capability to have a definitive checkpoint (i.e. the position of any event in the consumed streams).
In terms of adding/modifying/removing capabilities in the system, the more that is event sourced, the easier I find it.

Something to think about is in what order to migrate them to event sourcing.

@yves.lorphelin Okay. Thanks very much for all the help.