Projections - how many is too many?

How many projections is reasonable? Let’s say I have a few hundred thousand user accounts and a hundred million streams comprising the various assets and operations within each account. Is it reasonable to have a projection or five per stream? What performance limits should I be aware of? What if I hypothetically wanted to build a system to cater for the individual preferences of a billion users? Each user would probably need a few different projections to produce personalised aggregates. Is this feasible at all?

Those are some very broad questions. My guess (not having built anything near this size, you’re basically talking about Facebook here) is that storage will be fairly easy (as a completely isolated “problem”). Typically projections are stored in a relational database or a document database. CQRS somewhat promises that you can have as many read stores as you need. You will however run into serious problems when those 1 billion users try to access your site at the same time…

Is there a limit of projections per instance?

No, but I would question what you are doing that makes this question come up.

I migrated one of my company’s apps from my custom SQL Server event store to your Event Store, and now I am starting to build projections for new use cases. The majority of the use cases just need to repartition existing streams with linkTo. But I have a few projections that change their state when new events arrive, and then emit new events when a certain condition is met. Those are mission-critical business logic that now sits on top of the Projections library. I do not know how many more I will have in the future (maybe a hundred? I have 20 projections now), but I do not want to hurt the performance and reliability of the system just to have all the events and projections in one db. So I was thinking that at a certain point I should subscribe to events from another instance of the Event Store and start projecting as I need without worries…

100 projections seems like a lot. I would question the exact use cases and
whether something like state partitioning on a single projection could be used instead.
Projections are quite like stored procs: they should be used sparingly.
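
To illustrate the state-partitioning idea in general terms, here is a minimal plain-JavaScript sketch. It is not the Event Store projections API: `createPartitionedProjection`, the `invoiceCreated` event type and the `userId` field are all hypothetical names made up for the example. The point is only that one projection definition can keep a separate state per partition key, instead of one projection per user or per stream.

```javascript
// Plain-JavaScript sketch of state partitioning (NOT the Event Store API).
// One set of handlers, with a separate state object per partition key.
function createPartitionedProjection(partitionOf, handlers) {
  const states = new Map(); // partition key -> state object

  return {
    handle(event) {
      const handler = handlers[event.type];
      if (!handler) return; // event types without a handler are ignored
      const key = partitionOf(event);
      if (!states.has(key)) states.set(key, {});
      handler(states.get(key), event);
    },
    stateOf(key) {
      return states.get(key);
    },
  };
}

// Hypothetical example: count invoices per user with a single projection.
const invoicesPerUser = createPartitionedProjection(
  (e) => e.data.userId, // assumed field used as the partition key
  {
    invoiceCreated(state, e) {
      state.count = (state.count || 0) + 1;
    },
  }
);

invoicesPerUser.handle({ type: 'invoiceCreated', data: { userId: 'u1' } });
invoicesPerUser.handle({ type: 'invoiceCreated', data: { userId: 'u1' } });
invoicesPerUser.handle({ type: 'invoiceCreated', data: { userId: 'u2' } });
invoicesPerUser.handle({ type: 'userRenamed', data: { userId: 'u2' } });
```

With this shape the number of projection definitions stays constant as the number of users grows; only the number of partition states grows.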

Use case one: joining Users and Invoices into one stream, to which I subscribe in order to denormalize a read model.

// Projection #1
fromCategory('userCategory')
.foreachStream()
.whenAny(function (s, e) {
    linkTo('allUsersStream', e);
});

// Projection #2
fromCategory('invoiceCategory')
.foreachStream()
.whenAny(function (s, e) {
    linkTo('allInvoicesStream', e);
});

// Projection #3
fromStreams([
    'allUsersStream',
    'allInvoicesStream'
])
.whenAny(function (s, e) {
    linkTo('UsersAndInvoicesReadModelStream', e);
});

How many events/second are you writing? What is your total db size?

It is just 2 GB right now. We are writing just 10 events/s.

Why optimize then? Just use fromall and filter on the subscriber.
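
As a rough illustration of what "filter on the subscriber" could look like, here is a minimal plain-JavaScript sketch. It deliberately does not use any Event Store client API: `createFilteringSubscriber`, the event types and the event shape (`type`, `data`) are assumptions for the example. The subscriber receives every event and simply skips the types it has no handler for.

```javascript
// Hypothetical subscriber-side filter: every event arrives here, but only
// event types with a registered handler touch the read model.
function createFilteringSubscriber(handlers) {
  const stats = { handled: 0, skipped: 0 };

  return {
    stats,
    onEvent(event) {
      const known = Object.prototype.hasOwnProperty.call(handlers, event.type);
      if (!known) {
        stats.skipped += 1; // cheap: one lookup per ignored event
        return;
      }
      handlers[event.type](event);
      stats.handled += 1;
    },
  };
}

// In-memory read model, standing in for a real database.
const readModel = { users: {}, invoices: {} };

const subscriber = createFilteringSubscriber({
  userCreated: (e) => {
    readModel.users[e.data.userId] = { name: e.data.name };
  },
  invoiceCreated: (e) => {
    readModel.invoices[e.data.invoiceId] = { userId: e.data.userId };
  },
});

// Simulated feed of all events, including ones this read model ignores.
[
  { type: 'userCreated', data: { userId: 'u1', name: 'Ann' } },
  { type: 'passwordChanged', data: { userId: 'u1' } },
  { type: 'invoiceCreated', data: { invoiceId: 'i1', userId: 'u1' } },
].forEach((e) => subscriber.onEvent(e));
```

At a handful of events per second, the cost of the skipped events is negligible compared with maintaining extra projections on the server.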

I’m sorry, it is just 1 event per second.

I just did not want to ignore a lot of events in a subscriber that only denormalizes a few of them. I thought that projections were like stored procedures and triggers. You can actually have thousands of stored procedures.

What about this use case, where I use the state in order to join several events into a single event?

// Collecting info from events per stream
fromCategory('invoicesCategory')
.foreachStream()
.when({
    'newInvoiceWithPartialInfoReceived': function (s, e) {
        s.invoiceId = e.data.invoiceId;
        s.info1 = e.data.info1;
        s.info2 = e.data.info2;
    },
    'invoiceWithLastPartialInfoReceived': function (s, e) {
        var id = e.data.invoiceId;
        if (e.data.info3 !== undefined && e.data.info4 !== undefined) {
            // Conditions are met, it is safe to publish an event
            emit('invoiceReportCategory-' + id, 'fullInvoiceReceived', {
                "invoiceId": id,
                // Info from previous events
                "info1": s.info1,
                "info2": s.info2,
                // Info from this event (read from e.data, like invoiceId above)
                "info3": e.data.info3,
                "info4": e.data.info4
            });
        }
    }
});

// Output a single stream to subscribe from in order to denormalize invoice reports
fromCategory('invoiceReportCategory')
.when({
    'fullInvoiceReceived': function (s, e) {
        linkTo('allInvoicesWithFullInfoStream', e);
    }
});

What are you hoping to gain? From the above, you are hoping to gain network
efficiency; at 1 event/second it’s not worth optimizing.

This projection:

// Output a single stream to subscribe from in order to denormalize invoice reports
fromCategory('invoiceReportCategory')
.when({
    'fullInvoiceReceived': function (s, e) {
        linkTo('allInvoicesWithFullInfoStream', e);
    }
});

Duplicates a more general internal projection.

Greg

You are right, Greg, maybe it is a premature optimization, but with that piece of JavaScript my handlers do not need to “filter” any events, and that was cool to do, until now. So… should I say that a hundred projections is too many? What could happen in the worst case? Will the database degrade gracefully (SEDA-like)? Or will it just crash?

degrade gracefully into the abyss.

That’s good to know, thanks again :)

How can I approximate what is “too many projections”?

12,000 writes/s * 86,400 s/day * 500 bytes/event ≈ 500 GB/day
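
The estimate above can be checked with a few lines of JavaScript. This is only a back-of-the-envelope sketch: it assumes the 500 in the formula is the average event size in bytes (the figure only works out to roughly 500 GB/day under that reading), uses decimal units (1 GB = 1e9 bytes), and ignores indexes and any extra writes made by projections. The function name is made up for the example.

```javascript
// Back-of-the-envelope daily write volume.
// Assumption: bytesPerEvent is the average stored size of one event.
function dailyWriteVolumeBytes(writesPerSecond, bytesPerEvent) {
  const secondsPerDay = 86400;
  return writesPerSecond * secondsPerDay * bytesPerEvent;
}

const bytesPerDay = dailyWriteVolumeBytes(12000, 500);
const gbPerDay = bytesPerDay / 1e9; // decimal gigabytes per day

// Days until a 168 TB (decimal) volume is full at that rate,
// counting raw event data only.
const daysToFill = 168e12 / bytesPerDay;

console.log(gbPerDay.toFixed(1) + ' GB/day');
console.log(Math.floor(daysToFill) + ' days to fill 168 TB');
```

Every projection that emits or links events adds writes on top of this baseline, so the per-projection write amplification is one of the numbers worth measuring when estimating how many projections the hardware can carry.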

raidz3, 24x10TB, 168TB usable 10k spindle, index on raidz3 10TB SSD

Xeon Gold, 36 cores, 512 GB RAM

I don’t have a mental model of the projection system, so I can’t identify where the bottleneck is likely to be.