Projections - how many is too many?

Nathan_Ridley · June 4, 2014, 6:44am

How many projections is reasonable? Let’s say I have a few hundred thousand user accounts, and a hundred million streams comprising the various assets and operations within each account. Is it reasonable to have a projection or five per stream? What performance limits should I be aware of? What if I hypothetically wanted to build a system to cater for the individual preferences of a billion users? Each user would probably need a few different projections to produce personalised aggregates. Is this feasible at all?

Daniel_Lidstrom · June 4, 2014, 8:48am

Those are some very broad questions. My guess (not having built anything near this size, you’re basically talking about Facebook here) is that storage will be fairly easy (as a completely isolated “problem”). Typically projections are stored in a relational database or a document database. CQRS somewhat promises that you can have as many read stores as you need. You will however run into serious problems when those 1 billion users try to access your site at the same time…

Darien_Narvaez · September 8, 2016, 4:46pm

Is there a limit of projections per instance?

Greg_Young1 · September 8, 2016, 5:21pm

No, but I would question what you are doing that makes this question come up.

Darien_Narvaez · September 8, 2016, 6:08pm

I migrated one of my company's apps from my custom SQL Server event store to your Event Store, and now I am starting to build projections for new use cases. The majority of the use cases just require repartitioning current streams with linkTo. But I have a few projections that change their state when new events arrive, and then emit new events when a certain condition is met. Those are mission-critical business logic that now sits on top of the Projections library. I do not know how many more I will have in the future (maybe a hundred? I have 20 projections now), but I do not want to hurt the performance and the reliability of the system just to have all events and projections of the system in one db, so I was thinking that at a certain point I should subscribe to events from another instance of the Event Store and start projecting as I need without worries…

Greg_Young1 · September 8, 2016, 6:12pm

100 projections seems like a lot. I would question the exact use cases and
whether things like state partitioning on a single projection can be used.
Projections are quite like stored procs: they should be used sparingly.

Darien_Narvaez · September 8, 2016, 6:38pm

Use case one: joining Users and Invoices into one stream that I can subscribe to in order to denormalize a read model

// Projection #1
fromCategory('userCategory')
.foreachStream()
.whenAny(function (s, e) {
    linkTo('allUsersStream', e);
});

// Projection #2
fromCategory('invoiceCategory')
.foreachStream()
.whenAny(function (s, e) {
    linkTo('allInvoicesStream', e);
});

// Projection #3
fromStreams([
    'allUsersStream',
    'allInvoicesStream'
])
.whenAny(function (s, e) {
    linkTo('UsersAndInvoicesReadModelStream', e);
});

Greg_Young1 · September 8, 2016, 6:39pm

How many events/second are you writing? What is your total db size?

Darien_Narvaez · September 8, 2016, 6:44pm

It is just 2 GB right now. We are writing just 10 events/s

Greg_Young1 · September 8, 2016, 6:46pm

Why optimize then? Just use fromall and filter on the subscriber.
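Filtering on the subscriber could look roughly like the sketch below. It is only an illustration: the event shape (`streamId`, `eventType`, `data`) and the helpers `categoryOf` and `makeFilteringHandler` are assumptions made for this example, not part of any Event Store client API, and the category is taken to be the part of the stream name before the first `-`.

```javascript
// Hypothetical event shape: { streamId: 'userCategory-42', eventType: '...', data: {...} }

// Hypothetical helper: the category is the part of the stream id before the first '-'.
function categoryOf(streamId) {
    var dash = streamId.indexOf('-');
    return dash === -1 ? streamId : streamId.substring(0, dash);
}

// Wraps a handler so that events from other categories are skipped on the subscriber.
function makeFilteringHandler(wantedCategories, handle) {
    return function (event) {
        if (wantedCategories.indexOf(categoryOf(event.streamId)) === -1) {
            return false; // not interesting for this read model, ignore it
        }
        handle(event);
        return true;
    };
}

// Usage: feed every event of the all-events subscription through the wrapped handler.
var seen = [];
var onEvent = makeFilteringHandler(['userCategory', 'invoiceCategory'], function (e) {
    seen.push(e.eventType); // a real handler would update the read model here
});

onEvent({ streamId: 'userCategory-42', eventType: 'userCreated', data: {} });
onEvent({ streamId: 'orderCategory-1', eventType: 'orderPlaced', data: {} });
onEvent({ streamId: 'invoiceCategory-7', eventType: 'invoiceIssued', data: {} });
```

At a handful of events per second, the cost of receiving and discarding the unwanted events is negligible compared with running extra projections on the server.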

Darien_Narvaez · September 8, 2016, 6:48pm

I'm sorry, it is just 1 event per second

Darien_Narvaez · September 8, 2016, 6:54pm

I just did not want to ignore a lot of events in a subscriber that only denormalizes some of them. I thought that projections were like stored procedures and triggers. You can actually have thousands of stored procedures.

Darien_Narvaez · September 8, 2016, 7:17pm

What about this use case, where I use the state in order to join some events into a single event?

// Collecting info from events per stream
fromCategory('invoicesCategory')
.foreachStream()
.when({
    'newInvoiceWithPartialInfoReceived': function (s, e) {
        s.invoiceId = e.data.invoiceId;
        s.info1 = e.data.info1;
        s.info2 = e.data.info2;
    },
    'invoiceWithLastPartialInfoReceived': function (s, e) {
        var id = e.data.invoiceId;
        // Payload fields live under e.data, like invoiceId above
        if (e.data.info3 !== undefined && e.data.info4 !== undefined) {
            // Conditions are met, so it is safe to publish an event
            emit('invoiceReportCategory-' + id, 'fullInvoiceReceived', {
                "invoiceId": id,
                // Info from previous events
                "info1": s.info1,
                "info2": s.info2,
                // Info from this event
                "info3": e.data.info3,
                "info4": e.data.info4
            });
        }
    }
});

// Output a single stream to subscribe from in order to denormalize invoice reports
fromCategory('invoiceReportCategory')
.when({
    'fullInvoiceReceived': function (s, e) {
        linkTo('allInvoicesWithFullInfoStream', e);
    }
});

Greg_Young1 · September 8, 2016, 9:45pm

What are you hoping to gain? From the above, you are hoping to gain network
efficiency; at 1 event/second it's not worth optimizing.

This projection:

// Output a single stream to subscribe from in order to denormalize invoice reports
fromCategory('invoiceReportCategory')
.when({
    'fullInvoiceReceived': function (s, e) {
        linkTo('allInvoicesWithFullInfoStream', e);
    }
});

Duplicates a more general internal projection.

Greg
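The internal projection meant here is presumably the built-in `$by_event_type` system projection, which (when it is enabled) links every event into a stream named `$et-` followed by the event type. Under that assumption, the subscriber could read the stream below instead of a hand-written one; `eventTypeStream` is a hypothetical helper used only to spell out the naming.

```javascript
// Assumption: the $by_event_type system projection is enabled on the node.
// It links each event into a stream named '$et-<eventType>'.
function eventTypeStream(eventType) {
    return '$et-' + eventType;
}

// Stream that would replace 'allInvoicesWithFullInfoStream' in the projection above.
var streamToSubscribe = eventTypeStream('fullInvoiceReceived');
```

The entries in such a stream are link events, so the subscription would need link resolution turned on to receive the original event bodies.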

Darien_Narvaez · September 9, 2016, 11:59am

You are right Greg, maybe it is a premature optimization, but with that piece of JavaScript my handlers do not need to “filter” any events, and that was cool to do, until now. So… should I say that a hundred projections is too many? What could happen in the worst case? Will the database degrade gracefully (SEDA-like)? Or will it just crash?

Greg_Young1 · September 9, 2016, 2:45pm

degrade gracefully into the abyss.

Darien_Narvaez · September 9, 2016, 4:49pm

That's good to know, thanks again

Poule_Dodue · February 8, 2018, 10:59am

How can I approximate what is “too many projections”?

12,000 writes/s * 86400 * 500 ≈ 500 GB/day

raidz3, 24x10TB, 168TB usable 10k spindle, index on raidz3 10TB SSD

Xeon Gold, 36 cores, 512 GB RAM
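The sizing arithmetic above can be spelled out as a quick sketch. It assumes the factor of 500 is an average event size in bytes (the post does not say so) and uses decimal units; the variable names are made up for the example.

```javascript
var writesPerSecond = 12000;
var secondsPerDay = 86400;
var bytesPerEvent = 500; // assumption: the "500" in the post is bytes per event

// Raw event data written per day, before indexes or anything projections write.
var bytesPerDay = writesPerSecond * secondsPerDay * bytesPerEvent;
var gbPerDay = bytesPerDay / 1e9; // decimal gigabytes

// Days until the 168 TB of usable space would be full at that rate.
var usableBytes = 168e12;
var daysToFill = usableBytes / bytesPerDay;

console.log(gbPerDay.toFixed(1) + ' GB/day, full in about ' + Math.floor(daysToFill) + ' days');
```

This covers only the incoming events. Projections that emit events or write links add writes of their own, so the real figure would be higher.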

Poule_Dodue · February 8, 2018, 4:21pm

I don’t have a mental model of the projection system, so I can’t identify the potential bottleneck.