What Should Projections Do?

Kasey_Speakman1 · January 3, 2013, 1:48am

So, I’ve been trying to wrap my head around the purpose of the Projections API. The term projection itself leads me to believe it’s for generating read models. The documentation (although I know it’s currently sparse) doesn’t seem to suggest a particular use aside from indexing (huh?). Some of the videos I’ve watched mention custom streams (e.g. chats by user) and temporal queries, although I can’t imagine what I would do with those except use them as read models for reports.

So what exactly should projections be used for? Are we supposed to make filtered/custom-partitioned projections for denormalizers? Or use the projection directly as a read model (since the events can be folded into a JSON state object)? Or use them as message bus queues (grin)?

One thing that makes no sense to me about projections is that there seems to be no straightforward way to rebuild them. (Something I’d do during maintenance or software upgrades, because I know it would be expensive.) Editing the projection (if I read correctly) doesn’t rebuild it; it only runs the new projection from the current tail forward. (I’m not even sure why this would be desirable.) Dropping and recreating also doesn’t work, since projections are “tombstoned” and can’t immediately be recreated unless you do some expiration tricks.

Exposing ignorance in hopes of finding clarity…

Kasey

Greg_Young1 · January 3, 2013, 6:30am

Projections are not heavily documented with reason (they aren’t actually 100% done yet!).

For rebuilding: it depends on whether you POST or PUT to the URL. That actually is documented.

“So what exactly should projections be used for? Are we supposed to make filtered/custom-partitioned projections for denormalizers? Or use the projection directly as a read model (since the events can be folded into a JSON state object)? Or use them as message bus queues (grin)?”

Now what they are for …

Projections are a form of read model, and they very easily solve a certain kind of problem that is very hard to solve in other systems. In particular, the library supports JavaScript for Complex Event Processing (CEP). I know, lots of fancy words. Let’s try a concrete example (I am writing a blog post series on it now, so let’s use one from there…).

In almost all financial systems there exists a piece of code that builds “candlesticks” for charts. A candlestick represents 4 price points over a time period (the first price, the last price, the highest price, and the lowest price), so we could have 1-minute candlesticks (open, close, high, low… one per minute). Could you write that query for me in SQL Server? select * from ticks where select * from ticks… ouch. This is a temporal correlation query (it’s CEP). This is exactly the type of problem that projections are meant to solve (and you will run into such problems much more often than you think!).
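A minimal sketch of the candlestick logic as a projection-style fold, written in JavaScript so it runs locally under Node. It assumes a hypothetical `Tick` event whose body carries `price` and `timestamp` (in milliseconds), and that ticks arrive in time order. The `fromStream("ticks")` wiring shown in the comment is only an illustration; the stream name is hypothetical.

```javascript
// Assumed (hypothetical) event shape: { eventType: "Tick", body: { price, timestamp } }
// where timestamp is in milliseconds and ticks arrive in time order.
const BUCKET_MS = 60 * 1000; // one-minute candlesticks

function initCandles() {
  return { candles: {} };
}

function onTick(state, e) {
  const price = e.body.price;
  // Start time of the one-minute window this tick falls into.
  const bucket = Math.floor(e.body.timestamp / BUCKET_MS) * BUCKET_MS;
  const candle = state.candles[bucket];
  if (candle === undefined) {
    // The first tick in a window sets open, close, high and low.
    state.candles[bucket] = { open: price, close: price, high: price, low: price };
  } else {
    candle.close = price; // latest tick seen so far in this window
    candle.high = Math.max(candle.high, price);
    candle.low = Math.min(candle.low, price);
  }
  return state;
}

// Inside Event Store the same handlers might be wired up roughly as
//   fromStream("ticks").when({ $init: initCandles, Tick: onTick });
// Here we fold a few sample ticks locally instead.
const ticks = [
  { eventType: "Tick", body: { price: 10, timestamp: 0 } },
  { eventType: "Tick", body: { price: 12, timestamp: 20000 } },
  { eventType: "Tick", body: { price: 9, timestamp: 40000 } },
  { eventType: "Tick", body: { price: 11, timestamp: 59000 } },
  { eventType: "Tick", body: { price: 15, timestamp: 61000 } },
];
const candleState = ticks.reduce(onTick, initCandles());
console.log(candleState.candles);
```

Keeping every candle in the state is the simplest version. A long-running projection would more plausibly emit each finished candle to an output stream rather than let its state grow without bound.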

That system (building candlesticks) is a huge amount of work. Not because of the logic of building candlesticks, but because of all the other junk that needs to be built into the system (failover, high availability, clustering, etc.). What if I could just write the candlesticking logic in JavaScript and everything else (clustering, etc.) were handled for me? This is where projections are positioned. There is an entire category of problems they are very good at handling. They are not a replacement for having a read model; they are just one form of read model (though we have talked about supporting indexing over streams with data, so you could also build a basic read model internally).

Let’s try another example. I have seen no fewer than 5 custom systems for doing reports off of NServiceBus audit queues. Basically, they ask three questions of the system:

“I want to see events at time X”

“I want to see events for user Y”

“I want to see events for correlationId C”

Now let’s try doing this with the Event Store. We would write an adapter to put the messages from NServiceBus into the Event Store (as NServiceBusMessage events). Then we would write 2 projections:

fromAll().when({ NServiceBusMessage: function (s, e) { linkTo(e.body.username, e); } });

fromAll().when({ NServiceBusMessage: function (s, e) { linkTo(e.body.correlationid, e); } });

Now if you navigate to /streams/{username}, you have all of the messages for that user, and the reporting system is done (dev, which is getting moved to master today, even has the ability to view streams in real time in the browser, or you can use whatever your favorite Atom browsing tool is).
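To make the `linkTo` idea concrete, here is a small in-memory model in JavaScript. It is not Event Store itself: the `streams` map and the `append`, `resolve` and local `linkTo` functions are hypothetical stand-ins, and message data is assumed to live under `e.body`. What it illustrates is that a link is a pointer back to the original event rather than a copy, so the same message can appear in both a user stream and a correlation stream without being duplicated.

```javascript
// Hypothetical in-memory stand-in for an event store: streamId -> array of events.
const streams = {};

// Appends an event and returns its event number within the stream.
function append(streamId, event) {
  if (!streams[streamId]) streams[streamId] = [];
  streams[streamId].push(event);
  return streams[streamId].length - 1;
}

// Local model of linkTo: writes a pointer to the original event, not a copy.
function linkTo(streamId, e) {
  append(streamId, { link: { streamId: e.streamId, eventNumber: e.eventNumber } });
}

// Follows each link in a stream back to the event it points at.
function resolve(streamId) {
  return (streams[streamId] || []).map((l) => streams[l.link.streamId][l.link.eventNumber]);
}

// The two handlers from the post.
const byUser = (s, e) => linkTo(e.body.username, e);
const byCorrelation = (s, e) => linkTo(e.body.correlationid, e);

// Sample audit messages, as a hypothetical NServiceBus adapter might write them.
const messages = [
  { username: "ann", correlationid: "c1" },
  { username: "bob", correlationid: "c1" },
  { username: "ann", correlationid: "c2" },
];
for (const body of messages) {
  const event = { eventType: "NServiceBusMessage", body };
  const eventNumber = append("audit", event);
  const e = { streamId: "audit", eventNumber, eventType: event.eventType, body };
  byUser(null, e);
  byCorrelation(null, e);
}

console.log(resolve("ann").map((m) => m.body.correlationid));
```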

Where things really begin to shine, though, is when we do more CEP-style work like the candlesticks above. Let’s say you have a question about your system: when event X happens, is event Y then Z times more likely to happen? We could easily write a one-off projection for this!
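One way to read that question is as a lift ratio: among users who produced X, how often did they later produce Y, compared with how often users produce Y at all. Below is a minimal JavaScript sketch under that interpretation, assuming hypothetical event types `X` and `Y` and a hypothetical `body.user` field to group by. The `$any` wiring in the comment is illustrative only; the fold itself runs locally.

```javascript
// Per-user flags accumulated in a single state object.
function initStats() {
  return { users: {} };
}

function onEvent(state, e) {
  const id = e.body.user; // hypothetical grouping field
  if (!state.users[id]) state.users[id] = { sawX: false, sawY: false, yAfterX: false };
  const u = state.users[id];
  if (e.eventType === "X") u.sawX = true;
  if (e.eventType === "Y") {
    u.sawY = true;
    if (u.sawX) u.yAfterX = true; // Y happened after an earlier X for this user
  }
  return state;
}

// Ratio of P(Y after X | X) to P(Y); a value above 1 suggests Y is more likely once X
// has happened. Returns NaN if no user has produced X yet.
function lift(state) {
  const all = Object.values(state.users);
  const withX = all.filter((u) => u.sawX);
  const pY = all.filter((u) => u.sawY).length / all.length;
  const pYAfterX = withX.filter((u) => u.yAfterX).length / withX.length;
  return pYAfterX / pY;
}

// Might be wired as: fromAll().when({ $init: initStats, $any: onEvent });
const events = [
  { eventType: "X", body: { user: "a" } },
  { eventType: "Y", body: { user: "a" } },
  { eventType: "X", body: { user: "b" } },
  { eventType: "Y", body: { user: "b" } },
  { eventType: "Y", body: { user: "c" } },
  { eventType: "Other", body: { user: "d" } },
];
const stats = events.reduce(onEvent, initStats());
console.log(lift(stats));
```

This compares users rather than time windows; a question framed as “within N minutes of X” would also need timestamps in the state.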

HTH,

Greg

P.S. We have more docs coming on projections (including a blog post series); the current lack is deliberate. If you look at the latest builds, projections are disabled by default and marked as “experimental”.

Kasey_Speakman · January 3, 2013, 3:57pm

Thank you for the thorough answer. That is good info.

As far as it being in the docs… On this page ( http://geteventstore.com/docs/projections.html ), under view/edit, it only mentions GET and PUT. POST is mentioned under creation (and enable/disable). Mentally, I wasn’t connecting a create with an update + rebuild.

I’m on the fence about the basic read model (if you were to add indexing). It would be nice to get rid of another technology (read model DB) and would potentially save a lot of denormalizer code. But, it being unexplored country, I wouldn’t even know where the rough edges are or how to design around them.

Greg_Young1 · January 3, 2013, 4:00pm

The main problem with creating a read model is pretty simple: what do you want it to be able to do? I would bet you basically want OLAP.

Kasey_Speakman · January 3, 2013, 4:10pm

For the types of read models I build, I would typically use a document DB for operational data and, if reporting needs were high, maybe bring in SQL (alongside or instead). I have not used OLAP, although in some cases I should have. Instead, I ended up creating indexes over time-period-based fields (semesters) and just wrote complicated, staged queries.

Joona · January 14, 2013, 9:28am

I, too, am curious about rebuilding continuous and persistent projections. POSTing to an already existing projection doesn’t seem to be a valid operation. I can only think of appending a “fake” checkpoint under “$projections--checkpoint”, which seemed to work OK when I briefly tested it. I would like to use projections as a light read model, but not being able to rebuild a projection seems inconvenient. Do I create new, versioned projections, e.g. projection_activecustomers_v1, with the next version being projection_activecustomers_v2? Or should I just POST new ad hoc projections when my app starts, if the Event Store host can handle the memory pressure from that?

Finally, my hat tip to this project. I’m really enjoying going through the source code.

Joona · January 22, 2013, 3:53pm

Any comments on this? Congrats on the 1.0 release, but my question still seems to apply. Say I remove a property from the state generated in a continuous projection. How can I rebuild the projection so that the removed property is reflected in the output (other than an explicit delete state[“property”] in JS)? For now, I’ve resorted to appending a manual checkpoint to the projection checkpoint stream.
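The difference between resuming and rebuilding can be shown with a pure fold in JavaScript. The two handler versions below are hypothetical: v2 simply stops writing `lastName`. Resuming v2 from the state v1 already produced keeps the stale property, while replaying the full history through v2 from an empty state does not, which is the result a from-scratch rebuild would give.

```javascript
// Hypothetical v1 handler: counts events and remembers the last customer name.
function handlerV1(state, e) {
  state.count = (state.count || 0) + 1;
  state.lastName = e.body.name;
  return state;
}

// Hypothetical v2 handler: the lastName property has been removed from the logic.
function handlerV2(state, e) {
  state.count = (state.count || 0) + 1;
  return state;
}

const fold = (handler, events, initial) => events.reduce((s, e) => handler(s, e), initial);

const history = [{ body: { name: "ann" } }, { body: { name: "bob" } }];
const later = [{ body: { name: "cy" } }];

// Editing the query without a rebuild: v2 continues from the state v1 left behind.
const v1State = fold(handlerV1, history, {});
const resumed = fold(handlerV2, later, Object.assign({}, v1State));

// Rebuilding: v2 replays the whole history from an empty state.
const rebuilt = fold(handlerV2, history.concat(later), {});

console.log(resumed, rebuilt);
```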

Yuri_Solodkyy · January 22, 2013, 5:17pm

Hi,

In the development scenario, we will support deleting all the output streams and rebuilding the projection, but this is not as easy in production. Any event emitted by the projection may already have been consumed by the time you start rebuilding the projection. So you will likely prefer to handle this at the application level.

Rebuilding by manually creating a checkpoint, as you do, should work if you don’t emit any events from the projection. If you do, you may need to post an event to the output streams as well. Otherwise, the projection will not write any output events until it reaches the position it was at before the restart.

Look at the events emitted by a projection: each one has a metadata record attached with the projection position that caused it.

-yuriy
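A small JavaScript model of the behaviour described above, under stated assumptions: each emitted event records the input position that caused it in its metadata (the `causedBy` field name, the `Derived` event type and the in-memory output array are hypothetical). On a replay, anything at or before the last recorded position is treated as already emitted, so a projection restarted from an earlier checkpoint stays silent until it passes its old position.

```javascript
// Hypothetical output stream: each event records the input position that caused it.
function lastCausedBy(output) {
  const n = output.length;
  return n === 0 ? -1 : output[n - 1].metadata.causedBy;
}

// Replays all inputs from position 0 but suppresses emits for positions the
// output stream already covers, i.e. those written by an earlier run.
function runEmitting(inputs, output, shouldEmit) {
  const covered = lastCausedBy(output);
  inputs.forEach((e, position) => {
    if (!shouldEmit(e) || position <= covered) return;
    output.push({
      eventType: "Derived",
      body: { source: e.eventType },
      metadata: { causedBy: position },
    });
  });
}

const isA = (e) => e.eventType === "A";
const inputs = [{ eventType: "A" }, { eventType: "B" }, { eventType: "A" }];
const output = [];

runEmitting(inputs, output, isA); // first run emits for positions 0 and 2
runEmitting(inputs, output, isA); // replay: nothing new is emitted
inputs.push({ eventType: "A" });
runEmitting(inputs, output, isA); // only position 3 is new

console.log(output.map((o) => o.metadata.causedBy));
```

The same suppression is also the catch: if the projection logic changed, events emitted before the old position are never corrected, and consumers may already have acted on them.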

Alexey_Raga · February 16, 2013, 1:33pm

I am facing sort of the same scenario: fixing a bug in the projection code is not enough to correct the system. New events are processed using the new logic, but that doesn’t help, as the state is already corrupted.

I see two scenarios here:

1. Projections that cannot be rebuilt “safely”. These are projections that emit new events and transform streams into other streams.
2. Projections that just accumulate some state. These may be used for the UI, etc. These projections, I think, can safely be rebuilt from scratch at any time.

In both cases, of course, there may be exceptions, it all depends on the system logic.

The 1st case might be tricky, but what about adding another option to rebuild a projection, at least for the 2nd case (when emit is disabled)?

Cheers,

Alexey.

Yuri_Solodkyy · February 16, 2013, 4:07pm

Alexey,

We will add options to rebuild projections. However, as you noted if you rebuild a projection that emits events you have to be aware of consequences.

-yuriy