Hi Jonathan,
We are using RavenDB as our read store and ran into a similar problem: we needed to batch the writes. A write per event was fine with a small amount of data, but over time full rebuilds became painful. We also started running batch jobs that dump a heap of events into the system, and those often update many small documents as well as an aggregated value in one big document. Being able to write in batches makes this much faster.
The model we have taken is to have two sets of in-memory queues for catch-up: a queue per destination document, and a write queue.
Events are taken from an EventStore subscription (we just subscribe to all for the moment). All events are run through a grouping function that routes events to the queue(s) for the appropriate destination document(s).
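A rough sketch of the routing step in Python (not the Eventful code, which is F#). The event shapes, the document ids and the document_keys rules are all made up for illustration:

```python
from collections import defaultdict, deque

# Hypothetical event shape: (event_type, payload). Not Eventful's actual types.
def document_keys(event):
    """Return the ids of every document this event affects (illustrative rules)."""
    event_type, payload = event
    if event_type == "OrderPlaced":
        # One small per-order document plus one big aggregate document.
        return ["orders/" + payload["order_id"], "stats/daily-totals"]
    if event_type == "CustomerRenamed":
        return ["customers/" + payload["customer_id"]]
    return []  # events no projection cares about go nowhere

def route(events):
    """Group events by destination document, keeping arrival order per document."""
    queues = defaultdict(deque)
    for event in events:
        for key in document_keys(event):
            queues[key].append(event)
    return queues

events = [
    ("OrderPlaced", {"order_id": "1"}),
    ("OrderPlaced", {"order_id": "2"}),
    ("CustomerRenamed", {"customer_id": "7"}),
    ("SomethingElse", {}),
]
queues = route(events)
```

Here the aggregate document "stats/daily-totals" ends up with both OrderPlaced events in its queue, while each order document gets only its own.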
The processor for the per document queues does the following:
- Grab the next batch of events
- Load the document (possibly from cache)
- Apply the events that are queued for that document.
- Put the document on the write queue.
The write queue then gets batches of documents and writes them in one round trip to Raven (in your case it would be elastic).
So in this way we are getting events applied to documents in batches as well as documents written in batches.
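To make the two levels of batching concrete, here is a minimal Python sketch of the steps above. It is only an illustration: FakeStore stands in for the read store, apply_event is a toy projection, and none of these names come from Eventful or the RavenDB client.

```python
class FakeStore:
    """Stand-in for the read store; counts round trips so batching is visible."""
    def __init__(self):
        self.docs = {}
        self.round_trips = 0

    def load(self, doc_id):
        self.round_trips += 1
        return dict(self.docs.get(doc_id, {"count": 0}))

    def bulk_write(self, docs):
        # All documents go to the store in a single round trip.
        self.round_trips += 1
        self.docs.update({doc_id: dict(doc) for doc_id, doc in docs.items()})

def apply_event(doc, event):
    # Toy projection (an assumption): just count the events seen per document.
    doc["count"] += 1
    return doc

def process_document(doc_id, events, store, cache):
    doc = cache.get(doc_id)           # load the document, possibly from cache
    if doc is None:
        doc = store.load(doc_id)
    for event in events:              # apply every event queued for it
        doc = apply_event(doc, event)
    cache[doc_id] = doc
    return doc

def run_batch(queues, store, cache):
    write_queue = {}
    for doc_id, events in queues.items():
        write_queue[doc_id] = process_document(doc_id, events, store, cache)
    store.bulk_write(write_queue)     # one write for the whole batch
    return len(write_queue)

store = FakeStore()
cache = {}
queues = {"orders/1": ["a", "b", "c"], "stats/daily-totals": ["a", "b", "c", "d"]}
written = run_batch(queues, store, cache)
```

Seven events across two documents cost two loads and one write here, instead of a load and a write per event. A later batch for the same documents skips the loads because of the cache.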
The code is open source - you can see the main piece here:
https://github.com/adbrowne/Eventful/blob/master/src/Eventful.RavenDb/BulkRavenProjector.fs
Using this method we have taken a full production rebuild from 30 mins down to 3 mins and there is still plenty of room for improvement. In particular the queue that does the grouping needs some optimization. One nice thing we have found is that during a rebuild we can set the consumer of the Raven queue to write batches infrequently (maybe once every few seconds) and it just means bigger batches in both queues.
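A small Python sketch of why flushing less often gives bigger batches. This is a simplification: it counts arrivals instead of using a real timer, and drain and simulate are hypothetical helpers, not part of Eventful.

```python
import queue

def drain(q, max_items):
    """Take whatever is waiting right now, up to max_items, without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

def simulate(documents, flush_every, max_items=1000):
    """Enqueue documents one at a time, flushing after every `flush_every` arrivals.

    Arrival counts stand in for elapsed time, to keep the example deterministic.
    """
    q = queue.Queue()
    batch_sizes = []
    for n, doc in enumerate(documents, start=1):
        q.put(doc)
        if n % flush_every == 0:
            batch_sizes.append(len(drain(q, max_items)))
    rest = drain(q, max_items)
    if rest:
        batch_sizes.append(len(rest))
    return batch_sizes

docs = list(range(12))
frequent = simulate(docs, flush_every=2)    # six batches of 2
infrequent = simulate(docs, flush_every=6)  # two batches of 6
```

The same twelve documents get written either way; the less frequent consumer just does it in fewer, larger round trips.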
Also, each queue is bounded, which should cover your concerns about memory. If either queue fills up, the subscriber is slowed down, and you can adjust the queue sizes to match the memory available in your environment.
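The bounded-queue behaviour can be sketched with Python's standard queue module. This is an assumption-laden illustration, not how Eventful implements it: subscriber and writer are made-up names, and the sizes are tiny so the effect is easy to see.

```python
import queue
import threading

def subscriber(events, q):
    for event in events:
        q.put(event)   # blocks while the queue is full, which slows the subscriber
    q.put(None)        # sentinel: no more events

def writer(q, batch_size, batches):
    batch = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            batches.append(batch)   # stand-in for one bulk write
            batch = []
    if batch:
        batches.append(batch)       # write whatever is left at the end

q = queue.Queue(maxsize=3)          # at most 3 events are ever held in memory here
batches = []
producer = threading.Thread(target=subscriber, args=(range(10), q))
consumer = threading.Thread(target=writer, args=(q, 4, batches))
producer.start()
consumer.start()
producer.join()
consumer.join()
```

However slow the writer is, the queue never holds more than maxsize items, so memory use is capped by the queue sizes rather than by how fast events arrive.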
The code is not quite in production yet but should be sometime in the next two weeks.
The interface is quite simple and I am hoping to support things other than Raven. Neo4j is probably next but elastic is definitely on the list of things I would like to support.
cheers
Andrew