We have recently started looking at refactoring our current monolithic system into microservices (hosted in Docker), using EventStore as a message bus.
The main question we are trying to answer is: can we design a system using EventStore that prevents, as far as possible, any loss of business data?
One scenario we have discussed is a service consuming events from Stream A (assume a catch-up subscription), performing a resource-intensive calculation, for example, and publishing the resulting events to Streams B and C, which are in turn subscribed to by other services. However, between an event appearing on Stream A and the resulting events being published to Streams B and C, the connection to EventStore is lost and a circuit breaker holds those events in an in-memory queue. Then, to make things worse, the container running the service crashes, meaning the events are lost without ever having been published to EventStore.
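To make that window concrete, here is a rough sketch of the flow we have in mind, written in TypeScript against a hypothetical minimal client interface (`EventStoreClient`, `expensiveCalculation`, and the stream names are placeholders for illustration, not our actual code or a real client API):

```typescript
// Hypothetical minimal client interface -- a stand-in for whatever EventStore
// client library the service would use, not a real API.
interface EventData {
  type: string;
  data: unknown;
}

interface EventStoreClient {
  subscribeFrom(stream: string, onEvent: (event: EventData) => Promise<void>): void;
  append(stream: string, events: EventData[]): Promise<void>;
}

// Placeholder for the real resource-intensive business logic.
function expensiveCalculation(event: EventData): { b: unknown; c: unknown } {
  return { b: event.data, c: event.data };
}

// The flow described above: consume from Stream A via a catch-up
// subscription, do the expensive work, then publish the results to B and C.
function startProcessor(client: EventStoreClient): void {
  client.subscribeFrom("stream-A", async (event) => {
    const result = expensiveCalculation(event);

    // If the connection to EventStore drops here, a circuit breaker queues
    // these appends in memory; if the container then dies, the results are
    // gone and nothing records that the Stream A event was ever handled.
    await client.append("stream-B", [{ type: "CalculatedB", data: result.b }]);
    await client.append("stream-C", [{ type: "CalculatedC", data: result.c }]);
  });
}
```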
Is there a recommended pattern for dealing with this kind of scenario? We discussed having each service publish a ‘receipt’ for each event it consumes, written only after it has successfully published to Streams B and C, so that if the service or its container crashes, a new container running that service can check the last event successfully processed and resume from there. However, what if the service/container crashes between the event being published to Stream B and it being published to Stream C?
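A sketch of that ‘receipt’ idea, reusing the hypothetical `EventStoreClient`, `EventData`, and `expensiveCalculation` from the sketch above (again illustrative only, not a real API); the comment marks the exact gap we are worried about:

```typescript
// Sketch of the 'receipt' idea. The receipt stream name is an arbitrary
// placeholder we made up for illustration.
const RECEIPT_STREAM = "stream-A-receipts";

async function handleEvent(
  client: EventStoreClient,
  eventNumber: number, // position of the event in Stream A
  event: EventData,
): Promise<void> {
  const result = expensiveCalculation(event);

  await client.append("stream-B", [{ type: "CalculatedB", data: result.b }]);
  // <-- a crash here leaves B written, C missing, and no receipt, so a new
  //     container would reprocess the event and append to Stream B again.
  await client.append("stream-C", [{ type: "CalculatedC", data: result.c }]);

  // Receipt: the last Stream A event that was fully processed. On restart a
  // new container reads the latest receipt and resumes from eventNumber + 1.
  await client.append(RECEIPT_STREAM, [
    { type: "EventProcessed", data: { sourceStream: "stream-A", eventNumber } },
  ]);
}
```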