Feedback: occasionally-connected client

I'm new to Event Sourcing in general, but I've been reading up for the last couple of months and participated in Greg Young's excellent course in London back in December.

Now: We're in the planning stages for our next web application, and it's within a domain where the client is very often disconnected for days or weeks at a time. Hundreds of megabytes of data need to be stored. We have solved this in the past using CRUD with MS Sync Framework + IndexedDB, but with this new web app we're looking into Event Sourcing, especially because a detailed audit trail is now of major importance.

I'm just looking for feedback on one of our architectural drafts. With a heavy focus on occasionally-connected clients, it seemed reasonable to have the Command Processor and the majority of the CQRS API live on the client side, handled by a Service Worker. Are we shooting ourselves in the foot here? We're trying to avoid doubling up with a Command Processor and Event Processor on both client and server. The full-fledged Event Processor with access control etc. lives solely on the server.

Another controversial(?) aspect is that we're thinking about overwriting 'local' events (and re-creating read data) once the connection with the server is re-established. Let's say I change the name of a person while disconnected. The event is put in the out-queue, but it's also put into my local Event Store as event #10 (this only happens while we're disconnected) and the read models are updated. We keep track of the latest event that has synced in from the server.

Later on we regain connection with the server and upload our queued event. The catch-up subscription feeds us back events #10-12 from another client that changed the person's height etc. Later our sent event is also received, but now as event #13. Our plan is to remove all 'local' events since the last server sync (essentially resetting the state) and re-create the read models from the new events coming from the server. Eventually we will also get our own sent events back (though maybe with a reversal transaction event applied). Any big warning signs for going down this road?
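To make that concrete, here's a minimal TypeScript sketch of the rollback-and-rebase step on reconnect. All names (EventRecord, LocalEventStore, rebuildReadModels, etc.) are hypothetical placeholders for illustration, not our actual API:

```ts
interface EventRecord {
  streamId: string;
  number: number;
  type: string;
  payload: unknown;
}

interface LocalEventStore {
  removeEventsAfter(n: number): Promise<void>;
  append(e: EventRecord): Promise<void>;
  all(): Promise<EventRecord[]>;
}

// Hypothetical sketch: on reconnect, drop the local-only events, replay the
// authoritative server events, and rebuild the read models from scratch.
async function onReconnect(
  lastServerSyncedNumber: number,
  serverEvents: EventRecord[], // fed to us by the catch-up subscription
  localStore: LocalEventStore,
  rebuildReadModels: (events: EventRecord[]) => Promise<void>,
): Promise<void> {
  // 1. Reset state: discard every 'local' event written since the last server sync.
  await localStore.removeEventsAfter(lastServerSyncedNumber);

  // 2. Append the server's events (#10-12 in the example above). Our own queued
  //    event eventually comes back here too, renumbered to #13, possibly with a
  //    reversal transaction event applied.
  for (const e of serverEvents) {
    await localStore.append(e);
  }

  // 3. Re-create the read models from the now-authoritative local event store.
  await rebuildReadModels(await localStore.all());
}
```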

Here's the current draft (attached as an image).

Would appreciate any feedback.

Instead of reverting the Client Event Store state when returning online (which sounds a bit finicky and fragile), we will simply make a temporary duplicate of the Event Store when going offline, append local not-yet-sent events to it in order to update the read data, and dump it again once we return online. This seems like a viable strategy since we expect the Client Event Store to be a small subset and there's no demand to update the server frequently (low traffic, so up to 1 hour between connection attempts is okay).
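As a rough sketch of that duplicate-store idea (reusing the hypothetical EventRecord shape from the earlier sketch; the in-memory array is a simplification, in practice this would live in IndexedDB):

```ts
// Hypothetical sketch: the pristine store only ever holds server-confirmed
// events; the temporary copy also absorbs local not-yet-sent events and backs
// the offline read models.
class OfflineSession {
  private tempEvents: EventRecord[];

  constructor(private readonly confirmed: EventRecord[]) {
    // Going offline: duplicate the (small) client event store.
    this.tempEvents = [...confirmed];
  }

  // Local events land only in the temp copy, never in the pristine store.
  appendLocal(event: EventRecord): void {
    this.tempEvents.push(event);
  }

  // The offline read models are rebuilt from the temp copy.
  readModelSource(): readonly EventRecord[] {
    return this.tempEvents;
  }

  // Returning online: dump the temp copy; the pristine store plus the
  // catch-up subscription becomes authoritative again.
  backOnline(): EventRecord[] {
    this.tempEvents = [];
    return this.confirmed;
  }
}
```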

On Tuesday, 7 March 2017 at 15:22:47 UTC, Erlend Wollan wrote:

Just an update on what we ended up successfully implementing (coding started mid-March with one dev and scaled up to three by the end):

Been a fun project so far. The web client was written in TypeScript, and type definitions exist for pretty much everything out there (including the Service Worker API).
Dexie was a great library discovery for us for handling IndexedDB (we've been dealing directly with the low-level API for the last few years).
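For reference, a tiny Dexie sketch of what a client-side event table can look like (the schema here is a made-up example, not our actual one):

```ts
import Dexie, { Table } from 'dexie';

interface StoredEvent {
  id?: number;      // auto-incremented primary key
  streamId: string; // aggregate / stream identifier
  number: number;   // event number within the stream
  type: string;
  payload: unknown;
}

class ClientEventStore extends Dexie {
  events!: Table<StoredEvent, number>;

  constructor() {
    super('clientEventStore');
    // '++id' = auto-increment key; the compound index [streamId+number]
    // lets us address a specific event within a stream directly.
    this.version(1).stores({
      events: '++id, streamId, [streamId+number]',
    });
  }
}

const db = new ClientEventStore();

// Load a stream's events in order, without touching raw IndexedDB APIs.
function loadStream(streamId: string): Promise<StoredEvent[]> {
  return db.events.where('streamId').equals(streamId).sortBy('number');
}
```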
For now we are just polling from the Service Worker to fetch new data from the server, but we will consider server push in the next phase (though polling works just fine, so we may not).
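The polling itself is nothing fancy; roughly this shape (the endpoint and helpers are hypothetical stand-ins, not our real names):

```ts
// sw.ts — rough shape of the Service Worker polling loop (names hypothetical).
declare const self: ServiceWorkerGlobalScope;

// Hypothetical helpers standing in for the real IndexedDB-backed ones.
declare function getLastSyncedEventNumber(): Promise<number>;
declare function applyIncomingEvents(events: unknown[]): Promise<void>;

const POLL_INTERVAL_MS = 60 * 1000;

async function poll(): Promise<void> {
  try {
    // Ask the server for everything after the last event we synced in.
    const after = await getLastSyncedEventNumber();
    const res = await fetch(`/api/events?after=${after}`); // hypothetical endpoint
    if (res.ok) {
      await applyIncomingEvents(await res.json());
    }
  } catch {
    // Offline or server unreachable: just try again on the next tick.
  } finally {
    // Note: the browser may terminate an idle Service Worker, so in practice
    // the loop also gets re-kicked by messages/fetches from the page.
    setTimeout(poll, POLL_INTERVAL_MS);
  }
}

poll();
```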

On Tuesday, 7 March 2017 at 15:22:47 UTC, Erlend Wollan wrote:

Nice work!

Another update with our recent developments:

Updated the blob-syncing mechanism to be fully automatic, with 'attachment' aggregates as the logical representation.
Also introduced a Redux state layer on the front end so that we can 'fake the expected outcome' (letting the user immediately read their own write). This layer is automatically updated with incoming data (or reverted if the server rejects outgoing data).
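A stripped-down sketch of how such a layer can behave (action names and shapes are made up for illustration):

```ts
// Hypothetical reducer sketch: optimistic writes are applied immediately so
// the user reads their own write; verified server data (or a rejection)
// replaces them later.
interface Person { id: string; name: string; }

interface OptimisticState {
  confirmed: Record<string, Person>;  // last data verified by the server
  optimistic: Record<string, Person>; // what the UI actually renders
}

type Action =
  | { type: 'LOCAL_WRITE'; person: Person }        // fake the expected outcome
  | { type: 'SERVER_DATA'; person: Person }        // incoming verified data
  | { type: 'SERVER_REJECTED'; personId: string }; // server rejected our write

function reducer(state: OptimisticState, action: Action): OptimisticState {
  switch (action.type) {
    case 'LOCAL_WRITE':
      // Apply immediately so the user can read their own write.
      return {
        ...state,
        optimistic: { ...state.optimistic, [action.person.id]: action.person },
      };
    case 'SERVER_DATA':
      // Verified data updates both layers.
      return {
        confirmed: { ...state.confirmed, [action.person.id]: action.person },
        optimistic: { ...state.optimistic, [action.person.id]: action.person },
      };
    case 'SERVER_REJECTED': {
      // Roll the optimistic entry back to the last confirmed state.
      const confirmedPerson = state.confirmed[action.personId];
      const optimistic = { ...state.optimistic };
      if (confirmedPerson) {
        optimistic[action.personId] = confirmedPerson;
      } else {
        delete optimistic[action.personId];
      }
      return { ...state, optimistic };
    }
  }
}
```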

On Thursday, 25 May 2017 at 12:12:24 UTC+1, Chris Ray wrote:

Refactored the system a bit over the last couple of days, which ended up simplifying a lot of the Service Worker logic.

Basically, we removed all the offline/online preparation we previously thought was needed due to the occasionally-connected nature of the app.

Now the 'temp read data' is the permanent workspace that is always queried and written against, very much 1:1 with our Redux state layer on the front end (avoiding a lot of eventual-consistency issues). All incoming, verified data from the server is stored separately, only overwriting entries in the 'temp read data' if the verified data is new or newer (or to revert temp data if the server rejects writes by the user). There are no longer two separate modes (online vs. offline) like earlier, just events that wait in the events-out buffer until the next opportunity to sync.
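In code, the overwrite rule amounts to roughly this (the shapes are hypothetical):

```ts
// Hypothetical sketch: verified server data only overwrites an entry in the
// permanent 'temp read data' workspace if it is new or strictly newer.
interface ReadEntry {
  id: string;
  _version: number; // number of the last event applied to this entry
  data: unknown;
}

function mergeVerified(
  tempReadData: Map<string, ReadEntry>,
  verifiedFromServer: ReadEntry[],
): void {
  for (const entry of verifiedFromServer) {
    const existing = tempReadData.get(entry.id);
    if (!existing || entry._version > existing._version) {
      tempReadData.set(entry.id, entry);
    }
    // Otherwise the local (optimistic) entry stands until the server confirms
    // or rejects the corresponding events in the out buffer.
  }
}
```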

On Tuesday, 7 March 2017 at 15:22:47 UTC, Erlend Wollan wrote:

This is really fantastic.
I'm writing mine with Cycle.js on the front end, but I have a question unrelated to the front end.

On the server side, in addition to access-control protection for incoming events, have you found the need for any server-side validation of the posted objects? I foresee the need for it in my app, even if just for sanity's sake, as I want to prevent any poorly formed events from making it in (devs using Postman to send whatever, etc.).

It's a good thing to include, and we will in time, but currently the thorough validations in the Service Worker are decent enough for our smaller MVP target. We devs will just stay away from using such tools for now (at least in production). On the server we currently do conflict detection & merging, which includes some level of validation. From my notes (example using a dummy domain):

–Conflict Detection & Merging:–

Having the client-server communication be event-based simplifies the conflict detection performed by the server, as the data & context are more specific. Example:

Client A has received a stream of five events (#0 to #4) from the server for a particular aggregate. The client-side read data generated from this event data gets marked with _version: 4, indicating the last event applied to it.

Client A loses its Internet connection.

Client B, on a different machine, creates a new event, which receives event number 5. They sync this with the server. The server event store now has six events for this particular stream/aggregate.

Client A creates a new event while offline, and it receives event number 5 (because the system sees that the read data has _version: 4, meaning the next should be 5).

Client A regains its Internet connection and uploads the new event to the server.

The server sees that this new event is marked as number 5. It checks the event store for the particular stream/aggregate and sees that it already contains six events (including an event with number 5). This means we have a potential conflict and need to dig deeper.
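Expressed as a sketch (the types and store interface are hypothetical), the check the server performs at this point is:

```ts
interface EventRecord {
  streamId: string;
  number: number;
  type: string;
  payload: unknown;
}

interface ServerEventStore {
  countEvents(streamId: string): Promise<number>;
  eventAt(streamId: string, n: number): Promise<EventRecord>;
  append(e: EventRecord): Promise<void>;
}

// Defined in the follow-up sketch after the rules below.
declare function resolvePotentialConflict(
  incoming: EventRecord,
  store: ServerEventStore,
): Promise<void>;

async function receiveEvent(
  incoming: EventRecord,
  store: ServerEventStore,
): Promise<void> {
  const count = await store.countEvents(incoming.streamId); // six events: #0-#5
  if (incoming.number >= count) {
    await store.append(incoming); // the claimed number is still free: accept
    return;
  }
  // An event with this number already exists on the server (both claim #5):
  // potential conflict, dig deeper.
  await resolvePotentialConflict(incoming, store);
}
```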

–When is there an actual conflict, and when is it a false alarm?–

If the received event is of a 'created' type (always event number 0, e.g. 'createdProject'), then there's zero chance of conflict and we can just let it through without checking. Also, if we receive an event with number 2 and there only exist events 0 to 1 on the server, then we might also consider not checking deeper for a conflict and just let it pass through.

If we receive an event with a number lower than or equal to what's already in the event store, however, then we might have a conflict on our hands and need to check deeper. Let's say server event #5 is of type 'addedAttachmentToProject' and event #5 received from the client is 'addedContactPersonsToProject'; we can then see from the domain logic that there is no actual conflict, so we just update the received event's number to #6 and add it to the event store.

However, if the existing server event #5 is of type 'editedScopeOfProject' and the newly received event #5 from the (recently offline) client is also 'editedScopeOfProject', then we have an actual conflict and we reject it.
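Putting those three rules together, the resolvePotentialConflict from the earlier sketch comes out roughly like this (the startsWith('created') check and the single-event comparison are simplifications of the domain logic, for illustration only):

```ts
async function resolvePotentialConflict(
  incoming: EventRecord,
  store: ServerEventStore,
): Promise<void> {
  // Rule 1: 'created'-type events (always event #0, e.g. 'createdProject')
  // can never conflict, so let them straight through.
  if (incoming.type.startsWith('created')) {
    await store.append(incoming);
    return;
  }

  // Rule 2: compare against the server event already occupying this number.
  const existing = await store.eventAt(incoming.streamId, incoming.number);
  if (existing.type !== incoming.type) {
    // e.g. 'addedAttachmentToProject' vs 'addedContactPersonsToProject':
    // the domain logic says there is no actual conflict, so renumber the
    // received event to the end of the stream (#5 becomes #6) and store it.
    const next = await store.countEvents(incoming.streamId);
    await store.append({ ...incoming, number: next });
    return;
  }

  // Rule 3: same type claiming the same number, e.g. two 'editedScopeOfProject'
  // events both claiming #5 -> an actual conflict: reject it.
  throw new Error(
    `Conflict on stream ${incoming.streamId}, event #${incoming.number}`,
  );
}
```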