Atom archive order and unstable URIs

Hi

In “Implementing Domain-Driven Design”, Vaughn Vernon describes a strategy that produces fixed URIs that can be cached and also navigates links in the order I would expect. I’m curious why that strategy is not used here?

Details:

To read all the events from a stream I first need to load the current document “/streams/{streamId}”, then follow the “last” link, and then keep following the “previous” links until I have loaded all the events. This is confusing and seems to be the opposite of what RFC 5005 proposes: I would expect to read “first” and then follow the “next” links. Is this intentional, or is it something that cannot be changed without breaking backwards compatibility?

(I also expected prev-archive and next-archive to be used, instead of the paged feed convention “without any guarantees about the stability of each document’s contents”.)
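
For concreteness, this is roughly the reading loop I mean (a rough sketch in Python, not an official client; the JSON field names, the Accept header, and the host are my assumptions based on the JSON Atom responses and the example further down):

```python
# Sketch of reading a stream oldest-first by following "last" then "previous".
# Field names ("links", "relation", "uri", "entries"), the Accept header and
# the host are assumptions, not taken from any documentation.
import requests

BASE = "http://192.168.0.17:2113"
HEADERS = {"Accept": "application/vnd.eventstore.atom+json"}

def link(feed, rel):
    # Return the URI of the first link with the given rel, or None.
    return next((l["uri"] for l in feed.get("links", []) if l["relation"] == rel), None)

def read_stream_oldest_first(stream):
    head = requests.get(f"{BASE}/streams/{stream}", headers=HEADERS).json()
    # Jump to the oldest page via "last", then walk "previous" toward the newest.
    page_uri = link(head, "last") or f"{BASE}/streams/{stream}"
    events = []
    while page_uri:
        page = requests.get(page_uri, headers=HEADERS).json()
        entries = page.get("entries", [])
        if not entries:
            break                            # ran past the newest events
        events.extend(reversed(entries))     # entries are newest-first per page
        page_uri = link(page, "previous")
    return events
```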

Another problem is that different URIs are generated depending on how many events are in the stream. This becomes a problem when I read the current events, realize I have missed some, and then need to follow the “next” link to load the older ones: the “next” URI keeps changing. For example, in a stream with 48 events:

GET /streams/123

-> next URI = http://192.168.0.17:2113/streams/123/28/backward/20

When I add another event and retry:

GET /streams/123

-> next URI = http://192.168.0.17:2113/streams/123/29/backward/20

The problem is that I may need to cache up to 20 times as many archive feeds.
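
Put differently, the two responses above look like the “next” page simply starts at (event count − page size), so every appended event produces a new URI. A quick sketch of my reading of it (an observation from the responses, not documented behaviour):

```python
# My reading of the two responses above: the "next" page appears to start at
# (event count - page size), so every appended event shifts the URI by one.
PAGE_SIZE = 20

def next_page_start(event_count, page_size=PAGE_SIZE):
    return event_count - page_size

assert next_page_start(48) == 28   # -> /streams/123/28/backward/20
assert next_page_start(49) == 29   # -> /streams/123/29/backward/20

# So an intermediary can end up caching up to page_size (20) different URIs
# that all name overlapping windows over the same events.
```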

/Jan

Hi Jan,

There is an important difference between the Event Store and Atom feeds. Atom considers the most recent entries to be the first, while the most recent events in the Event Store are the last. So we follow the Atom rules and expose the most recent events as the first page and the least recent events as the last page.

If you start reading from the beginning of the stream (i.e. the rel=“last” page) and follow the event order (i.e. the “previous” links), you get pages that can be cached (the HTTP response headers indicate this) until you reach the last, incomplete page.
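
For example, you can compare the Cache-Control header of a full archive page with that of the head of the stream (a rough sketch; the URIs are taken from the earlier example and the exact header values depend on the server):

```python
# Rough check of which pages the server marks as cacheable: a full archive
# page should carry a long-lived Cache-Control value, while the head of the
# stream (the incomplete page) should not. URIs are from the example above.
import requests

HEADERS = {"Accept": "application/vnd.eventstore.atom+json"}

def cache_control(uri):
    return requests.get(uri, headers=HEADERS).headers.get("Cache-Control", "<none>")

print(cache_control("http://192.168.0.17:2113/streams/123/28/backward/20"))  # full archive page
print(cache_control("http://192.168.0.17:2113/streams/123"))                 # head, incomplete
```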

best regards

Yuriy

The last/prev behaviour comes from the Atom spec and RFC 5005.

"first" - A URI that refers to the furthest preceding document in
      a series of documents.

   o  "last" - A URI that refers to the furthest following document in a
      series of documents.

   o  "previous" - A URI that refers to the immediately preceding
      document in a series of documents.

   o  "next" - A URI that refers to the immediately following document
      in a series of documents.

The set of documents in question is a stream of events sorted in reverse chronological order. As such, “last” is the oldest document and “first” the newest. Take a look at the example in the spec where “next” points to page 2, and imagine the pages are reverse-chronologically sorted. There have been many discussions over whether this is correct, and the consensus seems to be yes (including Nottingham via email), though it still feels backwards to me :)
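
To make that concrete, here is a rough sketch of how a 48-event stream with a page size of 20 lays out under that ordering (the numbers are from this thread; the paging scheme shown is illustrative, not a description of the server internals):

```python
# Worked example of the reverse-chronological ordering described above.
event_count, page_size = 48, 20

# Archive pages are aligned from the oldest event upward, so only the newest
# page can be incomplete.
pages = [(start, min(start + page_size, event_count) - 1)
         for start in range(0, event_count, page_size)]
# pages == [(0, 19), (20, 39), (40, 47)]

# The feed documents form a reverse-chronological series, so in RFC 5005 terms
# "first" is the newest page and "last" the oldest, while "previous" moves
# toward newer events and "next" toward older ones.
for rel, (lo, hi) in zip(["first", "(middle)", "last"], reversed(pages)):
    print(f"{rel:8} -> events {lo}..{hi}")
# first    -> events 40..47   (the incomplete page)
# (middle) -> events 20..39
# last     -> events 0..19
```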

We have discussed adding prev-archive and next-archive. As the events are immutable, the two are equivalent (they only become unstable if you have updates). It would make things more explicit, though.

We have also discussed using a slightly different forward/backward link structure. You are, however, reading in a very unusual way. You should just remember the last “previous” link you read and go from there…
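
Something along these lines (a rough sketch, not a reference client; it assumes every non-empty page exposes a “previous” link and that the JSON field names match the feeds discussed earlier):

```python
# Sketch of "remember the last prev link and go from there": a polling reader
# that keeps a cursor on the newest "previous" URI it has followed. Assumes
# every non-empty page exposes a "previous" link; field names and media type
# are assumptions based on this thread, not documentation.
import time
import requests

HEADERS = {"Accept": "application/vnd.eventstore.atom+json"}

def link(feed, rel):
    return next((l["uri"] for l in feed.get("links", []) if l["relation"] == rel), None)

def follow(start_uri, handle, interval=1.0):
    uri = start_uri                              # e.g. the "last" link of the head feed
    while True:
        feed = requests.get(uri, headers=HEADERS).json()
        entries = feed.get("entries", [])
        for entry in reversed(entries):          # entries are newest-first per page
            handle(entry)
        newer = link(feed, "previous")
        if entries and newer:
            uri = newer                          # advance the cursor
        else:
            time.sleep(interval)                 # caught up; retry the same URI
```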

Greg

OK, I think I had an incorrect model of the feed in my head; once I drew it out like this I at least understand what is going on:

[diagram of the 48-event stream paged into feed documents omitted]

For archive links, yes, we have been talking about Jim Webber’s strategy. However, there is a reasonable amount of discussion about whether it is worthwhile. Most proxies (nginx, say) act as an LRU cache, so a URI that is never requested again is relatively cheap, and all future reads will be aligned on the same pages.

Greg

Let me be clearer about the trade-off here.

If I always serve fixed pages such as 41–60, a page is not cacheable until it is full, which causes most requests to hit the store. As it is now, all served pages are infinitely cacheable. You would also be surprised, with multiple readers, how often they all want the same small partial page. Think of 500 listeners polling while an event arrives every 10 seconds: almost all of them will be served the same partial page.

What we are looking at doing is aligning calls onto pages “if we can”. So if you are at position 53, we would return 7 events to align you back onto a page boundary (if there are more than 20).
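
In other words, something like this (illustrative arithmetic only):

```python
# The alignment described above: if a reader is at position 53 with a page
# size of 20, hand back only 7 events so that the next request starts on a
# page boundary (60).
def events_to_return(position, page_size=20):
    remainder = position % page_size
    return page_size if remainder == 0 else page_size - remainder

assert events_to_return(53) == 7    # 53 + 7 == 60, a page boundary
assert events_to_return(60) == 20   # already aligned, serve a full page
```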

As it is now, there are many short-lived cacheable links. On a replay, however (a common thing to do), all reads will align on the same pages.

Another advantage of the model is that you can never receive an id twice (which simplifies the reader).

So basically it is a question of how frequently you receive new events vs how many polling consumers you have?

But if you only receive events every 10 seconds you could probably cache the incomplete current page for at least a few seconds, maybe more? Or have you seen any problems with doing this?

/Jan

There is support for caching head links by putting a cache-control setting on the stream metadata.

The secondary issue is client complexity. With full pages (non-cacheable), the client receives events n times and must track the sequence as well. With “previous” links, events are only returned once from the client’s perspective, e.g. just follow “previous”.

I think alignment is useful in some scenarios, as are the small partial links as they work now. Perhaps a toggle for this would be useful.

We have had tons of discussions on this, including with Nottingham, about the “correct” way of doing things.

Greg