I’d like to track some statistics of how fast and how stale my projections (into SQL etc.) are. I would like to create a dashboard where the average processing speed, estimated time to catch up and number of processed and total events are shown.
Each projection is subscribed to $all and already tracks the number of read and processed events. All I need to get a better picture is to know how many events there are, in total.
How can I determine the total number of events over $all?
The statistics will, most likely, be event sourced too. We’ll have to decide on a granular enough sampling interval which won’t fill up our disk purely with statistics though…
Just thinking out loud here, if we have 500 bytes of data per sample and we’d sample every second we’d end up with 15GB of statistics each year for each projection (of which there could be hundreds…)? Seems a lot - not “big” but not such a great idea in my context (150 separate servers, old system is CRUD big ball of mud and people are freaking out because the entire system database is growing by 20GB per year).
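That back-of-the-envelope number checks out; a quick sketch (the 500-byte sample size and one-second interval are the figures assumed above):

```python
# Yearly statistics volume for a single projection, one sample per second.
SAMPLE_SIZE_BYTES = 500                 # assumed payload per sample
SAMPLES_PER_YEAR = 60 * 60 * 24 * 365   # seconds in a (non-leap) year

bytes_per_year = SAMPLE_SIZE_BYTES * SAMPLES_PER_YEAR
print(f"{bytes_per_year / 1e9:.1f} GB/year per projection")  # 15.8 GB/year per projection
```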
Any pointers, aside from “storage is cheap, your management is [ ]”?
Get total number of events in event store (to track CQRS projection speed and staleness)
Take a look at the position you are using to checkpoint. It represents
the original written conceptual position in the log (e.g. the prepare
position). You can subtract your current position from the latest one
to get a rough estimate of the number of bytes to go. This does not
hold true when scavenging has removed lots of things, but it should be
good enough in many cases. If you wanted to go one step further you
could write a projection that tracks this, but in many cases just the
position will be reasonable.
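A minimal sketch of that subtraction (the function name and the sample positions are illustrative; the real values would come from your checkpoint and from the last write to, or read of, $all):

```python
def bytes_behind(head_position: int, checkpoint_position: int) -> int:
    """Rough backlog estimate in bytes.

    Both arguments are $all prepare positions: the head position comes
    from the most recent write (or the last event read from $all), the
    checkpoint is where the projection last got to. After a scavenge
    this overestimates, since removed events no longer occupy bytes.
    """
    return max(head_position - checkpoint_position, 0)

# Hypothetical positions for illustration:
print(bytes_behind(1_250_000, 1_100_000))  # 150000 bytes still to go
```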
"Seems a lot - not "big" but not such a great idea in my context
(150 separate servers, old system is CRUD big ball of mud and people
are freaking out because the entire system database is growing by
20GB per year)."
2-3 years ago a 128 GB microSD card was considered big; so was a
128 GB SSD. Today I can cheaply buy a 1 TB microSD card, and a 1 TB
SSD is standard in many laptops. At a retention rate of 20 GB/year you
are well below the curve of technology improvements. Remember that you
have to compare the curve of your data retention against the curve of
storage improvements, not compare statically.
That said, in terms of limiting statistics, you can keep them for only
a period of time. Statistics on your projections lose value the longer
you keep them: within the first month they are valuable, but they get
less valuable over time. Setting up an archival/removal policy would
likely be wise here.
So I would have to track the log position from the write result whenever I write, or get the last event in the $all stream and determine the position from there (on startup) - and compare that to my current read position, is that correct?
I’d like to track this in number of events too. I’m thinking about running a projection that only tracks the current position and counts the number of events it has seen. Since it isn’t doing much else, it should stay near real time to the event stream.
Such a projection might not be 100% accurate, but it could estimate the total number - after all, it knows its current position, how many events that corresponds to and could access the head position. That should yield a very basic estimate of how many events there are left to count.
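That estimate could look something like this (a sketch; it assumes event sizes are roughly uniform, so the average bytes/event observed so far can be extrapolated over the unread tail of the log):

```python
def estimate_total_events(events_seen: int, current_position: int,
                          head_position: int) -> int:
    """Extrapolate the total event count from a counting projection.

    `events_seen` and `current_position` come from the projection
    itself; `head_position` is the latest $all position. The unread
    tail is converted to events using the observed average event size.
    """
    if events_seen == 0 or current_position <= 0:
        return 0
    avg_bytes_per_event = current_position / events_seen
    remaining_bytes = max(head_position - current_position, 0)
    return events_seen + round(remaining_bytes / avg_bytes_per_event)

# Hypothetical numbers: 10k events in the first 5 MB, head at 6 MB.
print(estimate_total_events(10_000, 5_000_000, 6_000_000))  # 12000
```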
Limiting statistics seems reasonable. I do want to keep them in ES so that we get a log about the performance in production.
Is scavenging something that must be triggered manually? I ask because we could limit the number of statistics we track by limiting the statistic stream event count…
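For what it's worth, capping a statistics stream is exactly what EventStore's `$maxCount`/`$maxAge` stream metadata keys are for: events beyond the cap become eligible for removal at the next scavenge, and scavenging itself is an explicit admin operation rather than something automatic. A sketch of such a metadata payload (the particular limits here are made-up examples):

```python
import json

# Stream metadata capping a statistics stream: keep at most 100k events,
# and nothing older than ~90 days. Truncated events are physically
# removed only when a scavenge runs.
metadata = {
    "$maxCount": 100_000,           # maximum number of events retained
    "$maxAge": 60 * 60 * 24 * 90,   # maximum age in seconds (~90 days)
}
print(json.dumps(metadata))
```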
"I'd like to track this in number of events too. I'm thinking about
running a projection that only tracks the current position and counts
the number of events it has seen. Since it isn't doing much else, it
should stay near real time to the event stream."
How will it see events that are deleted?
Good point. It could occasionally rescan the entire stream, but that seems a bit wasteful…
All I want is a nice looking dashboard to prove that the architecture is fast enough to handle our requirements… plus I’d like to be able to look at it in production.
I guess going by position will have to be enough. Deleted events are just events that a projection did not have to process either, so the statistics would still work.
Position obviously has the same issue with deleted events, but overall
it is a better metric as it also includes the *size* of the events.
500 20-byte events are obviously less expensive than 500 3 MB events,
and this would be reflected in these numbers.