Hello there,
We are running load tests on our brand-new platform built on ES, and we are seeing something new: the projections can lag.
It probably won’t be a surprise to those who have already thought about it: under heavy load the projection “Done” percentage drops below 100% and the projection starts accumulating delay. That’s fine, it’s a very good design: it lets ES keep ingesting more data and defer what can be deferred.
But when the delay reaches minutes, our process managers fall behind too, and they have business timeouts. After waiting a while for an event that has been written but is not yet visible in the projections, they fail the process.
After this long introduction, here is the question: how can we monitor this delay properly?
I have an idea:
=> send heartbeats to a stream
=> add a projection to project $et-
- create a new event in another stream: MyProjectionDelayMeter { heartbeatTimestamp ; projectionstamp }
=> and then I “just” have to graph these MyProjectionDelayMeter events in any reporting tool.
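To make the idea concrete, here is a rough Python sketch of the delay calculation only. It is not Event Store projection code. It assumes both fields are ISO 8601 timestamps with a UTC offset and that projectionstamp is the moment the projection handled the heartbeat. The helper names (delay_seconds, summarize) and the sample values are made up for illustration.

```python
from datetime import datetime
from statistics import median


def delay_seconds(meter: dict) -> float:
    """Seconds between the heartbeat being written and the projection handling it."""
    sent = datetime.fromisoformat(meter["heartbeatTimestamp"])
    seen = datetime.fromisoformat(meter["projectionstamp"])
    return (seen - sent).total_seconds()


def summarize(meters: list) -> dict:
    """Reduce a batch of MyProjectionDelayMeter events to a few numbers worth graphing."""
    delays = [delay_seconds(m) for m in meters]
    return {"count": len(delays), "median": median(delays), "max": max(delays)}


# Hypothetical samples: the projection keeps up at first, then falls behind.
samples = [
    {"heartbeatTimestamp": "2024-01-01T12:00:00+00:00",
     "projectionstamp": "2024-01-01T12:00:00.200+00:00"},
    {"heartbeatTimestamp": "2024-01-01T12:01:00+00:00",
     "projectionstamp": "2024-01-01T12:01:01.500+00:00"},
    {"heartbeatTimestamp": "2024-01-01T12:02:00+00:00",
     "projectionstamp": "2024-01-01T12:03:35+00:00"},
]

print(summarize(samples))
```

The max is probably the number to alert on, since a single delay longer than a process manager timeout is enough to break a process.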
I’ll also need the load of the node over the same period. I assume I can feed some of the metrics from $stats-NODEIP into the same reporting, so we can see precisely where our limits are.
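A possible way to line up the two series, again only as a Python sketch: pair each delay sample with the most recent load sample, then look for the lowest load at which the delay went over a limit. The (time, value) tuples, the function names and the 30 second limit are assumptions for illustration. Which metric from $stats-NODEIP to use as "load" is left open here.

```python
from bisect import bisect_right


def join_nearest(delays, loads):
    """Pair each (ts, delay) with the latest load sample taken at or before ts.

    Both inputs are lists of (timestamp_seconds, value), sorted by timestamp.
    """
    load_times = [t for t, _ in loads]
    paired = []
    for ts, delay in delays:
        i = bisect_right(load_times, ts) - 1
        if i >= 0:  # skip delay samples that precede the first load sample
            paired.append((ts, delay, loads[i][1]))
    return paired


def lowest_load_over(paired, max_delay):
    """Smallest load value at which the delay exceeded max_delay, or None."""
    over = [load for _, delay, load in paired if delay > max_delay]
    return min(over) if over else None


# Hypothetical samples: (seconds since test start, value)
loads = [(0, 20.0), (60, 55.0), (120, 90.0)]   # node load in %
delays = [(10, 0.2), (70, 1.5), (130, 95.0)]   # projection delay in seconds

paired = join_nearest(delays, loads)
print(paired)
print(lowest_load_over(paired, max_delay=30.0))
```

Setting max_delay to the shortest business timeout of the process managers would give a first estimate of the load the node can take before processes start failing.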
So folks, am I on the right track, or is there a simpler way to do this?
Perhaps it’s already in the $stats stream?
Perhaps someone has already made a live plot of this stream?