Should a stream of observation events be pre-de-duplicated?

Hi Greg,

Let’s say we have an event stream that is a series of observations, such as SensorsRead. Imagine, for example, that there are 1000 sensors that are all read atomically, so an event has a field for each reading:

{
  sensor000: 1.2345…,
  sensor001: 2.3456…,
  ...
  sensor999: 9.8765…
}

Following best practice with the EventStore, would one store each observation independently (possibly at a large cost in space), or would one pre-de-duplicate the data? That is, rather than storing a SensorsRead event, one would store a SensorReadingsChanged event, storing only the values that differ from the immediately previous reading.
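For concreteness, the pre-de-duplication step I have in mind would look something like this (just a sketch; the event and field names are mine):

// Sketch of pre-de-duplication: given the previous and current readings,
// keep only the fields whose values changed since the last observation.
type Readings = Record<string, number>;

function toChangedEvent(previous: Readings, current: Readings): Readings {
  const changed: Readings = {};
  for (const [sensor, value] of Object.entries(current)) {
    if (previous[sensor] !== value) {
      changed[sensor] = value; // only store readings that differ
    }
  }
  return changed; // body of a SensorReadingsChanged event
}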

The latter option is a very crude compression of the data size, but it can be extremely effective if there are many duplicate readings. If EventStore is already doing smarter compression against the event JSON, however, it would probably be better to just store the independent observations.

Thanks for your insight,

-Marc

Why not have a stream per sensor?
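Something like this, as a sketch (appendToStream is a hypothetical stand-in here, not the actual client API):

// Sketch: one stream per sensor, each reading appended as its own event.
declare function appendToStream(stream: string, event: object): Promise<void>;

async function recordReading(sensorId: string, value: number): Promise<void> {
  // e.g. stream "sensor-042" holds only that sensor's observations
  await appendToStream(`sensor-${sensorId}`, {
    type: "SensorRead",
    data: { value, readAt: new Date().toISOString() },
  });
}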

OK, imagine that the stream is per-sensor, but that the observation itself is large and likely to be repeated.

Is the best practice to have a “same as last” event, in other words to introduce a crude de-duplication dependency on the previous event, or to repeat the content?
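For concreteness, the “same as last” option would be something like this (a sketch; the event shapes are made up):

// Sketch of the "same as last" option: if the new observation equals the
// previous one, append a small marker event instead of repeating the payload.
type Observation =
  | { type: "SensorRead"; data: string } // full (large) payload repeated
  | { type: "SameAsLast" };              // crude de-duplication marker

function nextEvent(previous: string | undefined, current: string): Observation {
  return previous === current
    ? { type: "SameAsLast" }                 // depends on the previous event
    : { type: "SensorRead", data: current }; // repeat the full content
}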

Thanks,

-Marc

"The latter option is a very crude compression of the data size, but
can be extremely effective if there are many duplicate readings. If
EventStore is already doing a smarter compression against the event
JSON, however, it probably be better to just store the independent
observations."

It depends on how often they are changing. Even with amazing
compression, the same event 10,000 times will be bigger than having
the event only once.
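(Illustrative numbers only: a 10 KB observation that compresses 100:1 still costs about 1 MB when stored 10,000 times, versus roughly 100 bytes stored once.)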