Event size limit in EventStore

Hi all,

Does EventStore have a limit regarding the size of an event?

Thanks.

Christian

Yes, 4 MB IIRC, though you really shouldn't have events this big!

We use EventStore to synchronize all our databases, including binary
files. Currently that's only images, so nowhere near 4 MB, but this
means we can't use it for other things, like audio. Good to know.

Normally I would store a link to the binary file.

Yeah, but then we need a separate transactional system for files. We'd
rather avoid that. And it would mean that loading a file would have to
make an additional request to that system, which increases complexity
and latency. This way we can push all binary files to all app servers,
which are completely standalone, meaning they can serve the full app
with no other systems involved.

When I wrote a system on GES that needed to sync binary files (PDFs), I had exactly the same thoughts as you. I came to regret it. The services that used the data as external read models would read all the data over the wire whenever they started up (infrequently, you say... but is it?). Those read models did not need the PDFs, just knowledge that they existed (a link), but if you subscribe-from-all you can't get one without the other. I wound up shoehorning in a linked-files system after the fact, which required a migration of the store to get all those PDFs out.
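(To make that cost concrete, here is a rough Python sketch of the situation; subscribe_to_all is a hypothetical stand-in for whatever catch-up-subscription API your client offers, and the event type and field names are invented for illustration.)

    # Rough sketch only: 'subscribe_to_all' is a hypothetical stand-in for a
    # catch-up subscription over $all, and "PdfUploaded" / "document_id" are
    # invented names.
    def rebuild_catalog(subscribe_to_all, catalog):
        for event in subscribe_to_all(from_position=0):
            if event.type == "PdfUploaded":
                # The read model only needs to know the PDF exists...
                catalog.add(event.metadata["document_id"])
            # ...but by this point the subscription has already shipped the
            # entire event body (PDF bytes included) over the wire, because a
            # subscription to $all delivers whole events.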

Food for thought.

I forgot to mention that those services would take many minutes (over 10 before I ripped it out) to start up. That is an unacceptable amount of time and data to be pushing when you simply need to restart a service.

That is definitely a valid concern, and I thought about that too. The
mitigating factor in our case is that binary uploads are reasonably
rare. But I'll think about it one more round, see if it makes sense.

/Rickard

Thanks for all your replies. I am not intending to include binaries in events, but one needs to know the upper limit just in case. By the way, this came up because I am considering Azure Table Storage (ATS) as the persistence medium for events, and ATS's limit is 1 MB. I am not storing events for true event sourcing but for read model creation: I want to be able to tear down the read model after a requirement change and then recreate it from all (currently used) past events.
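(For what it's worth, that rebuild is essentially a replay loop. A rough Python sketch, where read_all_events and the handlers table are hypothetical placeholders for your own persistence and projection code:)

    # Rough sketch of a read-model rebuild; 'read_all_events', 'handlers' and
    # 'read_store' are hypothetical placeholders, not a real API.
    def rebuild_read_model(read_all_events, handlers, read_store):
        read_store.drop()           # tear down the old read model
        read_store.create_schema()  # recreate it for the new requirements
        for event in read_all_events():
            handler = handlers.get(event.type)
            if handler:             # only replay the events currently used
                handler(read_store, event)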

Why would the file system need to be transactional? If the file name is deterministic in some way this problem goes away.

So, currently we just create an event with the binary data, and that
will automatically get published and pushed to all servers in the
cluster. Everyone sees the same thing, as with all other data we have.

If I store files separately then adding a file is trickier. First of
all I need a place to store them, in a clustered fashion (e.g. S3, or
my own S3-ish storage facility). If the file store/update fails, then
the event that contains only the link to the file must fail. If the
file store/update succeeds but the event does not, I need to roll
back the file creation. It becomes a headache which the binary event
does not have: I just create an event with the binary data, and it
either fails or it doesn't.

See what I mean?

/Rickard
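(The headache being described is the naive "store the file, then store the event" flow and its failure windows. A rough Python sketch, with upload_file, delete_file and append_event as hypothetical helpers rather than real APIs:)

    # Naive two-step flow and its failure windows; upload_file, delete_file
    # and append_event are hypothetical helpers, not a real API.
    def add_document(file_bytes, file_name, upload_file, delete_file, append_event):
        url = upload_file(file_name, file_bytes)   # step 1: store the blob
        try:
            # step 2: store the event that only carries a link to the blob
            append_event("documents", "DocumentAdded", {"url": url})
        except Exception:
            # Event append failed: try to undo step 1...
            delete_file(file_name)
            # ...but if this also fails (or the process dies before it runs),
            # an orphaned file is left behind.
            raise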

Upload the file.

If that fails, inform the user/admin and don’t publish the event.

If you upload the file and storing the event fails, inform the user/admin.

Have the file name be deterministic. When the command is tried again, you can skip the file upload.

If the command is never retried, you can have a dangling-file salvage process… or you can forget about it and pay the wasted nanocent every month. After running for a few decades, you might get to a penny a month in charges, but you'd have to be hitting this edge case rather frequently, and at an accelerating rate, to keep up with falling storage prices. Then maybe it's worth building a cleanup? :wink:

In all seriousness, you might be overthinking the latter case. As long as you can't publish an event without having stored the file, you're probably OK.

Fwiw, automatic retries can help here for transient errors.

If the file name is deterministic, based on the file contents, you can get away with uploading the file a second time (it's a PUT, after all). If you are using storage such as S3, you can then optimize by checking the Content-MD5 header to see if it matches yours.
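(Putting those suggestions together, a sketch of the retry-safe flow in Python, assuming S3 via boto3. append_event is a hypothetical helper for the event side, the bucket and stream names are made up, and instead of the Content-MD5 request header the sketch compares the stored object's ETag, which for simple non-multipart uploads is typically the hex MD5 of the body.)

    # Content-addressed, retry-safe upload followed by the event append.
    # Assumes S3 via boto3; 'append_event' is a hypothetical helper and the
    # bucket/stream names are made up.
    import hashlib
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def store_document(file_bytes, append_event, bucket="my-files"):
        # Deterministic key derived from the contents, so a retry of the
        # command is a harmless repeat PUT of the same object.
        key = hashlib.sha256(file_bytes).hexdigest()
        try:
            head = s3.head_object(Bucket=bucket, Key=key)
            # For simple (non-multipart) uploads the ETag is typically the
            # hex MD5 of the body, so we can compare it with our own MD5.
            already_there = head["ETag"].strip('"') == hashlib.md5(file_bytes).hexdigest()
        except ClientError:
            already_there = False   # object not found (or not readable)
        if not already_there:
            s3.put_object(Bucket=bucket, Key=key, Body=file_bytes)
        # Only once the file is safely stored do we publish the event,
        # which carries just the key/link.
        append_event("documents", "DocumentAdded", {"s3_key": key})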