I’m just wondering whether it would make sense to add support for binary event types via Avro, and possibly schema validation for JSON events.
Why binary events?
Size, nothing else. In my test, Avro was up to 3 times smaller than JSON, and serialization took about half the time.
Why schema validation?
I personally prefer that a data store checks the format of the data where possible. It would enforce that stored events are clean.
Why do I think Avro would be a good fit?
- one of the smallest binary sizes
- has a .NET library from MS!
- has schema validation built in
- has a “GenericRecord” concept, which makes it possible to convert it to a JSON object (projections could work)
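For illustration, an Avro schema for a hypothetical “myevent” could look like this (the record name and fields are invented, not part of any existing ES convention):

```json
{
  "type": "record",
  "name": "myevent",
  "namespace": "example.events",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "amount", "type": "double" },
    { "name": "comment", "type": ["null", "string"], "default": null }
  ]
}
```

A GenericRecord built against such a schema maps field-for-field onto a JSON object, which is why projections could keep working.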
I could imagine it the following way:
- new properties “meta-content-type” and “content-type”
- a new store for schemas (immutable, append-only); you cannot redefine content types (maybe just an internal stream)
- “content-type” points to the schema and includes the schema version and the format:
- “+avro” would mark that Avro is in use, “+json” that it is a JSON schema
e.g. “myevent-v2+avro” or “myevent-v3+json”
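Parsing such a content type could be sketched like this (a minimal sketch; the “name-vN+format” naming scheme and the `parse_content_type` helper are my assumptions, not an existing ES API):

```python
import re

# Parse a content type like "myevent-v2+avro" into (name, version, format).
# The "<name>-v<version>+<format>" scheme is an assumption for illustration.
CONTENT_TYPE_RE = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)\+(?P<fmt>avro|json)$")

def parse_content_type(content_type):
    m = CONTENT_TYPE_RE.match(content_type)
    if m is None:
        return None  # no schema suffix -> behave exactly as today
    return m.group("name"), int(m.group("version")), m.group("fmt")
```

Anything that does not match falls back to today’s behaviour, so existing events are untouched.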
The schema store would contain JSON Schemas (http://json-schema.org/), activated by “+json” in the content type.
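For example, a schema store entry for a hypothetical “myevent-v3+json” could be (field names invented):

```json
{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "myevent",
  "type": "object",
  "required": ["id", "amount"],
  "properties": {
    "id": { "type": "string" },
    "amount": { "type": "number" }
  }
}
```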
During processing of projections and REST calls, ES could look up the schema and do the following:
Store event via REST
- A “GenericRecord” could be created and the event stored as binary if “+avro” is present.
- The JSON schema could be checked if a content-type exists and ends with “+json”.
Store event via the protobuf API – the binary is stored as-is; nothing changes for Avro (Avro already checks the format).
- In case of “+json”, the schema could be validated.
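The write path could be sketched roughly like this (a toy illustration: `store_event`, the in-memory schema store, and the minimal `validate_json` check are all my inventions; a real implementation would use a proper JSON Schema validator and the Avro library):

```python
import json

# Hypothetical in-memory schema store: content type -> JSON Schema.
SCHEMA_STORE = {
    "myevent-v3+json": {
        "type": "object",
        "required": ["id", "amount"],
        "properties": {"id": {"type": "string"}, "amount": {"type": "number"}},
    }
}

def validate_json(schema, doc):
    # Toy check: only enforces top-level "type": "object" and "required" keys.
    if schema.get("type") == "object" and not isinstance(doc, dict):
        return False
    return all(key in doc for key in schema.get("required", []))

def store_event(content_type, payload):
    if content_type.endswith("+avro"):
        # Avro bytes are stored as-is; the client-side Avro library already
        # validated them against the schema.
        return ("binary", payload)
    if content_type.endswith("+json"):
        schema = SCHEMA_STORE.get(content_type)
        doc = json.loads(payload)
        if schema is not None and not validate_json(schema, doc):
            raise ValueError("event does not match schema for " + content_type)
        return ("json", doc)
    return ("json", json.loads(payload))  # no suffix: behave exactly as today
```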
GET event via REST or a projection
- If “+avro” is in the content type, the data could be parsed using a “GenericRecord” and the stored schema.
- If no schema exists, nothing changes.
GET via protobuf – nothing changes.
Additionally, further formats could be added using the “+” notation.
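The “+” notation lends itself to a small codec registry on the read path, so new formats can be plugged in later (a sketch; the registry and the decoder names are hypothetical, and a real “avro” decoder would wrap GenericRecord parsing):

```python
import json

# Hypothetical registry: "+" suffix -> decoder turning stored bytes into JSON-ready data.
CODECS = {}

def register_codec(suffix, decoder):
    CODECS[suffix] = decoder

# "+json" ships by default; "+avro" (or future formats) would register real decoders.
register_codec("json", lambda raw: json.loads(raw))

def decode_event(content_type, raw):
    _, _, suffix = content_type.rpartition("+")
    decoder = CODECS.get(suffix)
    if decoder is None:
        return raw  # unknown or missing suffix: return the payload untouched
    return decoder(raw)
```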