Does the ES team consider itself a competitor with Kafka?
I read through some older messages on the topic, but didn’t find a good direct answer.
The capabilities of the two products seem to overlap (though not completely), as do the use cases.
However, the main descriptions of the two differ:
ES: The open-source, functional database with Complex Event Processing in JavaScript.
Kafka: publish-subscribe messaging rethought as a distributed commit log.
I could mostly substitute one for the other in various scenarios, and could also see a producer/consumer scenario for using both together…
But again, what do the creator(s) say? Or is this just noise and nonsense?
There are many overlapping use cases and many where they are not
comparable. As an example, Kafka did not work well as an event store
the last time I checked, because some key features are missing: 1) a
write in ES, once acked, has actually been written to a quorum of
disks; in Kafka it has not been written to any. 2) In an event store
the ability to get consistency is important (this does not exist in
Kafka). 3) ES assures idempotency; again, Kafka offers nothing. 4) The
entire concept of projections/querying. Kafka is, however,
significantly faster, though the largest reason is durability vs. no
durability assurance. As a transport, Kafka works really well.
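To make points 2 and 3 a bit more concrete, here is a minimal in-memory sketch of what "consistency" and "idempotency" on append could look like. It is a toy model, not the Event Store API: the names ToyStream, WrongExpectedVersion, append and expected_version are made up for illustration, and reading point 2 as an optimistic concurrency check on append is my assumption.

```python
class WrongExpectedVersion(Exception):
    """Raised when the stream has moved on since the writer last read it."""


class ToyStream:
    """Hypothetical in-memory stream; illustrative only, not Event Store's API."""

    def __init__(self):
        self.events = []      # list of (event_id, payload), in append order
        self.positions = {}   # event_id -> position in the stream

    def append(self, event_id, payload, expected_version):
        # Idempotency: a retry of an already stored event is a no-op
        # that returns the position of the original write.
        if event_id in self.positions:
            return self.positions[event_id]

        # Consistency: the writer states which version it last saw
        # (-1 means "I expect an empty stream"). A mismatch means another
        # writer got there first, so the append is rejected.
        current_version = len(self.events) - 1
        if expected_version != current_version:
            raise WrongExpectedVersion(
                f"expected {expected_version}, stream is at {current_version}"
            )

        self.events.append((event_id, payload))
        self.positions[event_id] = len(self.events) - 1
        return self.positions[event_id]
```

With this model, retrying the same event ID after a lost ACK does not duplicate the event, and a writer working from a stale version gets an error instead of silently interleaving its events.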
That matched well with my assumptions.
Thank you very much.
I am evaluating event store solutions.
While I am no expert, I question the validity of a couple of these assertions based on a cursory examination. Please correct me or add more context if I am misunderstanding any of them.
http://www.slideshare.net/gwenshap/kafka-reliability-when-it-absolutely-positively-has-to-be-there
#1 and #2: It seems to me that these are controlled by configuration. It seems you could have a reliable and consistent event store on a Kafka cluster if it is configured the way this presentation recommends. Are you thinking of other things that aren’t mentioned here, or scenarios we should be concerned about?
#3: I’m not sure this means it won’t work well as an event store, but it would put more responsibility on producers. This is a nice feature of ES.
#4: I believe the intent was that a separate engine (e.g. Samza or Storm) be used for projections. I don’t necessarily view this as a downside to using Kafka as an event store, but I can see where having it all covered in ES could be beneficial. Separation of concerns can be good.
Even here, from what I see (not so much in the slides), you can still
lose data; it is just less likely. The whole "not putting data on
disk" bit is the issue.
"All data is immediately written to a persistent log on the
filesystem without necessarily flushing to disk. In effect this just
means that it is transferred into the kernel's pagecache."
If you consider the page cache to be disk, then sure, but most people
don't consider the page cache to be disk.
If you read a little further down:
Writes
The log allows serial appends which always go to the last file. This file is rolled over to a fresh file when it reaches a configurable size (say 1GB). The log takes two configuration parameters: M, which gives the number of messages to write before forcing the OS to flush the file to disk, and S, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most M messages or S seconds of data in the event of a system crash.
So by setting M=1 you get “flush after each message”, which will slow everything down considerably.
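As a rough illustration of the M/S policy quoted above, here is a small sketch of an append-only log that forces an fsync after M messages or S seconds. It is a toy that flushes inline on the writer's thread, not Kafka's implementation; ToyLog, count_fsyncs and the parameter names are made up for the example, and the S timer is only checked on append (a real implementation would presumably also flush from a background timer).

```python
import os
import tempfile
import time


class ToyLog:
    """Hypothetical append-only log with an M-messages / S-seconds flush policy."""

    def __init__(self, path, flush_messages, flush_seconds):
        self.f = open(path, "ab")
        self.m = flush_messages       # "M" in the quoted text
        self.s = flush_seconds        # "S" in the quoted text
        self.unflushed = 0            # messages written since the last fsync
        self.last_flush = time.monotonic()
        self.fsync_count = 0

    def append(self, message):
        # write() only hands the bytes to Python's buffer / the OS page cache.
        self.f.write(message + b"\n")
        self.unflushed += 1
        overdue = time.monotonic() - self.last_flush >= self.s
        if self.unflushed >= self.m or overdue:
            self.flush()

    def flush(self):
        self.f.flush()               # Python buffer -> OS page cache
        os.fsync(self.f.fileno())    # page cache -> disk
        self.unflushed = 0
        self.last_flush = time.monotonic()
        self.fsync_count += 1

    def close(self):
        self.flush()
        self.f.close()


def count_fsyncs(flush_messages, n_messages):
    """Append n_messages and report how many fsyncs the M policy triggered."""
    with tempfile.TemporaryDirectory() as d:
        log = ToyLog(os.path.join(d, "segment.log"), flush_messages, flush_seconds=3600)
        for i in range(n_messages):
            log.append(b"event-%d" % i)
        n = log.fsync_count
        log.close()
        return n
```

Calling count_fsyncs(1, 100) issues one fsync per message, while count_fsyncs(50, 100) issues two. In the second case up to 49 messages sit only in the page cache at any moment, which is the "at most M messages" window the quoted text describes.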
But at the end of the day, my bet lies with using each system for what it was designed for, and if one relies heavily on a consistent event store, then I would go with ES. OTOH, not all event stores need this level of guarantee. YMMV
Cheers
Niclas
Try setting M=1.
Also, IIRC the flush is asynchronous anyway (i.e. it returns the ACK
before the flush), and it will still accept messages while it's
flushing (i.e. M will become bigger than 1). So you can still lose
data.
There is nothing wrong with this design, but it's an important thing to consider.
Also I should add that there is talk about supporting durable writes
in the future.
http://kafka.apache.org/documentation.html#appvsosflush
Under certain settings, I think this could more or less be equated to disk as there are settings to force fsync with every log write. There are other settings to control whether a write is required (and on how many nodes) before an ack. I just don’t agree with the notion that Kafka doesn’t write to disk or have the capacity to offer strong durability or consistency. Either way, I think we are all more or less in agreement and discussing semantics at this point.
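For anyone who wants to check their own setup against the kind of settings I mean, here is a small sketch. The setting names (acks, min.insync.replicas, unclean.leader.election.enable, log.flush.interval.messages) are the ones I understand current Apache Kafka to use; names and defaults may differ in the version discussed in this thread. The thresholds and the fallback defaults in the code are my assumptions, not official recommendations, and durability_gaps is a made-up helper.

```python
def durability_gaps(broker, producer):
    """Return the reasons a config falls short of a strict durability setup.

    broker and producer are plain dicts of setting name -> value.
    Missing keys fall back to conservative defaults assumed for this sketch.
    """
    gaps = []

    # The producer should wait for all in-sync replicas before treating a write as acked.
    if str(producer.get("acks", "1")) not in ("all", "-1"):
        gaps.append("producer acks is not 'all'")

    # With only one in-sync replica required, a single node failure can lose acked data.
    if int(broker.get("min.insync.replicas", 1)) < 2:
        gaps.append("min.insync.replicas is below 2")

    # Letting an out-of-sync replica become leader can drop acked messages.
    if str(broker.get("unclean.leader.election.enable", "true")).lower() != "false":
        gaps.append("unclean leader election is enabled")

    # A value of 1 asks for a flush to disk after every message.
    if int(broker.get("log.flush.interval.messages", 2**63 - 1)) != 1:
        gaps.append("log is not flushed after every message")

    return gaps
```

Passing two empty dicts reports all four gaps. Passing a broker dict with min.insync.replicas=2, unclean.leader.election.enable=false and log.flush.interval.messages=1, plus a producer dict with acks=all, reports none. Whether that combination really closes the ack-before-flush window raised earlier in the thread is something to verify against the Kafka documentation for your version.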
I do view ES as a very strong solution, and think your points are valid from a complexity / “it just works” standpoint. Kafka definitely gives you more rope to hang yourself, if you don’t understand how to properly configure / use it. This doesn’t even get into the learning curve for configuring and integrating a streaming solution with Kafka, another nice feature already fully supported in ES. ES has a lower learning curve for adoption and obtaining these guarantees, allowing you to focus on building the actual app instead of how to administer your infrastructure.