With the release of Apache Kafka 1.0 this week, an eight-year journey is finally coming to a temporary end. Temporary because the project will continue to evolve, with near-term bug fixes and long-term feature updates. But for Neha Narkhede, Chief Technology Officer of Confluent, this release is the culmination of work towards a vision she and a team of engineers first laid out in 2009.
Back then, a team at LinkedIn decided it had the solution to a major data stream processing problem. Narkhede said the originators of Kafka began building the project by sitting down and trying to understand why stream processing companies founded in the 1990s and 2000s had failed.
“We came at this because there was a real need to build applications that did this in a real company. That shaped the vision for this whole system and the Apache Kafka project. We didn’t imagine some hypothetical need, but rather a real business need. We started not by building Kafka, but by thinking ‘Why did the stream processing startups fail in the 2000’s and 1990’s?’ They failed because companies did not have the ability to collect these streams and have them laying around to process,” said Narkhede. “You need to have this retentive storage.”
That concept was anathema to most existing solutions to the streaming problem at the time. Traditional message queues, for example, are not designed to save data long term, but rather to move it down the line and ensure it is processed within a tight window of time. For applications that run in batch processes, that’s not the right way to handle the data, said Narkhede.
Additionally, modern microservices and container-based applications are largely stateless by design. As such, bringing a new instance of an application online leaves it devoid of data, like a zombie awaiting orders. Because Kafka can store data longer term, newly online servers can be caught up by processing existing data in the exact order in which it arrived, bringing them up to speed and to the same point in the data as the rest of the cluster.
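The catch-up behavior described above can be illustrated with a minimal sketch. This is a toy in-memory model, not Kafka's API: the `RetainedLog` and `Consumer` classes and the event names are hypothetical and exist only to show how a retained, ordered log lets a late-starting consumer replay from the beginning and arrive at the same state as one that has been running all along.

```python
class RetainedLog:
    """Toy append-only log: records stay available after being read."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read_from(self, offset):
        # Yield (offset, record) pairs in arrival order, starting at `offset`.
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


class Consumer:
    """Toy consumer that tracks its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.next_offset = 0  # a fresh instance starts with no data
        self.state = []

    def catch_up(self):
        # Process every record not yet seen, in the order it arrived.
        for offset, record in self.log.read_from(self.next_offset):
            self.state.append(record)
            self.next_offset = offset + 1


log = RetainedLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

veteran = Consumer(log)
veteran.catch_up()

log.append("logout")
veteran.catch_up()

# A brand-new instance replays the retained records from offset 0.
newcomer = Consumer(log)
newcomer.catch_up()
```

A traditional queue that discards each message once it is delivered could not support the `newcomer` step, because the earlier records would already be gone.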
There are only two hard problems in distributed systems: 2. Exactly-once delivery 1. Guaranteed order of messages 2. Exactly-once delivery
— Mathias Verraes (@mathiasverraes) August 14, 2015
In a world where things like ZeroMQ are pushing for highly honed message passing services, Narkhede said that Kafka offers a bridge between traditional batch processing jobs and modern stream processing jobs that must be executed within milliseconds. As such, Confluent has been vigorously developing KSQL, which enables the running of SQL queries over Kafka-stored data and streams.
Narkhede said it took a while for the industry to realize this was the proper solution to stream processing problems. “I think it took about five years for the whole industry to realize what we were up to. There are quite a few stripped out systems, like ZeroMQ, that are sort of super barebones version of what Kafka might try to achieve. ZeroMQ is more of an in-memory network layer for messaging. A big problem Kafka tries to solve is the fact that it’s the bridge between the batch world and the online database world. The nature of a system that does that is you need to store data. ZeroMQ was meant for quick one-off messages that don’t need to be processed more than once,” said Narkhede.
Production Ready at Last
For this 1.0 release, many features have received their final going over. These include better diagnostics for Simple Authentication and Security Layer (SASL) authentication failures, better handling of disk failures, and a Streams builder API that has been cleaned up to be easier to use.
For the future, Narkhede said that Confluent hopes to build out a connector marketplace, where customers will be able to browse and choose from a variety of enterprise-grade connectors for existing data systems. She said that the exactly-once capabilities of Apache Kafka will help enable enterprise stream processing in a controlled fashion.
In the release blog post, Narkhede details the benefits of the exactly-once capabilities, which, she wrote, enable closure-like functions for stream processing. In distributed stateless systems, the delivery of a message to some endpoint exactly once, and no more than once, has been an ongoing challenge, given that these systems by design don’t retain state. Yet the possible uses of Kafka and other distributed stream processing systems would expand dramatically if they could guarantee exactly-once delivery for those jobs where events must be processed in a certain order, even though they are not necessarily executed in a linear fashion by a single processor.
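To see why exactly-once is hard, and one general way to approximate it, consider a sender that must retry whenever an acknowledgement is lost. The sketch below is an illustration of the idea of deduplicating retries by sequence number; it is not Kafka's implementation, and `DedupingLog`, `send_with_retry`, and the `lost_acks` parameter are hypothetical names invented for the example. The receiving side has to remember some state (the last sequence number accepted per producer), which is exactly what a purely stateless system cannot do.

```python
class DedupingLog:
    """Toy log that drops resends it has already accepted."""

    def __init__(self):
        self.records = []
        self._last_seq = {}  # producer_id -> highest sequence number accepted

    def append(self, producer_id, seq, record):
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            return False  # duplicate of an earlier send: ignore it
        self.records.append(record)
        self._last_seq[producer_id] = seq
        return True


def send_with_retry(log, producer_id, seq, record, lost_acks):
    """Resend the same record until an acknowledgement gets through.

    `lost_acks` simulates how many acknowledgements the network drops,
    so the record reaches the log `lost_acks + 1` times.
    """
    attempts = 0
    while True:
        attempts += 1
        log.append(producer_id, seq, record)  # the write itself succeeds
        if attempts > lost_acks:
            return attempts  # this acknowledgement finally arrived


log = DedupingLog()
attempts = send_with_retry(log, "producer-1", 0, "order-created", lost_acks=2)
send_with_retry(log, "producer-1", 1, "order-paid", lost_acks=0)
```

Here the first record is sent three times but stored once, and the two records remain in the order they were produced.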
“The nice thing about all this is that while the current instantiation of Kafka’s Streams APIs are in the form of Java libraries, it isn’t limited to Java per se. Kafka’s support for stream processing is primarily a protocol-level capability that can be represented in any language. This is an important distinction. Stream processing isn’t one interface, so there is no restriction for it to be available as a Java library alone. There are many ways to express continual programs: SQL, function-as-a-service or collection-like DSLs in many programming languages. A foundational protocol is the right way to address this diversity in applications around an infrastructure platform.”
She detailed this even further in August, when these features were first released. “We replaced consumer side buffering for transactional reads with smarter server-side filtering, thus avoiding a potentially big performance overhead. In a similar vein, we also refined the interplay of transactions with compacted topics and added security features,” wrote Narkhede.
She also said that Confluent will be focusing on the “area of partition slipping and elasticity. Can you throw data at Kafka and have it expand and shrink and change to match the nature of your traffic and data flow? Kafka does that today, but this allows you to do that in an operationally cost-effective manner. I think that’s a really big one. That’s something we’ll undertake with the whole community in a big way in the coming years.”
Feature image by Uroš Jovičić, via Unsplash.