Energy analytics startup Vortexa used to have what it called “Kafka Fridays.” Something would go wrong with Kafka on Friday and that meant spending the weekend troubleshooting the problem and trying to fix it.
“We totally understood that startups can get a competitive advantage because they’re using these new fancy cutting-edge technologies, right? It’s one of the enablers, one of the differentiators for the startup. So we were keen to make it work somehow, one way or another, [but it got] to the point that many of my guys weren’t sleeping. They were spending their time in constant troubleshooting of Kafka-related calls,” said Maksym Schipka, chief technology officer of Vortexa.
London-based Vortexa provides real-time visibility into the seaborne movement of fuel around the world. It tracks more than 10,000 vessels at any moment and provides analytical aggregates of the data that energy traders, hedge funds, shipbrokers, and charter ship owners use to optimize their operations and to make deals.
A good portion of the data comes from the Automatic Identification System (AIS), which provides information on each ship’s position, direction, speed and more. Vortexa receives approximately 500 AIS messages per second, and it also combines terrestrial, satellite and operational data sources in its machine learning applications.
It uses machine learning models and a human-machine feedback loop to fill gaps in the data and to help make predictions. Processing and analyzing real-time data is core to its business, Schipka said, but its problems with Kafka, the open source technology it uses to ingest that data, were many.
It got to the point where the CEO would be asking whether it was a Kafka issue every time there was a problem with the data flow. “In 99% of the cases, the answer was yes,” Schipka said.
“Having real-time Kafka infrastructure working and all the models on top of it working and working reliably, was absolutely critical. And we just could not get any stability whatsoever,” he said. “We had to write a lot of internal tools to try and understand what is actually happening because we never get any reasonable visibility.”
It runs two production clusters: one acting as a more or less dumb cache or syncing solution, and the other where the heavy processing takes place.
The 53-person company has more data scientists than software engineers. It has five people on a site reliability engineering team, including one whom Schipka described as having a keen intuition about Kafka, but overall the staff had little experience with it.
So it started looking at various options. One was a nearby managed service provider, but Schipka described the cost as “absolutely eye-watering.”
“Now we have good seven-figures revenues, but [in early 2019] we were just starting to sell this thing, and for data to be delayed by many, many hours and days could actually ruin the company reputation before the company even could start generating in any reasonable way,” he said.
He met the Lenses.io team at a conference, back when the company was called Landoop. Husband-and-wife team Antonios Chalkiopoulos and Christina Daskalaki, along with co-founder and now CTO Andrew Stevenson, had written around 35 open source tools for Kafka. They decided to combine them into a streaming data management platform that makes it easier for developers and analysts to use Kafka Streams for real-time analysis of data.
Their tools ranged from stream-reactor, an extract, transform and load (ETL) reference architecture, to fast-data-dev, a project that includes ZooKeeper, Kafka, a Schema Registry, and more than 20 connectors for the platform.
“When we deployed Lenses for the first time, it was the first time where we actually started to see the light at the end of the tunnel,” Schipka said. “Lenses instantly, immediately uncovered the complexities of what we were trying to do with Kafka just by … visualizing all the data flows, visualizing the topology of our topics and just [showing us] how complex it was.
“We didn’t realize it even in the slightest. We were working in the dark, trying to deploy what we thought was the right thing. And only when we deployed Lenses we realize how complicated and how unusual was what we were trying to do with Kafka Streams in particular.”
Vortexa had also chosen Amazon MSK, a managed service for Kafka, which Schipka described as “not without its own issues, but still quite good and much more reasonably priced,” and decided to deploy the two together.
Lenses helped with the migration because the team could see how topics were populated. It also helped the company identify stale topics that were unnecessarily taking up resources, so it could optimize its infrastructure.
Since teaming up with Lenses and Amazon MSK, the company has begun deploying on Kubernetes, and it touts itself as the first to use the three together.
Lenses also reduced the development time spent on debugging and troubleshooting. Schipka estimated the savings at half to one full-time equivalent.
“My [team] absolutely cannot imagine working without Lenses nowadays, because you know, the moment they experience any issue, the moment applications are not as they expect it to behave, their first thought is to connect to Lenses and check what the data flows are and interrogate the data with SQL in real-time to see actually what’s going on.”
It has also given a broader section of the company visibility into the data.
“Since June 2019, we have brought our Kafka-related infrastructure and controlled deployments, controlled data flows to a state where people could start sleeping more. And where the data scientists in particular weren’t afraid of touching Kafka and to experiment with new approaches to solving the difficult problems that we have,” he said.
“We end up spending a lot more time solving the business problems than we do managing cost and managing the data flow.”
Feature image: “Maersk_Edgar.Riga.20.10.2017” by Егор Журавлёв. Licensed under CC BY-SA 2.0.