As the name suggests, the quick answer to “what is DataOps?” is that it’s DevOps for data workflows. As data workflows have become increasingly cloud native, the relationship between data engineers and operations has started to look more like the relationship between dev and ops in a DevOps world — thus the name DataOps. If you build a data product, you need to monitor it, update it and patch it, just as you would a software product.
“I think what we’re seeing is that everybody, every organization needs to handle data in a better way, especially as real-time data is on the rise,” said Andrew Stevenson, chief technology officer at Lenses.io. “DataOps is taking that management of the data, using it correctly with the correct tooling, whether it’s Kafka or Pulsar, and lifting it up a level and combining it with the strategy and applying governance and security around it.”
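To make that concrete, here is a minimal sketch of the kind of real-time consumer a data team might stand up on Kafka, using the kafka-python library. The broker address, the “orders” topic and the business rule are all hypothetical; this is an illustration of handling a stream with off-the-shelf tooling, not a reference implementation.

```python
# A minimal sketch of consuming a real-time stream with kafka-python.
# The broker, the "orders" topic and the threshold are hypothetical.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                            # hypothetical topic
    bootstrap_servers="localhost:9092",  # assumed local broker
    group_id="dataops-demo",             # consumer group tracks offsets
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    order = message.value
    # Business logic lives next to the stream: flag high-value orders.
    if order.get("total", 0) > 1000:
        print(f"High-value order: {order.get('id')}")
```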
Stevenson thinks that DataOps goes far beyond just bringing data engineering and operations together, however. True DataOps is about more than just managing the flow of data — it’s also about ensuring everyone involved knows why the data workload is running and what the desired business outcome is. According to Stevenson, understanding business logic is key: although traditional DevOps metrics like speed and resiliency are part of DataOps, too, the companies that do it well are bringing data engineers together with the people who want to use the data, who have some business need that requires it. That relationship matters more to DataOps than simply speeding up the data pipeline, though that speed is important, too.
Making Data Jobs Faster
Processing data has not traditionally been something that companies expect to do quickly. “We got a little comfortable with data procrastination,” explained Patrick McFadin, vice president of developer relations at DataStax. “It was like, my Hadoop job will run for a month, I’m okay with that. Now it’s ‘how fast can we get things done.’”
In a DataOps world, that kind of delay isn’t acceptable, any more than it would be for a DevOps team to wait a month for continuous integration testing. Instead, teams are looking for ways to optimize their data pipelines and make sure that they are running as fast as possible.
“When I think of DataOps, I think of everything related to the orchestration and collaboration on the data journey,” explained Douwe Maan, project lead at Meltano, an open source data integration platform by GitLab.
Speed of Change
Driving this change in how data is processed is the need for business leaders to use data inputs in everyday decisions. It used to be that running a Hadoop job for a month was fine, because the end result was a quarterly report for the company’s executive board. Now, though, executives expect information on an hourly or at least daily basis. This has been especially critical for companies as they navigate the changing markets of the past eight months. Companies that had prioritized digital transformation, including using DataOps to improve the speed of data workflows, have been able to adapt much faster and roll out new features more quickly, McFadin said.
He gave a concrete example: Target. “They were doing well as a retailer as well as an e-retailer,” McFadin said, about the pre-pandemic situation. “When the shutdown happened, they were able to quickly shift to curbside pickup. It was easy for them to do. All their stuff runs on a Cassandra database, and they were able to scale it when they needed to because everything was already in place. The DataOps people were jamming.”
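McFadin didn’t share implementation details, but the kind of scaling he describes maps to a well-known Cassandra pattern: raising a keyspace’s per-datacenter replica counts. Here is a hedged sketch using the DataStax Python driver, with made-up keyspace and datacenter names.

```python
# Illustrative only: scaling a Cassandra keyspace by raising its
# per-datacenter replication. Keyspace and datacenter names are made up.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# NetworkTopologyStrategy sets replica counts per datacenter, so
# capacity can be added where demand spikes. Afterward, a
# `nodetool repair` streams data onto the new replicas.
session.execute("""
    ALTER KEYSPACE retail
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_east': 3,
        'dc_west': 3
    }
""")
```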
Just Another XxxOps?
“The biggest misconception about DataOps is that it’s just another ‘Ops’; there’s DevSecOps, MLOps, etc.,” Stevenson said. “There’s also a misconception that you can literally just take a tool and that one tool will solve everything and give you DevOps. We see the same thing with DataOps. There’s a set of best practices, but it’s also a cultural shift.”
The key to success with DataOps, everyone agrees, isn’t about following a specific set of best practices or a precise formula for optimizing the data pipeline. Sure, pipeline optimization is important so that data analysis doesn’t become a bottleneck in the organization, but the ways that companies use data are so varied that it’s hard to put your finger on a one-size-fits-all formula.
“You have to understand the business,” McFadin said. He cited Home Depot as another company that has used DataOps to give customers a better digital experience. Instead of just being a place to pick up screws and lumber, the company invested in creating a digital experience and did so with an IT and data team that intimately understood not just how to make data pipelines fast but how data products could be applied to meet a customer need. A customer in Home Depot can look up a product and see what aisle and row it’s in. “They have a keen understanding of the business but they are also playing with these really cool things,” McFadin said. “The DataOps people are the ones who understand how to connect A to B.”
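The aisle-and-row lookup is a good example of a data product shaped by a business question. The sketch below is a hypothetical data model for that kind of feature — not Home Depot’s actual schema — showing the common Cassandra approach of denormalizing so the customer’s question becomes a single-partition query.

```python
# Hypothetical data model for an in-store "which aisle?" lookup:
# one table per query, one partition per (store, product) pair.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("retail")  # assumed keyspace

session.execute("""
    CREATE TABLE IF NOT EXISTS product_location_by_store (
        store_id int,
        sku text,
        aisle text,
        bay text,
        PRIMARY KEY ((store_id, sku))
    )
""")

# The customer's question becomes a single-partition read.
row = session.execute(
    "SELECT aisle, bay FROM product_location_by_store "
    "WHERE store_id = %s AND sku = %s",
    (4512, "LUMBER-2X4-8"),  # made-up store id and SKU
).one()
if row:
    print(f"Aisle {row.aisle}, bay {row.bay}")
```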
The Technical and Cultural
DataOps will always have a technical element, Stevenson said, largely around selecting the right tools to both optimize the pipeline and allow collaboration between data engineers and subject matter experts. Nonetheless, a big mistake companies make when trying to adopt DataOps is focusing too much on the technology.
“They pick a technology and focus purely on that,” he said. “But what’s the strategy? What’s the governance? What about tooling? What about security and who can and cannot access the data?”
And then, of course, DataOps can’t just be about a fast pipeline — there’s that “ops” part. Thinking about scaling, updating and monitoring also has to be part of the equation.
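What does that ops work look like day to day? One small, concrete example is watching consumer lag — how far a pipeline has fallen behind the head of its stream. A sketch with kafka-python, again with hypothetical broker, topic and group names:

```python
# Monitoring sketch: report how far a consumer group trails the head
# of each partition of a Kafka topic. All names are hypothetical.
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(
    bootstrap_servers="localhost:9092",
    group_id="dataops-demo",
    enable_auto_commit=False,
)

partitions = [
    TopicPartition("orders", p)
    for p in consumer.partitions_for_topic("orders")
]
consumer.assign(partitions)

end_offsets = consumer.end_offsets(partitions)  # head of each partition
for tp in partitions:
    lag = end_offsets[tp] - consumer.position(tp)
    print(f"partition {tp.partition}: lag {lag}")
```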
Enterprises that have adopted DataOps are thinking these questions through ahead of time, ensuring that there’s a clear business rationale for whatever data products are being built, that they can operate them in production and that the data pipeline doesn’t become an organizational bottleneck.
Feature image via Home Depot.