Organizations see the obvious benefit of maintaining applications and data across distributed environments, but often balk at the security risks of losing control of governance and other data-management capabilities when shifting to a decentralized data-distribution environment.
While microservices can help support data lakes, digital integration hubs (DIH) and data streaming for real-time or near-real-time access across multiple clouds, DevOps teams might let security concerns trump the perceived advantages of distributing data and opt to keep key datastores in on-premises environments.
For this reason, data streaming provider Confluent has launched Stream Governance to allay concerns about losing control of data streaming across multicloud environments.
“Our goal was to strike a balance between protecting and democratizing the data in motion by delivering a simple, self-service experience for users to discover, understand and trust the real-time data that flows throughout the businesses. We believe that when distributed teams can explore, understand and trust data in motion together, they can harness its full value,” David Araujo, data governance product manager for Confluent, said. “Companies can accelerate the development of the real-time experiences that drive differentiation and increase customer satisfaction while upholding strict compliance requirements, both internal and external.”
Stream Governance is based on three “pillars,” Araujo said. They include:
- Stream quality: Improved trust in, and technical quality of, event streams, so that data integrity is maintained as services evolve.
- Stream catalog: Increased collaboration and productivity with self-service data discovery that allows teams to classify, organize and “find the event streams they need.”
- Stream lineage: Understanding complex data relationships to uncover more insights with interactive, end-to-end maps of event streams.
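To make the quality and lineage pillars more concrete, here is a minimal Python sketch. It is an illustration only, not Confluent's API or implementation: the schema format, the function names `is_backward_compatible` and `downstream`, and the topic and service names are all hypothetical. The compatibility rule is deliberately simplified (an added field needs a default, and an existing field may not change type), and lineage is modeled as a plain list of producer-to-consumer edges.

```python
from collections import deque

# Hypothetical schema format: field name -> {"type": ..., "default": ... (optional)}.


def is_backward_compatible(old_schema, new_schema):
    """Simplified check: can a reader using new_schema read data written with old_schema?"""
    for name, spec in new_schema.items():
        if name not in old_schema:
            # A field added in the new schema needs a default to fill in for old records.
            if "default" not in spec:
                return False
        elif old_schema[name]["type"] != spec["type"]:
            # This sketch treats any type change as breaking.
            return False
    # Fields dropped from the old schema are simply ignored by the new reader.
    return True


def downstream(edges, start):
    """Breadth-first walk over (source, destination) edges; returns everything fed by start."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen, queue = [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


v1 = {"order_id": {"type": "string"}, "amount": {"type": "double"}}
v2_ok = {**v1, "currency": {"type": "string", "default": "USD"}}
v2_bad = {**v1, "currency": {"type": "string"}}

edges = [
    ("orders-service", "orders"),
    ("orders", "fraud-detector"),
    ("fraud-detector", "flagged-orders"),
    ("orders", "billing"),
]

print(is_backward_compatible(v1, v2_ok))   # True
print(is_backward_compatible(v1, v2_bad))  # False
print(downstream(edges, "orders"))         # ['fraud-detector', 'billing', 'flagged-orders']
```

The point of the sketch is that both checks are mechanical once schemas and producer and consumer relationships are recorded in one place, which is the kind of information a governance layer is meant to keep up to date.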
The Price of ‘Playing It Safe’
Confluent Stream Governance was created specifically to meet the technical and security demands of organizations adopting Apache Kafka for data streaming across, for example, a combination of multiple clouds and on-premises environments and, in many cases, geographic zones.
With the sharp rise in demand for event-driven systems for real-time data management with Kafka and other tools, the need for organizational governance over data in motion is “growing fast,” Araujo said.
“While the tools to build and maintain long-term, compatible event streams exist, the ability to safely and effectively share these streams across teams for widespread use has not. This need is most pressing for businesses deploying distributed, event-driven microservices built by small, disparate teams of streaming data experts,” Araujo said. “As the investment into microservices increases and data sources scale, it becomes more and more challenging for any individual or team to understand and govern the full scope of streams flowing across a business.”
Historically, data governance tools were designed to largely support data compliance and risk mitigation. “While important and necessary, those tools make it difficult for teams to access and make use of the valuable data they need. The objective has essentially been to lock data down and keep it safe at all costs,” Araujo said. “Additionally, these tools are built for data at rest — data standing still inside databases — requiring point-to-point integrations with every system. Those systems typically amount to a challenging and painful operation that’s constantly chasing an in-sync and on-time state that never fully materializes.”
Instead, what organizations require, and what Stream Governance delivers, is a way to answer the questions DevOps teams typically have about multi-environment data streaming, Araujo said. These questions include:
- How to scale from 10 topics to thousands of topics?
- How to go from a few developers to thousands of developers?
- How to go from a few use cases here and there spread across the business to a central nervous system of events orchestrating data movements across every corner of the business?
- How to bring the entire organization along for the ride when many are new to data-in-motion?
Developing new applications can also be slow and inefficient due to a lack of visibility into what data already exists within the business, where it comes from, how it is used and whom to contact when requesting access, Araujo explained. As described above, Stream Governance was designed to overcome these challenges by supporting developers who have struggled to improve technical quality and, especially, security and visibility when creating applications for data streaming with Kafka.
“Many developers have had to manually develop data quality controls for compliance. We’ve seen customers address this problem by keeping information in spreadsheets and intranet pages — an error-prone, constantly out-of-sync process that only gets worse as you scale,” Araujo said. “Using the new stream catalog you have at your fingertips, a self-service data discovery tool can help you understand what data exists, where it exists, who is responsible for it and how it is being used today.”
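As a rough illustration of the questions Araujo lists (what data exists, who is responsible for it and how it is being used), the following Python sketch models a catalog as a searchable list of entries. Everything here is hypothetical: `StreamEntry`, `find_streams`, and the topic, team and tag names are invented for the example and do not reflect Confluent's stream catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class StreamEntry:
    """One hypothetical catalog record describing an event stream."""
    topic: str
    owner: str
    tags: set = field(default_factory=set)
    consumers: list = field(default_factory=list)


def find_streams(catalog, tag):
    """Return the entries carrying a given tag, in catalog order."""
    return [entry for entry in catalog if tag in entry.tags]


catalog = [
    StreamEntry("orders", "checkout-team", {"pii", "finance"}, ["billing", "fraud-detector"]),
    StreamEntry("page-views", "web-team", {"analytics"}, ["dashboard"]),
    StreamEntry("payments", "payments-team", {"finance"}, ["billing"]),
]

for entry in find_streams(catalog, "finance"):
    # Answers "what exists, who owns it, who uses it" for one tag.
    print(entry.topic, entry.owner, entry.consumers)
```

Unlike a spreadsheet or intranet page, a record like this can be queried by tooling, which is what makes self-service discovery practical at scale.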
While Stream Governance is directed at developers, operations teams can use it to “see in real-time everyone that is producing and consuming data from the platform and identify the root causes of disruptions related to platform usage,” Araujo said.
Confluent is a sponsor of InApps.
Feature image by Ross Sokolovski on Unsplash.