Continuuity, the data application platform built around Hadoop, is open sourcing its platform, rolling out the developer release of a new streaming tool, and changing its name. Continuuity is now Cask. Cask will continue to work on bringing Hadoop into the enterprise and adding capabilities that let developers use Hadoop for real-time activities.
The company is seeing Hadoop play an increasingly important role in enterprises. Big businesses have typically used traditional data stores and relational databases to run CRM, supply chain management and other classic enterprise apps. “Our belief is that many enterprises are going to use big data through similar types of apps except they’ll be fundamentally different because they’ll be much more about data,” said company CEO Jonathan Gray.
The problem, however, is that Hadoop is hard to use, he said. Huge companies like Facebook, where Gray came from, and Yahoo, where his co-founder worked, have the resources, expertise and interest in building systems around an open source technology like Hadoop. However, enterprises, startups and smaller businesses don’t have the talent or resources to pursue similar initiatives. “We got a lot of great stuff done at Facebook on the back of heroics by individuals but that’s not something that scales across the business world,” he said.
Cask is laser-focused on developers in enterprises. “We have a strong belief that developers are going to be the individuals who create the disruptive application patterns around big data,” Gray said.
The company’s primary offering, its data application platform, helps users build apps that use Hadoop in smarter ways, beyond simply treating it as a low-cost, large-scale data warehouse.
Cask also offers Coopr, a tool that helps users quickly and easily provision Hadoop clusters. In part because of its distributed nature, Hadoop can be difficult to deploy. In fact, Cask built Coopr as an internal tool, but customers started asking about it. Coopr works with public clouds, private clouds and OpenStack, so developers can use it to provision Hadoop in any of those environments.
With the name change, Cask is also unveiling Tigon, a joint development effort with AT&T Labs for real-time stream processing on Hadoop.
Gray pointed to a number of efforts across the industry designed to let people use Hadoop in a more real-time fashion. It makes sense, he said. “Traditionally, your data sat somewhere else and your processing sat on a different machine and so you had to move data from one to the other. The inherent benefit of Hadoop is it’s one distributed system. You don’t have to move the data to process it,” he said.
However, if a user is storing a lot of data across hundreds of machines and wants to do stream processing, there’s going to be a bottleneck between the data and the stream processing engine, he said. The idea behind Tigon, and its competitor Apache Storm, is to build a stream processing engine that’s native to Hadoop. Users can then run stream processing across multiple nodes in a Hadoop cluster, often on the same machines where the data lives.
Other efforts, such as Databricks with Spark and Cloudera with Impala, show that the industry is beginning to see Hadoop as an OS on which a variety of functions can be built, Gray said.
While Cask has open sourced other technologies it has built along the way, it’s announcing today that it’s open sourcing its core Cask Data Application Platform (CDAP) engine as well, released under an Apache license.
“We knew we would be an open source company,” Gray said, about Cask’s intentions from the start. However, in part to get moving faster, the company decided to try to develop a solid product on its own before releasing it to the community.