As it seeks to grow beyond being a Hadoop-only company, Hortonworks has hitched its star to IBM, deeply integrating its commercial data products into the IBM portfolio.

The two companies previously worked together to make Hortonworks Data Platform (HDP), a Hadoop distribution based on YARN, available for IBM's POWER8 platform and storage offerings.

In the newly announced deal:

  • IBM is adopting HDP as its distribution of Hadoop, the leading open framework for distributed storage and processing, and will migrate its BigInsights clients to HDP.
  • IBM's Data Science Experience and Machine Learning will be integrated into HDP. Data Science Experience is an open development environment that provides analytics tools for drag-and-drop model building, and IBM Machine Learning is the company's cognitive platform for creating and testing high-volume analytic models.
  • Hortonworks is adopting IBM’s BigSQL as its SQL engine of choice for HDP, including its EDW Optimization stack.
  • The companies also announced Hortonworks DataFlow (HDF) for IBM Power Systems. Hortonworks calls DataFlow an "open source data-in-motion platform" that handles data from its source, such as a sensor, through ingestion, streaming and processing to its ultimate destination. DataFlow grew out of Hortonworks' 2015 acquisition of Onyara, the company behind Apache NiFi. Power Systems are IBM servers designed for big data workloads.

"Our strategy at IBM is to take the best that we have, in terms of data science and machine learning, and bring that to where the data is," said Rob Thomas, general manager of IBM Analytics.


In February, IBM announced it was adding its data science and machine learning tools to its mainframe platform, because mainframes hold most of the world's transactional data. The logical next step is big data, he said, most of which resides in Hadoop, Hortonworks' forte.

The partnership provides an end-to-end connected data platform across the entire data lifecycle, regardless of the deployment architecture, Hortonworks CEO Rob Bearden said in a SiliconANGLE interview at DataWorks Summit 2017. That platform makes it possible to put a data science framework on top to better understand the information, while simplifying data science and making it more of a team sport.

Hortonworks is making a "significant engineering commitment" to ensure interoperability and integration between all its native services and the IBM products, Bearden said. The company will also begin taking the new offerings to its customers immediately.

Active in Open Source

Both companies have been involved in myriad efforts to address the shortage of data science skills and expand client companies’ ability to use data effectively. Both are founding members of the Open Data Platform Initiative (ODPi), a Linux Foundation project designed to “simplify and standardize” the open development of Hadoop.

They're also teaming up to advance governance through Apache Atlas, an incubator project at the Apache Software Foundation. Atlas provides a scalable governance platform for enterprise Hadoop that helps developers model new business processes and data assets quickly and easily.

The companies will also partner on the advancement of Apache Spark. In 2015, IBM announced a major commitment to Spark, including training more than 1 million data scientists. IBM is not Hortonworks' only partner, however: Pivotal offers a data platform combining HDP and the HAWQ SQL query engine, and Hortonworks provides the Hadoop behind Microsoft's Azure HDInsight.

Hadoop, of course, has its critics, including Pachyderm, which offers a container-based alternative. The New York Times also swapped out Hadoop for Google’s BigQuery, with CTO Nick Rockwell attributing a “massive cost savings, a huge headache savings, and a giant productivity increase” to the switch.


And Cloudera's Daniel Templeton cited significant security concerns associated with deploying containerized workloads on Hadoop, at least until Hadoop 3.0 is released later this year.

Yet the Hadoop market is huge and growing.

Zion Market Research projected that the global Hadoop market, including software, hardware and services, will grow to $87.14 billion by 2022, a compound annual growth rate of 50 percent. Allied Market Research, meanwhile, foresees the market generating $84.6 billion in revenue from software, hardware and services by 2021.

Feature image via Pixabay.