In its bid to make Cassandra the go-to database for a wide array of uses, DataStax is launching DataStax Enterprise (DSE) Graph, a scale-out graph database for cloud applications based on Cassandra and additional graph database technology the company acquired last year.
For the new product, the company pulled together the expertise from its acquisition of Aurelius, the folks behind TitanDB, and Apache TinkerPop, a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
IBM, Amazon, Blazegraph and others have products using TinkerPop, an API technology that DataStax donated to the Apache Software Foundation. However, DataStax bills DSE Graph as “the only scalable real-time graph database.”
We’ve covered graph database systems before, particularly around the problem of handling high-volume read-write operations at scale, which may be the next challenge for the maturing market. We speculated that pairing Cassandra and TitanDB could help meet this challenge.
DataStax rewrote the Titan database to tie it tightly to the Cassandra storage layer, as part of a multi-model strategy: one database that supports different types of data structures, such as documents, graphs, and key-value data sets.
“Our customers have told us, ‘We love the very unique kind of key-value and tabular store that we get with Cassandra. We like that you’ve added JSON, but we also have these use cases that are much better suited for a graph data model,’” said Martin Van Ryswyk, executive vice president of engineering at DataStax. “That’s more than just supporting a graph data model; it has to be a whole graph database system that has to be around that, query optimizers, bulk loaders and the whole trappings around making a graph system work.”
DSE Graph retains Cassandra’s key benefits, such as low latency and linear scalability, along with DSE’s enterprise features and its advanced search, analytics and security, according to the company.
DSE Graph allows you to query graph data in real time at scale, so you can have thousands of concurrent transactions asking about relationships between customers, explained Matthias Broecheler, former managing partner at Aurelius and now DataStax director of engineering for DSE Graph.
“A lot of the announcements you’ve seen in recent years have been about adding graph functionality to an existing stack, but [that’s] more about adding a view of the data that is graphed. It’s not about extending the underlying system; they’re not adding new index structures, they’re not adding the things you need to make graph work at scale and with the performance that enterprises need in order to service web traffic, to service IoT-type use cases,” he said.
DSE Graph is unique in that it can scale out across the cluster, across multiple machines; you can add more machines if you have more demand. You can geo-replicate your workload, so if you have customers in Singapore, the U.S. and Europe, you can have the data where your customer is and provide a better user experience by having lower latency, he said.
The company bills Graph as a complete solution that includes an enterprise server with an adaptive query optimizer, automatic graph data partitioning, a distributed query execution engine, and graph-specific index structures. An updated OpsCenter provides full provisioning, management and monitoring. The package also includes the web-based Studio, which helps developers visualize graphs and write and execute graph queries, as well as drivers for popular development languages and support for the Gremlin graph language, in addition to Cassandra Query Language and DSE analytics/search APIs.
The graph database can be especially helpful for recommendation engines, security and fraud detection, master data management and IoT use cases, according to the company.
Van Ryswyk said the staff from Aurelius had worked with DataStax to solve some hard problems.
“When you have a graph database and have to distribute it, the reason a lot of people don’t do it is because it’s hard. How do I break up that graph across machines? How do I know when I do a query what machine to go to? I can’t go to 1,000 machines and ask, ‘Do you have the data?’ I have to have methodologies to do that efficiently. That’s where we think we’re unique: We can do those things in real time fast,” he said.
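The routing problem Van Ryswyk describes can be illustrated with a toy example. The sketch below is not DSE Graph's partitioning scheme, which the company does not detail here; it assumes the simplest possible approach, hashing each vertex ID to pick an owning machine, and the names `owner_node` and `ToyCluster` are hypothetical. It shows only why a deterministic placement rule lets a query go straight to one machine instead of asking all of them.

```python
import hashlib


def owner_node(vertex_id: str, num_nodes: int) -> int:
    """Map a vertex ID to a node index using a stable hash (illustrative only)."""
    digest = hashlib.md5(vertex_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


class ToyCluster:
    """In-memory stand-in for a cluster; each 'node' is just a dict."""

    def __init__(self, num_nodes: int) -> None:
        self.num_nodes = num_nodes
        # One dict per node: vertex ID -> set of neighbor vertex IDs.
        self.nodes = [dict() for _ in range(num_nodes)]

    def add_edge(self, src: str, dst: str) -> None:
        # The adjacency list lives on the node that owns the source vertex.
        node = self.nodes[owner_node(src, self.num_nodes)]
        node.setdefault(src, set()).add(dst)

    def neighbors(self, vertex_id: str) -> set:
        # Exactly one node is consulted, however large the cluster is.
        node = self.nodes[owner_node(vertex_id, self.num_nodes)]
        return node.get(vertex_id, set())


cluster = ToyCluster(num_nodes=4)
cluster.add_edge("alice", "bob")
cluster.add_edge("alice", "carol")
print(cluster.neighbors("alice"))
```

Plain hashing like this scatters connected vertices across machines, so multi-hop traversals still cross the network; handling that efficiently is the hard part the quote refers to.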
Titan pioneered many of the concepts about distributing graph data across the cluster, according to Broecheler. It builds on top of Cassandra and HBase through an abstraction layer; it uses Elasticsearch, Hadoop and Spark. There are a lot of components you need to orchestrate to make Titan work.
“The feedback we’ve been getting from customers is that it’s a lot of work to get that to work sufficiently at scale. There’s a lot of hand-holding involved,” he said, adding that DSE Graph is designed to take on that work itself.
“In terms of the performance, the index structures, the distribution, we’ve taken all these concepts from Titan and applied them to DSE Graph, but integrated them more closely with Cassandra so we don’t suffer from data inconsistencies, that we can repair on failure, all those things you expect from a database system if you don’t want to hand-hold it,” Broecheler said.
Graph inherits the security features from DSE, including role-based access and encryption, whether the data is at rest or moving across the cluster.
“From an operational perspective, it doesn’t matter whether you store your data in graph or in a table. You make that choice based on the application you’re developing,” Broecheler said.
“Graph might be a great choice if you’re doing ACID management, if you’re doing a customer 360. The tabular model is a great choice if you’re doing time-series data. But at the end of the day, it all ends up in Cassandra, in DSE, so the security layer built around that applies to both types of data. You don’t have to worry about that as a separate system.”
DSE Graph is also tightly integrated with analytics through Spark, which means it can do simple real-time aggregates like any relational system or more complicated things like a collaborative filtering algorithm.
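For readers unfamiliar with the term, collaborative filtering recommends items based on what similar users chose. The sketch below is a minimal, single-machine illustration in plain Python, not the Spark-based analytics the company describes; the function name `recommend` and the sample data are made up for the example. It scores each item a user lacks by how many purchases that user shares with the people who own it.

```python
from collections import Counter


def recommend(purchases: dict, user: str, top_n: int = 3) -> list:
    """Suggest items owned by users whose purchases overlap with `user`'s."""
    owned = purchases.get(user, set())
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # shared purchases act as a similarity weight
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


purchases = {
    "ann": {"book", "lamp"},
    "ben": {"book", "lamp", "desk"},
    "cat": {"book", "chair"},
    "dan": {"sofa"},
}
print(recommend(purchases, "ann"))  # ['desk', 'chair']
```

In graph terms, this is a two-hop walk from a user to shared items to other users, followed by one more hop to their remaining items, which is why recommendation engines are a common graph use case.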
DSE Graph will be generally available by July.
IBM is a sponsor of InApps Technology.
Feature Image: “Big_Data_Higgs” by KamiPhuc, licensed under CC BY-SA 2.0.