Five years in the making, TigerGraph came out earlier this month with its graph database platform featuring parallel processing and analytics.
Its native parallel graph (NPG) technology powers real-time deep link analytics for enterprises trying to graph and process really Big Data. The company touts it as the only system on the market to unify real-time analytics with large-scale offline data processing for graphs.
Yu Xu, founder and CEO, said TigerGraph has been tested by some of the largest graph users in the world, and he expects myriad new use cases to grow out of its capabilities. He explained them in an earlier post for InApps Technology.
TigerGraph is already being used in what the company bills as the largest transaction graph in production in the world, at the Chinese online payment firm Alipay. That graph includes more than 100 billion vertices, 600 billion edges and two billion daily real-time updates.
Other early users include Visa, mobile e-commerce company Wish, Japanese telecom SoftBank, and electric utility State Grid Corporation of China.
The software is already being used to fight fraud and money laundering, for customer and supply-chain intelligence and smart grid use cases.
The Redwood City, Calif.-based company was founded in 2012 with the name GraphSQL. At that time, all the buzz was around NoSQL databases, yet Xu was focused on SQL and wanted that differentiation in the name. Times have changed, he said, marking the need for rebranding the company.
Xu has a Ph.D. in computer science and engineering from the University of California-San Diego and 26 patents in distributed systems and databases. He worked as an analytics engineer at Twitter and as a Hadoop MapReduce architect and team leader at data warehouse company Teradata.
His company’s backers include the founders of Yahoo, Walmart Labs (Kosmix), and Data Collective among others. It just secured $31 million in Series A funding from investors including Chinese tech giants Baidu and Ant Financial.
A primer on the native parallel graph #database by founder & CEO @Yu_Xu, via @thenewstack: https://t.co/H3JbafYRA5 pic.twitter.com/Xwt7liSLk6
— TigerGraph (@TigerGraphDB) October 23, 2017
Faster Graph Traversal
Early-generation native graph technologies cannot store a graph across multiple machines. Xu considers Neo4j, a native graph database that stores data as nodes and expresses their connectedness through edges, an example of “Graph 1.0.” However, Neo4j has worked with IBM to iron out scalability problems.
Most early graph databases, he says, are storage-focused with limited analytics capabilities, and lack the computational chops needed for the workloads companies want to run against graph databases. Most time out at two hops, while TigerGraph NPG is built to traverse 10 or more.
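To make the "hops" in that comparison concrete, here is a minimal sketch of a hop-limited traversal in plain Python. It is a generic breadth-first search over an in-memory adjacency list, not TigerGraph's engine or query language; the function name `k_hop_neighbors` and the toy graph are assumptions made up for illustration.

```python
from collections import deque


def k_hop_neighbors(adjacency, start, max_hops):
    """Return {vertex: hop_count} for vertices reachable from start within max_hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        # Do not expand vertices that already sit at the hop limit.
        if hops[vertex] == max_hops:
            continue
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[vertex] + 1
                queue.append(neighbor)
    return hops


# Hypothetical toy graph: a -> b -> c -> d, with a side branch b -> e.
toy_graph = {
    "a": ["b"],
    "b": ["c", "e"],
    "c": ["d"],
    "d": [],
    "e": [],
}

two_hop = k_hop_neighbors(toy_graph, "a", 2)
three_hop = k_hop_neighbors(toy_graph, "a", 3)
```

Each extra hop can multiply the number of vertices visited, which is why deep traversals on large graphs tend to be the hard case. Here the two-hop query never reaches vertex `d`, while the three-hop query does.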
He calls Apache Giraph and other parallel graph databases that sit atop NoSQL “Graph 2.0” — they lack the ability to make updates in real time.
The company bills TigerGraph, which Xu calls “Graph 3.0,” as a “complete, distributed, graph analytics platform supporting web-scaled data analytics in real time.” It says the platform works as well for limited, fast queries that use only a small part of the graph as it does for complex analysis that touches every vertex in the graph.
Despite those who rail against the weaknesses of Hadoop and MapReduce, Xu based TigerGraph on MapReduce and says it’s all in how it’s implemented.
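As a rough illustration of what a MapReduce-style computation over a graph can look like, the sketch below counts incoming edges per vertex: a map step emits one message per edge and a reduce step sums the messages addressed to each vertex. This is the generic pattern only, written in single-process Python; it is not a description of how TigerGraph implements it, and the names `map_phase`, `reduce_phase` and the sample data are hypothetical.

```python
from collections import defaultdict


def map_phase(adjacency):
    """Map: every vertex emits one (target, 1) message per outgoing edge."""
    for source, targets in adjacency.items():
        for target in targets:
            yield target, 1


def reduce_phase(messages):
    """Reduce: sum the messages addressed to each vertex."""
    totals = defaultdict(int)
    for target, value in messages:
        totals[target] += value
    return dict(totals)


# Hypothetical payment graph: who has sent money to whom.
payments = {
    "alice": ["bob", "carol"],
    "bob": ["carol"],
    "carol": [],
}

in_degree = reduce_phase(map_phase(payments))
```

In a parallel system the map work would be spread across vertices or machines and the reduce work grouped by target vertex; here both run in one loop purely to show the shape of the computation.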
“[We] built everything from the ground up, using C++, so we could control the whole stack. We built our own storage engine, our own cross-communication engine,” he said.
The graph is stored both on disk and in memory, allowing the system to take advantage of data locality on disk, in memory and in CPU cache.
It also allows the user to add data to the database continually without needing to re-run extract, transform and load (ETL) processes.
In its own benchmarks, the company claims 4- to 100-times faster graph traversal and query response times compared to Neo4j and Titan, another graph database.
It also claims its parallel computational capability boosts loading speed by 10x — 50 to 150 GB of data per hour, per machine.
It offers multiple ways to load data, including RESTful application programming interfaces (APIs), high-level mapping of comma-separated values (CSV) and JavaScript Object Notation (JSON) files to graph vertices, and connectors to popular data sources.
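The idea of mapping a flat file to graph elements can be sketched in a few lines of Python. This is a generic illustration using only the standard library, not TigerGraph's loader or its mapping syntax; the column names (`sender`, `receiver`, `amount`), the sample rows and the function `load_payment_graph` are all assumptions invented for the example.

```python
import csv
import io

# Hypothetical CSV of payments; each row becomes one edge between two vertices.
SAMPLE_CSV = """sender,receiver,amount
alice,bob,25.00
bob,carol,10.50
alice,carol,7.25
"""


def load_payment_graph(csv_text):
    """Map CSV rows to a set of vertices and a list of (source, target, amount) edges."""
    vertices = set()
    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Both endpoint columns are mapped to vertices.
        vertices.update((row["sender"], row["receiver"]))
        # The row itself is mapped to an edge carrying the amount as an attribute.
        edges.append((row["sender"], row["receiver"], float(row["amount"])))
    return vertices, edges


vertices, edges = load_payment_graph(SAMPLE_CSV)
```

The mapping decision is the interesting part: which columns identify vertices and which become edge attributes. A production loader would also have to handle duplicates, type errors and incremental updates, which this sketch leaves out.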
The company also announced the availability of a hosted version of TigerGraph on Amazon EC2 and GraphStudio, its visual software development kit (SDK).