Thanks in no small part to Facebook, almost everyone assumes graph databases are the province of social networking sites. After all, it is a technology that excels in connecting far-flung entities.
In actuality, however, graph databases are becoming useful for a whole range of duties beyond connecting friends and relatives. Take a look at what is happening with Neo4j, perhaps the most widely used graph database.
“Social has been a blessing and a curse for me,” said Emil Eifrem, CEO of Neo Technology, which offers Neo4j. “One of the big misconceptions about graph databases is that they are used only for social.”
Eifrem is quick to note, “We’re seeing a lot more complicated uses of graph databases in the enterprise, things like fraud detection.”
On Tuesday, Neo released version 3 of the graph database system, which comes with many new features to make graph database use more palatable to the mainstream developer.
Mapping the Constellations
Launched in 2007, Neo4j has already been adopted by some of the largest companies, especially in retail, telecommunications and health care. About 100 of the Global 2000, the Forbes list of the largest public companies, use the database. Walmart, for instance, uses it to generate product recommendations for its online retail operations.
IT giants are also increasingly backing the technology. For instance, IBM has stepped in to optimize the database to work with its Power8 line of servers, opening the possibility of running terabyte-sized Neo4j instances completely in memory.
Last month, when the International Consortium of Investigative Journalists (ICIJ) exposed the offshore tax havens of some of the world’s richest people, it used Neo4j to map the linkages, after indexing the material with Apache Solr and Tika.
“The domain model used by the ICIJ is really basic, just containing four types of entities (Officer, Client, Company, Address) and four relationships between them,” explained Neo’s Michael Hunger and William Lyon, in a blog post detailing how the documents can be further analyzed with Neo4j.
Analyzing the linkages, the journalists initially identified five government leaders who held money in offshore or shell accounts. The revelations led to at least one resignation, that of Iceland’s prime minister.
A graph database differs from a standard relational database in that, instead of storing data in tables and linking it through foreign keys, it stores data in individual nodes, which are connected by specified relationships. One node may hold a product name while another may hold a vendor name, with the relationship between them specifying that the vendor supplies that product.
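To make the node-and-relationship idea concrete, here is a minimal in-memory sketch in Python. It is not Neo4j's storage format or API; the class names, the SUPPLIES relationship type and the sample vendor and product are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A single entity in the graph, e.g. a product or a vendor."""
    label: str
    name: str


@dataclass
class Graph:
    """Stores relationships as (start, type, end) triples."""
    relationships: list = field(default_factory=list)

    def relate(self, start: Node, rel_type: str, end: Node) -> None:
        """Record a typed, directed relationship between two nodes."""
        self.relationships.append((start, rel_type, end))

    def outgoing(self, start: Node, rel_type: str) -> list:
        """Follow relationships of one type out of a node."""
        return [e for s, t, e in self.relationships if s == start and t == rel_type]


# Hypothetical sample data, not taken from any real catalog.
vendor = Node("Vendor", "Acme Corp")
product = Node("Product", "Anvil")

graph = Graph()
graph.relate(vendor, "SUPPLIES", product)

# Ask which products the vendor supplies by following the relationship.
supplied = graph.outgoing(vendor, "SUPPLIES")
print([node.name for node in supplied])
```

The point of the sketch is that the relationship is a first-class piece of data with its own type, rather than something reconstructed at query time from matching key columns.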
Graph computing, if not graph databases specifically, is most widely known through social media sites such as Facebook and LinkedIn, both of which generate a lot of potentially useful information by making connections across different people. (Think of Facebook’s “People You May Know” feature, which suggests possible friends based on mutual friends.)
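A friend-suggestion feature of that kind boils down to a two-hop walk through the graph. The Python sketch below shows one simple way to do it; it is not how Facebook or LinkedIn actually implement the feature, and the names and friendship data are made up.

```python
from collections import Counter

# Hypothetical friendship data: each person maps to a set of direct friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}


def suggest_friends(person, friends):
    """Rank non-friends by the number of friends they share with `person`."""
    mutual_counts = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            # Skip the person themselves and anyone they already know.
            if candidate != person and candidate not in friends[person]:
                mutual_counts[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for a stable order.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))


suggestions = suggest_friends("alice", friends)
print(suggestions)  # [('dave', 2), ('erin', 1)]
```

Dave ranks first because he is reachable from Alice through two mutual friends (Bob and Carol), while Erin is reachable only through Bob.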
Traditional relational databases are comparatively poor at answering questions about relations between entities. Every relationship query requires at least one join operation, and performance degrades quickly as the number of joins grows.
So think of any computing task that involves understanding the relationships among different entities. That is the purview of graph databases.
Take detecting the fraudulent use of a credit card, for instance. Fraud detection relies on a technique called link analysis, which examines who is charging items on a credit card and where these transactions are taking place, and compares them to the historical record of the credit card holder. Ideally, fraud detection should happen in real time, so the cheaters don’t make off with too much money.
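As a toy illustration of the idea, the Python sketch below flags a charge when neither its city nor its merchant is linked to anything in the card's history. Real link analysis is far more involved; the rule, the `Transaction` fields and the sample data here are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    card: str
    merchant: str
    city: str


def is_suspicious(new_tx, history):
    """Flag a charge whose city and merchant both lack any link to the card's past activity."""
    past = [tx for tx in history if tx.card == new_tx.card]
    known_cities = {tx.city for tx in past}
    known_merchants = {tx.merchant for tx in past}
    return new_tx.city not in known_cities and new_tx.merchant not in known_merchants


# Hypothetical historical record for one cardholder.
history = [
    Transaction("card-1", "Corner Grocery", "Portland"),
    Transaction("card-1", "Main St. Books", "Portland"),
    Transaction("card-1", "Airport Cafe", "Seattle"),
]

routine = Transaction("card-1", "Corner Grocery", "Portland")
unusual = Transaction("card-1", "Luxury Watches", "Miami")

print(is_suspicious(routine, history))  # False
print(is_suspicious(unusual, history))  # True
```

Because the check only needs to look at what the card is already connected to, it is the kind of lookup that can plausibly be run while the transaction is still in flight.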
What’s in the Box?
For this 3.0 release, the Neo architects created a new binary wire protocol, called Bolt, designed to speed communications between the application and the database system, a job heretofore handled by REST. The company also built drivers that work with the protocol, for Java, .NET, JavaScript and Python.
These drivers will make it easier for developers to build applications for Neo4j, noted James Governor, RedMonk co-founder and analyst, in a statement.
Support for Java Stored Procedures, which sets the stage for schema introspection, could also make Neo4j easier for developers to work with.
Neo4j 3.0 also revamps its core technology to work with larger data sets. The company had found some of its customers were pushing into hundreds-of-billions scale graphs, though the software was more suited to the range of tens of billions of records. Thanks to a new storage engine, that limit has been removed.
Feature Image: Johannes Hevelius, “Prodromus Astronomiae, volume III: Firmamentum Sobiescianum, sive Uranographia,” table QQ: Orion, 1690.