Here at the OpenStack Summit in Atlanta, GA, HGST, a Western Digital company, took the wraps off its new “open Ethernet drive architecture,” designed to demonstrate how applications can be run directly on hard drives and SSDs. The intent of the architecture is to enable use cases such as software-defined storage, big data, analytics and other applications that benefit from minimizing latency and increasing bandwidth between storage and compute. With this new architecture, HGST integrates a full compute stack, including CPU, RAM and networking, right onto the drives.

As demonstrated, each 4TB drive (if we can even call this just a “drive”) contains an ARM CPU and on-board RAM, and runs a full Linux environment. This allows each drive to run off-the-shelf scale-out storage software—GlusterFS, Ceph (now both Red Hat technologies) and Swift were demonstrated—and fully participate as a node in a distributed storage cluster.
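To make that concrete, here is a minimal Python sketch of what treating each drive as an independent Linux host might look like from the cluster side: enumerate the drive-nodes and confirm each one is reachable on the network before handing it to the storage software. The hostnames and the SSH probe are illustrative assumptions, not part of HGST’s demonstration.

```python
# Hypothetical sketch: treat each Ethernet drive as an independent Linux host
# and check that it is reachable before adding it to a storage cluster.
# The hostnames and port are assumptions for illustration, not HGST's naming.
import socket

DRIVE_NODES = [f"drive-{i:02d}.enclosure.local" for i in range(60)]  # 60-slot enclosure
SSH_PORT = 22  # each drive runs a full Linux environment, so SSH is a plausible probe

def is_reachable(host: str, port: int = SSH_PORT, timeout: float = 2.0) -> bool:
    """Return True if the drive-node accepts TCP connections on the given port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    live = [h for h in DRIVE_NODES if is_reachable(h)]
    print(f"{len(live)} of {len(DRIVE_NODES)} drive-nodes reachable")
```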

While each drive runs a full Ethernet stack, the drives themselves connect over SAS to a switched network fabric embedded in the 4U, 60-slot drive enclosure, and the fabric exposes each device as a Linux server via 10Gbps Ethernet. A quarter-petabyte in a 4U enclosure is not bad, especially considering the compute capability that comes along for the ride.
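The arithmetic behind that figure is straightforward; the quick calculation below is raw capacity only, before any replication or erasure-coding overhead that Ceph, GlusterFS or Swift would impose.

```python
# Back-of-envelope capacity for the demonstrated enclosure: 60 slots of 4TB drives.
slots = 60
tb_per_drive = 4
total_tb = slots * tb_per_drive   # 240 TB raw
total_pb = total_tb / 1000        # ~0.24 PB, roughly a quarter petabyte
print(f"{total_tb} TB raw (~{total_pb:.2f} PB) per 4U enclosure")
```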

Driving Big Data with Full-Stack Disks

While the use case featured by HGST is scale-out storage, it’s interesting to consider the implications of taking this model further, to the point of full application stacks sitting on individual hard drives. Perhaps one of the most compelling uses of this technology, at least theoretically, would be Hadoop, Spark or another analytics stack running on a device like this.

Much of the success of Hadoop comes from its ability to ensure that computations run on the node that contains the data being operated on, a property called data locality. By integrating the CPU with the drive itself and connecting the two over a high-speed internal bus, this architecture optimizes the data path further and moves applications even closer to the data.
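As a rough illustration of the data-locality idea (a simplified sketch, not Hadoop’s actual scheduler), the Python snippet below prefers to place a task on a node that already holds a replica of the block it needs, falling back to a remote node only when no local one is free. The block map and node names are invented for the example.

```python
# Simplified data-locality-aware placement, in the spirit of Hadoop/Spark schedulers:
# prefer a node that already stores the block, so no data crosses the network.
from typing import Dict, List, Optional

def place_task(block_id: str,
               replicas: Dict[str, List[str]],
               free_nodes: List[str]) -> Optional[str]:
    """Pick a node for the task that reads block_id, preferring data-local nodes."""
    local = [n for n in replicas.get(block_id, []) if n in free_nodes]
    if local:
        return local[0]                           # node-local read
    return free_nodes[0] if free_nodes else None  # remote read as a fallback

replicas = {"block-7": ["drive-03", "drive-41", "drive-58"]}
print(place_task("block-7", replicas, ["drive-12", "drive-41"]))  # -> drive-41
```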

Of course, there are issues to overcome in getting Hadoop to perform adequately on micro devices like the HGST drive, but it has been done. Scaling up the processor might make for a better big data node, though heat could become an issue. For now, at least for the demonstrated use case, Mario Blandini, HGST’s senior director of product marketing, assures me the increase in heat is negligible.

The company is being careful to manage expectations, says Blandini, who is quick to emphasize that this is only a demonstration of the open Ethernet drive concept. The software-defined storage use case happens to be particularly well suited to the architecture, due to its modest CPU and memory requirements, but it’s easy to see where this could go.

Seagate is a bit ahead of the game with its Kinetic technology, announced last fall, though Blandini points to the open approach taken by his company as an advantage.

Image credit via Creative Commons on Flickr