Google’s Data Architecture and What it Takes to Work at Scale – InApps Technology 2022

Malte Schwarzkopf — currently finishing his PhD on “operating system support for warehouse-scale computing” at the University of Cambridge — has released a series of slides describing some of his research into large-scale, distributed data architectures.

Schwarzkopf and his team at Cambridge Systems at Scale aim to build the next generation of software systems for large-scale data centers. To build software for the next wave of businesses that will need to work at a similar scale, he has had to understand how the current data giants configure their full stacks. Along the way, he has contributed to a number of open source projects, including DIOS (a distributed operating system for warehouse-scale data centers that uses an API based on distributed objects); Firmament (a configurable cluster scheduler that applies optimization analysis over a flow network); Musketeer (a workflow manager for big data analytics); and QJump (a network architecture that reduces network interference and provides low-latency messaging).

Schwarzkopf’s slide deck builds on his extensive bibliography of the Google stack.

His research finds that warehouse-scale computing (defined as 10,000-plus machines) requires a different software stack, one that aims to increase the utilization of many-core machines and to allow fast, incremental stream processing and approximate analytics (like that offered by BlinkDB) on large datasets. (Many-core is a term meant to indicate an order of magnitude more cores than multi-core.)
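To make the idea of approximate analytics concrete, here is a minimal Python sketch that answers an aggregate query from a random sample instead of a full scan and reports a rough error bound. It is an illustration only: the function name `approximate_mean`, the uniform sampling and the normal-approximation interval are assumptions made for this example, not BlinkDB's actual sampling strategy or API.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Hypothetical helper for illustration; returns (estimate, margin), where
    margin is an approximate 95% error bound on the estimate.
    """
    rng = random.Random(seed)
    # Read only `sample_size` items instead of scanning the whole dataset.
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean, scaled by 1.96 for a ~95% interval
    # under a normal approximation (an assumption of this sketch).
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, margin


if __name__ == "__main__":
    data = [float(i % 100) for i in range(100_000)]
    estimate, margin = approximate_mean(data, sample_size=1_000)
    print(f"mean is roughly {estimate:.2f} +/- {margin:.2f}")
```

The trade-off is the point: the query touches 1 percent of the data and returns an answer with a stated uncertainty rather than an exact value.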

Schwarzkopf’s research spells out the main characteristics that many of the largest data-driven companies like Microsoft, Twitter and Yahoo have in common with Google and Facebook:

“Frontend serving systems and fast backends.
Batch data processing systems.
Multi-tier structured/unstructured storage hierarchy.
Coordination system and cluster scheduler.”

In his presentation, “What does it take to make Google work at scale?”, Schwarzkopf discusses the architecture behind those 139 milliseconds between submitting a search request in the Google input bar and the pages of ads and search results that are returned.

All of what happens, Schwarzkopf says, takes place in containers between customized Linux kernels on each data machine and the transparent layer of distributed systems.

He identifies 16 different software technologies that work in tandem to return the real-time, contextual, personalized search results that users expect from Google.

These include:

GFS/Colossus: a bulk block data storage system.
Bigtable: a three-dimensional key-value store that combines row and column keys with a timestamp.
Spanner: software that uses GPS and atomic clocks within data centers to enable transactional consistency at a global scale.
MapReduce: a parallel programming framework.
Dremel: a column-oriented datastore useful for quick, interactive queries.
Borg/Omega: the predecessors of Kubernetes; cluster managers and schedulers for large-scale, distributed data center architecture.
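As a rough illustration of the "three-dimensional" key described for Bigtable above, the following Python sketch models a store in which each value is addressed by a row key, a column key and a timestamp. The class `TinyTable` and its methods are hypothetical names invented for this example. It is a toy in-memory model of the data layout, not Bigtable's API or implementation.

```python
from collections import defaultdict
from typing import Optional


class TinyTable:
    """Toy in-memory map keyed by (row, column, timestamp). Illustrative only."""

    def __init__(self) -> None:
        # row key -> column key -> {timestamp: value}
        self._cells = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        """Store one version of a cell; older versions are kept."""
        self._cells[row][column][timestamp] = value

    def get(
        self, row: str, column: str, timestamp: Optional[int] = None
    ) -> Optional[bytes]:
        """Return the newest version at or before `timestamp` (newest overall if None)."""
        versions = self._cells.get(row, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


if __name__ == "__main__":
    table = TinyTable()
    table.put("com.example.www", "contents:", 1, b"<html>v1</html>")
    table.put("com.example.www", "contents:", 5, b"<html>v2</html>")
    print(table.get("com.example.www", "contents:"))               # newest version
    print(table.get("com.example.www", "contents:", timestamp=3))  # version as of t=3
```

The timestamp dimension is what lets a single cell hold several versions of its value, so a reader can ask either for the latest data or for the data as of an earlier point in time.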

It’s unclear where Schwarzkopf may have presented this work so far: his bio page and Twitter feed don’t indicate that the slides were released in conjunction with any particular talk. They were simply linked from a tweet dated August 17. While high-level, the slides are clear enough to provide useful insights into the infrastructure map needed to make distributed architecture work at scale, and they mention enough links and resources that anyone working in the area has plenty of interesting rabbit holes to wander through in late-night research or post-lunch procrastination.

Feature image: “Regards croisés n°3 de Kurt & Thierry Ehrmann” by thierry ehrmann. Licensed under CC BY 2.0.

Source: InApps.net

Phu Nguyen