In 2010, YouTube was quickly hitting the wall in terms of scalability. As an interim solution, it created a master MySQL database for write traffic and a replica database for read traffic. But with cat videos at the height of popularity, that was only a temporary fix for the read replicas. The write database was quickly becoming overwhelmed as well, leading the YouTube team to determine that sharding would be necessary if demand became too great for a single MySQL instance. So it created Vitess, software to scale MySQL databases beyond a single server.
“Growing infrastructure at that rate to support that kind of traffic increase without compromising user experience is really hard,” said PlanetScale co-founder and CEO Jiten Vaidya. “You can go do it for CPU and application-level if you are in the cloud, you can start provisioning more and more machines, you can start provisioning more and more processes. So for stateless services, it’s easy. Doing the same thing for databases is hard,” he said, adding that Slack was able to double its database capacity in nine days with Vitess, rather than the eight to nine months such a project would normally entail.
Managed Vitess
Vaidya and PlanetScale co-founder and CTO Sugu Sougoumarane were part of the YouTube team at Google that created Vitess. Their company, PlanetScale, now offers a managed version of Vitess.
They spun off the Mountain View, California-based company in February 2018 after Google donated the project to the Cloud Native Computing Foundation. Vitess became the eighth CNCF project to graduate in November 2019, joining Kubernetes, Prometheus, Envoy, CoreDNS, containerd, Fluentd, and Jaeger. Version 6 of Vitess was released in April.
The company began by offering commercial support for Vitess, then introduced its cloud transactional database-as-a-service, PlanetScaleDB. More recently it announced a beta version of PlanetScaleDB for Kubernetes, which enables organizations to run their database alongside their applications in the cloud while ensuring all data remains within the customer’s network perimeter.
The beta offers low latency between application and databases because they’re in the same Kubernetes cluster, Vaidya said. PlanetScale handles software upgrades, hardware failures and network partitions, and because the data stays within the company’s network perimeter, customers can continue to comply with their security policies.
PlanetScale’s customers include Slack, Square, Pinterest, GitHub and HubSpot.
Kubernetes Native
Fundamentally, Vitess is a sharding middleware system that sits between your applications and shards of MySQL databases. It gives your application the impression that it’s talking to a single monolithic database, so the application doesn’t have to keep track of which shard holds the data being queried. To do this, it has a full-fledged MySQL parser built in, along with support for the MySQL binary protocol, so it simply presents itself to the application as a MySQL server.
It groups data with common record IDs on the same shard and provides connection pooling to reduce memory overhead, enabling the platform to handle high concurrency. After parsing the query, it figures out whether the query needs to go to a specific shard or gather data another way, employing query limiters as needed.
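The routing idea described above can be illustrated with a minimal Python sketch. This is not Vitess code and does not reflect its actual API. The shard count, the hash choice, and the function names `shard_for` and `route` are all assumptions made for illustration. It shows only the general pattern: a query that carries a record ID goes to the one shard that owns that ID, while a query without one has to be sent to every shard.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(record_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a record ID to a shard index with a stable hash.

    Rows that share a record ID always land on the same shard.
    """
    digest = hashlib.md5(str(record_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(record_id=None, num_shards: int = NUM_SHARDS) -> list:
    """Return the shard indexes a query must visit.

    With a record ID, the query targets a single shard. Without one,
    it fans out to all shards and the results would be merged.
    """
    if record_id is not None:
        return [shard_for(record_id, num_shards)]
    return list(range(num_shards))


if __name__ == "__main__":
    print(route(record_id=42))  # a single shard index
    print(route())              # every shard index
```

Because the application only ever calls something like `route`, it never needs to know how many shards exist or where a given row lives, which is the property the middleware approach is meant to provide.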
It also has robust workflows built in for resharding and similar operations, which helps a lot operationally when running thousands of hosts underneath a single database, Vaidya said.
Vitess was originally designed to run on bare metal. When YouTube migrated to the Google Cloud in 2013-14, Vitess was moved onto Borg, the precursor to Kubernetes, which supports PlanetScale’s assertion that Vitess was Kubernetes-ready before Kubernetes even existed.
The selling point of Vitess is that it offers the massive scalability of a NoSQL database while maintaining the consistency model of MySQL, according to Vaidya.
“At YouTube, the biggest piece that we had was 256 shards. Each shard had one master, of course, and between 80 to 120 replicas distributed across 20 data centers in the world. So if you do the math, that comes between [25,000] to 30,000 nodes being managed as a single database, and the application doesn’t need to even worry about that, that there are so many replicas across which your reads are being load-balanced, any of that. The application just issues reads and writes and the right things happen,” he said.
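As a rough check on those figures, the node count follows from multiplying the number of shards by the nodes in each shard. The short Python sketch below does that arithmetic, assuming every shard has exactly one master plus 80 to 120 replicas. The function name `total_nodes` is hypothetical. By this estimate the low end lands closer to 21,000 than to the bracketed 25,000, so the quoted range is best read as approximate.

```python
def total_nodes(shards: int, replicas_per_shard: int, masters_per_shard: int = 1) -> int:
    """Total database nodes: each shard has its master(s) plus its replicas."""
    return shards * (masters_per_shard + replicas_per_shard)


if __name__ == "__main__":
    low = total_nodes(shards=256, replicas_per_shard=80)
    high = total_nodes(shards=256, replicas_per_shard=120)
    print(low, high)  # 20736 30976
```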
“What we have built is a system that scales like MongoDB, but with full relational semantics. It’s a full transactional system, so you get secondary indexes, you get transactions, you get the ability to co-locate your data in shards based on your schema. We have extended the idea about a relational schema.
“So we don’t randomly distribute data in shards, we give you a fair amount of control about how the data should be co-located,” he said.
Meanwhile, HubSpot runs about 700 databases on Vitess in Kubernetes, he said. All of them are single-sharded. So they are not using it for this ability to horizontally shard the data, but for its ability to run in Kubernetes.
“Vitess has paved the way for us to unify all of our data storage infrastructure and our microservice infrastructure onto Kubernetes, and it’s giving us a blueprint for what the rest of our data stores might look like on Kubernetes,” said Alex Charis, senior software engineer at HubSpot.
PlanetScale runs on Amazon Web Services, Google Cloud Platform and Microsoft Azure, in four regions each, and bills itself as truly multicloud, enabling customers to spread masters and replicas across different clouds. You can have the master running in AWS, and most of the replicas there, too, but one or two replicas running in GCP for disaster recovery and resilience purposes, or while migrating from one cloud to another.
With custom sharding functions, you can distribute your data based on a country code, ZIP code or telephone country code to comply with regulations like GDPR.
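To make the idea of a custom sharding function concrete, here is a minimal Python sketch that places rows by country code. It is an illustration only, not Vitess or PlanetScale configuration. The shard names, the partial list of EU country codes, and the function `shard_for_country` are all assumptions.

```python
# Illustrative subset of EU country codes, not a complete or authoritative list.
EU_COUNTRIES = frozenset({"DE", "FR", "IE", "NL", "ES", "IT"})


def shard_for_country(country_code: str) -> str:
    """Pick a (hypothetical) shard name from a two-letter country code.

    Rows for EU countries stay on a shard that could be hosted in the EU,
    which is the kind of placement a data-residency rule might call for.
    """
    code = country_code.strip().upper()
    if code in EU_COUNTRIES:
        return "shard-eu"
    if code == "US":
        return "shard-us"
    return "shard-default"


if __name__ == "__main__":
    for code in ("de", "US", "jp"):
        print(code, "->", shard_for_country(code))
```

The point of the design is that placement is decided by a property of the data itself rather than by a random or purely hash-based distribution, so an operator can reason about where a given customer’s rows physically live.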
Going forward, the company plans to concentrate on migration tools, Vaidya said, and making the process for developing a schema for sharding as easy as possible. While it supports MySQL and MariaDB databases now, it plans to add support for Postgres, but that’s down the road.
Amazon Web Services and the Cloud Native Computing Foundation are sponsors of InApps.
InApps is a wholly owned subsidiary of Insight Partners, an investor in the following companies mentioned in this article: PlanetScale.