I am coming back today to a blog that I usually follow with close attention. Having been busy with my own interests lately, I didn’t find time to share the posts of Adrian Colyer’s interesting The Morning Paper. He posts every single weekday a nice review of papers, reports or whitepapers in the closely intertwined topics of Software Engineering, Computer Science and the more recent coalescence of all those topics sometimes we find in the much hyped field of Data Science.
I usually post The Morning Paper directly to my social feed, but today I decided to open an exception. The paper and topic of today’s post in that blog is warranted. It concerns the important datacenter function of sharding, and on a cloud protocol by Google called Slicer that aims to be a standard general purpose sharding service.
This aim is at odds with what Adrian normally thinks about the sharding function in a data service. And he gives his own interpretation:
Another piece of Google’s back-end infrastructure is revealed in this paper, ready to spawn some new open source implementations of the same ideas no doubt. Slicer is a general purpose sharding service. I normally think of sharding as something that happens within a (typically data) service, not as a general purpose infrastructure service. What exactly is Slicer then? It has two key components: a data plane that acts as an affinity-aware load balancer, with affinity managed based on application-specified keys; and a control plane that monitors load and instructs applications processes as to which keys they should be serving at any one point in time.
Slicer is focused exclusively on the problem of balancing load across a given set of backend tasks, other systems are responsible for adding and removing tasks:
Experience taught us that sharding is hard to get right: the plumbing is tedious, and it can take years to tune and cover corner cases. Rebuilding a sharder for every application wastes engineering effort and often produces brittle results.
One interesting point I would like to mention about this paper is that it describes a general purpose sharding service that is an automation application. Hence it has some level of artificial intelligence built-in. The other nice feature is that it organizes its operations and ‘tasks’ using hash functions, a usual friend of cryptography and distributed computing these days as we all know:
Slicer hashes each application key into a 63-bit slice key; each slice in an assignment is a range in this hashed keyspace. Manipulating key ranges makes Slicer’s workload independent of whether an application has ten keys or a billion and means that an application can create new keys without Slicer on the critical path. As a result, there is no limit on the number of keys nor must they be enumerated. Hashing keys simplifies the load balancing algorithm because clusters of hot keys in the application’s keyspace are likely uniformly distributed in the hashed keyspace.
(back to the morning paper content):
Slicer will honour a minimum level of redundancy per-key to protect availability, and automatically increases replication for hot slices.
The Clerk interface provides a single function for finding the addresses of assigned tasks given a key, but in Google most applications don’t use the Clerk library directly and instead benefit from transparent integration with Google’s RPC system Stubby, or Google’s Front End (GFE) http proxy.
We balance load because we do not know the future: unexpected surges of traffic arrive at arbitrary tasks. Maintaining the system in a balanced state maximizes the buffer between current load and capacity for each task, buying the system time to observe and react.
(back to the morning paper content):
Slicer monitors key load (request rate and/or application reported custom metrics) to determine when load balancing changes are required. The overall objective is to minimize load imbalance, the ratio of the maximum task load to the mean task load. When making key assignments Slicer must also consider the minimum and maximum number of tasks per key specified in configuration options, and should attempt to limit key churn – the fraction of keys impacted by reassignment. Key churn itself is a source of additional load and overhead. In order to scale to billions of keys, Slicer represents assignments using key ranges, aka slices. Thus sometimes it is necessary to split a key range to cope with a hot slice, and sometimes existing slices are merged.
Detail of Slicer’s implementation
Slicer aims to combine the high-quality, strongly consistent sharding decisions of a centralized system with the scalability, low latency, and fault tolerance associated with local decisions.
Adrian mentions in his post the apparent contradiction in that Slicer is conceptually a centralized service, but its practical implementation is distributed. This distribution is on global scale across all Google’s planetary datacenters (will we in the future be talking also about interplanetary space datacenters…?):
A two-tier distributor tree distributes decisions following an assignment change. This happens following a pull-model: subscribers ask a distributor for a job’s assignment, and if a distributor doesn’t have it, it asks the Assigner. Slicer is designed to maintain request routing even in the face of infrastructure failures or failures of Slicer itself. The control-plane / data-plane separation means that most failures hinder timely re-optimization of assignments, but do not delay request routing.
The following part of the fault-tolerating design caught my eye:
The Distributors share a nontrivial code base and thus risk a correlated failure due to a code or configuration error. We have yet to experience such a correlated failure, but our paranoia and institutional wisdom motivated us to guard against it.
Conclusion with a production note
In a one-week period, Slicer perform 260 billion requests for a subset of its Stubby clients: 99.98% of these succeeded, which establishes a lower bound on Slicer’s availability (requests may have failed for other reasons too). In another week, 272 billion requests arrived for a given backend service, of which only 11.6 million (0.004%) had been misrouted. (Many applications can tolerate misdirected requests with only an impact on latency or overhead, not availability).
There are charts and graphs aplenty in §5 of the paper if that’s your thing!:
(the last conclusion section from the reviewed paper):
Production deployment of Slicer shows that the system meets its load balancing and availability goals. Real applications experience a max:mean load ratio of 1.3–2.8, assisting peak load capacity planning. Slicer balances load better than load-aware consistent hashing, and does so while creating an order of magnitude less key churn. Slicer is available, correctly routing production customer requests at least 99.98% of the time, making it a building block for highly available applications.
Adoption by over 20 projects with a variety of use cases demonstrates the generality of its API
featured image: Data Center Infrastructure Management