The new Google hardware venture is a somewhat surprising but important and logical business move. It is surprising given the mostly software-centric business model Google has maintained over the years. But it is logical from a strategic and tactical perspective: having the capacity to design and control your own hardware saves costs, and building hardware tailored closely to your actual software needs also saves time. The other notable aspect of the new tensor processing units (TPUs) developed by Google is that they are purpose-built (each is an ASIC) for machine/deep learning computations and designed to surpass the performance of GPUs on those same computations. The enhanced performance is really quite impressive.
Today I will share here on The Intelligence of Information two videos, from two different YouTube channels, each giving an overview of TPUs. The first video, from the Singapore-based Engineer.SG channel, is a brief 11-minute presentation of the main bullet points about TPUs. The paper on the TPU (In-Datacenter Performance Analysis of a Tensor Processing Unit) is outlined from the outset and is recommended reading. The speaker then guides us through the TPU internals, where the neat accumulator design is an important factor behind the high performance of these custom chips: performance achieved with low power consumption and low overhead compared with the CPU and GPU alternatives.
The second video is a longer (one-hour) take on the Tensor Processing Unit by Dr. David Patterson, from the UC Berkeley EECS Events YouTube channel. Dr. Patterson's curriculum vitae is introduced and speaks for itself about the speaker's qualifications. He is one of the long list of researchers who participated in the development of the TPU.
The trend of hardware toward domain-specific architectures, built to perform a few tasks or just one, is well defended by Dr. Patterson at the beginning of the talk. Deep learning and neural networks fit the bill, given the highly specific computations they require, and this served as inspiration for designing something like the TPU. It is then startling to learn that one of the main motivations behind the TPU's origin was the growing demand for speech, vision and text processing from an application like Android: without purpose-built hardware like the TPU, Google would have needed to double its number of datacenters.
The inference workloads that TPUs are built for are Multilayer Perceptrons, Convolutional Neural Networks and Recurrent Neural Networks/Long Short-Term Memory networks (LSTMs), all deep learning algorithms by now well known to this blog and its readers.
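To make concrete why all three workloads map so well onto the TPU, here is a minimal sketch of MLP inference (my own illustration, assuming plain NumPy and made-up layer sizes, not Google's code): the forward pass is dominated by matrix multiplies, which is precisely the operation the TPU's matrix unit is built to accelerate.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def mlp_inference(x, weights, biases):
    """Forward pass of a multilayer perceptron.

    Each layer is a matrix multiply plus bias followed by a ReLU;
    nearly all the arithmetic is in the matrix multiplies, which is
    why a matrix-multiply accelerator helps inference so much.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W + b)
    return h @ weights[-1] + biases[-1]  # final layer, no activation

# Hypothetical sizes: batch of 4, 8 inputs, 16 hidden units, 3 outputs
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((8, 16)), rng.standard_normal((16, 3))]
bs = [np.zeros(16), np.zeros(3)]
out = mlp_inference(rng.standard_normal((4, 8)), Ws, bs)
print(out.shape)  # (4, 3)
```

CNNs and RNN/LSTM cells reduce to the same core operation: convolutions can be lowered to matrix multiplies, and LSTM gates are matrix multiplies on the hidden state.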
The systolic execution in the matrix array (the term borrows from the rhythmic pumping of the heart), which the video explains as the way the TPU manages compute and memory for matrix multiplication, is a clever innovation by the designers. It is beautifully explained by Dr. Patterson.
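As a rough illustration of the systolic idea, the sketch below (a simplified timing model of my own, not Google's actual design) computes a matrix product cell by cell: the operand pair for cell (i, j) with reduction index k arrives on cycle i + j + k, so each cell performs one multiply-accumulate per cycle as the data wavefront passes through the array.

```python
def systolic_matmul(A, B):
    """Toy model of a systolic matrix multiply: operands reach
    processing cell (i, j) as a diagonal wavefront, and each cell
    does one multiply-accumulate per cycle."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Cell (i, j) sees the pair (A[i][k], B[k][j]) on cycle t = i + j + k.
    for t in range(n + m + p - 2):          # total pipeline latency
        for i in range(n):
            for j in range(p):
                k = t - i - j               # operand pair arriving this cycle
                if 0 <= k < m:
                    C[i][j] += A[i][k] * B[k][j]
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Note that the result matches an ordinary matrix multiply; the systolic arrangement changes when and where each multiply happens, not what is computed, which is how the TPU keeps its array of multipliers busy with minimal memory traffic.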
Dr. Patterson lists a number of factors he deems important to the success of TPUs. It is an interesting list of points that will certainly be examined further by researchers. Hardware developments around neural networks can only improve from this point, and I suspect that deeper and even more surprising developments may be just around the corner, when we least expect them. I think these two videos are good, qualified presentations of tensor processing units (TPUs), and the followers of this blog will enjoy them and share them with like-minded friends.
featured image: Google opens up about its Tensor Processing Unit