Today I come back to the excellent website The Next Platform, to share with readers an article that I thought cuts right to the heart of the topics I have been writing about here in The Information Age.
This short article is a well written piece. It is short, but the issues behind its content are worth perhaps thousands of pages of other articles, PhD or Master's theses, research papers and so on. What I find quite compelling is how it manages to compress into a single article these profound and cutting-edge science and technology topics: neuromorphic computing, high-performance computing/supercomputers (HPC), quantum computing and deep/machine learning. The usual reader of this blog is by now somewhat familiar with all those topics. I must confess, though, that I write about all of this with some humility, as I think that most of the articles here only scratch the surface of the topics. Nevertheless, I also think that from time to time some articles here are good, deep reviews of the specific topic they address. The hard part is maintaining a holistic view of how all this is related in a broader Computer Science or Scientific Computing perspective.
The article I share here today is one of those that go in that direction: a short, compact, holistic view of the complexity of the issues and topics in the above-mentioned fields of study.
The very first two paragraphs of this article contextualize the topics for us. Indeed, in spite of the recent slowdown in the rate of hardware progress, there is no reason to be too pessimistic about the future of computing in general, and of scientific computing in particular. What appears to be emerging is an improved software architecture that does away with the need for further miniaturization, one whose capacity is enhanced by taking advantage of a new, somewhat unexpected resource: data.
It is difficult to shed a tear for Moore’s Law when there are so many interesting architectural distractions on the systems horizon.
While the steady tick-tock of the tried and true is still audible, the last two years have ushered in a fresh wave of new architectures targeting deep learning and other specialized workloads, as well as a bevy of forthcoming hybrids with FPGAs, zippier GPUs, and swiftly emerging open architectures. None of this has been lost on system architects at the bleeding edge, where the rush is on to build systems that can efficiently chew through ever-growing datasets with better performance, lower power consumption, while maintaining programmability and scalability.
Scientific computing generates a lot of data. But until the first years of this new century, researchers were unaware of how this data could be used to enhance the performance of their workloads. Truth be told, distributed computing, supercomputers and parallel computing architectures were already a reality, but they weren't properly taking advantage of the slew of data they were creating. Enter the new machine learning/deep learning software pipelines to change all this. Suddenly a whole new computing paradigm was emerging, with an exciting new way to deal with data:
Yesterday, we profiled how researchers at Oak Ridge National Lab are scaling deep learning by dividing up thousands of networks to run on over 18,000 GPUs using an MPI-based approach. As we discussed, this effort was driven by HPC’s recognition that adding deep learning and machine learning into the supercomputing application mix can drive better use of large wells of scientific data—and bolster the complexity and eventually, capabilities of simulations.
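At its core, the MPI-based approach described above is data-parallel training: each rank works on its own shard of the data and the ranks periodically average their gradients. Below is a minimal sketch of that idea in plain Python with NumPy, simulating the ranks in a single process (no actual MPI); the linear model, learning rate, and data are invented for illustration, not ORNL's pipeline:

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of mean squared error for a linear model on one shard."""
    pred = X @ w
    return 2 * X.T @ (pred - y) / len(y)

def allreduce_mean(grads):
    """Stand-in for MPI_Allreduce: average the gradients from all ranks."""
    return np.mean(grads, axis=0)

def data_parallel_step(w, shards, lr=0.1):
    """One synchronous SGD step: each 'rank' computes a gradient on its
    own data shard, then every rank applies the same averaged gradient."""
    grads = [local_gradient(w, X, y) for X, y in shards]
    return w - lr * allreduce_mean(grads)

# toy example: four ranks, each holding a shard of data from y = 3x
rng = np.random.default_rng(0)
shards = []
for _ in range(4):
    X = rng.normal(size=(32, 1))
    y = 3.0 * X[:, 0]
    shards.append((X, y))

w = np.zeros(1)
for _ in range(100):
    w = data_parallel_step(w, shards)
print(w)  # converges toward the true weight, 3.0
```

In real MPI code the `allreduce_mean` call would be a single collective operation, which is what keeps this scheme scalable: communication cost depends on the model size, not on the size of each rank's data shard.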
The use of neural networks in this mix is quite interesting. The article introduces the researcher and his work in the following passages; the papers and research of Thomas Potok are an enterprise worth its salt:
One of the researchers involved in that companion effort we described to create auto-generating neural networks for scientific data took neural network hardware investigations one step further. Thomas Potok and his team built a novel deep learning workflow that uses the best of three worlds—supercomputers, neuromorphic devices, and quantum computing.
At a high level, they evaluated the benefits of all three compute platforms (we’ll get to those in a moment) and found that they could use HPC simulation data as the baseline for a convolutional neural network generated using their auto-network tools on the Titan machine, then shift elements of that network to the quantum (using the 1,000 qubit machine at USC/Lockheed) and neuromorphic (developed at ORNL and described here) devices to handle the elements they are best at handling.
Soon enough the difficulties were recognized, as is inevitable in complex, challenging endeavours. But we can also recognize the room this challenge leaves for different approaches and computing goals:
It is difficult to compare the performance of HPC, quantum, and neuromorphic devices on deep learning applications since the measurements are different—as are the areas where each excel. A quantum system can provide deeply connected networks and a greater representation of the information without the computational cost of conventional machines, Potok says, but again, they cannot span the entire deep learning workflow in the way ORNL envisions it with the HPC layer.
Neuromorphic devices, particularly those with a spiking neural network like the one developed at ORNL (DANNA) can take some of the offload of neural networks that incorporate a time series element. In other words, the HPC simulation and origin of the network can be done best on a supercomputer, the higher order functions of a convolutional neural net can be addressed by a quantum machine, and the results can be further analyzed with a temporal aspect from neuromorphic devices.
“With scientific data you often have imagery that has a time aspect. You have a sensor with a particle that interacts with it. With neuromorphic, we can take the standard convolutional neural net and have complementary spiking neural networks that work on the time element of the data or experiment. We can use those in an ensemble to be able to look at the problem not only from a locality aspect in the image—but temporally as well.”
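The ensemble idea Potok describes — one network judging the spatial structure of an event, a complementary spiking network judging its temporal structure — can be sketched very loosely. Everything below is a toy stand-in (the scoring functions, weights and data are invented for illustration), not ORNL's actual CNN or DANNA pipeline:

```python
import numpy as np

def spatial_score(image):
    """Stand-in for a CNN: scores locality/structure in a 2-D frame.
    Here, simply the fraction of 'bright' pixels near the frame centre."""
    h, w = image.shape
    centre = image[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]
    return float((centre > 0.5).mean())

def temporal_score(trace):
    """Stand-in for a spiking network: scores the time element of the
    event, e.g. how sharply activity rises and then decays."""
    diffs = np.diff(trace)
    return float(np.clip(diffs.max() - diffs.min(), 0.0, 1.0))

def ensemble(image, trace, w_spatial=0.5, w_temporal=0.5):
    """Combine both views of the same event into a single score."""
    return w_spatial * spatial_score(image) + w_temporal * temporal_score(trace)

# toy 'particle hit': a bright blob in the image plus a spike in time
image = np.zeros((8, 8))
image[3:5, 3:5] = 1.0
trace = np.zeros(20)
trace[10] = 1.0
print(ensemble(image, trace))
```

The point of the sketch is only the structure: two very different models look at the same event from complementary angles, and a simple weighted combination lets the ensemble use both the locality and the temporal aspect of the data.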
Researchers from the wider deep learning community – luminaries such as Andrew Ng – were already aware of these issues and challenges. They were also interested in a push forward in the direction the Oak Ridge team was taking, but approached it with less ambition:
“Three or four years ago when we started looking at all of this, Andrew Ng and others were trying to scale neural networks across nodes. The problem was, people were not having much luck scaling past 64 nodes or so before the performance gains went away,” Potok explains. “We took a different approach; rather than building a massive 18,000 node deep learning system, we took the biggest challenge—configuring the network, building the topologies and parameters, and using evolutionary optimization to automatically configure it—we made it an optimization problem.”
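Treating network configuration as an optimization problem, as Potok describes, can be sketched with a toy evolutionary loop over hyperparameters. The genome layout, parameter ranges and fitness function below are invented for illustration; in the real workflow, fitness would be the validation accuracy of a network actually trained on the supercomputer:

```python
import random

random.seed(42)

# a 'genome' is a candidate network configuration
PARAM_SPACE = {"layers": range(1, 9), "filters": range(8, 129, 8)}

def random_genome():
    return {k: random.choice(list(v)) for k, v in PARAM_SPACE.items()}

def fitness(genome):
    """Toy stand-in for 'train the network, return validation accuracy'.
    Pretends the sweet spot is 4 layers of 64 filters each."""
    return -abs(genome["layers"] - 4) - abs(genome["filters"] - 64) / 8

def mutate(genome):
    """Resample one randomly chosen hyperparameter."""
    child = dict(genome)
    key = random.choice(list(PARAM_SPACE))
    child[key] = random.choice(list(PARAM_SPACE[key]))
    return child

def evolve(pop_size=20, generations=30):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]          # selection
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return max(population, key=fitness)

best = evolve()
print(best)  # typically converges near layers=4, filters=64
```

Each fitness evaluation is independent, which is what makes this formulation attractive on a machine like Titan: thousands of candidate configurations can be trained and scored in parallel instead of trying to scale the training of one network across all nodes.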
Further down the road, the set of challenges will revolve around scalability, compute demand and operational overhead:
All of this might sound like a major integration challenge with so many data types working toward the same end result—even if the inputs and outputs are generated as steps in the overall problem on distributed networks. Potok says the above shown architecture does work, but the real test will be merging the CNN and spiking neural networks to see what results might be achieved. Ultimately, the biggest question is just how hybrid architectures will need to be to sate compute and application demands. It will be beneficial to have a new, more robust, scalable, way to create more complex results from simulations, but there is a lot of overhead—and on devices that are not yet produced at any scale.
Thomas Potok is sanguine and optimistic in his views about the future of the three compute paradigms he endorses in his research efforts, though. And what he envisions is nothing short of a whole new way of approaching scientific questions and challenges, where the computing platform is part of the essential loop between human thinking and machine thinking: something that feeds back well beyond science, as we are all by now familiar with.
Big theory work is part of what national labs do best, of course, but this could move beyond concept in the next generations of machines. Potok doesn’t see such a massively hybrid architecture in the next five years, but does think all three modes of computing hold promise in the next decade. For example, he says, if you look at a field like material science where there is an image of a material at the atomic level (with its associated terabytes of data), what new things could be learned from systems that can take this data, learn from it as a static and temporal record, and then create entirely new ways of considering the material. In other words, not only will the compute platform be new, so too will be the ways problems are approached.
featured image: The University of Tennessee and Oak Ridge National Laboratory