I once wrote a series of posts for the portal Data Science Central. After losing some confidence in my writing on the subject, I stopped. But I recognize the quality of the community of writers on this portal, and I am still part of it. One of them recently wrote an absolutely brilliant article about the future of Deep Learning. The article's points about the current shortcomings of Deep Learning technologies weren’t fully common knowledge to me, although I may have touched on those issues superficially here or elsewhere.
The article I am referring to was written by William Vorhies, who is the Editorial Director of Data Science Central and has extensive experience as a data scientist and in the commercial predictive analytics industry. Today I will re-post the main highlights of his article here; it is an enjoyable read indeed. Later in this post I will also discuss the video linked in the article, a TEDx talk dealing with the relationship between modern neuroscience and the future of deep learning (and robotics) technologies.
What’s Wrong with CNNs?
The limitations of CNNs have been understood for some time but we’ve been getting such good returns for the last two years it’s been easy to overlook them.
- Need Too Much Data. CNNs are not One-Shot Learners: They require massive amounts of tagged training data, which is in short supply and expensive to produce. There are a large number of tuning parameters that need to be adjusted by a data scientist, which makes the setup long and labor intensive. Even the fitness functions are complex, although they rely on common gradient descent logic.
- They Can’t Extract Meaning and They Can’t Remember: CNNs are basically just very good classifiers. Is that a cat – yes or no? It can’t remember patterns that it may have recently developed elsewhere and apply them to the new data set. If you asked a CNN-enabled robot “please get me something to eat with” or “please get me a knife and fork” it cannot recognize that these are essentially the same question.
- They are Supervised Learners: With the exception of recent advances made in Adversarial Learning, CNNs need labeled examples and lots of them.
- Need Faster and More Powerful Machines: All those layers need ever faster chips on ever larger MPP clusters. If you scan the literature on Deep Learning you’ll immediately see that the great majority of it is about advances and investments in new and exotic chips. For Intel and their competitors this is a goldmine. There’s a virtual arms race going on to build chips that are ever-faster-ever-cheaper for CNNs.
Where Are 3rd and 4th Gen NNs Coming From?
The simple answer is academia but the more interesting answer is from brain research. AI that mimics the way the brain functions is labeled ‘strong’ AI, while AI that doesn’t worry too much about the exact model but gets the same results is called ‘weak’. We recently argued that since the ‘weak’ school has been in the ascendancy with CNNs, we should find a more dignified and instructive name like ‘engineered’ AI.
What’s most revealing about 3rd and 4th gen NNs is that they are coming from the very research labs that are attempting to reveal exactly how neurons and synapses collaborate within the brain. What was very slow progress for a long time is now experiencing major breakthroughs.
There are many of these modeled-brains underway and if you’d like to see a very impressive demonstration, actually from 2013, see this YouTube video of ‘Spaun’ created by Chris Eliasmith at the University of Waterloo that remembers, and learns unsupervised from its environment.
So the ‘strong’ school looks like it is not only making a comeback but will in fact dominate in the future. We’ll describe the 3rd gen in a minute. The 4th gen that doesn’t yet exist does already have a name. These will be ‘neuromorphic’ nets or more likely just brains on a chip.
Spiking Neural Nets (SNNs) (also sometimes called Oscillatory NNs) are being developed from an examination of the fact that neurons do not constantly communicate with one another but rather in spikes of signals. We all have heard of alpha waves in the brain and these oscillations are only one manifestation of the irregular cyclic and spiking nature of communication among neurons.
So if individual neurons are activated only under specific circumstances in which the electrical potential exceeds a specific threshold, a spike, what might be the implication for designing neural nets? For one, there is the fundamental question of whether information is being encoded in the rate, amplitude, or even latency of the spikes. It appears this is so.
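To make the threshold idea concrete, here is a minimal sketch of a leaky integrate-and-fire neuron, a common textbook simplification and not the model used in the article or in Spaun. The function names and every parameter value (`tau`, `v_thresh`, the input currents) are assumptions chosen for illustration. It shows how one input strength can show up both in the spike rate and in the latency of the first spike.

```python
def simulate_lif(input_current, steps=200, dt=1.0, tau=20.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Return the time steps at which a leaky integrate-and-fire neuron spikes.

    All parameters are illustrative assumptions, not values from the article.
    """
    v = v_rest
    spikes = []
    for t in range(steps):
        # Euler step: the potential leaks toward rest and is driven by the input.
        v += dt / tau * (-(v - v_rest) + input_current)
        if v >= v_thresh:
            # Potential crossed the threshold: emit a spike and reset.
            spikes.append(t)
            v = v_reset
    return spikes


def rate_and_latency(spikes, steps=200):
    """Summarize a spike train as (spikes per step, time of first spike or None)."""
    rate = len(spikes) / steps
    latency = spikes[0] if spikes else None
    return rate, latency


if __name__ == "__main__":
    for current in (0.5, 1.5, 3.0):
        rate, latency = rate_and_latency(simulate_lif(current))
        print(f"input={current}: rate={rate:.3f}, first spike at step {latency}")
```

In this toy model an input that never pushes the potential past the threshold produces no spikes at all, while a stronger input fires both sooner and more often, so rate and latency carry the same information in two different forms.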
The SNNs that have been demonstrated thus far show the following characteristics:
- They can be developed with far fewer layers. If nodes only fire in response to a spike (actually a train of spikes) then one spiking neuron could replace many hundreds of hidden units on a sigmoidal NN.
- There are implications for energy efficiency. SNNs should require much lower power than CNNs.
- You could in theory route spikes like data packets, further reducing layers. It’s tempting to say this reduces complexity, and it’s true that layers go away, but they are replaced by the complexity of interpreting and directing basically noisy spike trains.
- Training SNNs does not rely on gradient descent functions as do CNNs. Gradient descent which looks at the performance of the overall network can be led astray by unusual conditions at a layer like a non-differentiable activation function. The current and typical way to train SNNs is some variation on ‘Spike Timing Dependent Plasticity’ and is based on the timing, amplitude, or latency of the spike train.
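The article only says that training is "some variation" on Spike Timing Dependent Plasticity, so the following is just one hedged illustration: the widely used pair-based form with an exponential timing window. The function names, learning rates (`a_plus`, `a_minus`), time constants and weight bounds are all assumptions. A synapse is strengthened when the presynaptic spike arrives shortly before the postsynaptic one, and weakened when the order is reversed.

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012,
                 tau_plus=20.0, tau_minus=20.0):
    """Weight change for one pre/post spike pair (pair-based STDP sketch).

    Parameter values are illustrative assumptions.
    """
    dt = t_post - t_pre
    if dt > 0:
        # Pre fired before post: potentiation, decaying with the time gap.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        # Post fired before pre: depression, decaying with the time gap.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the pairwise updates over two spike trains and clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_delta_w(t_pre, t_post)
    return min(max(w, w_min), w_max)


if __name__ == "__main__":
    print(apply_stdp(0.5, pre_spikes=[10, 30], post_spikes=[12, 33]))  # pre leads post
    print(apply_stdp(0.5, pre_spikes=[12, 33], post_spikes=[10, 30]))  # post leads pre
```

Note that the update uses only locally available spike times at a single synapse; no error signal is propagated back through the whole network, which is the contrast with gradient descent drawn in the list above.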
What we can observe in the early examples is this:
- They can learn from one source and apply it to another. They can generalize about their environment.
- They can remember. Tasks once learned can be recalled and applied to other data.
- They are much more energy-efficient which opens a path to miniaturization.
- They learn from their environment unsupervised and with very few examples or observations. That makes them quick learners.
- Particularly interesting in the Spaun demo mentioned above, it makes the same types of mistakes at the same frequency as human learners. For example, when shown a long series of numbers and asked to recall them, the mistakes tended to be in the middle of the series. Experimentally this is exactly what would happen with human learners. This implies that the SNNs in Spaun are in fact closely approximating human brain function.
A final example comes from a local stealth-mode startup using advanced SNNs. When the SNN was shown a short video of cars moving on a highway it rapidly evolved a counting function. It wasn’t told what its purpose was. It wasn’t told what a car looked like. The images (the data) were moving in an organized but also somewhat chaotic manner. A few minutes later it started to count.
Chris Eliasmith at TEDxWaterloo 2013: How to build a Brain?
In the video below we listen to an interesting talk given at TEDxWaterloo in 2013. Dr. Chris Eliasmith describes his effort to create a model of the brain that might one day be suitable for replication in an electronic device. His first words are already an encouragement never to give up on the drive to test new ideas, even if everyone around you says how ridiculous they are. To really innovate, we may need to try a reasonable minimum of ten ideas for one to succeed in the end:
I would further stress the relevance of two aspects of this talk. The first is the importance of functional computation in modeling the brain, especially when trying to model cognitive functions that the human brain is very good at, such as working memory and the task of recognizing a diverse plethora of behaviors dynamically, simultaneously and stably, and then acting and deciding in response to them. The second is the way the brain performs all those tasks without our being conscious of them, hinting that the consciousness we so cherish is sometimes a cost rather than an asset for brain performance.
As a final remark, it is worth pointing out what Dr. Eliasmith says about the nature of the model he presents. First, he says that it is a brain-like model rather than an artificial model of the brain, one that closes the gap between current mainstream artificial neural network models and emergent general artificial intelligence models. Second, it has enormous potential to give insights into the treatment of brain disorders through the way Spaun (Semantic Pointer Architecture Unified Network) will interact with the neuromorphic computers of the future, pointing to a deeper collaboration between the medical side of neuroscience and the computational side.
These are really exciting times in these fields of study, which deserve our attention, and this blog in particular will continue that rewarding task for as long as it is possible.
Body text Image: Data Science Central link