Limitations of Convolutional Neural Networks for AGI in Computer Vision

I have been posting about Convolutional Neural Networks on this blog with some excitement and, honestly, with a bit of a beginner’s beguiled attitude. Those posts were mainly from last year, when I was reviewing some papers about Computer Vision and the blog was named The Information Age. Now The Intelligence of Information aims at a more informed and serious take on issues and developments in Artificial Intelligence, Machine Learning and Neural Networks.

To this end it is advisable to listen to the more qualified and significant researchers or pioneers in a particular field. That is the case with Dr. Geoffrey Hinton and Neural Networks in general, as he has been one of the main pioneers and creative researchers in the field. His very British stiff-upper-lip seriousness is also something of a further guarantee that we are not being led astray by what we hear. The video I share here today is one of those that gives us precisely that more qualified perspective. It is from a talk delivered in 2014, some three years before I started to mention ConvNets as the state of the art in deep neural networks for Computer Vision. This is not to diminish their relevance but to identify their shortcomings. Dr. Hinton is very good at communicating precisely what he thinks these limitations of ConvNets are. I have not yet found a narrative as good as his, allowing for the obvious subjectivity in these kinds of judgements. I just feel good listening to Dr. Hinton talk, and I like to share what feels good to me.

The talk begins with a rundown of the advantages and disadvantages of ConvNets in a task like object recognition. Dr. Hinton lists the points he is happy with and the main point he is less happy with: the pooling of neurons done by ConvNets. As I understand it, pooling interferes with the complicated interweaving of layers of neurons, and its apparent efficiency becomes a burden later in the processing of new information.

Then we are offered a list of reasons why Dr. Hinton believes in the convolution property of ConvNets but not in their pooling property. His explanation of why Machine Learning is important for passing information from lower-dimensional levels in the data to higher ones is a passage worth listening to more than… two times. It is like watching the pieces of a jigsaw puzzle being fitted together, brilliantly but serenely explained (in the usual unperturbed English way…). We are forced to understand the intricate nature of how the human brain performs vision and why it is so hard for a machine to mimic it to the required perfection. The list is composed of four arguments. The first revolves around the psychology of shape perception. The second is that the human brain does not aim for viewpoint invariance but is actually an equivariant, disentangling machine. The third is the inability of pooling in ConvNets to use the underlying linear structure (the natural linear manifold that perfectly handles the largest source of variance in images), which Dr. Hinton considers the most serious issue with pooling. The fourth concerns dynamic routing, and this last argument is critical to why a proper implementation of Machine Learning helps Computer Graphics practitioners do dimension hopping in order to deal with the complexities of high-dimensional data.
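To make the difference between equivariance and invariance a bit more concrete, here is a minimal toy sketch in Python with NumPy. It is my own illustration and not something taken from the talk: the helper names `conv1d_valid` and `max_pool1d`, the two-tap kernel and the one-dimensional "image" are all assumptions chosen for simplicity. It shows that when a feature moves by one step, the convolution output moves with it (equivariance), while max pooling over a window of four returns the same values before and after the shift, so the position inside the window is lost.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (cross-correlation)."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(signal, width):
    """Non-overlapping max pooling; samples that do not fill a window are dropped."""
    usable = len(signal) - len(signal) % width
    return signal[:usable].reshape(-1, width).max(axis=1)

# Hypothetical feature detector that responds to two adjacent "on" pixels.
kernel = np.array([1.0, 1.0])

# The same feature at two positions, one step apart.
a = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
b = np.roll(a, 1)

conv_a = conv1d_valid(a, kernel)
conv_b = conv1d_valid(b, kernel)

# Equivariance: shifting the input shifts the feature map by the same amount.
print("conv peak positions:", int(np.argmax(conv_a)), int(np.argmax(conv_b)))

pooled_a = max_pool1d(conv_a, width=4)
pooled_b = max_pool1d(conv_b, width=4)

# Invariance: after pooling, the two inputs can no longer be told apart.
print("pooled outputs equal:", np.array_equal(pooled_a, pooled_b))
```

Note that the invariance in this sketch only holds because the shift keeps the response inside a single pooling window. A larger shift would change the pooled output, so pooling gives only a coarse, local kind of invariance.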

It is interesting how Dr. Hinton explains why ConvNets cannot account for the effects of assigning coordinate frames to objects: the human brain imposes coordinate frames top-down when recognizing particular objects, without us really being aware of how this processing is done. A different coordinate frame yields a completely different recognition, but ConvNets have no notion of this (some interesting uncertainties here…). After this comes a good laugh with the tetrahedron puzzle task… Ah ah 😄

The talk then moves to its main argument: introducing Dr. Geoffrey Hinton’s capsules project, which aims to overcome the limitations of current Convolutional Neural Network architectures. This is a development and presentation from 2014 and, as of now, I do not know the current status of the project or whether it is an important addition to the quest for Artificial General Intelligence (AGI) in vision tasks. A final interesting takeaway from this talk is that AGI will have to cope, one way or another, with the highly parallel, distributed (non-linear) nature of computation in the human brain, instead of doing some brute-force pooling of layers of neurons.

The rest of the talk details all the arguments with more points and views. I hope the readers and followers of The Intelligence of Information have a good and formative pastime with this post and video. That is my purpose and, I guess, yours too.

featured image: NIRFaceNet: A Convolutional Neural Network for Near-Infrared Face Identification

