Today I could not help but return to the PyData London 2017 series of YouTube videos. That is partly because the field of Bayesian Deep Learning continues to make strides forward, but also because the quality of the research and of the speakers on the subject deserves our attention 😊
This time the culprit was Andrew Rowan and his talk, shared in the video below. It is another opportunity to learn and understand a bit more about Bayesian Deep Learning: the talk presents a probabilistic programming framework called Edward, and shows that Dropout, a technique commonly used for regularization against overfitting in deep neural networks, can also be viewed as a Bayesian approximation method in deep neural network settings.
The Edward probabilistic programming framework was designed to extend the familiar Python and TensorFlow stack with tools for probabilistic modeling workloads, such as variational inference, and it has been benchmarked on several standard data sets.
One of the first important takeaways from this talk is the role probabilistic programming might play in Artificial Intelligence (AI) safety, especially when AI is applied in fields such as medicine or finance.
The talk then explains the modern revival of Bayesian Deep Learning and its links to Monte Carlo estimators for variational inference in deep neural nets; from there, the connection to probabilistic programming and automated inference follows seamlessly.
Edward is essentially TensorFlow with added abstractions such as random variables and inference algorithms; being TensorFlow-based, it runs on a computational graph engine. Its philosophy follows an intuitive three-step workflow: build a model of the problem, infer the model's unknowns given data, and then criticize the model against that data, an operation Andrew Rowan describes as posterior predictive checks, which test whether the model can reproduce features of the observed data.
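The build/infer/criticize loop can be sketched without Edward itself. Below is a minimal stdlib-Python toy: a Beta-Bernoulli coin model (the data, the uniform prior, and the test statistic are all illustrative choices of mine, not from the talk), where conjugacy gives the posterior in closed form and the "criticism" step is a posterior predictive check.

```python
import random

random.seed(0)

# --- Build: Beta(1, 1) prior on a coin's bias, Bernoulli likelihood ---
data = [1, 1, 0, 1, 1, 0, 1, 1]  # observed flips (made-up data)

# --- Infer: Beta-Bernoulli conjugacy gives the exact posterior ---
heads, tails = sum(data), len(data) - sum(data)
post_a, post_b = 1 + heads, 1 + tails  # posterior is Beta(post_a, post_b)

# --- Criticize: posterior predictive check on the number of heads ---
def sample_posterior_predictive():
    theta = random.betavariate(post_a, post_b)         # bias drawn from the posterior
    return sum(random.random() < theta for _ in data)  # replicated data set, same size

replicated = [sample_posterior_predictive() for _ in range(5000)]
# Compare the observed test statistic against its replicated distribution:
# an extreme p-value would signal that the model fails to reproduce the data.
ppc_p = sum(r >= heads for r in replicated) / len(replicated)
print(f"posterior Beta({post_a}, {post_b}), PPC p-value ~ {ppc_p:.2f}")
```

In Edward the same three steps are expressed with random-variable objects and an inference class, but the logic (prior plus likelihood, posterior, replicated-data check) is the same.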
Edward further allows the implementation of scalable black-box variational inference techniques: the required gradients are estimated through Monte Carlo sampling, at the cost of noisy gradient estimates, and Edward reduces this variance by automating the whole process.
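To see why those Monte Carlo gradients are noisy and how variance can be reduced, here is a small stdlib-Python comparison (a toy of my own, not code from the talk): the gradient of E[x²] with respect to the mean of a unit-variance Gaussian, estimated with the generic score-function (REINFORCE) estimator versus the reparameterized estimator that frameworks like Edward exploit. The true gradient is 2·mu.

```python
import random
import statistics

random.seed(0)
mu = 1.5
N = 20000  # number of Monte Carlo samples

score, reparam = [], []
for _ in range(N):
    eps = random.gauss(0.0, 1.0)
    x = mu + eps  # a sample from N(mu, 1)
    # Score-function estimator: f(x) * d/dmu log q(x) = x^2 * (x - mu)
    score.append(x * x * (x - mu))
    # Reparameterized estimator: x = mu + eps, so d/dmu x^2 = 2 * (mu + eps)
    reparam.append(2.0 * x)

print("score-function : mean %.3f, variance %.1f"
      % (statistics.mean(score), statistics.variance(score)))
print("reparameterized: mean %.3f, variance %.1f"
      % (statistics.mean(reparam), statistics.variance(reparam)))
```

Both estimators are unbiased (both means land near 3.0), but the reparameterized one has dramatically lower variance, which is exactly why low-variance gradient estimation matters for making black-box variational inference practical.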
But how is Dropout in non-probabilistic neural nets connected to variational inference in Bayesian neural nets? By reparametrizing and factorizing the variational distribution over a network's weights into a non-Gaussian (Bernoulli) sampling distribution, the resulting variational objective turns out to match a Dropout training objective. Indeed, the equivalence between the Dropout objective and an approximate ELBO (via Monte Carlo estimators) had been demonstrated earlier by Gal and Ghahramani; Dropout can therefore be seen as a kind of Bayesian approximation.
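The practical upshot is MC Dropout: keep dropout switched on at test time and average several stochastic forward passes, so the spread of the predictions serves as an uncertainty estimate. A minimal stdlib-Python sketch, with a tiny fixed one-hidden-layer network whose weights are made up for illustration (not trained, and not from the talk):

```python
import random
import statistics

random.seed(0)

W1 = [[0.5, -0.3], [0.8, 0.1], [-0.6, 0.9]]  # 3 hidden units, 2 inputs (made-up weights)
W2 = [0.7, -0.4, 0.5]                        # output weights
p_keep = 0.8                                 # dropout keep probability

def forward(x, sample_dropout):
    hidden = []
    for w in W1:
        h = max(0.0, w[0] * x[0] + w[1] * x[1])  # ReLU unit
        if sample_dropout:
            # Inverted dropout: randomly zero the unit, rescale to stay unbiased
            h = h * (1.0 if random.random() < p_keep else 0.0) / p_keep
        hidden.append(h)
    return sum(w_out * h for w_out, h in zip(W2, hidden))

# MC Dropout: keep dropout *on* at test time, average T stochastic passes
x = [1.0, 2.0]
T = 1000
samples = [forward(x, sample_dropout=True) for _ in range(T)]
mean = statistics.mean(samples)
std = statistics.stdev(samples)  # predictive uncertainty estimate
print(f"prediction ~ {mean:.3f} +/- {std:.3f}")
```

The mean of the stochastic passes approximates the usual deterministic prediction, while the standard deviation, which a single standard-dropout forward pass throws away, is the approximate-Bayesian uncertainty the talk is about.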
In experiments on CIFAR-10 and other datasets, MC Dropout proved competitive for convolutional neural networks, recurrent neural networks and reinforcement learning.
Andrew Rowan outlined the most promising future lines of research in this field as: better variational posterior approximations (normalizing flows in PyMC3, hierarchical variational models, etc.) and lower-variance ELBO estimators (less noisy Monte Carlo estimators).