March 14, 2016
by Francesco Gadaleta
Produced by: worldofpiggy.com
If you have no patience: deep learning is the result of training many layers of non-linear processing units for feature extraction and data transformation, e.g. from pixels, to edges, to shapes, to object classification, to scene description and captioning.
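To make the idea of stacked non-linear processing units concrete, here is a minimal NumPy sketch of a forward pass that turns raw pixels into progressively more abstract representations. The layer sizes, the tanh non-linearity and the random (untrained) weights are illustrative assumptions, not any particular published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, n_out):
    """One non-linear processing unit: affine map followed by tanh.
    Weights are random (untrained) here; training would adjust them."""
    w = rng.normal(scale=0.1, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return np.tanh(x @ w + b)

# A toy "image" of 28x28 pixels, flattened to a vector.
pixels = rng.random(28 * 28)

# Each successive layer transforms the previous representation:
# pixels -> low-level features -> mid-level features -> class scores.
h1 = layer(pixels, 128)   # e.g. edge-like features
h2 = layer(h1, 64)        # e.g. shape-like features
scores = layer(h2, 10)    # e.g. scores for 10 object classes

print(scores.shape)  # (10,)
```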
As old as the 80s! Then why was this approach abandoned for a while? The answer lies in the lack of big training data and computing power in the early days. However, five major events occurred in the past, all of which contributed to defining and making possible what we today call deep learning.
Backpropagation. Yann LeCun et al. (1989) (* see Errata below) applied supervised backpropagation to such architectures. Weng et al. (1992) published the Cresceptron, a convolutional neural network for 3-D object recognition in images of cluttered scenes and for segmentation of such objects from those images.
Max-pooling (1992) appears to have been first proposed in the Cresceptron, to let the network tolerate small-to-large deformations in a hierarchical way while using convolution. Max-pooling helps, but does not guarantee, shift invariance at the pixel level.
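As a rough illustration of what max-pooling does, the sketch below (plain NumPy, not Cresceptron's original formulation) downsamples a feature map by taking the maximum over non-overlapping 2x2 windows; small shifts of the input often, but not always, leave the pooled output unchanged, which is the partial shift invariance mentioned above.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Max over non-overlapping 2x2 windows of a 2-D feature map."""
    h, w = feature_map.shape
    # Group pixels into 2x2 blocks and take the max inside each block.
    blocks = feature_map[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fm = np.array([[1, 2, 0, 1],
               [3, 4, 1, 0],
               [0, 1, 5, 6],
               [1, 0, 7, 8]], dtype=float)

print(max_pool_2x2(fm))
# [[4. 1.]
#  [1. 8.]]
```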
People tried to train deep networks and mostly failed. Why? Sepp Hochreiter's diploma thesis of 1991 formally identified the reason for this failure as the vanishing gradient problem, which affects many-layered feedforward networks as well as recurrent neural networks.
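A quick way to see the vanishing gradient problem is to multiply the per-layer derivative factors along a deep chain of sigmoid units: since the sigmoid's derivative never exceeds 0.25, the product shrinks exponentially with depth. The weights and depth in the sketch below are arbitrary, purely to illustrate the effect.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A chain of 20 sigmoid units, each feeding the next (one unit per layer).
depth = 20
weights = np.full(depth, 0.8)   # arbitrary illustrative weights

x = 0.5
grad = 1.0
for w in weights:
    z = w * x
    a = sigmoid(z)
    # Backprop through one layer multiplies the gradient by w * sigmoid'(z) <= 0.25 * w.
    grad *= w * a * (1.0 - a)
    x = a

print(f"gradient after {depth} layers: {grad:.3e}")
# prints a value around 1e-15 -- essentially no learning signal left for early layers
```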
Other methods use unsupervised pre-training to structure a neural network, making it first learn generally useful feature detectors. The network is then trained further by supervised back-propagation to classify labeled data. The deep model of Hinton et al. (2006) involves learning the distribution of a high-level representation using successive layers of binary or real-valued latent variables. It uses a restricted Boltzmann machine (Smolensky, 1986) to model each new layer of higher-level features. Each new layer guarantees an increase in the lower bound of the log likelihood of the data, thus improving the model, if trained properly.
Looks like a tongue-twister, right? Well, it basically says that if trained well, a network can generate data similar to the data it was fed from the training set. Once sufficiently many layers have been learned, the deep architecture may be used as a generative model by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations. Hinton reports that his models are effective feature extractors over high-dimensional, structured data.
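To give a rough idea of how one such layer is learned, the sketch below performs a single contrastive-divergence (CD-1) update of a restricted Boltzmann machine with binary units; the sizes, learning rate and toy batch are illustrative assumptions. Hinton's deep model stacks such layers, training each new RBM on the hidden activities produced by the previous one.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_visible, n_hidden, lr = 6, 3, 0.1
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

# A tiny batch of binary "data" vectors.
v0 = rng.integers(0, 2, size=(4, n_visible)).astype(float)

# --- one CD-1 update ---
# Positive phase: sample hidden units given the data.
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(p_h0.shape) < p_h0).astype(float)

# Negative phase: reconstruct visibles, then recompute hidden probabilities.
p_v1 = sigmoid(h0 @ W.T + b_v)
v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
p_h1 = sigmoid(v1 @ W + b_h)

# Move the weights towards the data statistics and away from the model's.
W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
b_v += lr * (v0 - v1).mean(axis=0)
b_h += lr * (p_h0 - p_h1).mean(axis=0)
```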
Recurrent networks are trained by unfolding them into very deep feedforward networks, where a new layer is created for each time step of an input sequence processed by the network. As errors propagate from layer to layer, they shrink exponentially with the number of layers, impeding the tuning of neuron weights which is based on those errors (LSTMs were proposed as a solution in 1997).
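A scalar toy version of this unfolding shows the effect directly: each unrolled time step multiplies the backpropagated error by the same recurrent factor, so after T steps the signal has shrunk (or exploded) exponentially. The weight, input and number of steps below are arbitrary illustrative values, not taken from any real model.

```python
import numpy as np

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2

w_rec = 0.5          # recurrent weight (arbitrary)
T = 30               # number of unrolled time steps
h, grad = 0.0, 1.0

for t in range(T):
    z = w_rec * h + 0.1          # constant input of 0.1 at each step
    h = np.tanh(z)
    # One unrolled "layer" contributes a factor w_rec * tanh'(z) to the gradient.
    grad *= w_rec * tanh_grad(z)

print(f"error signal after unrolling {T} steps: {grad:.3e}")
# on the order of 1e-10: the early time steps receive almost no error signal
```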
For supervised learning tasks, deep learning methods obviate feature engineering by translating the data into compact intermediate representations akin to principal components, and derive layered structures which remove redundancy in the representation. Moreover, PCA is a linear method that ignores non-linearities in the data. Many deep learning algorithms are also applied to unsupervised learning tasks, e.g. autoencoders. This is an important benefit because unlabeled data are usually more abundant than labeled data.
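As a minimal example of learning from unlabeled data, the sketch below trains a tiny autoencoder (an illustrative toy with made-up sizes and data, not any particular published model) purely to reconstruct its input: the hidden code is a compact, non-linear intermediate representation learned without any labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unlabeled data: 200 samples of 20-dimensional inputs.
X = rng.random((200, 20))

n_in, n_code, lr = 20, 5, 0.05
W1 = rng.normal(scale=0.1, size=(n_in, n_code))   # encoder weights
W2 = rng.normal(scale=0.1, size=(n_code, n_in))   # decoder weights

for epoch in range(200):
    code = np.tanh(X @ W1)        # compact intermediate representation
    X_hat = code @ W2             # linear reconstruction of the input
    err = X_hat - X               # reconstruction error (no labels used)

    # Gradients of the mean squared reconstruction error.
    gW2 = code.T @ err / len(X)
    gcode = err @ W2.T * (1.0 - code ** 2)
    gW1 = X.T @ gcode / len(X)

    W1 -= lr * gW1
    W2 -= lr * gW2

print("final reconstruction MSE:", float((err ** 2).mean()))
```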
Google Photos: search images by text.
Teradeep: a real-time object classifier, reminiscent of Terminator (1984).
Errata: as Paul P. suggested, Rumelhart, Hinton, and Williams should be credited with discovering back-propagation, not LeCun.
References (on backpropagation, via Hinton):
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.
Hinton, G. E. (1986). Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society, Amherst, Mass. Reprinted in Morris, R. G. M., editor, Parallel Distributed Processing: Implications for Psychology and Neurobiology, Oxford University Press, Oxford, UK.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, MIT Press, Cambridge, MA.
I provide high-quality training on statistics, data science, and computer programming, in order to facilitate setting up data analytics pipelines in the most optimized and cost-effective way. Feel free to schedule a meeting with me.
Designing algorithms is what I have been doing for more than the last 10 years. I can set up data analytics solutions for small and large enterprises and apply machine learning algorithms to detect patterns and trends, extract knowledge and support decisions in several commercial domains, from finance and healthcare to traffic and sales forecasting. I design and deploy algorithms and cloud-based software systems for production environments with high industrial standards. Feel free to schedule a meeting with me.
I can provide technical consulting on data science, deep learning, computing architectures and the most prominent software packages currently available. My skills in distributed infrastructure are essential to provide the insights you need to start your own data analytics pipeline in the most optimized and efficient environment. Feel free to schedule a meeting with me.