Ofir Shalev, CTO/CIO, CXA Group
There's been a definite buzz lately about deep learning. In March 2016, Google’s artificially intelligent Go-playing computer system, combining deep neural networks and tree search, defeated Korean Grandmaster Lee Sedol. It was the first time that a machine had topped one of the world’s best players at Go, a 2,500-year-old game that’s exponentially more complex than chess. A paper published in January 2017 introduces DeepStack, a new algorithm for imperfect-information settings, which is the first to beat professional no-limit Texas Hold’em poker players consistently.
During the past few years, deep learning algorithms have been successfully applied not just to games but also to text, image, sound, and motion. Since 2012, deep learning algorithms have dominated the ImageNet Large Scale Visual Recognition Challenge (ILSVR) competition. Contestants in this competition have a simple task: presented with an image, they need to decide whether or not it contains a particular type of object. For example, looking at Fig. 1, a contestant may determine that there is a Soccer player but no Rugby player. With over 1 million images spanning 1,000 different categories, this is one of the most challenging competitions out there.
In 2010 and 2011, the first two years of the ILSVR competition, classification error rates of 28 percent and 26 percent were considered good. In 2012, Team SuperVision (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton) came first with a 15 percent classification error rate, training a large, deep convolutional neural network (AlexNet) on raw RGB pixel values. Their approach stood in contrast to the common approach at that time, feature engineering. With feature engineering, a human applies domain knowledge of the data to create features that make machine learning algorithms work. As an example, looking at the two images on the far left in Fig. 2, the average colour of the image may be used to differentiate between the Football image, which is mostly green and red, and the Hockey image, which is mostly white, but the same feature may not distinguish well between the images to their right. Coming up with features is difficult, time-consuming, and requires expert knowledge.
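To make the average-colour example concrete, here is a minimal sketch of what such a hand-crafted feature might look like. The function names, the "green dominant" rule, and the tiny 2x2 pixel grids are all hypothetical and chosen purely for illustration; they are not taken from any ILSVR entry.

```python
def average_colour(image):
    """Return the mean (R, G, B) of an image given as rows of (r, g, b) tuples."""
    pixels = [pixel for row in image for pixel in row]
    count = len(pixels)
    return tuple(sum(pixel[channel] for pixel in pixels) / count
                 for channel in range(3))


def green_dominant(image):
    """Hand-crafted rule (an assumption): a grass pitch makes green the strongest channel."""
    red, green, blue = average_colour(image)
    return green > red and green > blue


# Hypothetical 2x2 "images": mostly green and red versus mostly white.
football = [[(30, 160, 40), (200, 30, 30)],
            [(40, 170, 50), (35, 150, 45)]]
hockey = [[(240, 240, 245), (250, 250, 250)],
          [(230, 235, 240), (20, 20, 20)]]

print(average_colour(football), green_dominant(football))
print(average_colour(hockey), green_dominant(hockey))
```

A rule like this separates a grass pitch from an ice rink, but it has nothing useful to say about two sports played on similar surfaces, which is why each new distinction tends to need another feature designed by hand.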
Since 2012, deep learning algorithms have dominated the ILSVR competition, with the 2016 classification error going as low as 3.0 percent, compared to 3.6 percent in 2015, and better than the estimated 5.1 percent human classification error on the same dataset.
The recent revival of interest in neural networks and deep learning has had a strong impact on other areas, such as speech recognition and Natural Language Processing (NLP), including word embedding, part-of-speech tagging, and sentiment analysis, where the state of the art in single-sentence positive/negative classification has improved from 80 percent to 85.4 percent.
What are Deep Neural Networks?
According to Wikipedia, “Neural networks or connectionist systems are a computational approach used in computer science and other research disciplines, based on an extensive collection of neural units (artificial neurones), loosely mimicking the way a biological brain solves problems with large clusters of biological neurones connected by axons. Each neural unit is connected with many others, and links can be enforcing or inhibitory in their effect on the activation state of connected neural units.”
Fig. 3 shows an example of a deep neural network, that is, a neural network with more than one hidden layer of neurones. The first layer on the left is the input layer, followed by two hidden layers and an output layer on the right.
Back propagation is one of the common methods of training neural networks. An image, or some other data, is presented to the input layer of the neural network. During the forward pass, the network predicts a certain outcome, for example, the label Soccer player. The output of the network is then compared to the desired output using a loss function, and an error value is calculated for each of the neurones in the output layer. The error values are then propagated backward through the network, and the gradient of the loss with respect to the weights is used to update the weights in an attempt to minimise the loss function.
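The forward pass, loss, backward pass, and weight update described above can be sketched on a deliberately tiny network. Everything in this sketch is an assumption made for illustration: one neurone per layer, two tanh hidden layers, a linear output, a squared-error loss, and plain gradient descent with a made-up learning rate and starting weights. Real networks have many neurones per layer and are normally built with a dedicated library.

```python
import math


def forward(x, w):
    """Forward pass: input -> hidden 1 -> hidden 2 -> output."""
    h1 = math.tanh(w[0] * x)    # first hidden layer
    h2 = math.tanh(w[1] * h1)   # second hidden layer
    y = w[2] * h2               # linear output layer
    return h1, h2, y


def loss_fn(y, target):
    """Squared-error loss between the prediction and the desired output."""
    return 0.5 * (y - target) ** 2


def backward(x, target, w):
    """Propagate the output error backward and return dLoss/dWeight for each weight."""
    h1, h2, y = forward(x, w)
    d_y = y - target                      # error at the output neurone
    grad_w3 = d_y * h2
    d_z2 = d_y * w[2] * (1 - h2 ** 2)     # error reaching hidden layer 2 (tanh derivative)
    grad_w2 = d_z2 * h1
    d_z1 = d_z2 * w[1] * (1 - h1 ** 2)    # error reaching hidden layer 1
    grad_w1 = d_z1 * x
    return [grad_w1, grad_w2, grad_w3]


def train(x, target, w, learning_rate=0.1, steps=1000):
    """Repeat forward pass, backward pass, and weight update; record the loss per step."""
    w = list(w)
    losses = []
    for _ in range(steps):
        _, _, y = forward(x, w)
        losses.append(loss_fn(y, target))
        grads = backward(x, target, w)
        w = [wi - learning_rate * gi for wi, gi in zip(w, grads)]
    return w, losses


weights, losses = train(x=1.0, target=0.5, w=[0.5, 0.5, 0.5])
print("first loss:", losses[0], "last loss:", losses[-1])
```

Each weight is nudged in the direction that reduces the loss, so the recorded loss shrinks over the training steps. The same chain-rule bookkeeping applies to layers with many neurones, with the scalar products replaced by matrix operations.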
The detailed process of building and training deep neural networks is beyond the scope of this article.
The results of applying deep learning algorithms to text, image, sound, and video are truly remarkable, transforming applications such as self-driving cars, web search, e-commerce, automatic machine translation, object classification in photographs, speech and face recognition, chatbots, and medical applications such as detection of diabetic eye disease and tumour tissue image classification. It is important to remember that deep learning is not the only approach to machine learning being pursued today and that deep learning is not the solution to machine intelligence in general. Achieving an extremely low error rate in the ILSVR classification challenge doesn’t mean that we have solved computer vision, as human vision can perceive better and faster in complex situations. Deep learning algorithms also require huge amounts of training data and massive amounts of computational power, neither of which may be available to most organisations.
There’s still a lot of work left to do!
Founded in 2013, CXA is a Singapore-based private workplace exchange transforming organisations’ current healthcare spend into a benefits and wellness program to empower employees to become healthier.