Learning algorithms for people: Supervised learning

Access to education is widely considered a human right, and, as such, many people spend years at school learning. Many of these people also spend a lot of time practising sport, musical instruments and other hobbies and skills. But how exactly do people go about trying to learn? In machine learning, algorithms are clearly defined procedures for learning. Strangely, though the human brain is a machine of sorts, we don’t really consider experimenting with “algorithms” for our own learning. Perhaps we should.

Machine learning is typically divided into three paradigms: supervised learning, reinforcement learning, and unsupervised learning. These roughly translate into “learning with detailed feedback”, “learning with rewards and punishments” and “learning without any feedback” respectively. These types of learning have some close relationships to the learning that people and animals already do.

Many people already do supervised learning, although probably much more haphazardly than a machine algorithm might dictate. Supervised learning  is good when the answers are available. So when practising for a quiz, or practising a motor skill, we make attempts, then try to adjust based on error we observe. A basic algorithm for people to perform supervised learning to memorise discrete facts could be written as:

given quiz questions, Q, correct answers, A, and stopping criteria, S
        for each quiz question q in Q
            record predicted answer p
        for each predicted answer p
            compare p with correct answer, a
            record error, e
    while stopping criteria, S, are not met

Anyone could use this procedure for rote memorisation of facts, using a certain percentage of correct answers and a set time as the stopping criteria. However, this algorithm supposes the existence of questions associated with the facts to memorise. Memorisation can be difficult without a context to prompt recall and questions can also help links these facts together. Much like it being common for people to find recall better when knowledge is presented visually, aurally and in tactile formats. The machine learning equivalent would be adding extra input dimensions to associate with the output. Supervised learning also makes sense for trying to learn motor skills, this is roughly what many people do already when practising skills for sports or musical instruments.

It makes sense to use slightly different procedures for practising motor skills compared to doing quizzes. In addition to getting the desired outcome, gaining proficiency also requires the practising the technique of the skill.  Good outcomes can often be achieved with poor technique, and poor outcomes might occur with good technique. But to attain a high proficiency, technique is very important. To learn a skill well, it is necessary to pay attention not only to errors in the outcome, but also errors in the technique. For this reason, it is good to first spend time focusing practise on the technique. Once the technique is correct, focus can then be more effectively directed toward achieving the desired outcome.

given correct skill technique, T, and stopping criteria, S
        attempt skill
        compare attempt technique to correct technique, T
        note required adjustments to technique
     while stopping criteria, S, not met

given desired skill outcome, O, and stopping criteria, S
         attempt skill
         compare attempt outcome to desired outcome, O
         note required adjustments to skill
     while stopping criteria, S, are not met

These basic, general algorithms spell out the obvious of what many people already do: learn through repetition of phases of attempts, evaluations and adjustments. It’s possible to continue to describe current methods of teaching and learning as algorithms. And it’s also possible to search for optimal learning processes, characterising the learning algorithms we use, and the structure of education, to discover what is most effective. It may be that different people learn more effectively using different algorithms, or that some people could benefit from practising these algorithms to get better at learning. In future, I will try to write some further posts about learning topics and skills, and applications for different paradigms of learning, as well as algorithms describing systems of education.

I spy with my computer vision eye… Wally? Waldo?

Lately I’ve been devoting a bit of my attention to image processing and computer vision. It’s interesting to see so many varied processes applied to the problem over the last 50 or so years, especially when computer vision was once thought to be solvable in a single summer’s work. We humans perceive things with such apparent ease, it was probably thought that it would be a much simpler problem than playing chess. Now, after decades of focused attention, the attempts that appear most successful at image recognition of handwritten digits, street signs, toys, or even thousands of real-world images, are those that, in some way, model the networks of connections and processes of the brain.

You may have heard about the Google learning system that learned to recognise the faces of cats and people from YouTube videos. This is part of a revolution in artificial neural networks known as deep learning. Among deep learning architectures are ones that use many units that activate stochastically and clever learning rules (e.g., stochastic gradient descent and contrastive divergence). The networks can be trained to perform image classification to state-of-the-art levels of accuracy. Perhaps another interesting thing about these developments, a number of which have come from Geoffrey Hinton and his associates, is that some of them are “generative”. That is, while learning to classify images, these networks can be “turned around” or “unfolded” to create images, compress and cluster images, or perform image completion. This has obvious parallels to the human ability to imagine scenes, and the current understanding of the mammalian primary visual cortex that appears to essentially recreate images received at the retina.

A related type of artificial neural network that has had considerable success is the convolutional neural network. Convolution here is just a fancy term for sliding a small patch of network connections across the entire image to find the result at all locations. These networks also typically uses many layers of neurons, and has achieved similar success in image recognition. These convolutional networks may model known processes in the visual cortices, such as simple cells that detect edges of certain orientations. Outlines in images are combined into complex sets of features and classified. An earlier learning system, known as the neocognitron, used layers of simple cell-like filters without the convolution.

The process of applying the same edge-detection filter over the whole image is similar to the parallel processing that occurs in the brain. Though the thousands of neurons functioning simultaneously has an obvious practical difference to the sequential computation performed in the hardware of a computer; however, GPUs with many processor cores now allow parallel processing in machines. If rather than using direction selective simple cells to detect edges we use image features (such as a loop in a handwritten digit, or the dark circle representing the wheel of a vehicle), we might say the convolution process is similar to scanning an image with our eyes.

Even when we humans are searching for something hidden in a scene, such as our friend Wally (or Waldo), our attention typically centres on one thing at a time. Scanning large, detailed images for Wally often takes us a long time. A computer trained to find Wally in an image using a convolutional network could methodically scan the image a lot faster than us with current hardware. It mightn’t be hard to get a computer to beat us in this challenge for many Where’s Wally images with biologically-inspired image recognition systems (rather than more common, but brittle, image processing techniques).

Even though I think these advances are great, it seems there are things missing from what we are trying to do with these computer vision systems and how we’re trying to train them. We are still throwing information at these learning systems as the disembodied number-crunching machines they are. Though consider how our visual perception abilities allow us to recognise objects in images with little regard for scale, translation, shear, rotation or even colour and illumination; these things are major hurdles for computer vision systems, but for us, they just provide us more information about the scene. These are things we learn to do. Most of the focus of computer vision seems to be related to concept of the “what pathway”, rather than the “how pathway”, of two-streams hypothesis of vision processing in the brain. Maybe researchers could start looking at ways of making these deep networks take that next step. Though extracting information from a scene, such as locating sources of illumination or the motion of objects relative to the camera, might be hard to fit into the current trends of trying to perform unsupervised learning from enormous amounts of unlabelled data.

I think there may be significant advantages to treating the learning system as embodied, and make the real-world property of object permanence something the learning system can latch onto. It’s certainly something that can provide a great deal of leverage in our own learning about objects and how our interactions influence them. It is worth mentioning that machine learning practitioners already commonly create new numerous modified training images from their given set and see measurable improvements. This is similar to what happens when a person or animal is exposed to an object and given the chance to view it from multiple angles and under different lighting conditions. Having a series of contiguous view-points is likely to more easily allow parts of our brain to learn to compensate for different perspectives that scale, shear, rotate and translate the view of objects. It may even be important to learning to predict and recreate different perspectives in our imagination.