Learning Algorithms for People: Reinforcement Learning

In a previous post I described some simple ways in which learning could be performed algorithmically, replicating the basic process of supervised learning in machines. Of course it’s no secret that repetition is important for learning many things, but the exact way in which we repeat a training set or trial depends on what it is we are trying to learn or teach. I wrote a series of posts on values and reinforcement learning in machines and animals, so here I will describe a process for applying reinforcement learning to developing learning strategies. Perhaps more importantly though, I will discuss a significant notion in machine learning and its relationship to the psychological results of conditioning — introducing the value function.

Let’s start with some pseudo-code for a human reinforcement learning algorithm that might be representative of certain styles of learning:

given learning topic, S, a set of assessments, T, and study plan, Q
    for each assessment, t in T
        study learning topic, S, using study plan, Q
        answer test questions in t
        record grade feedback, r
        from feedback, r, update study plan, Q 

This algorithm is vague on the details, but this approach of updating a study plan fits into the common system of education: one where people are given material to learn and receive grades as feedback on their responses to assignments and tests.

Let’s come back to the basic definitions of computer-based reinforcement learning. The typical components are: the state-space, S, which describes the environment; the action-space, A, which contains the options for attempting to transition to different states; and the value function, Q, which is used to pick an action in any given state. The reinforcement feedback, reward and punishment, can be treated as coming from the environment, and is used to update the value function.

The algorithm above doesn’t easily fit into this structure. Nevertheless, we could consider the combination of the learning topic, S, and the grade-system as the environment. Each assessment, t, is a trial of the study plan, Q, with grades, r, providing an evaluation of the effectiveness of study. The study plan is closely related to the value function — it directs the choices of how to traverse the state-space (learning topic).

This isn’t a perfect analogy, but it leads us to the point of reinforcement feedback: to adjust what is perceived as valuable. We could try to use a reinforcement learning algorithm whenever we are trying to search for the best solution for a routine or a skill, and all we receive as feedback is a success-failure measurement.
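
For readers curious about what the machine version of this loop looks like, here is a minimal tabular Q-learning sketch in Python. The toy line-world environment, the reward values and the learning parameters are assumptions chosen purely for illustration, not anything from this post:

import random

# Minimal tabular Q-learning sketch (toy environment, illustrative only).
# States: positions 0..4 on a line; actions: move left or right; reward at state 4.
states = range(5)
actions = [-1, +1]
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in states for a in actions}  # the value function

def step(state, action):
    """Environment: returns the next state and the reward."""
    next_state = min(max(state + action, 0), 4)
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward

for episode in range(200):
    state = 0
    while state != 4:
        # epsilon-greedy action selection using the value function Q
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        # reinforcement feedback from the environment updates the value function
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

The point of the sketch is only to show the components named above in one place: the state-space, the action-space, the value function, and reward feedback used to update that value function.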

Coming back to the example algorithm provided, though, considering grades as the only reinforcement feedback in education is a terrible over-simplification. For example, consider the case of a school in a low socio-economic area where getting a good grade will actually get you punished by your peers. Or consider the case of a child that is given great praise for being “smart”. In a related situation, consider the case of praising a young girl for “looking pretty”. How is the perception of value, particularly self-worth, affected by this praise?

Children, and people in general, feel that the acceptance and approval of their peers is a reward. Praise is a form of approval, and criticism is a form of punishment, and each is informative of what should be valued. If children are punished for looking smart, they will probably value the acceptance of their peers over learning. If children are praised for being smart, they may end up simply avoiding anything that makes them look unintelligent. If children are praised for looking pretty, they may end up valuing looking pretty over being interesting and “good” people.

A solution could be to try to be more discerning about what we praise and criticise. The article linked above makes a good point about praising children for “working hard” rather than “being smart”. Children who feel that their effort is valued are more likely to try hard, even in the face of failure. Children who feel that praise will only come when they are successful, will try to avoid failure. Trying to give children self-esteem by praising them for being smart or pretty is, in fact, making their self-esteem contingent on that very thing they are being praised for.

It may seem manipulative, but praise and criticism are ways we can reward and punish people. Most people will adjust their “value-function”, their perception of what is valuable, and, as a result, they will adjust their actions to try to attain further praise, or avoid further punishment. What we praise and criticise ourselves for is a reflection of what we value in ourselves. And our self-praise and self-criticism can also be used to influence our values and self-esteem, and hence our actions.

A View on the Nature of Consciousness

In the process of communicating with other bloggers that are interested in the future of humanity and philosophy of mind, some upcoming discussions have been planned on a number of related topics. The first topic is: the nature of consciousness and its relationship to the prospect of artificial intelligence. In preparation for the discussion, I’ve summarised my position on this topic here. I’ve spent some time reading and thinking about the nature of consciousness, so I believe that my position has some firm evidence and logical reasoning supporting it. If requested, I’ll follow up with post(s) and comment(s) providing more detailed descriptions of the steps of reasoning and evidence. As always, I’m open to considering compelling evidence and arguments that refute the points below.

The Nature of Consciousness

1. Consciousness can be defined by its ‘nature’. We seem to define consciousness by trying to describe our experience of it and how others show signs of consciousness or a lack of it. If we can successfully explain how consciousness occurs — its nature — we could then use that explanation as a definition. Nevertheless, for now we might use a broad dictionary definition of consciousness, such as “the awareness of internal (mental) and external (sensory) states”.

2. Consciousness is a spectrum. Something may be considered minimally conscious if it is only “aware” of external-sensory states. Higher consciousness includes awareness of internal-mental states, such as conscious thoughts and access to memories. Few animals, even insects, appear to be without “awareness” of memory (particularly spatial memory). As we examine animals of increasing intelligence we typically see a growing set of perceptual and cognitive abilities — growing complexity in the range of awareness — though with varying proficiencies at these abilities.

3. Biological consciousness is the result of physical processes in the brain. Perception and cognition are the result of the activity of localised, though not independent, functional groups of neurons. We can observe a gross relationship between brain structure and cognitive and perceptual abilities by studying structural brain differences across animal species with varying perceptual and cognitive abilities. With modern technology, and lesion studies, we can observe precise correlations between brain structures and those cognitive and perceptual processes.

4. The brain is composed of causal structures. The functional groups of neurons in the entire body (peripheral and central nervous system) are interdependent causal systems — at any moment neurons operate according to definable rules, affected only by the past and present states of themselves, their neighbours and their surroundings.

5. Causal operation produces representation and meaning. Activity in groups of neurons has the power to abstractly represent information. Neural activity has “meaning” because it is the result of a chain of interactions that typically stretches back to some sensory interaction or memory. The meaning is most clear when neural activity represents external interactions with sensory neurons, e.g., a neuron in the primary visual cortex might encode an edge of a certain orientation in a particular part of the visual field. There is also evidence for the existence of “grandmother cells”: neurons, typically in the temporal lobe of the neocortex, that activate almost exclusively in response to a very specific concept, such as “Angelina Jolie” (both a picture of the actress and her name).

6. Consciousness is an emergent phenomenon.  Consciousness is (emerges from) the interaction and manipulation of representations, which in biological organisms is performed by the structure of the complete nervous system and developed neural activity. Qualia are representations of primitive sensory interactions and responses. For example, the interaction of light hitting the photosensitive cells in the retina ends up represented as the activation of neurons in the visual cortex. It is potentially possible to have damage to the visual cortex and lose conscious awareness of light (though sometimes still be capable of blindsight). Physiological responses can result from chemicals and neural activity and represent emotions.

7. Consciousness would emerge from any functionally equivalent physical system. Any system that produces the interaction and manipulation of representations will, as a result, produce some form of consciousness. From a functional perspective, a perfect model of neurons, synapses and ambient conditions is not likely to be required to produce representations and interactions. Nevertheless, even if a perfect model of the brain was necessary (down to the atom), the brain and its processes, however complex, function within the physical laws (most likely even classical physics). The principle of universal computation would allow its simulation (given a powerful enough computer) and this simulation would fulfil the criteria above for being conscious.

8. Strong artificial intelligence is possible and would be conscious. Human-like artificial intelligence requires the development of human-equivalent interdependent modules for sensory interaction and perceptual and cognitive processing that manipulate representations. This is theoretically possible in software. The internal representations this artificial intelligence would possess, with processes for interaction and manipulation, would generate qualia and human-like consciousness.

Philosophical Labels

I’ve spent some time reading into various positions within the philosophy of mind, but I’m still not entirely sure where these views fit. I think there are close connections to:

a) Physicalism: I don’t believe there is anything other than that which is describable by physics. That doesn’t mean, however, that there aren’t things that have yet to be adequately described by physics. For example, I’m not aware of an adequate scientific description of the relationship between causation, representation and interpretation — which I think are possibly the most important elements in consciousness. Nevertheless, scientific progress should continue to expand our understanding of the universe.

b) Reductionism and Emergentism: I think things are the sum of their parts (and interactions), but that reducing them to the simplest components is rarely the best way to understand a system. It is, at times, possible to make very accurate, and relatively simple, mathematical models to describe the properties and functionality of complex systems. Finding the right level of description is important in trying to understand the nature of consciousness — finding adequate models of neuronal representations and interactions.

c) Functionalism: These views seem to be consistent with functionalism — consciousness is dependent on the function of the underlying structure of the nervous system. Anything that reproduces the function of a nervous system would also reproduce the emergent property of consciousness. For example, I think the ‘China brain’ would be conscious and experience qualia — it is no more absurd than the neurons in our brain being physically isolated cells that communicate to give rise to the experience of qualia.

Changing Views

I’m open to changing these views in light of sufficiently compelling arguments and evidence. I have incomplete knowledge, and probably some erroneous beliefs; however, I have spent long enough studying artificial intelligence, neuroscience and philosophy to have some confidence in this answer to “What is the nature of consciousness and its relationship to the prospect of artificial intelligence?”.

Please feel free to raise questions or arguments against anything in this post. I’m here to learn, and I will respond to any reasonable comments.

Learning algorithms for people: Supervised learning

Access to education is widely considered a human right, and, as such, many people spend years at school learning. Many of these people also spend a lot of time practising sport, musical instruments and other hobbies and skills. But how exactly do people go about trying to learn? In machine learning, algorithms are clearly defined procedures for learning. Strangely, though the human brain is a machine of sorts, we don’t really consider experimenting with “algorithms” for our own learning. Perhaps we should.

Machine learning is typically divided into three paradigms: supervised learning, reinforcement learning, and unsupervised learning. These roughly translate into “learning with detailed feedback”, “learning with rewards and punishments” and “learning without any feedback” respectively. These types of learning have some close relationships to the learning that people and animals already do.

Many people already do supervised learning, although probably much more haphazardly than a machine algorithm might dictate. Supervised learning is good when the answers are available. So when practising for a quiz, or practising a motor skill, we make attempts, then try to adjust based on the error we observe. A basic algorithm for people to perform supervised learning to memorise discrete facts could be written as:

given quiz questions, Q, correct answers, A, and stopping criteria, S
    do
        for each quiz question q in Q
            record predicted answer p
        for each predicted answer p
            compare p with correct answer, a
            record error, e
    while stopping criteria, S, are not met

Anyone could use this procedure for rote memorisation of facts, using a certain percentage of correct answers and a set time as the stopping criteria. However, this algorithm supposes the existence of questions associated with the facts to memorise. Memorisation can be difficult without a context to prompt recall, and questions can also help link these facts together. This is much like the common observation that recall improves when knowledge is presented in visual, aural and tactile formats; the machine learning equivalent would be adding extra input dimensions to associate with the output. Supervised learning also makes sense for motor skills, and it is roughly what many people already do when practising for sports or musical instruments.
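
For the curious, a minimal runnable version of the quiz algorithm above might look like the following Python sketch. The example questions, the accuracy threshold and the step of re-reading the correct answer after a miss are illustrative assumptions rather than anything prescribed above:

# A minimal sketch of the rote-memorisation drill described above.
# The quiz questions and stopping criteria are illustrative assumptions.
quiz = {
    "Capital of France?": "Paris",
    "7 x 8?": "56",
    "Chemical symbol for gold?": "Au",
}

target_accuracy = 1.0   # stopping criterion: all answers correct
max_rounds = 10         # stopping criterion: give up after this many passes

for round_number in range(max_rounds):
    errors = 0
    for question, correct_answer in quiz.items():
        predicted = input(question + " ")          # record predicted answer
        if predicted.strip() != correct_answer:    # compare with correct answer
            errors += 1
            print("Incorrect. The answer is: " + correct_answer)  # feedback to learn from
    accuracy = 1 - errors / len(quiz)
    if accuracy >= target_accuracy:                # stopping criteria met
        break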

It makes sense to use slightly different procedures for practising motor skills compared to doing quizzes. In addition to getting the desired outcome, gaining proficiency also requires practising the technique of the skill. Good outcomes can often be achieved with poor technique, and poor outcomes might occur with good technique. But to attain a high proficiency, technique is very important. To learn a skill well, it is necessary to pay attention not only to errors in the outcome, but also to errors in the technique. For this reason, it is good to first spend time focusing practise on the technique. Once the technique is correct, focus can then be more effectively directed toward achieving the desired outcome.

given correct skill technique, T, and stopping criteria, S
    do
        attempt skill
        compare attempt technique to correct technique, T
        note required adjustments to technique
    while stopping criteria, S, are not met

given desired skill outcome, O, and stopping criteria, S
    do
        attempt skill
        compare attempt outcome to desired outcome, O
        note required adjustments to skill
    while stopping criteria, S, are not met

These basic, general algorithms spell out what many people already, perhaps obviously, do: learn through repeated phases of attempts, evaluations and adjustments. It’s possible to continue to describe current methods of teaching and learning as algorithms. And it’s also possible to search for optimal learning processes, characterising the learning algorithms we use, and the structure of education, to discover what is most effective. It may be that different people learn more effectively using different algorithms, or that some people could benefit from practising these algorithms to get better at learning. In future posts, I will try to write more about learning topics and skills, applications for different paradigms of learning, and algorithms describing systems of education.

I spy with my computer vision eye… Wally? Waldo?

Lately I’ve been devoting a bit of my attention to image processing and computer vision. It’s interesting to see so many varied processes applied to the problem over the last 50 or so years, especially when computer vision was once thought to be solvable in a single summer’s work. We humans perceive things with such apparent ease, it was probably thought that it would be a much simpler problem than playing chess. Now, after decades of focused attention, the attempts that appear most successful at image recognition of handwritten digits, street signs, toys, or even thousands of real-world images, are those that, in some way, model the networks of connections and processes of the brain.

You may have heard about the Google learning system that learned to recognise the faces of cats and people from YouTube videos. This is part of a revolution in artificial neural networks known as deep learning. Among deep learning architectures are ones that use many units that activate stochastically and clever learning rules (e.g., stochastic gradient descent and contrastive divergence). The networks can be trained to perform image classification to state-of-the-art levels of accuracy. Perhaps another interesting thing about these developments, a number of which have come from Geoffrey Hinton and his associates, is that some of them are “generative”. That is, while learning to classify images, these networks can be “turned around” or “unfolded” to create images, compress and cluster images, or perform image completion. This has obvious parallels to the human ability to imagine scenes, and the current understanding of the mammalian primary visual cortex that appears to essentially recreate images received at the retina.

A related type of artificial neural network that has had considerable success is the convolutional neural network. Convolution here is just a fancy term for sliding a small patch of network connections across the entire image to find the result at all locations. These networks also typically use many layers of neurons, and have achieved similar success in image recognition. These convolutional networks may model known processes in the visual cortices, such as simple cells that detect edges of certain orientations. Outlines in images are combined into complex sets of features and classified. An earlier learning system, known as the neocognitron, used layers of simple cell-like filters without the convolution.

The process of applying the same edge-detection filter over the whole image is similar to the parallel processing that occurs in the brain. Thousands of neurons functioning simultaneously are, of course, practically different from the sequential computation performed in the hardware of a computer; however, GPUs with many processor cores now allow parallel processing in machines as well. If rather than using direction-selective simple cells to detect edges we use image features (such as a loop in a handwritten digit, or the dark circle representing the wheel of a vehicle), we might say the convolution process is similar to scanning an image with our eyes.
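
To make the “sliding a small patch across the image” idea concrete, here is a minimal NumPy sketch of a 2D convolution with a vertical-edge filter. The filter values and the toy image are assumptions for the example, not taken from any particular network:

import numpy as np

# Minimal sketch: slide a small filter across an image (a 2D convolution).
# The 3x3 vertical-edge filter and the toy image are illustrative assumptions.
image = np.zeros((8, 8))
image[:, 4:] = 1.0                      # right half bright: a vertical edge at column 4

edge_filter = np.array([[-1, 0, 1],
                        [-1, 0, 1],
                        [-1, 0, 1]])    # responds to dark-to-bright transitions

h, w = edge_filter.shape
out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
for i in range(out.shape[0]):           # slide the patch over every location
    for j in range(out.shape[1]):
        out[i, j] = np.sum(image[i:i+h, j:j+w] * edge_filter)

print(out)  # strong responses only near the edge; zero elsewhere

A convolutional network learns the filter values rather than having them hand-designed, and stacks many such filters in layers, but the sliding operation itself is just this.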

Even when we humans are searching for something hidden in a scene, such as our friend Wally (or Waldo), our attention typically centres on one thing at a time. Scanning large, detailed images for Wally often takes us a long time. A computer trained to find Wally in an image using a convolutional network could methodically scan the image a lot faster than us with current hardware. It mightn’t be hard to get a computer to beat us in this challenge for many Where’s Wally images with biologically-inspired image recognition systems (rather than more common, but brittle, image processing techniques).

Even though I think these advances are great, it seems there are things missing from what we are trying to do with these computer vision systems and how we’re trying to train them. We are still throwing information at these learning systems as the disembodied number-crunching machines they are. Consider how our visual perception abilities allow us to recognise objects in images with little regard for scale, translation, shear, rotation or even colour and illumination; these things are major hurdles for computer vision systems, but for us they just provide more information about the scene. These are things we learn to do. Most of the focus of computer vision seems to relate to the concept of the “what pathway”, rather than the “how pathway”, of the two-streams hypothesis of vision processing in the brain. Maybe researchers could start looking at ways of making these deep networks take that next step. Though extracting information from a scene, such as locating sources of illumination or the motion of objects relative to the camera, might be hard to fit into the current trend of trying to perform unsupervised learning from enormous amounts of unlabelled data.

I think there may be significant advantages to treating the learning system as embodied, and making the real-world property of object permanence something the learning system can latch onto. It’s certainly something that provides a great deal of leverage in our own learning about objects and how our interactions influence them. It is worth mentioning that machine learning practitioners already commonly create numerous new modified training images from their given set and see measurable improvements. This is similar to what happens when a person or animal is exposed to an object and given the chance to view it from multiple angles and under different lighting conditions. Having a series of contiguous view-points is likely to more easily allow parts of our brain to learn to compensate for the different perspectives that scale, shear, rotate and translate the view of objects. It may even be important to learning to predict and recreate different perspectives in our imagination.
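
As a small sketch of the kind of training-set modification mentioned above (often called data augmentation), here is an illustrative Python example; the specific transformations, their ranges and the stand-in image are assumptions, not a recipe from any particular system:

import numpy as np

def augment(image, rng):
    """Create a modified copy of a training image (illustrative transformations only)."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)                         # horizontal flip
    out = np.roll(out, rng.integers(-2, 3), axis=1)  # small horizontal translation
    out = out * rng.uniform(0.8, 1.2)                # brightness / illumination change
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = np.random.default_rng(1).random((28, 28))    # stand-in for a real training image
augmented_set = [augment(image, rng) for _ in range(10)]  # many views of the same object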

Consciousness’s abode: Subjugate the substrate

Philosophy of mind has some interesting implications for artificial intelligence, summed up by the question: can a machine ever be “conscious”? I’ve written about this in earlier posts, but recently I’ve come across an argument that I hadn’t considered very deeply: that substrate matters. There are lots of ways to approach this issue, but if the mind and consciousness are a product of the brain, then surely the neuroscience perspective is a good place to start.

Investigations show that the activity of different brain regions occurs predictably during different cognitive and perceptual activities. Also there are predictable deficits that occur in people when these parts of the brain are damaged. This suggests that a mind and consciousness are a product of the matter and energy that makes up the brain. If you can tell me how classical Cartesian dualism can account for that evidence, I’m all ears. 🙂

I will proceed under the assumption that there isn’t an immaterial soul that is the source of our consciousness and directs our actions. But if we’re working under the main premise of physicalism, we still have at least one interesting phenomenon to explain: “qualia”. How does something as abstract and seemingly immaterial as our meaningful conscious experience arise from our physical brain? That question isn’t going to get answered in this post (but an attempt is going to emerge in this blog).

In terms of conscious machines, we’re still confronted with the question of whether a machine is capable of a similar sort of conscious experience that we biological organisms are. Does the hardware matter? I read and commented on a blog post on Rationally Speaking, after reading a description of the belief that the “substrate” is crucial for consciousness. The substrate argument goes that even though a simulation of a neuron might behave the same as a biological neuron, since it is just a simulation, it doesn’t interact with the physical world to produce the same effect. Ergo no consciousness. Tell me if I’ve set up a straw-man here.

The author didn’t like me suggesting that we should consider the possibility of the simulation being hooked up to a machine that allowed it to perform the same physical interactions as the biological neuron (or perform photosynthesis in the original example). We’re not allowed to “sneak in” the substrate I’m told. 🙂 I disagree, I think it is perfectly legitimate to have this interaction in our thought experiment. And isn’t that what computers already do when they play sound or show images or accept keyboard input? Computers simulate sound and emission of light and interact with the physical world. It’s restricted I admit, but as technology improves there is no reason to think that simulations couldn’t be connected to machines that allow them to interact with the world as their physical equivalent would.

Other comments by readers of that Rationally Speaking post mentioned interesting points: the China brain (or nation) thought experiment, and what David Chalmers calls the “principle of organisational invariance”. The question raised by the China brain and discussed by Chalmers is: if we create the same functional organisation of people as neurons in a human brain (i.e., people communicating as though they were the neurons with the same connections) would that system be conscious? If we accept that the system behaves in the exact same way as the brain, that neurons spiking is a sufficient level of detail to capture consciousness, and the principle of organisational invariance, the China brain should probably be considered conscious. Most people probably find that unintuitive.

If we accept that the Chinese people simulating a human brain also create a consciousness, we have a difficult question to answer; some might even call it a “hard problem”. 🙂 If consciousness is not dependent on substrate, it seems that consciousness might really be something that is abstract and immaterial. Therefore, we might be forced to choose between considering consciousness an illusion, or letting abstract things exist under our definition of physicalism. [Or look for alternative explanations and holes in the argument above. :)]

Simulating stimuli and moral values

This is the fifth post in a series about rewards and values. Previously the neurological origins for pleasure and reward in biological organisms were touched on, and the evolution of pleasure and the discovery of supernormal stimuli were mentioned. This post highlights some issues surrounding happiness and pleasure as ends to be sought.

First let’s refresh: we have evolved sensations and feelings including pleasure and happiness. These feelings are designed to enhance our survival in the world in which they were developed; the prehistoric world where survival was tenuous and selection favoured the “fittest”. This process of evolving first the base feelings of pleasure, wanting and desire, later extended to the warm social feelings of friendship, attachment and social contact, couldn’t account for the facility we now have for tricking these neural systems into strong, but ‘false’, positives. Things like drugs, pornography and Facebook can all deliver large doses of pleasure by directly stimulating the brain or simulating what had been evolved to be pleasurable experiences.

So where does that get us? In the various forms of utilitarianism we are usually trying to maximise some value. By my understanding, in plain utilitarianism the aim is to maximise happiness (sometimes described as increasing pleasure and reducing suffering), in hedonism the aim is sensual pleasure, and in preference utilitarianism it is the satisfaction of preferences. Pleasure may once have seemed like a good pursuit, but now that we have methods of creating pleasure at the push of a button, being hooked up to a machine hardly seems like a “good” way to live. And if we consider our life-long search for pleasure as an ineffective process of trying to find out how to push our biological buttons, pleasure may seem like a fairly poor yardstick for measuring “good”.

Happiness is also a mental state that people have varying degrees of success in attaining. Just because we haven’t had the same success in creating happiness “artificially” it doesn’t mean that it is a much better end to seek. Of course the difficulty of living with depression is undesirable, but if we all could become happy at the push of a button the feeling might lose some value. Even the more abstract idea of satisfying preferences might not get us much further, since many of our preferences are for avoiding suffering and attaining pleasure and happiness.

Of course in all this we might be forgetting (or ignoring the perspective) that pleasure and pain were evolved responses to inform us of how to survive. And here comes a leap:

Instead of valuing feelings we could value an important underlying result of the feelings: learning about ourselves and the world.

The general idea of valuing learning and experience might not be entirely new; Buddhism has long been about seeking enlightenment to relieve suffering and find happiness. However, considering learning and gaining experience as valuable ends, and the pleasure, pain or happiness they might arouse as additional aspects of those experiences, isn’t something I’ve seen as part of the discussion of moral values. Clearly there are causes of pleasure and suffering that cause debilitation or don’t result in any “useful” learning, e.g., drug abuse and bodily mutilation, so these should be avoided. But where would a system of ethics and morality based on valuing learning and experience take us?

This idea will be extended and fleshed out in much more detail in a new blog post series starting soon. To conclude this series on rewards and values, I’ll describe an interesting thought experiment for evaluating systems of value: what would an (essentially) omnipotent artificial intelligence do if maximising those values?

Evolving rewards and values

This is the fourth post in a series on rewards and values. Previous posts have discussed how rewards are internally generated in the brain and how machines might be able to learn autonomously the same way. But the problem still exists: how is the “reward function” created? In the biological case, evolution is a likely candidate. However, the increasing complexity of behaviour seen in humans seems to have led to an interesting flexibility in what are perceived as rewards and punishments. In robots a similar flexibility might be necessary.

Note: here the phrase “reward function” is used to describe the process of taking some input (e.g. the perceived environment) and calculating “reward” as output. A similar phrase and meaning is used for “value function”.

Let’s start with the question posed in the previous post: what comes first – values or rewards? The answer might be different depending on whether we are talking about machine or biological reinforcement learning. A robot or a simulated agent will usually be given a reward function by a designer. The agent will explore the environment and receive rewards and punishments, and it will learn a “value function”. So we could say that, to the agent, the rewards precede the values. At the very least, the rewards precede the learning of values. But the designer knew what the robot should be rewarded for – knew what result the agent should value. The designers had some valuable state in mind when they designed the reward function. To the designer, the value informs the reward.
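
To make the designer’s role concrete, here is a small Python sketch of a designer-specified reward function; the toy gridworld goal and the numeric values are assumptions chosen only for illustration:

# Sketch of a designer-specified reward function for a toy gridworld agent.
# The goal position and the penalty values are illustrative assumptions.
GOAL = (3, 3)

def reward_function(state, action, next_state):
    """The designer encodes what the agent should value before any learning happens."""
    if next_state == GOAL:
        return 10.0      # reaching the goal is what the designer values
    return -0.1          # small cost per step, discouraging aimless wandering

# The agent never sees this code directly; it only observes the numbers the function
# returns, and from those numbers it learns its own value function over states and actions.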

How about rewards in animals and humans? The reward centres of the brain are not designed in the sense that they have a designer. Instead they are evolved. What we individually value and desire as animals and humans is largely determined by what we feel is pleasurable and what is not pleasurable. This value is translated to learned drives to perform certain actions. The process of genetic recombination and mutation, a key component of evolution, produces different animal anatomies (including digestive systems) and pleasure responses to the environment. Animals that find pleasure in eating food that is readily available and compatible with the digestive system will have a much greater chance of survival than animals that only find pleasure in eating things that are rare or poisonous. Through natural selection pleasure could be expected to converge to what is valuable to the animal.

In answer to the question: what comes first – rewards or values? – it would seem that value comes first. Of course this definition of “value” is related to the objective fact of what the animal or agent must do to achieve its goals of survival or some given purpose. But what of humans? Evolutionary psychology and evolutionary neuroscience make a reasonable case that, along with brain size and structure, many human behaviours and underlying neural processes have been developed through natural selection. While such hypotheses are difficult to test, people seem to have evolved to feel pleasure from socialising – driving us to make social bonds and form groups. And people seem to have evolved feelings of social discomfort – displeasure from embarrassment and being rejected. Although the circumstances that caused the selection of social behaviours aren’t clear, many of our pleasure and displeasure responses seem to be able to be rationalised in terms of evolution.

An interesting aspect of the human pleasure response is the pleasure from achievements. It would certainly be normal for Olympic gold medallists to feel elation at winning. But even small or common victories, such as our first unaided steps as a child or managing to catch a ball, can elicit varying amounts of pleasure and satisfaction. Is this a pleasure due to the adulation and praise of onlookers that we have been wired to enjoy? Or is there a more fundamental case of success at any self-determined goal causing pleasure? This could be related to the loss of pleasure and enjoyment that is often associated with Parkinson’s disease. Areas of the brain related to inhibiting and coordinating movement, which deteriorate as part of Parkinson’s disease, are also strongly associated with reward and pleasure generation.

And we can bring this back to an autonomous robot that generates its own reward: a robot that has multiple purposes will need to have different ways of valuing the objects and states of the environment depending on what its current goal is. When crossing the road the robot needs to avoid cars; when cleaning a car the robot might even need to enter the car. This kind of flexibility in determining what the goal is, with reward feedback that determines on one level whether the goal has been reached, and on another whether the goal was as good as it was thought to be, could be an important process in the development of “intelligent” robots.
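
A minimal sketch of that flexibility might look like a goal-conditioned reward, where the same observation is valued differently depending on the robot’s current goal. The goals, the observation format and the numbers below are assumptions for illustration only:

# Sketch of a goal-conditioned internal reward: the same observation is valued
# differently depending on the current goal. Values are illustrative assumptions.
def internal_reward(observation, goal):
    if goal == "cross_road":
        # being near a car is dangerous while crossing the road
        return -10.0 if observation["near_car"] else 1.0
    if goal == "clean_car":
        # the same proximity is desirable when the job is to clean the car
        return 1.0 if observation["near_car"] else -0.1
    return 0.0

print(internal_reward({"near_car": True}, "cross_road"))  # -10.0
print(internal_reward({"near_car": True}, "clean_car"))   # 1.0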

However, before I conclude, let’s consider one fallout from the planetary dominance of humans: our selection pressures have nearly disappeared. We are, after all, the current dominant species on this planet. Perception-reward centres that did not evolve to deal with newly discovered and human-manufactured stimuli aren’t likely to be strongly selected against. And through our ingenuity we have found powerful ways to “game” our evolved pleasure centres – finding and manufacturing super-normal stimuli.

Dan Dennett: Cute, sweet, sexy, funny (TED talk video available on YouTube).

The video linked above features Dan Dennett describing how evolution has influenced our feelings of what is “cute, sweet, sexy and funny”. The result is possibly the opposite of what we intuitively feel and think: there is nothing inherently cute, sweet, sexy or funny; these sensations and feelings evolved in us to find value in our surroundings and each other. We have evolved to find babies and young animals cute, we have evolved to find food that is high in energy tasty, and we have evolved to find healthy members of the opposite sex attractive. Funny wasn’t explained as clearly – Dan Dennett described a hypothesis that it is related to making boring or unpleasant jobs bearable. I would speculate that humour might also have been selected for making people more socially attractive, or making them, at the very least, more bearable. 🙂

Building on this understanding of pleasure and other feelings being evolved, the topic of the next post in this series will be super-normal stimuli and how they influence our views on human values and ethics. Let’s begin the adventure into the moral minefield.

Rewards: External or internal?

This is the first post in a series on rewards and values.

The reward that would be most familiar is probably food. We often use treats to train animals, and eating is pleasurable for most people. These rewards are clearly an external thing, aren’t they? This idea is, in some ways, echoed in machine reinforcement learning, as shown in a diagram (pictured below) from the introductory book by Richard Sutton and Andrew Barto. Intuitively this makes sense. We get something from the environment that is pleasurable; the reward feels as though its origin is external. But we can, in the case of animals and people, trace reward and pleasure to internal brain locations and processes. And machines can potentially benefit from this reworking of reinforcement learning, to make explicit that the reward comes from within the agent.

Agent-environment interaction in reinforcement learning

Figure 3.1 from Sutton and Barto, 1998, Reinforcement Learning: An Introduction, MIT Press. Online: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node28.html

So let’s trace the sensations of a food “reward”. The animal smells and tastes the food; the olfactory and gustatory receptors transmit a signal to the brain that then identifies the odour and taste. A process is performed within the brain deciding whether the food was pleasurable or unpleasant. This response is learned and causes impulses to seek or avoid the food in future.

Nothing in the food is inherently rewarding. It is the brain that processes the sensations of the food and the brain that produces reward chemicals. For a more detailed article on pleasure and reward in the brain see Berridge and Kringelbach (2008). Choosing the right food when training animals is a process of finding something that their brain responds to as a reward. Once a good treat has been found the animal knows what it wants, and training it is the process of teaching the animal what to do to get the rewarding treat.

Agent-environment interaction with internal reward.

Modified Figure 3.1 from Sutton and Barto, 1998, Reinforcement Learning: An Introduction, MIT Press. Online: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node28.html

We can consider standard implementations of reinforcement learning in machines as a similar process: the machine searches the environment (or “state-space“) and if it performs the right actions to get to the right state it gets a reward. Differences are notable: the agent might not know anything about the environment, how actions move it from one state to another, or what state gives the reward. Animals, on the other hand, come with some knowledge of the environment and themselves, they have some sense of causality and sequences of events, and animals very quickly recognise treats that cause reward.

Another subtle difference is that the machine doesn’t usually know what the target or objective is; the agent performs a blind search. Reinforcement learning works by simulating the agent exploring some (usually simplified) environment until it finds a reward, and then calculating increases in the value of states and actions that preceded the reward. Computers can crunch the numbers in simulation, but complexity of the environment and large numbers of available actions are the enemy. Each extra state “dimension” and action adds an exponential increase in the amount of required computation (see “curse of dimensionality“). This sounds different from an animal, which can form very simple associations with objects or actions as the targets of rewards. More on this later!
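
To put a rough number on that exponential growth, here is a small illustrative calculation; the choice of ten values per dimension is an assumption for the example:

# Illustrative only: a state table where each added dimension has 10 possible values.
values_per_dimension = 10
for dimensions in [1, 2, 4, 8]:
    print(dimensions, "dimensions ->", values_per_dimension ** dimensions, "states")
# 1 -> 10, 2 -> 100, 4 -> 10000, 8 -> 100000000 states to visit and store values for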

An extension of the machine reinforcement learning problem is the case where the agent doesn’t know what environment state it is in. Rather than getting the environment state, the agent only makes “observations” in this model, known as a “partially observable Markov decision process” or POMDP. From these observations the agent can infer the state and predict the action that should be taken, but the agent typically has reduced certainty. Nevertheless, the rewards it receives are still a function of the true state and action. The agent is not generating rewards from its observations, but receiving them from some genie (the trainer or experimenter) that knows the state and gives it the reward. This disconnect between what the agent actually senses (the observations) and the rewards it receives is relevant for autonomous agents, including robots.

These implementations of reinforcement learning mimic the training of an animal with treats, where the whole animal is an agent and the trainer is part of the environment that gives rewards. But it doesn’t seem a good model of reward originating in internal brain processes. Without sensing the food, the brain wouldn’t know that it had just been rewarded; it could be argued that the brain (and hence the agent) wasn’t rewarded. How much uncertainty in sensations can there be before the brain doesn’t recognise that it has been rewarded? In a computer, where the environment and the agent are all simulated, the distinction between reward coming from the environment or self-generated in the agent may not matter. But in an autonomous robot, where no trainer is giving it rewards, it must sense the environment and decide only from its own observations whether it should be rewarded.
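
A minimal sketch of the distinction: an externally supplied reward can be a function of the true state, while an internally generated reward can only be a function of the agent’s own, possibly noisy, observations. The sensor model and the numbers here are assumptions for illustration:

import random

# External vs internal reward: illustrative sketch only.
def external_reward(true_state):
    """A 'genie' (trainer or experimenter) rewards based on the true state."""
    return 1.0 if true_state == "food_present" else 0.0

def observe(true_state):
    """Noisy sensor: the agent only gets an imperfect observation of the state."""
    if random.random() < 0.9:
        return true_state
    return "food_absent" if true_state == "food_present" else "food_present"

def internal_reward(observation):
    """An autonomous agent must decide from its own observation whether to self-reward."""
    return 1.0 if observation == "food_present" else 0.0

true_state = "food_present"
print(external_reward(true_state))            # always 1.0: the genie knows the state
print(internal_reward(observe(true_state)))   # usually 1.0, sometimes 0.0: sensing can fail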

The implementation of reinforcement learning for autonomous agents and robots will be a topic of a later post. Next post, however, I will cover the problem of machines “observing” the world. How do we represent the world as “states” and the robot’s capabilities as “actions”? I will discuss how animals appear to solve the problem and recent advances in reinforcement learning.

Rewards and values: Introduction

Reward functions are a fundamental part of reinforcement learning for machines. They are based partly on Pavlovian, or classical, conditioning, exemplified by repeatedly pairing the ringing of a bell (conditioned stimulus) with the presentation of food (unconditioned stimulus) to a dog, until the ringing of the bell alone causes the dog to salivate (conditioned response).

More recently, developments in reinforcement learning, particularly temporal difference learning, have been compared to the function of reward learning parts of the brain. Pathologies of these reward producing parts of the brain, particularly Parkinson’s disease and Huntington’s disease, show the importance of the reward neurotransmitter dopamine in brain functions for controlling movement and impulses, as well as seeking pleasure.
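
For readers who want the comparison in concrete terms, here is a sketch of a temporal difference (TD) value update, whose prediction-error term is what is often compared to dopamine signalling; the states, pairings and parameters are assumptions chosen to echo the bell-and-food example, not a model from any particular paper:

# Sketch of a TD(0) value update. The prediction error (delta) is the quantity
# often compared to phasic dopamine signals. Numbers are illustrative assumptions.
V = {"bell": 0.0, "food": 0.0, "end": 0.0}   # learned values of states/stimuli
alpha, gamma = 0.1, 0.9                      # learning rate and discount factor

def td_update(state, reward, next_state):
    delta = reward + gamma * V[next_state] - V[state]   # reward prediction error
    V[state] += alpha * delta
    return delta

# Repeated pairing: the bell is followed by food, and eating the food is rewarding.
for trial in range(100):
    td_update("bell", 0.0, "food")   # bell is followed by food, no reward yet
    td_update("food", 1.0, "end")    # food delivers the reward; the trial ends

print(V)  # the bell acquires value because it predicts the rewarding food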

The purpose and function of these reward centres in the basal ganglia of the brain could have important implications for the way in which we apply reinforcement learning, especially in autonomous agents and robots. An understanding of the purpose of rewards, and their impact on the development of values in machines and people, also has some interesting philosophical implications that will be discussed in later posts.

This post introduces what may become a spiral of related posts on concepts of rewards and values covering:

Hopefully this narrowing of post topics gives me focus to write, and results in some interesting discourse on each of the themes of this blog. Suggestions and comments are welcome!

Artificial Intelligence: That’s the myth

The holy grail of artificial intelligence is the creation of artificial “general” intelligence. That is, an artificial intelligence that is capable of every sort of perceptual and cognitive function that humans are and more. But despite great optimism in the early days of artificial intelligence research, this has turned out to be a very difficult thing to create. It’s unlikely that there is a “silver bullet”, some single algorithm, that will solve the problem of artificial general intelligence. An important reason why is that the human brain, which gives us our intelligence, is actually a massive collection of layers and modules that perform specialised processes.

The squiggly stuff on the outside of the brain, the neocortex, does a lot of the perceptual processing. The neocortex sits on a lot of “white matter” that connects it to the inner brain structures. Different parts of the inner brain perform important processes like giving us emotions and pleasure, holding memories, and forming the centre of many “neural circuits”. Even though the structure of the neocortex is quite similar in all areas over the brain, it can be pretty neatly divided up into different sections that perform specific functions, such as: allowing us to see movement, recognising objects and faces, providing conscious control and planning of body movements, and modulating our impulses.

Until we see an example of an intelligent brain or machine that works differently, we should probably admit that replicating the processes, if not the structure, of the human brain is what is most likely to produce artificial general intelligence. I’ll be making posts that discuss specifically some different approaches to artificial intelligence. These posts will mostly be on the high-level concepts of the algorithms and their relationship to “intelligence”. Hopefully these posts will be generally accessible and still interesting to the technically minded. I think there is benefit in grasping important concepts that underlie human intelligence that could direct the creation of intelligent machines.

If people are still looking for that silver bullet algorithm, they should probably be looking for an algorithm that can either create, or be generally applied to, each of these brain processes. If you know of someone that has done this, or has rational grounds for disagreeing that this is necessary, let me know. Then I can stop spreading misinformation or incorrect opinion. 🙂

To conclude with some philosophical questions, if we are successful in reproducing a complete human intelligence (and mind) on a computer, some interesting issues are raised. Is an accurate simulation of a human mind on a computer that different from the “simulation” of the human mind in our brains? And how “artificial” is this computer-based intelligence?

These questions might seem nonsensical if you happen to think that human intelligence and the mind are unassailable by computer software and hardware. Or if you think that the mind is really the soul, separate from the body. First of all, if you believe the latter, I’m surprised you’re reading this (unless you were tricked by the title :)). In later posts, I hope to discuss some evidence against both of these points of view, and I welcome rational counter-arguments.