Simulating stimuli and moral values

This is the fifth post in a series about rewards and values. Previous posts touched on the neurological origins of pleasure and reward in biological organisms, the evolution of pleasure, and the discovery of supernormal stimuli. This post highlights some issues surrounding happiness and pleasure as ends to be sought.

First let’s refresh: we have evolved sensations and feelings including pleasure and happiness. These feelings are designed to enhance our survival in the world in which they developed: the prehistoric world, where survival was tenuous and selection favoured the “fittest”. This process of evolving first the base feelings of pleasure, wanting and desire, which later extended to the warm social feelings of friendship, attachment and social contact, couldn’t account for the facility we now have for tricking these neural systems into strong, but ‘false’, positives. Drugs, pornography and Facebook can all deliver large doses of pleasure by directly stimulating the brain or by simulating what evolved to be pleasurable experiences.

So where does that get us? In the world of various forms of utilitarianism we are usually trying to maximise some value. By my understanding, in plain utilitarianism the aim is to maximise happiness (sometimes described as increasing pleasure and reducing suffering), in hedonism the aim is sensual pleasure, and in preference utilitarianism it is the satisfaction of preferences. Pleasure may once have seemed like a good pursuit, but now that we have methods of creating pleasure at the push of a button, being hooked up to a machine hardly seems like a “good” way to live. And if we consider our life-long search for pleasure as an ineffective process of trying to find out how to push our biological buttons, pleasure may seem like a fairly poor yardstick for measuring “good”.

Happiness is also a mental state that people have varying degrees of success in attaining. Just because we haven’t had the same success in creating happiness “artificially” doesn’t mean that it is a much better end to seek. Of course the difficulty of living with depression is undesirable, but if we could all become happy at the push of a button the feeling might lose some value. Even the more abstract idea of satisfying preferences might not get us much further, since many of our preferences are for avoiding suffering and attaining pleasure and happiness.

Of course, in all this we might be forgetting (or ignoring) the perspective that pleasure and pain evolved as responses to inform us of how to survive. And here comes a leap:

Instead of valuing feelings we could value an important underlying result of the feelings: learning about ourselves and the world.

The general idea of valuing learning and experience might not be entirely new; Buddhism has long been about seeking enlightenment to relieve suffering and find happiness. However, treating learning and gaining experience as valuable ends in themselves, and the pleasure, pain or happiness they might arouse as additional aspects of those experiences, isn’t something I’ve seen as part of the discussion of moral values. Clearly some sources of pleasure and suffering are debilitating or don’t result in any “useful” learning, e.g. drug abuse and bodily mutilation, so these should be avoided. But where would a system of ethics and morality based on valuing learning and experience take us?

This idea will be extended and fleshed out in much more detail in a new blog post series starting soon. To conclude this series on rewards and values, I’ll describe an interesting thought experiment for evaluating systems of value: what would an (essentially) omnipotent artificial intelligence do if it were maximising those values?

Evolving rewards and values

This is the fourth post in a series on rewards and values. Previous posts have discussed how rewards are internally generated in the brain and how machines might be able to learn autonomously in the same way. But the problem still exists: how is the “reward function” created? In the biological case, evolution is a likely candidate. However, the increasing complexity of behaviour seen in humans seems to have led to an interesting flexibility in what are perceived as rewards and punishments. In robots, a similar flexibility might be necessary.

Note: here the phrase “reward function” describes a process that takes some input (e.g. the perceived environment) and calculates a “reward” as output. The phrase “value function” is used with a similar meaning.
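As a concrete illustration, here is roughly what those two functions might look like in code. This is a minimal sketch of my own – the function names and the charging-dock example are invented, not taken from this series:

```python
# A minimal sketch: a hand-written reward function and a learned
# value function. All names here are hypothetical illustrations.

def reward_function(perceived_state: dict) -> float:
    """Maps some input (the perceived environment) to a scalar reward."""
    return 1.0 if perceived_state.get("at_charging_dock") else 0.0

def value_function(state, value_table: dict) -> float:
    """Maps a state to a learned estimate of long-term reward."""
    return value_table.get(state, 0.0)  # 0.0 for states not yet visited
```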

Let’s start with the question posed in the previous post: what comes first – values or rewards? The answer might be different depending on whether we are talking about machine or biological reinforcement learning. A robot or a simulated agent will usually be given a reward function by a designer. The agent will explore the environment and receive rewards and punishments, and it will learn a “value function”. So we could say that, to the agent, the rewards precede the values. At the very least, the rewards precede the learning of values. But the designer knew what the robot should be rewarded for – knew what result the agent should value. The designers had some valuable state in mind when they designed the reward function. To the designer, the value informs the reward.
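To see that ordering concretely, here is a toy sketch (my own, with an invented five-state corridor world) of an agent turning a designer-given reward function into a learned value function using tabular temporal-difference learning:

```python
import random

STATES = list(range(5))        # a 5-state corridor; the goal is state 4
ALPHA, GAMMA = 0.1, 0.9        # learning rate and discount factor

def designer_reward(state: int) -> float:
    # The designer decides what the agent should value: reaching the goal.
    return 1.0 if state == 4 else 0.0

values = {s: 0.0 for s in STATES}  # the value function, learned from rewards

for episode in range(500):
    s = 0
    while s != 4:
        # Move right most of the time, occasionally slip left.
        s_next = min(s + 1, 4) if random.random() < 0.8 else max(s - 1, 0)
        r = designer_reward(s_next)
        # TD(0) update: nudge V(s) toward r + gamma * V(s_next).
        values[s] += ALPHA * (r + GAMMA * values[s_next] - values[s])
        s = s_next

print(values)  # states closer to the goal acquire higher learned value
```

After enough episodes the learned values mirror the designer’s reward – the valuable state the designer had in mind has been handed to the agent through the reward function.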

How about rewards in animals and humans? The reward centres of the brain are not designed, in the sense that they have no designer. Instead they evolved. What we individually value and desire as animals and humans is largely determined by what we feel is pleasurable and what is not. This value is translated into learned drives to perform certain actions. The process of genetic recombination and mutation, a key component of evolution, produces different animal anatomies (including digestive systems) and pleasure responses to the environment. Animals that find pleasure in eating food that is readily available and compatible with the digestive system will have a much greater chance of survival than animals that only find pleasure in eating things that are rare or poisonous. Through natural selection, pleasure could be expected to converge to what is valuable to the animal.
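As a toy illustration of that convergence (again my own sketch, not a model from this series), imagine each animal’s “pleasure response” as a set of weights over foods, with selection keeping the animals whose pleasure best tracks the foods’ true survival value:

```python
import random

FOODS = {"berries": 1.0, "toadstool": -1.0}  # true survival value of each food

def fitness(pleasure: dict) -> float:
    # An animal eats whatever it finds most pleasurable; its fitness
    # is the true survival value of that preferred food.
    preferred = max(FOODS, key=lambda food: pleasure[food])
    return FOODS[preferred]

# A population of random "pleasure responses" (reward functions).
population = [{food: random.uniform(-1, 1) for food in FOODS}
              for _ in range(50)]

for generation in range(30):
    population.sort(key=fitness, reverse=True)
    survivors = population[:25]                 # selection
    children = [{food: w + random.gauss(0, 0.1)  # mutation only, no crossover
                 for food, w in parent.items()}
                for parent in survivors]
    population = survivors + children

print(population[0])  # pleasure in berries now reliably exceeds toadstool
```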

In answer to the question – what comes first, rewards or values? – it would seem that value comes first. Of course, this definition of “value” relates to the objective fact of what the animal or agent must do to achieve its goals of survival or some given purpose. But what of humans? Evolutionary psychology and evolutionary neuroscience give us reasonable grounds to say that, along with brain size and structure, many human behaviours and underlying neural processes have developed through natural selection. While such hypotheses are difficult to test, people seem to have evolved to feel pleasure from socialising – driving us to make social bonds and form groups. And people seem to have evolved feelings of social discomfort – displeasure from embarrassment and rejection. Although the circumstances that caused the selection of social behaviours aren’t clear, many of our pleasure and displeasure responses can seemingly be rationalised in terms of evolution.

An interesting aspect of the human pleasure response is the pleasure we take in achievement. It is certainly normal for Olympic gold medallists to feel elation at winning. But even small or common victories, such as our first unaided steps as a child or managing to catch a ball, can elicit varying amounts of pleasure and satisfaction. Is this pleasure due to the adulation and praise of onlookers, which we have been wired to enjoy? Or is there a more fundamental effect, with success at any self-determined goal causing pleasure? This could be related to the loss of pleasure and enjoyment that is often associated with Parkinson’s disease. Areas of the brain related to inhibiting and coordinating movement, which deteriorate as part of Parkinson’s disease, are also strongly associated with reward and pleasure generation.

And we can bring this back to an autonomous robot that generates its own reward: a robot that has multiple purposes will need different ways of valuing the objects and states of the environment depending on its current goal. When crossing the road the robot needs to avoid cars; when cleaning a car the robot might even need to enter the car. This kind of flexibility in determining what the goal is – with reward feedback that determines on one level whether the goal has been reached, and on another whether the goal was as good as it was thought to be – could be an important process in the development of “intelligent” robots.
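A rough sketch of that flexibility (the state keys and goal names are hypothetical, invented for illustration): the same perceived environment is rewarded differently depending on the current goal, and a second level of feedback evaluates the goal itself:

```python
# Goal-conditioned reward: the same environment is valued differently
# depending on the robot's current goal. All names are hypothetical.

def reward(perceived_state: dict, goal: str) -> float:
    if goal == "cross_road":
        if perceived_state["near_car"]:
            return -10.0  # cars must be avoided while crossing
        return 1.0 if perceived_state["on_far_kerb"] else 0.0
    if goal == "clean_car":
        # Now a car is the target rather than a hazard.
        return 1.0 if perceived_state["inside_car"] else 0.0
    return 0.0

def goal_feedback(predicted_benefit: float, observed_benefit: float) -> float:
    """Second-level feedback: was the goal as good as it was thought to be?"""
    return observed_benefit - predicted_benefit
```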

However, before I conclude, let’s consider one fallout from the planetary dominance of humans: our selection pressures have nearly disappeared. We are, after all, the current dominant species on this planet. Perception-reward centres that didn’t evolve to deal with newly discovered and human-manufactured stimuli aren’t likely to be strongly selected against. And through our ingenuity we have found powerful ways to “game” our evolved pleasure centres – finding and manufacturing super-normal stimuli.

Dan Dennett: Cute, sweet, sexy, funny (TED talk video available on YouTube).

The video linked above features Dan Dennett describing how evolution has influenced our feelings of what is “cute, sweet, sexy and funny”. The result is possibly the opposite of what we intuitively feel and think: there is nothing inherently cute, sweet, sexy or funny; these sensations and feelings evolved in us to find value in our surroundings and each other. We have evolved to find babies and young animals cute, we have evolved to find food that is high in energy tasty, and we have evolved to find healthy members of the opposite sex attractive. Funny wasn’t explained as clearly – Dan Dennett described a hypothesis that it is related to making boring or unpleasant jobs bearable. I would speculate that humour might also have been selected for making people more socially attractive, or at the very least more bearable. 🙂

Building on this understanding of pleasure and other feelings being evolved, the topic of the next post in this series will be super-normal stimuli and how they influence our views on human values and ethics. Let’s begin the adventure into the moral minefield.