I spy with my computer vision eye… Wally? Waldo?

Lately I’ve been devoting a bit of my attention to image processing and computer vision. It’s interesting to see so many varied approaches applied to the problem over the last 50 or so years, especially when computer vision was once thought to be solvable in a single summer’s work. We humans perceive things with such apparent ease that it was probably assumed to be a much simpler problem than playing chess. Now, after decades of focused attention, the attempts that appear most successful at recognising handwritten digits, street signs, toys, or even thousands of real-world images are those that, in some way, model the networks of connections and processes of the brain.

You may have heard about the Google learning system that learned to recognise the faces of cats and people from YouTube videos. This is part of a revolution in artificial neural networks known as deep learning. Among deep learning architectures are ones that use many units that activate stochastically and clever learning rules (e.g., stochastic gradient descent and contrastive divergence). The networks can be trained to perform image classification to state-of-the-art levels of accuracy. Perhaps another interesting thing about these developments, a number of which have come from Geoffrey Hinton and his associates, is that some of them are “generative”. That is, while learning to classify images, these networks can be “turned around” or “unfolded” to create images, compress and cluster images, or perform image completion. This has obvious parallels to the human ability to imagine scenes, and the current understanding of the mammalian primary visual cortex that appears to essentially recreate images received at the retina.

A related type of artificial neural network that has had considerable success is the convolutional neural network. Convolution here is just a fancy term for sliding a small patch of network connections across the entire image to find the result at all locations. These networks also typically use many layers of neurons, and have achieved similar success in image recognition. These convolutional networks may model known processes in the visual cortices, such as simple cells that detect edges of certain orientations. Outlines in images are combined into complex sets of features and classified. An earlier learning system, known as the neocognitron, used layers of simple cell-like filters without the convolution.
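To make the idea of sliding a patch of connections across an image concrete, here is a minimal sketch of a single convolution filter applied to a greyscale image. The 3x3 vertical-edge kernel and the random stand-in image are illustrative assumptions, not part of any particular network.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over every location of a 2D image
    and record the filter response at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + kh, x:x + kw]
            out[y, x] = np.sum(patch * kernel)   # filter response at (y, x)
    return out

# A simple vertical-edge kernel, loosely analogous to an
# orientation-selective simple cell.
vertical_edge = np.array([[1, 0, -1],
                          [1, 0, -1],
                          [1, 0, -1]])

image = np.random.rand(28, 28)        # stand-in for a greyscale image
response_map = convolve2d(image, vertical_edge)
print(response_map.shape)             # (26, 26): a response at every location
```

A convolutional network learns the kernel values rather than having them hand-designed, and stacks many such filters in layers.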

The process of applying the same edge-detection filter over the whole image is similar to the parallel processing that occurs in the brain. Thousands of neurons functioning simultaneously is an obvious practical difference from the sequential computation performed in the hardware of a computer, but GPUs with many processor cores now allow parallel processing in machines too. If, rather than using direction-selective simple cells to detect edges, we use image features (such as a loop in a handwritten digit, or the dark circle representing the wheel of a vehicle), we might say the convolution process is similar to scanning an image with our eyes.

Even when we humans are searching for something hidden in a scene, such as our friend Wally (or Waldo), our attention typically centres on one thing at a time. Scanning large, detailed images for Wally often takes us a long time. A computer trained to find Wally in an image using a convolutional network could methodically scan the image a lot faster than us with current hardware. It mightn’t be hard to get a computer to beat us in this challenge for many Where’s Wally images with biologically-inspired image recognition systems (rather than more common, but brittle, image processing techniques).
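As a rough sketch of what “methodically scanning” an image might look like, here is a sliding-window search. The `score_window` function is a hypothetical stand-in for a trained classifier that rates how Wally-like a patch appears; it is not a real detector, and the window and stride sizes are arbitrary.

```python
import numpy as np

def find_wally(image, score_window, window=64, stride=16):
    """Slide a fixed-size window over the image and return the location
    where the (hypothetical) classifier scores highest."""
    best_score, best_pos = -np.inf, None
    h, w = image.shape[:2]
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            patch = image[y:y + window, x:x + window]
            score = score_window(patch)   # e.g. output of a trained convolutional network
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

With current hardware every window can be scored in a fraction of a second, which is the sense in which a machine could out-pace our one-thing-at-a-time attention.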

Even though I think these advances are great, it seems there are things missing from what we are trying to do with these computer vision systems and how we’re trying to train them. We are still throwing information at these learning systems as the disembodied number-crunching machines they are. Consider how our visual perception abilities allow us to recognise objects in images with little regard for scale, translation, shear, rotation or even colour and illumination; these things are major hurdles for computer vision systems, but for us they just provide more information about the scene. These are things we learn to do. Most of the focus of computer vision seems to relate to the “what pathway”, rather than the “how pathway”, of the two-streams hypothesis of vision processing in the brain. Maybe researchers could start looking at ways of making these deep networks take that next step. Though extracting information from a scene, such as locating sources of illumination or the motion of objects relative to the camera, might be hard to fit into the current trend of trying to perform unsupervised learning from enormous amounts of unlabelled data.

I think there may be significant advantages to treating the learning system as embodied, and to making the real-world property of object permanence something the learning system can latch onto. It’s certainly something that provides a great deal of leverage in our own learning about objects and how our interactions influence them. It is worth mentioning that machine learning practitioners already commonly create numerous modified training images from their given set and see measurable improvements. This is similar to what happens when a person or animal is exposed to an object and given the chance to view it from multiple angles and under different lighting conditions. Having a series of contiguous viewpoints is likely to more easily allow parts of our brain to learn to compensate for the different perspectives that scale, shear, rotate and translate the view of objects. It may even be important for learning to predict and recreate different perspectives in our imagination.
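A minimal sketch of that kind of training-set augmentation, assuming greyscale images stored as NumPy arrays; the particular transformations (a flip and small shifts) are just illustrative.

```python
import numpy as np

def augment(image):
    """Generate modified copies of a training image: a horizontal flip
    and small vertical and horizontal shifts."""
    copies = [image, np.fliplr(image)]
    for shift in (-2, 2):
        copies.append(np.roll(image, shift, axis=0))   # shift up/down
        copies.append(np.roll(image, shift, axis=1))   # shift left/right
    return copies

# Each original image now contributes six training examples.
training_images = [np.random.rand(28, 28) for _ in range(10)]   # stand-in data
augmented_set = [aug for img in training_images for aug in augment(img)]
print(len(augmented_set))   # 60
```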


Fiona preview: Artificial minds “sparking together”

This post previews the new online autonomous avatar service and community, Fiona, which is currently in beta, and describes some of my thoughts on the concept. I don’t have access to the beta, but there are some interesting videos explaining how the community may work and how the online avatars are currently developed and constructed. If you are interested, have a look at the video advertisement below, by the creators of Fiona:

[Embedded video: Fiona promotional advertisement]

In summary, the idea appears to be to create a service in which you can create or buy an online avatar (or artificial mind?) to use on websites for customer service and interaction. The service comes with an online community for people to use and develop the avatars, which are composed of a collection of “sparks”: what appear to be units for perceptual or cognitive processes. Below is a video demonstration that has been released showing how to create an avatar by connecting sparks in the online graphical interface. For people who have some experience with visual programming languages (e.g. MathWorks’ Simulink and LabView) there are some obvious visual similarities.

[Embedded video: demonstration of creating an avatar by connecting sparks]

First of all, an online community could be a good thing for the development of artificial intelligence in general. Since the community is also pitched as a market for people to develop and sell their spark-programs, that could be an attractive incentive to participate. This sounds like it has the potential to generate interest in artificial intelligence, and to provide a viable service for people and businesses that would like an interactive avatar on their website.

A more in-depth review might be in order once Fiona is out of beta. It will be particularly interesting to see how the visual programming language and underlying framework translate into creating “artificial minds”. Will Fiona be flexible enough to implement any existing cognitive architectures? And will it detract from or be beneficial to other open-source projects for artificial general intelligence development, such as OpenCog? Only time will tell.

Consciousness’s abode: Subjugate the substrate

Philosophy of mind has some interesting implications for artificial intelligence, summed up by the question: can a machine ever be “conscious”? I’ve written about this in earlier posts, but recently I came across an argument that I hadn’t considered very deeply: that substrate matters. There are lots of ways to approach this issue, but if the mind and consciousness are a product of the brain, then surely the neuroscience perspective is a good place to start.

Investigations show that the activity of different brain regions occurs predictably during different cognitive and perceptual activities. There are also predictable deficits in people when these parts of the brain are damaged. This suggests that the mind and consciousness are a product of the matter and energy that make up the brain. If you can tell me how classical Cartesian dualism can account for that evidence, I’m all ears. 🙂

I will proceed under the assumption that there isn’t an immaterial soul that is the source of our consciousness and directs our actions. But if we’re working under the main premise of physicalism, we still have at least one interesting phenomenon to explain: “qualia”. How does something as abstract and seemingly immaterial as our meaningful conscious experiences arise from our physical brain? That question isn’t going to get answered in this post (but an attempt is going to emerge in this blog).

In terms of conscious machines, we’re still confronted with the question of whether a machine is capable of the same sort of conscious experience that we biological organisms have. Does the hardware matter? I read and commented on a blog post on Rationally Speaking after coming across a description of the belief that the “substrate” is crucial for consciousness. The substrate argument goes that even though a simulation of a neuron might behave the same as a biological neuron, since it is just a simulation it doesn’t interact with the physical world to produce the same effect. Ergo, no consciousness. Tell me if I’ve set up a straw man here.

The author didn’t like me suggesting that we should consider the possibility of the simulation being hooked up to a machine that allowed it to perform the same physical interactions as the biological neuron (or perform photosynthesis, in the original example). We’re not allowed to “sneak in” the substrate, I’m told. 🙂 I disagree; I think it is perfectly legitimate to have this interaction in our thought experiment. And isn’t that what computers already do when they play sound, show images or accept keyboard input? Computers simulate sound and the emission of light, and interact with the physical world. It’s restricted, I admit, but as technology improves there is no reason to think that simulations couldn’t be connected to machines that allow them to interact with the world as their physical equivalents would.

Other comments by readers of that Rationally Speaking post raised interesting points: the China brain (or nation) thought experiment, and what David Chalmers calls the “principle of organisational invariance“. The question raised by the China brain and discussed by Chalmers is: if we create the same functional organisation of people as neurons in a human brain (i.e., people communicating as though they were the neurons with the same connections), would that system be conscious? If we accept that the system behaves in exactly the same way as the brain, that neurons spiking is a sufficient level of detail to capture consciousness, and the principle of organisational invariance, then the China brain should probably be considered conscious. Most people probably find that unintuitive.

If we accept that the Chinese people simulating a human brain also create a consciousness, we have a difficult question to answer; some might even call it a “hard problem“. 🙂 If consciousness is not dependent on substrate, it seems that consciousness might really be something abstract and immaterial. We might therefore be forced to choose between considering consciousness an illusion, or letting abstract things exist under our definition of physicalism. [Or look for alternative explanations and holes in the argument above. :)]

Values and Artificial Super-Intelligence

[Image caption: Sorry, robot. Once the humans are gone, bringing them back will be hard.]

This is the sixth and final post in the current series on rewards and values. The topic discussed is the assessment of values as they could be applied to an artificial super-intelligence: what might the final outcome be, and how might this help us choose “scalable” moral values?

First of all, we should get acquainted with the notion of the technological singularity. One version of the idea goes: should we develop an artificial general intelligence that is capable of making itself more intelligent, it could do so repeatedly and at accelerating speed. Before long the machine is vastly more intelligent and powerful than any person or organisation, and essentially achieves god-like power. This version of the technological singularity appears to be far from a mainstream belief in the scientific community; however, anyone who believes consciousness and intelligence are solely the result of physical processes in the brain and body could rationally believe that this process could be simulated in a computer. It could logically follow that such a simulated intelligence could go about acquiring more computing resources to scale up some aspects of its intelligence, and try to improve upon, and add to, the structure underlying its intelligence.

Many people who believe that this technological singularity will occur are concerned that this AI could eliminate the human race, and potentially all life on Earth, for no more reason than that we happen to be in the way of it achieving some goal. A whole non-profit organisation is devoted to trying to negate this risk. These people might be right in saying we can’t predict the actions of a super-intelligent machine – with the ability to choose what it would do, predicting its actions could require the same or a greater amount of intelligence. But the assumption usually goes that the machine will have some value function that it will try to operate under and maximise. This has been accompanied by interesting twists in how some people define a “mind”, and by the lack of any obvious definition of “intelligence”. Nonetheless, this concern has apparently led to at least one researcher being threatened with death. (People do crazy things in the name of their beliefs.)

A favourite metaphor for an unwanted result is the “paperclip maximiser“: a powerful machine devoted to turning all the material of the universe into paperclips. The machine may have wanted to increase the “order” of the universe, thought paperclips were especially useful to that end, and settled on turning everything into paperclips. Other values could result in equally undesirable outcomes; the same article describes another scenario where a version of utilitarianism might have the machine maximising smiles by turning the world and everything else into smiley faces. This is a rather unusual step for an “intelligent” machine; somehow the machine skipped any theory of mind and went straight to equating happiness with smiley faces. Nonetheless, other ideas of what we should value might not fare much better. It makes some sense to replace the haphazard way we seek pleasure with electrodes in our brains, if pleasure is our end goal. By some methods of calculation, suffering could best be minimised by euthanising all life. Of course, throughout this blog series I’ve been painting rewards and values (including their proxies, pleasure and pain) not as ends, but as feedback (or feelings) we’ve evolved for the sake of learning how to survive.

If we consider the thesis of Sam Harris, that there is a “moral landscape“, then choosing a system of morals and values is a matter of optimisation. Sam Harris thinks we should be maximising the well-being of conscious creatures, and this belief in well-being as an optimisable quantity could lead us to consider morality as an engineering problem. Well-being, however, might be a little too vague for a machine to develop a function for calculating the value of all actions. Our intuitive human system of making approximate mental calculations of the moral value of actions might be very difficult for a computer to reproduce without simulating a human mind. And humans are notoriously irregular in their beliefs about what is moral behaviour.

In the last post I raised the idea of valuing what pleasure and pain teach us about ourselves and the world. This could be generalised to valuing all learning and information – valuing taking part in and learning from the range of human experience, such as music, sights, food and personal relationships, as well as making new scientific observations and discovering the physical laws of the universe. Furthermore, the physical embodiment of information within the universe as structures of matter and energy, such as living organisms, could also lead us to consider all life as inherently valuable too. Now this raises plenty of questions. What actually counts as information, and how would we measure it? Can there really be any inherent value in information? Can we really say that all life and some non-living structures are the embodiment of information? How might valuing information and learning as ends in themselves suggest we should live? What would an artificial super-intelligence do under this system of valuing information? Questions such as these could be fertile grounds for discussion. In starting a new series of blog posts I hope to explore these ideas, and hopefully receive some feedback from anyone who reads this blog.

And thus ends this first blog series on rewards and values. A range of related topics was covered: the origin of felt rewards being within the brain, the representation of the world as an important aspect of associating values, the self-rewarding capabilities that might benefit autonomous robots, the likely evolutionary origin of rewards in biological organisms, and the development of morality and ethics as a process of maximising that which is valued. The ideas in some of these posts may not have been particularly rigorously argued or cited, so everything written should be taken with a grain of salt. Corrections and suggestions are most certainly welcome! I hope you will join me in exploring more ideas and taking a few more mental leaps in future.

Self-rewarding autonomous machines

This is the third post in a series about rewards and values. I’ve written about rewards being generated from within the brain, rather than actually being external, and how autonomous robots must be able to calculate their own rewards in a similar way. I’ve also written about a way in which the world is represented for robots – the “state space” – and the benefits of using object-oriented representations. Combining these two things will be the order of this post: a robot that perceives the world as objects and that rewards itself. But how exactly should this be done, and what is the use of learning this way?

First, let’s consider another feature of reinforcement learning. I’ve described a state space, but I didn’t go into “state-action values”. The world is divided up into states, and each state has a range of possible actions. The value (the state-action value) of any of those actions in that state is the reward we can expect to accumulate from performing it. Consider the problem of balancing an inverted pendulum: we only get a negative reward (punishment) if the pendulum tips too far to the left or right, or if we reach the limit of movement left or right. If we are in a state where the pendulum is tipping to the left, moving in the opposite direction will speed the descent and hasten the punishment. The action with more value is the one that keeps the pendulum balanced. It would be common for the angles and the positions of the base of the inverted pendulum to be partitioned into increments. Smaller partitions give finer control, but also mean that more time needs to be spent exploring and calculating the value of actions in each of those partitions. This is an inverted pendulum version of the “gridworld”.
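For the curious, here is a sketch of how discretised state-action values might be stored and updated for the inverted pendulum. The partition sizes, action set and learning parameters are arbitrary choices for illustration, not a recipe.

```python
import numpy as np

N_ANGLE_BINS, N_POSITION_BINS = 20, 10   # how finely the continuous state is partitioned
ACTIONS = [-1, 0, +1]                    # push left, do nothing, push right

# One value per (angle bin, position bin, action index):
# the reward we expect to accumulate from taking that action in that state.
q_values = np.zeros((N_ANGLE_BINS, N_POSITION_BINS, len(ACTIONS)))

def update(state, action_index, reward, next_state, alpha=0.1, gamma=0.99):
    """One-step temporal-difference update of a state-action value."""
    angle_bin, pos_bin = state
    next_angle_bin, next_pos_bin = next_state
    best_next = np.max(q_values[next_angle_bin, next_pos_bin])
    target = reward + gamma * best_next
    q_values[angle_bin, pos_bin, action_index] += alpha * (
        target - q_values[angle_bin, pos_bin, action_index]
    )

# e.g. a punishing transition: the pendulum tipped too far while we pushed left.
update(state=(0, 5), action_index=0, reward=-1.0, next_state=(1, 5))
```

Doubling the number of bins in either dimension doubles the size of this table, which is exactly the trade-off between finer control and longer exploration described above.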

Now consider a robot that sees the scene in front of it. The robot has inbuilt perceptual algorithms and detects objects in the environment. Each of these detected objects affords a set of complex actions – moving towards, moving away, moving around, inspecting, etc. Again these actions might have values, but how does the robot determine the object-action value, and how does it decide what reward it should receive? Remember that, in an autonomous agent, rewards are internal responses to sensory feedback. Therefore the rewards the autonomous agent receives are those it gives itself, and they must come from its own sensory feedback. The reward could come from visually detecting the object, or from any other sensor information detected from the object. An internal “reward function”, objective, or external instruction determines whether the sensory and perceptual feedback is rewarding.
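Here is a toy sketch of what such an internal reward function might look like, assuming the robot’s (hypothetical) perception system reports detected objects as simple labelled records; the field names and reward amounts are invented for illustration.

```python
def internal_reward(detected_objects, target_label="cup"):
    """Map the robot's own perceptual feedback onto a self-generated reward.
    `detected_objects` is assumed to be a list of dicts produced by the
    robot's (hypothetical) perception system."""
    reward = 0.0
    for obj in detected_objects:
        if obj["label"] == target_label:
            reward += 0.1                              # visually finding the object is mildly rewarding
            if obj.get("inspected_property_found", False):
                reward += 1.0                          # other sensed information about the object
    return reward

# Example: one detected cup whose inspected property was confirmed.
print(internal_reward([{"label": "cup", "inspected_property_found": True}]))   # 1.1
```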

Now the reward is no longer directly telling the robot what path it should be taking or how it should be controlling its motors; the reward is telling it what it should be seeking or doing at a higher level of abstraction. The act of finding an object may be rewarding, or the reward may come from interacting with the object and detecting its properties. Imagine a robot on Mars, inspecting rocks, but only knowing whether it found the chemicals it was looking for in a rock after probing it with tools. Visually detecting a rock and then detecting the chemicals it is looking for associates the reward and value with the visual qualities of the rock – so the robot seeks more rocks with that appearance. If the objective of the robot is to complete a movement, such as grasping a cup, the inclusion of self-perception allows the robot to monitor and detect its success. With detailed sensory feedback telling the robot where it is in relation to its target, feedback controllers can be used – typically a much more efficient and generalised process for controlling movements than the random search for a trajectory using gridworld-style reinforcement learning.
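For contrast with a gridworld-style search for a trajectory, here is a minimal sketch of a proportional feedback controller, assuming the robot can sense the error between its current position and the target at every step; the gain and the one-dimensional setting are illustrative only.

```python
def proportional_controller(target, position, gain=0.5):
    """Command a movement proportional to the remaining error, rather than
    learning every motor command by blind trial and error."""
    error = target - position
    return gain * error                  # commanded position increment

position, target = 0.0, 1.0
for _ in range(20):
    position += proportional_controller(target, position)
print(round(position, 3))                # approaches the target, 1.0
```

The reward signal then only needs to judge the outcome (did the grasp succeed?), while the controller handles the moment-to-moment motion.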

So if this is such a great way of applying reinforcement learning in robots, why aren’t we doing it? The simple answer is that our current algorithms and processes for perception and perceptual learning just aren’t good enough to recognise objects robustly. So what good is this whole idea if we can’t visually recognise objects? Planning out how to create a robot that has robust autonomy, and that is capable of learning, can point us in a direction to focus our efforts in research and development. Perception is still a major stumbling block in the application of autonomous robotics. Biological clues and recent successes suggest that deep convolutional neural networks might be the way to go, but new, faster ways of creating and training them are probably necessary. Multiple sensor modalities and active interaction and learning will likely also be important. Once we have more powerful perceptual abilities in robots, they can do more effective monitoring of their own motion and adapt their feedback controllers to produce successful movements. With success being determined through reinforcement learning, the learning cycle can be self-adapting and autonomous.

More can still be said about the development of the reward function that decides which sensory states and percepts are rewarding (pleasurable) and which are punishing (painful). The next post will speculate on the biological evolution of rewards and values – which came first? – and how this might have relevance to a robot deciding what sensory states it should find rewarding.

State space: Quantities and qualities

This is the second post in a series on rewards and values; the previous post discussed whether rewards are external stimuli or internal brain activity. This post discusses the important issue of representing the world in a computer or robot, and the practice of describing the world as discrete quantities or abstract qualities.

The world we live in can usually be described as being in a particular state, e.g., the sky is cloudy, the door is closed, the car is travelling at 42 km/h. To be absolutely precise about the state of the world we need to use quantities and measurements: position, weight, number, volume, and so on. But how often do we people know the precise quantitative state of the world we live in? Animals and people often get by without quantifying the exact conditions of the world around them, instead perceiving the qualities of the world – recognising and categorising things, and making relative judgements about position, weight, speed, etc. But then why are robots often programmed to use quantitative descriptions of the world rather than qualitative descriptions? This is a complex issue that won’t be comprehensively answered in this post, but some differences between computer-based representation with quantities and with qualities will be described.

For robots, the world can be represented as a state space. When dealing with measurable quantities, the state space often divides the world into partitions. A classic example in reinforcement learning is navigating a “gridworld“. In the gridworld, the environment the agent finds itself in is literally a square grid, and the agent can only move in the four compass directions (north, south, east and west). In the computer these actions and states would usually be represented as numbers: state 1, state 2, …, state n, and action 1, action 2, …, action m. The “curse of dimensionality” appears because storing the value of every state-action pair requires the number of states multiplied by the number of actions. If we add another dimension to the environment with another k possible values, our number of states is multiplied by k. A ten-by-ten grid with another dimension of 10 values goes from having 100 states to 1000 states. With four different movements available the agent has 4 actions, so there would be 4000 state-action pairs.
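The arithmetic in that example, written out as a quick sketch (the extra dimension is just a stand-in for any additional sensor reading):

```python
grid_width, grid_height = 10, 10
extra_dimension_values = 10      # e.g. another sensor reading with 10 possible values
n_actions = 4                    # north, south, east, west

states_2d = grid_width * grid_height                   # 100 states
states_3d = states_2d * extra_dimension_values         # 1000 states
state_action_pairs = states_3d * n_actions             # 4000 values to store and learn
print(states_2d, states_3d, state_action_pairs)        # 100 1000 4000
```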

While this highlights one serious problem of representing the world quantitatively, an equally serious problem is deciding how fine our quantity divisions should be. If the agent is a mobile robot driving around a laboratory with only square obstacles, we could probably get by dividing the world up into 50cm x 50cm squares. But if the robot were required to pick something up from a tabletop, it might need accuracy down to the centimetre. If it drives around the lab as well as picking things up from tabletops, dividing up the world gets trickier. The grid that makes up the world can’t just describe occupancy; areas of the grid occupied by objects of interest need to be specified as those objects, adding more state dimensions to the representation.

When we people make a choice to do something, like walking to the door, we don’t typically update that choice each time we move 50cm. We collapse all the steps along the way into a single action. Hierarchical reinforcement learning does just this, with algorithms under this banner collecting low-level actions into high-level actions, hierarchically. One popular framework collects actions into “options”: a way of bundling a policy for selecting actions (e.g., ‘go north’) with end-conditions (e.g., hitting a wall or running out of time), so that the agent makes one high-level choice (e.g., ‘go north’ until a wall is hit) rather than a hundred low-level ones, and then sees how things pan out. This simplifies the process of choosing the actions that the agent performs, but it doesn’t simplify the representation of the environment.
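A rough sketch of an “option” as a bundled policy and termination condition; the dictionary-based state, the `step` transition function and the field names are assumptions made for illustration, not the canonical options framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    """A temporally extended action: follow `policy` until `should_terminate` is true."""
    name: str
    policy: Callable[[dict], str]             # state -> primitive action
    should_terminate: Callable[[dict], bool]  # state -> is the option finished?

def run_option(option, state, step, max_steps=100):
    """Execute an option to completion, collapsing many primitive choices
    into a single high-level choice. `step` is the (assumed) environment
    transition function: (state, action) -> next state."""
    for _ in range(max_steps):
        if option.should_terminate(state):
            break
        state = step(state, option.policy(state))
    return state

# e.g. "go north until you hit a wall (or run out of time)"
go_north = Option(
    name="go_north_until_wall",
    policy=lambda state: "north",
    should_terminate=lambda state: state.get("bumped_wall", False),
)
```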

When we look around we see objects – right now you are looking at some sort of computer screen – and we also see the objects that make up any room we’re in: the door, the floor, the walls, tables and chairs. In our minds, we seem to represent the world around us as a combined visual and spatial collection of objects. Describing the things in the world as the objects they are in the minds of people allows our “unit of representation” to be any size, and can dramatically simplify the way the world is described. And that is what is happening in more recent developments in machine learning, specifically in relational reinforcement learning and object-oriented reinforcement learning.

In relational reinforcement learning, things in the world are described by their relationships to other things. For example, the coffee cup is on the table and the coffee is in the coffee cup. These relations can usually be described using simple logic statements. Similar to this relational abstraction of the world, object-oriented reinforcement learning allows objects to have properties and associated actions, much like classes in object-oriented programming. Given that object-oriented programming was designed partly around how we people describe the world, viewing the world as objects has a lot of conceptual benefits. The agent considers the state of objects and learns the effects of actions on those objects. In the case of a robot, we reduce the problem of having large, non-meaningful state spaces, but then run into the challenge of recognising objects – a serious hurdle in the world of robotics that isn’t yet solved.
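Here is a sketch of an object-oriented description of the coffee-cup scene, loosely in the spirit of object-oriented and relational reinforcement learning; the classes and the `on`/`in` relations are invented for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass, field

@dataclass
class WorldObject:
    """An object with named properties, much like a class instance."""
    name: str
    properties: dict = field(default_factory=dict)

def on(a, b):
    return ("on", a.name, b.name)       # logical relation: a is on b

def inside(a, b):
    return ("in", a.name, b.name)       # logical relation: a is in b

table = WorldObject("table")
cup = WorldObject("cup", {"graspable": True})
coffee = WorldObject("coffee")

# The state is a set of objects plus the relations between them,
# rather than a grid of numbered cells.
state = {
    "objects": [table, cup, coffee],
    "relations": {on(cup, table), inside(coffee, cup)},
}
```

The representation stays the same size however big the room is, which is the point of letting the “unit of representation” be an object rather than a grid cell.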

A historical reason why quantitative divisions of the state space were used in the first place is that some problems, such as balancing a broom or gathering momentum to get up a slope, were designed to use as little prior information and as little sensory feedback as possible. The challenge then became how to get a system to efficiently learn to solve these problems while having to blindly search for a reward or avoid a punishment. Generally speaking, many of the tasks requiring the discrete division of a continuous range are ones that involve some sort of motor control: the same sort of tasks that people perform using vision and touch to provide much more detailed feedback than plain success or failure, and the same sort of tasks in which we couldn’t feel our success or failure unless we could sense what was happening and had hard-wired responses or some goal in mind (or had someone watching to give feedback). This might mean that reinforcement learning is really the wrong tool for learning low-level motor control, unless, that is, we don’t care to give our robots eyes.

This leads me to the topic of the next post in this series on rewards and values: “Self-rewarding autonomous machines“. I’ll discuss how a completely autonomous machine will need the perceptual capabilities to detect “good” and “bad” events and reward itself. I’ll also discuss how viewing the world as “objects” that drive actions leads to a natural analogy with how animals and people function in the world.

Artificial Intelligence: That’s the myth

The holy grail of artificial intelligence is the creation of artificial “general” intelligence. That is, an artificial intelligence that is capable of every sort of perceptual and cognitive function that humans are, and more. But despite great optimism in the early days of artificial intelligence research, this has turned out to be a very difficult thing to create. It’s unlikely that there is a “silver bullet”, some single algorithm, that will solve the problem of artificial general intelligence. An important reason why is that the human brain, which gives us our intelligence, is actually a massive collection of layers and modules that perform specialised processes.

The squiggly stuff on the outside of the brain, the neocortex, does a lot of the perceptual processing. The neocortex sits on a lot of “white matter” that connects it to the inner brain structures. Different parts of the inner brain perform important processes like giving us emotions and pleasure, holding memories, and forming the centre of many “neural circuits”. Even though the structure of the neocortex is quite similar across the whole brain, it can be pretty neatly divided into different sections that perform specific functions, such as: allowing us to see movement, recognising objects and faces, providing conscious control and planning of body movements, and modulating our impulses.

Until we see an example of an intelligent brain or machine that works differently, we should probably admit that replicating the processes, if not the structure, of the human brain is what is most likely to produce artificial general intelligence. I’ll be making posts that specifically discuss some different approaches to artificial intelligence. These posts will mostly be about the high-level concepts of the algorithms and their relationship to “intelligence”. Hopefully these posts will be generally accessible and still interesting to the technically minded. I think there is benefit in grasping the important concepts that underlie human intelligence and that could direct the creation of intelligent machines.

If people are still looking for that silver-bullet algorithm, they should probably be looking for an algorithm that can either create, or be generally applied to, each of these brain processes. If you know of someone who has done this, or who has rational grounds for disagreeing that this is necessary, let me know. Then I can stop spreading misinformation or incorrect opinions. 🙂

To conclude with some philosophical questions, if we are successful in reproducing a complete human intelligence (and mind) on a computer, some interesting issues are raised. Is an accurate simulation of a human mind on a computer that different from the “simulation” of the human mind in our brains? And how “artificial” is this computer-based intelligence?

These questions might seem nonsensical if you happen to think that human intelligence and the mind are unassailable by computer software and hardware, or if you think that the mind is really the soul, separate from the body. If you believe the latter, I’m surprised you’re reading this (unless you were tricked by the title :)). I hope to discuss some evidence against both of these points of view in future posts, and I welcome rational counter-arguments.