WO2008136737A1

WO2008136737A1 - Self learning robot

Info

Publication number: WO2008136737A1
Application number: PCT/SE2008/000319
Authority: WO
Inventors: Johan Tofelt; Peter Nordin; Pär BUSCHKA
Original assignee: Institute Of Robotics In Scandinavia Ab
Priority date: 2007-05-08
Filing date: 2008-05-08
Publication date: 2008-11-13

Abstract

The present invention relates to a robot using a learning control architecture comprising three layers: a reasoning layer, a reactive layer, and a modelling layer. The reasoning layer develops strategies from given commands and measured sensor/actuators signals, the reactive layer develop control commands from strategies and from sensor/actuator signals, and the modelling layer is used by both the reactive and reasoning layer to build a physical model of the world around the robot.

Description

SELF LEARNING ROBOT

TECHNICAL FIELD

The present invention relates to a robot and a solution for controlling a robot and in particular to a solution and control architecture used in robotics for providing a learning robot system.

BACKGROUND OF THE INVENTION

The present solutions for robots today involve very simple tasks pre programmed in the memory and simple routines for avoiding colliding with objects in the way of the direction of travel of the robot. For instance lawn mowers and cleaning robots behave in this manner. This collision avoidance and simplicity of the tasks still make the robots quite slow and they can only perform one or a few tasks within their life time. There is a struggle to simplify our daily routines and try to decrease the burden on our time available for certain tasks both during our free time and during our working time.

Robots are therefore becoming more and more involved in our daily lives, for instance in the form of toy robots, cleaning robots in the form of vacuum cleaners, and lawn mowers. These all are pre programmed for specific tasks and have a limited set of functionality available. They also have a drawback in the fact that the user is not allowed to reprogram them. There is therefore a need for a robot that are more intelligent and that can interact with the user in a more complex manner.

There is also a need for a robot that can perform more complex tasks and a wider variety of tasks without extensive programming from users and/or programmers; i.e. robots that can learn a solution for a given task using past experience and from learning during trying to solve the task.

It is an object of the present invention to remedy at least some of these problems.

SUMMARY OF THE INVENTION

This is achieved in a number of aspects of the present invention, in which a first is a robot comprising a control architecture comprising a three layered model: a reasoning layer arranged to provide strategies for completing a task; a reactive layer arranged to handle events subjected to the robot; and a modelling layer arranged to provide a testing ground for reactive and strategy fitness computations, wherein the reasoning layer develop strategies using past experiences and wherein the reactive layer develop control commands using sensor and/or actuator signals together with historic command signals.

This robot may be seen as a self learning robot comprising:

- at least one processor; - at least one memory;

- at least one communication interface for interacting with a task and supervision unit;

- at least one sensor interface; and

- at least one actuating interface for mechanically interacting with external environment;

- wherein the processor is arranged to operate a control architecture comprising a three layered model: o a reasoning layer arranged to provide strategies for completing the received task; o a reactive layer arranged to handle events subjected to the robot; and o a modelling layer arranged to provide a testing ground for reactive and strategy fitness computations using input from the reasoning and reactive layer, o wherein the layers communicate with each other, the reasoning layer develop strategies using stored data relating to command signals and sensor signals, and wherein the reactive layer develop and implement control commands using sensor and/or actuator signals together with in the memory stored command signals with related stored sensor and/or actuator signals.

Wherein at least one of the layers may use genetic programming algorithms in at least some of the involved processes. The reasoning layer may develop a fitness algorithm in order to measure success rate of issued commands. The reactive layer may use a fitness algorithm for measuring success rate of commands.

The processor may be arranged to operate a behaviour rules function providing boundaries in behaviour of the robot, to provide a fitness function from the reasoning layer to the reactive layer, to breed control algorithms using the fitness function, to store in the memory relating to the modelling layer data indicative of sensor signals, and/or to provide a three dimensional map of at least part of the surrounding environment using the modelling layer.

The three dimensional map may also be provided with information about physical properties of objects located in the environment.

The physical properties may comprise at least one of shape, hardness, friction, robustness, colour, and temperature.

The robot may further comprise a separate graphics processor handling graphics related calculations.

The actuating interface may comprise at least one of a transportation device, a movable arm, and a gripping device.

The robot may further comprise an antenna for wireless communication with external devices.

The sensor may be arranged for measuring at least one of temperature, humidity, movement, pressure, friction, electrical characteristics, and magnetic characteristics.

Another aspect of the present invention, a control architecture for controlling a learning robot is provided, comprising three layers: a reasoning layer arranged to provide strategies for completing a task; a reactive layer arranged to handle events subjected to the robot; and a modelling layer arranged to provide a testing ground for reactive and strategy fitness computations, wherein the reasoning layer develop strategies using past experiences and wherein the reactive layer develop control commands using sensor and/or actuator signals together with historic command signals.

Yet another aspect of the present invention, a method of controlling a learning robot is provided, comprising the following steps receiving a task in a reasoning layer; developing a fitness algorithm appropriate for the task in the reasoning layer; developing a strategy for solving the task in the reasoning layer; sending the strategy to a reactive layer; developing control commands in the reactive layer using the strategy; developing new strategies in the reasoning layer using past experience of sensor/actuator signals; sending the new strategies to the reactive layer for developing new control commands; developing new control commands in the reactive layer using past experience comprising a database of stored control command with related sensor signals; testing developed strategies and control commands using a modelling layer.

At least one of the reasoning, modelling, and reactive layer may operate a genetic/evolutionary programming algorithm in at least one process part of the layer operations. BRIEF DESCRIPTION OF THE DRAWINGS

In the following the invention will be described in a non-limiting way and in more detail with reference to exemplary embodiments illustrated in the enclosed drawings, in which:

Fig. 1 illustrates schematically in a block diagram an architecture according to the present invention;

Fig. 2 illustrates schematically in a block diagram the architecture from Fig. 1 in more detail;

Fig. 3 illustrates schematically a robot according to the present invention; Fig. 4 illustrates schematically in a block diagram a hardware system according to the present invention; and

Fig. 5 illustrates schematically in a block diagram a method of controlling a robot according to the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The architecture of the control solution comprises mainly three layers/modules: a reasoning layer, a modelling layer, and a reactive control layer. The reasoning layer obtain tasks from a task handling architecture and interfaces the obtained tasks to logical principles that can be used by the machine to perform the tasks given. The reactive layer performs immediate event handling in order for the machine to react to external stimuli during task performing. This includes "spinal reaction" in order for the machine to operate on a basic level without damaging it self or the surrounding area.

Fig. 1 illustrates schematically in a block diagram the overall system architecture 10 of the system according to the present invention. Reference numeral 1 generally indicates a reasoning layer, reference numeral 2 indicates the reactive layer, and reference numeral 3 indicates the modelling layer. These layers interact with each other in order to perform a given task by performing different functions and developing different strategies. The system architecture also comprises an external (world) interface 4 for interfacing 5 sensors and actuators; i.e. sensors for measuring different physical properties of the surrounding environment and of the robot and controlling the operation of actuators such as motors for driving wheels or other movable components. The system architecture assumes that a task is given through a user interface communication link 6.

Fig. 2 illustrates the overall system architecture 10 in more detail, wherein a task input function 10 provide a task to a reasoning engine 7 part of the reasoning layer 1. The reasoning engine processes the task into a strategy which is handled as will be discussed below. A behaviour rules set function 8 provide specific rules which gives boundaries in the behaviour of the robot system. The modelling layer 3 is provided with a physics simulator and a model rendering unit 12 in order to efficiently process physical properties of external objects and events and provide a suitable model of the world. The modelling layer communicate directly with the reasoning layer and with a reactive engine 9 that provide control commands to a reactive output unit 11 controlling actuators and other interface units 14 via an external interface unit 13.

The reasoning layer gives a mathematical description on how to measure the performance of the robot system using a fitness function. The fitness function is communicated to the reactive layer in order to start evolutionary processes to breed control algorithms. The reasoning layer also has a purpose of developing an overall strategy for solving the given task. The strategy is then translated to appropriate control algorithms and communicated to the reactive layer. The actual task to be solved by the robot may be given by an upper communication layer handling user interface related questions through the user interface communication link 6.

In between the reactive and reasoning layer the modelling layer provide support for both layers by providing a test environment for algorithms developed in each of the reactive and reasoning layers.

From the perspective of the reactive layer, the modelling layer provide support by storing all sensor and actuator settings together with historical events in order to create a database where the system may draw experience from: when the system behaved in a certain way (=sensor readings and actuator settings) a certain event occurred. This information can be used in a simulation for testing algorithms in order to develop the reactive layers processes. It should be noted that the reactive layer advantageously has a response time for at least certain operations in the range of a few milliseconds in order to reduce risk of acquiring/inflicting damages on the robot or external objects in the surrounding environment.

From the perspective of the reasoning layer, the modelling layer provide support by creating a three dimensional (3D) map of the "world", i.e. the near and far surroundings of the robot. This is done by creating a map with detected or programmed objects with their physical location and physical parameters associated with the object, i.e. hardness, friction, robustness, form, colour, and so on. The reasoning layer can then use this map in testing developed strategies. The physical parameters may be simplified, for instance forms of objects may be simplified using cubical, pyramidal, or similar simplified geometrical forms, likewise, physical parameters concerning algorithms may be simplified using approximation algorithms.

In order to model the world in a 3D representation, a graphics processor is advantageously used and as a benefit from this it is easy to show current world perspective of the robot in a graphical user interface (GUI), e.g. a display.

Each layer is divided into sub parts each operating their specific function in the architecture. Most of the sub parts can communicate with each other directly or indirectly using any suitable communication protocol, e.g. TCP/IP (over for instance Ethernet); however, some parts need to communicate with a graphics processor directly and in these situations communication is over a bus system directly linking the sub part and the graphics processor. It should be noted that other types of processing units may be used for these types of calculations necessary to develop the 3D map, e.g. a digital signal processor (DSP), a microprocessor, an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or similar.

The layers may for instance be divided into sub layers as follows:

Reasoning layer:

An execute plan module; may perform developed plans independent on how the plans has been developed.

- A planning module; this module may be used to build plans.

Reactive layer:

- A close loop regulator module; this module may directly control the robot using feedback loops operating on sensor input.

- A behaviour module; this module may provide information and decisions based on behaviour.

Modelling layer:

- A mapping module; this module may build a map (in two dimensions (possibly including time)) using sensor input data. - A topological module; this module may build more detailed maps for instance with topological information included.

As the skilled person understands from the above example the three base layers may be divided into a plurality of sub levels and sub modules.

The functionality may be divided into several units of hardware, for instance the reactive layer may operate on a certain piece of hardware while the reasoning layer operate on one or several hardware units and so on.

Fig. 3 illustrates a robot according to the present invention. It should be noted that the actual physical design of the robot is only one of many possible examples and made for illustrative purposes and that many other designs may be utilized. The actual design is set by what type of robot the architecture is to be used with, the types of tasks to be performed, environment, and necessary interface units (for instance grippers, transportation functionality and so on). The robot 30 comprises a main body 31 enclosing driving motors, electronics, control devices and so on. The robot may be propelled using different kinds of propulsion arrangements and in Fig. 3 a caterpillar drive system is used comprising a transport band 32 and driving wheels 33. A rotational portion 34 may be provided in order to move sensors and cameras 36 around without having to move the entire robot. The sensors may be located on the main body 31 , on a sensor portion 35 attached to the rotational portion, and/or on arms 37 or other portions attached to the robot. An arm 37 may be provided in order to extend the reach of the robot, for instance equipped with a gripper 38 at an end of the arm 37 for gripping external objects or manipulating objects in the vicinity of the robot. It should be understood that the gripper may be used for manipulating or operating on the robot itself. An antenna 39 may be provided for communicating with external devices or with a network.

Fig. 4 illustrates a hardware system according to the present invention, wherein a central processing unit 401 operates the reasoning layer activities, a dedicated processing unit 402 is used for handling reactive event handling, and one or several computational processors 403 are used for modelling activities. Memory units 404 may be used for storing operational software, sensor/actuator configurations and/or obtained data readings, events, time, tasks, algorithms, strategies, models, world models, and so on. Furthermore, the hardware system may comprise communication interfaces 405 for interfacing with different parts of the hardware system and/or with the external world (e.g. with some network).

The hardware system may also comprise different types of motors for driving movable 5 parts (propulsion of the robot, moving arms, sensors, cameras and so on), mechanical arrangements (e.g. wheels, arms, gripping tools, or sensors), power source(s) (e.g. power outlet from a power net, battery, solar, wind, combustion engine, and/or chemically based sources). The robot design is not limited to any specific example in the present invention, but any type and design may be used since the core of the invention resides in the control 10 architecture of the robot.

Sensors may be provided for measuring different physical parameters, like for instance, temperature, distance to objects around the robot, movement, humidity, distance, pressure, friction, physical appearance (using a camera), electric or magnetic

15 characteristics, and so on. The sensors may be utilized for obtaining physical data externally of the robot and/or internally of the robot, e.g. one sensor can be directed to measure electric voltage of an object external of the robot and another (or the same) sensor may be used for controlling the voltage of a battery used as power source of the robot; i.e. an internal measurement.

20

Fig. 5 illustrates a method of controlling a robot according to the present invention wherein:

50. a first task is transmitted to the reasoning layer in the robot control architecture;

51. the robot translates this first task to a measurable fitness algorithm;

25 52. a first task command is sent from the reasoning layer to the reactive layer in order to start the process of testing control commands in order to solve the given task;

53. the reasoning layer receives input from the reactive layer and from sensors in order to refine and/or develop strategies for solving the task;

54. the reasoning layer sends new commands relating to newly developed strategies to 30 the reactive layer;

55. the reactive layer operates commands and develop new commands using strategies from the reasoning layer and from sensor signals.

56. the process operates iteratively until the task is solved (57). In the above method the modelling layer is involved in the development of new commands and strategies as has been discussed earlier in this document.

All of the layers use various types of artificial intelligence algorithms in order to develop suitable control commands and strategies for the robot, including but not limited to evolutionary computing such as genetic programming.

Evolutionary computing will now be discussed briefly:

First an initial setup is created from a random selection of solutions or from a seed.

A fitness algorithm is developed that is used for measuring the success of a solution.

The solutions with high success rate are used to develop new setups by randomly or selectively (or a combination of randomly and selectively) choosing one of the solutions having high success rate and allowing it to evolve by mutation. Preferably the selection is not completely random and this may be achieved by randomly choosing a number of candidates (e.g. 4 candidates) and rank them in success order according to the fitness algorithm. The best (e.g. best 2) candidates are used to produce new candidates using recombination (crossover) of candidates, i.e. take a piece of each parent (mother and father) and combine them into a child process (compare DNA). The child may then be mutated. Both the parents and the child is then re-entered into the population and all are tested again.

This is repeated until the solution solves the problem it is set out to solve.

The control architecture according to the present invention may find applicability in a number of different applications as for example, but not limited to, lawn mowers, toys, advertisement robots, guide robots, farming robots (e.g. for handling insects or other vermin by mechanical and/or chemical removal), convoy robots (i.e. transporting something from one location to another location, e.g. by carrying, as trucks, or similar), guard robots (controlling premises), cleaning robots (i.e. robots that pick up things scattered or misplaced and ordering them at a correct location), domestic help robots, bomb disarming robots, fire rescue robots, search planes, exploratory devices (e.g. scientific spacecrafts, deep water vessels), and military devices (e.g. mine detection robots, mine disarming robots, spy planes, and reconnaissance robots).

It should be appreciated that the term layers is used in order to illustrate a hierarchical architecture but the layers operate as modules that may run in parallel with each other. These modules have different functionality and operate instructions as discussed previously in this document.

It should be noted that the word "comprising" does not exclude the presence of other elements or steps than those listed and the words "a" or "an" preceding an element do not exclude the presence of a plurality of such elements. It should further be noted that any reference signs do not limit the scope of the claims, and that several "means", "units" or "devices" may be represented by the same item of hardware, and that at least part of the invention may be implemented in either hardware or software.

The above mentioned and described embodiments are only given as examples and should not be limiting to the present invention. Other solutions, uses, objectives, and functions within the scope of the invention as claimed in the below described patent claims should be apparent for the person skilled in the art.

Claims

1. A self learning robot comprising:

- at least one processor (401 , 402, 403);

- at least one memory (404); - at least one communication interface for interacting with a task and supervision unit;

- at least one sensor interface (405); and

- at least one actuating interface for mechanically interacting with external environment; wherein the processor is arranged to operate a control architecture comprising a three layered model:

- a reasoning layer (1) arranged to provide strategies for completing the received task;

- a reactive layer (2) arranged to handle events subjected to the robot; and - a modelling layer (3) arranged to provide a testing ground for reactive and strategy fitness computations using input from the reasoning and reactive layer, wherein the layers communicate with each other, the reasoning layer develop strategies using stored data relating to command signals and sensor signals, and wherein the reactive layer develop and implement control commands using sensor and/or actuator signals together with in the memory stored command signals with related stored sensor and/or actuator signals.

2. The robot according to claim 1 , wherein at least one of the layers uses genetic programming algorithms in at least a portion of the involved processes.

3. The robot according to claim 1 , wherein the reasoning layer develops a fitness algorithm in order to measure success rate of issued commands.

4. The robot according to claim 1 , wherein the reactive layer uses a fitness algorithm for measuring success rate of commands.

5. The robot according to claim 1 , wherein the processor is arranged to operate a behaviour rules function providing boundaries in behaviour of the robot.

6. The robot according to claim 1 , wherein the processor is arranged to provide a fitness function from the reasoning layer to the reactive layer.

7. The robot according to claim 6, wherein the processor is arranged to breed control algorithms using the fitness function.

8. The robot according to claim 1 , wherein the processor is arranged to store in the memory relating to the modelling layer data indicative of sensor signals.

9. The robot according to claim 1 , wherein the processor is arranged to provide a three dimensional map of at least part of the surrounding environment using the modelling layer.

10. The robot according to claim 1 , wherein the three dimensional map also is provided with information about physical properties of objects located in the environment.

11. The robot according to claim 1 , wherein physical properties comprise at least one of shape, hardness, friction, robustness, colour, and temperature.

12. The robot according to claim 1 , further comprising a separate graphics processor handling graphics related calculations.

13. The robot according to claim 1 , wherein the actuating interface comprises at least one of a transportation device, a movable arm, and a gripping device.

14. The robot according to claim 1 , further comprising an antenna for wireless communication with external devices.

15. The robot according to claim 1 , wherein the sensor is arranged for measuring at least one of temperature, humidity, movement, pressure, friction, electrical characteristics, magnetic characteristics.

16. A control architecture for controlling a learning robot, comprising three layers: - a reasoning layer (1 ) arranged to provide strategies for completing a task;

- a reactive layer (2) arranged to handle events subjected to the robot; and

- a modelling layer (3) arranged to provide a testing ground for reactive and strategy fitness computations, wherein the reasoning layer develop strategies using past experiences and wherein the reactive layer develop control commands using sensor and/or actuator signals together with historic command signals.

17. A method of controlling a learning robot, comprising the following steps - receiving a task in a reasoning layer (1);

- developing a fitness algorithm appropriate for the task in the reasoning layer;

- developing a strategy for solving the task in the reasoning layer;

- sending the strategy to a reactive layer (2); - developing control commands in the reactive layer using the strategy;

- developing new strategies in the reasoning layer using past experience of sensor/actuator signals;

- sending the new strategies to the reactive layer for developing new control commands; - developing new control commands in the reactive layer using past experience comprising a database of stored control command with related sensor signals;

- testing developed strategies and control commands using a modelling layer (3).

18. The method according to claim 17, wherein at least one of the reasoning, modelling, and reactive layer operate a genetic programming algorithm in at least one process part of the layer operations.