CN111546327A - Method, apparatus and computer program for determining a motion or trajectory of a robot - Google Patents

Method, apparatus and computer program for determining a motion or trajectory of a robot Download PDF

Info

Publication number
CN111546327A
CN111546327A, CN202010076272.5A, CN202010076272A
Authority
CN
China
Prior art keywords
robot
machine learning
learning system
actual
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010076272.5A
Other languages
Chinese (zh)
Inventor
H.贝克
M.托德斯卡托
M.施皮斯
国萌
P.克斯佩尔
N.瓦尼克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Publication of CN111546327A publication Critical patent/CN111546327A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/0081Programme-controlled manipulators with master teach-in means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0217Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with energy consumption, time reduction or distance reduction criteria
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0268Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
    • G05D1/0274Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means using mapping information stored in a memory device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Manipulator (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)

Abstract

The invention relates to a method for determining an action of a robot (11) from a specifiable starting position and a specifiable target position. A subsequent position is selected from a plurality of pre-selected positions in dependence on the actual position(s) of the robot (11). The subsequent position is selected according to variables determined by means of a machine learning system. The action is then selected from a plurality of possible actions (a) that the robot (11) can perform, such that the subsequent position is reached directly starting from the actual position(s) when the robot (11) performs the selected action. The invention also relates to a computer program and a device for performing the method and to a machine-readable storage element in which the computer program is stored.

Description

Method, apparatus and computer program for determining a motion or trajectory of a robot
Technical Field
The invention relates to a method for determining the movement or trajectory of a robot in order to reach a predefinable target position. The invention also relates to a device and a computer program arranged to perform the method.
Background
Unpublished DE 102017217412.9 discloses a method for operating a robot control system comprising a machine learning system. The machine learning system determines a movement route of at least one object in the motion space of the robot from a map representing the motion space of the robot.
Hart et al., in their publication "A Formal Basis for the Heuristic Determination of Minimum Cost Paths", IEEE Transactions on Systems Science and Cybernetics 4.2 (1968), pages 100-107, present an optimal planner that finds the path with the lowest cost.
Cohen et al., for example, disclose a suboptimal planner (a focal A* search algorithm) in their publication "Anytime Focal Search with Applications", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 1434-1441 (2018).
Disclosure of Invention
In a first aspect, a method, in particular a computer-implemented method, is presented for determining an action of a robot from an actual position of the robot. To this end, a subsequent position is selected from a plurality of preselected adjacent positions (the focal set). The subsequent position is the adjacent position from the plurality of preselected adjacent positions that is assigned the smallest first variable relative to the other preselected adjacent positions. The first variables each characterize a first probability of whether the robot, in particular starting from a predefinable starting position and along its previous actual positions, moves via the actual position to the respective preselected adjacent position. Furthermore, a machine learning system is used, which is arranged to output a plurality of second probabilities as output variables. The second probabilities each characterize the probability that the robot, starting from the actual position, performs one of a plurality of possible actions. The machine learning system determines the output variables, and a first variable is assigned to each preselected adjacent position as a function of at least one output variable of the machine learning system. The action of the robot is selected from the plurality of possible actions such that, when the robot performs the selected action, it proceeds directly from the actual position to the subsequent position.
An action may be understood as an action performed by an actuator of the robot. Alternatively, an action may be understood as a maneuver of the robot performed by the robot.
An adjacent position is understood to be a position that can be reached directly by the robot starting from its actual position, i.e. a position that can be reached next after the execution of a single action. The actual position may be a measured or a calculated position.
The advantage of this method is that the subsequent position is selected from adjacent positions that have been preselected by means of a focal A* search algorithm, so that the robot follows a nearly optimal path: the path determined by the focal A* search algorithm is suboptimal with respect to a predeterminable (cost) criterion (e.g. time, energy consumption, shortest path, etc.) only within a guaranteed bound.
It is also advantageous that the output variables of the machine learning system are used as a heuristic, and that the machine learning system can learn this heuristic from training data. In addition, the machine learning system can learn a generalized heuristic through teaching. Another advantage is that the machine learning system outputs second probabilities characterizing the local behavior of the robot. This is advantageous because it has been recognized that the machine learning system can predict the local behavior of the robot particularly accurately, whereby a more reliable heuristic can be achieved.
It is further proposed that a total cost is determined for each possible adjacent position of the actual position and assigned to the respective adjacent position. The adjacent positions are entered into a first list (the open list), and the total cost is determined from a first cost and a second cost. The first cost characterizes the cost that has to be expended to reach the respective adjacent position from the specifiable starting position of the robot, while the second cost characterizes the cost that has to be expended to reach the specifiable target position of the robot from the respective adjacent position. The second cost is estimated such that it is always lower than the actual cost of reaching the target position from the respective adjacent position. The plurality of preselected adjacent positions (the focal set) are those adjacent positions in the first list whose total cost is less than the determined lowest total cost multiplied by a predeterminable factor.
This has the following advantage: the action or trajectory found from the preselected adjacent positions is guaranteed to be no worse than the best solution multiplied by the predeterminable factor (the focal factor).
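For illustration, the total cost and the focal-set condition described above can be sketched in a few lines of Python. This is a minimal sketch, not the patent's implementation; the use of the Euclidean distance as the admissible second cost is an assumption made for the example.

```python
import math

def total_cost(g_cost, position, target):
    # f(n) = g(n) + h(n): cost already spent to reach the position plus an
    # estimate of the remaining cost. The Euclidean distance never
    # overestimates the remaining cost on a grid with unit step costs, so it
    # satisfies the condition that the second cost stays below the actual cost.
    return g_cost + math.dist(position, target)

def in_focal_set(f_value, f_min, factor):
    # A position from the open list belongs to the focal set if its total
    # cost is below the lowest total cost multiplied by the predeterminable
    # factor (> 1); the path finally found is then at most `factor` times
    # as expensive as the optimal path.
    return f_value <= factor * f_min
```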
It is also proposed that the first probability is determined as a function of a further probability which characterizes whether the robot has reached the actual position, in particular starting from the predefinable starting position. This has the advantage that previous actions of the robot can be taken into account.
It is further proposed that, after a trajectory has been determined by means of the method of the first aspect, in particular from the determined actions for reaching the target position, the predeterminable factor is reduced by a predeterminable value, and the method is then executed anew in order to determine a further trajectory. If no further trajectory is found, the trajectory already determined is used. It is advantageous here that the further trajectory is guaranteed to be closer to the optimal trajectory with respect to the predeterminable (cost) criterion. For this purpose, the positions already examined, including the costs determined for these positions, can advantageously be reused.
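The repeated re-planning with a reduced factor can be summarized as a simple loop. The sketch below assumes a `plan(factor)` helper that returns a trajectory or `None`; it only illustrates the control flow, not the reuse of already examined positions.

```python
def anytime_plan(plan, initial_factor=2.0, step=0.2, min_factor=1.0):
    # Re-plan with successively smaller suboptimality factors and keep the
    # last trajectory that was actually found.
    best = None
    factor = initial_factor
    while factor >= min_factor:
        trajectory = plan(factor)
        if trajectory is None:
            break          # no further trajectory found: keep the previous one
        best = trajectory
        factor -= step     # reduce the predeterminable factor by a fixed value
    return best
```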
It is also proposed that actions are determined separately for a plurality of robots. The machine learning system may be a deep neural network that obtains as input variable a map with all actual positions of the robots. After a predeterminable layer of the deep neural network, the part of the map surrounding each actual position of a robot, as processed by the layers up to the predeterminable layer, is used as input variable for the layers following the predeterminable layer. By extracting a part around each robot position, the machine learning system is advantageously invariant to the number of robots. Furthermore, the machine learning system does not have to be taught separately for each robot. It is also advantageous that the output variables of the machine learning system are consistent across all robots, since these output variables are determined by the same machine learning system. Each part has a predefined size, preferably a square shape.
It is also proposed that, in addition to the actual position of each robot, the positions of the other robots are stored. Furthermore, the actual positions of other movable objects (e.g. people or vehicles in the respective robot environment) may be stored in addition to the actual position. It should be noted that the actual position enriched with additional information about other movable objects may also be referred to as the actual state, i.e. the actual state comprises at least the actual position of the robot. The actions of the robot are then also determined from the additionally stored positions. This has the advantage that collisions with other movable objects can be avoided.
It is also proposed that the subsequent positions of the robots are each determined as a function of the actual state, i.e. at least from the actual positions of the robots contained in the actual state. This has the advantage that the robots can be operated jointly, since the actual positions of the robots are jointly determined from the actual state.
It is also proposed to generate the training data by means of an optimal planner, in particular an A* search algorithm, for determining the trajectory from the starting position and the target position, the optimal planner being applied to predeterminable problem instances. The optimal planner can thus be imitated, so that after the machine learning system has been taught, the optimal planner can be replaced by the taught machine learning system. Because the optimal planner requires significant computational power, it cannot readily be used in mobile applications.
It is also proposed to determine control variables from the determined motion or trajectory for the robot.
The determined control variables may be used by a control unit such that the control unit controls the actuators of the robot in accordance with the control variables.
In another aspect, a computer program is presented. The computer program is arranged to perform one of the methods described above. The computer program comprises instructions which, when the computer program is run on a computer, cause the computer to perform the method of the first aspect with all the steps. A machine-readable storage module is also presented, on which the computer program is stored. Furthermore, an apparatus is proposed, which is arranged to perform the method of the first aspect.
Drawings
Embodiments of the above-described aspects are illustrated in the drawings and are explained in more detail in the following description.
FIG. 1 shows a schematic diagram of an information flow diagram of a trajectory planning system;
FIG. 2 shows a schematic diagram of the structure of a machine learning system of the trajectory planning system;
FIG. 3 shows a schematic diagram of a flow chart of an embodiment of a method for determining a motion or trajectory of a robot;
FIG. 4 shows a schematic diagram of a flow chart for teaching an embodiment of a machine learning system;
FIG. 5 shows a schematic diagram of a flow chart of an embodiment for determining a trajectory using a search algorithm;
fig. 6 shows a schematic diagram of an embodiment of an apparatus that may be used to teach a machine learning system.
Detailed Description
Fig. 1 shows a schematic representation of an information flow diagram 01 of a trajectory planning system 10. A map is provided as an input variable of the trajectory planning system 10, which determines at least one movement or trajectory T of the robot 11 from the map, the actual position s of the robot 11 and the specifiable target position Z. The map in fig. 1 schematically shows the environment of the robot 11 with objects displayed as black boxes on the map. The motion or trajectory T is then provided to the robot 11, which may use the motion or trajectory as a control variable. Advantageously, the trajectory planning system 10 is additionally provided with a specifiable starting position of the robot 11, which is to be taken into account in determining the movement a or the trajectory T.
The actual position s and the target position Z and, if necessary, the starting position are also entered on the map. In the embodiment of fig. 1, all possible actions a that the robot 11 can perform from the actual position s are entered in the map by way of example. These may be, for example, a forward, leftward or rightward movement, owing to the space limitation caused by the object entered on the map above the position s of the robot 11.
In another embodiment of the trajectory planning system 10, the trajectory planning system 10 is arranged to determine, from a plurality of actual positions, the actions or trajectories of a plurality of robots. For this purpose, the respective actual position a2 and the associated target position A3 of each robot are preferably entered on the map.
The trajectory planning system 10, in particular a trajectory planning system 10 arranged to perform the method according to fig. 3 below, comprises at least one machine learning system (not shown in fig. 1) which determines at least one output variable from the provided map. The machine learning system is explained in more detail in fig. 2 below. Furthermore, the trajectory planning system 10 comprises a computing unit 101 on which a search algorithm, advantageously a focal A* search algorithm, is executed. The search algorithm is used to determine a subsequent position from a plurality of possible, in particular preselected, adjacent positions according to a heuristic based on at least one output variable of the machine learning system. From the subsequent positions, the trajectory planning system 10 determines the action or trajectory T. It should be noted that the machine learning system may be implemented both in software and in hardware.
The trajectory planning system 10 also has a machine-readable storage element 102, on which instructions are stored that, when executed by the computing unit 101, determine the action or trajectory.
The trajectory planning system 10 may be used, for example, for Automated Valet Parking (AVP) using mobile agents. In this case, the trajectory planning system 10 determines the action or trajectory for the mobile agent so that the mobile agent can pick up the vehicle and guide it to a free parking space (the target position).
The trajectory planning system 10 can alternatively be used for manufacturing robots, in which case the movement of their robot arm is determined, for example, from their actual position and their target position, or for route planning by means of a navigation system.
Fig. 1 also shows a schematic view of the robot 11; in this embodiment the robot 11 is an at least partially autonomous vehicle. In another embodiment, the robot may be a maintenance robot, an assembly robot or a stationary production robot, or alternatively an autonomous flying object, such as a drone.
Fig. 2 shows a schematic diagram of the machine learning system 20, which is represented here by a deep neural network. The machine learning system 20 obtains the map or a part of the map as input variable, which is processed in a first part 21 of the deep neural network by means of a plurality of series-connected convolutional layers.
In the second part 22 of the neural network, the output variables of the first part of the neural network are used in succession, wherein the parts of the output variables of the first part of the neural network which respectively surround the actual position of the robot are provided as input variables of the second part 22 of the neural network.
The second part of the neural network has two different (signal-processing) paths, each formed by fully connected layers. At the output of one path there is, for each robot i, a probability that characterizes the probability that the i-th robot 11, starting from its actual position s, performs one of the plurality of possible actions A. At the output of the other path there is a future cost, which characterizes the cost that each robot 11 must spend to reach the respectively predefinable target position Z starting from the respectively preselected adjacent position.
If only one robot is considered, correspondingly only the part surrounding the actual position of this one robot is provided as input variable to the second part of the neural network.
As shown in fig. 2, the convolutional layers may each have 64 different filters, each with a dimension of 3 × 3. The neural network may have skip (bridge) connections that provide the output variable of one of the layers of the first part of the neural network, or the input variable (the map) of the machine learning system, to a later layer by skipping at least one subsequent convolutional layer.
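A minimal sketch of a network with this structure is given below: a shared convolutional trunk, extraction of a part of the processed map around each robot position, and two fully connected paths that output the action probabilities and the future cost. The layer sizes, the patch size and the cropping logic are assumptions made for this sketch and are not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlannerNet(nn.Module):
    def __init__(self, num_actions=5, patch=9):
        super().__init__()
        self.patch = patch
        # First part: series-connected convolutional layers, shared for all robots.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        feat = 64 * patch * patch
        # Second part: two fully connected paths per robot.
        self.action_head = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(),
                                         nn.Linear(128, num_actions))
        self.cost_head = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(),
                                       nn.Linear(128, 1))

    def forward(self, grid, positions):
        # grid: occupancy map of shape (1, 1, H, W); positions: list of (row, col).
        features = self.trunk(grid)
        half = self.patch // 2
        padded = F.pad(features, (half, half, half, half))
        action_probs, future_costs = [], []
        for row, col in positions:
            # Part of the processed map surrounding the respective robot position.
            crop = padded[:, :, row:row + self.patch, col:col + self.patch]
            flat = crop.flatten(1)
            action_probs.append(F.softmax(self.action_head(flat), dim=-1))
            future_costs.append(self.cost_head(flat))
        return action_probs, future_costs
```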
It is also conceivable that the path is split into two paths in the first part 21 of the neural network and that the first part 21 of the neural network outputs two output variables. The paths of the second part 22 of the neural network are then assigned to the paths of the first part 21, respectively, and a part of the output variables of the respective paths of the first part 21 of the neural network is obtained as input variables, respectively.
In an alternative embodiment of the machine learning system 20, the machine learning system may also be arranged to determine only one of the two output variables of the second part of the neural network, for example by disabling one of the two paths of the second part of the neural network.
Fig. 3 shows a schematic illustration of a method 30 for determining an action a or a trajectory T, which is performed, for example, by the trajectory planning system 10.
The method starts with step S31. In this step, a map is provided as input variable to the machine learning system 20 of fig. 2. The machine learning system 20 determines its output variables from the provided map. In step S31, the machine learning system 20 is optionally taught with training data beforehand, and the map is then provided as input variable to the taught machine learning system 20. It should be noted that the teaching of the machine learning system 20 is explained in more detail in fig. 4 below.
In a next step S32, a path, in particular a trajectory, from a specifiable starting position (in particular the actual position of the robot) to a specifiable target position is determined by means of a search algorithm, advantageously by means of a focal A* search algorithm. Here, the search algorithm decides, based on at least one of the output variables of the machine learning system determined in step S31, which subsequent positions the robot should optimally take. In this process, one of the output variables of the machine learning system is used as a heuristic for deciding the respective subsequent position. This step is explained in detail in fig. 5. A path from the starting position to the target position is determined from the determined subsequent positions. An action sequence can then be determined from the path, so that when the robot executes the action sequence, it moves along the determined path to the specifiable target position. From the determined path or the action sequence, a trajectory T of the robot may be determined.
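The interplay of the open list, the focal set and the learned heuristic in step S32 can be summarized in a compact loop. The sketch below is a simplified illustration under several assumptions: positions are hashable, `f_cost` returns the total cost of a position, `neighbors` returns the directly reachable adjacent positions, and `learned_heuristic` is the value derived from the machine learning system (smaller is better); re-parenting and cost updates of already open positions are omitted.

```python
def plan_path(start, goal, neighbors, f_cost, learned_heuristic, factor):
    # Simplified focal-search loop: from the focal set of the open list,
    # expand the position with the smallest learned-heuristic value until
    # the target position is reached.
    open_list, closed, parent = {start: f_cost(start)}, set(), {}
    while open_list:
        f_min = min(open_list.values())
        focal = [n for n, f in open_list.items() if f <= factor * f_min]
        current = min(focal, key=learned_heuristic)
        if current == goal:
            return reconstruct(parent, goal)
        del open_list[current]
        closed.add(current)
        for n in neighbors(current):
            if n not in closed and n not in open_list:
                parent[n] = current
                open_list[n] = f_cost(n)
    return None                      # no path found

def reconstruct(parent, node):
    # Follow the stored parent positions back from the target to the start.
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return list(reversed(path))
```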
After completion of step S32, step S33 is performed. Here, the robot 11 is operated according to the action or trajectory T determined in step S32. Preferably, the robot is actuated only with the first action or the first part of the trajectory, and step S32 is then executed again in order to be able to react to a changing environment if necessary.
Fig. 4 shows a schematic diagram of a method 40 for teaching the machine learning system 20.
The method begins with the generation of training data in step S41. For this purpose, a number of problem instances are provided. The problem instances can be, for example, different maps of different environments in which the robot is to be moved from a specifiable starting position to a specifiable target position. The movement of the robot should be optimal with respect to a predeterminable cost criterion. The cost criterion can be, for example, time, energy consumption and/or distance covered.
Next, in step S42, the best paths from the respective starting positions to the associated target positions are determined with respect to the cost criterion by means of an optimal planner, for example an A* search algorithm. Advantageously, problem instances for which the optimal planner cannot find any path are discarded.
Position-action pairs are formed from the best paths from step S42; their relationship to the respectively assigned environment (the respectively assigned map section) is learned by the machine learning system 20. That is to say, the machine learning system learns a rule (a policy) in order to be able to decide, for a position s and given the environment of this position s provided by means of at least one map section, which action a of the robot should optimally be selected at this position s. A cost function can be derived from the cost that the robot has to spend to reach the target position along the remaining best path starting from its actual position. The position-action pairs and/or the cost function, together with the respectively associated problem instances, are combined into training data.
In a subsequent step S43, the machine learning system 20 is taught using the training data from step S42. The machine learning system 20 obtains the map of a problem instance as input variable and is taught such that it determines its output variables (the action probabilities and the future costs) from its input variables as well as the actual position of the robot and the starting and target positions of the robot. For this teaching, the position-action pairs and/or costs derived from the trajectories determined in step S42 are used. In the teaching, the parameters of the machine learning system 20 are set such that the output variables of the machine learning system match the corresponding ideal output variables from the training data. The parameter changes required for this can be determined by means of a gradient descent method using a difference function (loss function) between the output variables of the machine learning system and the output variables of the training data. To teach the action-probability output variables, cross entropy is preferably used as the difference function; to teach the future-cost output variables, a norm is preferably used as the difference function.
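A single teaching step consistent with this description is sketched below, assuming a model such as the network sketch after fig. 2 that returns per-robot action probabilities and future costs; cross entropy is used for the action probabilities and a squared-error loss for the future costs. The loss weighting and the optimizer are assumptions.

```python
import torch
import torch.nn.functional as F

def teaching_step(model, optimizer, batch):
    # batch: map tensor, robot positions, ideal actions and cost-to-go labels
    # taken from the training data generated with the optimal planner.
    action_probs, future_costs = model(batch["map"], batch["positions"])
    log_probs = torch.log(torch.cat(action_probs) + 1e-8)
    loss_actions = F.nll_loss(log_probs, batch["actions"])        # cross entropy
    loss_costs = F.mse_loss(torch.cat(future_costs).squeeze(-1),
                            batch["cost_to_go"])                  # squared-error norm
    loss = loss_actions + loss_costs                              # equal weighting: assumption
    optimizer.zero_grad()
    loss.backward()                                               # gradient descent step
    optimizer.step()
    return loss.item()
```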
In an optional step S44, the machine learning system 20 is re-taught, for example if a new problem instance has been defined or if the output variables of the machine learning system are not sufficiently accurate after completion of step S42.
Fig. 5 shows a schematic illustration of a method 50 for determining an action a or a trajectory T, in particular with a search algorithm.
The method starts with step S51. In this step, the actual position s_t of the robot is determined, for example by measurement or by reading it from a provided map.
Next, in step S52, the actual position s_t is entered into an open list, in particular the open list as used in the A* search algorithm.
In step S53, a total cost f(n) is determined for each of the adjacent positions entered in the open list, preferably according to the A* search algorithm. The total cost f(n) can be composed of a first cost g(n) and a second cost h(n). The first cost g(n) characterizes the cost that the robot 11 has already spent to reach the respective adjacent position from the predefinable starting position via the actual position s_t. For this purpose, a cost is preferably assigned to each preceding action of the robot from the predefinable starting position to the respective adjacent position, and these costs add up to the first cost g(n). The second cost h(n) characterizes the cost that the robot must spend to reach the specifiable target position from the respective adjacent position. The second cost h(n) is preferably estimated by means of the Euclidean distance from the respective adjacent position to the predeterminable target position. Alternatively, the second cost can be determined by means of a further heuristic, which must obey the condition that it underestimates the cost of reaching the predeterminable target position from the respective adjacent position.
After the total cost has been determined for each of the adjacent positions from the list, the lowest total cost min f(n) is determined.
In a following step S54, all adjacent positions (the focal set) whose total cost is less than the minimum total cost min f(n) multiplied by a predeterminable factor are selected from the open list. The factor is preferably greater than 1.
Next, in step S55, a further variable is determined for each adjacent position (of the focal set) selected in step S54. Preferably, as in the focal A* search algorithm, a further heuristic h_F is used to determine these further variables; reference is made to the documents cited above for this purpose.
There are two possibilities for establishing the further heuristic h_F: as a first possibility, the cost function determined by means of the machine learning system 20 can be used as the further heuristic h_F.
Additionally or alternatively, the path probability P is used as the further heuristic h_F. The path probability P characterizes the probability that the robot 11, starting from the predefinable starting position, in particular along its previous positions, moves via the actual position s_t to the respective preselected adjacent position.
The path probability can be defined by an equation (1) as a product over the positions k along the path (with a total of T actions), in which each factor combines an occurrence probability, characterizing the probability that the robot passes the k-th position on its path, with the probability that the robot performs the respective action at the k-th position. This expression can be rewritten as an equation (2) by means of the probabilities determined by the machine learning system 20. To avoid distortion at the beginning of the path, equation (2) can be rewritten further as an equation (3), where |A| denotes the number of possible actions that the robot can perform and t_k denotes the actual time points.
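The equations (1) to (3) themselves are not reproduced in this text, so the sketch below only illustrates the general idea described above, as an assumption rather than the patent's exact formula: the probabilities that the machine learning system assigns to the actions actually taken along the path to a candidate position are accumulated, and the product is normalized over the number of steps t_k and scaled by |A| so that positions early on a path are not systematically favored or penalized.

```python
import math

def path_probability_heuristic(action_probs_along_path, num_actions):
    # action_probs_along_path: for each previous position on the way to the
    # candidate position, the probability (from the machine learning system)
    # of the action that was actually taken there.
    t_k = len(action_probs_along_path)
    if t_k == 0:
        return 0.0
    # Geometric mean of |A| * p(a_k | s_k): equal to 1 for a uniform policy,
    # so short path prefixes do not distort the comparison.
    log_sum = sum(math.log(num_actions * p) for p in action_probs_along_path)
    score = math.exp(log_sum / t_k)
    # The search selects the adjacent position with the smallest value, so a
    # high path probability is mapped to a small heuristic value.
    return -score
```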
After the further variable has been determined for each selected adjacent position by means of the further heuristic h_F, the adjacent position that has been assigned the smallest further variable is selected. This adjacent position with the smallest further variable is then the subsequent position that the robot should steer towards in order to reach the predefinable target position optimally with respect to the cost criterion. This adjacent position is then entered into a closed list, in particular the closed list used by the A* search algorithm, and removed from the open list.
In a further embodiment, steps S52 to S55 are repeated a plurality of times until the predefinable target position is entered into the closed list and deleted from the open list.
It should be noted that further paths can be determined after a path has been determined, for which purpose the method 50 just described is executed in the same way, but with the predeterminable factor reduced by the predeterminable value.
Fig. 6 shows a schematic diagram of an apparatus 60 for teaching the machine learning system 20, in particular for performing the teaching steps of the method 40 of fig. 4. The apparatus 60 comprises an optimal planner 61, the machine learning system 20 and a difference module 62. The difference module 62 is arranged to determine, by means of the difference function, a difference between the output variables of the optimal planner 61 and the output variables y of the machine learning system 20, and to determine from this difference a change of the parameters of the machine learning system 20. The parameters of the machine learning system 20 are stored in a database P and are adapted according to the parameter change determined by the difference module 62.
The apparatus 60 may have a machine-readable storage element 65 on which the method 40 is stored, and a computing unit 64 for executing the method 40.

Claims (11)

1. A method for determining an action (a) of a robot (11) from an actual position (s_t) of the robot (11),
wherein a subsequent position is selected from a plurality of preselected adjacent positions (focal set) of the actual position (s_t), wherein each adjacent position is assigned a first variable (h_F),
wherein the subsequent position is the one of the preselected adjacent positions that is assigned the smallest first variable (h_F) relative to the other preselected adjacent positions,
wherein the first variables (h_F) each characterize a first probability (P) of whether the robot (11) moves from the actual position (s_t) to the respective preselected adjacent position,
wherein a machine learning system (20) is arranged to output a plurality of second probabilities as output variables, each second probability characterizing the probability that the robot (11), starting from the actual position (s_t), performs a respective one of a plurality of possible actions (A),
wherein the machine learning system (20) determines the output variables,
wherein the first variables (h_F) are determined from at least one of the output variables of the machine learning system (20) and are assigned to the respective preselected adjacent positions,
wherein an action (a) is selected from the plurality of possible actions (A) such that, when the robot (11) performs the selected action (a), the subsequent position is reached directly starting from the actual position (s_t).
2. The method according to claim 1, wherein a total cost (f(n)) is determined for each possible adjacent position of the actual position (s_t) and the total costs are assigned to the respective adjacent positions,
wherein the adjacent positions are entered into a first list (open list),
wherein the total cost (f(n)) is determined from a first cost (g(n)) and a second cost (h(n)),
wherein the first costs (g(n)) each characterize the cost that has to be expended to reach the respective adjacent position from a predeterminable starting position of the robot (11), and the second costs (h(n)) each characterize the cost that has to be expended to reach a predeterminable target position (Z) of the robot (11) from the respective adjacent position,
wherein the second cost (h(n)) is estimated such that it is always lower than the actual cost of reaching the target position (Z) from the respective adjacent position,
wherein the plurality of preselected adjacent positions (focal set) comprises those adjacent positions of the first list (open list) whose total cost (f(n)) is below the determined lowest total cost (min f(n)) multiplied by a predeterminable factor.
3. The method according to claim 1 or 2, wherein the machine learning system (20) is arranged to output the output variables as a function of at least one provided portion of a map of the environment of the robot (11),
wherein the machine learning system (20) determines the output variables from the map portion,
wherein the first probability (P) is determined from at least that one of the plurality of second probabilities which characterizes the probability that the robot, starting from the actual position (s_t), performs the action that the robot has to perform in order to move directly from the actual position (s_t) to the respective adjacent position.
4. The method according to any one of the preceding claims, wherein, after the subsequent position has been selected, the subsequent position is entered into a second list (closed list) and the actual position (s_t) is set equal to the subsequent position,
wherein the method is repeated a plurality of times until the actual position (s_t) corresponds to the predefinable target position (Z),
wherein at the start of the method the actual position (s_t) corresponds to the predefinable starting position,
wherein adjacent positions of previous actual positions remain entered in the first list, in particular only adjacent positions that have been selected as subsequent positions are deleted from the first list,
wherein the adjacent positions of the first list are each assigned the previous actual position from which the respective adjacent position can be directly reached,
wherein, if the selected subsequent position is an adjacent position that is not an adjacent position of the actual position (s_t), the actual position (s_t) is set equal to the actual position assigned to that adjacent position,
wherein the actions that have to be performed directly in succession are combined into an action sequence, such that the robot reaches the predeterminable target position along the positions from the second list,
wherein a trajectory (T) of the robot is determined from the action sequence.
5. The method according to claim 4 and claim 2, wherein, after the trajectory has been determined, the predeterminable factor is reduced by a predeterminable value, and wherein the method is executed anew in order to determine a further trajectory.
6. The method according to any of the preceding claims, wherein the action (a) is determined separately for a plurality of robots (11),
wherein the machine learning system (20) is a deep neural network which obtains as input variables a map with all actual positions of the robot (11),
wherein, after a predeterminable layer of the deep neural network, the part of the map surrounding each actual position of the robot (11) is used as an input variable for a subsequent layer, in particular a directly subsequent layer, of the predeterminable layer.
7. The method according to any one of the preceding claims, wherein training data are generated by means of an optimal planner, in particular an A* search algorithm, for determining a trajectory from the starting position and the target position, the optimal planner being applied to predeterminable problem instances,
wherein the machine learning system (20) is taught using the generated training data such that the machine learning system determines the decisions of the optimal planner from the actual position and at least the portion of the map and outputs them as output variables.
8. The method according to any one of the preceding claims, wherein the first variables each characterize the future cost that the robot (11) must spend to reach the predefinable target position (Z) starting from the respective preselected adjacent position,
wherein the machine learning system (20) is arranged to output the future costs as output variables, and
wherein the output variables of the machine learning system (20) are assigned as first variables (h_F) to the respective preselected adjacent positions,
wherein the machine learning system is taught using the generated training data such that it estimates, from the actual position, the future cost along the trajectory determined by means of the optimal planner up to the predeterminable target position.
9. A computer program comprising instructions arranged, when executed by a computer, to cause the computer to perform the method according to any preceding claim.
10. A machine readable storage element (102, 65) having stored thereon a computer program according to claim 9.
11. An apparatus (10, 60) arranged to perform the method according to any of the preceding claims 1 to 8.
CN202010076272.5A 2019-01-28 2020-01-23 Method, apparatus and computer program for determining a motion or trajectory of a robot Pending CN111546327A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102019201045.8A DE102019201045B4 (en) 2019-01-28 2019-01-28 Method, device and computer program for determining an action or trajectory of a robot
DE102019201045.8 2019-01-28

Publications (1)

Publication Number Publication Date
CN111546327A true CN111546327A (en) 2020-08-18

Family

ID=71524198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010076272.5A Pending CN111546327A (en) 2019-01-28 2020-01-23 Method, apparatus and computer program for determining a motion or trajectory of a robot

Country Status (2)

Country Link
CN (1) CN111546327A (en)
DE (1) DE102019201045B4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113561175A (en) * 2021-07-16 2021-10-29 珠海格力智能装备有限公司 Path planning method and device of mechanical arm, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8204623B1 (en) * 2009-02-13 2012-06-19 Hrl Laboratories, Llc Planning approach for obstacle avoidance in complex environment using articulated redundant robot arm
CN104010774A (en) * 2011-09-15 2014-08-27 康富真信息技术股份有限公司 System and method for the automatic generation of robot programs
CN106457565A (en) * 2014-06-03 2017-02-22 阿蒂迈兹机器人技术有限公司 Method and system for programming a robot
CN108292139A (en) * 2015-12-02 2018-07-17 高通股份有限公司 Map is carried out at the same time by robot to draw and plan
WO2018143003A1 (en) * 2017-01-31 2018-08-09 株式会社安川電機 Robot path-generating device and robot system
DE202018104373U1 (en) * 2018-07-30 2018-08-30 Robert Bosch Gmbh Apparatus adapted to operate a machine learning system
CN109146082A (en) * 2017-06-27 2019-01-04 发那科株式会社 Machine learning device, robot control system and machine learning method
US20190025841A1 (en) * 2017-07-21 2019-01-24 Uber Technologies, Inc. Machine Learning for Predicting Locations of Objects Perceived by Autonomous Vehicles

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256982B2 (en) * 2014-07-18 2022-02-22 University Of Southern California Noise-enhanced convolutional neural networks
US9645577B1 (en) * 2016-03-23 2017-05-09 nuTonomy Inc. Facilitating vehicle driving and self-driving
DE112017004414T5 (en) * 2016-09-02 2019-05-16 Groove X, Inc. AUTONOMOUS ROBOT, SERVER AND BEHAVIOR CONTROL PROGRAM
DE202017106506U1 (en) * 2016-11-15 2018-04-03 Google Llc Device for deep machine learning to robot grip
DE102017217412A1 (en) * 2017-09-29 2019-04-04 Robert Bosch Gmbh Method, apparatus and computer program for operating a robot control system
DE102017223717B4 (en) * 2017-12-22 2019-07-18 Robert Bosch Gmbh Method for operating a robot in a multi-agent system, robot and multi-agent system

Also Published As

Publication number Publication date
DE102019201045B4 (en) 2020-11-26
DE102019201045A1 (en) 2020-07-30

Similar Documents

Publication Publication Date Title
Liu et al. A lifelong learning approach to mobile robot navigation
Bency et al. Neural path planning: Fixed time, near-optimal path generation via oracle imitation
US11092965B2 (en) Method and device for driving dynamics control for a transportation vehicle
CN109597425B (en) Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning
CN109940614B (en) Mechanical arm multi-scene rapid motion planning method integrating memory mechanism
CN114371711B (en) Robot formation obstacle avoidance path planning method
KR20240004350A (en) Method and system for robot navigation in unknown environments
CN114460943A (en) Self-adaptive target navigation method and system for service robot
Dharmasiri et al. Novel implementation of multiple automated ground vehicles traffic real time control algorithm for warehouse operations: djikstra approach
CN116460843A (en) Multi-robot collaborative grabbing method and system based on meta heuristic algorithm
CN109764876B (en) Multi-mode fusion positioning method of unmanned platform
CN111546327A (en) Method, apparatus and computer program for determining a motion or trajectory of a robot
Han et al. Path regeneration decisions in a dynamic environment
Qiu Multi-agent navigation based on deep reinforcement learning and traditional pathfinding algorithm
Siddarth et al. Path planning for mobile robots using deep learning architectures
CN111984000A (en) Method and device for automatically influencing an actuator
Chekroun et al. MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly
Bansal et al. Control and safety of autonomous vehicles with learning-enabled components
Cody et al. Applying learning systems theory to model cognitive unmanned aerial vehicles
CN115857323A (en) Apparatus and method for controlling agent
CN115081612A (en) Apparatus and method to improve robot strategy learning
CN115019275A (en) Heuristic determination and model training methods, electronic device, and computer storage medium
Xie et al. Learning with stochastic guidance for navigation
JP2018147103A (en) Model learning device, controlled variable calculation device, and program
Bahrpeyma et al. Application of Reinforcement Learning to UR10 Positioning for Prioritized Multi-Step Inspection in NVIDIA Omniverse

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination