CN111546327A - Method, apparatus and computer program for determining a motion or trajectory of a robot - Google Patents

Method, apparatus and computer program for determining a motion or trajectory of a robot Download PDF

Info

Publication number
CN111546327A
CN111546327A, CN202010076272.5A, CN202010076272A
Authority
CN
China
Prior art keywords
robot
machine learning
learning system
actual
cost
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010076272.5A
Other languages
Chinese (zh)
Inventor
H.贝克
M.托德斯卡托
M.施皮斯
国萌
P.克斯佩尔
N.瓦尼克
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Robert Bosch GmbH filed Critical Robert Bosch GmbH
Publication of CN111546327A publication Critical patent/CN111546327A/en
Pending legal-status Critical Current

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/0081Programme-controlled manipulators with master teach-in means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0217Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with energy consumption, time reduction or distance reduction criteria
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0268Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
    • G05D1/0274Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means using mapping information stored in a memory device
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Manipulator (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)

Abstract

The invention relates to a method for determining an action of a robot (11) from a specifiable starting position and a specifiable target position. A subsequent position is selected from a plurality of pre-selected positions in dependence on the actual position(s) of the robot (11). The subsequent position is selected according to variables determined by means of a machine learning system. The action is then selected from a plurality of possible actions (a) that the robot (11) can perform, such that the subsequent position is reached directly starting from the actual position(s) when the robot (11) performs the selected action. The invention also relates to a computer program and a device for performing the method and to a machine-readable storage element in which the computer program is stored.

Description

Method, apparatus and computer program for determining a motion or trajectory of a robot
Technical Field
The invention relates to a method for determining the movement or trajectory of a robot in order to reach a predefinable target position. The invention also relates to a device and a computer program arranged to perform the method.
Background
Unpublished DE 102017217412.9 discloses a method for operating a robot control system comprising a machine learning system. The machine learning system determines a movement route of at least one object in the motion space of the robot from a map representing the motion space of the robot.
Hart et al., in their publication "A Formal Basis for the Heuristic Determination of Minimum Cost Paths", IEEE Transactions on Systems Science and Cybernetics 4.2 (1968), pages 100-107, present an optimal planner that finds the path with the lowest cost.
Cohen et al., for example, disclose a suboptimal planner (a focal A* search algorithm) in their publication "Anytime Focal Search with Applications", Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI-18, pages 1434-1441 (2018).
Disclosure of Invention
In a first aspect, a method, in particular a computer-implemented method, is presented for determining an action of a robot from an actual position of the robot. To this end, a subsequent position is selected from a plurality of preselected adjacent positions (the focal set). The subsequent position is the adjacent position from the plurality of preselected adjacent positions that is assigned the smallest first variable relative to the other preselected adjacent positions. The first variables each characterize a first probability of whether the robot, in particular starting from a predefinable starting position and along its previous actual positions, moves via the actual position to the respective preselected adjacent position. Furthermore, a machine learning system is used, which is arranged to output a plurality of second probabilities as output variables. The second probabilities each characterize the probability that the robot, starting from the actual position, performs one of a plurality of possible actions. The machine learning system determines the output variables, and a first variable is assigned to each preselected adjacent position as a function of at least one output variable of the machine learning system. The action of the robot is selected from the plurality of possible actions such that, when the robot performs the selected action, it proceeds directly from the actual position to the subsequent position.
An action may be understood as an action performed by an actuator of the robot. Alternatively, an action may be understood as a maneuver of the robot performed by the robot.
An adjacent position is understood to be a position that can be reached directly by the robot starting from its actual position, i.e. a position that can be reached next after the execution of a single action. The actual position may be a measured or a calculated position.
The advantage of this method is that the subsequent position is selected from adjacent positions that have been preselected by means of a focal A* search algorithm, so that the robot follows a nearly optimal path: the path determined by the focal A* search algorithm is suboptimal with respect to a predeterminable (cost) criterion (e.g. time, energy consumption, shortest path, etc.) only within a guaranteed bound.
It is also advantageous that the output variables of the machine learning system are used as a heuristic, and that the machine learning system can learn this heuristic from training data. In addition, the machine learning system can learn a generalized heuristic through teaching. Another advantage is that the machine learning system outputs second probabilities characterizing the local behavior of the robot. This is advantageous because it has been recognized that the machine learning system can predict the local behavior of the robot particularly accurately, whereby a more reliable heuristic can be achieved.
It is further proposed that a total cost is determined for each possible adjacent position of the actual position and assigned to the respective adjacent position. The adjacent positions are entered into a first list (the open list), and the total cost is determined from a first cost and a second cost. The first cost characterizes the cost that has to be expended to reach the respective adjacent position from the specifiable starting position of the robot, while the second cost characterizes the cost that has to be expended to reach the specifiable target position of the robot from the respective adjacent position. The second cost is estimated such that it is always lower than the actual cost of reaching the target position from the respective adjacent position. The plurality of preselected adjacent positions (the focal set) are those adjacent positions in the first list whose total cost is less than the determined lowest total cost multiplied by a predeterminable factor.
This has the following advantage: the action or trajectory found from the preselected adjacent positions is guaranteed to be no worse than the best solution multiplied by the predeterminable factor (the focal factor).
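For illustration, the total cost and the focal-set condition described above can be sketched in a few lines of Python. This is a minimal sketch, not the patent's implementation; the use of the Euclidean distance as the admissible second cost is an assumption made for the example.

```python
import math

def total_cost(g_cost, position, target):
    # f(n) = g(n) + h(n): cost already spent to reach the position plus an
    # estimate of the remaining cost. The Euclidean distance never
    # overestimates the remaining cost on a grid with unit step costs, so it
    # satisfies the condition that the second cost stays below the actual cost.
    return g_cost + math.dist(position, target)

def in_focal_set(f_value, f_min, factor):
    # A position from the open list belongs to the focal set if its total
    # cost is below the lowest total cost multiplied by the predeterminable
    # factor (> 1); the path finally found is then at most `factor` times
    # as expensive as the optimal path.
    return f_value <= factor * f_min
```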
It is also proposed that the first probability is determined as a function of a further probability which characterizes whether the robot has reached the actual position, in particular starting from the predefinable starting position. This has the advantage that previous actions of the robot can be taken into account.
It is further proposed that, after a trajectory has been determined by means of the method of the first aspect, in particular from the determined actions for reaching the target position, the predeterminable factor is reduced by a predeterminable value, and the method is then executed anew in order to determine a further trajectory. If no further trajectory is found, the trajectory already determined is used. It is advantageous here that the further trajectory is guaranteed to be closer to the optimal trajectory with respect to the predeterminable (cost) criterion. For this purpose, the positions already examined, including the costs determined for these positions, can advantageously be reused.
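The repeated re-planning with a reduced factor can be summarized as a simple loop. The sketch below assumes a `plan(factor)` helper that returns a trajectory or `None`; it only illustrates the control flow, not the reuse of already examined positions.

```python
def anytime_plan(plan, initial_factor=2.0, step=0.2, min_factor=1.0):
    # Re-plan with successively smaller suboptimality factors and keep the
    # last trajectory that was actually found.
    best = None
    factor = initial_factor
    while factor >= min_factor:
        trajectory = plan(factor)
        if trajectory is None:
            break          # no further trajectory found: keep the previous one
        best = trajectory
        factor -= step     # reduce the predeterminable factor by a fixed value
    return best
```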
It is also proposed that actions are determined separately for a plurality of robots. The machine learning system may be a deep neural network that obtains as input variable a map with all actual positions of the robots. After a predeterminable layer of the deep neural network, the part of the map surrounding each actual position of a robot, as processed by the layers up to the predeterminable layer, is used as input variable for the layers following the predeterminable layer. By extracting a part around each robot position, the machine learning system is advantageously invariant to the number of robots. Furthermore, the machine learning system does not have to be taught separately for each robot. It is also advantageous that the output variables of the machine learning system are consistent across all robots, since these output variables are determined by the same machine learning system. Each part has a predefined size, preferably a square shape.
It is also proposed that, in addition to the actual position of each robot, the positions of the other robots are stored. Furthermore, the actual positions of other movable objects (e.g. people or vehicles in the respective robot environment) may be stored in addition to the actual position. It should be noted that the actual position enriched with additional information about other movable objects may also be referred to as the actual state, i.e. the actual state comprises at least the actual position of the robot. The actions of the robot are then also determined from the additionally stored positions. This has the advantage that collisions with other movable objects can be avoided.
It is also proposed that the subsequent positions of the robots are each determined as a function of the actual state, i.e. at least from the actual positions of the robots contained in the actual state. This has the advantage that the robots can be operated jointly, since the actual positions of the robots are jointly determined from the actual state.
It is also proposed to generate the training data by means of an optimal planner, in particular an A* search algorithm, for determining the trajectory from the starting position and the target position, the optimal planner being applied to predeterminable problem instances. The optimal planner can thus be imitated, so that after the machine learning system has been taught, the optimal planner can be replaced by the taught machine learning system. Because the optimal planner requires significant computational power, it cannot readily be used in mobile applications.
It is also proposed to determine control variables from the determined motion or trajectory for the robot.
The determined control variables may be used by a control unit such that the control unit controls the actuators of the robot in accordance with the control variables.
In another aspect, a computer program is presented. The computer program is arranged to perform one of the methods described above. The computer program comprises instructions which, when the computer program is run on a computer, cause the computer to perform the method of the first aspect with all the steps. A machine-readable storage module is also presented, on which the computer program is stored. Furthermore, an apparatus is proposed, which is arranged to perform the method of the first aspect.
Drawings
Embodiments of the above-described aspects are illustrated in the drawings and are explained in more detail in the following description.
FIG. 1 shows a schematic diagram of an information flow diagram of a trajectory planning system;
FIG. 2 shows a schematic diagram of the structure of a machine learning system of the trajectory planning system;
FIG. 3 shows a schematic diagram of a flow chart of an embodiment of a method for determining a motion or trajectory of a robot;
FIG. 4 shows a schematic diagram of a flow chart for teaching an embodiment of a machine learning system;
FIG. 5 shows a schematic diagram of a flow chart of an embodiment for determining a trajectory using a search algorithm;
fig. 6 shows a schematic diagram of an embodiment of an apparatus that may be used to teach a machine learning system.
Detailed Description
Fig. 1 shows a schematic representation of an information flow diagram 01 of a trajectory planning system 10. A map is provided as an input variable of the trajectory planning system 10, which determines at least one movement or trajectory T of the robot 11 from the map, the actual position s of the robot 11 and the specifiable target position Z. The map in fig. 1 schematically shows the environment of the robot 11 with objects displayed as black boxes on the map. The motion or trajectory T is then provided to the robot 11, which may use the motion or trajectory as a control variable. Advantageously, the trajectory planning system 10 is additionally provided with a specifiable starting position of the robot 11, which is to be taken into account in determining the movement a or the trajectory T.
The actual position s and the target position Z and, if necessary, the starting position are also entered on the map. In the embodiment of fig. 1, all possible actions a that the robot 11 can perform from the actual position s are entered in the map by way of example. These may be, for example, a forward, leftward or rightward movement, owing to the space limitation caused by the object entered on the map above the position s of the robot 11.
In another embodiment of the trajectory planning system 10, the trajectory planning system 10 is arranged to determine, from a plurality of actual positions, the actions or trajectories of a plurality of robots. For this purpose, the respective actual position a2 and the associated target position A3 of each robot are preferably entered on the map.
The trajectory planning system 10, in particular a trajectory planning system 10 arranged to perform the method according to fig. 3 below, comprises at least one machine learning system (not shown in fig. 1) which determines at least one output variable from the provided map. The machine learning system is explained in more detail in fig. 2 below. Furthermore, the trajectory planning system 10 comprises a computing unit 101 on which a search algorithm, advantageously a focal A* search algorithm, is executed. The search algorithm is used to determine a subsequent position from a plurality of possible, in particular preselected, adjacent positions according to a heuristic based on at least one output variable of the machine learning system. From the subsequent positions, the trajectory planning system 10 determines the action or trajectory T. It should be noted that the machine learning system may be implemented both in software and in hardware.
The trajectory planning system 10 also has a machine-readable storage element 102, on which instructions are stored that, when executed by the computing unit 101, determine the action or trajectory.
The trajectory planning system 10 may be used, for example, for Automated Valet Parking (AVP) using mobile agents. In this case, the trajectory planning system 10 determines the action or trajectory for the mobile agent so that the mobile agent can pick up the vehicle and guide it to a free parking space (the target position).
The trajectory planning system 10 can alternatively be used for manufacturing robots, in which case the movement of their robot arm is determined, for example, from their actual position and their target position, or for route planning by means of a navigation system.
Fig. 1 also shows a schematic view of the robot 11; in this embodiment the robot 11 is an at least partially autonomous vehicle. In another embodiment, the robot may be a maintenance robot, an assembly robot or a stationary production robot, or alternatively an autonomous flying object, such as a drone.
Fig. 2 shows a schematic diagram of the machine learning system 20, which is represented here by a deep neural network. The machine learning system 20 obtains the map or a part of the map as input variable, which is processed in a first part 21 of the deep neural network by means of a plurality of series-connected convolutional layers.
In the second part 22 of the neural network, the output variables of the first part of the neural network are used in succession, wherein the parts of the output variables of the first part of the neural network which respectively surround the actual position of the robot are provided as input variables of the second part 22 of the neural network.
The second part of the neural network has two different (signal-processing) paths, each formed by fully connected layers. At the output of one path there is, for each robot i, a probability that characterizes the probability that the i-th robot 11, starting from its actual position s, performs one of the plurality of possible actions A. At the output of the other path there is a future cost, which characterizes the cost that each robot 11 must spend to reach the respectively predefinable target position Z starting from the respectively preselected adjacent position.
If only one robot is considered, correspondingly only the part surrounding the actual position of this one robot is provided as input variable to the second part of the neural network.
As shown in fig. 2, the convolutional layers may each have 64 different filters, each with a dimension of 3 × 3. The neural network may have skip (bridge) connections that provide the output variable of one of the layers of the first part of the neural network, or the input variable (the map) of the machine learning system, to a later layer by skipping at least one subsequent convolutional layer.
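A minimal sketch of a network with this structure is given below: a shared convolutional trunk, extraction of a part of the processed map around each robot position, and two fully connected paths that output the action probabilities and the future cost. The layer sizes, the patch size and the cropping logic are assumptions made for this sketch and are not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PlannerNet(nn.Module):
    def __init__(self, num_actions=5, patch=9):
        super().__init__()
        self.patch = patch
        # First part: series-connected convolutional layers, shared for all robots.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        feat = 64 * patch * patch
        # Second part: two fully connected paths per robot.
        self.action_head = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(),
                                         nn.Linear(128, num_actions))
        self.cost_head = nn.Sequential(nn.Linear(feat, 128), nn.ReLU(),
                                       nn.Linear(128, 1))

    def forward(self, grid, positions):
        # grid: occupancy map of shape (1, 1, H, W); positions: list of (row, col).
        features = self.trunk(grid)
        half = self.patch // 2
        padded = F.pad(features, (half, half, half, half))
        action_probs, future_costs = [], []
        for row, col in positions:
            # Part of the processed map surrounding the respective robot position.
            crop = padded[:, :, row:row + self.patch, col:col + self.patch]
            flat = crop.flatten(1)
            action_probs.append(F.softmax(self.action_head(flat), dim=-1))
            future_costs.append(self.cost_head(flat))
        return action_probs, future_costs
```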
It is also conceivable that the path is split into two paths in the first part 21 of the neural network and that the first part 21 of the neural network outputs two output variables. The paths of the second part 22 of the neural network are then assigned to the paths of the first part 21, respectively, and a part of the output variables of the respective paths of the first part 21 of the neural network is obtained as input variables, respectively.
In an alternative embodiment of the machine learning system 20, the machine learning system may also be arranged to determine only one of the two output variables of the second part of the neural network, for example by disabling one of the two paths of the second part of the neural network.
Fig. 3 shows a schematic illustration of a method 30 for determining an action a or a trajectory T, which is performed, for example, by the trajectory planning system 10.
The method starts with step S31. In this step, a map is provided as input variable to the machine learning system 20 of fig. 2. The machine learning system 20 determines its output variables from the provided map. In step S31, the machine learning system 20 is optionally taught with training data beforehand, and the map is then provided as input variable to the taught machine learning system 20. It should be noted that the teaching of the machine learning system 20 is explained in more detail in fig. 4 below.
In a next step S32, a path, in particular a trajectory, from a specifiable starting position (in particular the actual position of the robot) to a specifiable target position is determined by means of a search algorithm, advantageously by means of a focal A* search algorithm. Here, the search algorithm decides, based on at least one of the output variables of the machine learning system determined in step S31, which subsequent positions the robot should optimally take. In this process, one of the output variables of the machine learning system is used as a heuristic for deciding the respective subsequent position. This step is explained in detail in fig. 5. A path from the starting position to the target position is determined from the determined subsequent positions. An action sequence can then be determined from the path, so that when the robot executes the action sequence, it moves along the determined path to the specifiable target position. From the determined path or the action sequence, a trajectory T of the robot may be determined.
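The interplay of the open list, the focal set and the learned heuristic in step S32 can be summarized in a compact loop. The sketch below is a simplified illustration under several assumptions: positions are hashable, `f_cost` returns the total cost of a position, `neighbors` returns the directly reachable adjacent positions, and `learned_heuristic` is the value derived from the machine learning system (smaller is better); re-parenting and cost updates of already open positions are omitted.

```python
def plan_path(start, goal, neighbors, f_cost, learned_heuristic, factor):
    # Simplified focal-search loop: from the focal set of the open list,
    # expand the position with the smallest learned-heuristic value until
    # the target position is reached.
    open_list, closed, parent = {start: f_cost(start)}, set(), {}
    while open_list:
        f_min = min(open_list.values())
        focal = [n for n, f in open_list.items() if f <= factor * f_min]
        current = min(focal, key=learned_heuristic)
        if current == goal:
            return reconstruct(parent, goal)
        del open_list[current]
        closed.add(current)
        for n in neighbors(current):
            if n not in closed and n not in open_list:
                parent[n] = current
                open_list[n] = f_cost(n)
    return None                      # no path found

def reconstruct(parent, node):
    # Follow the stored parent positions back from the target to the start.
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return list(reversed(path))
```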
After completion of step S32, step S33 is performed. Here, the robot 11 is operated according to the action or trajectory T determined in step S32. Preferably, the robot is actuated only with the first action or the first part of the trajectory, and step S32 is then executed again in order to be able to react to a changing environment if necessary.
Fig. 4 shows a schematic diagram of a method 40 for teaching the machine learning system 20.
The method begins with the generation of training data in step S41. For this purpose, a number of problem instances are provided. The problem instances can be, for example, different maps of different environments in which the robot is to be moved from a specifiable starting position to a specifiable target position. The movement of the robot should be optimal with respect to a predeterminable cost criterion. The cost criterion can be, for example, time, energy consumption and/or distance covered.
Next, in step S42, the best paths from the respective starting positions to the associated target positions are determined with respect to the cost criterion by means of an optimal planner, for example an A* search algorithm. Advantageously, problem instances for which the optimal planner cannot find any path are discarded.
Position-action pairs are formed from the best paths from step S42; their relationship to the respectively assigned environment (the respectively assigned map section) is learned by the machine learning system 20. That is to say, the machine learning system learns a rule (a policy) in order to be able to decide, for a position s and given the environment of this position s provided by means of at least one map section, which action a of the robot should optimally be selected at this position s. A cost function can be derived from the cost that the robot has to spend to reach the target position along the remaining best path starting from its actual position. The position-action pairs and/or the cost function, together with the respectively associated problem instances, are combined into training data.
In a subsequent step S43, the machine learning system 20 is taught using the training data from step S42. The machine learning system 20 obtains the map of a problem instance as input variable and is taught such that it determines its output variables (the action probabilities and the future costs) from its input variables as well as the actual position of the robot and the starting and target positions of the robot. For this teaching, the position-action pairs and/or costs derived from the trajectories determined in step S42 are used. In the teaching, the parameters of the machine learning system 20 are set such that the output variables of the machine learning system match the corresponding ideal output variables from the training data. The parameter changes required for this can be determined by means of a gradient descent method using a difference function (loss function) between the output variables of the machine learning system and the output variables of the training data. To teach the action-probability output variables, cross entropy is preferably used as the difference function; to teach the future-cost output variables, a norm is preferably used as the difference function.
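A single teaching step consistent with this description is sketched below, assuming a model such as the network sketch after fig. 2 that returns per-robot action probabilities and future costs; cross entropy is used for the action probabilities and a squared-error loss for the future costs. The loss weighting and the optimizer are assumptions.

```python
import torch
import torch.nn.functional as F

def teaching_step(model, optimizer, batch):
    # batch: map tensor, robot positions, ideal actions and cost-to-go labels
    # taken from the training data generated with the optimal planner.
    action_probs, future_costs = model(batch["map"], batch["positions"])
    log_probs = torch.log(torch.cat(action_probs) + 1e-8)
    loss_actions = F.nll_loss(log_probs, batch["actions"])        # cross entropy
    loss_costs = F.mse_loss(torch.cat(future_costs).squeeze(-1),
                            batch["cost_to_go"])                  # squared-error norm
    loss = loss_actions + loss_costs                              # equal weighting: assumption
    optimizer.zero_grad()
    loss.backward()                                               # gradient descent step
    optimizer.step()
    return loss.item()
```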
In an optional step S44, the machine learning system 20 is re-taught, for example if a new problem instance has been defined or if the output variables of the machine learning system are not sufficiently accurate after completion of step S42.
Fig. 5 shows a schematic illustration of a method 50 for determining an action a or a trajectory T, in particular with a search algorithm.
The method starts with step S51. In this step, the actual position s_t of the robot is determined, for example by measurement or by reading it from a provided map.
Next, in step S52, the actual position s_t is entered into an open list, in particular the open list as used in the A* search algorithm.
In step S53, a total cost f(n) is determined for each of the adjacent positions entered in the open list, preferably according to the A* search algorithm. The total cost f(n) can be composed of a first cost g(n) and a second cost h(n). The first cost g(n) characterizes the cost that the robot 11 has already spent to reach the respective adjacent position from the predefinable starting position via the actual position s_t. For this purpose, a cost is preferably assigned to each preceding action of the robot from the predefinable starting position to the respective adjacent position, and these costs add up to the first cost g(n). The second cost h(n) characterizes the cost that the robot must spend to reach the specifiable target position from the respective adjacent position. The second cost h(n) is preferably estimated by means of the Euclidean distance from the respective adjacent position to the predeterminable target position. Alternatively, the second cost can be determined by means of a further heuristic, which must obey the condition that it underestimates the cost of reaching the predeterminable target position from the respective adjacent position.
After the total cost has been determined for each of the adjacent positions from the list, the lowest total cost min f(n) is determined.
In a following step S54, all adjacent positions (the focal set) whose total cost is less than the minimum total cost min f(n) multiplied by a predeterminable factor are selected from the open list. The factor is preferably greater than 1.
Next, in step S55, a further variable is determined for each adjacent position (of the focal set) selected in step S54. Preferably, as in the focal A* search algorithm, a further heuristic h_F is used to determine these further variables; reference is made to the documents cited above for this purpose.
There are two possibilities for establishing the further heuristic h_F: as a first possibility, the cost function determined by means of the machine learning system 20 can be used as the further heuristic h_F.
Additionally or alternatively, the path probability P is used as the further heuristic h_F. The path probability P characterizes the probability that the robot 11, starting from the predefinable starting position, in particular along its previous positions, moves via the actual position s_t to the respective preselected adjacent position.
The path probability can be defined by an equation (1) as a product over the positions k along the path (with a total of T actions), in which each factor combines an occurrence probability, characterizing the probability that the robot passes the k-th position on its path, with the probability that the robot performs the respective action at the k-th position. This expression can be rewritten as an equation (2) by means of the probabilities determined by the machine learning system 20. To avoid distortion at the beginning of the path, equation (2) can be rewritten further as an equation (3), where |A| denotes the number of possible actions that the robot can perform and t_k denotes the actual time points.
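The equations (1) to (3) themselves are not reproduced in this text, so the sketch below only illustrates the general idea described above, as an assumption rather than the patent's exact formula: the probabilities that the machine learning system assigns to the actions actually taken along the path to a candidate position are accumulated, and the product is normalized over the number of steps t_k and scaled by |A| so that positions early on a path are not systematically favored or penalized.

```python
import math

def path_probability_heuristic(action_probs_along_path, num_actions):
    # action_probs_along_path: for each previous position on the way to the
    # candidate position, the probability (from the machine learning system)
    # of the action that was actually taken there.
    t_k = len(action_probs_along_path)
    if t_k == 0:
        return 0.0
    # Geometric mean of |A| * p(a_k | s_k): equal to 1 for a uniform policy,
    # so short path prefixes do not distort the comparison.
    log_sum = sum(math.log(num_actions * p) for p in action_probs_along_path)
    score = math.exp(log_sum / t_k)
    # The search selects the adjacent position with the smallest value, so a
    # high path probability is mapped to a small heuristic value.
    return -score
```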
After the further variable has been determined for each selected adjacent position by means of the further heuristic h_F, the adjacent position that has been assigned the smallest further variable is selected. This adjacent position with the smallest further variable is then the subsequent position that the robot should steer towards in order to reach the predefinable target position optimally with respect to the cost criterion. This adjacent position is then entered into a closed list, in particular the closed list used by the A* search algorithm, and removed from the open list.
In a further embodiment, steps S52 to S55 are repeated a plurality of times until the predefinable target position is entered into the closed list and deleted from the open list.
It should be noted that further paths can be determined after a path has been determined, for which purpose the method 50 just described is executed in the same way, but with the predeterminable factor reduced by the predeterminable value.
Fig. 6 shows a schematic diagram of an apparatus 60 for teaching the machine learning system 20, in particular for performing the teaching steps of the method 40 of fig. 4. The apparatus 60 comprises an optimal planner 61, the machine learning system 20 and a difference module 62. The difference module 62 is arranged to determine, by means of the difference function, a difference between the output variables of the optimal planner 61 and the output variables y of the machine learning system 20, and to determine from this difference a change of the parameters of the machine learning system 20. The parameters of the machine learning system 20 are stored in a database P and are adapted according to the parameter change determined by the difference module 62.
The apparatus 60 may have a machine-readable storage element 65 on which the method 40 is stored, and a computing unit 64 for executing the method 40.

Claims (11)

1. A method for determining an action (a) of a robot (11) from an actual position (s_t) of the robot (11),
wherein a subsequent position is selected from a plurality of preselected adjacent positions (focal set) of the actual position (s_t), wherein each adjacent position is assigned a first variable (h_F),
wherein the subsequent position is the one of the preselected adjacent positions that is assigned the smallest first variable (h_F) relative to the other preselected adjacent positions,
wherein the first variables (h_F) each characterize a first probability (P) of whether the robot (11) moves from the actual position (s_t) to the respective preselected adjacent position,
wherein a machine learning system (20) is arranged to output a plurality of second probabilities as output variables, each second probability characterizing the probability that the robot (11), starting from the actual position (s_t), performs a respective one of a plurality of possible actions (A),
wherein the machine learning system (20) determines the output variables,
wherein the first variables (h_F) are determined from at least one of the output variables of the machine learning system (20) and are assigned to the respective preselected adjacent positions,
wherein an action (a) is selected from the plurality of possible actions (A) such that, when the robot (11) performs the selected action (a), the subsequent position is reached directly starting from the actual position (s_t).
2. The method according to claim 1, wherein a total cost (f(n)) is determined for each possible adjacent position of the actual position (s_t) and the total costs are assigned to the respective adjacent positions,
wherein the adjacent positions are entered into a first list (open list),
wherein the total cost (f(n)) is determined from a first cost (g(n)) and a second cost (h(n)),
wherein the first costs (g(n)) each characterize the cost that has to be expended to reach the respective adjacent position from a predeterminable starting position of the robot (11), and the second costs (h(n)) each characterize the cost that has to be expended to reach a predeterminable target position (Z) of the robot (11) from the respective adjacent position,
wherein the second cost (h(n)) is estimated such that it is always lower than the actual cost of reaching the target position (Z) from the respective adjacent position,
wherein the plurality of preselected adjacent positions (focal set) comprises those adjacent positions of the first list (open list) whose total cost (f(n)) is below the determined lowest total cost (min f(n)) multiplied by a predeterminable factor.
3. The method according to claim 1 or 2, wherein the machine learning system (20) is arranged to output the output variables as a function of at least one provided portion of a map of the environment of the robot (11),
wherein the machine learning system (20) determines the output variables from the map portion,
wherein the first probability (P) is determined from at least that one of the plurality of second probabilities which characterizes the probability that the robot, starting from the actual position (s_t), performs the action that the robot has to perform in order to move directly from the actual position (s_t) to the respective adjacent position.
4. The method according to any one of the preceding claims, wherein, after the subsequent position has been selected, the subsequent position is entered into a second list (closed list) and the actual position (s_t) is set equal to the subsequent position,
wherein the method is repeated a plurality of times until the actual position (s_t) corresponds to the predefinable target position (Z),
wherein at the start of the method the actual position (s_t) corresponds to the predefinable starting position,
wherein adjacent positions of previous actual positions remain entered in the first list, in particular only adjacent positions that have been selected as subsequent positions are deleted from the first list,
wherein the adjacent positions of the first list are each assigned the previous actual position from which the respective adjacent position can be directly reached,
wherein, if the selected subsequent position is an adjacent position that is not an adjacent position of the actual position (s_t), the actual position (s_t) is set equal to the actual position assigned to that adjacent position,
wherein the actions that have to be performed directly in succession are combined into an action sequence, such that the robot reaches the predeterminable target position along the positions from the second list,
wherein a trajectory (T) of the robot is determined from the action sequence.
5. The method according to claim 4 and claim 2, wherein, after the trajectory has been determined, the predeterminable factor is reduced by a predeterminable value, and wherein the method is executed anew in order to determine a further trajectory.
6. The method according to any of the preceding claims, wherein the action (a) is determined separately for a plurality of robots (11),
wherein the machine learning system (20) is a deep neural network which obtains as input variables a map with all actual positions of the robot (11),
wherein, after a predeterminable layer of the deep neural network, the part of the map surrounding each actual position of the robot (11) is used as an input variable for a subsequent layer, in particular a directly subsequent layer, of the predeterminable layer.
7. The method according to any one of the preceding claims, wherein training data are generated by means of an optimal planner, in particular an A* search algorithm, for determining a trajectory from the starting position and the target position, the optimal planner being applied to predeterminable problem instances,
wherein the machine learning system (20) is taught using the generated training data such that the machine learning system determines the decisions of the optimal planner from the actual position and at least the portion of the map and outputs them as output variables.
8. The method according to any one of the preceding claims, wherein the first variables each characterize the future cost that the robot (11) must spend to reach the predefinable target position (Z) starting from the respective preselected adjacent position,
wherein the machine learning system (20) is arranged to output the future costs as output variables, and
wherein the output variables of the machine learning system (20) are assigned as first variables (h_F) to the respective preselected adjacent positions,
wherein the machine learning system is taught using the generated training data such that it estimates, from the actual position, the future cost along the trajectory determined by means of the optimal planner up to the predeterminable target position.
9. A computer program comprising instructions arranged, when executed by a computer, to cause the computer to perform the method according to any preceding claim.
10. A machine readable storage element (102, 65) having stored thereon a computer program according to claim 9.
11. An apparatus (10, 60) arranged to perform the method according to any of the preceding claims 1 to 8.
CN202010076272.5A 2019-01-28 2020-01-23 Method, apparatus and computer program for determining a motion or trajectory of a robot Pending CN111546327A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102019201045.8A DE102019201045B4 (en) 2019-01-28 2019-01-28 Method, device and computer program for determining an action or trajectory of a robot
DE102019201045.8 2019-01-28

Publications (1)

Publication Number Publication Date
CN111546327A true CN111546327A (en) 2020-08-18

Family

ID=71524198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010076272.5A Pending CN111546327A (en) 2019-01-28 2020-01-23 Method, apparatus and computer program for determining a motion or trajectory of a robot

Country Status (2)

Country Link
CN (1) CN111546327A (en)
DE (1) DE102019201045B4 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113561175A (en) * 2021-07-16 2021-10-29 珠海格力智能装备有限公司 Path planning method and device of mechanical arm, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8204623B1 (en) * 2009-02-13 2012-06-19 Hrl Laboratories, Llc Planning approach for obstacle avoidance in complex environment using articulated redundant robot arm
CN104010774A (en) * 2011-09-15 2014-08-27 康富真信息技术股份有限公司 System and method for the automatic generation of robot programs
CN106457565A (en) * 2014-06-03 2017-02-22 阿蒂迈兹机器人技术有限公司 Method and system for programming a robot
CN108292139A (en) * 2015-12-02 2018-07-17 高通股份有限公司 Map is carried out at the same time by robot to draw and plan
WO2018143003A1 (en) * 2017-01-31 2018-08-09 株式会社安川電機 Robot path-generating device and robot system
DE202018104373U1 (en) * 2018-07-30 2018-08-30 Robert Bosch Gmbh Apparatus adapted to operate a machine learning system
CN109146082A (en) * 2017-06-27 2019-01-04 发那科株式会社 Machine learning device, robot control system and machine learning method
US20190025841A1 (en) * 2017-07-21 2019-01-24 Uber Technologies, Inc. Machine Learning for Predicting Locations of Objects Perceived by Autonomous Vehicles

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11256982B2 (en) * 2014-07-18 2022-02-22 University Of Southern California Noise-enhanced convolutional neural networks
US9645577B1 (en) * 2016-03-23 2017-05-09 nuTonomy Inc. Facilitating vehicle driving and self-driving
DE112017004414T5 (en) * 2016-09-02 2019-05-16 Groove X, Inc. AUTONOMOUS ROBOT, SERVER AND BEHAVIOR CONTROL PROGRAM
DE202017106506U1 (en) * 2016-11-15 2018-04-03 Google Llc Device for deep machine learning to robot grip
DE102017217412A1 (en) * 2017-09-29 2019-04-04 Robert Bosch Gmbh Method, apparatus and computer program for operating a robot control system
DE102017223717B4 (en) * 2017-12-22 2019-07-18 Robert Bosch Gmbh Method for operating a robot in a multi-agent system, robot and multi-agent system

Also Published As

Publication number Publication date
DE102019201045B4 (en) 2020-11-26
DE102019201045A1 (en) 2020-07-30

Similar Documents

Publication Publication Date Title
Liu et al. A lifelong learning approach to mobile robot navigation
Bency et al. Neural path planning: Fixed time, near-optimal path generation via oracle imitation
US11092965B2 (en) Method and device for driving dynamics control for a transportation vehicle
CN109597425B (en) Unmanned aerial vehicle navigation and obstacle avoidance method based on reinforcement learning
CN109940614B (en) Mechanical arm multi-scene rapid motion planning method integrating memory mechanism
CN114371711B (en) Robot formation obstacle avoidance path planning method
KR20240004350A (en) Method and system for robot navigation in unknown environments
CN114460943A (en) Self-adaptive target navigation method and system for service robot
Dharmasiri et al. Novel implementation of multiple automated ground vehicles traffic real time control algorithm for warehouse operations: djikstra approach
CN116460843A (en) Multi-robot collaborative grabbing method and system based on meta heuristic algorithm
CN109764876B (en) Multi-mode fusion positioning method of unmanned platform
CN111546327A (en) Method, apparatus and computer program for determining a motion or trajectory of a robot
Han et al. Path regeneration decisions in a dynamic environment
Qiu Multi-agent navigation based on deep reinforcement learning and traditional pathfinding algorithm
Siddarth et al. Path planning for mobile robots using deep learning architectures
CN111984000A (en) Method and device for automatically influencing an actuator
Chekroun et al. MBAPPE: MCTS-Built-Around Prediction for Planning Explicitly
Bansal et al. Control and safety of autonomous vehicles with learning-enabled components
Cody et al. Applying learning systems theory to model cognitive unmanned aerial vehicles
CN115857323A (en) Apparatus and method for controlling agent
CN115081612A (en) Apparatus and method to improve robot strategy learning
CN115019275A (en) Heuristic determination and model training methods, electronic device, and computer storage medium
Xie et al. Learning with stochastic guidance for navigation
JP2018147103A (en) Model learning device, controlled variable calculation device, and program
Bahrpeyma et al. Application of Reinforcement Learning to UR10 Positioning for Prioritized Multi-Step Inspection in NVIDIA Omniverse

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination