CN115178944A - Narrow space robot operation planning method for safety reinforcement learning - Google Patents
Narrow space robot operation planning method for safety reinforcement learning
- Publication number
- CN115178944A (application CN202210930544.2A)
- Authority
- CN
- China
- Prior art keywords
- action
- mechanical arm
- acceleration
- reinforcement learning
- motion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B23—MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
- B23K—SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
- B23K37/00—Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
- B23K37/02—Carriages for supporting the welding or cutting element
- B23K37/0211—Carriages for supporting the welding or cutting element travelling on a guide member, e.g. rail, track
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B23—MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
- B23K—SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
- B23K37/00—Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
- B23K37/02—Carriages for supporting the welding or cutting element
- B23K37/0252—Steering means
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
- B25J9/1666—Avoiding collision or forbidden zones
Landscapes
- Engineering & Computer Science (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Physics & Mathematics (AREA)
- Optics & Photonics (AREA)
- Manipulator (AREA)
- Feedback Control In General (AREA)
- Numerical Control (AREA)
Abstract
The invention discloses a narrow space robot operation planning method for safety reinforcement learning, which comprises the following steps: setting a planning task and a target point before the mechanical arm moves; calculating the expected acceleration and the braking acceleration according to the current state information of the mechanical arm and the relevant kinematic constraints; testing the expected joint acceleration: if the mechanical arm neither collides nor violates the joint kinematic constraints after the action is executed, the expected acceleration is feasible and is executed as the substitute action, otherwise the calculated braking acceleration is executed as the substitute action; the substitute actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; and a deep reinforcement learning algorithm plans a motion trajectory for the mechanical arm in this action space and obtains the optimal strategy. The invention incorporates the idea of substitute actions and redesigns the action space used for reinforcement learning training, further guaranteeing the safety of the planning result.
Description
Technical Field
The invention relates to the field of robot operation planning research, in particular to a narrow space robot operation planning method for safety reinforcement learning.
Background
In an environment constrained by obstacles, the robot must move autonomously, quickly and without collision from its current position to a given position. Given a start position and an end position, a path satisfying certain constraints must be found in the robot's workspace: it must be collision-free, satisfy the kinematic conditions, and be as short as possible. When planning a path, a model of the obstacle space is first built, the robot is placed in that space, and a traditional planning algorithm such as a genetic algorithm or the artificial potential field method, or a deep reinforcement learning algorithm, is used for training. The planning complexity of these algorithms grows exponentially in high-dimensional cases, and real-time planning is often difficult to achieve. Safe reinforcement learning, a derivative of reinforcement learning, observes safety constraints both during the learning stage and during deployment. When the environment consists of a controllable robot and static obstacles whose shapes and positions are known, constraints such as collision avoidance and the relevant kinematic limits are considered during training, and the concept of substitute safe behaviors is applied to reinforcement learning; this greatly improves the feasibility of the planning result and is suitable for high-dimensional robot systems.
Disclosure of Invention
The invention aims to provide a narrow space robot operation planning method for safety reinforcement learning, which is used for further improving the safety of a planning result.
In order to realize the task, the invention adopts the following technical scheme:
a narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
setting a planning task and a target point before the movement of the mechanical arm;
calculating the expected acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints, thereby constructing a feasible action space of the mechanical arm, comprising the following steps:
defining kinematic constraints for the joints;
detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition, and determining that a collision occurs if the minimum distance is smaller than a preset safety distance threshold;
acquiring state information of the mechanical arm through the sensors built into the pybullet environment;
establishing a neural network as a motion prediction network for predicting the motion at the next moment, inputting the state information of the joints into the motion prediction network, and predicting the motion scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the expected acceleration a_{t+1,N} of the joint is then obtained from the formula a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the expected acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
calculating the braking acceleration: when the joint velocity v_t corresponding to the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration;
testing the expected acceleration a_{t+1,N} of the joint: if the mechanical arm does not collide and does not violate the defined joint kinematic constraints after the action is executed, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the substitute action; braking is performed after the expected acceleration a_{t+1,N} calculated for each joint; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
the substitute actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; and planning a motion trajectory for the mechanical arm in this action space by using a deep reinforcement learning algorithm and obtaining the optimal strategy.
Further, the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
Further, the state information includes a position, a velocity, an acceleration, and a distance from the obstacle of each joint.
Further, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
Further, the deep reinforcement learning algorithm includes:
setting an Actor network and a Critic network as the reinforcement learning networks, wherein the Actor is updated with a loss function using an adaptive KL penalty coefficient, the Critic is updated with TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
performing path planning training in the action space;
and setting a training termination condition: when the end of the mechanical arm reaches the preset target point several consecutive times, the planning is regarded as successful and the training is stopped.
Further, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1; all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
Further, the path planning training is performed in the action space to obtain the expected action of the mechanical arm, namely the action in the action space that maximizes the Q value; the Q value is the action-value function in reinforcement learning and represents the expected sum of rewards from selecting the action until the final state is reached; each executed action yields a corresponding reward value, and when the reward converges stably, the planning is considered successful and the training is stopped, the strategy obtained through the training being the optimal strategy.
Further, the reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms. The first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to prevent the action from getting too close to its limit values; the third term R_adaptation is a braking penalty term, equal to 1 when the action leads to a collision and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term: after the substitute action is executed, if the distance between the links of the mechanical arm or the distance between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
Compared with the prior art, the invention has the following technical characteristics:
the method comprises the steps of calculating the track of the current time interval according to the current motion state and network prediction of the robot, and predicting motion if the predicted track conforms to all safety constraints; otherwise, the braking trajectory calculated in the previous time interval is taken as a substitute safety action, and a feasible action space is obtained. The design of the reinforcement learning motion space ensures that all predicted trajectories meet the kinematic joint limits. Compared with the existing deep reinforcement learning algorithm, the method combines the thought of replacing actions, redesigns the action space for reinforcement learning training, and further ensures the safety of the planning result.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 shows the planning results of an embodiment of the present invention, namely the reward curves obtained by training with three algorithms; OURS is the present method, while PPO and DDPG are existing methods; as can be seen from the figure, training with the method of the invention converges more quickly and is more stable;
FIG. 3 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical scheme aims at path planning in a narrow weld-seam space in an industrial welding scene; the adopted robot is a six-degree-of-freedom industrial mechanical arm fitted with a welding gun for the welding operation.
Referring to the attached drawings, the narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
Step 1, obtaining the state information of the mechanical arm from the simulated welding environment during the training process, including the kinematic information of the mechanical arm joints and the positional relation to the obstacles. The state information is obtained through the interaction between the robot and the simulated welding environment during training. The kinematic information includes the position, velocity, acceleration and jerk of the joints of the mechanical arm.
Step 2, calculating the expected acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints; this comprises the following steps:
and 2.1, defining kinematic constraints of the joint, wherein the kinematic constraints comprise the constraints of the position, the speed, the acceleration and the jerk speed of the joint, and the constraints comprise the maximum value and the minimum value of the position, the speed, the acceleration and the jerk speed.
Step 2.2, in order to avoid collisions, defining obstacle-link pairs and link-link pairs, and detecting at discrete time points the minimum distance d between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition; if the minimum distance d is smaller than a preset safety distance threshold d_S, a collision is deemed to have occurred.
Step 2.3, simulating in the pybullet environment and acquiring the state information of the mechanical arm through the sensors built into pybullet; the state information includes the position, velocity and acceleration of each joint and its distance to the obstacles.
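For illustration, a minimal sketch of how the joint states and the minimum obstacle distance of step 2.2 could be queried in pybullet is given below; the body identifiers, the link handling (only arm-obstacle pairs are shown, link-link pairs are analogous) and the numeric value of the safety threshold d_S are assumptions, not values fixed by this description.

```python
import pybullet as p

D_SAFE = 0.05  # assumed safety distance threshold d_S, in metres

def get_arm_state(robot_id, joint_indices):
    """Read joint positions and velocities (acceleration is not returned directly)."""
    states = [p.getJointState(robot_id, j) for j in joint_indices]
    positions = [s[0] for s in states]
    velocities = [s[1] for s in states]
    return positions, velocities

def min_obstacle_distance(robot_id, obstacle_ids, query_radius=1.0):
    """Minimum distance between the arm and any obstacle within query_radius (step 2.2)."""
    d_min = float("inf")
    for obs in obstacle_ids:
        for pt in p.getClosestPoints(robot_id, obs, query_radius):
            d_min = min(d_min, pt[8])  # index 8 of a closest-point tuple is the distance
    return d_min

def collision_detected(robot_id, obstacle_ids):
    """A collision is deemed to have occurred when the minimum distance falls below d_S."""
    return min_obstacle_distance(robot_id, obstacle_ids) < D_SAFE
```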
Establishing a neural network as a motion prediction network for predicting the motion at the next moment, where the hidden layers use SELU as the activation function, the first hidden layer has size 256 and the second hidden layer has size 128. The state information of the joints is input into the motion prediction network, which predicts the motion scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the expected acceleration a_{t+1,N} of the joint is then obtained from the formula a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint. Knowing the acceleration, the velocity and position of the joint at the next time t+1 can be determined.
Step 2.4, calculating the braking acceleration: when the joint velocity v_t corresponding to the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration. Meanwhile, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
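The acceleration formulas of steps 2.3 and 2.4 translate directly into code; the following sketch only illustrates that arithmetic and assumes the per-joint safe acceleration bounds a_{t+1,min} and a_{t+1,max} are already available.

```python
def expected_acceleration(m_next, a_min, a_max):
    """a_{t+1,N} from the motion scalar m_{t+1} in [-1, 1]."""
    return a_min + (1.0 + m_next) / 2.0 * (a_max - a_min)

def braking_acceleration(m_next, v_t, a_min, a_max):
    """a_{t+1,B}: mirror the scalar depending on the sign of the joint velocity."""
    if v_t > 0:
        m_brake = 2.0 * m_next - 1.0
    else:
        m_brake = 2.0 * (1.0 - m_next) - 1.0
    return a_min + (1.0 + m_brake) / 2.0 * (a_max - a_min), m_brake

def anti_oscillation_bounds(m_brake, a_min, a_max):
    """Tightened acceleration bounds used to suppress oscillation during the motion."""
    new_max = m_brake * (a_max - a_min)
    new_min = a_min + (1.0 - m_brake) * (a_max - a_min)
    return new_min, new_max
```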
Step 3, testing the expected acceleration a_{t+1,N} of the joint: if, after the action is executed, the mechanical arm has not collided and the joint kinematic constraints defined in step 2.1 have not been violated, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the substitute action.
To ensure that a safe and feasible motion exists at the next time t+1, braking is performed after the expected acceleration a_{t+1,N} calculated for each joint. Starting from the state information corresponding to the current time t, namely the position, velocity and acceleration of the joints of the mechanical arm at the current moment, if no collision occurs after the corresponding action is executed, the behavior is safe; otherwise the motion is stopped and the procedure returns to step 2 for a new prediction.
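A compact sketch of the substitute-action selection of step 3 is shown below, reusing the acceleration helpers from the previous sketch; is_safe() stands in for the one-step collision and joint-limit check described above and is an assumed helper, not part of this description.

```python
def substitute_action(m_next, v_t, a_min, a_max, is_safe):
    """Return the action actually executed: the expected acceleration if it is safe,
    otherwise the precomputed braking acceleration (step 3)."""
    a_expected = expected_acceleration(m_next, a_min, a_max)
    a_braking, _ = braking_acceleration(m_next, v_t, a_min, a_max)
    # is_safe(a) simulates one step ahead and checks collisions and joint limits.
    if is_safe(a_expected):
        return a_expected
    return a_braking
```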
Step 4, through steps 2 and 3 the substitute action of each joint of the mechanical arm is obtained, and these substitute actions form the feasible action space of the mechanical arm. A deep reinforcement learning algorithm then plans a motion trajectory for the mechanical arm in this action space and obtains the optimal strategy. During reinforcement learning training, an action is selected from the action space and executed, and its quality is reflected by the reward value.
The deep reinforcement learning algorithm comprises the following steps:
and 4.1, setting an Actor network and a Critic network as reinforcement learning networks, wherein the loss function used by the Actor update adopts a loss function of a self-adaptive KL penalty coefficient, the Critic adopts TD-error update, the hidden layer uses swish as an activation function, and the output layer uses tanh as an activation function.
Step 4.2, training path planning is carried out in the action space;
and 4.3, setting a training ending condition, and taking the planning success to stop training when the tail end of the mechanical arm continuously reaches a preset target point for multiple times.
In this embodiment, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1; all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
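One possible realisation of such networks is sketched below in PyTorch; the 400 × 300 × 10 × 1 layer sizes, the swish (SiLU) hidden activations and the tanh output of the Actor follow this embodiment, while the state dimension and the use of a state-value Critic are assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM = 24  # assumed size of the state vector s_t; not specified here

class Actor(nn.Module):
    """Actor with the 400 x 300 x 10 x 1 structure, swish (SiLU) hidden layers
    and a tanh output so that the action scalar lies in [-1, 1]."""
    def __init__(self, state_dim: int = STATE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.SiLU(),
            nn.Linear(400, 300), nn.SiLU(),
            nn.Linear(300, 10), nn.SiLU(),
            nn.Linear(10, 1), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class Critic(nn.Module):
    """Critic with the same hidden structure, updated by TD-error
    (a state-value critic is one plausible reading of the description)."""
    def __init__(self, state_dim: int = STATE_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.SiLU(),
            nn.Linear(400, 300), nn.SiLU(),
            nn.Linear(300, 10), nn.SiLU(),
            nn.Linear(10, 1),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
```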
Path planning training is carried out in the redesigned action space: the current joint positions, velocities and accelerations of the mechanical arm and the distance between the end of the mechanical arm and the target point are input into the reinforcement learning network to obtain the expected action of the mechanical arm, namely the action in the action space that maximizes the Q value. The Q value is the action-value function in reinforcement learning; it evaluates the value of an action and represents the expected sum of rewards until the final state after the agent selects that action. Each executed action yields a corresponding reward value; when the reward converges stably, the planning is considered successful and the training is stopped, and the strategy obtained through training is the optimal strategy.
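A high-level sketch of such a training loop follows; env, agent and the convergence criterion are assumed interfaces standing in for the simulated welding environment and the Actor-Critic learner, and are not defined in this description.

```python
def reward_converged(rewards, window=50, tol=1.0):
    """Assumed convergence test: the last `window` episode rewards vary by less than tol."""
    if len(rewards) < window:
        return False
    recent = rewards[-window:]
    return max(recent) - min(recent) < tol

def train(env, agent, max_episodes=2000):
    """Outline of the planning training loop: predict, substitute, execute, learn."""
    episode_rewards = []
    for _ in range(max_episodes):
        state = env.reset()  # joint positions, velocities, accelerations, distance to target
        total_reward, done = 0.0, False
        while not done:
            m_next = agent.act(state)                # motion scalar in [-1, 1]
            action = env.substitute_action(m_next)   # step 3: safe substitute action
            next_state, reward, done = env.step(action)
            agent.update(state, action, reward, next_state, done)
            state = next_state
            total_reward += reward
        episode_rewards.append(total_reward)
        if reward_converged(episode_rewards):        # stop once the reward has converged
            break
    return agent
```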
The reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms. The first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to prevent the action from getting too close to its limit values; the third term R_adaptation is a braking penalty term, equal to 1 when the action leads to a collision and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term: after the substitute action is executed, if the distance between the links of the mechanical arm or the distance between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
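The composite reward could be assembled as in the sketch below; only the sign structure R = R_target - R_action - R_adaptation - R_distance and the 0/1 penalty terms come from this description, while the concrete forms of R_target and R_action and the numeric threshold are illustrative assumptions.

```python
def compute_reward(dist_to_target, action, action_limit, braked, min_link_dist,
                   dist_threshold=0.05):
    """R = R_target - R_action - R_adaptation - R_distance (structure per the description)."""
    r_target = -dist_to_target                      # assumed form: closer to the target is better
    r_action = (abs(action) / action_limit) ** 2    # assumed form: penalise actions near their limit
    r_adaptation = 1.0 if braked else 0.0           # 1 when the braking substitute action was executed
    r_distance = 1.0 if min_link_dist < dist_threshold else 0.0  # links too close to each other / obstacles
    return r_target - r_action - r_adaptation - r_distance
```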
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.
Claims (8)
1. A narrow space robot operation planning method for safety reinforcement learning is characterized by comprising the following steps:
setting a planning task and a target point before the movement of the mechanical arm;
calculating the expected acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints, thereby constructing a feasible action space of the mechanical arm, comprising the following steps:
defining kinematic constraints for the joints;
detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition, and determining that a collision occurs if the minimum distance is smaller than a preset safety distance threshold;
acquiring state information of the mechanical arm through a built-in sensor in a pybullet environment;
establishing a neural network as a motion prediction network for predicting the motion at the next moment, inputting the state information of the joints into the motion prediction network, and predicting the motion scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the expected acceleration a_{t+1,N} of the joint is then obtained from the formula a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the expected acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
calculating the braking acceleration: when the joint velocity v_t corresponding to the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration;
testing the expected acceleration a_{t+1,N} of the joint: if the mechanical arm does not collide and does not violate the defined joint kinematic constraints after the action is executed, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the substitute action; braking is performed after the expected acceleration a_{t+1,N} calculated for each joint; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
the substitute actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; and planning a motion trajectory for the mechanical arm in this action space by using a deep reinforcement learning algorithm and obtaining the optimal strategy.
2. The safe reinforcement learning narrow space robot operation planning method according to claim 1, wherein the target point is a welding start point, and the planning task is to plan a safe path so that the end of the mechanical arm moves to the welding start point.
3. The safety-enhanced learning narrow space robot operation planning method according to claim 1, wherein the state information includes a position, a velocity, an acceleration, and a distance to an obstacle of each joint.
4. The safe reinforcement learning narrow space robot operation planning method according to claim 1, wherein, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
5. The safety-reinforcement-learning narrow-space robot work planning method according to claim 1, wherein the deep reinforcement learning algorithm comprises:
setting an Actor network and a Critic network as the reinforcement learning networks, wherein the Actor is updated with a loss function using an adaptive KL penalty coefficient, the Critic is updated with TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
performing path planning training in the action space;
and setting a training termination condition: when the end of the mechanical arm reaches the preset target point several consecutive times, the planning is regarded as successful and the training is stopped.
6. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the input of the deep reinforcement learning algorithm is the state information of the mechanical arm, and an Actor network and a Critic network are set up for training; the network structure is 400 × 300 × 10 × 1, all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
7. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the path planning training is performed in the action space to obtain the expected action of the mechanical arm, namely the action in the action space that maximizes the Q value; the Q value is the action-value function in reinforcement learning and represents the expected sum of rewards from selecting the action until the final state is reached; each executed action yields a corresponding reward value, and when the reward converges stably, the planning is considered successful and the training is stopped, the strategy obtained through the training being the optimal strategy.
8. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to prevent the action from getting too close to its limit values; the third term R_adaptation is a braking penalty term, equal to 1 when the action leads to a collision and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term, whereby after the substitute action is executed, if the distance between the links of the mechanical arm or the distance between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210930544.2A CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210930544.2A CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115178944A true CN115178944A (en) | 2022-10-14 |
CN115178944B CN115178944B (en) | 2024-05-24 |
Family
ID=83520672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210930544.2A Active CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115178944B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116551703A (en) * | 2023-07-12 | 2023-08-08 | 长春工业大学 | Motion planning method based on machine learning in complex environment |
CN116834018A (en) * | 2023-08-07 | 2023-10-03 | 南京云创大数据科技股份有限公司 | Training method and training device for multi-mechanical arm multi-target searching |
CN116900539A (en) * | 2023-09-14 | 2023-10-20 | 天津大学 | Multi-robot task planning method based on graph neural network and reinforcement learning |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106457565A (en) * | 2014-06-03 | 2017-02-22 | 阿蒂迈兹机器人技术有限公司 | Method and system for programming a robot |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN110370317A (en) * | 2019-07-24 | 2019-10-25 | 广东工业大学 | Robot restorative procedure and device |
CN112454333A (en) * | 2020-11-26 | 2021-03-09 | 青岛理工大学 | Robot teaching system and method based on image segmentation and surface electromyogram signals |
CN113163332A (en) * | 2021-04-25 | 2021-07-23 | 北京邮电大学 | Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning |
US20220105629A1 (en) * | 2021-12-16 | 2022-04-07 | Venkat Natarajan | Failure rate estimation and reinforcement learning safety factor systems |
CN114708293A (en) * | 2022-03-22 | 2022-07-05 | 广东工业大学 | Robot motion estimation method based on deep learning point-line feature and IMU tight coupling |
-
2022
- 2022-08-04 CN CN202210930544.2A patent/CN115178944B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106457565A (en) * | 2014-06-03 | 2017-02-22 | 阿蒂迈兹机器人技术有限公司 | Method and system for programming a robot |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110370317A (en) * | 2019-07-24 | 2019-10-25 | 广东工业大学 | Robot restorative procedure and device |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN112454333A (en) * | 2020-11-26 | 2021-03-09 | 青岛理工大学 | Robot teaching system and method based on image segmentation and surface electromyogram signals |
CN113163332A (en) * | 2021-04-25 | 2021-07-23 | 北京邮电大学 | Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning |
US20220105629A1 (en) * | 2021-12-16 | 2022-04-07 | Venkat Natarajan | Failure rate estimation and reinforcement learning safety factor systems |
CN114708293A (en) * | 2022-03-22 | 2022-07-05 | 广东工业大学 | Robot motion estimation method based on deep learning point-line feature and IMU tight coupling |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116551703A (en) * | 2023-07-12 | 2023-08-08 | 长春工业大学 | Motion planning method based on machine learning in complex environment |
CN116551703B (en) * | 2023-07-12 | 2023-09-12 | 长春工业大学 | Motion planning method based on machine learning in complex environment |
CN116834018A (en) * | 2023-08-07 | 2023-10-03 | 南京云创大数据科技股份有限公司 | Training method and training device for multi-mechanical arm multi-target searching |
CN116900539A (en) * | 2023-09-14 | 2023-10-20 | 天津大学 | Multi-robot task planning method based on graph neural network and reinforcement learning |
CN116900539B (en) * | 2023-09-14 | 2023-12-19 | 天津大学 | Multi-robot task planning method based on graph neural network and reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN115178944B (en) | 2024-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115178944A (en) | Narrow space robot operation planning method for safety reinforcement learning | |
CN108621165B (en) | Optimal trajectory planning method for industrial robot dynamics performance in obstacle environment | |
CN114603564A (en) | Mechanical arm navigation obstacle avoidance method and system, computer equipment and storage medium | |
CN111856925B (en) | State trajectory-based confrontation type imitation learning method and device | |
CN112140101A (en) | Trajectory planning method, device and system | |
CN113687659B (en) | Optimal trajectory generation method and system based on digital twinning | |
CN115091469B (en) | Depth reinforcement learning mechanical arm motion planning method based on maximum entropy frame | |
CN110014428A (en) | A kind of sequential logic mission planning method based on intensified learning | |
CN113485323B (en) | Flexible formation method for cascading multiple mobile robots | |
CN115542733A (en) | Self-adaptive dynamic window method based on deep reinforcement learning | |
CN118201742A (en) | Multi-robot coordination using a graph neural network | |
CN117207186A (en) | Assembly line double-mechanical-arm collaborative grabbing method based on reinforcement learning | |
CN116551703B (en) | Motion planning method based on machine learning in complex environment | |
CN111984000A (en) | Method and device for automatically influencing an actuator | |
CN114055479A (en) | Dragging teaching spraying robot collision early warning method, medium and equipment | |
Paudel | Learning for robot decision making under distribution shift: A survey | |
CN115081612A (en) | Apparatus and method to improve robot strategy learning | |
KR20190088093A (en) | Learning method for robot | |
Chen et al. | Mitigating Imminent Collision for Multi-robot Navigation: A TTC-force Reward Shaping Approach | |
Jiang et al. | Motion sequence learning for robot walking based on pose optimization | |
Young et al. | Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies | |
Li et al. | Manipulator Motion Planning based on Actor-Critic Reinforcement Learning | |
KR102719462B1 (en) | Method, apparatus and computer program for forming kinematic model for actuation of articulated robot | |
TWI811156B (en) | Transition method of locomotion gait of robot | |
CN115648213A (en) | Mechanical arm autonomous assembling method and system suitable for unstructured environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |