CN115178944A - Narrow space robot operation planning method for safety reinforcement learning - Google Patents

Narrow space robot operation planning method for safety reinforcement learning

Info

Publication number
CN115178944A
Authority
CN
China
Prior art keywords
action
mechanical arm
acceleration
reinforcement learning
motion
Prior art date
Legal status
Granted
Application number
CN202210930544.2A
Other languages
Chinese (zh)
Other versions
CN115178944B (en)
Inventor
王涛 (Wang Tao)
许银涛 (Xu Yintao)
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202210930544.2A
Publication of CN115178944A
Application granted
Publication of CN115178944B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23K SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
    • B23K37/00 Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
    • B23K37/02 Carriages for supporting the welding or cutting element
    • B23K37/0211 Carriages for supporting the welding or cutting element travelling on a guide member, e.g. rail, track
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B23 MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23K SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
    • B23K37/00 Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
    • B23K37/02 Carriages for supporting the welding or cutting element
    • B23K37/0252 Steering means
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666 Avoiding collision or forbidden zones

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Physics & Mathematics (AREA)
  • Optics & Photonics (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)
  • Numerical Control (AREA)

Abstract

The invention discloses a narrow space robot operation planning method based on safe reinforcement learning, which comprises the following steps: setting a planning task and a target point before the mechanical arm moves; calculating the desired acceleration and the braking acceleration according to the current state information of the mechanical arm and the relevant kinematic constraints; testing the desired joint acceleration: if the mechanical arm does not collide and does not violate the joint kinematic constraints after the action is executed, the desired acceleration is feasible and is executed as the alternative action; otherwise the calculated braking acceleration is executed as the alternative action; the alternative actions of all joints of the mechanical arm form its feasible action space; a deep reinforcement learning algorithm then plans the motion trajectory for the mechanical arm in this action space and obtains the optimal policy. By combining the idea of alternative actions, the invention redesigns the action space used for reinforcement learning training and further guarantees the safety of the planning result.

Description

Narrow space robot operation planning method for safety reinforcement learning
Technical Field
The invention relates to the field of robot operation planning research, in particular to a narrow space robot operation planning method for safety reinforcement learning.
Background
In an environment constrained by obstacles, the robot is required to move autonomously, quickly, and without collision from its current position to a given position. Given a start position and an end position, a path satisfying certain constraints must be found in the robot's workspace: it must be collision-free, satisfy the kinematic conditions, and be as short as possible. When planning a path, a model of the obstacle space is first built, the robot is placed in that space, and a traditional planning algorithm such as a genetic algorithm or the artificial potential field method, or a deep reinforcement learning algorithm, is used for training. The planning complexity of these algorithms grows exponentially in high-dimensional settings, and real-time planning is often difficult to achieve. Safe reinforcement learning, a derivative of reinforcement learning, observes safety constraints both during the learning stage and during deployment. When the environment consists of a controllable robot and static obstacles whose shapes and positions are known, collision constraints and the relevant kinematic limits can be taken into account during training; applying the concept of substitute safe actions to reinforcement learning greatly improves the feasibility of the planning result and suits high-dimensional robot systems.
Disclosure of Invention
The invention aims to provide a narrow space robot operation planning method for safety reinforcement learning, which is used for further improving the safety of a planning result.
In order to realize the task, the invention adopts the following technical scheme:
a narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
setting a planning task and a target point before the movement of the mechanical arm;
calculating the desired acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints, thereby constructing the feasible action space of the mechanical arm, comprising the following steps:
defining kinematic constraints for the joints;
detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition; if the minimum distance is smaller than a preset safety distance threshold, a collision is deemed to have occurred;
acquiring the state information of the mechanical arm through the sensors built into the PyBullet environment;
establishing a neural network as an action prediction network for predicting the action at the next time step, inputting the joint state information into the action prediction network, and predicting the action scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the desired joint acceleration a_{t+1,N} is then obtained from a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the desired acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
calculating the braking acceleration: if the joint velocity v_t at the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration;
testing the desired joint acceleration a_{t+1,N}: if the mechanical arm does not collide after the action is executed and the defined joint kinematic constraints are not violated, the desired acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the alternative action; braking is performed after the desired acceleration a_{t+1,N} calculated for each joint is executed; starting from the state information at the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
the feasible action space of the mechanical arm is formed by the alternative actions of its joints; a deep reinforcement learning algorithm then plans the motion trajectory for the mechanical arm in this action space and obtains the optimal policy.
Further, the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
Further, the state information includes the position, velocity, and acceleration of each joint and its distance to the obstacle.
Further, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
Further, the deep reinforcement learning algorithm includes:
setting an Actor network and a Critic network as the reinforcement learning networks, wherein the Actor is updated with a loss function using an adaptive KL penalty coefficient, the Critic is updated with TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
performing path-planning training in the action space;
and setting a training termination condition: when the end of the mechanical arm reaches the preset target point several times in succession, planning is considered successful and training is stopped.
Further, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1, the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
Further, the path-planning training is performed in the action space to obtain the desired action of the mechanical arm, i.e. the action in the action space that maximizes the Q value, the action-value function in reinforcement learning, which represents the expected sum of rewards from selecting that action until the final state; each executed action yields a corresponding reward value, and when the reward has stably converged, planning is considered successful and training is stopped; the policy obtained by this training is the optimal policy.
Further, the reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty that keeps the action away from its limit values; the third term R_adaptation is a braking penalty, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty, equal to 1 when, after the alternative action is executed, the distance between the links of the mechanical arm and the distance between each link and the obstacle are smaller than a certain threshold, and 0 otherwise.
Compared with the prior art, the invention has the following technical characteristics:
the method comprises the steps of calculating the track of the current time interval according to the current motion state and network prediction of the robot, and predicting motion if the predicted track conforms to all safety constraints; otherwise, the braking trajectory calculated in the previous time interval is taken as a substitute safety action, and a feasible action space is obtained. The design of the reinforcement learning motion space ensures that all predicted trajectories meet the kinematic joint limits. Compared with the existing deep reinforcement learning algorithm, the method combines the thought of replacing actions, redesigns the action space for reinforcement learning training, and further ensures the safety of the planning result.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 shows the planning results of an embodiment of the invention: the reward curves obtained by training with three algorithms, where OURS is the present method and PPO and DDPG are existing methods; as can be seen from the figure, training with the present method converges faster and is more stable;
FIG. 3 is a schematic flow chart of the method of the present invention.
Detailed Description
The technical solution performs path planning in the narrow weld-seam space of an industrial welding scene; the robot used is a six-degree-of-freedom industrial mechanical arm fitted with a welding gun for the welding operation.
Referring to the attached drawings, the narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
step 1, before a mechanical arm moves, setting a planning task and a target point; the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
During training, the state information of the mechanical arm is obtained from the simulated welding environment and comprises the kinematic information of the arm joints and their positional relation to the obstacles. The state information is acquired through the interaction between the robot and the simulated welding environment during training. The kinematic information includes the position, velocity, acceleration, and jerk of the mechanical arm joints.
Step 2, calculating the desired acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints; the method comprises the following steps:
and 2.1, defining kinematic constraints of the joint, wherein the kinematic constraints comprise the constraints of the position, the speed, the acceleration and the jerk speed of the joint, and the constraints comprise the maximum value and the minimum value of the position, the speed, the acceleration and the jerk speed.
Step 2.2, in order to avoid collisions, defining obstacle-link pairs and link-link pairs, and detecting, at discrete time points, the minimum distance d between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition; if the minimum distance d is smaller than a preset safety distance threshold d_S, a collision is deemed to have occurred.
Step 2.3, simulating in the PyBullet environment and acquiring the state information of the mechanical arm through the sensors built into PyBullet, including the position, velocity, and acceleration of each joint and its distance to the obstacles.
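A minimal PyBullet sketch of the distance check of step 2.2 and the state acquisition of step 2.3 might look as follows; the URDF file names, the joint indices and the numerical threshold d_S are placeholder assumptions.

```python
import pybullet as p

p.connect(p.DIRECT)                               # headless simulation
robot = p.loadURDF("six_dof_arm.urdf")            # placeholder URDF path
obstacle = p.loadURDF("weld_workpiece.urdf")      # placeholder obstacle model
D_SAFE = 0.01                                     # assumed safety threshold d_S [m]

def joint_states(robot_id, joint_ids):
    """Joint positions and velocities from PyBullet's built-in state query."""
    states = [p.getJointState(robot_id, j) for j in joint_ids]
    return [s[0] for s in states], [s[1] for s in states]

def min_obstacle_distance(robot_id, obstacle_id, query_range=1.0):
    """Smallest robot-obstacle distance among all link pairs (step 2.2)."""
    pts = p.getClosestPoints(robot_id, obstacle_id, query_range)
    return min((pt[8] for pt in pts), default=query_range)  # pt[8] = distance

def collided(robot_id, obstacle_id):
    """A collision is assumed when the minimum distance falls below d_S."""
    return min_obstacle_distance(robot_id, obstacle_id) < D_SAFE
```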
A neural network is established as an action prediction network for predicting the action at the next time step; its hidden layers use SELU as the activation function, the first hidden layer has size 256 and the second hidden layer has size 128. The joint state information is input to the action prediction network, which predicts the action scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the desired joint acceleration a_{t+1,N} is then obtained from a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the acceleration, the velocity and position of the joint at the next time t+1 can be determined.
Step 2.4, calculating the braking acceleration: if the joint velocity v_t at the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration. Meanwhile, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
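The mapping from the predicted action scalar to the desired acceleration, the braking acceleration and the anti-oscillation bounds (steps 2.3 to 2.4) can be transcribed as in the sketch below; the variable names and the example limit values are chosen for readability and are not taken from the patent.

```python
def desired_acceleration(m, a_min, a_max):
    """a_{t+1,N} = a_min + (1 + m)/2 * (a_max - a_min), with m in [-1, 1]."""
    return a_min + (1.0 + m) / 2.0 * (a_max - a_min)

def braking_acceleration(m, v, a_min, a_max):
    """a_{t+1,B}: reflect the action scalar according to the sign of the
    current joint velocity v_t, then reuse the same mapping."""
    m_b = 2.0 * m - 1.0 if v > 0 else 2.0 * (1.0 - m) - 1.0
    return a_min + (1.0 + m_b) / 2.0 * (a_max - a_min), m_b

def anti_oscillation_bounds(m_b, a_min, a_max):
    """Tightened acceleration bounds used to suppress oscillation,
    transcribed directly from the formulas stated in the text."""
    new_max = m_b * (a_max - a_min)
    new_min = a_min + (1.0 - m_b) * (a_max - a_min)
    return new_min, new_max

# Example: one joint moving forward (v_t > 0) with m_{t+1} = 0.4 and
# assumed safe acceleration limits of +/- 2 rad/s^2
a_n = desired_acceleration(0.4, a_min=-2.0, a_max=2.0)
a_b, m_b = braking_acceleration(0.4, v=0.5, a_min=-2.0, a_max=2.0)
```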
Step 3, testing the desired joint acceleration a_{t+1,N}: if the mechanical arm does not collide after the action is executed and the joint kinematic constraints defined in step 2.1 are not violated, the desired acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the alternative action.
To ensure that a safe and feasible motion exists at the next time t+1, braking is performed after the desired acceleration a_{t+1,N} calculated for each joint is executed; starting from the state information at the current time t, i.e. the current joint positions, velocities, and accelerations of the mechanical arm, if no collision occurs after the corresponding action is executed, the behavior is safe; otherwise the motion is stopped and the method returns to step 2 for a new prediction.
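A sketch of the step-3 feasibility test and the braking look-ahead is given below; the helpers simulate, collides and limits_violated are illustrative assumptions standing for the forward simulation, the collision check of step 2.2 and the constraint check of step 2.1.

```python
def choose_alternative_action(state, a_desired, a_brake, limits):
    """Step 3: use the desired acceleration only if the resulting motion is
    collision-free and respects the joint limits; otherwise use braking."""
    trial = simulate(state, a_desired)               # forward-simulate one step
    if not collides(trial) and not limits_violated(trial, limits):
        candidate = a_desired                        # a_{t+1,N} is feasible
    else:
        candidate = a_brake                          # substitute action a_{t+1,B}

    # Look-ahead: from the current state, the candidate action followed by
    # braking must remain collision-free, otherwise the motion is stopped
    # and the prediction of step 2 is repeated.
    after_candidate = simulate(state, candidate)
    after_braking = simulate(after_candidate, a_brake)
    if collides(after_candidate) or collides(after_braking):
        return None                                  # stop motion, re-predict
    return candidate
```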
Step 4, steps 2 to 3 yield the alternative action of each joint of the mechanical arm, and these alternative actions form the feasible action space of the arm. A deep reinforcement learning algorithm then plans the motion trajectory for the mechanical arm in this action space and obtains the optimal policy. During reinforcement learning training, an action is selected from the action space and executed, and the quality of the action is reflected by the reward value.
The deep reinforcement learning algorithm comprises the following steps:
and 4.1, setting an Actor network and a Critic network as reinforcement learning networks, wherein the loss function used by the Actor update adopts a loss function of a self-adaptive KL penalty coefficient, the Critic adopts TD-error update, the hidden layer uses swish as an activation function, and the output layer uses tanh as an activation function.
Step 4.2, training path planning is carried out in the action space;
and 4.3, setting a training ending condition, and taking the planning success to stop training when the tail end of the mechanical arm continuously reaches a preset target point for multiple times.
In this embodiment, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1, the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
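A PyTorch sketch of the networks and update rules described above is shown below. The 400 × 300 × 10 × 1 layer sizes, the swish (SiLU) hidden activations, the tanh Actor output, the adaptive KL penalty Actor loss and the TD-error Critic loss follow the text; the state dimension, the KL target and the concrete adaptation rule for the penalty coefficient are illustrative assumptions.

```python
import torch
import torch.nn as nn

STATE_DIM = 24  # assumed size of the state vector s_t; not specified in the text

def mlp(out_act):
    """400 x 300 x 10 x 1 structure from the text, swish (SiLU) hidden layers."""
    return nn.Sequential(
        nn.Linear(STATE_DIM, 400), nn.SiLU(),
        nn.Linear(400, 300), nn.SiLU(),
        nn.Linear(300, 10), nn.SiLU(),
        nn.Linear(10, 1), out_act,
    )

actor = mlp(nn.Tanh())       # action output in [-1, 1]
critic = mlp(nn.Identity())  # state value V(s_t)

def critic_td_loss(batch, gamma=0.99):
    """Critic update with TD-error: (r + gamma * V(s') - V(s))^2."""
    s, r, s_next, done = batch
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1)
    return ((target - critic(s).squeeze(-1)) ** 2).mean()

def actor_kl_penalty_loss(logp_new, logp_old, advantage, beta):
    """Actor loss with an adaptive KL penalty coefficient beta (PPO-penalty style)."""
    ratio = torch.exp(logp_new - logp_old)
    approx_kl = (logp_old - logp_new).mean()        # sample-based KL estimate
    return -(ratio * advantage).mean() + beta * approx_kl, approx_kl

def adapt_beta(beta, approx_kl, kl_target=0.01):
    """Increase beta when the policy moved too far, decrease it otherwise."""
    if approx_kl > 1.5 * kl_target:
        return beta * 2.0
    if approx_kl < kl_target / 1.5:
        return beta / 2.0
    return beta

# Example forward pass: one state s_t mapped to an action scalar in [-1, 1]
s = torch.zeros(1, STATE_DIM)
action = actor(s)
```

In such a setup, adapt_beta would be called after each policy update so that the KL penalty coefficient tracks the chosen KL target.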
Path-planning training is carried out in the redesigned action space. The current joint positions, velocities, and accelerations of the mechanical arm and the distance from the end of the arm to the target point are input to the reinforcement learning network to obtain the desired action of the mechanical arm, i.e. the action in the action space that maximizes the Q value, where the Q value is the action-value function in reinforcement learning, used to evaluate the value of an action, and represents the expected sum of rewards from the agent selecting that action until the final state. Each executed action yields a corresponding reward value; when the reward has stably converged, planning is considered successful and training is stopped, and the policy obtained by this training is the optimal policy.
The reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty that keeps the action away from its limit values; the third term R_adaptation is a braking penalty, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty, equal to 1 when, after the alternative action is executed, the distance between the links of the mechanical arm and the distance between each link and the obstacle are smaller than a certain threshold, and 0 otherwise.
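A direct transcription of this reward into code could read as follows; the exact form of the target-distance term, the action penalty and the numerical threshold are assumptions, since the text does not specify them.

```python
def reward(dist_to_target, action, braked_after_collision,
           min_link_distance, dist_threshold=0.01):
    """R = R_target - R_action - R_adaptation - R_distance (see text)."""
    r_target = -dist_to_target                      # assumed shaping: closer is better
    r_action = sum(a ** 2 for a in action)          # assumed penalty for near-limit actions
    r_adaptation = 1.0 if braked_after_collision else 0.0
    r_distance = 1.0 if min_link_distance < dist_threshold else 0.0
    return r_target - r_action - r_adaptation - r_distance
```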
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (8)

1. A narrow space robot operation planning method for safety reinforcement learning is characterized by comprising the following steps:
setting a planning task and a target point before the movement of the mechanical arm;
calculating the desired acceleration a_{t+1,N} and the braking acceleration a_{t+1,B} according to the current state information of the mechanical arm and the relevant kinematic constraints, thereby constructing the feasible action space of the mechanical arm, comprising the following steps:
defining kinematic constraints for the joints;
detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm to determine the collision condition; if the minimum distance is smaller than a preset safety distance threshold, a collision is deemed to have occurred;
acquiring the state information of the mechanical arm through the sensors built into the PyBullet environment;
establishing a neural network as an action prediction network for predicting the action at the next time step, inputting the joint state information into the action prediction network, and predicting the action scalar m_{t+1} ∈ [-1, 1] corresponding to each joint; the desired joint acceleration a_{t+1,N} is then obtained from a_{t+1,N} = a_{t+1,min} + (1 + m_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}), where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the desired acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
calculating the braking acceleration: if the joint velocity v_t at the current time t satisfies v_t > 0, take m'_{t+1} = 2·m_{t+1} - 1, otherwise take m'_{t+1} = 2·(1 - m_{t+1}) - 1; substituting m'_{t+1} into a_{t+1,B} = a_{t+1,min} + (1 + m'_{t+1})/2 · (a_{t+1,max} - a_{t+1,min}) gives the braking acceleration;
testing the desired joint acceleration a_{t+1,N}: if the mechanical arm does not collide after the action is executed and the defined joint kinematic constraints are not violated, the desired acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise the calculated braking acceleration a_{t+1,B} is executed as the alternative action; braking is performed after the desired acceleration a_{t+1,N} calculated for each joint is executed; starting from the state information at the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
the feasible action space of the mechanical arm is formed by the alternative actions of its joints; a deep reinforcement learning algorithm then plans the motion trajectory for the mechanical arm in this action space and obtains the optimal policy.
2. The safe reinforcement learning narrow space robot operation planning method according to claim 1, wherein the target point is a welding start point, and the planning task is to plan a safe path so that the end of the mechanical arm moves to the welding start point.
3. The safety-enhanced learning narrow space robot operation planning method according to claim 1, wherein the state information includes a position, a velocity, an acceleration, and a distance to an obstacle of each joint.
4. The safe reinforcement learning narrow space robot operation planning method according to claim 1, wherein, in order to prevent oscillation during the motion, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) · (a_{t+1,max} - a_{t+1,min}) are taken.
5. The safety-reinforcement-learning narrow-space robot work planning method according to claim 1, wherein the deep reinforcement learning algorithm comprises:
setting an Actor network and a Critic network as the reinforcement learning networks, wherein the Actor is updated with a loss function using an adaptive KL penalty coefficient, the Critic is updated with TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
performing path-planning training in the action space;
and setting a training termination condition: when the end of the mechanical arm reaches the preset target point several times in succession, planning is considered successful and training is stopped.
6. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the input of the deep reinforcement learning algorithm is the state information of the mechanical arm, and an Actor network and a Critic network are set up for training; the network structure is 400 × 300 × 10 × 1, the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
7. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the path-planning training is performed in the action space to obtain the desired action of the mechanical arm, i.e. the action in the action space that maximizes the Q value, the action-value function in reinforcement learning, which represents the expected sum of rewards from selecting that action until the final state; each executed action yields a corresponding reward value, and when the reward has stably converged, planning is considered successful and training is stopped; the policy obtained by this training is the optimal policy.
8. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty that keeps the action away from its limit values; the third term R_adaptation is a braking penalty, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty, equal to 1 when, after the alternative action is executed, the distance between the links of the mechanical arm and the distance between each link and the obstacle are smaller than a certain threshold, and 0 otherwise.
CN202210930544.2A 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning Active CN115178944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210930544.2A CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210930544.2A CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Publications (2)

Publication Number Publication Date
CN115178944A (en) 2022-10-14
CN115178944B CN115178944B (en) 2024-05-24

Family

ID=83520672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210930544.2A Active CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Country Status (1)

Country Link
CN (1) CN115178944B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116551703A (en) * 2023-07-12 2023-08-08 长春工业大学 Motion planning method based on machine learning in complex environment
CN116834018A (en) * 2023-08-07 2023-10-03 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116900539A (en) * 2023-09-14 2023-10-20 天津大学 Multi-robot task planning method based on graph neural network and reinforcement learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106457565A (en) * 2014-06-03 2017-02-22 阿蒂迈兹机器人技术有限公司 Method and system for programming a robot
CN110315258A (en) * 2019-07-24 2019-10-11 广东工业大学 A kind of welding method based on intensified learning and ant group algorithm
CN110370317A (en) * 2019-07-24 2019-10-25 广东工业大学 Robot restorative procedure and device
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning
CN112454333A (en) * 2020-11-26 2021-03-09 青岛理工大学 Robot teaching system and method based on image segmentation and surface electromyogram signals
CN113163332A (en) * 2021-04-25 2021-07-23 北京邮电大学 Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning
US20220105629A1 (en) * 2021-12-16 2022-04-07 Venkat Natarajan Failure rate estimation and reinforcement learning safety factor systems
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116551703A (en) * 2023-07-12 2023-08-08 长春工业大学 Motion planning method based on machine learning in complex environment
CN116551703B (en) * 2023-07-12 2023-09-12 长春工业大学 Motion planning method based on machine learning in complex environment
CN116834018A (en) * 2023-08-07 2023-10-03 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116900539A (en) * 2023-09-14 2023-10-20 天津大学 Multi-robot task planning method based on graph neural network and reinforcement learning
CN116900539B (en) * 2023-09-14 2023-12-19 天津大学 Multi-robot task planning method based on graph neural network and reinforcement learning

Also Published As

Publication number Publication date
CN115178944B (en) 2024-05-24

Similar Documents

Publication Publication Date Title
CN115178944A (en) Narrow space robot operation planning method for safety reinforcement learning
CN108621165B (en) Optimal trajectory planning method for industrial robot dynamics performance in obstacle environment
CN114603564A (en) Mechanical arm navigation obstacle avoidance method and system, computer equipment and storage medium
CN111856925B (en) State trajectory-based confrontation type imitation learning method and device
CN112140101A (en) Trajectory planning method, device and system
CN113687659B (en) Optimal trajectory generation method and system based on digital twinning
CN115091469B (en) Depth reinforcement learning mechanical arm motion planning method based on maximum entropy frame
CN110014428A (en) A kind of sequential logic mission planning method based on intensified learning
CN113485323B (en) Flexible formation method for cascading multiple mobile robots
CN115542733A (en) Self-adaptive dynamic window method based on deep reinforcement learning
CN118201742A (en) Multi-robot coordination using a graph neural network
CN117207186A (en) Assembly line double-mechanical-arm collaborative grabbing method based on reinforcement learning
CN116551703B (en) Motion planning method based on machine learning in complex environment
CN111984000A (en) Method and device for automatically influencing an actuator
CN114055479A (en) Dragging teaching spraying robot collision early warning method, medium and equipment
Paudel Learning for robot decision making under distribution shift: A survey
CN115081612A (en) Apparatus and method to improve robot strategy learning
KR20190088093A (en) Learning method for robot
Chen et al. Mitigating Imminent Collision for Multi-robot Navigation: A TTC-force Reward Shaping Approach
Jiang et al. Motion sequence learning for robot walking based on pose optimization
Young et al. Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Li et al. Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
KR102719462B1 (en) Method, apparatus and computer program for forming kinematic model for actuation of articulated robot
TWI811156B (en) Transition method of locomotion gait of robot
CN115648213A (en) Mechanical arm autonomous assembling method and system suitable for unstructured environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant