CN115178944B - Narrow space robot operation planning method for safety reinforcement learning - Google Patents
- Publication number
- CN115178944B (application CN202210930544.2A)
- Authority
- CN
- China
- Prior art keywords
- action
- mechanical arm
- joint
- reinforcement learning
- acceleration
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B23—MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
- B23K—SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
- B23K37/00—Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
- B23K37/02—Carriages for supporting the welding or cutting element
- B23K37/0211—Carriages for supporting the welding or cutting element travelling on a guide member, e.g. rail, track
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B23—MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
- B23K—SOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
- B23K37/00—Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
- B23K37/02—Carriages for supporting the welding or cutting element
- B23K37/0252—Steering means
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
- B25J9/1666—Avoiding collision or forbidden zones
Landscapes
- Engineering & Computer Science (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Physics & Mathematics (AREA)
- Optics & Photonics (AREA)
- Feedback Control In General (AREA)
- Manipulator (AREA)
- Numerical Control (AREA)
Abstract
The invention discloses a narrow space robot operation planning method based on safe reinforcement learning, comprising the following steps: before the mechanical arm moves, a planning task and a target point are set; the expected acceleration is calculated from the current state information of the mechanical arm and the relevant kinematic constraints, and the braking acceleration is calculated at the same time; the expected joint acceleration is tested: if the mechanical arm neither collides nor violates the joint kinematic constraints after the action is executed, the expected acceleration is feasible and is executed as the substitute action; otherwise, the calculated braking acceleration is executed as the substitute action; the substitute actions of all joints of the mechanical arm form its feasible action space; a deep reinforcement learning algorithm then plans a motion trajectory for the mechanical arm in this action space and obtains an optimal strategy. By combining the idea of substitute actions and redesigning the action space used for reinforcement learning training, the invention further guarantees the safety of the planning results.
Description
Technical Field
The invention relates to the field of robot operation planning research, in particular to a narrow space robot operation planning method for safety reinforcement learning.
Background
In a confined environment with obstacles, the robot is required to move autonomously from its current position to a given position quickly and without collision. Given the start and end positions, a path that satisfies certain constraints, such as being collision-free and respecting kinematic conditions, must be found in the robot's workspace, and the path should also be as short as possible. In path planning, the obstacle space is first modeled, the robot is placed in that space, and a traditional planning algorithm such as a genetic algorithm or the artificial potential field method, or a deep reinforcement learning algorithm, is used for training. The computational complexity of planning with these algorithms grows exponentially in high-dimensional settings, which often makes real-time planning difficult. Safe reinforcement learning, a derivative of reinforcement learning, obeys safety constraints both during the learning stage and at deployment. When the environment consists of a controllable robot and static obstacles whose shapes and positions are known, constraints such as collision avoidance and the relevant kinematic limits are taken into account during training; applying the concept of substitute safe behaviors to reinforcement learning greatly improves the feasibility of the planning results, and the method is applicable to high-dimensional robot systems.
Disclosure of Invention
The invention aims to provide a narrow space robot operation planning method for safety reinforcement learning, which is used for further improving the safety of planning results.
In order to realize the tasks, the invention adopts the following technical scheme:
a narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
Before the mechanical arm moves, setting a planning task and a target point;
According to the current state information of the mechanical arm and the relevant kinematic constraints, calculating the expected acceleration a_{t+1,N} and, at the same time, the braking acceleration a_{t+1,B}, thereby constructing the feasible action space of the mechanical arm, comprising:
Defining kinematic constraints of the joint;
Detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm itself to determine the collision condition; if the minimum distance is smaller than a preset safe-distance threshold, the configuration is regarded as a collision;
Acquiring the state information of the mechanical arm through sensors built into the pybullet environment;
Establishing a neural network as an action prediction network for predicting the action at the next moment; the state information of the joints is input into the action prediction network, the corresponding action scalar m_{t+1} ∈ [-1, 1] of each joint is predicted and then mapped onto the safe acceleration range to obtain the expected acceleration a_{t+1,N} of the joint, where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the expected acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
Calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, taking m'_{t+1} = 2*m_{t+1} - 1, otherwise taking m'_{t+1} = 2*(1 - m_{t+1}) - 1, and using m'_{t+1} to calculate the braking acceleration a_{t+1,B};
Testing the expected acceleration a_{t+1,N} of the joint: if the mechanical arm does not collide after the action is executed and the defined kinematic constraints of the joint are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the substitute action; the expected acceleration a_{t+1,N} calculated for each joint is followed by braking; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
The substitute actions of the joints of the mechanical arm form its feasible action space; a motion trajectory is then planned for the mechanical arm in this action space using a deep reinforcement learning algorithm, and an optimal strategy is obtained.
Further, the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
Further, the status information includes a position, a velocity, an acceleration, and a distance from the obstacle for each joint.
Further, in order to prevent oscillation during movement, take a_{t+1,max} = m'_{t+1} * (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) * (a_{t+1,max} - a_{t+1,min}).
Further, the deep reinforcement learning algorithm includes:
Setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor adopts an adaptive KL penalty coefficient, the Critic is updated with the TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
Training path planning under the action space;
setting a training ending condition: when the end of the mechanical arm reaches the preset target point several times in succession, planning is considered successful and training is stopped.
Further, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1; the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
Further, training of path planning is performed in this action space to obtain the expected action of the mechanical arm, i.e. the action in the action space that maximizes the Q value and is executed by the mechanical arm, where the Q value is the action-value in reinforcement learning and represents the expected cumulative reward up to the final state after the robot selects that action; each executed action yields a corresponding reward value, and when the reward becomes stable and converges the planning is considered successful and training is stopped, the strategy obtained by training being the optimal strategy.
Further, the reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty term that discourages actions too close to the limits; the third term R_adaptation is a braking penalty term, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term: if, after the substitute action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
Compared with the prior art, the invention has the following technical characteristics:
In the method, the trajectory for the current time interval is calculated from the robot's current motion state and the network prediction; if the predicted trajectory satisfies all safety constraints, the predicted motion is executed; otherwise, the braking trajectory calculated in the previous time interval is used as the substitute safe action, yielding the feasible action space. This design of the reinforcement learning action space ensures that all predicted trajectories respect the kinematic joint limits. Compared with existing deep reinforcement learning algorithms, the method combines the idea of substitute actions and redesigns the action space used for reinforcement learning training, further guaranteeing the safety of the planning results.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 shows the planning results of an embodiment of the present invention, with reward curves obtained by training with three algorithms; OURS is the present method, while PPO and DDPG are existing methods; as can be seen from the figure, training with the method of the present invention converges faster and is more stable;
FIG. 3 is a schematic flow chart of the method of the present invention.
Detailed Description
The scheme aims at planning a path in a narrow weld-seam space in an industrial welding scenario; the adopted robot is a six-degree-of-freedom industrial mechanical arm with a welding gun mounted on it for the welding operation.
Referring to the attached drawings, the narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
Step 1, setting a planning task and a target point before the mechanical arm moves; the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
During training, the state information of the mechanical arm is acquired through interaction with the simulated welding environment; it comprises the kinematic information of the arm joints and the positional relation between the joints and the obstacles. The kinematic information comprises the position, velocity, acceleration and jerk of each arm joint.
Step 2, calculating the expected acceleration a_{t+1,N} according to the current state information of the mechanical arm and the relevant kinematic constraints, and simultaneously calculating the braking acceleration a_{t+1,B}; the method comprises the following steps:
Step 2.1, defining the kinematic constraints of the joints, i.e. the maximum and minimum values of joint position, velocity, acceleration and jerk.
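For illustration, the joint constraints of step 2.1 can be grouped as in the sketch below; the field names are assumptions, not terminology from the patent, and the numerical bounds would come from the robot datasheet.

```python
from dataclasses import dataclass

@dataclass
class JointLimits:
    """Kinematic constraints of one joint (step 2.1): bounds on position,
    velocity, acceleration and jerk."""
    pos_min: float
    pos_max: float
    vel_min: float
    vel_max: float
    acc_min: float
    acc_max: float
    jerk_min: float
    jerk_max: float

def within_limits(lim: JointLimits, pos: float, vel: float, acc: float, jerk: float) -> bool:
    """True when a joint state respects all of its kinematic constraints."""
    return (lim.pos_min <= pos <= lim.pos_max and
            lim.vel_min <= vel <= lim.vel_max and
            lim.acc_min <= acc <= lim.acc_max and
            lim.jerk_min <= jerk <= lim.jerk_max)
```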
Step 2.2, to avoid collision, obstacle-link pairs and link-link pairs are defined; the minimum distance d between the robot and the obstacles, and between the links of the mechanical arm themselves, is detected at discrete time points to determine the collision condition, and if the minimum distance d is smaller than a preset safe-distance threshold d_S, the configuration is regarded as a collision.
Step 2.3, the simulation is carried out in the pybullet environment, and the state information of the mechanical arm is obtained through the sensors built into pybullet; the state comprises, for each joint, its position, velocity, acceleration and distance to the obstacles.
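A minimal sketch of the distance-based collision check of steps 2.2-2.3 using the pybullet API is given below; the body and link identifiers, the candidate pairs and the numerical threshold are illustrative, and a connected physics server with loaded bodies is assumed.

```python
import pybullet as p

SAFE_DISTANCE = 0.02  # d_S, preset safe-distance threshold in metres (illustrative value)

def min_link_distance(body_a: int, body_b: int, link_a: int = -1, link_b: int = -1) -> float:
    """Minimum distance d between two links/bodies at the current discrete time point;
    pt[8] is the contact distance returned by pybullet."""
    pts = p.getClosestPoints(bodyA=body_a, bodyB=body_b, distance=10.0,
                             linkIndexA=link_a, linkIndexB=link_b)
    return min((pt[8] for pt in pts), default=float("inf"))

def in_collision(robot_id: int, obstacle_link_pairs, self_link_pairs) -> bool:
    """Collision condition of step 2.2: any obstacle-link or link-link distance below d_S."""
    for obstacle_id, link in obstacle_link_pairs:      # obstacle-link pairs
        if min_link_distance(robot_id, obstacle_id, link_a=link) < SAFE_DISTANCE:
            return True
    for link_a, link_b in self_link_pairs:             # link-link pairs of the arm itself
        if min_link_distance(robot_id, robot_id, link_a=link_a, link_b=link_b) < SAFE_DISTANCE:
            return True
    return False
```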
A neural network is established as the action prediction network for predicting the action at the next moment; its hidden layers use SELU as the activation function, the first hidden layer having 256 units and the second 128 units. The state information of the joints is input into the action prediction network, the corresponding action scalar m_{t+1} ∈ [-1, 1] of each joint is predicted and then mapped onto the safe acceleration range to obtain the expected acceleration a_{t+1,N} of the joint, where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the acceleration, the velocity and position of the joint at the next time t+1 can be obtained.
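As an illustration, a sketch of the action prediction network and of one possible mapping from the action scalar to the expected acceleration follows; the tanh output squashing and the linear form of the mapping are assumptions, since the exact formula is not reproduced in the text above.

```python
import torch
import torch.nn as nn

class ActionPredictionNet(nn.Module):
    """Predicts the action scalar m_{t+1} in [-1, 1] for each joint from the joint states.
    Hidden sizes (256, 128) and SELU activations follow the description; the tanh output
    layer is an assumption used to keep m in [-1, 1]."""
    def __init__(self, state_dim: int, n_joints: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.SELU(),
            nn.Linear(256, 128), nn.SELU(),
            nn.Linear(128, n_joints), nn.Tanh(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def expected_acceleration(m: float, a_min: float, a_max: float) -> float:
    """Assumed linear mapping of m_{t+1} in [-1, 1] onto the safe range
    [a_{t+1,min}, a_{t+1,max}]."""
    return a_min + 0.5 * (m + 1.0) * (a_max - a_min)
```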
Step 2.4, calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, take m'_{t+1} = 2*m_{t+1} - 1, otherwise take m'_{t+1} = 2*(1 - m_{t+1}) - 1, and use m'_{t+1} to compute the braking acceleration a_{t+1,B}; at the same time, to prevent oscillation during the movement, take a_{t+1,max} = m'_{t+1} * (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) * (a_{t+1,max} - a_{t+1,min}).
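The sketch below transcribes the m'_{t+1} rule and the anti-oscillation bound adjustment of step 2.4; how the braking acceleration a_{t+1,B} is finally obtained from m'_{t+1} is not spelled out above, so only these intermediate quantities are shown.

```python
def braking_scalar(m_next: float, v_t: float) -> float:
    """m'_{t+1}: 2*m_{t+1} - 1 when the current joint velocity v_t > 0,
    otherwise 2*(1 - m_{t+1}) - 1."""
    return 2.0 * m_next - 1.0 if v_t > 0 else 2.0 * (1.0 - m_next) - 1.0

def adjusted_bounds(m_prime: float, a_min: float, a_max: float) -> tuple[float, float]:
    """Anti-oscillation adjustment of the safe acceleration bounds, transcribing the
    relation given in step 2.4; the braking acceleration a_{t+1,B} is then computed
    from m'_{t+1} within these bounds."""
    new_max = m_prime * (a_max - a_min)
    new_min = a_min + (1.0 - m_prime) * (a_max - a_min)
    return new_min, new_max
```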
Step 3, testing the expected acceleration a_{t+1,N} of the joint: if the mechanical arm does not collide after the action is executed and the kinematic constraints of the joint defined in step 2.1 are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the substitute action.
To ensure that a safe and feasible action exists at the next time t+1, the calculated expected acceleration a_{t+1,N} of each joint is followed by braking; starting from the state information corresponding to the current time t, i.e. the position, velocity and acceleration of the arm joints at the current moment, if no collision occurs after the corresponding action is executed, the action is safe; otherwise the motion is stopped and the method returns to step 2 for re-prediction.
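A compact sketch of the substitute-action filter of step 3, assuming helper callables for the one-step simulation, the collision check and the joint-limit check (their names are placeholders):

```python
def substitute_action(a_desired, a_braking, sim_step, in_collision, violates_limits):
    """Safety filter of step 3: try the desired acceleration a_{t+1,N} in simulation first;
    keep it only if the resulting state is collision-free and within the joint limits,
    otherwise fall back to the braking acceleration a_{t+1,B}."""
    next_state = sim_step(a_desired)                     # roll the candidate action forward
    if not in_collision(next_state) and not violates_limits(next_state):
        return a_desired                                 # desired acceleration is feasible
    return a_braking                                     # substitute the braking action
```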
Step 4, through steps 2 to 3, the substitute action of each joint of the mechanical arm is obtained, and these substitute actions form the feasible action space of the mechanical arm. A motion trajectory is then planned for the mechanical arm in this action space using a deep reinforcement learning algorithm, and an optimal strategy is obtained. During reinforcement learning training, one action is selected from the action space and executed, and the quality of the action is reflected by the value of the reward.
The deep reinforcement learning algorithm comprises:
Step 4.1, setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor adopts an adaptive KL penalty coefficient, the Critic is updated with the TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function.
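A sketch of the update rules named in step 4.1, assuming the standard forms of the adaptive-KL-penalty actor objective and the TD-error; the KL target and adaptation factors are assumptions, not values from the patent.

```python
import torch

def actor_loss_adaptive_kl(log_prob_new, log_prob_old, advantage, kl, beta):
    """Surrogate policy loss with an adaptive KL penalty coefficient beta."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    return -(ratio * advantage).mean() + beta * kl

def update_beta(kl: float, beta: float, kl_target: float = 0.01) -> float:
    """Common adaptation rule for the penalty coefficient (assumed target and factors)."""
    if kl > 1.5 * kl_target:
        return beta * 2.0
    if kl < kl_target / 1.5:
        return beta / 2.0
    return beta

def critic_td_error(value_t, reward_t, value_t1, gamma: float = 0.99):
    """TD-error used to update the Critic: delta = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return reward_t + gamma * value_t1 - value_t
```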
Step 4.2, training path planning under the action space;
Step 4.3, setting a training ending condition: training is stopped when the end of the mechanical arm reaches the preset target point several times in succession.
In this embodiment, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1; the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
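One possible reading of the 400 × 300 × 10 × 1 structure with swish hidden layers and a tanh-bounded actor output is sketched below; the exact split between Actor and Critic and the observation size are assumptions.

```python
import torch.nn as nn

def mlp(sizes, out_act=None):
    """Stack of fully connected layers with swish (SiLU) hidden activations."""
    layers = []
    for i in range(len(sizes) - 2):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.SiLU()]
    layers.append(nn.Linear(sizes[-2], sizes[-1]))
    if out_act is not None:
        layers.append(out_act)
    return nn.Sequential(*layers)

# One reading of 400 x 300 x 10 x 1: 400-, 300- and 10-unit hidden layers and a single output;
# the actor output is squashed to [-1, 1] with tanh as stated in the text.
state_dim = 26                                       # illustrative observation size
actor = mlp([state_dim, 400, 300, 10, 1], out_act=nn.Tanh())
critic = mlp([state_dim, 400, 300, 10, 1])           # value estimate, no output activation
```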
Training of path planning is carried out in the redesigned action space: the current joint positions, velocities and accelerations of the mechanical arm and the distance between the end of the arm and the target point are input into the reinforcement learning network, yielding the expected action of the mechanical arm, i.e. the action in the action space that maximizes the Q value and is executed by the arm; the Q value is the action-value in reinforcement learning, used to evaluate the action, and represents the expected cumulative reward up to the final state after the agent selects that action. Each executed action yields a corresponding reward value, and when the reward becomes stable and converges the planning is considered successful and training is stopped, the strategy obtained by training being the optimal strategy.
The reward function for deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty term that discourages actions too close to the limits; the third term R_adaptation is a braking penalty term, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term: if, after the substitute action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
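A sketch of the four-term reward follows; the functional forms of the target-distance and action-penalty terms and the numerical thresholds are not specified above, so the ones below are illustrative assumptions.

```python
def reward(dist_to_target: float, action: float, braking_executed: bool,
           min_clearance: float, action_limit: float = 1.0,
           clearance_threshold: float = 0.02) -> float:
    """R = R_target - R_action - R_adaptation - R_distance (assumed term forms)."""
    r_target = -dist_to_target                       # reward approaching the welding start point
    r_action = (abs(action) / action_limit) ** 2     # penalise actions close to the limits
    r_adaptation = 1.0 if braking_executed else 0.0  # brake penalty when the braking action replaces the desired one
    r_distance = 1.0 if min_clearance < clearance_threshold else 0.0  # clearance penalty
    return r_target - r_action - r_adaptation - r_distance
```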
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.
Claims (6)
1. The narrow space robot operation planning method for safety reinforcement learning is characterized by comprising the following steps of:
Before the mechanical arm moves, setting a planning task and a target point;
According to the current state information of the mechanical arm and the relevant kinematic constraints, calculating the expected acceleration a_{t+1,N} and, at the same time, the braking acceleration a_{t+1,B}, thereby constructing the feasible action space of the mechanical arm, comprising:
Defining kinematic constraints of the joint;
Detecting, at discrete time points, the minimum distance between the robot and the obstacles and between the links of the mechanical arm itself to determine the collision condition; if the minimum distance is smaller than a preset safe-distance threshold, the configuration is regarded as a collision;
Acquiring the state information of the mechanical arm through sensors built into the pybullet environment;
Establishing a neural network as an action prediction network for predicting the action at the next moment; the state information of the joints is input into the action prediction network, the corresponding action scalar m_{t+1} ∈ [-1, 1] of each joint is predicted and then mapped onto the safe acceleration range to obtain the expected acceleration a_{t+1,N} of the joint, where a_{t+1,min} and a_{t+1,max} are respectively the minimum and maximum safe accelerations of the joint; knowing the expected acceleration, the velocity and position of the joint at the next time t+1 can be obtained;
Calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, taking m'_{t+1} = 2*m_{t+1} - 1, otherwise taking m'_{t+1} = 2*(1 - m_{t+1}) - 1, and using m'_{t+1} to calculate the braking acceleration a_{t+1,B};
Testing the expected acceleration a_{t+1,N} of the joint: if the mechanical arm does not collide after the action is executed and the defined kinematic constraints of the joint are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the substitute action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the substitute action; the expected acceleration a_{t+1,N} calculated for each joint is followed by braking; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behavior is safe, otherwise the motion is stopped;
The substitute actions of the joints of the mechanical arm form its feasible action space; a motion trajectory is planned for the mechanical arm in this action space using a deep reinforcement learning algorithm, and an optimal strategy is obtained;
The deep reinforcement learning algorithm comprises:
Setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor adopts an adaptive KL penalty coefficient, the Critic is updated with the TD-error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
Training path planning under the action space;
setting a training ending condition: training is stopped when the end of the mechanical arm reaches the preset target point several times in succession;
Training path planning in this action space yields the expected action of the mechanical arm, i.e. the action in the action space that maximizes the Q value and is executed by the mechanical arm, where the Q value is the action-value in reinforcement learning and represents the expected cumulative reward up to the final state after the robot selects that action; each executed action yields a corresponding reward value, and when the reward becomes stable and converges the planning is considered successful and training is stopped, the strategy obtained by training being the optimal strategy.
2. The method of claim 1, wherein the target point is a welding start point and the planning task is to plan a safe path so that the end of the arm moves to the welding start point.
3. The method of claim 1, wherein the status information includes a position, a speed, an acceleration, and a distance from an obstacle for each joint.
4. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein, to prevent oscillation during the movement, a_{t+1,max} = m'_{t+1} * (a_{t+1,max} - a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 - m'_{t+1}) * (a_{t+1,max} - a_{t+1,min}) are taken.
5. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the input of the deep reinforcement learning algorithm is the state information of the mechanical arm, and an Actor network and a Critic network are set up for training; the network structure is 400 × 300 × 10 × 1, the hidden layers all use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the output action range is [-1, 1].
6. The narrow space robot operation planning method according to claim 1, wherein the reward function for the deep reinforcement learning is R = R_target - R_action - R_adaptation - R_distance, comprising four terms: the first term R_target is a reward term based on the distance from the end of the mechanical arm to the target point, used to train the arm to approach the target point; the second term R_action is an action penalty term that discourages actions too close to the limits; the third term R_adaptation is a braking penalty term, equal to 1 when the action would collide and the braking action is executed, and 0 otherwise; the fourth term R_distance is a distance penalty term: if, after the substitute action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, otherwise it is 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210930544.2A CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210930544.2A CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115178944A CN115178944A (en) | 2022-10-14 |
CN115178944B true CN115178944B (en) | 2024-05-24 |
Family
ID=83520672
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210930544.2A Active CN115178944B (en) | 2022-08-04 | 2022-08-04 | Narrow space robot operation planning method for safety reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115178944B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116551703B (en) * | 2023-07-12 | 2023-09-12 | 长春工业大学 | Motion planning method based on machine learning in complex environment |
CN116834018B (en) * | 2023-08-07 | 2024-08-27 | 南京云创大数据科技股份有限公司 | Training method and training device for multi-mechanical arm multi-target searching |
CN116900539B (en) * | 2023-09-14 | 2023-12-19 | 天津大学 | Multi-robot task planning method based on graph neural network and reinforcement learning |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106457565A (en) * | 2014-06-03 | 2017-02-22 | 阿蒂迈兹机器人技术有限公司 | Method and system for programming a robot |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN110370317A (en) * | 2019-07-24 | 2019-10-25 | 广东工业大学 | Robot restorative procedure and device |
CN112454333A (en) * | 2020-11-26 | 2021-03-09 | 青岛理工大学 | Robot teaching system and method based on image segmentation and surface electromyogram signals |
CN113163332A (en) * | 2021-04-25 | 2021-07-23 | 北京邮电大学 | Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning |
CN114708293A (en) * | 2022-03-22 | 2022-07-05 | 广东工业大学 | Robot motion estimation method based on deep learning point-line feature and IMU tight coupling |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220105629A1 (en) * | 2021-12-16 | 2022-04-07 | Venkat Natarajan | Failure rate estimation and reinforcement learning safety factor systems |
- 2022-08-04: CN application CN202210930544.2A filed, granted as patent CN115178944B (active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106457565A (en) * | 2014-06-03 | 2017-02-22 | 阿蒂迈兹机器人技术有限公司 | Method and system for programming a robot |
CN110315258A (en) * | 2019-07-24 | 2019-10-11 | 广东工业大学 | A kind of welding method based on intensified learning and ant group algorithm |
CN110370317A (en) * | 2019-07-24 | 2019-10-25 | 广东工业大学 | Robot restorative procedure and device |
CN110333739A (en) * | 2019-08-21 | 2019-10-15 | 哈尔滨工程大学 | A kind of AUV conduct programming and method of controlling operation based on intensified learning |
CN112454333A (en) * | 2020-11-26 | 2021-03-09 | 青岛理工大学 | Robot teaching system and method based on image segmentation and surface electromyogram signals |
CN113163332A (en) * | 2021-04-25 | 2021-07-23 | 北京邮电大学 | Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning |
CN114708293A (en) * | 2022-03-22 | 2022-07-05 | 广东工业大学 | Robot motion estimation method based on deep learning point-line feature and IMU tight coupling |
Also Published As
Publication number | Publication date |
---|---|
CN115178944A (en) | 2022-10-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115178944B (en) | Narrow space robot operation planning method for safety reinforcement learning | |
CN108621165B (en) | Optimal trajectory planning method for industrial robot dynamics performance in obstacle environment | |
WO2022088593A1 (en) | Robotic arm control method and device, and human-machine cooperation model training method | |
Liu et al. | Algorithmic safety measures for intelligent industrial co-robots | |
Nicolis et al. | Human intention estimation based on neural networks for enhanced collaboration with robots | |
CN113485323B (en) | Flexible formation method for cascading multiple mobile robots | |
CN115091469B (en) | Depth reinforcement learning mechanical arm motion planning method based on maximum entropy frame | |
CN111702766B (en) | Mechanical arm self-adaptive door opening screwing method based on force sense guidance | |
Bejar et al. | Reverse parking a car-like mobile robot with deep reinforcement learning and preview control | |
Sehgal et al. | Automatic parameter optimization using genetic algorithm in deep reinforcement learning for robotic manipulation tasks | |
Chen et al. | New approach to intelligent control systems with self-exploring process | |
CN115542733A (en) | Self-adaptive dynamic window method based on deep reinforcement learning | |
Brandao et al. | Multi-controller multi-objective locomotion planning for legged robots | |
CN116551703B (en) | Motion planning method based on machine learning in complex environment | |
Samsonov et al. | Using Reinforcement Learning for Optimization of a Workpiece Clamping Position in a Machine Tool. | |
CN111984000A (en) | Method and device for automatically influencing an actuator | |
Tzafestas et al. | Fuzzy reinforcement learning control for compliance tasks of robotic manipulators | |
CN114115341B (en) | Intelligent agent cluster cooperative motion method and system | |
CN114055479A (en) | Dragging teaching spraying robot collision early warning method, medium and equipment | |
CN113467465A (en) | Human-in-loop decision modeling and control method for robot system | |
CN113189986A (en) | Two-stage self-adaptive behavior planning method and system for autonomous robot | |
Chen et al. | Mitigating Imminent Collision for Multi-robot Navigation: A TTC-force Reward Shaping Approach | |
Li et al. | Manipulator Motion Planning based on Actor-Critic Reinforcement Learning | |
Young et al. | Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies | |
Wawrzyński | Autonomous reinforcement learning with experience replay for humanoid gait optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |