CN115178944B - Narrow space robot operation planning method for safety reinforcement learning - Google Patents

Narrow space robot operation planning method for safety reinforcement learning

Info

Publication number
CN115178944B
CN115178944B (application number CN202210930544.2A)
Authority
CN
China
Prior art keywords
action
mechanical arm
joint
reinforcement learning
acceleration
Prior art date
Legal status
Active
Application number
CN202210930544.2A
Other languages
Chinese (zh)
Other versions
CN115178944A (en)
Inventor
王涛
许银涛
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210930544.2A priority Critical patent/CN115178944B/en
Publication of CN115178944A publication Critical patent/CN115178944A/en
Application granted granted Critical
Publication of CN115178944B publication Critical patent/CN115178944B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B23MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23KSOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
    • B23K37/00Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
    • B23K37/02Carriages for supporting the welding or cutting element
    • B23K37/0211Carriages for supporting the welding or cutting element travelling on a guide member, e.g. rail, track
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B23MACHINE TOOLS; METAL-WORKING NOT OTHERWISE PROVIDED FOR
    • B23KSOLDERING OR UNSOLDERING; WELDING; CLADDING OR PLATING BY SOLDERING OR WELDING; CUTTING BY APPLYING HEAT LOCALLY, e.g. FLAME CUTTING; WORKING BY LASER BEAM
    • B23K37/00Auxiliary devices or processes, not specially adapted to a procedure covered by only one of the preceding main groups
    • B23K37/02Carriages for supporting the welding or cutting element
    • B23K37/0252Steering means
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666Avoiding collision or forbidden zones

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Physics & Mathematics (AREA)
  • Optics & Photonics (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)
  • Numerical Control (AREA)

Abstract

The invention discloses a narrow space robot operation planning method for safety reinforcement learning, which comprises the following steps: before the mechanical arm moves, a planning task and a target point are set; according to the current state information of the mechanical arm and the relevant kinematic constraints, the expected acceleration is calculated, and the braking acceleration is calculated at the same time; the expected acceleration of each joint is tested, and if the mechanical arm neither collides nor violates the kinematic constraints of the joints after the action is executed, the expected acceleration is feasible and is executed as the alternative action; otherwise, the calculated braking acceleration is executed as the alternative action; the alternative actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; a motion trajectory is then planned for the mechanical arm in this action space using a deep reinforcement learning algorithm, and the optimal policy is obtained. The invention combines the idea of alternative actions and redesigns the action space used for reinforcement learning training, further ensuring the safety of the planning results.

Description

Narrow space robot operation planning method for safety reinforcement learning
Technical Field
The invention relates to the field of robot operation planning research, in particular to a narrow space robot operation planning method for safety reinforcement learning.
Background
In an environment constrained by obstacles, the robot is required to move autonomously from its current position to a given position, rapidly and without collision. Given a start position and an end position, a path satisfying certain constraints, such as being collision-free and meeting the kinematic conditions, is sought in the workspace of the robot, and the path should also be as short as possible. In path planning, the obstacle space is first modelled, the robot is placed in this space, and a traditional planning algorithm such as a genetic algorithm or the artificial potential field method, or a deep reinforcement learning algorithm, is used for training. The computational complexity of planning with these algorithms grows exponentially in the high-dimensional case, which often makes real-time planning difficult. Safe reinforcement learning, as a derivative of reinforcement learning, requires safety constraints to be respected both during the learning stage and during deployment. When the environment consists of a controllable robot and static obstacles whose shapes and positions are known, constraints such as collision avoidance and the relevant kinematic limits are taken into account during training; applying the concept of alternative safe behaviours to reinforcement learning greatly improves the feasibility of the planning results and makes the method applicable to high-dimensional robot systems.
Disclosure of Invention
The invention aims to provide a narrow space robot operation planning method for safety reinforcement learning, which is used for further improving the safety of planning results.
In order to accomplish the above task, the invention adopts the following technical scheme:
a narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
Before the mechanical arm moves, setting a planning task and a target point;
According to the current state information of the mechanical arm and the relevant kinematic constraints, calculating the expected acceleration a_{t+1,N} and simultaneously calculating the braking acceleration a_{t+1,B}, thereby constructing the feasible action space of the mechanical arm, comprising:
Defining kinematic constraints of the joint;
Detecting, at discrete time points, the minimum distance between the robot and the obstacles and the minimum distance between the connecting rods (links) of the mechanical arm itself in order to determine the collision condition; if the minimum distance is smaller than a preset safe distance threshold, it is regarded as a collision;
Acquiring the state information of the mechanical arm through the sensors built into the pybullet environment;
A neural network is established as an action prediction network for predicting the action at the next moment. The state information of the joints is input into the action prediction network, and an action scalar m_{t+1} ∈ [-1, 1] is predicted for each joint; the expected acceleration a_{t+1,N} of the joint is then obtained by mapping m_{t+1} onto the safe acceleration interval [a_{t+1,min}, a_{t+1,max}], where a_{t+1,min} and a_{t+1,max} are the minimum and maximum safe accelerations of the joint, respectively. With the expected acceleration known, the velocity and position of the joint at the next time t+1 can be obtained;
Calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, take m'_{t+1} = 2·m_{t+1} − 1, otherwise take m'_{t+1} = 2·(1 − m_{t+1}) − 1; the braking acceleration a_{t+1,B} is then calculated from m'_{t+1};
Testing the expected acceleration a_{t+1,N} of each joint: if the mechanical arm does not collide after the action is executed and the defined kinematic constraints of the joint are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the alternative action. The expected acceleration a_{t+1,N} calculated for each joint is executed and then followed by braking; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behaviour is safe, otherwise the motion is stopped;
The alternative actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; and planning a motion trajectory for the mechanical arm in this action space using a deep reinforcement learning algorithm and obtaining the optimal policy.
Further, the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
Further, the status information includes a position, a velocity, an acceleration, and a distance from the obstacle for each joint.
Further, in order to prevent oscillation during movement, take a_{t+1,max} = m'_{t+1} · (a_{t+1,max} − a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 − m'_{t+1}) · (a_{t+1,max} − a_{t+1,min}).
Further, the deep reinforcement learning algorithm includes:
Setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor is a loss function with an adaptive KL penalty coefficient, the Critic is updated using the TD error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
Training path planning under the action space;
setting a training termination condition: when the tail end of the mechanical arm reaches the preset target point several times in succession, planning is considered successful and training is stopped.
Further, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1, all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the range of the output action is [-1, 1].
Further, path planning training is performed in this action space, so that the expected action of the mechanical arm can be obtained, namely the action in the action space that maximizes the Q value and is executed by the mechanical arm, where the Q value is the action-value function in reinforcement learning and represents the expected cumulative reward up to the final state after the robot selects the action; each executed action yields a corresponding reward value, and when the reward is stable and convergent, planning is considered successful and training is stopped, the policy obtained by training being the optimal policy.
Further, the reward function for deep reinforcement learning is R = R_target − R_action − R_adaptation − R_distance, which includes four terms: the first term R_target is a reward term for the distance from the tail end of the mechanical arm to the target point and is used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to avoid actions too close to the limits; the third term R_adaptation is a braking penalty term, which is 1 when the action would collide and the braking action is executed instead, and 0 otherwise; and the fourth term R_distance is a distance penalty term: if, after the alternative action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, and otherwise it is 0.
Compared with the prior art, the invention has the following technical characteristics:
According to the method, the trajectory for the current time interval is calculated from the current motion state of the robot and the network prediction; if the predicted trajectory satisfies all safety constraints, the predicted motion is executed; otherwise, the braking trajectory calculated in the previous time interval is used as the alternative safe action, and the feasible action space is thus obtained. This design of the reinforcement learning action space ensures that all predicted trajectories conform to the kinematic joint limits. Compared with existing deep reinforcement learning algorithms, the method combines the idea of alternative actions and redesigns the action space used for reinforcement learning training, further ensuring the safety of the planning results.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a graph of planning results of an embodiment of the present invention, showing reward curves obtained by training with three algorithms, where OURS is the proposed method and the PPO and DDPG algorithms are both existing methods; as can be seen from the figure, the results trained with the method of the present invention converge faster and are more stable;
FIG. 3 is a schematic flow chart of the method of the present invention.
Detailed Description
This scheme aims to plan a path in the narrow weld seam space of an industrial welding scenario; the robot used is a six-degree-of-freedom industrial mechanical arm on which a welding gun is mounted for the welding operation.
Referring to the attached drawings, the narrow space robot operation planning method for safety reinforcement learning comprises the following steps:
Step 1, setting a planning task and a target point before the mechanical arm moves; the target point is a welding starting point, and the planning task is to plan a safe path so that the tail end of the mechanical arm moves to the welding starting point.
During training, the state information of the mechanical arm is acquired from the simulated welding environment through interaction between the robot and that environment; it comprises the kinematic information of the mechanical arm joints and the positional relationship between the joints and the obstacles. The kinematic information comprises the position, velocity, acceleration and jerk of the mechanical arm joints.
Step 2, calculating the expected acceleration a_{t+1,N} according to the current state information of the mechanical arm and the relevant kinematic constraints, and simultaneously calculating the braking acceleration a_{t+1,B}; this comprises the following steps:
Step 2.1, defining the kinematic constraints of the joints, namely constraints on the position, velocity, acceleration and jerk of each joint, given by the maximum and minimum values of the position, velocity, acceleration and jerk.
Step 2.2, in order to avoid collisions, obstacle-link pairs and link-link pairs are defined; the minimum distance d between the robot and the obstacles, and between the connecting rods (links) of the mechanical arm itself, is detected at discrete time points to determine the collision condition, and if the minimum distance d is smaller than a preset safe distance threshold d_S, it is regarded as a collision.
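As an illustration of step 2.2, the following sketch shows how the minimum distances of the defined obstacle-link and link-link pairs could be queried in pybullet and compared against the safe distance threshold d_S. The body IDs, link indices and the numeric threshold are assumptions for illustration, not values fixed by the invention.

```python
import pybullet as p

SAFE_DISTANCE = 0.02   # assumed safe distance threshold d_S (metres)
QUERY_RANGE   = 0.5    # only report pairs closer than this range

def pair_min_distance(body_a, link_a, body_b, link_b):
    """Minimum distance between two links (or a link and an obstacle body)."""
    points = p.getClosestPoints(bodyA=body_a, bodyB=body_b,
                                distance=QUERY_RANGE,
                                linkIndexA=link_a, linkIndexB=link_b)
    if not points:                        # nothing within QUERY_RANGE
        return QUERY_RANGE
    return min(pt[8] for pt in points)    # element 8 is the contact distance

def in_collision(robot_id, obstacle_ids, arm_links, link_pairs):
    """True if any obstacle-link or link-link pair is closer than d_S."""
    for obs_id in obstacle_ids:           # obstacle-link pairs
        for link in arm_links:
            if pair_min_distance(robot_id, link, obs_id, -1) < SAFE_DISTANCE:
                return True
    for link_i, link_j in link_pairs:      # self-collision link-link pairs
        if pair_min_distance(robot_id, link_i, robot_id, link_j) < SAFE_DISTANCE:
            return True
    return False
```

This check would typically be run once per discrete time point of the simulated trajectory, as described above.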
Step 2.3, the simulation is carried out in the pybullet environment, and the state information of the mechanical arm can be obtained through the sensors built into the pybullet environment; the state comprises the position, velocity and acceleration of each joint and its distance to the obstacles.
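For step 2.3, the joint positions and velocities can be read directly from the simulator, while the acceleration can, for example, be approximated by finite differences between successive control steps; the finite-difference estimate and the time step value below are assumptions for illustration.

```python
import numpy as np
import pybullet as p

DT = 1.0 / 240.0   # assumed simulation time step

def read_joint_states(robot_id, joint_indices, prev_velocities):
    """Return joint positions, velocities and a finite-difference
    acceleration estimate for the controlled joints."""
    states = p.getJointStates(robot_id, joint_indices)
    positions  = np.array([s[0] for s in states])   # joint position
    velocities = np.array([s[1] for s in states])   # joint velocity
    accelerations = (velocities - prev_velocities) / DT
    return positions, velocities, accelerations
```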
A neural network is established as an action prediction network for predicting the action at the next moment; its hidden layers use SELU as the activation function, with a first hidden layer of size 256 and a second hidden layer of size 128. The state information of the joints is input into the action prediction network, and an action scalar m_{t+1} ∈ [-1, 1] is predicted for each joint; the expected acceleration a_{t+1,N} of the joint is then obtained by mapping m_{t+1} onto the safe acceleration interval, where a_{t+1,min} and a_{t+1,max} correspond to the minimum and maximum safe accelerations of the joint, respectively. With the acceleration known, the velocity and position of the joint at the next time t+1 can be obtained.
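The action prediction network described above could be realised, for instance, as a small fully connected network in PyTorch with SELU hidden layers of sizes 256 and 128 and a tanh output so that each joint's action scalar m_{t+1} lies in [-1, 1]. The linear mapping of m_{t+1} onto [a_{t+1,min}, a_{t+1,max}] shown below is an assumed interpolation, since the exact formula is not reproduced in the text.

```python
import torch
import torch.nn as nn

class ActionPredictionNet(nn.Module):
    """Predicts one action scalar in [-1, 1] per joint from the joint state."""
    def __init__(self, state_dim, num_joints):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.SELU(),    # first hidden layer, size 256
            nn.Linear(256, 128),       nn.SELU(),    # second hidden layer, size 128
            nn.Linear(128, num_joints), nn.Tanh(),   # m_{t+1} in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

def expected_acceleration(m_next, a_min, a_max):
    """Assumed linear mapping of m_{t+1} onto [a_{t+1,min}, a_{t+1,max}]."""
    return a_min + 0.5 * (m_next + 1.0) * (a_max - a_min)
```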
Step 2.4, calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, take m'_{t+1} = 2·m_{t+1} − 1, otherwise take m'_{t+1} = 2·(1 − m_{t+1}) − 1, and the braking acceleration a_{t+1,B} is calculated from m'_{t+1}. At the same time, in order to prevent oscillation during the movement, take a_{t+1,max} = m'_{t+1} · (a_{t+1,max} − a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 − m'_{t+1}) · (a_{t+1,max} − a_{t+1,min}).
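A minimal sketch of step 2.4 follows, assuming the braking acceleration is obtained from m'_{t+1} by the same kind of interval mapping as the expected acceleration (the final formula is omitted in the text, so that mapping is an assumption); the sign handling and the anti-oscillation adjustment of the interval follow the relations given above.

```python
import numpy as np

def braking_scalar(m_next, v_t):
    """m'_{t+1} per joint, as defined in step 2.4."""
    return np.where(v_t > 0, 2.0 * m_next - 1.0, 2.0 * (1.0 - m_next) - 1.0)

def braking_acceleration(m_next, v_t, a_min, a_max):
    m_brake = braking_scalar(m_next, v_t)
    # Anti-oscillation adjustment of the safe interval (relations from the text).
    a_max_adj = m_brake * (a_max - a_min)
    a_min_adj = a_min + (1.0 - m_brake) * (a_max - a_min)
    # Assumed: map m'_{t+1} onto the adjusted interval to obtain a_{t+1,B}.
    return a_min_adj + 0.5 * (m_brake + 1.0) * (a_max_adj - a_min_adj)
```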
Step 3, testing the expected acceleration a_{t+1,N} of each joint: if the mechanical arm does not collide after the action is executed and the kinematic constraints of the joint defined in step 2.1 are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the alternative action.
In order to ensure that a safe and feasible action also exists at the next time t+1, the calculated expected acceleration a_{t+1,N} of each joint is executed and then followed by braking; starting from the state information corresponding to the current time t, namely the position, velocity and acceleration of the mechanical arm joints at the current moment, if no collision occurs after the corresponding action is executed, the action is safe; otherwise, the motion is stopped and the procedure returns to step 2 for re-prediction.
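The safety filter of step 3 can be summarised as the selection logic below: the predicted motion (followed by braking) is simulated forward, and the predicted action is kept only if it causes neither a collision nor a joint-limit violation, otherwise the braking action is substituted. The helper callables are placeholders for the simulation and constraint checks described above, not a fixed interface of the method.

```python
def select_alternative_action(a_expected, a_braking,
                              simulate_step, violates_joint_limits, collides):
    """Return the action actually executed (the 'alternative action').

    simulate_step(a)         -> predicted next state after applying acceleration a
    violates_joint_limits(s) -> True if position/velocity/acceleration/jerk limits are broken
    collides(s)              -> True if any monitored pair is closer than d_S
    """
    next_state = simulate_step(a_expected)
    if not collides(next_state) and not violates_joint_limits(next_state):
        return a_expected    # expected acceleration is feasible
    return a_braking         # otherwise fall back to the braking acceleration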
Step 4, through steps 2 to 3, the alternative action of each joint of the mechanical arm is obtained, and these alternative actions form the feasible action space of the mechanical arm. A motion trajectory is planned for the mechanical arm in this action space using a deep reinforcement learning algorithm, and the optimal policy is obtained. During reinforcement learning training, an action is selected from the action space and executed, and the quality of the action is then reflected by the value of the reward.
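Putting steps 2 to 4 together, one training episode could look like the loop sketched below; the environment interface (reset/step/simulate_step and the helper checks) and the step limit are illustrative assumptions, and select_alternative_action is the helper from the sketch above.

```python
def run_episode(env, predict_action, max_steps=500):
    """One reinforcement learning episode over the safe (alternative) action space."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # Step 2: predict the expected acceleration and the braking acceleration.
        a_expected, a_braking = predict_action(state)
        # Step 3: keep the expected action only if it is collision-free and
        # respects the joint limits, otherwise substitute the braking action.
        action = select_alternative_action(a_expected, a_braking,
                                           env.simulate_step,
                                           env.violates_joint_limits,
                                           env.collides)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:   # target reached or motion stopped
            break
    return total_reward
```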
The deep reinforcement learning algorithm comprises:
Step 4.1, setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor is a loss function with an adaptive KL penalty coefficient, the Critic is updated using the TD error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function.
Step 4.2, training path planning under the action space;
Step 4.3, setting a training termination condition, and stopping training when the tail end of the mechanical arm reaches the preset target point several times in succession.
In this embodiment, the input of the deep reinforcement learning algorithm is the state information s_t of the mechanical arm, and an Actor network and a Critic network are set up for training. The network structure is 400 × 300 × 10 × 1, all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the range of the output action is [-1, 1].
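The update rules named in step 4.1 could be sketched as follows: an Actor loss with an adaptive KL penalty coefficient (in the style of the KL-penalty variant of PPO) and a Critic regressed on the TD error. The target KL value, the adaptation factors and the discount factor are assumptions, not values specified here.

```python
import torch

def actor_loss_adaptive_kl(log_prob_new, log_prob_old, advantage, kl, beta):
    """Surrogate objective with an adaptive KL penalty coefficient beta."""
    ratio = torch.exp(log_prob_new - log_prob_old)
    return -(ratio * advantage).mean() + beta * kl.mean()

def adapt_beta(beta, kl_mean, kl_target=0.01):
    """Standard adaptive-KL rule: grow beta if KL is too large, shrink if too small."""
    if kl_mean > 1.5 * kl_target:
        return beta * 2.0
    if kl_mean < kl_target / 1.5:
        return beta / 2.0
    return beta

def critic_td_loss(value, reward, next_value, gamma=0.99):
    """Critic regression on the TD error r + gamma * V(s') - V(s)."""
    td_target = reward + gamma * next_value.detach()
    return torch.nn.functional.mse_loss(value, td_target)
```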
Path planning training is carried out in the redesigned action space: the current joint positions, velocities and accelerations of the mechanical arm and the distance from the tail end of the mechanical arm to the target point are input into the reinforcement learning network, so that the expected action of the mechanical arm can be obtained, namely the action in the action space that maximizes the Q value and is executed by the mechanical arm. The Q value is the action-value function in reinforcement learning, used to evaluate the value of an action; it represents the expected cumulative reward from the moment the agent selects the action up to the final state. Each executed action yields a corresponding reward value; when the reward is stable and convergent, planning is considered successful and training is stopped, and the policy obtained by training is the optimal policy.
The reward function for deep reinforcement learning is R = R_target − R_action − R_adaptation − R_distance, which includes four terms: the first term R_target is a reward term for the distance from the tail end of the mechanical arm to the target point and is used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to avoid actions too close to the limits; the third term R_adaptation is a braking penalty term, which is 1 when the action would collide and the braking action is executed instead, and 0 otherwise; and the fourth term R_distance is a distance penalty term: if, after the alternative action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, and otherwise it is 0.
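The four-term reward R = R_target − R_action − R_adaptation − R_distance could be computed per step as sketched below; the particular shaping of R_target and R_action (a negative distance term and a quadratic penalty on near-limit action scalars) and the numeric thresholds are assumptions, while the 0/1 braking and distance penalties follow the description above.

```python
import numpy as np

def compute_reward(tip_to_target, action_scalars, braking_used,
                   pair_distances, distance_threshold=0.02, limit_margin=0.9):
    # R_target: reward for being close to the welding start point (assumed shaping).
    r_target = -tip_to_target
    # R_action: penalise action scalars too close to the limits of [-1, 1] (assumed shaping).
    r_action = float(np.sum(np.maximum(np.abs(action_scalars) - limit_margin, 0.0) ** 2))
    # R_adaptation: 1 if the braking action had to be executed, otherwise 0.
    r_adaptation = 1.0 if braking_used else 0.0
    # R_distance: 1 if any link-link or link-obstacle distance is below the threshold, else 0.
    r_distance = 1.0 if np.min(pair_distances) < distance_threshold else 0.0
    return r_target - r_action - r_adaptation - r_distance
```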
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (6)

1. The narrow space robot operation planning method for safety reinforcement learning is characterized by comprising the following steps of:
Before the mechanical arm moves, setting a planning task and a target point;
According to the current state information of the mechanical arm and the relevant kinematic constraints, calculating the expected acceleration a_{t+1,N} and simultaneously calculating the braking acceleration a_{t+1,B}, thereby constructing the feasible action space of the mechanical arm, comprising:
Defining kinematic constraints of the joint;
Detecting, at discrete time points, the minimum distance between the robot and the obstacles and the minimum distance between the connecting rods (links) of the mechanical arm itself in order to determine the collision condition; if the minimum distance is smaller than a preset safe distance threshold, it is regarded as a collision;
Acquiring the state information of the mechanical arm through the sensors built into the pybullet environment;
A neural network is established as an action prediction network for predicting the action at the next moment. The state information of the joints is input into the action prediction network, and an action scalar m_{t+1} ∈ [-1, 1] is predicted for each joint; the expected acceleration a_{t+1,N} of the joint is then obtained by mapping m_{t+1} onto the safe acceleration interval [a_{t+1,min}, a_{t+1,max}], where a_{t+1,min} and a_{t+1,max} are the minimum and maximum safe accelerations of the joint, respectively. With the expected acceleration known, the velocity and position of the joint at the next time t+1 can be obtained;
Calculating the braking acceleration: when the joint velocity v_t at the current time t is greater than 0, take m'_{t+1} = 2·m_{t+1} − 1, otherwise take m'_{t+1} = 2·(1 − m_{t+1}) − 1; the braking acceleration a_{t+1,B} is then calculated from m'_{t+1};
Testing the expected acceleration a_{t+1,N} of each joint: if the mechanical arm does not collide after the action is executed and the defined kinematic constraints of the joint are not violated, the expected acceleration a_{t+1,N} is feasible and is executed as the alternative action; otherwise, the calculated braking acceleration a_{t+1,B} is executed as the alternative action. The expected acceleration a_{t+1,N} calculated for each joint is executed and then followed by braking; starting from the state information corresponding to the current time t, if no collision occurs after the corresponding action is executed, the behaviour is safe, otherwise the motion is stopped;
The alternative actions of the joints of the mechanical arm form the feasible action space of the mechanical arm; and planning a motion trajectory for the mechanical arm in this action space using a deep reinforcement learning algorithm and obtaining the optimal policy;
The deep reinforcement learning algorithm comprises:
Setting an Actor network and a Critic network as the reinforcement learning networks, wherein the loss function used to update the Actor is a loss function with an adaptive KL penalty coefficient, the Critic is updated using the TD error, the hidden layers use swish as the activation function, and the output layer uses tanh as the activation function;
Training path planning under the action space;
setting a training termination condition, and stopping training when the tail end of the mechanical arm reaches the preset target point several times in succession;
Path planning training in this action space yields the expected action of the mechanical arm, namely the action in the action space that maximizes the Q value and is executed by the mechanical arm, where the Q value is the action-value function in reinforcement learning and represents the expected cumulative reward up to the final state after the robot selects the action; each executed action yields a corresponding reward value, and when the reward is stable and convergent, planning is considered successful and training is stopped, the policy obtained by training being the optimal policy.
2. The method of claim 1, wherein the target point is a welding start point and the planning task is to plan a safe path so that the end of the arm moves to the welding start point.
3. The method of claim 1, wherein the status information includes a position, a speed, an acceleration, and a distance from an obstacle for each joint.
4. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein, in order to prevent oscillation during the movement, a_{t+1,max} = m'_{t+1} · (a_{t+1,max} − a_{t+1,min}) and a_{t+1,min} = a_{t+1,min} + (1 − m'_{t+1}) · (a_{t+1,max} − a_{t+1,min}) are taken.
5. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the input of the deep reinforcement learning algorithm is the state information of the mechanical arm, and an Actor network and a Critic network are set up for training; the network structure is 400 × 300 × 10 × 1, all hidden layers use swish as the activation function, the output layer of the Actor network uses tanh as the activation function, and the range of the output action is [-1, 1].
6. The narrow space robot operation planning method for safety reinforcement learning according to claim 1, wherein the reward function for the deep reinforcement learning is R = R_target − R_action − R_adaptation − R_distance, which includes four terms: the first term R_target is a reward term for the distance from the tail end of the mechanical arm to the target point and is used to train the mechanical arm to approach the target point; the second term R_action is an action penalty term used to avoid actions too close to the limits; the third term R_adaptation is a braking penalty term, which is 1 when the action would collide and the braking action is executed instead, and 0 otherwise; and the fourth term R_distance is a distance penalty term: if, after the alternative action is executed, the distance between the links of the mechanical arm or between any link and an obstacle is smaller than a certain threshold, a penalty of 1 is applied, and otherwise it is 0.
CN202210930544.2A 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning Active CN115178944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210930544.2A CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210930544.2A CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Publications (2)

Publication Number Publication Date
CN115178944A CN115178944A (en) 2022-10-14
CN115178944B (en) 2024-05-24

Family

ID=83520672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210930544.2A Active CN115178944B (en) 2022-08-04 2022-08-04 Narrow space robot operation planning method for safety reinforcement learning

Country Status (1)

Country Link
CN (1) CN115178944B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116551703B (en) * 2023-07-12 2023-09-12 长春工业大学 Motion planning method based on machine learning in complex environment
CN116834018B (en) * 2023-08-07 2024-08-27 南京云创大数据科技股份有限公司 Training method and training device for multi-mechanical arm multi-target searching
CN116900539B (en) * 2023-09-14 2023-12-19 天津大学 Multi-robot task planning method based on graph neural network and reinforcement learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106457565A (en) * 2014-06-03 2017-02-22 阿蒂迈兹机器人技术有限公司 Method and system for programming a robot
CN110315258A (en) * 2019-07-24 2019-10-11 广东工业大学 A kind of welding method based on intensified learning and ant group algorithm
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning
CN110370317A (en) * 2019-07-24 2019-10-25 广东工业大学 Robot restorative procedure and device
CN112454333A (en) * 2020-11-26 2021-03-09 青岛理工大学 Robot teaching system and method based on image segmentation and surface electromyogram signals
CN113163332A (en) * 2021-04-25 2021-07-23 北京邮电大学 Road sign graph coloring unmanned aerial vehicle energy-saving endurance data collection method based on metric learning
CN114708293A (en) * 2022-03-22 2022-07-05 广东工业大学 Robot motion estimation method based on deep learning point-line feature and IMU tight coupling

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220105629A1 (en) * 2021-12-16 2022-04-07 Venkat Natarajan Failure rate estimation and reinforcement learning safety factor systems


Also Published As

Publication number Publication date
CN115178944A (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN115178944B (en) Narrow space robot operation planning method for safety reinforcement learning
CN108621165B (en) Optimal trajectory planning method for industrial robot dynamics performance in obstacle environment
WO2022088593A1 (en) Robotic arm control method and device, and human-machine cooperation model training method
Liu et al. Algorithmic safety measures for intelligent industrial co-robots
Nicolis et al. Human intention estimation based on neural networks for enhanced collaboration with robots
CN113485323B (en) Flexible formation method for cascading multiple mobile robots
CN115091469B (en) Depth reinforcement learning mechanical arm motion planning method based on maximum entropy frame
CN111702766B (en) Mechanical arm self-adaptive door opening screwing method based on force sense guidance
Bejar et al. Reverse parking a car-like mobile robot with deep reinforcement learning and preview control
Sehgal et al. Automatic parameter optimization using genetic algorithm in deep reinforcement learning for robotic manipulation tasks
Chen et al. New approach to intelligent control systems with self-exploring process
CN115542733A (en) Self-adaptive dynamic window method based on deep reinforcement learning
Brandao et al. Multi-controller multi-objective locomotion planning for legged robots
CN116551703B (en) Motion planning method based on machine learning in complex environment
Samsonov et al. Using Reinforcement Learning for Optimization of a Workpiece Clamping Position in a Machine Tool.
CN111984000A (en) Method and device for automatically influencing an actuator
Tzafestas et al. Fuzzy reinforcement learning control for compliance tasks of robotic manipulators
CN114115341B (en) Intelligent agent cluster cooperative motion method and system
CN114055479A (en) Dragging teaching spraying robot collision early warning method, medium and equipment
CN113467465A (en) Human-in-loop decision modeling and control method for robot system
CN113189986A (en) Two-stage self-adaptive behavior planning method and system for autonomous robot
Chen et al. Mitigating Imminent Collision for Multi-robot Navigation: A TTC-force Reward Shaping Approach
Li et al. Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
Young et al. Enhancing Robotic Navigation: An Evaluation of Single and Multi-Objective Reinforcement Learning Strategies
Wawrzyński Autonomous reinforcement learning with experience replay for humanoid gait optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant