CN115890670A - Method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning - Google Patents


Info

Publication number
CN115890670A
CN115890670A (application CN202211451128.0A)
Authority
CN
China
Prior art keywords
mechanical arm
degree
obstacle
freedom
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211451128.0A
Other languages
Chinese (zh)
Inventor
吕楠
张丽秋
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huiyan Artificial Intelligence Technology Co ltd
Original Assignee
Wuxi Huiyan Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Huiyan Artificial Intelligence Technology Co ltd filed Critical Wuxi Huiyan Artificial Intelligence Technology Co ltd
Priority to CN202211451128.0A priority Critical patent/CN115890670A/en
Publication of CN115890670A publication Critical patent/CN115890670A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Numerical Control (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning, and relates to the technical field of mechanical-arm control. The method comprises the following steps: establishing the reinforcement-learning foundation; building the space manipulator model; performing collision detection; setting the parameters of the arm's obstacle-avoidance motion trajectory; and planning the arm's path with a DQN. A three-layer DNN is used as the network body of the DQN, and a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, so that spatial obstacles are avoided and the deep-reinforcement-learning training of the arm is completed. With the three-layer DNN, whose input is the arm's state information and whose output is the arm's joint angles, combined with offline training, the mechanical arm automatically learns a motion trajectory close to the optimum, successfully avoids the obstacles to reach the target point, and exhibits strong obstacle-avoidance capability.

Description

Method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning
Technical Field
The invention relates to the technical field of mechanical-arm control, and in particular to a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning.
Background
For a mechanical arm, redundancy is a relative concept, defined with respect to a specific task: for a planar task, even the commonly used 6-axis (six-degree-of-freedom) arm is redundant. In the more general case, however, a pose in three-dimensional space is described by six degrees of freedom, so a 7-axis (seven-degree-of-freedom) arm is called a redundant mechanical arm. The extra degree of freedom can be used for additional tasks such as whole-body obstacle avoidance, singularity avoidance, joint-limit avoidance, joint-torque optimization, and increased manipulability. Moreover, since a human arm has seven degrees of freedom, the redundant mechanical arm is, from the perspective of bionics, better suited to practical scenarios and applications.
During task execution, a seven-degree-of-freedom redundant mechanical arm is restricted by many constraint conditions, such as joint limits, the surrounding environment, self-collision, and dynamic balance. This is especially true in container welding: to meet the demands of high-efficiency operation, welding arms have been deployed at large scale on container production lines, and because the workspace contains a large number of obstacles, collision factors must be considered during automatic welding. To carry out the welding operation smoothly, research on obstacle-avoidance path planning for the mechanical arm is necessary.
For the obstacle-avoidance path-planning problem of a mechanical arm, the prior art includes the following approaches: converting Cartesian-space obstacles into configuration-space obstacles and then performing collision-free path planning with the A* algorithm; randomly sampling the arm in joint (J) space, mapping to configuration (C) space through forward kinematics, and planning the path with an improved RRT algorithm; planning the arm's path with the artificial potential field method, in which a repulsive field is artificially placed at the obstacle and an attractive field at the target point, and the arm moves under the action of the two fields; planning the arm's path with a genetic algorithm; and mapping the arm's workspace to a planar two-dimensional space and then planning the path in that plane with the A* algorithm. Analysis of the above prior art shows, however, that these methods can only be used in a specified scenario and lack flexibility; they also require an accurate model of the environment and cannot be applied well in complex industrial environments.
Therefore, in order to solve the existing problems, a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning is provided.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning, which solves the existing problems of high difficulty and lack of flexibility in adjusting the actions of a seven-degree-of-freedom redundant arm in welding systems.
In order to realize this purpose, the invention adopts the following technical scheme: the method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning comprises the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are used as the logical basis of the mechanical arm's reinforcement learning;
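As a concrete illustration of this logical basis (a sketch, not code from the patent), the tabular Q-learning update that the deep Q-network in step five generalizes can be written as:

```python
import numpy as np

# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# alpha and gamma here are placeholder hyperparameters, not the patent's.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 3))                  # toy table: 5 states, 3 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])                        # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

A DQN replaces the table `Q` with a neural network and the direct assignment with gradient descent on a squared error toward the same target.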
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
Preferably, in the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
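The link transform above can be sketched in code as follows; this is an illustrative implementation of the standard D-H transform, and the joint-parameter values in the usage line are placeholders, not taken from the patent:

```python
import numpy as np

# Standard D-H homogeneous transform for one rotary joint.
# theta: joint angle, d: link offset, a: link length, alpha: link twist.
def dh_transform(theta, d, a, alpha):
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Chaining one 4x4 transform per link gives the arm's forward kinematics.
T = dh_transform(np.pi / 2, d=0.1, a=0.3, alpha=0.0)
print(np.round(T, 3))
```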
preferably, in the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
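The six judgment steps above amount to a point-to-segment distance test; a minimal sketch (illustrative code, not the patent's implementation) is:

```python
import numpy as np

# Collision check between a link segment AB of radius R_1 and a spherical
# obstacle centered at P with radius R_o.
def link_collides(A, B, P, R_o, R_1):
    A, B, P = map(np.asarray, (A, B, P))
    AB = B - A
    # Parameter T of the projection of P onto the infinite line through A, B
    T = np.dot(P - A, AB) / np.dot(AB, AB)
    if 0.0 <= T <= 1.0:               # case one: projection inside the segment
        D = np.linalg.norm(P - (A + T * AB))
    else:                             # case two: nearest point is an end point
        D = min(np.linalg.norm(P - A), np.linalg.norm(P - B))
    return D <= R_o + R_1

# Sphere of radius 0.5 hovering 0.4 above the middle of a unit link of radius 0.1:
print(link_collides((0, 0, 0), (1, 0, 0), (0.5, 0.4, 0), 0.5, 0.1))  # True
```

Running the test per link (case three) declares the configuration safe only when every link passes.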
Preferably, in the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
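A minimal sketch of this reward (the text does not define P_1 and P_2, so here they are assumed to be distance-based shaping terms passed in as a single argument — a labeled assumption, not the patent's definition):

```python
# Reward terms from step four: +100 at the target, -100 on collision,
# otherwise a 0.6-scaled shaping term standing in for 0.6 * P_1 or 0.6 * P_2.
def reward(reached_target, collided, shaping_term):
    if reached_target:
        return 100.0
    if collided:
        return -100.0
    return 0.6 * shaping_term

print(reward(False, False, -2.5))   # -1.5
```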
Preferably, in the fourth step, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
Preferably, in the fourth step, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
Preferably, in the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initialize the value Q for all actions and states, randomly initialize all parameters W of the current Q-network, initialize the parameters W′ = W of the target network Q′, and empty the experience replay set D;
step 2, obtain the initial state S of the arm from the environment, select an action a according to the ε-greedy strategy, have the arm execute the action, obtain the reward R_t from the environment, and then observe the state S′ at the next moment;
step 3, store the arm–environment interaction tuple (S, a, R, S′) in D;
step 4, randomly sample a minibatch of n transitions (S_j, a_j, R_j, S′_j) from D, for j = 1, 2, …, n;
step 5, if the arm reaches the terminal state, set y_j = R_j; otherwise set y_j = R_j + γ·max_{a′} Q′(S′_j, a′, W′);
step 6, with the loss function L = (1/n)·Σ_{j=1..n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target-network weight parameters W′ = W; this completes the arm's obstacle-avoidance path-planning calculation process.
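Steps 1-6 above can be sketched as a minimal DQN loop. Everything concrete here — the three-layer network sizes, the toy environment transition, and the hyperparameters — is an illustrative assumption rather than the patent's configuration, and for brevity only the output layer is updated by gradient descent:

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
STATE, HIDDEN, ACTIONS = 10, 32, 27   # s has 10 components; action set is assumed

def init_net():
    return {"W1": rng.normal(0, 0.1, (STATE, HIDDEN)), "b1": np.zeros(HIDDEN),
            "W2": rng.normal(0, 0.1, (HIDDEN, ACTIONS)), "b2": np.zeros(ACTIONS)}

def forward(net, s):
    h = np.tanh(s @ net["W1"] + net["b1"])     # hidden layer
    return h, h @ net["W2"] + net["b2"]        # Q-values, one per action

q_net = init_net()
target_net = {k: v.copy() for k, v in q_net.items()}   # step 1: W' = W
D = deque(maxlen=5000)                                 # experience replay set D

gamma, eps, lr, n, c = 0.9, 0.1, 1e-2, 32, 50
s = rng.normal(size=STATE)                             # step 2: initial state
for step in range(200):
    # epsilon-greedy action selection
    a = rng.integers(ACTIONS) if rng.random() < eps else int(np.argmax(forward(q_net, s)[1]))
    # toy environment transition and reward (placeholders for the real arm)
    s_next, r, done = rng.normal(size=STATE), float(rng.normal()), False
    D.append((s, a, r, s_next, done))                  # step 3
    s = s_next
    if len(D) >= n:
        batch = random.sample(list(D), n)              # step 4: minibatch of n
        for (sj, aj, rj, sjn, dj) in batch:
            # step 5: target y_j
            yj = rj if dj else rj + gamma * np.max(forward(target_net, sjn)[1])
            h, q = forward(q_net, sj)
            err = q[aj] - yj
            # step 6: gradient descent on (y_j - Q(s_j, a_j, W))^2, output layer only
            q_net["W2"][:, aj] -= lr * err * h
            q_net["b2"][aj] -= lr * err
    if step % c == 0:
        target_net = {k: v.copy() for k, v in q_net.items()}   # W' = W every c steps
```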
The invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning. The method has the following beneficial effects:
the invention solves the problem of path planning of the mechanical arm by adopting a depth reinforcement learning algorithm, uses a three-layer DNN network, inputs state information of the mechanical arm, outputs the state information as a motion joint angle of the mechanical arm, and combines offline training to enable the seven-degree-of-freedom redundant mechanical arm to automatically train a motion track close to the optimal motion track, successfully avoid an obstacle from reaching a target point, simultaneously simulates on a three-degree-of-freedom spot welding robot on a simulation platform to obtain the motion track of the seven-degree-of-freedom redundant mechanical arm, and simultaneously shows that the mechanical arm adopting the depth reinforcement learning technology can plan a collision-free path for a welding mechanical arm, and has strong obstacle avoidance capability.
Drawings
FIG. 1 is a schematic diagram of a three-degree-of-freedom spot welding robot according to the present invention;
fig. 2 is a schematic diagram of a three-dimensional obstacle space according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment is as follows:
as shown in figs. 1-2, an embodiment of the present invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning, comprising the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are used as the logical basis of the mechanical arm's reinforcement learning;
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are measured, calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
Through steps one to five — the reinforcement-learning foundation, the space manipulator model, collision detection, the setting of the obstacle-avoidance trajectory parameters, and DQN path planning — the seven-degree-of-freedom redundant mechanical arm automatically trains a motion trajectory close to the optimum and successfully avoids the obstacles to reach the target point. In the simulation, the three-degree-of-freedom spot-welding robot runs on a simulation platform: at each step the arm outputs the action with the maximum Q value through the trained neural network, and forward kinematics is then used to drive the arm. Through the reinforcement-learning model training, the arm successfully avoids two obstacles and reaches the target point in a small, near-optimal number of motion steps, from which the motion trajectory of the seven-degree-of-freedom redundant arm is obtained. This also shows that an arm using the deep-reinforcement-learning technique can plan a collision-free path for a welding arm and has strong obstacle-avoidance capability.
In the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
In the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
In the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
In step four, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
In step four, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
In the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initialize the value Q for all actions and states, randomly initialize all parameters W of the current Q-network, initialize the parameters W′ = W of the target network Q′, and empty the experience replay set D;
step 2, obtain the initial state S of the arm from the environment, select an action a according to the ε-greedy strategy, have the arm execute the action, obtain the reward R_t from the environment, and then observe the state S′ at the next moment;
step 3, store the arm–environment interaction tuple (S, a, R, S′) in D;
step 4, randomly sample a minibatch of n transitions (S_j, a_j, R_j, S′_j) from D, for j = 1, 2, …, n;
step 5, if the arm reaches the terminal state, set y_j = R_j; otherwise set y_j = R_j + γ·max_{a′} Q′(S′_j, a′, W′);
step 6, with the loss function L = (1/n)·Σ_{j=1..n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target-network weight parameters W′ = W; this completes the arm's obstacle-avoidance path-planning calculation process.
In this method, a spot-welding seven-degree-of-freedom redundant mechanical arm is taken as the research object: automatic spot welding by the arm in three-dimensional space is converted into obstacle-avoidance path planning for the arm, and after a certain number of training steps the arm can smoothly plan a collision-free welding path.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning, characterized in that the method comprises the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are adopted as the logical basis of the mechanical arm's reinforcement learning;
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
2. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
3. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
4. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
5. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
6. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
7. The method for training the motion trail of the seven-degree-of-freedom redundant manipulator based on the reinforcement deep learning as claimed in claim 1, wherein the method comprises the following steps: in the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initializing values Q corresponding to all actions and states, randomly initializing all parameters W of a current Q network, initializing parameters W '= W of a target network Q', and emptying an experience playback set D;
step 2, obtaining an initial state S of the mechanical arm from the environment, selecting an action a according to an epsilon-greedy strategy, starting the mechanical arm to execute the action, and obtaining a reward R from the environment t, Then obtaining the state S' of the next moment;
Step 3, store the interaction tuple (S, a, R, S') between the mechanical arm and the environment in D;
Step 4, randomly sample a minibatch of n samples (S_j, a_j, R_j, S'_j) from D, taking j = 1, 2, …, n in turn;
Step 5, if the mechanical arm has reached the termination state, set y_j = R_j; otherwise set y_j = R_j + γ · max_{a'} Q'(S'_j, a', W');
Step 6, through the loss function L(W) = (1/n) Σ_{j=1}^{n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target network weight parameters as W' = W; this completes the mechanical arm obstacle-avoidance path planning process.
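The six steps above can be sketched as a minimal DQN-style training loop. To keep the example self-contained and runnable, a linear Q-function Q(s, a; W) = W[a]·s stands in for the deep network, and the toy one-dimensional environment, dimensions, and hyperparameters are illustrative assumptions, not the patent's actual manipulator setup. The toy environment has no terminal state, so only the non-terminal branch of Step 5's target appears.

```python
import random
import numpy as np

STATE_DIM, N_ACTIONS = 4, 3
GAMMA, EPS, LR, SYNC_EVERY, BATCH = 0.9, 0.1, 0.01, 20, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(N_ACTIONS, STATE_DIM))  # Step 1: random current-net params
W_target = W.copy()                          # Step 1: target params W' = W
replay = []                                  # Step 1: empty experience replay set D

def q_values(state, params):
    return params @ state                    # Q(s, a) for every action a

def choose_action(state):                    # Step 2: epsilon-greedy strategy
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(state, W)))

def env_step(state, action):                 # toy stand-in for the arm/environment
    next_state = np.clip(state + (action - 1) * 0.1, -1.0, 1.0)
    reward = -float(np.abs(next_state).sum())  # closer to the origin is better
    return reward, next_state

state = rng.normal(size=STATE_DIM)
for t in range(1, 201):
    a = choose_action(state)
    r, s_next = env_step(state, a)
    replay.append((state, a, r, s_next))     # Step 3: store (S, a, R, S') in D
    state = s_next
    if len(replay) >= BATCH:
        batch = random.sample(replay, BATCH)  # Step 4: sample n transitions
        for s_j, a_j, r_j, s_j_next in batch:
            # Step 5 (non-terminal case): y_j = R_j + gamma * max_a' Q'(S'_j, a'; W')
            y_j = r_j + GAMMA * np.max(q_values(s_j_next, W_target))
            # Step 6: gradient step on (y_j - Q(S_j, a_j; W))^2
            td_err = y_j - q_values(s_j, W)[a_j]
            W[a_j] += LR * td_err * s_j
    if t % SYNC_EVERY == 0:
        W_target = W.copy()                  # Step 6: every c steps, W' = W
```

The frozen target network W' and the uniform sampling from the replay buffer are the two stabilizing mechanisms the claim relies on: the target in Step 5 is always computed with the older W', never the parameters currently being updated.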
CN202211451128.0A 2022-11-19 2022-11-19 Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning Pending CN115890670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451128.0A CN115890670A (en) 2022-11-19 2022-11-19 Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning

Publications (1)

Publication Number Publication Date
CN115890670A true CN115890670A (en) 2023-04-04

Family

ID=86474034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211451128.0A Pending CN115890670A (en) Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning

Country Status (1)

Country Link
CN (1) CN115890670A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116604571A (en) * 2023-07-14 2023-08-18 Hunan University Deep reinforcement learning-based robot three-dimensional measurement path planning method
CN116604571B (en) * 2023-07-14 2023-10-27 Hunan University Deep reinforcement learning-based robot three-dimensional measurement path planning method
CN117162103A (en) * 2023-11-01 2023-12-05 Sun Yat-sen University Redundant robot self-collision avoidance control method
CN117162103B (en) * 2023-11-01 2024-02-09 Sun Yat-sen University Redundant robot self-collision avoidance control method

Similar Documents

Publication Publication Date Title
CN110682286B (en) Real-time obstacle avoidance method for cooperative robot
Köker et al. A study of neural network based inverse kinematics solution for a three-joint robot
CN115890670A (en) Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning
US20210299860A1 (en) Method and system for robot action imitation learning in three-dimensional space
CN112338921A (en) Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112140101A (en) Trajectory planning method, device and system
CN114237235B (en) Mobile robot obstacle avoidance method based on deep reinforcement learning
CN112711261B (en) Multi-agent formation planning method based on local visual field
CN115416016A (en) Mechanical arm obstacle avoidance path planning method based on improved artificial potential field method
CN112966816A (en) Multi-agent reinforcement learning method surrounded by formation
Chen et al. Optimizing the obstacle avoidance trajectory and positioning error of robotic manipulators using multigroup ant colony and quantum behaved particle swarm optimization algorithms
Zhu et al. Deep reinforcement learning for real-time assembly planning in robot-based prefabricated construction
Wang et al. Learning of long-horizon sparse-reward robotic manipulator tasks with base controllers
CN112434464B (en) Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm
Chauhan et al. Forward kinematics of the Stewart parallel manipulator using machine learning
Wang et al. An online collision-free trajectory generation algorithm for human–robot collaboration
Antonio-Gopar et al. Inverse kinematics for a manipulator robot based on differential evolution algorithm
Xu et al. Avoidance of manual labeling in robotic autonomous navigation through multi-sensory semi-supervised learning
CN113043278B (en) Mechanical arm track planning method based on improved whale searching method
Mousa et al. Path planning for a 6 DoF robotic arm based on whale optimization algorithm and genetic algorithm
Tian et al. Fruit Picking Robot Arm Training Solution Based on Reinforcement Learning in Digital Twin
dos Santos et al. Planning and learning for cooperative construction task with quadrotors
CN115366099A (en) Mechanical arm depth certainty strategy gradient training method based on forward kinematics
Rybak et al. Development of an algorithm for managing a multi-robot system for cargo transportation based on reinforcement learning in a virtual environment
Wang et al. Optimizing robot arm reaching ability with different joints functionality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination