CN115890670A - Method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning - Google Patents


Info

Publication number
CN115890670A
CN115890670A (application CN202211451128.0A)
Authority
CN
China
Prior art keywords
mechanical arm
degree
obstacle
freedom
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211451128.0A
Other languages
Chinese (zh)
Inventor
吕楠
张丽秋
刘伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Huiyan Artificial Intelligence Technology Co ltd
Original Assignee
Wuxi Huiyan Artificial Intelligence Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Huiyan Artificial Intelligence Technology Co ltd filed Critical Wuxi Huiyan Artificial Intelligence Technology Co ltd
Priority to CN202211451128.0A priority Critical patent/CN115890670A/en
Publication of CN115890670A publication Critical patent/CN115890670A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Numerical Control (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning, and relates to the technical field of mechanical-arm control. The method comprises the following steps: establishing the reinforcement-learning foundation; building the space manipulator model; performing collision detection; setting the parameters of the arm's obstacle-avoidance motion trajectory; and planning the arm's path with a DQN. A three-layer DNN is used as the network body of the DQN, and a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, so that spatial obstacles are avoided and the deep-reinforcement-learning training of the arm is completed. With the three-layer DNN, whose input is the arm's state information and whose output is the arm's joint angles, combined with offline training, the mechanical arm automatically learns a motion trajectory close to the optimum, successfully avoids the obstacles to reach the target point, and exhibits strong obstacle-avoidance capability.

Description

Method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning
Technical Field
The invention relates to the technical field of mechanical-arm control, and in particular to a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning.
Background
For a mechanical arm, redundancy is a relative concept, defined with respect to a specific task: for a planar task, even the commonly used 6-axis (six-degree-of-freedom) arm is redundant. In the more general case, however, a pose in three-dimensional space is described by six degrees of freedom, so a 7-axis (seven-degree-of-freedom) arm is called a redundant mechanical arm. The extra degree of freedom can be used for additional tasks such as whole-body obstacle avoidance, singularity avoidance, joint-limit avoidance, joint-torque optimization, and increased manipulability. Moreover, since a human arm has seven degrees of freedom, the redundant mechanical arm is, from the perspective of bionics, better suited to practical scenarios and applications.
During task execution, a seven-degree-of-freedom redundant mechanical arm is restricted by many constraint conditions, such as joint limits, the surrounding environment, self-collision, and dynamic balance. This is especially true in container welding: to meet the demands of high-efficiency operation, welding arms have been deployed at large scale on container production lines, and because the workspace contains a large number of obstacles, collision factors must be considered during automatic welding. To carry out the welding operation smoothly, research on obstacle-avoidance path planning for the mechanical arm is necessary.
For the obstacle-avoidance path-planning problem of a mechanical arm, the prior art includes the following approaches: converting Cartesian-space obstacles into configuration-space obstacles and then performing collision-free path planning with the A* algorithm; randomly sampling the arm in joint (J) space, mapping to configuration (C) space through forward kinematics, and planning the path with an improved RRT algorithm; planning the arm's path with the artificial potential field method, in which a repulsive field is artificially placed at the obstacle and an attractive field at the target point, and the arm moves under the action of the two fields; planning the arm's path with a genetic algorithm; and mapping the arm's workspace to a planar two-dimensional space and then planning the path in that plane with the A* algorithm. Analysis of the above prior art shows, however, that these methods can only be used in a specified scenario and lack flexibility; they also require an accurate model of the environment and cannot be applied well in complex industrial environments.
Therefore, in order to solve the existing problems, a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning is provided.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning, which solves the existing problems of high difficulty and lack of flexibility in adjusting the actions of a seven-degree-of-freedom redundant arm in welding systems.
In order to realize this purpose, the invention adopts the following technical scheme: the method for training the motion trajectory of a seven-degree-of-freedom redundant mechanical arm based on deep reinforcement learning comprises the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are used as the logical basis of the mechanical arm's reinforcement learning;
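As a concrete illustration of this logical basis (a sketch, not code from the patent), the tabular Q-learning update that the deep Q-network in step five generalizes can be written as:

```python
import numpy as np

# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# alpha and gamma here are placeholder hyperparameters, not the patent's.
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    target = r if done else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 3))                  # toy table: 5 states, 3 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])                        # 0.1 * (1.0 + 0.9*0 - 0) = 0.1
```

A DQN replaces the table `Q` with a neural network and the direct assignment with gradient descent on a squared error toward the same target.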
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
Preferably, in the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
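The link transform above can be sketched in code as follows; this is an illustrative implementation of the standard D-H transform, and the joint-parameter values in the usage line are placeholders, not taken from the patent:

```python
import numpy as np

# Standard D-H homogeneous transform for one rotary joint.
# theta: joint angle, d: link offset, a: link length, alpha: link twist.
def dh_transform(theta, d, a, alpha):
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

# Chaining one 4x4 transform per link gives the arm's forward kinematics.
T = dh_transform(np.pi / 2, d=0.1, a=0.3, alpha=0.0)
print(np.round(T, 3))
```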
preferably, in the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
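The six judgment steps above amount to a point-to-segment distance test; a minimal sketch (illustrative code, not the patent's implementation) is:

```python
import numpy as np

# Collision check between a link segment AB of radius R_1 and a spherical
# obstacle centered at P with radius R_o.
def link_collides(A, B, P, R_o, R_1):
    A, B, P = map(np.asarray, (A, B, P))
    AB = B - A
    # Parameter T of the projection of P onto the infinite line through A, B
    T = np.dot(P - A, AB) / np.dot(AB, AB)
    if 0.0 <= T <= 1.0:               # case one: projection inside the segment
        D = np.linalg.norm(P - (A + T * AB))
    else:                             # case two: nearest point is an end point
        D = min(np.linalg.norm(P - A), np.linalg.norm(P - B))
    return D <= R_o + R_1

# Sphere of radius 0.5 hovering 0.4 above the middle of a unit link of radius 0.1:
print(link_collides((0, 0, 0), (1, 0, 0), (0.5, 0.4, 0), 0.5, 0.1))  # True
```

Running the test per link (case three) declares the configuration safe only when every link passes.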
Preferably, in the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
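A minimal sketch of this reward (the text does not define P_1 and P_2, so here they are assumed to be distance-based shaping terms passed in as a single argument — a labeled assumption, not the patent's definition):

```python
# Reward terms from step four: +100 at the target, -100 on collision,
# otherwise a 0.6-scaled shaping term standing in for 0.6 * P_1 or 0.6 * P_2.
def reward(reached_target, collided, shaping_term):
    if reached_target:
        return 100.0
    if collided:
        return -100.0
    return 0.6 * shaping_term

print(reward(False, False, -2.5))   # -1.5
```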
Preferably, in the fourth step, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
Preferably, in the fourth step, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
Preferably, in the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initialize the value Q for all actions and states, randomly initialize all parameters W of the current Q-network, initialize the parameters W′ = W of the target network Q′, and empty the experience replay set D;
step 2, obtain the initial state S of the arm from the environment, select an action a according to the ε-greedy strategy, have the arm execute the action, obtain the reward R_t from the environment, and then observe the state S′ at the next moment;
step 3, store the arm–environment interaction tuple (S, a, R, S′) in D;
step 4, randomly sample a minibatch of n transitions (S_j, a_j, R_j, S′_j) from D, for j = 1, 2, …, n;
step 5, if the arm reaches the terminal state, set y_j = R_j; otherwise set y_j = R_j + γ·max_{a′} Q′(S′_j, a′, W′);
step 6, with the loss function L = (1/n)·Σ_{j=1..n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target-network weight parameters W′ = W; this completes the arm's obstacle-avoidance path-planning calculation process.
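Steps 1-6 above can be sketched as a minimal DQN loop. Everything concrete here — the three-layer network sizes, the toy environment transition, and the hyperparameters — is an illustrative assumption rather than the patent's configuration, and for brevity only the output layer is updated by gradient descent:

```python
import random
from collections import deque
import numpy as np

rng = np.random.default_rng(0)
STATE, HIDDEN, ACTIONS = 10, 32, 27   # s has 10 components; action set is assumed

def init_net():
    return {"W1": rng.normal(0, 0.1, (STATE, HIDDEN)), "b1": np.zeros(HIDDEN),
            "W2": rng.normal(0, 0.1, (HIDDEN, ACTIONS)), "b2": np.zeros(ACTIONS)}

def forward(net, s):
    h = np.tanh(s @ net["W1"] + net["b1"])     # hidden layer
    return h, h @ net["W2"] + net["b2"]        # Q-values, one per action

q_net = init_net()
target_net = {k: v.copy() for k, v in q_net.items()}   # step 1: W' = W
D = deque(maxlen=5000)                                 # experience replay set D

gamma, eps, lr, n, c = 0.9, 0.1, 1e-2, 32, 50
s = rng.normal(size=STATE)                             # step 2: initial state
for step in range(200):
    # epsilon-greedy action selection
    a = rng.integers(ACTIONS) if rng.random() < eps else int(np.argmax(forward(q_net, s)[1]))
    # toy environment transition and reward (placeholders for the real arm)
    s_next, r, done = rng.normal(size=STATE), float(rng.normal()), False
    D.append((s, a, r, s_next, done))                  # step 3
    s = s_next
    if len(D) >= n:
        batch = random.sample(list(D), n)              # step 4: minibatch of n
        for (sj, aj, rj, sjn, dj) in batch:
            # step 5: target y_j
            yj = rj if dj else rj + gamma * np.max(forward(target_net, sjn)[1])
            h, q = forward(q_net, sj)
            err = q[aj] - yj
            # step 6: gradient descent on (y_j - Q(s_j, a_j, W))^2, output layer only
            q_net["W2"][:, aj] -= lr * err * h
            q_net["b2"][aj] -= lr * err
    if step % c == 0:
        target_net = {k: v.copy() for k, v in q_net.items()}   # W' = W every c steps
```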
The invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning. The method has the following beneficial effects:
the invention solves the problem of path planning of the mechanical arm by adopting a depth reinforcement learning algorithm, uses a three-layer DNN network, inputs state information of the mechanical arm, outputs the state information as a motion joint angle of the mechanical arm, and combines offline training to enable the seven-degree-of-freedom redundant mechanical arm to automatically train a motion track close to the optimal motion track, successfully avoid an obstacle from reaching a target point, simultaneously simulates on a three-degree-of-freedom spot welding robot on a simulation platform to obtain the motion track of the seven-degree-of-freedom redundant mechanical arm, and simultaneously shows that the mechanical arm adopting the depth reinforcement learning technology can plan a collision-free path for a welding mechanical arm, and has strong obstacle avoidance capability.
Drawings
FIG. 1 is a schematic diagram of a three-degree-of-freedom spot welding robot according to the present invention;
fig. 2 is a schematic diagram of a three-dimensional obstacle space according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The embodiment is as follows:
as shown in figs. 1-2, an embodiment of the present invention provides a method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning, comprising the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are used as the logical basis of the mechanical arm's reinforcement learning;
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are measured, calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
Through steps one to five — the reinforcement-learning foundation, the space manipulator model, collision detection, the setting of the obstacle-avoidance trajectory parameters, and DQN path planning — the seven-degree-of-freedom redundant mechanical arm automatically trains a motion trajectory close to the optimum and successfully avoids the obstacles to reach the target point. In the simulation, the three-degree-of-freedom spot-welding robot runs on a simulation platform: at each step the arm outputs the action with the maximum Q value through the trained neural network, and forward kinematics is then used to drive the arm. Through the reinforcement-learning model training, the arm successfully avoids two obstacles and reaches the target point in a small, near-optimal number of motion steps, from which the motion trajectory of the seven-degree-of-freedom redundant arm is obtained. This also shows that an arm using the deep-reinforcement-learning technique can plan a collision-free path for a welding arm and has strong obstacle-avoidance capability.
In the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
In the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
In the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
In step four, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
In step four, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
In the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initialize the value Q for all actions and states, randomly initialize all parameters W of the current Q-network, initialize the parameters W′ = W of the target network Q′, and empty the experience replay set D;
step 2, obtain the initial state S of the arm from the environment, select an action a according to the ε-greedy strategy, have the arm execute the action, obtain the reward R_t from the environment, and then observe the state S′ at the next moment;
step 3, store the arm–environment interaction tuple (S, a, R, S′) in D;
step 4, randomly sample a minibatch of n transitions (S_j, a_j, R_j, S′_j) from D, for j = 1, 2, …, n;
step 5, if the arm reaches the terminal state, set y_j = R_j; otherwise set y_j = R_j + γ·max_{a′} Q′(S′_j, a′, W′);
step 6, with the loss function L = (1/n)·Σ_{j=1..n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target-network weight parameters W′ = W; this completes the arm's obstacle-avoidance path-planning calculation process.
In this method, a spot-welding seven-degree-of-freedom redundant mechanical arm is taken as the research object: automatic spot welding by the arm in three-dimensional space is converted into obstacle-avoidance path planning for the arm, and after a certain number of training steps the arm can smoothly plan a collision-free welding path.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (7)

1. A method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning, characterized in that the method comprises the following steps:
step one, reinforcement-learning foundation: the Q-learning algorithm and a deep Q-network are adopted as the logical basis of the mechanical arm's reinforcement learning;
step two, space manipulator model: first, the redundant mechanical arm is modeled; a mathematical model is established with the D-H method, the relation between adjacent links of the arm is described by a 4×4 homogeneous transformation matrix, and the forward kinematics of the arm are derived, yielding a three-degree-of-freedom serial arm model used to simulate the motion trajectory and the deep-learning training of the seven-degree-of-freedom arm;
step three, collision detection: the three-degree-of-freedom serial arm model obtained in step two is placed into a three-dimensional obstacle space containing a spherical obstacle; the collision parameters of the arm and the spherical obstacle are detected, and the collision results are calculated and analyzed;
step four, setting the parameters of the arm's obstacle-avoidance motion trajectory: a reward r, a state s and an action a are set;
reward r: represents the reward generated by the environment for the arm's executed action trajectory, forming the reward-function parameter;
state s: represents the state-variable parameters generated by the interaction of the arm and the environment;
action a: represents the joint-rotation-angle vector of the arm;
step five, DQN arm path planning: a three-layer DNN is used as the DQN network body, whose input is the arm's state information and whose output is the arm's joint angles; a near-optimal motion trajectory is trained automatically through the arm's obstacle-avoidance path-planning algorithm, spatial obstacles are avoided, and the deep-reinforcement-learning training of the arm is completed.
2. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the second step, the homogeneous transformation matrix of the rotary joint has the general form (θ_i: joint angle; d_i: link offset; a_i: link length; α_i: link twist):

^(i−1)T_i = ⎡ cos θ_i   −sin θ_i·cos α_i    sin θ_i·sin α_i   a_i·cos θ_i ⎤
            ⎢ sin θ_i    cos θ_i·cos α_i   −cos θ_i·sin α_i   a_i·sin θ_i ⎥
            ⎢    0           sin α_i            cos α_i            d_i    ⎥
            ⎣    0              0                  0                1     ⎦
3. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the third step, the collision between the current position of the three-degree-of-freedom serial mechanical arm and the spherical obstacle is judged as follows:
step 1, let the center of the spherical obstacle be P(X_0, Y_0, Z_0) with radius R_o, let the radius of the three-degree-of-freedom serial arm be R_1, let the two end points of the arm link's axis be A(X_1, Y_1, Z_1) and B(X_2, Y_2, Z_2), and let the projection of the obstacle center onto the line be P_c(X_c, Y_c, Z_c);
step 2, the direction vector of the line is computed as: (X_1 − X_2, Y_1 − Y_2, Z_1 − Z_2) = (M, N, P);
step 3, from the parametric equation of the spatial line through A and B: X_c = MT + X_1, Y_c = NT + Y_1, Z_c = PT + Z_1;
step 4, since the segment joining the obstacle center and the projection point is perpendicular to the line: M(X_0 − X_c) + N(Y_0 − Y_c) + P(Z_0 − Z_c) = 0;
step 5, solving the two equations above yields the projection point P_c(X_c, Y_c, Z_c), and from it the distance from the obstacle to the arm, D_1 = √((X_0 − X_c)² + (Y_0 − Y_c)² + (Z_0 − Z_c)²), and the distance from the obstacle to the end point, D_2 = √((X_1 − X_0)² + (Y_1 − Y_0)² + (Z_1 − Z_0)²);
step 6, the collision detection result is analyzed:
case one, the projection of the obstacle lies within the segment, i.e. X_c ∈ (X_1, X_2), Y_c ∈ (Y_1, Y_2), Z_c ∈ (Z_1, Z_2); the distance D_1 from the sphere center to segment AB is then used: if D_1 > R_o + R_1 there is no collision, otherwise a collision occurs;
case two, the projection of the obstacle lies outside the segment, i.e. X_c ∉ (X_1, X_2), Y_c ∉ (Y_1, Y_2), Z_c ∉ (Z_1, Z_2); the distance D_2 from the sphere center to the end point is then used: if D_2 > R_o + R_1 there is no collision, otherwise a collision occurs;
case three, if none of the three links collides with the obstacle the configuration is safe; otherwise a collision occurs.
4. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the reward r is designed for the arm's path planning: the arm should smoothly avoid the obstacle and reach the target point in as few steps as possible, so the reward function is set as: r = r + 100 on reaching the target point, r = r − 100 on collision, and otherwise r = 0.6 × P_1 or 0.6 × P_2.
5. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the vector form of the state s is: s = (X_4, Y_4, Z_4, X_3, Y_3, Z_3, DS, MS, D_4, D_3).
6. The method for training the motion trajectory of a seven-degree-of-freedom redundant manipulator based on deep reinforcement learning according to claim 1, characterized in that: in the fourth step, the vector form of the action a is: a = (θ_1, θ_2, θ_3).
7. The method for training the motion trail of the seven-degree-of-freedom redundant manipulator based on the reinforcement deep learning as claimed in claim 1, wherein the method comprises the following steps: in the fifth step, the mechanical arm obstacle avoidance path planning algorithm comprises the following steps:
step 1, randomly initializing values Q corresponding to all actions and states, randomly initializing all parameters W of a current Q network, initializing parameters W '= W of a target network Q', and emptying an experience playback set D;
step 2, obtaining an initial state S of the mechanical arm from the environment, selecting an action a according to an epsilon-greedy strategy, starting the mechanical arm to execute the action, and obtaining a reward R from the environment t, Then obtaining the state S' of the next moment;
Step 3, store the interaction tuple (S, a, R, S') between the mechanical arm and the environment in D;
Step 4, randomly sample a minibatch of n samples (S_j, a_j, R_j, S'_j) from D, taking j = 1, 2, …, n in turn;
Step 5, if the mechanical arm has reached the termination state, set y_j = R_j; otherwise set y_j = R_j + γ · max_{a'} Q'(S'_j, a', W');
Step 6, through the loss function L(W) = (1/n) Σ_{j=1}^{n} (y_j − Q(S_j, a_j, W))², perform gradient descent to update the parameter W, and every c steps update the target network weight parameters as W' = W; this completes the mechanical arm obstacle-avoidance path planning process.
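The six steps above can be sketched as a minimal DQN-style training loop. To keep the example self-contained and runnable, a linear Q-function Q(s, a; W) = W[a]·s stands in for the deep network, and the toy one-dimensional environment, dimensions, and hyperparameters are illustrative assumptions, not the patent's actual manipulator setup. The toy environment has no terminal state, so only the non-terminal branch of Step 5's target appears.

```python
import random
import numpy as np

STATE_DIM, N_ACTIONS = 4, 3
GAMMA, EPS, LR, SYNC_EVERY, BATCH = 0.9, 0.1, 0.01, 20, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(N_ACTIONS, STATE_DIM))  # Step 1: random current-net params
W_target = W.copy()                          # Step 1: target params W' = W
replay = []                                  # Step 1: empty experience replay set D

def q_values(state, params):
    return params @ state                    # Q(s, a) for every action a

def choose_action(state):                    # Step 2: epsilon-greedy strategy
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    return int(np.argmax(q_values(state, W)))

def env_step(state, action):                 # toy stand-in for the arm/environment
    next_state = np.clip(state + (action - 1) * 0.1, -1.0, 1.0)
    reward = -float(np.abs(next_state).sum())  # closer to the origin is better
    return reward, next_state

state = rng.normal(size=STATE_DIM)
for t in range(1, 201):
    a = choose_action(state)
    r, s_next = env_step(state, a)
    replay.append((state, a, r, s_next))     # Step 3: store (S, a, R, S') in D
    state = s_next
    if len(replay) >= BATCH:
        batch = random.sample(replay, BATCH)  # Step 4: sample n transitions
        for s_j, a_j, r_j, s_j_next in batch:
            # Step 5 (non-terminal case): y_j = R_j + gamma * max_a' Q'(S'_j, a'; W')
            y_j = r_j + GAMMA * np.max(q_values(s_j_next, W_target))
            # Step 6: gradient step on (y_j - Q(S_j, a_j; W))^2
            td_err = y_j - q_values(s_j, W)[a_j]
            W[a_j] += LR * td_err * s_j
    if t % SYNC_EVERY == 0:
        W_target = W.copy()                  # Step 6: every c steps, W' = W
```

The frozen target network W' and the uniform sampling from the replay buffer are the two stabilizing mechanisms the claim relies on: the target in Step 5 is always computed with the older W', never the parameters currently being updated.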
CN202211451128.0A 2022-11-19 2022-11-19 Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning Pending CN115890670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211451128.0A CN115890670A (en) 2022-11-19 2022-11-19 Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning

Publications (1)

Publication Number Publication Date
CN115890670A true CN115890670A (en) 2023-04-04

Family

ID=86474034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211451128.0A Pending CN115890670A (en) Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning

Country Status (1)

Country Link
CN (1) CN115890670A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116604571A (en) * 2023-07-14 2023-08-18 Hunan University Deep reinforcement learning-based robot three-dimensional measurement path planning method
CN116604571B (en) * 2023-07-14 2023-10-27 Hunan University Deep reinforcement learning-based robot three-dimensional measurement path planning method
CN117162103A (en) * 2023-11-01 2023-12-05 Sun Yat-sen University Redundant robot self-collision avoidance control method
CN117162103B (en) * 2023-11-01 2024-02-09 Sun Yat-sen University Redundant robot self-collision avoidance control method

Similar Documents

Publication Publication Date Title
CN110682286B (en) Real-time obstacle avoidance method for cooperative robot
Köker et al. A study of neural network based inverse kinematics solution for a three-joint robot
CN115890670A (en) Method for training motion trail of seven-degree-of-freedom redundant mechanical arm based on reinforcement deep learning
US20210299860A1 (en) Method and system for robot action imitation learning in three-dimensional space
CN112338921A (en) Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112140101A (en) Trajectory planning method, device and system
CN114237235B (en) Mobile robot obstacle avoidance method based on deep reinforcement learning
CN112711261B (en) Multi-agent formation planning method based on local visual field
CN115416016A (en) Mechanical arm obstacle avoidance path planning method based on improved artificial potential field method
CN112966816A (en) Multi-agent reinforcement learning method surrounded by formation
Chen et al. Optimizing the obstacle avoidance trajectory and positioning error of robotic manipulators using multigroup ant colony and quantum behaved particle swarm optimization algorithms
Zhu et al. Deep reinforcement learning for real-time assembly planning in robot-based prefabricated construction
Wang et al. Learning of long-horizon sparse-reward robotic manipulator tasks with base controllers
CN112434464B (en) Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm
Chauhan et al. Forward kinematics of the Stewart parallel manipulator using machine learning
Wang et al. An online collision-free trajectory generation algorithm for human–robot collaboration
Antonio-Gopar et al. Inverse kinematics for a manipulator robot based on differential evolution algorithm
Xu et al. Avoidance of manual labeling in robotic autonomous navigation through multi-sensory semi-supervised learning
CN113043278B (en) Mechanical arm track planning method based on improved whale searching method
Mousa et al. Path planning for a 6 DoF robotic arm based on whale optimization algorithm and genetic algorithm
Tian et al. Fruit Picking Robot Arm Training Solution Based on Reinforcement Learning in Digital Twin
dos Santos et al. Planning and learning for cooperative construction task with quadrotors
CN115366099A (en) Mechanical arm depth certainty strategy gradient training method based on forward kinematics
Rybak et al. Development of an algorithm for managing a multi-robot system for cargo transportation based on reinforcement learning in a virtual environment
Wang et al. Optimizing robot arm reaching ability with different joints functionality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination