CN113787514B

CN113787514B - Mechanical arm dynamic collision avoidance planning method

Info

Publication number: CN113787514B
Application number: CN202110713794.6A
Authority: CN
Inventors: 程良伦; 陈肇江; 王涛
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2022-12-23
Anticipated expiration: 2041-06-25
Also published as: CN113787514A

Abstract

The invention discloses a mechanical arm dynamic collision avoidance planning method comprising the following steps: S1, constructing a system dynamic equation of the mechanical arm; S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation; S3, constructing an environment penalty model; S4, constructing a target prediction function according to the original prediction function and the environment penalty model; S5, optimizing the target prediction function to obtain a control sequence; S6, training the environment penalty model according to the control sequence until the environment penalty model converges. The method has the advantages of good control effect, strong robustness, and support for online optimization.

Description

Mechanical arm dynamic collision avoidance planning method

Technical Field

The invention relates to the technical field of robot manufacturing, in particular to a dynamic collision avoidance planning method for a mechanical arm.

Background

Most industrial mechanical arms currently used in production complete path planning by manual teaching in order to carry out processes such as welding, spraying, stacking, carrying, assembling, and machining. Manual teaching is adequate for single, repetitive, specific tasks.

However, as production tasks diversify and task scenes grow more complex, the drawbacks of manual teaching, such as tedious operation, poor generality, and low precision, have gradually become apparent. As the demands on the working efficiency and precision of mechanical arms in intelligent production increase, the outdated manual teaching method can no longer meet them. Many algorithms and variants have been developed to solve the path planning problem of mechanical arms, but most of them support only offline planning and cannot operate in an environment with dynamic obstacles, so the mechanical arm is unable to cope with sudden hazards.

Disclosure of Invention

The invention aims to provide a mechanical arm dynamic collision avoidance planning method which is good in control effect, strong in robustness and supports online optimization.

In order to achieve the purpose, the invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps:

S1, constructing a system dynamic equation of the mechanical arm;

S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;

S3, constructing an environment penalty model;

S4, constructing a target prediction function according to the original prediction function and the environment penalty model;

S5, optimizing the target prediction function to obtain a control sequence;

and S6, training the environment penalty model according to the control sequence until the environment penalty model converges.

Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.

Preferably, the step S6 specifically includes:

S61, initializing the weight of the environmental penalty amount in the target prediction function;

S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty quantities;

and S63, optimizing the weight of the environment penalty amount in the target prediction function according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.

Specifically, the step S63 includes:

optimizing the weight of the environment penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.

Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.

Preferably, the step S1 specifically includes:

S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;

and S12, calculating according to a discretized system dynamic equation to obtain an original prediction function of the mechanical arm, wherein the original prediction function comprises parameters related to control quantity.

Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.

Correspondingly, the invention also discloses a dynamic collision avoidance planning device for the industrial mechanical arm, which comprises:

a first construction unit configured to construct a system dynamic equation of the robot arm;

a calculation unit configured to calculate an original prediction function of the robot arm from the system dynamic equation;

a second construction unit configured for constructing an environmental penalty model;

a third construction unit, configured to construct a target prediction function according to the original prediction function and an environmental penalty model;

an optimization unit configured to optimize the target prediction function to obtain a control sequence;

a training unit configured to train the environmental penalty model according to the control sequence until the environmental penalty model converges.

Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.

Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.

Compared with the prior art, the invention constructs the target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model according to the control sequence until it converges. Because the target prediction function combines the original prediction function with the environment penalty model, it fits actual conditions more closely; because the environment penalty model is trained on the control sequence until convergence, the target prediction function can be optimized online in real time, yielding better control effect, robustness, and stability.

Drawings

FIG. 1 is a block flow diagram of a mechanical arm dynamic collision avoidance planning method of the present invention;

FIG. 2 is a block diagram of the environmental penalty model of the present invention;

FIG. 3 is an algorithm flow diagram of a natural evolution strategy algorithm;

FIG. 4 is a block diagram of the dynamic collision avoidance planning apparatus for an industrial robot arm according to the present invention.

Detailed Description

In order to explain the technical contents, structural features, and objects and effects of the present invention in detail, the following description is made in conjunction with the embodiments and the accompanying drawings.

Referring to FIG. 1, the mechanical arm dynamic collision avoidance planning method of the present embodiment includes the following steps:

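S1, constructing a system dynamic equation of the mechanical arm;

S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;

S3, constructing an environment penalty model;

S4, constructing a target prediction function according to the original prediction function and the environment penalty model;

S5, optimizing the target prediction function to obtain a control sequence;

and S6, training the environment penalty model according to the control sequence until the environment penalty model converges.

Preferably, the step S6 specifically includes:

S61, initializing the weight of the environmental penalty amount in the target prediction function;

S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty quantities;

and S63, optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.

Specifically, the step S63 includes:

optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.

Preferably, the step S1 specifically includes:

S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;

and S12, calculating according to the discretized system dynamic equation to obtain an original prediction function of the mechanical arm, wherein the original prediction function comprises parameters related to the control quantity.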

Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities being a combination of a current state quantity and a state quantity at a previous time, the control increments being a combination of a current control quantity and a control quantity at a previous time, and the environmental penalty model is used for predicting the state of the robot arm within the control sequence.

Referring to FIGS. 1-3, the following description is made in detail by taking a six-axis industrial robot as an example:

1. System dynamic equation and original prediction function:

At time $t$, the configuration vector of the six-axis industrial mechanical arm is $q(t)$, the velocity vector is $\dot{q}(t)$, and the acceleration vector is $\ddot{q}(t)$, where $q(t)$, $\dot{q}(t)$, and $\ddot{q}(t)$ are six-dimensional vectors representing the angles, angular velocities, and angular accelerations of the six joints of the mechanical arm, respectively.

Let the state quantity of the mechanical arm be $x(t) = [q(t); \dot{q}(t)]$ and the control quantity be $u(t) = \ddot{q}(t)$; the following system dynamic equation can then be constructed:

where $0_{m \times n}$ denotes the $m \times n$ all-zero matrix and $I_{n \times n}$ denotes the $n$-dimensional identity matrix.
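The equation images are not reproduced in this text. Since the state stacks the joint angles and angular velocities and the control quantity is the joint angular acceleration, the model is presumably the standard double-integrator form sketched below. The symbol names $A$ and $B$ are assumptions and are not taken from the source.

```latex
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad
A = \begin{bmatrix} 0_{6\times 6} & I_{6\times 6} \\ 0_{6\times 6} & 0_{6\times 6} \end{bmatrix}, \qquad
B = \begin{bmatrix} 0_{6\times 6} \\ I_{6\times 6} \end{bmatrix}
```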

Because the system dynamic equation above is continuous in time, and a continuous-time system cannot be used directly in a predictive controller, it must be discretized, which gives:

Rearranging the terms of the above equation gives the discretized system dynamic equation:

where
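The discretized equations are likewise missing here. The phrase about rearranging terms suggests a forward-Euler difference, so the following is a minimal sketch under that assumption. The function names, the sampling period `T`, and the matrix names `A_d` and `B_d` are illustrative and do not come from the source.

```python
import numpy as np

def continuous_model(n_joints=6):
    """Double-integrator model with x = [q; dq] and u = ddq (assumed form)."""
    n = n_joints
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [np.zeros((n, n)), np.zeros((n, n))]])
    B = np.vstack([np.zeros((n, n)), np.eye(n)])
    return A, B

def euler_discretize(A, B, T):
    """Forward Euler: (x(k+1) - x(k)) / T = A x(k) + B u(k), solved for x(k+1)."""
    A_d = np.eye(A.shape[0]) + T * A
    B_d = T * B
    return A_d, B_d

A, B = continuous_model()
A_d, B_d = euler_discretize(A, B, T=0.01)

# One step from rest with unit angular acceleration on every joint.
x0 = np.zeros(12)
u0 = np.ones(6)
x1 = A_d @ x0 + B_d @ u0
```

After one step from rest, only the velocity half of the state changes, which is the expected behaviour of a forward-Euler double integrator.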

In the optimization process, the following constraints must also be satisfied:

Recursing over the prediction horizon $N_P$ gives:

The above formula may be further expressed as:

where
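The recursion over the prediction horizon is commonly collected into one stacked equation, X(k) = Phi x(k) + Gamma U(k). The sketch below builds such matrices for a single joint and checks them against a step-by-step simulation. The names `Phi`, `Gamma`, and `prediction_matrices`, and the numeric values, are assumptions for illustration; the source's own matrices are not reproduced in this text.

```python
import numpy as np

def prediction_matrices(A_d, B_d, N_p):
    """Stack x(k+1..k+N_p) as Phi @ x(k) + Gamma @ U(k)."""
    nx, nu = B_d.shape
    # powers[i] holds A_d raised to the i-th power.
    powers = [np.eye(nx)]
    for _ in range(N_p):
        powers.append(A_d @ powers[-1])
    Phi = np.vstack(powers[1:])
    Gamma = np.zeros((N_p * nx, N_p * nu))
    for i in range(N_p):
        for j in range(i + 1):
            # Effect of u(k+j) on x(k+i+1) is A_d^(i-j) @ B_d.
            Gamma[i * nx:(i + 1) * nx, j * nu:(j + 1) * nu] = powers[i - j] @ B_d
    return Phi, Gamma

# Single-joint example (state = [angle, angular velocity]).
T = 0.1
A_d = np.array([[1.0, T], [0.0, 1.0]])
B_d = np.array([[0.0], [T]])
N_p = 3
Phi, Gamma = prediction_matrices(A_d, B_d, N_p)

x_k = np.array([0.5, -0.2])
U_k = np.array([1.0, 0.0, -1.0])
X_pred = Phi @ x_k + Gamma @ U_k

# Reference: apply the discrete model one step at a time.
x = x_k.copy()
steps = []
for u in U_k:
    x = A_d @ x + B_d @ np.array([u])
    steps.append(x)
X_sim = np.concatenate(steps)
```

Writing the horizon in stacked form makes every predicted state an explicit affine function of the control sequence, which is what allows the cost to be posed as a quadratic program.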

From this, the original prediction function based on the system dynamic equation can be written as:

where $X_g$ is the target state vector, and $Q_g$ and $Q_u$ are the corresponding weight matrices. By optimizing the original prediction function $J$ at each time $k$, the optimal control quantity $U(k)$ can be obtained. The formula can be rewritten in a form in which each term contains $U(k)$:

The original prediction function can then be solved readily by quadratic programming.
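As an illustration of the quadratic-programming step, the sketch below assumes the cost has the usual tracking form J = (X - Xg)^T Qg (X - Xg) + U^T Qu U with X = Phi x(k) + Gamma U. This form is inferred from the weight matrices named above and is not confirmed by the source. Constraints are omitted, so the minimizer follows from a single linear solve; a constrained problem would need a QP solver instead. All function and variable names are hypothetical.

```python
import numpy as np

def stacked_model(A_d, B_d, N_p):
    """Build Phi and Gamma so that X = Phi @ x_k + Gamma @ U."""
    nx, nu = B_d.shape
    Phi = np.vstack([np.linalg.matrix_power(A_d, i + 1) for i in range(N_p)])
    Gamma = np.zeros((N_p * nx, N_p * nu))
    for i in range(N_p):
        for j in range(i + 1):
            Gamma[i * nx:(i + 1) * nx, j * nu:(j + 1) * nu] = (
                np.linalg.matrix_power(A_d, i - j) @ B_d)
    return Phi, Gamma

def cost(U, Phi, Gamma, x_k, X_g, Q_g, Q_u):
    """Assumed quadratic tracking cost."""
    e = Phi @ x_k + Gamma @ U - X_g
    return e @ Q_g @ e + U @ Q_u @ U

def solve_unconstrained(Phi, Gamma, x_k, X_g, Q_g, Q_u):
    """J = U^T H U + 2 g^T U + const, minimized by solving H U = -g."""
    H = Gamma.T @ Q_g @ Gamma + Q_u
    g = Gamma.T @ Q_g @ (Phi @ x_k - X_g)
    return np.linalg.solve(H, -g), H, g

# Single joint: move from angle 0 to angle 1 and come to rest.
T, N_p = 0.1, 20
A_d = np.array([[1.0, T], [0.0, 1.0]])
B_d = np.array([[0.0], [T]])
Phi, Gamma = stacked_model(A_d, B_d, N_p)

x_k = np.zeros(2)
X_g = np.tile(np.array([1.0, 0.0]), N_p)
Q_g = np.eye(2 * N_p)
Q_u = 0.01 * np.eye(N_p)

U_opt, H, g = solve_unconstrained(Phi, Gamma, x_k, X_g, Q_g, Q_u)
J_opt = cost(U_opt, Phi, Gamma, x_k, X_g, Q_g, Q_u)
J_zero = cost(np.zeros(N_p), Phi, Gamma, x_k, X_g, Q_g, Q_u)
```

In a receding-horizon controller only the first element of `U_opt` would normally be applied before the problem is solved again at the next time step.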

2. Static/dynamic environment obstacle avoidance method based on a natural evolution strategy:

The optimization problem of the mechanical arm contains hard constraints such as collision distance and collision state, which traditional optimization methods handle inefficiently or with difficulty. The invention therefore adopts a static/dynamic obstacle avoidance method based on a natural evolution strategy and constructs a target prediction function:

where $f(x(k), U(k))$ is the environment penalty model trained with a natural evolution strategy; $f_s$ and $f_d$ denote the static obstacle network and the dynamic obstacle network in the environment penalty model, respectively; and $\mu$ and $\phi$ are coefficients. The two networks have identical structures and training methods; the only difference is that the input of the dynamic obstacle network $f_d$ also includes the state quantity of the dynamic obstacle. The environment penalty model takes the state quantity $x(k)$ and the control sequence $U(k)$ of the system dynamic equation as input and the environment penalty quantity as output. The structure of the network is shown in FIG. 2.
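The target prediction function itself is not reproduced in this text. One form consistent with the symbol descriptions above is sketched below, where $J$ is the original prediction function. The additive structure and the symbol $x_{obs}(k)$ for the dynamic obstacle state are assumptions.

```latex
J_{stable} = J + f\bigl(x(k), U(k)\bigr), \qquad
f\bigl(x(k), U(k)\bigr) = \mu\, f_s\bigl(x(k), U(k)\bigr) + \phi\, f_d\bigl(x(k), x_{obs}(k), U(k)\bigr)
```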

The steps of training the environment penalty model are as follows:

1) Initialize the weights of the environment penalty model $f$;

2) Run the MPC controller for a certain period of time, optimizing the function $J_{stable}$;

3) Obtain a data trajectory $I$ comprising the states $x(1), x(2), \ldots, x(k)$, the control sequences $U(1), U(2), \ldots, U(k)$, the corresponding predicted environmental penalties $f^1, f^2, \ldots, f^k$, and the actual environmental penalties $f^1_{real}, f^2_{real}, \ldots, f^k_{real}$;

4) Optimize the weight parameters of the environment penalty model $f$ in turn by the natural evolution strategy, using the data in the trajectory $I$;

5) Repeat steps 2) to 4) until the environment penalty model $f$ converges.

In the original natural evolution strategy algorithm, the parameter $\theta$ to be updated comprises $\mu$ and $\sigma$, the two parameters of a normal distribution. The natural evolution strategy used in the invention fixes $\sigma$ and updates only $\mu$, so the parameter $\theta$ to be updated is simply $\mu$.

The flow of the natural evolution strategy algorithm used in the invention is shown in FIG. 3. In that algorithm, the value of the cost function $f$ depends on the environment, and the cost function $f$ may be set as the following expression:

The larger the deviation between the model-predicted environmental cost $f^k$ and the actual cost $f^k_{real}$, the larger the modulus of the gradient, and vice versa. Through continued iterative training, the cost function $f$ eventually approximates the real environmental cost model.
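The following is a minimal sketch of a natural evolution strategy with a fixed $\sigma$ in which only the mean $\mu$ is updated, as described above. It is illustrative only: a linear model stands in for the penalty network, the cost is assumed to be the mean squared deviation between predicted and actual penalties, and the data are synthetic. The function name `nes_fit` and all hyperparameters are hypothetical.

```python
import numpy as np

def nes_fit(features, f_real, sigma=0.1, lr=0.05, pop=50, iters=300, seed=0):
    """Fit penalty-model weights with a fixed-sigma NES; only the mean mu is updated."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(features.shape[1])

    def costs_of(thetas):
        # Mean squared deviation between predicted and actual penalties (assumed cost).
        preds = features @ thetas.T            # shape: (n_samples, n_candidates)
        return np.mean((preds - f_real[:, None]) ** 2, axis=0)

    history = [costs_of(mu[None, :])[0]]
    for _ in range(iters):
        eps = rng.standard_normal((pop, mu.size))   # perturbations drawn from N(0, I)
        fitness = -costs_of(mu + sigma * eps)       # higher fitness means lower cost
        fitness = fitness - fitness.mean()          # baseline subtraction reduces variance
        grad = (fitness[:, None] * eps).mean(axis=0) / sigma  # search-gradient estimate
        mu = mu + lr * grad                         # gradient ascent on expected fitness
        history.append(costs_of(mu[None, :])[0])
    return mu, history

# Synthetic stand-in for trajectory data: inputs and "actual" penalties.
data_rng = np.random.default_rng(1)
features = data_rng.standard_normal((200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
f_real = features @ theta_true

mu, history = nes_fit(features, f_real)
```

Because the gradient estimate scales with the deviation between predicted and actual penalties, the updates shrink as the model approaches the data, which matches the behaviour described in the text.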

Referring to FIG. 4, correspondingly, the present invention also discloses a dynamic collision avoidance planning device for an industrial robot arm, which includes:

a first construction unit 10 configured to construct a system dynamic equation of the robot arm;

a calculation unit 20 configured to calculate an original prediction function of the robot arm according to the system dynamic equation;

a second construction unit 30 configured for constructing an environmental penalty model;

a third construction unit 40 configured to construct a target prediction function according to the original prediction function and the environmental penalty model;

an optimization unit 50 configured to optimize the target prediction function to obtain a control sequence;

a training unit 60 configured to train the environmental penalty model in accordance with the control sequence until the environmental penalty model converges.

With reference to FIGS. 1 to 4, the invention constructs the target prediction function from the original prediction function and the environmental penalty model, and trains the environmental penalty model according to the control sequence until it converges. Because the target prediction function combines the original prediction function with the environmental penalty model, it fits actual conditions more closely; because the environmental penalty model is trained on the control sequence until convergence, the target prediction function can be optimized online in real time, yielding better control effect, robustness, and stability.

The above disclosure describes only the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention; equivalent changes made in accordance with the appended claims therefore still fall within the scope of the present invention.

Claims

1. A mechanical arm dynamic collision avoidance planning method is characterized by comprising the following steps:

constructing a system dynamic equation of the mechanical arm;

calculating an original prediction function of the mechanical arm according to the system dynamic equation;

constructing an environment penalty model;

constructing a target prediction function according to the original prediction function and the environment penalty model;

optimizing the target prediction function to obtain a control sequence;

and training the environment penalty model according to the control sequence until the environment penalty model converges.

2. The method for planning the dynamic collision avoidance of the mechanical arm as claimed in claim 1, wherein the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.

3. The method for planning dynamic collision avoidance for a robot arm according to claim 2, wherein the training of the environment penalty model according to the control sequence until the environment penalty model converges specifically comprises:

initializing the weight of the environment penalty amount in the target prediction function;

assigning the target prediction function with a preset continuous control quantity within a preset time to obtain a plurality of state quantities and environmental penalty quantities;

and optimizing the weight of the environment penalty amount in the target prediction function according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.

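4. The dynamic mechanical arm collision avoidance planning method according to claim 3, wherein optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges specifically comprises:

optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.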

5. The dynamic mechanical arm collision avoidance planning method of claim 3, wherein each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.

6. The method for planning dynamic collision avoidance for a robot arm according to claim 1, wherein the calculating the original prediction function of the robot arm according to the system dynamic equation specifically comprises:

discretizing the system dynamic equation to obtain a discretized system dynamic equation;

and calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.

7. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous moment, the control increments are combinations of current control quantities and control quantities at a previous moment, and the environment penalty model is used for predicting the state of the mechanical arm in the control sequence.