CN113787514B - Mechanical arm dynamic collision avoidance planning method - Google Patents

Info

Publication number
CN113787514B
Authority
CN
China
Prior art keywords
prediction function
environment
penalty
mechanical arm
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110713794.6A
Other languages
Chinese (zh)
Other versions
CN113787514A (en)
Inventor
程良伦
陈肇江
王涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-06-25
Filing date
2021-06-25
Publication date
2022-12-23
Application filed by Guangdong University of Technology
Priority to CN202110713794.6A
Publication of CN113787514A
Application granted
Publication of CN113787514B

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1656: Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664: Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666: Avoiding collision or forbidden zones

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps: S1, constructing a system dynamic equation of the mechanical arm; S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation; S3, constructing an environment penalty model; S4, constructing a target prediction function according to the original prediction function and the environment penalty model; S5, optimizing the target prediction function to obtain a control sequence; S6, training the environment penalty model according to the control sequence until the environment penalty model converges. The method has the advantages of good control effect, strong robustness, and support for online optimization.

Description

Mechanical arm dynamic collision avoidance planning method
Technical Field
The invention relates to the technical field of robot manufacturing, in particular to a dynamic collision avoidance planning method for a mechanical arm.
Background
Most industrial mechanical arms currently used in production complete their path planning by manual teaching in order to carry out processes such as welding, spraying, stacking, carrying, assembling, and machining. Manual teaching works adequately for a single, repetitive, specific task.
However, as production tasks become more diverse and task scenes more complex, the disadvantages of manual teaching, such as tedious operation, poor generality, and low precision, are gradually exposed. As intelligent production places higher demands on the working efficiency and precision of the mechanical arm, manual teaching can no longer meet these demands. Many algorithms and variants have been developed to solve the path planning problem of the mechanical arm, but most of them support only offline planning and cannot operate in an environment with dynamic obstacles, so the mechanical arm cannot cope with sudden hazards.
Disclosure of Invention
The invention aims to provide a mechanical arm dynamic collision avoidance planning method which is good in control effect, strong in robustness and supports online optimization.
In order to achieve the purpose, the invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps:
S1, constructing a system dynamic equation of the mechanical arm;
S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
S3, constructing an environment penalty model;
S4, constructing a target prediction function according to the original prediction function and the environment penalty model;
S5, optimizing the target prediction function to obtain a control sequence;
S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
Preferably, the step S6 specifically includes:
S61, initializing the weight of the environmental penalty amount in the target prediction function;
S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty amounts;
S63, optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state quantities and environmental penalty amounts until the environment penalty model converges.
Specifically, the step S63 includes:
optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state quantities and environmental penalty amounts until the environment penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
S12, calculating the original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.
Correspondingly, the invention also discloses a dynamic collision avoidance planning device for the industrial mechanical arm, which comprises:
a first construction unit configured to construct a system dynamic equation of the robot arm;
a calculation unit configured to calculate an original prediction function of the robot arm from the system dynamic equation;
a second construction unit configured for constructing an environmental penalty model;
a third construction unit, configured to construct a target prediction function according to the original prediction function and an environmental penalty model;
an optimization unit configured to optimize the target prediction function to obtain a control sequence;
a training unit configured to train the environmental penalty model according to the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
Compared with the prior art, the invention constructs the target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model according to the control sequence until it converges. Combining the original prediction function with the environment penalty model lets the target prediction function fit actual conditions more closely, and training the environment penalty model on the control sequence allows the target prediction function to be optimized online in real time, yielding a better control effect, robustness, and stability.
Drawings
FIG. 1 is a block flow diagram of a mechanical arm dynamic collision avoidance planning method of the present invention;
FIG. 2 is a block diagram of the environmental penalty model of the present invention;
FIG. 3 is an algorithm flow diagram of a natural evolution strategy algorithm;
FIG. 4 is a block diagram of the dynamic collision avoidance planning apparatus for an industrial robot arm according to the present invention.
Detailed Description
In order to explain the technical contents, structural features, and objects and effects of the present invention in detail, the following description is made in conjunction with the embodiments and the accompanying drawings.
Referring to fig. 1, the method for planning dynamic collision avoidance of a mechanical arm of the present embodiment includes the following steps:
S1, constructing a system dynamic equation of the mechanical arm;
S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
S3, constructing an environment penalty model;
S4, constructing a target prediction function according to the original prediction function and the environment penalty model;
S5, optimizing the target prediction function to obtain a control sequence;
S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
Preferably, the step S6 specifically includes:
S61, initializing the weight of the environmental penalty amount in the target prediction function;
S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty amounts;
S63, optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state quantities and environmental penalty amounts until the environment penalty model converges.
Specifically, the step S63 includes:
optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state quantities and environmental penalty amounts until the environment penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
S12, calculating the original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities being a combination of a current state quantity and a state quantity at a previous time, the control increments being a combination of a current control quantity and a control quantity at a previous time, and the environmental penalty model is used for predicting the state of the robot arm within the control sequence.
Referring to FIGS. 1 to 3, a six-axis industrial mechanical arm is taken as an example for a detailed description:
1. System dynamic equation and original prediction function:
At time t, the configuration vector of the six-axis industrial mechanical arm is q(t), the velocity vector is q'(t), and the acceleration vector is q''(t), where q(t), q'(t), and q''(t) are six-dimensional vectors representing the angles, angular velocities, and angular accelerations of the six joints of the mechanical arm, respectively.
Let the state quantity of the mechanical arm be x(t) = [q(t); q'(t)] and the control quantity be u(t) = q''(t). The following system dynamic equation can then be constructed:
[Equation image BDA0003133983170000051; not reproduced in this text]
where
[Equation image BDA0003133983170000052; not reproduced in this text]
Here 0_{m×n} denotes the m × n all-zero matrix and I_{n×n} denotes the n-dimensional identity matrix.
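Since the equation images are not available in this text, the following is a hedged reconstruction derived only from the definitions x(t) = [q(t); q'(t)] and u(t) = q''(t): differentiating the state gives a double-integrator model. The matrix names A_c and B_c are assumptions introduced here and may differ from the symbols used in the patent figures.

```latex
\dot{x}(t) =
\begin{bmatrix} \dot{q}(t) \\ \ddot{q}(t) \end{bmatrix}
= A_c\,x(t) + B_c\,u(t),
\qquad
A_c = \begin{bmatrix} 0_{6\times 6} & I_{6\times 6} \\ 0_{6\times 6} & 0_{6\times 6} \end{bmatrix},
\qquad
B_c = \begin{bmatrix} 0_{6\times 6} \\ I_{6\times 6} \end{bmatrix}
```

The upper block row states that the derivative of the joint angles is the joint velocity, and the lower block row states that the derivative of the joint velocity is the commanded acceleration.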
Because the above system dynamic equation describes a continuous-time system, which cannot be used directly in a predictive controller, it must be discretized to obtain:
[Equation image BDA0003133983170000053; not reproduced in this text]
Rearranging the terms of the above equation gives the discretized system dynamic equation:
[Equation image BDA0003133983170000061; not reproduced in this text]
where
[Equation image BDA0003133983170000062; not reproduced in this text]
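The discretization step can be illustrated with a minimal sketch. It assumes a forward-Euler difference, which is consistent with the text's description of rearranging terms but is not confirmed by the missing equation images; the function name, the sampling period `dt`, and the matrix names are hypothetical.

```python
import numpy as np

def discretize_double_integrator(n_joints: int, dt: float):
    """Forward-Euler discretization (assumed) of x' = A_c x + B_c u,
    with x = [q; q'] and u = q''."""
    n = n_joints
    A_c = np.block([[np.zeros((n, n)), np.eye(n)],
                    [np.zeros((n, n)), np.zeros((n, n))]])
    B_c = np.vstack([np.zeros((n, n)), np.eye(n)])
    # (x(k+1) - x(k)) / dt = A_c x(k) + B_c u(k)
    # rearranged: x(k+1) = (I + dt*A_c) x(k) + (dt*B_c) u(k)
    A = np.eye(2 * n) + dt * A_c
    B = dt * B_c
    return A, B

A, B = discretize_double_integrator(n_joints=6, dt=0.01)
x = np.zeros(12)          # arm at rest: zero angles, zero velocities
u = np.ones(6)            # unit angular acceleration on every joint
x_next = A @ x + B @ u    # velocities increase by dt*u; angles are still zero
```

With forward Euler the joint angles respond to an acceleration command only one step later, through the velocity. An exact zero-order-hold discretization would add a dt²/2 term to the angle rows.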
During the optimization, the following constraints must also be satisfied:
[Equation image BDA0003133983170000063; not reproduced in this text]
Recursing within the prediction horizon N_P gives:
[Equation image BDA0003133983170000064; not reproduced in this text]
The above formula may be further expressed as:
[Equation image BDA0003133983170000065; not reproduced in this text]
where
[Equation image BDA0003133983170000066; not reproduced in this text]
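The recursion over the prediction horizon can be sketched as follows, assuming the usual condensed form X = Psi x(k) + Theta U(k) used in model predictive control. The names `Psi` and `Theta` are assumptions, since the patent's own symbols appear only in the missing images; a single joint is used to keep the example small.

```python
import numpy as np

def prediction_matrices(A, B, horizon):
    """Stack x(k+1..k+N) = Psi x(k) + Theta [u(k); ...; u(k+N-1)]."""
    n, m = B.shape
    powers = [np.eye(n)]
    for _ in range(horizon):
        powers.append(A @ powers[-1])          # powers[i] = A^i
    Psi = np.vstack(powers[1:])                # [A; A^2; ...; A^N]
    Theta = np.zeros((horizon * n, horizon * m))
    for i in range(horizon):                   # block row: predicted step i+1
        for j in range(i + 1):                 # block column: control u(k+j)
            Theta[i*n:(i+1)*n, j*m:(j+1)*m] = powers[i - j] @ B
    return Psi, Theta

# Single-joint example: state [angle, angular velocity], control = acceleration.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Psi, Theta = prediction_matrices(A, B, horizon=3)

x0 = np.array([0.0, 1.0])
U = np.array([1.0, -1.0, 0.5])
X_stacked = Psi @ x0 + Theta @ U

# Cross-check against a step-by-step rollout of x(k+1) = A x(k) + B u(k).
x, rollout = x0.copy(), []
for u in U:
    x = A @ x + B @ np.array([u])
    rollout.append(x)
X_rollout = np.concatenate(rollout)
```

The stacked form expresses the whole predicted trajectory as an affine function of the control sequence, which is what allows the prediction function to be written in terms of U(k) alone.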
From this, the original prediction function based on the system dynamic equation can be written as:
[Equation image BDA0003133983170000067; not reproduced in this text]
where Xg is the target state vector, and Qg and Qu are the corresponding weight matrices. By optimizing the original prediction function J at each time k, the optimal control quantity U(k) can be obtained. The original prediction function is rewritten into a form in which each term contains U(k):
[Equation image BDA0003133983170000071; not reproduced in this text]
In this form the original prediction function can be readily minimized by quadratic programming.
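As an illustration of the quadratic structure, the sketch below assumes a standard tracking cost J = (X - Xg)ᵀ Qg (X - Xg) + Uᵀ Qu U with X = Psi x(k) + Theta U. This form is inferred from the description of Xg, Qg, and Qu and is not confirmed by the missing equation image. The constraints mentioned in the text are ignored here, so the minimizer has a closed form; a real controller would pass the same H and g to a constrained QP solver.

```python
import numpy as np

def solve_unconstrained_mpc(Psi, Theta, x0, X_goal, Q_g, Q_u):
    """Minimize U^T H U + 2 g^T U (+ const), the assumed cost expanded in U."""
    H = Theta.T @ Q_g @ Theta + Q_u
    g = Theta.T @ Q_g @ (Psi @ x0 - X_goal)
    U_opt = np.linalg.solve(H, -g)      # stationarity: 2 H U + 2 g = 0
    return U_opt, H, g

def tracking_cost(U, Psi, Theta, x0, X_goal, Q_g, Q_u):
    err = Psi @ x0 + Theta @ U - X_goal
    return float(err @ Q_g @ err + U @ Q_u @ U)

# Single joint, horizon of two steps (hypothetical numbers).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Psi = np.vstack([A, A @ A])
Theta = np.block([[B, np.zeros((2, 1))],
                  [A @ B, B]])

x0 = np.array([0.0, 0.0])
X_goal = np.array([1.0, 0.0, 1.0, 0.0])   # reach angle 1 with zero velocity
Q_g = np.eye(4)
Q_u = 0.01 * np.eye(2)

U_opt, H, g = solve_unconstrained_mpc(Psi, Theta, x0, X_goal, Q_g, Q_u)
J_opt = tracking_cost(U_opt, Psi, Theta, x0, X_goal, Q_g, Q_u)
```

Because Qu is positive definite, H is positive definite and the stationary point is the unique minimum of the unconstrained problem.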
2. Static/dynamic obstacle avoidance method based on a natural evolution strategy:
The optimization problem of the mechanical arm contains hard constraints such as collision distance and collision state, which traditional optimization methods handle inefficiently or with difficulty. The method therefore adopts a static/dynamic obstacle avoidance method based on a natural evolution strategy and constructs a target prediction function:
[Equation image BDA0003133983170000072; not reproduced in this text]
where f(x(k), U(k)) is the environment penalty model trained with a natural evolution strategy; f_s and f_d denote the static obstacle network and the dynamic obstacle network in the environment penalty model, respectively; and μ and φ are coefficients. The two networks have identical structures and training methods; the only difference is that the input of the dynamic obstacle network f_d also includes the state quantity of the dynamic obstacle. The environment penalty model takes the state quantity x(k) and the control sequence U(k) of the system dynamic equation as input and outputs the environmental penalty amount. The structure of the network is shown in FIG. 2.
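A minimal sketch of how such a target prediction function might be assembled is given below. It assumes an additive form, J_stable = J + μ·f_s + φ·f_d, which matches the description of μ and φ as coefficients but is not confirmed by the missing equation image. The two-layer network, its softplus output, and all function names are hypothetical; the actual network structure is the one shown in FIG. 2.

```python
import numpy as np

def init_penalty_net(input_dim, hidden_dim, rng):
    """Random weights for a tiny two-layer network (assumed architecture)."""
    return (0.1 * rng.standard_normal((hidden_dim, input_dim)),
            np.zeros(hidden_dim),
            0.1 * rng.standard_normal(hidden_dim),
            0.0)

def penalty_net(weights, inputs):
    """Map an input vector to a scalar penalty; softplus keeps it non-negative
    (the non-negativity is an assumption of this sketch)."""
    W1, b1, w2, b2 = weights
    hidden = np.tanh(W1 @ inputs + b1)
    return float(np.logaddexp(0.0, w2 @ hidden + b2))

def target_prediction(J_original, x, U, obstacle_state,
                      static_net, dynamic_net, mu, phi):
    """Assumed additive form: J_stable = J + mu * f_s + phi * f_d."""
    f_s = penalty_net(static_net, np.concatenate([x, U]))
    # The dynamic network additionally sees the dynamic obstacle's state.
    f_d = penalty_net(dynamic_net, np.concatenate([x, U, obstacle_state]))
    return J_original + mu * f_s + phi * f_d

rng = np.random.default_rng(0)
x = np.zeros(12)                          # six joint angles and velocities
U = np.zeros(6 * 3)                       # control sequence over three steps
obstacle_state = np.array([0.5, 0.2, 0.3])
static_net = init_penalty_net(x.size + U.size, 16, rng)
dynamic_net = init_penalty_net(x.size + U.size + obstacle_state.size, 16, rng)

J_stable = target_prediction(2.0, x, U, obstacle_state,
                             static_net, dynamic_net, mu=1.0, phi=1.0)
```

Setting μ = φ = 0 recovers the original prediction function, so the coefficients control how strongly the learned obstacle penalties influence the optimized control sequence.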
The steps of training the environment penalty model are as follows:
1) Initialize the weights of the environment penalty model f;
2) Run the model predictive control (MPC) controller for a certain period of time, optimizing the function J_stable;
3) Obtain a data trajectory I consisting of x(1), x(2), …, x(k), U(1), U(2), …, U(k), the corresponding predicted environmental penalties f_1, f_2, …, f_k, and the actual environmental penalties f_1^real, f_2^real, …, f_k^real;
4) Sequentially optimize the weight parameters of the environment penalty model f with the natural evolution strategy, using the data in trajectory I;
5) Repeat steps 2) to 4) until the environment penalty model f converges.
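The outer loop formed by these five steps can be sketched as below. The callables `run_mpc` and `update_weights` are hypothetical stand-ins for steps 2) to 4), and the convergence test on the change in weights is an assumption, since the text does not state a convergence criterion.

```python
import numpy as np

def train_penalty_model(theta, run_mpc, update_weights, tol=1e-3, max_rounds=100):
    """Alternate MPC rollouts and weight updates until the weights stop changing."""
    for round_idx in range(max_rounds):
        # Steps 2-3: run the controller and record the trajectory data.
        states, controls, real_penalties = run_mpc(theta)
        # Step 4: improve the penalty model's weights from the recorded data.
        new_theta = update_weights(theta, states, controls, real_penalties)
        # Step 5: stop once an update no longer changes the weights (assumed test).
        if np.linalg.norm(new_theta - theta) < tol:
            return new_theta, round_idx + 1
        theta = new_theta
    return theta, max_rounds

# Toy stand-ins so the loop can be executed; they do not model a real arm.
target = np.array([1.0, -1.0])

def toy_run_mpc(theta):
    return np.zeros((5, 2)), np.zeros((5, 1)), np.zeros(5)

def toy_update(theta, states, controls, real_penalties):
    return theta + 0.5 * (target - theta)   # move halfway to the target weights

theta_final, rounds = train_penalty_model(np.zeros(2), toy_run_mpc, toy_update)
```

In a real system `run_mpc` would execute the controller on the arm or in simulation, and `update_weights` would be a natural evolution strategy update.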
In the original natural evolution strategy algorithm, the parameter θ to be updated comprises μ and σ, the two parameters of a normal distribution. The natural evolution strategy used in the invention fixes σ and updates only μ, so the parameter θ to be updated is simply μ.
The flow of the natural evolution strategy algorithm used in the invention is shown in FIG. 3. In this algorithm, the value of the cost function f depends on the environment, and the cost function f may be set as the following expression:
[Equation image BDA0003133983170000081; not reproduced in this text]
The larger the deviation between the environmental cost f_k predicted by the model and the actual cost f_k^real, the larger the modulus of the gradient, and vice versa. Through continued iterative training, the cost function f eventually approximates the real environmental cost model.
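The fixed-σ update can be illustrated with a small sketch. It uses a generic evolution-strategy gradient estimate with Gaussian perturbations around the mean and a fixed σ, and it assumes a mean squared deviation between predicted and actual penalties as the cost, in line with the statement that larger deviations give larger gradients. Neither the exact cost expression nor the exact algorithm of FIG. 3 is reproduced here; the linear penalty model and all names are hypothetical.

```python
import numpy as np

def nes_step(theta, loss_fn, rng, sigma=0.1, lr=0.05, population=50):
    """One update of the search-distribution mean theta; sigma stays fixed."""
    eps = rng.standard_normal((population, theta.size))
    losses = np.array([loss_fn(theta + sigma * e) for e in eps])
    advantages = losses - losses.mean()            # baseline reduces variance
    grad_estimate = (advantages[:, None] * eps).mean(axis=0) / sigma
    return theta - lr * grad_estimate              # descend: we minimize the loss

# Toy data: a linear penalty model stands in for the network of FIG. 2.
rng = np.random.default_rng(0)
features = rng.standard_normal((40, 3))            # stand-in for (x(k), U(k)) inputs
theta_true = np.array([1.0, -2.0, 0.5])
real_penalties = features @ theta_true             # stand-in for f_k^real

def deviation_loss(theta):
    predicted = features @ theta                   # stand-in for f_k
    return float(np.mean((predicted - real_penalties) ** 2))

theta = np.zeros(3)
initial_loss = deviation_loss(theta)
for _ in range(300):
    theta = nes_step(theta, deviation_loss, rng)
final_loss = deviation_loss(theta)
```

Because the update needs only loss evaluations and no analytic gradient, the same procedure applies when the penalty is measured from a simulator or the physical environment.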
Referring to FIG. 4, correspondingly, the present invention also discloses a dynamic collision avoidance planning device for an industrial robot arm, which includes:
a first construction unit 10 configured to construct a system dynamic equation of the robot arm;
a calculation unit 20 configured to calculate an original prediction function of the robot arm according to the system dynamic equation;
a second construction unit 30 configured for constructing an environmental penalty model;
a third construction unit 40 configured to construct a target prediction function according to the original prediction function and the environmental penalty model;
an optimization unit 50 configured to optimize the target prediction function to obtain a control sequence;
a training unit 60 configured to train the environmental penalty model in accordance with the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
With reference to FIGS. 1 to 4, the invention constructs a target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model according to the control sequence until it converges. Combining the original prediction function with the environment penalty model lets the target prediction function fit actual conditions more closely, and training the environment penalty model on the control sequence allows the target prediction function to be optimized online in real time, yielding a better control effect, robustness, and stability.
The above disclosure describes only preferred embodiments of the present invention and is not to be construed as limiting its scope; equivalent changes made in accordance with the appended claims therefore still fall within the scope of the present invention.

Claims (7)

1. A mechanical arm dynamic collision avoidance planning method is characterized by comprising the following steps:
constructing a system dynamic equation of the mechanical arm;
calculating an original prediction function of the mechanical arm according to the system dynamic equation;
constructing an environment penalty model;
constructing a target prediction function according to the original prediction function and the environment penalty model;
optimizing the target prediction function to obtain a control sequence;
and training the environment penalty model according to the control sequence until the environment penalty model converges.
2. The method for planning the dynamic collision avoidance of the mechanical arm as claimed in claim 1, wherein the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
3. The method for planning dynamic collision avoidance for a robot arm according to claim 2, wherein the training of the environment penalty model according to the control sequence until the environment penalty model converges specifically comprises:
initializing the weight of the environment penalty amount in the target prediction function;
assigning the target prediction function with a preset continuous control quantity within a preset time to obtain a plurality of state quantities and environmental penalty quantities;
and optimizing the weight of the environment penalty amount in the target prediction function according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.
4. The dynamic mechanical arm collision avoidance planning method according to claim 3, wherein the weight of the environmental penalty amount in the target prediction function is optimized according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges, specifically:
and optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
5. The dynamic mechanical arm collision avoidance planning method of claim 3, wherein each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
6. The method for planning dynamic collision avoidance for a robot arm according to claim 1, wherein the calculating the original prediction function of the robot arm according to the system dynamic equation specifically comprises:
discretizing the system dynamic equation to obtain a discretized system dynamic equation;
and calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
7. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous moment, the control increments are combinations of current control quantities and control quantities at a previous moment, and the environment penalty model is used for predicting the state of the mechanical arm in the control sequence.
CN202110713794.6A 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method Active CN113787514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110713794.6A CN113787514B (en) 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110713794.6A CN113787514B (en) 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method

Publications (2)

Publication Number Publication Date
CN113787514A (en) 2021-12-14
CN113787514B (en) 2022-12-23

Family

ID=78876981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110713794.6A Active CN113787514B (en) 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method

Country Status (1)

Country Link
CN (1) CN113787514B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106737673A (en) * 2016-12-23 2017-05-31 浙江大学 End-to-end mechanical arm control method based on deep learning
CN110320809A (en) * 2019-08-19 2019-10-11 杭州电子科技大学 AGV trajectory correction method based on model predictive control
CN112809682A (en) * 2021-01-27 2021-05-18 佛山科学技术学院 Mechanical arm obstacle avoidance path planning method and system and storage medium
CN112882469A (en) * 2021-01-14 2021-06-01 浙江大学 Deep reinforcement learning obstacle avoidance navigation method integrating global training

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102017129665B3 (en) * 2017-12-12 2019-01-24 Pilz Gmbh & Co. Kg Collision-free motion planning with closed kinematics
DE102019129338B3 (en) * 2019-10-30 2021-02-18 Pilz Gmbh & Co. Kg Model predictive interaction control

Also Published As

Publication number Publication date
CN113787514A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
Camci et al. An aerial robot for rice farm quality inspection with type-2 fuzzy neural networks tuned by particle swarm optimization-sliding mode control hybrid algorithm
Zhong et al. Value function approximation and model predictive control
Anderson et al. Challenging control problems
CN108227506A (en) A kind of robot admittance control system based on adaptive optimization method
Omran et al. Optimal task space control design of a Stewart manipulator for aircraft stall recovery
CN112297005B (en) Robot autonomous control method based on graph neural network reinforcement learning
CN113687659B (en) Optimal trajectory generation method and system based on digital twinning
CN111240344A (en) Autonomous underwater robot model-free control method based on double neural network reinforcement learning technology
CN110703692A (en) Multi-mobile-robot distributed predictive control method based on virtual structure method
JP2000347708A (en) Method and device for controlling dynamic system by neural net and storage medium storing control program for dynamic system by neural net
CN112051734A (en) Wheeled mobile robot event triggering tracking control method based on deterministic learning
CN113787514B (en) Mechanical arm dynamic collision avoidance planning method
CN112276952B (en) Robust simultaneous stabilization method and system for multi-robot system
Okuma et al. A neural network compensator for uncertainties of robotic manipulators
Su et al. Adaptive coordinated motion constraint control for cooperative multi-manipulator systems
dos Santos et al. Planning and learning for cooperative construction task with quadrotors
CN109794939B (en) Parallel beam planning method for welding robot motion
Ouyang et al. Motion control of a snake robot via cerebellum-inspired learning control
CN113238482B (en) Asymptotic tracking control method and system of single-arm robot system
Cera Design, control, and motion planning of cable-driven flexible tensegrity robots
CN112621761B (en) Communication time lag-oriented mechanical arm system multi-stage optimization coordination control method
CN113867157B (en) Optimal trajectory planning method and device for control compensation and storage device
Baselizadeh et al. Adaptive Real-time Learning-based Neuro-Fuzzy Control of Robot Manipulators
Liu et al. Human-Simulated Intelligent Walking Control for Biped Robots
Tsai et al. Trajectory Control of An Articulated Robot Based on Direct Reinforcement Learning. Robotics 2022, 11, 116

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant