CN113787514A - Mechanical arm dynamic collision avoidance planning method

Info

Publication number
CN113787514A
CN113787514A CN202110713794.6A CN202110713794A CN113787514A CN 113787514 A CN113787514 A CN 113787514A CN 202110713794 A CN202110713794 A CN 202110713794A CN 113787514 A CN113787514 A CN 113787514A
Authority
CN
China
Prior art keywords
prediction function
mechanical arm
penalty
environment
environmental
Legal status
Granted
Application number
CN202110713794.6A
Other languages
Chinese (zh)
Other versions
CN113787514B (en)
Inventor
程良伦
陈肇江
王涛
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
2021-06-25
Filing date
2021-06-25
Publication date
2021-12-14
Application filed by Guangdong University of Technology
Priority to CN202110713794.6A
Publication of CN113787514A
Application granted
Publication of CN113787514B
Current legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666 Avoiding collision or forbidden zones

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps: S1, constructing a system dynamic equation of the mechanical arm; S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation; S3, constructing an environment penalty model; S4, constructing a target prediction function according to the original prediction function and the environment penalty model; S5, optimizing the target prediction function to obtain a control sequence; S6, training the environment penalty model according to the control sequence until the environment penalty model converges. The method offers a good control effect, strong robustness, and support for online optimization.

Description

Mechanical arm dynamic collision avoidance planning method
Technical Field
The invention relates to the technical field of robot manufacturing, in particular to a dynamic collision avoidance planning method for a mechanical arm.
Background
Most industrial mechanical arms currently in production complete path planning by manual teaching in order to carry out production processes such as welding, spraying, stacking, carrying, assembling, and machining. Manual teaching is adequate for a single, repetitive, specific task.
However, as production tasks diversify and task scenes grow more complex, the disadvantages of manual teaching, such as complicated and tedious operation, poor generality, and low precision, are gradually exposed. As intelligent production demands ever higher working efficiency and precision from the mechanical arm, manual teaching can no longer meet these requirements. Many algorithms and their variants have been developed to solve the path planning problem of the mechanical arm, but most of them only support offline planning and cannot operate in an environment with dynamic obstacles, so the mechanical arm cannot cope with sudden hazards.
Disclosure of Invention
The invention aims to provide a mechanical arm dynamic collision avoidance planning method which is good in control effect, strong in robustness and supports online optimization.
To achieve this purpose, the invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps:
S1, constructing a system dynamic equation of the mechanical arm;
S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
S3, constructing an environment penalty model;
S4, constructing a target prediction function according to the original prediction function and the environment penalty model;
S5, optimizing the target prediction function to obtain a control sequence;
S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
Preferably, the step S6 specifically includes:
S61, initializing the weight of the environmental penalty quantity in the target prediction function;
S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty quantities;
S63, optimizing the weight of the environmental penalty quantity in the target prediction function according to the plurality of state quantities and environmental penalty quantities until the environment penalty model converges.
Specifically, the step S63 includes:
optimizing the weight of the environmental penalty quantity in the target prediction function by a natural evolution strategy according to the plurality of state quantities and environmental penalty quantities until the environment penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
S12, calculating an original prediction function of the mechanical arm from the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.
Correspondingly, the invention also discloses a dynamic collision avoidance planning device for the industrial mechanical arm, which comprises:
a first construction unit configured to construct system dynamic equations of the robot arm;
a calculation unit configured to calculate an original prediction function of the robot arm from the system dynamic equation;
a second construction unit configured for constructing an environmental penalty model;
a third construction unit, configured to construct a target prediction function according to the original prediction function and an environmental penalty model;
an optimization unit configured to optimize the target prediction function to obtain a control sequence;
a training unit configured to train the environmental penalty model according to the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
Compared with the prior art, the invention constructs a target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model according to the control sequence until it converges. Because the target prediction function combines the original prediction function with the environment penalty model, it fits reality more closely. Because the environment penalty model is trained on the control sequence until convergence, the target prediction function can be optimized online in real time, which yields a better control effect, robustness, and stability.
Drawings
FIG. 1 is a block flow diagram of a mechanical arm dynamic collision avoidance planning method of the present invention;
FIG. 2 is a block diagram of the environmental penalty model of the present invention;
FIG. 3 is an algorithm flow diagram of a natural evolution strategy algorithm;
FIG. 4 is a block diagram of the dynamic collision avoidance planning apparatus for an industrial robot arm according to the present invention.
Detailed Description
In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to FIG. 1, the mechanical arm dynamic collision avoidance planning method of the present embodiment includes the following steps:
S1, constructing a system dynamic equation of the mechanical arm;
S2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
S3, constructing an environment penalty model;
S4, constructing a target prediction function according to the original prediction function and the environment penalty model;
S5, optimizing the target prediction function to obtain a control sequence;
S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
Preferably, the step S6 specifically includes:
S61, initializing the weight of the environmental penalty quantity in the target prediction function;
S62, assigning preset continuous control quantities to the target prediction function within a preset time to obtain a plurality of state quantities and environmental penalty quantities;
S63, optimizing the weight of the environmental penalty quantity in the target prediction function according to the plurality of state quantities and environmental penalty quantities until the environment penalty model converges.
Specifically, the step S63 includes:
optimizing the weight of the environmental penalty quantity in the target prediction function by a natural evolution strategy according to the plurality of state quantities and environmental penalty quantities until the environment penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
S11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
S12, calculating an original prediction function of the mechanical arm from the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.
Referring to FIGS. 1-3, a six-axis industrial mechanical arm is taken as an example and described in detail below:
1. System dynamic equation and original prediction function:
The configuration vector of the six-axis industrial mechanical arm at time t is q(t), the velocity vector is q'(t), and the acceleration vector is q''(t), where q(t), q'(t), and q''(t) are six-dimensional vectors representing the angles, angular velocities, and angular accelerations of the six joints of the mechanical arm, respectively.
Let the state quantity of the mechanical arm be x(t) = [q(t); q'(t)] and the control quantity be u(t) = q''(t). The following system dynamic equation can then be constructed:
$$\dot{x}(t) = A\,x(t) + B\,u(t)$$
where
$$A = \begin{bmatrix} 0_{6\times 6} & I_{6\times 6} \\ 0_{6\times 6} & 0_{6\times 6} \end{bmatrix}, \qquad B = \begin{bmatrix} 0_{6\times 6} \\ I_{6\times 6} \end{bmatrix},$$
the matrix $0_{m\times n}$ is the m × n all-zero matrix, and $I_{n\times n}$ is the n-dimensional identity matrix.
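The structure implied by the definitions x = [q; q'] and u = q'' can be sketched numerically as follows. This is a minimal illustration, not the patent's implementation: the function name `build_continuous_model` and the matrix names `A` and `B` are assumptions chosen for the example.

```python
import numpy as np

def build_continuous_model(n_joints=6):
    """Return (A, B) such that d/dt [q; q'] = A [q; q'] + B q''."""
    n = n_joints
    zero = np.zeros((n, n))
    eye = np.eye(n)
    A = np.block([[zero, eye], [zero, zero]])  # q' feeds the derivative of q
    B = np.vstack([zero, eye])                 # u = q'' feeds the derivative of q'
    return A, B

A, B = build_continuous_model()
x = np.concatenate([np.zeros(6), np.ones(6)])  # zero angles, unit angular velocities
u = np.full(6, 2.0)                            # constant angular acceleration
x_dot = A @ x + B @ u                          # stacked [q'; q'']
```

The first six entries of `x_dot` reproduce the angular velocities and the last six reproduce the commanded accelerations, which is exactly what the state definition requires.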
The system dynamic equation above is a continuous-time system, which cannot be used directly in a predictive controller, so it must be discretized to obtain:
Figure BDA0003133983170000053
Rearranging the terms of the above equation gives the discretized system dynamic equation:
Figure BDA0003133983170000061
where
Figure BDA0003133983170000062
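The discretization formulas themselves are not reproduced in this text. One common choice that matches the description of discretizing and then rearranging terms is the forward-Euler scheme, sketched below. The sampling period `T`, the names `A_d` and `B_d`, and the choice of forward Euler are all assumptions made for illustration.

```python
import numpy as np

def discretize_euler(A, B, T):
    """Forward Euler: (x(k+1) - x(k)) / T = A x(k) + B u(k),
    rearranged to x(k+1) = (I + T A) x(k) + (T B) u(k)."""
    A_d = np.eye(A.shape[0]) + T * A
    B_d = T * B
    return A_d, B_d

n = 6
zero, eye = np.zeros((n, n)), np.eye(n)
A = np.block([[zero, eye], [zero, zero]])
B = np.vstack([zero, eye])

A_d, B_d = discretize_euler(A, B, T=0.01)
x0 = np.concatenate([np.zeros(n), np.ones(n)])  # zero angles, unit velocities
x1 = A_d @ x0 + B_d @ np.zeros(n)               # one step with zero acceleration
```

With zero acceleration, one step advances each joint angle by T times its velocity and leaves the velocities unchanged.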
During the optimization, the following constraints must also be satisfied:
Figure BDA0003133983170000063
Recursion is carried out within the prediction horizon N_P:
Figure BDA0003133983170000064
The above formula may be further expressed as:
Figure BDA0003133983170000065
where
Figure BDA0003133983170000066
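Recursing a discrete linear model over the prediction horizon is conventionally written in stacked form as X(k) = Φ x(k) + Θ U(k). The sketch below builds such matrices and cross-checks them against step-by-step simulation. The names `Phi` and `Theta`, the function `prediction_matrices`, and the numbers used are assumptions; the patent's own symbols are not recoverable from this text.

```python
import numpy as np

def prediction_matrices(A_d, B_d, horizon):
    """Stack x(k+1..k+N) = Phi x(k) + Theta U(k) for x(k+1) = A_d x(k) + B_d u(k)."""
    n, m = B_d.shape
    powers = [np.eye(n)]                      # A_d^0, A_d^1, ..., A_d^N
    for _ in range(horizon):
        powers.append(A_d @ powers[-1])
    Phi = np.vstack(powers[1:])               # block rows A_d^1 ... A_d^N
    Theta = np.zeros((horizon * n, horizon * m))
    for i in range(horizon):                  # block row i predicts x(k+i+1)
        for j in range(i + 1):                # u(k+j) acts through A_d^(i-j) B_d
            Theta[i * n:(i + 1) * n, j * m:(j + 1) * m] = powers[i - j] @ B_d
    return Phi, Theta

# One-joint double integrator with sampling period 0.1 (illustrative numbers only).
A_d = np.array([[1.0, 0.1], [0.0, 1.0]])
B_d = np.array([[0.0], [0.1]])
x0 = np.array([0.0, 1.0])
U = np.array([1.0, 0.0, -1.0])

Phi, Theta = prediction_matrices(A_d, B_d, horizon=3)
X_pred = Phi @ x0 + Theta @ U

# Cross-check against step-by-step simulation of the same model.
x, steps = x0.copy(), []
for u in U:
    x = A_d @ x + B_d @ np.array([u])
    steps.append(x)
X_sim = np.concatenate(steps)
```

The stacked form is what allows the cost to be written directly in terms of the whole control sequence U(k).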
From this, the original prediction function based on the system dynamic equation is obtained:
Figure BDA0003133983170000067
where Xg is the target state vector, and Qg and Qu are the corresponding weight matrices. By optimizing the original prediction function J at each time k, the optimal control quantity u(k) can be obtained. The function is then rewritten in a form in which every term contains U(k):
Figure BDA0003133983170000071
The original prediction function can be easily solved by quadratic programming.
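To make the quadratic structure concrete, the sketch below assumes a standard tracking cost J(U) = (Φx + ΘU − Xg)ᵀ Qg (Φx + ΘU − Xg) + Uᵀ Qu U and solves it in closed form with the constraints omitted. The cost form, the names `Phi` and `Theta`, and all numbers are assumptions; with state or control constraints a quadratic programming solver would be needed instead of a linear solve.

```python
import numpy as np

def quadratic_cost(U, Phi, Theta, Qg, Qu, x, Xg):
    """Assumed tracking cost: state error weighted by Qg plus control effort weighted by Qu."""
    err = Phi @ x + Theta @ U - Xg
    return float(err @ Qg @ err + U @ Qu @ U)

def solve_unconstrained(Phi, Theta, Qg, Qu, x, Xg):
    """Set the gradient 2 (H U + g) to zero, with H and g collected from the cost."""
    H = Theta.T @ Qg @ Theta + Qu
    g = Theta.T @ Qg @ (Phi @ x - Xg)
    return np.linalg.solve(H, -g)

# One-joint double integrator, horizon 2 (illustrative numbers only).
A_d = np.array([[1.0, 0.1], [0.0, 1.0]])
B_d = np.array([[0.0], [0.1]])
Phi = np.vstack([A_d, A_d @ A_d])
Theta = np.block([[B_d, np.zeros((2, 1))], [A_d @ B_d, B_d]])
Qg, Qu = np.eye(4), 0.01 * np.eye(2)
x = np.array([0.0, 0.0])
Xg = np.array([0.0, 0.0, 0.1, 0.0])   # hypothetical target over the horizon

U_opt = solve_unconstrained(Phi, Theta, Qg, Qu, x, Xg)
J_opt = quadratic_cost(U_opt, Phi, Theta, Qg, Qu, x, Xg)
```

Because Qu is positive definite here, H is positive definite and the solution is the unique minimizer, so any perturbation of `U_opt` increases the cost.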
2. Static/dynamic environment obstacle avoidance method based on a natural evolution strategy:
The optimization problem of the mechanical arm contains hard constraints such as collision distance and collision state, which traditional optimization methods handle inefficiently or with difficulty. The invention therefore adopts a static/dynamic obstacle avoidance method based on a natural evolution strategy and constructs the target prediction function:
Figure BDA0003133983170000072
where f(x(k), U(k)) is the environment penalty model trained with the natural evolution strategy; f_s and f_d denote the static obstacle network and the dynamic obstacle network in the environment penalty model, respectively; and μ and φ are coefficients. The two networks have identical structures and training methods; the only difference is that the state quantity of the dynamic obstacle network f_d also includes the state quantity of the dynamic obstacle. The environment penalty model takes the state quantity x(k) of the system dynamic equation and the control sequence U(k) as input and outputs the environment penalty quantity. The structure of the network is shown in FIG. 2.
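The following sketch shows one way the two penalty networks could be combined with the original cost. It assumes an additive form J_target = J + μ·f_s + φ·f_d and a one-hidden-layer tanh network; neither the exact formula nor the architecture of FIG. 2 is available in this text, so `mlp_penalty`, `init_params`, `target_cost`, and all dimensions are hypothetical.

```python
import numpy as np

def init_params(rng, n_in, n_hidden=8):
    """Random weights for a tiny one-hidden-layer network (assumed architecture)."""
    return (rng.normal(scale=0.1, size=(n_hidden, n_in)), np.zeros(n_hidden),
            rng.normal(scale=0.1, size=n_hidden), 0.0)

def mlp_penalty(params, inputs):
    """Map an input vector to a scalar environment penalty."""
    W1, b1, W2, b2 = params
    hidden = np.tanh(W1 @ inputs + b1)
    return float(W2 @ hidden + b2)

def target_cost(J_original, x, U, obstacle_state, params_s, params_d, mu=1.0, phi=1.0):
    """Assumed additive combination of the original cost and the two penalties."""
    f_s = mlp_penalty(params_s, np.concatenate([x, U]))                  # static network
    f_d = mlp_penalty(params_d, np.concatenate([x, U, obstacle_state]))  # also sees the obstacle
    return J_original + mu * f_s + phi * f_d

rng = np.random.default_rng(0)
x = rng.normal(size=12)              # [q; q'] for six joints
U = rng.normal(size=18)              # control sequence, horizon 3 (illustrative)
obstacle_state = rng.normal(size=6)  # hypothetical obstacle position and velocity
params_s = init_params(rng, n_in=30)
params_d = init_params(rng, n_in=36)

J_target = target_cost(5.0, x, U, obstacle_state, params_s, params_d)
```

The only structural difference between the two networks in this sketch is the extra obstacle state in the input of the dynamic one, mirroring the description above.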
The steps of training the environment penalty model are as follows:
1) Initialize the weights of the environment penalty model f;
2) Run the MPC controller for a certain time and optimize the function J_stable;
3) Obtain a series of data tracks I, consisting of x(1), x(2), …, x(k), U(1), U(2), …, U(k), the corresponding environment penalties f_1, f_2, …, f_k, and the actual environment penalties f_1^real, f_2^real, …, f_k^real;
4) Sequentially optimize the weight parameters of the environment penalty model f with the natural evolution strategy, using the data in track I;
5) Repeat steps 2) to 4) until the environment penalty model f converges.
In the original natural evolution strategy algorithm, the parameter θ to be updated comprises μ and σ, the two parameters of a normal distribution. The natural evolution strategy used in the invention fixes σ and updates only μ, so the parameter θ to be updated is simply μ.
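A fixed-σ evolution strategy of this kind can be sketched as below: candidates are sampled from a normal distribution around the mean, and only the mean is moved along a score-function estimate of the cost gradient. The function name `nes_fixed_sigma`, the hyperparameters, and the quadratic test cost are assumptions for illustration and are not taken from FIG. 3.

```python
import numpy as np

def nes_fixed_sigma(cost, mu0, sigma=0.1, lr=0.05, population=50, iterations=200, seed=0):
    """Minimize `cost` by updating only the mean mu of N(mu, sigma^2 I); sigma stays fixed."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu0, dtype=float).copy()
    for _ in range(iterations):
        eps = rng.standard_normal((population, mu.size))        # search directions
        costs = np.array([cost(mu + sigma * e) for e in eps])   # evaluate candidates
        baseline = costs.mean()                                 # variance-reduction baseline
        grad = ((costs - baseline)[:, None] * eps).mean(axis=0) / sigma
        mu -= lr * grad                                         # descend the estimated gradient
    return mu

target = np.array([1.0, -2.0])
mu_final = nes_fixed_sigma(lambda th: float(np.sum((th - target) ** 2)), np.zeros(2))
```

In the setting described above, `mu` would hold the weights of the environment penalty model and `cost` would measure how far its predictions are from the actual penalties; no derivative of the cost is required, which is the appeal of the approach when the penalty depends on collision checks.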
The flow of the natural evolution strategy algorithm used in the invention is shown in FIG. 3. In that algorithm, the value of the cost function f depends on the environment, and f may be set as the following expression:
Figure BDA0003133983170000081
The larger the deviation between the model-predicted environmental cost f_k and the actual cost f_k^real, the larger the modulus of the gradient, and vice versa. Through continuous iterative training, the cost function f eventually approximates the real environment cost model.
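The relationship between deviation and gradient size can be illustrated with a mean squared deviation, which is one cost with that property. The patent's actual expression is not reproduced in this text, so the squared form and the names `deviation_cost` and `deviation_gradient` are assumptions.

```python
import numpy as np

def deviation_cost(predicted, actual):
    """Mean squared deviation between predicted and actual environment penalties."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((predicted - actual) ** 2))

def deviation_gradient(predicted, actual):
    """Gradient of deviation_cost with respect to the predictions."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return 2.0 * (predicted - actual) / predicted.size

actual = np.array([1.0, 0.0, 2.0])   # hypothetical measured penalties
near = actual + 0.1                  # predictions close to reality
far = actual + 1.0                   # predictions far from reality

grad_near = np.linalg.norm(deviation_gradient(near, actual))
grad_far = np.linalg.norm(deviation_gradient(far, actual))
```

Predictions that are ten times further from the measured penalties produce a gradient ten times larger, so training corrects large errors quickly and slows down as the model approaches the real environment cost.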
Referring to FIG. 4, correspondingly, the present invention further discloses a dynamic collision avoidance planning apparatus for an industrial robot arm, which includes:
a first building unit 10 configured to build a system dynamic equation of the robot arm;
a calculation unit 20 configured to calculate an original prediction function of the robot arm according to the system dynamic equation;
a second construction unit 30 configured for constructing an environmental penalty model;
a third construction unit 40 configured to construct a target prediction function according to the original prediction function and the environmental penalty model;
an optimization unit 50 configured to optimize the target prediction function to obtain a control sequence;
a training unit 60 configured to train the environmental penalty model in accordance with the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
With reference to FIGS. 1-4, the invention constructs a target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model according to the control sequence until it converges. Because the target prediction function combines the original prediction function with the environment penalty model, it fits reality more closely. Because the environment penalty model is trained on the control sequence until convergence, the target prediction function can be optimized online in real time, which yields a better control effect, robustness, and stability.
The above disclosure only illustrates the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention; therefore, equivalent changes made in accordance with the claims of the present invention still fall within the scope covered by the invention.

Claims (7)

1. A mechanical arm dynamic collision avoidance planning method is characterized by comprising the following steps:
constructing a system dynamic equation of the mechanical arm;
calculating an original prediction function of the mechanical arm according to the system dynamic equation;
constructing an environment penalty model;
constructing a target prediction function according to the original prediction function and the environment penalty model;
optimizing the target prediction function to obtain a control sequence;
and training the environment penalty model according to the control sequence until the environment penalty model converges.
2. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
3. The method for planning dynamic collision avoidance for a robot arm according to claim 2, wherein the training of the environment penalty model according to the control sequence until the environment penalty model converges specifically comprises:
initializing weights of environmental penalties in the target prediction function;
assigning the target prediction function with a preset continuous control quantity within a preset time to obtain a plurality of state quantities and environmental penalty quantities;
and optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
4. The dynamic mechanical arm collision avoidance planning method according to claim 3, wherein the weight of the environmental penalty amount in the target prediction function is optimized according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges, specifically:
and optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
5. The dynamic mechanical arm collision avoidance planning method of claim 3, wherein each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
6. The method for planning dynamic collision avoidance for a robot arm according to claim 1, wherein the calculating the original prediction function of the robot arm according to the system dynamic equation specifically comprises:
discretizing the system dynamic equation to obtain a discretized system dynamic equation;
and calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
7. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous moment, the control increments are combinations of current control quantities and control quantities at a previous moment, and the environment penalty model is used for predicting the state of the mechanical arm in the control sequence.
CN202110713794.6A 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method Active CN113787514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110713794.6A CN113787514B (en) 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method

Publications (2)

Publication Number Publication Date
CN113787514A (en) 2021-12-14
CN113787514B (en) 2022-12-23

Family

ID=78876981

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110713794.6A Active CN113787514B (en) 2021-06-25 2021-06-25 Mechanical arm dynamic collision avoidance planning method

Country Status (1)

Country Link
CN (1) CN113787514B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant