CN113787514A - Mechanical arm dynamic collision avoidance planning method - Google Patents
- Publication number
- CN113787514A
- Authority
- CN
- China
- Prior art keywords
- prediction function
- mechanical arm
- penalty
- environment
- environmental
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
- B25J9/1666—Avoiding collision or forbidden zones
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Feedback Control In General (AREA)
- Manipulator (AREA)
Abstract
The invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps: S1, constructing a system dynamic equation of the mechanical arm; S2, calculating an original prediction function of the mechanical arm from the system dynamic equation; S3, constructing an environment penalty model; S4, constructing a target prediction function from the original prediction function and the environment penalty model; S5, optimizing the target prediction function to obtain a control sequence; and S6, training the environment penalty model according to the control sequence until it converges. The method achieves a good control effect, strong robustness and support for online optimization.
Description
Technical Field
The invention relates to the technical field of robot manufacturing, in particular to a dynamic collision avoidance planning method for a mechanical arm.
Background
Most industrial mechanical arms currently deployed in production complete path planning by manual teaching in order to carry out processes such as welding, spraying, stacking, carrying, assembling and machining. Manual teaching works reasonably well for a single, repetitive, well-defined task.
However, as production tasks diversify and task scenes grow more complex, the drawbacks of the existing manual teaching methods, such as complicated and tedious operation, poor generality and low precision, are increasingly exposed. With rising requirements on the working efficiency and precision of mechanical arms in intelligent production, the lagging manual teaching method can no longer meet demand. Many path planning algorithms and their variants have been developed for mechanical arms, but most only support offline planning and cannot operate in a dynamic obstacle environment, so the mechanical arm lacks the ability to cope with sudden hazards.
Disclosure of Invention
The invention aims to provide a mechanical arm dynamic collision avoidance planning method with a good control effect, strong robustness and support for online optimization.
In order to achieve the purpose, the invention discloses a mechanical arm dynamic collision avoidance planning method, which comprises the following steps:
s1, constructing a system dynamic equation of the mechanical arm;
s2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
s3, constructing an environment penalty model;
s4, constructing a target prediction function according to the original prediction function and the environment penalty model;
s5, optimizing the target prediction function to obtain a control sequence;
and S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as the output quantity.
Preferably, the step S6 specifically includes:
s61, initializing the weight of the environment penalty amount in the target prediction function;
s62, assigning the target prediction function with a preset continuous control quantity in a preset time to obtain a plurality of state quantities and environmental penalty quantities;
and S63, optimizing the weight of the environment penalty amount in the target prediction function according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.
Specifically, the step S63 specifically includes:
and optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
s11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
and S12, calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the controlled variable.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.
Correspondingly, the invention also discloses a dynamic collision avoidance planning device for the industrial mechanical arm, which comprises:
a first construction unit configured to construct system dynamic equations of the robot arm;
a calculation unit configured to calculate an original prediction function of the robot arm from the system dynamic equation;
a second construction unit configured for constructing an environmental penalty model;
a third construction unit, configured to construct a target prediction function according to the original prediction function and an environmental penalty model;
an optimization unit configured to optimize the objective prediction function to obtain a control sequence;
a training unit configured to train the environmental penalty model according to the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
Compared with the prior art, the invention constructs the target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model with the control sequence until it converges. Because the target prediction function combines the original prediction function with the environment penalty model, it fits reality more closely; and because the environment penalty model is trained online through the control sequence, the target prediction function can be optimized online in real time, yielding better control performance, robustness and stability.
Drawings
FIG. 1 is a block flow diagram of a mechanical arm dynamic collision avoidance planning method of the present invention;
FIG. 2 is a block diagram of the environmental penalty model of the present invention;
FIG. 3 is an algorithm flow diagram of a natural evolution strategy algorithm;
fig. 4 is a block diagram of the dynamic collision avoidance planning apparatus for an industrial robot arm according to the present invention.
Detailed Description
In order to explain technical contents, structural features, and objects and effects of the present invention in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
Referring to fig. 1, the method for planning dynamic collision avoidance of a mechanical arm of the present embodiment includes the following steps:
s1, constructing a system dynamic equation of the mechanical arm;
s2, calculating an original prediction function of the mechanical arm according to the system dynamic equation;
s3, constructing an environment penalty model;
s4, constructing a target prediction function according to the original prediction function and the environment penalty model;
s5, optimizing the target prediction function to obtain a control sequence;
and S6, training the environment penalty model according to the control sequence until the environment penalty model converges.
Preferably, the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as the output quantity.
Preferably, the step S6 specifically includes:
s61, initializing the weight of the environment penalty amount in the target prediction function;
s62, assigning the target prediction function with a preset continuous control quantity in a preset time to obtain a plurality of state quantities and environmental penalty quantities;
and S63, optimizing the weight of the environment penalty amount in the target prediction function according to the plurality of state amounts and the environment penalty amount until the environment penalty model converges.
Specifically, the step S63 specifically includes:
and optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
Preferably, each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
Preferably, the step S1 specifically includes:
s11, discretizing the system dynamic equation to obtain a discretized system dynamic equation;
and S12, calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the controlled variable.
Preferably, the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous time, the control increments are combinations of current control quantities and control quantities at a previous time, and the environmental penalty model is used for predicting the state of the mechanical arm in the control sequence.
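The combination of "new state quantity" (current state plus previous state/control) and "control increment" described above matches the standard incremental MPC augmentation. The following is a minimal sketch of that construction; it is an assumption about the patent's exact formulation, shown here with NumPy and verified on a scalar system:

```python
import numpy as np

def augment(A, B):
    """Incremental-MPC augmentation (a standard construction; the
    patent's exact formulation may differ): new state z = [x; u_prev],
    new input du = u - u_prev."""
    nx, nu = B.shape
    Az = np.block([[A, B],
                   [np.zeros((nu, nx)), np.eye(nu)]])
    Bz = np.vstack([B, np.eye(nu)])
    return Az, Bz

# Scalar check: x(k+1) = x + u, starting from x = 0, u_prev = 1, du = 0.5
Az, Bz = augment(np.array([[1.0]]), np.array([[1.0]]))
z1 = Az @ np.array([0.0, 1.0]) + Bz @ np.array([0.5])
# z1 = [x(k+1), u(k)] with u(k) = u_prev + du = 1.5
```

Planning over du instead of u penalizes control changes and carries the previous control quantity inside the state, which is what lets the controller reason about increments.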
Referring to fig. 1-3, the method is described in detail below, taking a six-axis industrial mechanical arm as an example:
1. System dynamic equation and original prediction function:
Let q(t) be the configuration vector of the six-axis industrial mechanical arm at time t, q'(t) the velocity vector and q''(t) the acceleration vector, where q(t), q'(t) and q''(t) are six-dimensional vectors representing the angles, angular velocities and angular accelerations of the six joints of the mechanical arm.
Let the state quantity of the mechanical arm be x(t) = [q(t); q'(t)] and the control quantity u(t) = q''(t); the following system dynamic equation can then be constructed:

x'(t) = A_c x(t) + B_c u(t)

where

A_c = [0_{6×6}, I_{6×6}; 0_{6×6}, 0_{6×6}],  B_c = [0_{6×6}; I_{6×6}],

the matrix 0_{m×n} is the m×n all-zero matrix, and I_{n×n} is the n-dimensional identity matrix.
Because the above system dynamic equation describes a continuous-time system, which cannot be used directly by a predictive controller, it is discretized with sampling period T:

(x(k+1) − x(k)) / T = A_c x(k) + B_c u(k)

Rearranging the terms gives the discretized system dynamic equation:

x(k+1) = A x(k) + B u(k)

where A = I + T·A_c and B = T·B_c.
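A minimal sketch of the discretization step, assuming a forward-Euler scheme and an illustrative sampling period T (the extracted text does not fix either choice):

```python
import numpy as np

n, T = 6, 0.01  # joint count and an assumed sampling period

Ac = np.block([[np.zeros((n, n)), np.eye(n)],
               [np.zeros((n, n)), np.zeros((n, n))]])
Bc = np.vstack([np.zeros((n, n)), np.eye(n)])

# Forward-Euler: (x(k+1) - x(k)) / T = Ac x(k) + Bc u(k)
# =>  x(k+1) = (I + T*Ac) x(k) + (T*Bc) u(k)
A = np.eye(2 * n) + T * Ac
B = T * Bc

# One step from rest under unit acceleration on every joint:
x0 = np.zeros(2 * n)
u0 = np.ones(n)
x1 = A @ x0 + B @ u0   # velocities step to T, positions stay 0
```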
in the process of optimization solution, the following constraints must also be satisfied:
in the prediction time domain NPThe recursion is carried out internally:
the above formula may be further expressed as:
wherein the content of the first and second substances,
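The recursion over the prediction horizon can be stacked into prediction matrices M and C. The sketch below shows one common construction (block ordering is an assumption), verified on a tiny scalar system:

```python
import numpy as np

def prediction_matrices(A, B, Np):
    """Stack x(k+i) = A^i x(k) + sum_j A^(i-1-j) B u(k+j)
    into X = M x(k) + C U over a horizon of Np steps."""
    nx, nu = B.shape
    M = np.vstack([np.linalg.matrix_power(A, i) for i in range(1, Np + 1)])
    C = np.zeros((Np * nx, Np * nu))
    for i in range(Np):          # block row: prediction step i+1
        for j in range(i + 1):   # block column: control step j
            C[i*nx:(i+1)*nx, j*nu:(j+1)*nu] = \
                np.linalg.matrix_power(A, i - j) @ B
    return M, C

# Scalar sanity check: x(k+1) = 2x + u, horizon 3
M, C = prediction_matrices(np.array([[2.0]]), np.array([[1.0]]), Np=3)
# M stacks the powers of A; C is lower triangular in powers of A times B
```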
From this, the original prediction function based on the system dynamic equation can be written as:

J = (X(k) − X_g)ᵀ Q_g (X(k) − X_g) + U(k)ᵀ Q_u U(k)

where X_g is the target state vector, and Q_g and Q_u are the corresponding weight matrices. By optimizing the original prediction function J at each time k, the optimal control quantity U(k) is obtained. Substituting X(k) = M x(k) + C U(k) rewrites J as a form containing only U(k), so the original prediction function can be easily solved by quadratic programming.
2. A static/dynamic environment obstacle avoidance method based on a natural evolution strategy comprises the following steps:
the optimization problem of the mechanical arm contains hard constraints such as collision distance, collision state and the like, and the traditional optimization method is used for processing the constraints with low efficiency or difficult processing, so that the method adopts a static/dynamic obstacle avoidance method based on a natural evolution strategy and constructs a target prediction function:
wherein f (x), (k), U (k) are environment punishment models trained based on natural evolution strategies, fsAnd fdRespectively representing a static barrier network and a dynamic barrier network in an environment punishment model, mu and phi are coefficients, the structures of the two networks are completely consistent with a training method, and the only difference is that the dynamic barrier network fdThe state quantity of (2) also includes the state quantity of the dynamic obstacle. The environment penalty model takes the state quantity x (k) and the control sequence U (k) of the system dynamic equation as the input and takes the environment penalty quantity as the output. The structure of the network is shown in fig. 2.
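The text specifies only the penalty network's input (state quantity plus control sequence) and its scalar penalty output; the layer sizes and activation below are illustrative assumptions in a minimal NumPy sketch:

```python
import numpy as np

class PenaltyNet:
    """Minimal MLP sketch of an environment penalty network:
    input [x(k); U(k)], output a scalar penalty.
    Hidden size and tanh activation are assumptions."""
    def __init__(self, in_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (hidden, in_dim))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (1, hidden))
        self.b2 = np.zeros(1)

    def penalty(self, x, U):
        z = np.concatenate([x, U])
        h = np.tanh(self.W1 @ z + self.b1)
        return float((self.W2 @ h + self.b2)[0])

net = PenaltyNet(in_dim=12 + 6)   # state (12) + one control step (6)
p = net.penalty(np.zeros(12), np.zeros(6))
# With zero biases, a zero input gives a zero penalty.
```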
The environment penalty model is trained as follows:
1) initialize the weights of the environment penalty model f;
2) run the MPC controller for a certain time and optimize the function J_stable;
3) obtain a series of data as trajectory I = (x(1), x(2), …, x(k), U(1), U(2), …, U(k)), the corresponding environment penalties f_1, f_2, …, f_k, and the actual environment penalties f_1^real, f_2^real, …, f_k^real;
4) sequentially optimize the weight parameters of the environment penalty model f with the natural evolution strategy according to the data in trajectory I;
5) repeat steps 2) to 4) until the environment penalty model f converges.
In the original natural evolution strategy algorithm, the parameter θ to be updated comprises μ and σ, the two parameters of a normal distribution. The natural evolution strategy used in the invention fixes σ and updates only μ, so the parameter θ to be updated is just μ.
The flow of the natural evolution strategy algorithm used in the invention is shown in fig. 3. In this algorithm, the value of the cost function depends on the environment and may, for example, be set to the squared deviation between the predicted and actual environment penalties:

f = Σ_k (f_k − f_k^real)²

The larger the deviation between the model-predicted environment penalty f_k and the actual penalty f_k^real, the larger the modulus of the gradient, and vice versa. Through continuous iterative training, the environment penalty model finally approximates the real environment cost model.
Referring to fig. 4, correspondingly, the present invention further discloses a dynamic collision avoidance planning apparatus for an industrial robot arm, which includes:
a first building unit 10 configured to build a system dynamic equation of the robot arm;
a calculation unit 20 configured to calculate an original prediction function of the robot arm according to the system dynamic equation;
a second construction unit 30 configured for constructing an environmental penalty model;
a third construction unit 40 configured to construct a target prediction function according to the original prediction function and the environmental penalty model;
an optimization unit 50 configured to optimize the objective prediction function to obtain a control sequence;
a training unit 60 configured to train the environmental penalty model in accordance with the control sequence until the environmental penalty model converges.
Correspondingly, the invention also discloses a storage medium, wherein the storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the steps in the mechanical arm dynamic collision avoidance planning method.
Correspondingly, the invention also discloses a robot, wherein the robot is provided with the industrial mechanical arm dynamic collision avoidance planning device.
With reference to fig. 1-4, the invention constructs the target prediction function from the original prediction function and the environment penalty model, and trains the environment penalty model with the control sequence until it converges. The combined target prediction function fits reality more closely, and training the environment penalty model through the control sequence allows the target prediction function to be optimized online in real time, yielding better control performance, robustness and stability.
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit its scope; the scope of the present invention is defined by the appended claims.
Claims (7)
1. A mechanical arm dynamic collision avoidance planning method is characterized by comprising the following steps:
constructing a system dynamic equation of the mechanical arm;
calculating an original prediction function of the mechanical arm according to the system dynamic equation;
constructing an environment penalty model;
constructing a target prediction function according to the original prediction function and the environment penalty model;
optimizing the target prediction function to obtain a control sequence;
and training the environment penalty model according to the control sequence until the environment penalty model converges.
2. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the environment penalty model takes the joint state quantity and the control quantity of the mechanical arm as input quantities, and takes the environment penalty quantity of the system dynamic equation as output quantities.
3. The method for planning dynamic collision avoidance for a robot arm according to claim 2, wherein the training of the environment penalty model according to the control sequence until the environment penalty model converges specifically comprises:
initializing weights of environmental penalties in the target prediction function;
assigning the target prediction function with a preset continuous control quantity within a preset time to obtain a plurality of state quantities and environmental penalty quantities;
and optimizing the weight of the environmental penalty amount in the target prediction function according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
4. The dynamic mechanical arm collision avoidance planning method according to claim 3, wherein the weight of the environmental penalty amount in the target prediction function is optimized according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges, specifically:
and optimizing the weight of the environmental penalty amount in the target prediction function by a natural evolution strategy according to the plurality of state amounts and the environmental penalty amount until the environmental penalty model converges.
5. The dynamic mechanical arm collision avoidance planning method of claim 3, wherein each control quantity is assigned to the target prediction function to obtain a corresponding state quantity and an environmental penalty quantity.
6. The method for planning dynamic collision avoidance for a robot arm according to claim 1, wherein the calculating the original prediction function of the robot arm according to the system dynamic equation specifically comprises:
discretizing the system dynamic equation to obtain a discretized system dynamic equation;
and calculating to obtain an original prediction function of the mechanical arm according to the discretized system dynamic equation, wherein the original prediction function comprises parameters related to the control quantity.
7. The method for planning dynamic collision avoidance of a mechanical arm according to claim 1, wherein the control sequence includes a plurality of new state quantities and control increments in one-to-one correspondence, the new state quantities are combinations of current state quantities and state quantities at a previous moment, the control increments are combinations of current control quantities and control quantities at a previous moment, and the environment penalty model is used for predicting the state of the mechanical arm in the control sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110713794.6A CN113787514B (en) | 2021-06-25 | 2021-06-25 | Mechanical arm dynamic collision avoidance planning method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113787514A true CN113787514A (en) | 2021-12-14 |
CN113787514B CN113787514B (en) | 2022-12-23 |
Family
ID=78876981
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110713794.6A Active CN113787514B (en) | 2021-06-25 | 2021-06-25 | Mechanical arm dynamic collision avoidance planning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113787514B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106737673A (en) * | 2016-12-23 | 2017-05-31 | 浙江大学 | A kind of method of the control of mechanical arm end to end based on deep learning |
CN110320809A (en) * | 2019-08-19 | 2019-10-11 | 杭州电子科技大学 | A kind of AGV track correct method based on Model Predictive Control |
US20200376663A1 (en) * | 2017-12-12 | 2020-12-03 | Pilz Gmbh & Co. Kg | Collision-Free Motion Planning for Closed Kinematics |
US20210138652A1 (en) * | 2019-10-30 | 2021-05-13 | Pilz Gmbh & Co. Kg | Robot Control Using Model-Predictive Interaction |
CN112809682A (en) * | 2021-01-27 | 2021-05-18 | 佛山科学技术学院 | Mechanical arm obstacle avoidance path planning method and system and storage medium |
CN112882469A (en) * | 2021-01-14 | 2021-06-01 | 浙江大学 | Deep reinforcement learning obstacle avoidance navigation method integrating global training |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||