CN113510709B - Industrial robot pose precision online compensation method based on deep reinforcement learning


Info

Publication number
CN113510709B
CN113510709B (application CN202110856844.6A)
Authority
CN
China
Prior art keywords: robot, pose, reinforcement learning, deep reinforcement, coordinate system
Prior art date
Legal status
Active
Application number
CN202110856844.6A
Other languages
Chinese (zh)
Other versions
CN113510709A (en)
Inventor
肖文磊
孙子惠
姚开然
吴少宇
张鹏飞
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110856844.6A priority Critical patent/CN113510709B/en
Publication of CN113510709A publication Critical patent/CN113510709A/en
Application granted granted Critical
Publication of CN113510709B publication Critical patent/CN113510709B/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Abstract

The invention discloses an industrial robot pose accuracy online compensation method based on deep reinforcement learning, which comprises the following steps: operating the robot in different running states, acquiring the actual pose of the robot, and computing the error between the actual pose and the theoretical pose as a training set; constructing a deep reinforcement learning network model and determining the input and output layers of the learning network; pre-training the deep reinforcement learning network model to obtain the network model parameters; and predicting the pose deviation of the robot online with the trained deep reinforcement learning network model, closing the loop for real-time error compensation, and compensating non-systematic errors online. The method uses two networks with different functions to jointly realize interactive learning between the robot model and the current environment, dynamically adjusts the control parameters, and solves the problem of non-systematic pose error compensation for industrial robots.

Description

Industrial robot pose precision online compensation method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of industrial robot pose precision online compensation, in particular to an industrial robot pose precision online compensation method based on deep reinforcement learning.
Background
As the domestic high-precision manufacturing industry develops toward automation and intelligence, industrial robots, with their high efficiency, high quality, and good environmental adaptability, are applied ever more widely in automated production such as spraying, welding, handling, and assembly, and demand for them grows daily. To achieve technical innovation in high-precision manufacturing and substantially improve processing quality and production efficiency, the high-precision positioning of industrial robots is a problem that must be solved. The operating accuracy of a robot directly influences its operating effect; in particular, when higher requirements are placed on certain performance indexes during operation, higher requirements are likewise placed on improving the robot's operating accuracy.
Current robot precision compensation methods fall mainly into two types: error prediction compensation and error calibration compensation. Error prediction compensation has a high production cost; moreover, long-term robot motion wears the mechanical structure and the resulting errors cannot be avoided, so the method is seldom applied in practice. Error calibration compensation mainly follows the idea of modeling the system error to obtain a mathematical model of the non-systematic error, thereby realizing dynamic error feedback. However, for the serial structure of an industrial robot the dynamic solution is very complex, and once the influence of temperature and of load changes under different postures is introduced, the error model inevitably becomes very large and complex and extremely difficult to solve. Meanwhile, because the influence of non-systematic errors in a typical industrial application environment is far smaller than that of systematic errors, no relatively unified model for online compensation of non-systematic errors has yet been formed in industry. In addition, existing pose accuracy compensation methods cannot compensate the robot target pose online in real time; offline pose compensation can improve the absolute position accuracy and the attitude accuracy of the robot simultaneously, but cannot be performed online. For example, patent CN107351089A discloses an optimized selection method for robot kinematic parameter calibration poses, but the algorithm's convergence time is affected by the number of iterations, the number of parameters to be identified, and the number of pose points, and convergence is not easy. Patent CN108608425A discloses an offline programming method for six-axis industrial robot milling, which needs to construct a complex one-dimensional robot pose optimization model; it is difficult to ensure similarity between the mathematical model and the actual robot cutting process, which lowers the practical upper limit of the compensation effect. Patent CN112450820A discloses a pose optimization method, a mobile robot, and a storage medium, but cannot realize prediction and compensation of robot attitude errors. Patent CN112536797A discloses a comprehensive compensation method for position and attitude errors of an industrial robot that needs no complex motion error model and improves the absolute position accuracy and attitude accuracy of the industrial robot simultaneously, but the interpretability of its error prediction process is weak, and it cannot realize online prediction and compensation of non-systematic errors under different working environments.
Disclosure of Invention
To solve these problems, the invention provides an industrial robot pose accuracy online compensation method based on deep reinforcement learning. It does not depend on a mathematical model of the industrial robot; instead, two networks with different functions jointly realize interactive learning between the robot model and the current environment, dynamically adjusting the control parameters and solving the problem of non-systematic pose error compensation for industrial robots. The invention adopts the following technical scheme:
an industrial robot pose accuracy online compensation method based on deep reinforcement learning comprises the following steps:
step 1, operating the robot in different running states, acquiring an actual pose of the robot, and performing error operation on the actual pose and a theoretical pose to serve as a training set;
step 2, constructing a deep reinforcement learning network model, and determining an input and output layer of the deep reinforcement learning network;
step 3, completing the pre-training of the deep reinforcement learning network model to obtain network model parameters;
and 4, predicting the pose deviation of the robot online by using the trained deep reinforcement learning network model, closing the loop for real-time error compensation, and compensating non-systematic errors online.
Further, in step 1 the actual pose of the robot is measured with a laser tracker, and the measurement coordinate system of the laser tracker and the base coordinate system of the robot are related through the coordinate system conversion matrix ^L_B T, composed in homogeneous form of a rotation block and a translation column:

^L_B T = [R, Q; 0, 1]
wherein R is the rotation matrix:

R = (n_C3, n_C1 × n_C3, n_C1)

in the formula, n_C1 is the normal direction of trajectory circle C_1 and n_C3 is the normal direction of trajectory circle C_3;
Q is the displacement vector, obtained as follows:

trajectory circles C_1 and C_6 intersect at the point P_T, i.e. the target-ball position at the zero position of the robot, and circle C_1 has radius R_1. From the robot's own readings, the coordinate P_0 = [X_0, Y_0, Z_0]^T of the default tool center point in the robot base coordinate system is obtained. Defining the offset vector of point P_T relative to point P_0 as Δ = (ΔX, ΔY, ΔZ), the vector O_6 O_B can be expressed in the base coordinate system in terms of P_0, Δ, and ΔY_0 (the equation is given only as an image in the original), where ΔY_0 = O_6P_0 · n_C3 and the coordinate vector of the center O_6 of trajectory circle C_6 in the laser tracker measurement coordinate system is denoted ^L O_6.

Further, the displacement vector Q′ is obtained:

Q′ = ^L O_6 + R · (O_6 O_B)
To ensure that the error of the displacement vector is as small as possible, ten points P_i are also randomly sampled in the robot space; ^B P_i is the coordinate vector of the target ball in the base coordinate system and ^C P_i is the coordinate vector of the target ball in the robot default tool coordinate system, and a displacement vector Q″ is calculated based on a least-squares fitting method (the equation is given only as an image in the original).

The displacement vector errors ΔE of Q′ and Q″ are respectively calculated by formula (given only as an image in the original), and the displacement vector with the smaller error is selected as the displacement vector Q of the coordinate system conversion matrix ^L_B T:

Q = argmin { ΔE(Q_i), Q_i ∈ {Q′, Q″} }
Furthermore, the deep reinforcement learning network model is an Actor-Critic network model. The Actor neural network computes a strategy from the current environment state S and generates specific joint motion actions as the input for robot motion, interacting with the environment; the Critic neural network evaluates the strategic joint-action output generated by the Actor network in state S, judges whether the current situation is good or bad, measures it by a value, and returns the measured value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
Further, the end execution position TCP pose, stiffness k, temperature change T, load η, time signal t, and time-signal functions sin(t) and ln(t) of the robot are used as the input of the deep reinforcement learning network, wherein the end execution position TCP pose consists of the coordinate position (x, y, z) and the Euler angle orientation (α, β, γ); and the joint angle corrections Δjoint_angle(a1, a2, a3, a4, a5, a6) of the robot are used as the output of the deep reinforcement learning network.
Further, the step 3 specifically includes the following steps:
(1) taking the state characteristics of the industrial robot acquired in step 1 and the corresponding massive pose-error parameter data set as training samples and inputting them into the robot simulation interaction software; at the start of each training episode, the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose from the robot sample data set;
(2) the Actor-Critic network obtains the robot's current TCP pose, stiffness k, temperature change T, and load η state values, together with the time signal and time functions, from the robot simulation interaction environment, calculates the current angle correction of each joint, and sends it back to the robot simulation interaction software;
(3) after receiving the joint angle corrections, the robot simulation interaction software performs a joint-limit calculation for the robot and judges whether each joint is within its limits; if so, it executes the joint motion correction, and if some robot joint is out of its limits, it ends the current episode and reports this to the Actor-Critic network;
(4) obtaining the current robot pose and the target position and calculating the reward value to obtain the reward function R; if the R value is too low, the current episode ends; if the R value is normal, the current episode continues and R is returned to the Actor-Critic network for further learning;
and repeating the steps, and training to obtain the structure parameters of the Actor-Critic network model.
Further, the reward function R is calculated from the theoretical pose and the actual pose of the robot as the negative Mahalanobis distance:

D_M(P, P_0) = sqrt( (P − P_0)^T Σ^(−1) (P − P_0) )

R = η · D_M(P, P_0)

wherein P is the current pose, P_0 is the target pose, Σ is the covariance matrix of P and P_0, and η < 0.
compared with the prior art, the invention has the following beneficial effects:
(1) The method does not depend on a mathematical model of the industrial robot; instead, it uses a reinforcement learning algorithm to find the optimal control strategy through continuous exploration and trial-and-error learning, realizes online compensation of non-systematic errors such as temperature change and stiffness, and solves the problem of non-systematic errors caused by factors such as temperature and dynamic load change during mechanical-arm motion.
(2) The invention uses two networks with different functions, an Actor neural network and a Critic neural network, to jointly realize interactive learning between the robot model and the current environment. The Actor neural network computes a robot motion strategy from the current environment state S (comprising TCP pose P, stiffness k, temperature change T, and load η), generates specific joint motion actions as the output of the robot motion, and interacts with the environment. The Critic neural network evaluates the strategic joint-action output generated by the Actor network in state S, judges whether the current situation is good or bad, measures it by a value, and returns the measured value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
Drawings
FIG. 1 is a flow chart of an industrial robot pose accuracy online compensation method based on deep reinforcement learning;
FIG. 2 is a schematic diagram of an experimental platform for acquiring terminal pose position information and online pose accuracy compensation of an industrial robot;
FIG. 3 is a schematic diagram of a robot body and a coordinate system;
FIG. 4 is a diagram of the logic structure of an Actor-Critic network;
FIG. 5 is a flowchart of an algorithm for performing deep reinforcement learning network training in interaction with a robot in a robot simulation scenario.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, but the embodiments of the present invention are not limited thereto.
In a field working environment, robot positioning is affected by external factors such as complex and variable loads, dynamics, and temperature changes; the action form of the systematic error changes and non-systematic errors are introduced. The invention therefore provides an industrial robot pose accuracy online compensation method based on deep reinforcement learning which, as shown in fig. 1, comprises the following steps:
step 1: the robot is operated under different running states (load and temperature), the actual pose is measured and error operation is carried out on the actual pose and the theoretical pose, and all data are collected to be used as a training set. The method comprises the following specific steps:
the invention discloses an experimental platform for realizing acquisition of pose position information and precision compensation of the tail end of a mechanical arm, which comprises an industrial robot and a control cabinet thereof, a pose position measuring system device (a laser tracker and a pose measuring target) and a movable workstation, wherein the industrial robot is of a six-degree-of-freedom open-chain structure, the tail end of the robot is provided with a tail end executor, and the absolute positioning precision is 2-3mm, as shown in figure 2. The position of the robot is monitored in real time through a laser tracker, and the position is transmitted to a TwinCAT master station in real time based on an EtherCAT bus, so that a full closed loop is realized; the robot end effector six-degree-of-freedom pose information from the laser tracker and the motion control information from the industrial robot are acquired in real time, and the robot-laser tracker system state machine can be analyzed and controlled in real time.
For subsequent error calculation, the coordinate systems need to be unified: the laser tracker measurement coordinate system and the industrial robot base coordinate system are related to each other, pose data in the industrial robot coordinate system are converted into the laser tracker coordinate system, and the coordinate origin of the base coordinate system is calculated by a method combining axis measurement with multi-point fitting, giving the conversion matrix. The displacement vector Q is calculated by the multi-point fitting method, which ensures its calculation accuracy, and the rotation matrix R is calculated by axis vector measurement. The transformation matrix ^L_B T from the robot base coordinate system B to the laser tracker measurement coordinate system L is then, in homogeneous form,

^L_B T = [R, Q; 0, 1]

Specifically, as shown in fig. 3, the robot is moved to the HOME position, the target ball of the laser tracker is placed on the target holder of the end effector, and the A1, A3, and A6 axes of the robot are rotated independently to obtain the trajectory circles C_1, C_3, and C_6 with centers O_1, O_3, and O_6. The normal directions n_C1 and n_C3 of circles C_1 and C_3 give the Z and Y directions of the base coordinate system respectively, and the rotation matrix R is obtained as:

R = (n_C3, n_C1 × n_C3, n_C1)
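As an illustrative sketch only (not code from the patent), the construction of R from the two measured circle normals can be written in Python with numpy; the re-orthogonalization step is an added assumption, since normals fitted from noisy tracker measurements are never exactly perpendicular:

```python
import numpy as np

def rotation_from_circle_normals(n_c1, n_c3):
    """R = (n_C3, n_C1 x n_C3, n_C1): X from the A3 circle normal, Z from the
    A1 circle normal, Y completing the right-handed frame. The measured
    normals are re-orthogonalized so that R is a proper rotation matrix."""
    z = np.asarray(n_c1, dtype=float)
    z /= np.linalg.norm(z)
    x = np.asarray(n_c3, dtype=float)
    x -= np.dot(x, z) * z          # project out any tilt of n_C3 toward n_C1
    x /= np.linalg.norm(x)
    y = np.cross(z, x)             # equals n_C1 x n_C3 for orthonormal inputs
    return np.column_stack((x, y, z))
```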
Trajectory circles C_1 and C_6 intersect at the point P_T, i.e. the target-ball position at the zero position of the robot, and circle C_1 has radius R_1. From the robot's own readings, the coordinate P_0 = [X_0, Y_0, Z_0]^T of the default tool center point (defined at the center of the sixth-axis flange plate of the robot) in the robot base coordinate system is obtained. Defining the offset vector of point P_T relative to point P_0 as Δ = (ΔX, ΔY, ΔZ), the vector O_6 O_B can be expressed in the base coordinate system in terms of P_0, Δ, and ΔY_0 (the equation is given only as an image in the original), where ΔY_0 = O_6P_0 · n_C3 and the coordinate vector of the center O_6 of trajectory circle C_6 in the laser tracker measurement coordinate system is denoted ^L O_6.

Further, the displacement vector Q′ can be obtained:

Q′ = ^L O_6 + R · (O_6 O_B)
To ensure that the error of the displacement vector is as small as possible, ten points P_i are also randomly sampled in the robot space; ^B P_i is the coordinate vector of the target ball in the base coordinate system and ^C P_i is the coordinate vector of the target ball in the robot default tool coordinate system, and a displacement vector Q″ is calculated based on a least-squares fitting method (the equation is given only as an image in the original).

The displacement vector errors ΔE of Q′ and Q″ are respectively calculated by formula (given only as an image in the original), and the displacement vector with the smaller error is selected as the displacement vector Q of the coordinate system conversion matrix ^L_B T:

Q = argmin { ΔE(Q_i), Q_i ∈ {Q′, Q″} }
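A numpy sketch of the multi-point branch and the final selection (illustrative; the patent gives the Q″ and ΔE formulas only as images, so the mean-residual error measure and the frame convention ^L p = R·^B p + Q used below are assumptions):

```python
import numpy as np

def displacement_ls(R, P_base, P_meas):
    """Least-squares displacement Q'' from N paired target-ball positions,
    assuming the frames are related by P_meas_i = R @ P_base_i + Q.
    P_base, P_meas: (N, 3) arrays of the ten sampled points."""
    return (P_meas - P_base @ R.T).mean(axis=0)

def displacement_error(Q, R, P_base, P_meas):
    """Mean residual norm, standing in for the patent's error measure dE(Q)."""
    return np.linalg.norm(P_meas - (P_base @ R.T + Q), axis=1).mean()

def select_displacement(R, P_base, P_meas, Q_axis):
    """Q = argmin dE(Q_i) over the axis-measurement Q' and least-squares Q''."""
    candidates = [Q_axis, displacement_ls(R, P_base, P_meas)]
    errors = [displacement_error(q, R, P_base, P_meas) for q in candidates]
    return candidates[int(np.argmin(errors))]
```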
for non-systematic errors, they are generated during the robot use and will vary with factors such as working temperature, running time and motion attitude. The method comprises the steps of operating the industrial robot to move under different running states (rigidity, load and temperature), measuring the actual position of the industrial robot by using the laser tracker, further converting actual data measured by the laser tracker through coordinate system conversion matrix operation, converting the actual data to a robot coordinate system from the laser tracker coordinate system, and performing error operation on the actual pose and the theoretical pose of the robot to obtain a robot pose error.
The data samples are stored in the format <pose error, robot running state (stiffness, load, temperature)>, and a large-sample robot motion error data set is constructed through experimental acquisition.
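A sketch of one such record (the field names are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class ErrorSample:
    """One <pose error, robot running state> record of the training set."""
    pose_error: tuple      # (dx, dy, dz, d_alpha, d_beta, d_gamma): measured
                           # pose mapped into the base frame minus theoretical
    stiffness: float       # k
    load: float            # eta
    temperature: float     # temperature change T
    time: float            # running time t, also used for sin(t) and ln(t)
```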
Step 2: constructing the deep reinforcement learning network model and determining the input and output layers of the learning network.
Fig. 4 is the logical structure diagram of the Actor-Critic network, which provides the deep reinforcement learning network design framework: two networks with different functions, an Actor neural network and a Critic neural network, jointly realize interactive learning between the robot model and the current environment. The Actor neural network is essentially a DPG (deterministic policy gradient) network; it generates a strategy from the current environment state S (comprising TCP pose P, stiffness k, temperature change T, and load η) and produces specific joint motion actions as the robot motion input, interacting with the environment. The Critic neural network evaluates the strategic joint-action output generated by the Actor network in state S, judges whether the current situation is good or bad, measures it by a value, and returns the measured value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
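As a sketch of what the two networks could look like in PyTorch (the layer sizes and activations are assumptions; the patent does not specify the architecture):

```python
import torch.nn as nn

STATE_DIM = 12   # (x, y, z, alpha, beta, gamma, k, T, eta, t, sin t, ln t)
ACTION_DIM = 6   # joint-angle corrections a1..a6

class Actor(nn.Module):
    """Maps the environment state S to joint-angle corrections (the policy)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, ACTION_DIM), nn.Tanh())   # bounded corrections

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Estimates the state value V(S) used to score the Actor's actions."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s)
```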
The end execution position TCP pose of the robot, together with the state values of stiffness k, temperature change T, and load η, forms the input layer of the network; the TCP pose consists of the coordinate position (x, y, z) and the Euler angle orientation (α, β, γ). However, the motion deviation of the robot is generally extremely small: if the error value relative to the theoretical position were taken as the output layer, the output and input of the network would be extremely similar, which raises the learning difficulty and prevents a correct learning result. Therefore, to keep the input and output of the network as distinct as possible and establish a nonlinear relationship between them, the joint angle corrections of the robot, denoted Δjoint_angle(a1, a2, a3, a4, a5, a6), are used as the output of the network; the TCP pose of the robot can then be obtained from the joint angles by forward kinematics.
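For illustration, a minimal forward-kinematics sketch using standard Denavit-Hartenberg transforms; the DH parameters are robot-specific and not given in the patent:

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Standard Denavit-Hartenberg homogeneous transform for one link."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([[ct, -st * ca,  st * sa, a * ct],
                     [st,  ct * ca, -ct * sa, a * st],
                     [0.0,      sa,       ca,      d],
                     [0.0,     0.0,      0.0,    1.0]])

def forward_kinematics(joint_angles, dh_params):
    """TCP pose as a 4x4 homogeneous matrix from the six joint angles.
    dh_params: one (a, alpha, d) tuple per link, robot-specific."""
    T = np.eye(4)
    for (a, alpha, d), theta in zip(dh_params, joint_angles):
        T = T @ dh_transform(a, alpha, d, theta)
    return T
```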
The non-systematic errors of the robot, such as stiffness, temperature change, and load, change only slightly over a short period and are functions of time. If these influence-factor data were used directly as network inputs, then, because they hardly ever change, the connected neuron parameters would be judged of low learning value during the many gradient-based network parameter updates, their values would be pushed small and quickly frozen, and the non-systematic error factors would effectively be ignored. Therefore the time signal t and the time-signal functions sin(t) and ln(t) are used as network inputs; because the influence factors are periodic in, or logarithmically related to, the time-varying signal, this reduces the number of neurons the reinforcement learning network needs and lets it learn the feature information faster. The final network inputs and outputs are shown in Table 1.
Table 1 Robot deep reinforcement learning network inputs and outputs

Inputs: TCP pose (x, y, z, α, β, γ); stiffness k; temperature change T; load η; time signal t; sin(t); ln(t)
Output: joint angle corrections Δjoint_angle(a1, a2, a3, a4, a5, a6)
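A sketch of assembling this input vector (the small offset guarding ln(t) at t = 0 is an added assumption):

```python
import numpy as np

def build_state(tcp_pose, k, T, eta, t):
    """Network input of Table 1: TCP pose (x, y, z, alpha, beta, gamma),
    stiffness k, temperature change T, load eta, and the time features
    t, sin(t), ln(t)."""
    return np.concatenate([np.asarray(tcp_pose, dtype=float),
                           [k, T, eta, t, np.sin(t), np.log(t + 1e-6)]])
```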
And step 3: completing the pre-training of the reinforcement learning network model, and training to obtain the network model parameters.
A virtual training scene for the reinforcement learning network model is built in the robot simulation interaction software and communicates with Python through the UDP protocol, so that the deep reinforcement learning training network and the robot simulation interaction scene are trained interactively. As shown in fig. 5, the process of training the deep reinforcement learning network is as follows:
(1) The state characteristic dimensions S of the industrial robot acquired in step 1 (comprising TCP pose P, stiffness k, temperature change T, and load η) and the corresponding massive pose-error parameter data set are input into the robot simulation interaction software as training samples; at the start of each training episode, the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose from the robot sample data set.
(2) The Actor-Critic network obtains the robot's current TCP pose, stiffness k, temperature change T, and load η state values, together with the time signal and time functions, from the robot simulation interaction environment, and initializes the system state S. Using S as input, the Actor network computes and outputs the current motion angle correction of each joint, A = {Δjoint_angle(a1, a2, a3, a4, a5, a6)}, and this value is sent back to the robot simulation interaction software.
(3) After receiving the joint angle corrections, the robot simulation interaction software performs a limit calculation for each joint of the robot and judges whether the corrections are within the limit ranges; if so, it executes the motion correction of each joint to obtain a new state S′. If some robot joint is out of its limits, the current episode ends and this message is passed to the reinforcement learning network.
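A simulation-side sketch of this exchange (illustrative: the patent only states that the simulation software and Python communicate over UDP, so the port, the JSON message layout, and the joint limits below are all assumptions):

```python
import json
import socket

# Placeholder joint limits in degrees; real values are robot-specific.
JOINT_LIMITS = [(-185, 185), (-140, 140), (-120, 168),
                (-350, 350), (-125, 125), (-350, 350)]

def within_limits(angles):
    """True only if every corrected joint angle stays inside its limit range."""
    return all(lo <= a <= hi for a, (lo, hi) in zip(angles, JOINT_LIMITS))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 9000))       # assumed port for the Python trainer
current_joints = [0.0] * 6           # each episode starts from the zero pose

while True:
    data, addr = sock.recvfrom(4096)
    delta = json.loads(data)["delta_joint_angle"]       # a1..a6 corrections
    corrected = [a + d for a, d in zip(current_joints, delta)]
    if within_limits(corrected):
        current_joints = corrected                      # execute the motion
        sock.sendto(json.dumps({"joints": corrected}).encode(), addr)
    else:
        # Out of limits: end the current episode and notify the network
        sock.sendto(json.dumps({"done": True}).encode(), addr)
```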
(4) The Critic network takes S and S′ as inputs and outputs the values V(S) and V(S′); the TD error δ is then calculated, with step size α, decay factor γ, and exploration rate ε:

δ = R + γV(S′) − V(S)

The Critic network parameter ω is updated by a gradient step on the mean square error loss Σ(R + γV(S′) − V(S, ω))², and the Actor network strategy parameter θ is updated as

θ ← θ + α ∇_θ log π_θ(S, A) · δ

For the Actor's score function ∇_θ log π_θ(S, A), a softmax or Gaussian score function may be selected.
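A compact PyTorch sketch of one such update, reusing the Actor and Critic modules sketched above; the learning rates, γ, and the Gaussian policy standard deviation are assumed values:

```python
import torch

GAMMA, SIGMA = 0.9, 0.05              # assumed decay factor and policy std

actor, critic = Actor(), Critic()
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def td_update(S, A, R, S_next, done):
    """One TD(0) Actor-Critic step: delta = R + gamma*V(S') - V(S)."""
    v_s = critic(S).squeeze()
    with torch.no_grad():
        v_next = 0.0 if done else critic(S_next).squeeze()
        target = R + GAMMA * v_next
    delta = (target - v_s).detach()

    # Critic: gradient step on the squared TD error (R + gamma*V(S') - V(S, w))^2
    critic_loss = (target - v_s) ** 2
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Actor: theta <- theta + alpha * grad log pi_theta(S, A) * delta,
    # here with a Gaussian score function around the network's mean action
    log_prob = torch.distributions.Normal(actor(S), SIGMA).log_prob(A).sum()
    actor_loss = -log_prob * delta
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
```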
(5) The current robot pose and the target position are obtained and the reward function R is calculated as the negative Mahalanobis distance:

D_M(P, P_0) = sqrt( (P − P_0)^T Σ^(−1) (P − P_0) )

R = η · D_M(P, P_0)

where P is the current pose, P_0 is the target pose, Σ is the covariance matrix of P and P_0, and η < 0.
if the R value is too low, the current game is also ended, because the low R value indicates that the network output correction value is abnormal and is useless, and the game is ended to prevent the network from memorizing wrong operation data and carrying out learning. If the value of R is normal, continuing to check the game at present and returning R to the reinforcement learning network for continuous learning.
These steps are repeated to train and obtain the structural parameters of the Actor-Critic network model.
Step 4: the current pose deviation for the current robot state is calculated online through the trained Actor-Critic network model to obtain a real-time pose error compensation value, closing the loop for real-time error compensation and compensating non-systematic errors online, thereby realizing online compensation of the robot pose positioning accuracy.
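A sketch of the online loop (read_state, send_correction, and stop are hypothetical interface callbacks standing in for the EtherCAT/TwinCAT plumbing described above):

```python
import torch

def compensate_online(actor, read_state, send_correction, stop):
    """Closed-loop use of the trained Actor: each cycle, read the current
    state vector, predict the joint-angle corrections, and feed them back
    to the robot controller."""
    while not stop():
        s = torch.as_tensor(read_state(), dtype=torch.float32)
        with torch.no_grad():
            delta = actor(s).numpy()      # predicted joint-angle corrections
        send_correction(delta)            # superimposed on the commanded joints
```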
The pose positioning accuracy online compensation scheme provided by the invention targets the non-systematic errors in the online input trajectory. Through the online error reinforcement learning method it realizes online compensation of non-systematic errors such as stiffness, temperature variation, and load, can improve the absolute positioning accuracy of the industrial robot, and realizes real-time compensation and control of the robot motion pose. The compensation method needs no robot kinematic model, is fast to compute, and is universal, providing a guarantee for subsequent real-time online calibration of the robot and for improving the accuracy and speed of online calibration.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. An industrial robot pose accuracy online compensation method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, operating a robot in different running states, acquiring an actual pose of the robot, and performing error operation on the actual pose and a theoretical pose to serve as a training set;
step 2, constructing a deep reinforcement learning network model, and determining an input and output layer of the deep reinforcement learning network;
step 3, completing the pre-training of the deep reinforcement learning network model to obtain network model parameters;
step 4, predicting the pose deviation of the robot online by using the trained deep reinforcement learning network model, closing the loop for real-time error compensation, and compensating non-systematic errors online;
the robot is a six-degree-of-freedom open-chain structure; the actual pose of the robot is measured by using a laser tracker, and the measurement coordinate system of the laser tracker and the base coordinate system of the robot are related through the coordinate system conversion matrix ^L_B T, composed in homogeneous form of a rotation block and a translation column:

^L_B T = [R, Q; 0, 1]
wherein R is the rotation matrix:

R = (n_C3, n_C1 × n_C3, n_C1)

in the formula, the trajectory circles C_1, C_3, and C_6, with centers O_1, O_3, and O_6, are obtained by fitting from independently rotating the A1, A3, and A6 axes of the robot; n_C1 is the normal direction of trajectory circle C_1 and n_C3 is the normal direction of trajectory circle C_3;
Q is the displacement vector, obtained as follows:

trajectory circles C_1 and C_6 intersect at the point P_T, i.e. the target-ball position at the zero position of the robot, and circle C_1 has radius R_1; from the robot's own readings, the coordinate P_0 = [X_0, Y_0, Z_0]^T of the default tool center point in the robot base coordinate system is obtained; defining the offset vector of point P_T relative to point P_0 as Δ = (ΔX, ΔY, ΔZ), the vector O_6 O_B can be expressed in the base coordinate system in terms of P_0, Δ, and ΔY_0 (the equation is given only as an image in the original), where O_B is the origin of the robot base coordinate system, ΔY_0 = O_6P_0 · n_C3, and the coordinate vector of the center O_6 of trajectory circle C_6 in the laser tracker measurement coordinate system is denoted ^L O_6;

further, the displacement vector Q′:

Q′ = ^L O_6 + R · (O_6 O_B)
in order to ensure that the error of the displacement vector is as small as possible, ten points P_i are also randomly sampled in the robot space; ^B P_i is the coordinate vector of the target ball in the base coordinate system and ^C P_i is the coordinate vector of the target ball in the robot default tool coordinate system, and a displacement vector Q″ is calculated based on a least-squares fitting method (the equation is given only as an image in the original);

the displacement vector errors ΔE of Q′ and Q″ are respectively calculated by formula (given only as an image in the original), and the displacement vector with the smaller error is selected as the displacement vector Q of the coordinate system conversion matrix ^L_B T:

Q = argmin { ΔE(Q_i), Q_i ∈ {Q′, Q″} }
2. The method according to claim 1, wherein the deep reinforcement learning network model is an Actor-Critic network model; the Actor neural network generates a strategy from the current environment state S, produces specific joint motion actions as the input for robot motion, and interacts with the environment; the Critic neural network evaluates the strategic joint-action output generated by the Actor network in state S, judges whether the current situation is good or bad, measures it by a value, and returns the measured value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
3. The method according to claim 2, wherein the end execution position TCP pose of the robot, consisting of the coordinate position (x, y, z) and the Euler angle orientation (α, β, γ), together with the stiffness k, the temperature change T, the load η, the time signal t, and the time-signal functions sin(t) and ln(t), is used as the input of the deep reinforcement learning network; and the joint angle corrections Δjoint_angle(a1, a2, a3, a4, a5, a6) of the robot are used as the output of the deep reinforcement learning network.
4. The method according to claim 2 or 3, wherein the step 3 specifically comprises the steps of:
(1) taking the state characteristics of the industrial robot acquired in step 1 and the corresponding massive pose-error parameter data set as training samples and inputting them into the robot simulation interaction software; at the start of each training episode, the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose from the robot sample data set;
(2) the Actor-Critic network obtains the robot's current TCP pose, stiffness k, temperature change T, and load η state values, together with the time signal and time functions, from the robot simulation interaction environment, calculates the current angle correction of each joint, and sends it back to the robot simulation interaction software;
(3) after receiving the joint angle corrections, the robot simulation interaction software performs a joint-limit calculation for the robot and judges whether each joint is within its limits; if so, it executes the joint motion correction, and if some robot joint is out of its limits, it ends the current episode and reports this to the Actor-Critic network;
(4) acquiring the current robot pose and the target position and calculating the reward value to obtain the reward function R; if the R value is too low, the current episode ends, while if the R value is normal, the current episode continues and R is returned to the Actor-Critic network for further learning;
and repeating the steps, and training to obtain the structure parameters of the Actor-Critic network model.
5. The method according to claim 4, wherein the reward function R is calculated from the theoretical pose and the actual pose of the robot as the negative Mahalanobis distance:

D_M(P, P_0) = sqrt( (P − P_0)^T Σ^(−1) (P − P_0) )

R = η · D_M(P, P_0)

wherein P is the current pose, P_0 is the target pose, Σ is the covariance matrix of P and P_0, and η < 0.
CN202110856844.6A 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning Active CN113510709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110856844.6A CN113510709B (en) 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning


Publications (2)

Publication Number Publication Date
CN113510709A CN113510709A (en) 2021-10-19
CN113510709B true CN113510709B (en) 2022-08-19

Family

ID=78068761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110856844.6A Active CN113510709B (en) 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113510709B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113977429A (en) * 2021-11-17 2022-01-28 长春理工大学 Robot constant-force polishing system based on deep learning and polishing control method
CN114310873A (en) * 2021-12-17 2022-04-12 上海术航机器人有限公司 Pose conversion model generation method, control method, system, device and medium
CN114952849B (en) * 2022-06-01 2023-05-16 浙江大学 Robot track tracking controller design method based on reinforcement learning and dynamics feedforward fusion
CN115673596B (en) * 2022-12-28 2023-03-17 苏芯物联技术(南京)有限公司 Welding abnormity real-time diagnosis method based on Actor-Critic reinforcement learning model
CN116663204B (en) * 2023-07-31 2023-10-17 南京航空航天大学 Offline programming method, system and equipment for robot milling
CN117331342B (en) * 2023-12-01 2024-02-02 北京航空航天大学 FFRLS algorithm-based machine tool feed shaft parameter identification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06114768A (en) * 1992-09-29 1994-04-26 Toyoda Mach Works Ltd Robot control device
US5566275A (en) * 1991-08-14 1996-10-15 Kabushiki Kaisha Toshiba Control method and apparatus using two neural networks
CN107421442A (en) * 2017-05-22 2017-12-01 天津大学 A kind of robot localization error online compensation method of externally measured auxiliary
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN110967042A (en) * 2019-12-23 2020-04-07 襄阳华中科技大学先进制造工程研究院 Industrial robot positioning precision calibration method, device and system
CN112497216A (en) * 2020-12-01 2021-03-16 南京航空航天大学 Industrial robot pose precision compensation method based on deep learning


Also Published As

Publication number Publication date
CN113510709A (en) 2021-10-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant