CN113510709A - Industrial robot pose precision online compensation method based on deep reinforcement learning - Google Patents

Industrial robot pose precision online compensation method based on deep reinforcement learning

Info

Publication number
CN113510709A
Authority
CN
China
Prior art keywords
robot
pose
reinforcement learning
deep reinforcement
coordinate system
Prior art date
Legal status
Granted
Application number
CN202110856844.6A
Other languages
Chinese (zh)
Other versions
CN113510709B (en)
Inventor
肖文磊
孙子惠
姚开然
吴少宇
张鹏飞
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110856844.6A priority Critical patent/CN113510709B/en
Publication of CN113510709A publication Critical patent/CN113510709A/en
Application granted granted Critical
Publication of CN113510709B publication Critical patent/CN113510709B/en
Legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an industrial robot pose accuracy online compensation method based on deep reinforcement learning, comprising the following steps: operate the robot in different running states, acquire the robot's actual pose, and compute the error between the actual pose and the theoretical pose to form a training set; construct a deep reinforcement learning network model and determine the input and output layers of the network; pre-train the deep reinforcement learning network model to obtain the network model parameters; and use the trained model to predict the robot's pose deviation online, close the real-time error-compensation loop, and compensate non-systematic errors online. The method uses two networks with different functions to jointly learn the interaction between the robot model and the current environment, dynamically adjusts the control parameters, and solves the problem of non-systematic pose error compensation for industrial robots.

Description

Industrial robot pose precision online compensation method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of industrial robot pose precision online compensation, in particular to an industrial robot pose precision online compensation method based on deep reinforcement learning.
Background
With the development of domestic high-precision manufacturing toward automation and intelligence, industrial robots, which offer high efficiency, high quality and good environmental adaptability, are increasingly used in automated production tasks such as spraying, welding, handling and assembly, and demand for them grows daily. To achieve technological innovation in high-precision manufacturing and greatly improve machining quality and production efficiency, breaking through the high-precision positioning of industrial robots is a problem that must be solved. The robot's operating precision directly influences its working effect; in particular, when higher requirements are placed on certain performance indexes during operation, higher requirements are also placed on improving the robot's operating precision.
Current robot precision compensation methods fall into two classes: error prediction compensation and error calibration compensation. Error prediction compensation has a high production cost, and long-term robot motion wears the mechanical structure, so the resulting errors cannot be avoided; the method is therefore rarely applied in practice. Error calibration compensation mainly builds a mathematical model of the system's errors to obtain a model of the non-systematic errors and thereby realize dynamic error feedback. However, for the serial structure of an industrial robot the dynamic calculation is very complex, and once the influence of temperature and of load changes under different postures is introduced, the error model necessarily becomes very large and complex and extremely difficult to compute. Meanwhile, because the influence of non-systematic errors in a typical industrial environment is far smaller than that of systematic errors, no relatively unified model for online compensation of non-systematic errors has yet been formed in industry. In addition, existing pose precision compensation methods cannot compensate the robot's target pose online in real time; offline pose compensation can improve the robot's absolute position accuracy and attitude accuracy simultaneously, but cannot compensate online. For example, patent CN107351089A discloses an optimized selection method for robot kinematic parameter calibration poses, but its convergence time is affected by the number of iterations, the number of parameters to be identified and the number of pose points, and the algorithm does not converge easily. Patent CN108608425A discloses an offline programming method for six-axis industrial robot milling, which must construct a complex one-dimensional robot pose optimization model, makes it difficult to ensure similarity between the mathematical model and the actual cutting process, and lowers the practical upper limit of the compensation effect. Patent CN112450820A discloses a pose optimization method, a mobile robot and a storage medium, but cannot predict and compensate robot attitude errors. Patent CN112536797A discloses a comprehensive compensation method for industrial robot position and attitude errors that needs no complex motion error model and improves absolute position accuracy and attitude accuracy simultaneously, but the interpretability of its error prediction process is weak, and it cannot predict and compensate non-systematic errors online under different working environments.
Disclosure of Invention
To solve these problems, the invention provides an industrial robot pose accuracy online compensation method based on deep reinforcement learning. It does not depend on a mathematical model of the industrial robot; it uses two networks with different functions to jointly learn the interaction between the robot model and the current environment, dynamically adjusts the control parameters, and solves the problem of non-systematic pose error compensation for industrial robots. The invention adopts the following technical scheme:
an industrial robot pose accuracy online compensation method based on deep reinforcement learning comprises the following steps:
step 1, operating the robot in different running states, acquiring an actual pose of the robot, and performing error operation on the actual pose and a theoretical pose to serve as a training set;
step 2, constructing a deep reinforcement learning network model, and determining an input and output layer of the deep reinforcement learning network;
step 3, completing the pre-training of the deep reinforcement learning network model to obtain network model parameters;
and step 4, predicting the robot's pose deviation online with the trained deep reinforcement learning network model, closing the real-time error-compensation loop, and compensating non-systematic errors online.
Further, in step 1 the actual pose of the robot is measured with a laser tracker, and the laser tracker measurement coordinate system and the robot base coordinate system are related by the coordinate system conversion matrix ${}^{L}T_{B}$:

$${}^{L}T_{B} = \begin{bmatrix} R & Q \\ 0 & 1 \end{bmatrix}$$

where R is the rotation matrix

$$R = (n_{C3},\; n_{C1} \times n_{C3},\; n_{C1})$$

with $n_{C1}$ the normal direction of the trajectory circle $C_1$ and $n_{C3}$ the normal direction of the trajectory circle $C_3$, and Q is the displacement vector, obtained as follows.

The trajectory circles $C_1$ and $C_6$ intersect at the point $P_T$, i.e. the target-ball position at the robot zero position, and $C_1$ has radius $R_1$. From the robot's own readout, the coordinate $P_0 = [X_0, Y_0, Z_0]^T$ of the default tool center point in the robot base coordinate system can be obtained. Defining the offset vector of $P_T$ relative to $P_0$ as $\Delta = (\Delta X, \Delta Y, \Delta Z)$, the vector $\overrightarrow{O_6 O_B}$ in the base coordinate system is given by a formula that appears only as an equation image in the source, where $\Delta Y_0 = \overrightarrow{O_6 P_0} \cdot n_{C3}$ and the coordinate vector of the center $O_6$ of circle $C_6$ in the laser tracker measurement coordinate system is ${}^{L}O_6$. From these quantities a displacement vector $Q'$ is obtained (formula likewise given only as an image).

To keep the error of the displacement vector as small as possible, ten points $P_i$ are randomly sampled in the robot workspace, where ${}^{B}P_i$ is the coordinate vector of the target ball in the base coordinate system and ${}^{C}P_i$ is the coordinate vector of the target ball in the robot default tool coordinate system; a second displacement vector $Q''$ is calculated by least-squares fitting (formula given only as an image in the source).

The displacement-vector errors $\Delta E$ of $Q'$ and $Q''$ are then calculated (formula given only as an image in the source), and the one with the smaller error is selected as the displacement vector Q of the conversion matrix ${}^{L}T_{B}$:

$$Q = \arg\min_{Q_i \in \{Q', Q''\}} \Delta E(Q_i)$$
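As a concrete illustration of how the conversion matrix is assembled and applied, the following Python/NumPy sketch builds ${}^{L}T_{B}$ from fitted circle normals and a selected displacement vector, then converts a tracker measurement into the base frame. This is a minimal sketch, not the patent's implementation; all numerical values and helper names are hypothetical.

```python
import numpy as np

def rotation_from_normals(n_c1: np.ndarray, n_c3: np.ndarray) -> np.ndarray:
    """Build R = (n_C3, n_C1 x n_C3, n_C1) with unit-length columns."""
    n_c1 = n_c1 / np.linalg.norm(n_c1)
    n_c3 = n_c3 / np.linalg.norm(n_c3)
    return np.column_stack((n_c3, np.cross(n_c1, n_c3), n_c1))

def homogeneous_transform(R: np.ndarray, Q: np.ndarray) -> np.ndarray:
    """Assemble the 4x4 matrix [[R, Q], [0, 1]] mapping base coordinates to tracker coordinates."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = Q
    return T

# Example: convert a point measured in the tracker frame into the robot base frame.
n_c1 = np.array([0.0, 0.0, 1.0])           # normal of circle C1 (assumed value)
n_c3 = np.array([0.0, 1.0, 0.0])           # normal of circle C3 (assumed value)
Q = np.array([1200.0, -300.0, 450.0])      # selected displacement vector in mm (assumed)

T_L_B = homogeneous_transform(rotation_from_normals(n_c1, n_c3), Q)
p_tracker = np.array([1500.0, -250.0, 900.0, 1.0])   # homogeneous point in tracker frame
p_base = np.linalg.inv(T_L_B) @ p_tracker            # same point in the base frame
print(p_base[:3])
```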
Furthermore, the deep reinforcement learning network model is an Actor-Critic network model: the Actor neural network computes a policy from the current environment state S, generates specific joint motion actions as the input for robot motion, and interacts with the environment; the Critic neural network evaluates the joint-action output generated by the Actor network in state S, judges whether the resulting situation is good or bad, measures it with a value, and returns that value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
Further, the robot's end-effector TCP pose, stiffness k, temperature change T, load η, time signal t and the time-signal functions sin(t) and ln(t) are used as inputs to the deep reinforcement learning network, where the TCP pose consists of the coordinate position (x, y, z) and the Euler-angle orientation (α, β, γ); the joint angle corrections Δjoint_angle (a1, a2, a3, a4, a5, a6) of the robot are used as the network output.
Further, the step 3 specifically includes the following steps:
(1) taking the industrial-robot state features acquired in step 1 and the corresponding large pose-error data set as training samples and inputting them into the robot simulation software, where at the start of each training episode the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose;
(2) the Actor-Critic network obtains the robot's current TCP (tool center point) pose, stiffness k, temperature change T, load η state values, time signal and time functions from the robot simulation environment, computes the current angle correction for each joint, and sends it back to the robot simulation software;
(3) after receiving the joint angle corrections, the robot simulation software performs a joint-limit check on the robot; if all joints are within their limits it executes the joint motion correction, and if any joint is out of limit it ends the current episode and notifies the Actor-Critic network;
(4) the current robot pose and the target position are obtained and the reward function R is calculated; if the R value is too low, the current episode also ends; if the R value is normal, the episode continues and R is returned to the Actor-Critic network for further learning.
These steps are repeated to train the structure parameters of the Actor-Critic network model.
Further, the reward function R is calculated from the theoretical pose and the actual pose of the robot:

$$D_M(P, P_0) = \sqrt{(P - P_0)^{T} \Sigma^{-1} (P - P_0)}$$

$$R = \eta \cdot D_M(P, P_0)$$

where P is the current pose, $P_0$ the target pose, $\Sigma$ the covariance matrix of P and $P_0$ (its definition appears only as an equation image in the source), and $\eta < 0$.
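A minimal sketch of this reward, assuming the Mahalanobis distance $D_M$ reconstructed above; the diagonal covariance matrix is chosen purely for illustration, since the patent defines $\Sigma$ only in an equation image.

```python
import numpy as np

def mahalanobis(p: np.ndarray, p0: np.ndarray, sigma: np.ndarray) -> float:
    """Mahalanobis distance between current pose p and target pose p0."""
    d = p - p0
    return float(np.sqrt(d @ np.linalg.inv(sigma) @ d))

def reward(p, p0, sigma, eta=-1.0):
    # eta < 0, so the reward increases (toward 0) as the pose approaches the target.
    return eta * mahalanobis(np.asarray(p, float), np.asarray(p0, float), sigma)

sigma = np.diag([1.0, 1.0, 1.0, 0.1, 0.1, 0.1])   # assumed pose-error covariance
p  = [400.2, 10.1, 300.5, 0.01, 0.00, 0.02]       # current pose (x, y, z, alpha, beta, gamma)
p0 = [400.0, 10.0, 300.0, 0.00, 0.00, 0.00]       # target pose
print(reward(p, p0, sigma))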
Compared with the prior art, the invention has the following beneficial effects:
(1) The method does not depend on a mathematical model of the industrial robot; instead it uses a reinforcement learning algorithm to find an optimal control strategy through continuous exploration and trial-and-error learning, realizes online compensation of non-systematic errors such as temperature change and stiffness, and thereby addresses the non-systematic errors caused by factors such as temperature and dynamic load change during manipulator motion.
(2) The invention uses two networks with different functions, an Actor neural network and a Critic neural network, to jointly learn the interaction between the robot model and the current environment. The Actor neural network computes a robot motion strategy from the current environment state S (comprising the TCP pose P, stiffness k, temperature change T and load η), generates specific joint motion actions as the output of the robot motion, and interacts with the environment. The Critic neural network evaluates the joint-action output generated by the Actor network in state S, judges whether the resulting situation is good or bad, measures it with a value, and returns that value to the Actor neural network for learning, so that the parameters are optimized and the cost function converges to the global optimum.
Drawings
FIG. 1 is a flow chart of an industrial robot pose accuracy online compensation method based on deep reinforcement learning;
FIG. 2 is a schematic diagram of an experimental platform for acquiring terminal pose position information and online pose accuracy compensation of an industrial robot;
FIG. 3 is a schematic diagram of a robot body and a coordinate system;
FIG. 4 is a diagram of the logic structure of an Actor-Critic network;
FIG. 5 is a flowchart of an algorithm for performing deep reinforcement learning network training in interaction with a robot in a robot simulation scenario.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and examples, but the embodiments of the present invention are not limited thereto.
In a field working environment, robot positioning is affected by external factors such as complex and variable loads, dynamics and temperature changes; the behavior of the systematic error changes and non-systematic errors are introduced. The invention therefore provides an industrial robot pose accuracy online compensation method based on deep reinforcement learning, which, as shown in Fig. 1, comprises the following steps:
Step 1: operate the robot under different running states (load and temperature), measure the actual pose, compute the error between the actual and theoretical poses, and collect all the data as a training set. The specific steps are as follows:
the invention discloses an experimental platform for realizing acquisition of tail end pose position information and precision compensation of a mechanical arm, which comprises an industrial robot and a control cabinet thereof, a pose position measuring system device (a laser tracker and a pose measuring target) and a mobile workstation, wherein the industrial robot is of a six-degree-of-freedom open chain structure, a tail end actuator is arranged at the tail end of the robot, and the absolute positioning precision is 2-3mm, as shown in figure 2. The position of the robot is monitored in real time through a laser tracker, and the position is transmitted to a TwinCAT master station in real time based on an EtherCAT bus, so that a full closed loop is realized; the robot end effector six-degree-of-freedom pose information from the laser tracker and the motion control information from the industrial robot are acquired in real time, and the robot-laser tracker system state machine can be analyzed and controlled in real time.
For subsequent error calculation the coordinate systems must be unified: the laser tracker measurement coordinate system and the industrial robot base coordinate system are related by a conversion matrix, and the robot pose data are converted between the two systems. The origin of the base coordinate system is calculated by a combination of axis measurement and multi-point fitting, from which the conversion matrix is obtained: the displacement vector Q is calculated by multi-point fitting, guaranteeing its accuracy, and the rotation matrix R is calculated by axis vector measurement. ${}^{L}T_{B}$ denotes the transformation matrix from the robot base coordinate system B to the laser tracker measurement coordinate system L:

$${}^{L}T_{B} = \begin{bmatrix} R & Q \\ 0 & 1 \end{bmatrix}$$

Specifically, as shown in Fig. 3, the robot is moved to the HOME position, the target ball of the laser tracker is placed on the target holder of the end effector, and the A1, A3 and A6 axes of the robot are rotated independently. Fitting yields the trajectory circles $C_1$, $C_3$ and $C_6$ with centers $O_1$, $O_3$ and $O_6$, and the normal directions $n_{C1}$ and $n_{C3}$ of $C_1$ and $C_3$, which give the Z and Y directions of the base coordinate system respectively; the rotation matrix R is then

$$R = (n_{C3},\; n_{C1} \times n_{C3},\; n_{C1})$$

The trajectory circles $C_1$ and $C_6$ intersect at the point $P_T$, i.e. the target-ball position at the robot zero position, and $C_1$ has radius $R_1$. From the robot's own readout, the coordinate $P_0 = [X_0, Y_0, Z_0]^T$ of the default tool center point (defined at the center of the robot's sixth-axis flange) in the robot base coordinate system can be obtained. Defining the offset vector of $P_T$ relative to $P_0$ as $\Delta = (\Delta X, \Delta Y, \Delta Z)$, the vector $\overrightarrow{O_6 O_B}$ in the base coordinate system is given by a formula that appears only as an equation image in the source, where $\Delta Y_0 = \overrightarrow{O_6 P_0} \cdot n_{C3}$ and the coordinate vector of the center $O_6$ of circle $C_6$ in the laser tracker measurement coordinate system is ${}^{L}O_6$. From these quantities a displacement vector $Q'$ is obtained (formula likewise given only as an image).

To keep the error of the displacement vector as small as possible, ten points $P_i$ are randomly sampled in the robot workspace, where ${}^{B}P_i$ is the coordinate vector of the target ball in the base coordinate system and ${}^{C}P_i$ is the coordinate vector of the target ball in the robot default tool coordinate system; a second displacement vector $Q''$ is calculated by least-squares fitting (formula given only as an image in the source).

The displacement-vector errors $\Delta E$ of $Q'$ and $Q''$ are then calculated (formula given only as an image in the source), and the one with the smaller error is selected as the displacement vector Q of the conversion matrix ${}^{L}T_{B}$:

$$Q = \arg\min_{Q_i \in \{Q', Q''\}} \Delta E(Q_i)$$
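The least-squares fit and the error-based selection between $Q'$ and $Q''$ can be sketched as follows. Since the patent's own formulas for $Q''$ and $\Delta E$ survive only as images, a standard least-squares translation estimate and an aggregate residual are substituted here under the model ${}^{C}P_i \approx R\,{}^{B}P_i + Q$; all function names are illustrative.

```python
import numpy as np

def fit_translation(R: np.ndarray, bp: np.ndarray, cp: np.ndarray) -> np.ndarray:
    """Least-squares Q'' minimising sum ||cp_i - (R @ bp_i + Q)||^2 over the ten samples."""
    return np.mean(cp - bp @ R.T, axis=0)

def displacement_error(R: np.ndarray, Q: np.ndarray, bp: np.ndarray, cp: np.ndarray) -> float:
    """Aggregate fitting residual used to compare candidate displacement vectors."""
    return float(np.sum(np.linalg.norm(cp - (bp @ R.T + Q), axis=1)))

def select_Q(R: np.ndarray, Q_prime: np.ndarray, bp: np.ndarray, cp: np.ndarray) -> np.ndarray:
    """Compute Q'' and return whichever of Q', Q'' has the smaller error."""
    Q_dprime = fit_translation(R, bp, cp)
    candidates = [Q_prime, Q_dprime]
    errors = [displacement_error(R, q, bp, cp) for q in candidates]
    return candidates[int(np.argmin(errors))]
```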
for non-systematic errors, they are generated during the robot use and will vary with factors such as working temperature, running time and motion attitude. The method comprises the steps of operating the industrial robot to move under different running states (rigidity, load and temperature), measuring the actual position of the industrial robot by using the laser tracker, further converting actual data measured by the laser tracker through coordinate system conversion matrix operation, converting the actual data to a robot coordinate system from the laser tracker coordinate system, and performing error operation on the actual pose and the theoretical pose of the robot to obtain a robot pose error.
And storing the data samples in a format of < pose error and robot running state (rigidity, load and temperature) > and constructing a robot motion error data set of a large sample through experimental acquisition.
Step 2: and constructing a deep reinforcement learning network model and determining the input and output layers of the learning network.
Fig. 4 is a logic structure diagram of an Actor-Critic network, which provides a deep reinforcement learning network design framework, and two networks with different functions are used to jointly implement interactive learning between a robot model and a current environment, namely an Actor neural network and a Critic neural network. The Actor neural network is essentially a DPG network, and generates specific joint motion actions as robot motion input according to a calculation generation strategy of a current environment state S (comprising a TCP pose P, rigidity k, temperature change T and load eta) so as to interact with the environment. The criticic neural network is used for evaluating the strategic joint action output generated by the Actor network in the state S, determining whether the situation is good or bad at the moment, measuring the situation through a value, and returning the measured value to the Actor neural network for learning, so that parameter optimization is carried out, and the cost function is converged to the global optimum.
The terminal execution position TCP pose of the robot is taken as an input layer of the network, the state values of the rigidity k, the temperature change T and the load eta of the robot are taken as input layers of the network, the terminal execution position TCP pose is composed of coordinate position (x, y, z) and Euler angle orientation (alpha, beta, gamma), however, the motion deviation of the robot is generally extremely small, if an error value corresponding to the theoretical position is taken as an output layer, the output and the input of the network are extremely similar, the learning difficulty is improved, and the learning result cannot be obtained correctly. Therefore, in order to make the input and output of the network as far as possible, the input and output are established into a nonlinear relationship, the values of the joint angles of the robot are used as the output of the network, the values are represented by delta joint _ angle (a1, a2, a3, a4, a5 and a6), and finally the TCP pose of the robot can also be obtained by performing positive kinematic calculation on the joint angles.
The non-systematic errors of the robot, such as rigidity, temperature change, load and the like, change slightly in a short period and are functions of time. If the influence factor data is directly used as network input, due to the fact that the change is always lacked, in the process of multiple updates of network parameters based on gradients, neuron parameters connected with the network parameters are considered to be low in learning value, numerical values are pressed to be small, and the neural parameters are quickly fixed, so that non-system error factors are ignored equivalently. Therefore, the time signal t and the time signal functions sin (t) and ln (t) are used as the input of the network, and because the influence factors of the time-varying signal have periodicity or a logarithmic relation, the number of neurons used by the reinforcement learning network is reduced, and the feature information can be learned more quickly. The final network inputs and outputs are shown in table 1.
Table 1 Robot deep reinforcement learning network inputs and outputs

Input: TCP pose (x, y, z, α, β, γ); stiffness k; temperature change T; load η; time signal t; time functions sin(t), ln(t)
Output: joint angle corrections Δjoint_angle (a1, a2, a3, a4, a5, a6)
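A minimal sketch of assembling the 12-dimensional input vector of Table 1, assuming the running-state values come from the step-1 data set; the clamp on t is an added guard for ln(t), not something stated in the patent.

```python
import numpy as np

def build_state(tcp_pose, k, temp, load, t):
    """Concatenate the TCP pose (x, y, z, alpha, beta, gamma) with the
    non-systematic-error factors and the time-signal features t, sin(t), ln(t)."""
    t = max(t, 1e-6)                                  # guard: ln(t) requires t > 0
    return np.concatenate([
        np.asarray(tcp_pose, dtype=float),            # 6 pose values
        [k, temp, load],                              # running-state values
        [t, np.sin(t), np.log(t)],                    # time-signal features
    ])

s = build_state([400.0, 10.0, 300.0, 0.0, 0.0, 0.0], k=120.0, temp=1.5, load=3.2, t=12.0)
print(s.shape)   # (12,)
```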
Step 3: complete the pre-training of the reinforcement learning network model and obtain the network model parameters through training.
A virtual training scene for the reinforcement learning network model is built in the robot simulation software and communicates with Python over a UDP protocol, so that the deep reinforcement learning network and the robot simulation scene are trained interactively, as shown in Fig. 5. The training process is as follows:
(1) The industrial-robot state feature S (comprising the TCP pose P, stiffness k, temperature change T and load η) acquired in step 1 and the corresponding large pose-error data set are input into the robot simulation software as training samples; at the start of each training episode, the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose.
(2) The Actor-Critic network obtains the robot's current TCP pose, stiffness k, temperature change T, load η state values, time signal and time functions from the robot simulation environment and initializes the system state S. The Actor network takes S as input and computes the current motion angle correction A = {Δjoint_angle (a1, a2, a3, a4, a5, a6)} for each joint, which is sent back to the robot simulation software.
(3) After receiving the joint angle corrections, the robot simulation software performs a limit check on each robot joint. If all joints are within their limits, the motion correction is executed on each joint, yielding the new state S'. If any joint is out of limit, the current episode ends and the message is passed to the reinforcement learning network.
(4) The Critic network takes S and S' as inputs and outputs the values V(S) and V(S'), from which the TD error is calculated, with step size α, decay factor γ and exploration rate ε:

$$\delta = R + \gamma V(S') - V(S)$$

The Critic network parameter ω is updated by gradient descent on the mean-square error loss $\sum (R + \gamma V(S') - V(S, \omega))^2$, and the Actor network policy parameter θ is updated as

$$\theta \leftarrow \theta + \alpha \nabla_\theta \log \pi_\theta(S, A)\, \delta$$

where the Actor's score function $\nabla_\theta \log \pi_\theta(S, A)$ may be chosen as either a softmax or a Gaussian score function.
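The update rules above can be sketched in PyTorch with a Gaussian score function as follows. The state and action dimensions follow Table 1; the layer sizes, learning rates and optimizer choice are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 12, 6, 0.99

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, ACTION_DIM))
log_std = torch.zeros(ACTION_DIM, requires_grad=True)   # Gaussian policy spread
critic = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

opt_actor = torch.optim.Adam(list(actor.parameters()) + [log_std], lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s_next):
    """One Actor-Critic step: TD error, critic MSE step, actor policy-gradient step."""
    s, a, s_next = (torch.as_tensor(x).float() for x in (s, a, s_next))

    # TD error: delta = R + gamma * V(S') - V(S)
    v_s, v_next = critic(s), critic(s_next).detach()
    delta = r + GAMMA * v_next - v_s

    # Critic: mean-square TD loss, gradient step on omega
    critic_loss = delta.pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor: theta <- theta + alpha * grad log pi(A|S) * delta (Gaussian score function)
    dist = torch.distributions.Normal(actor(s), log_std.exp())
    actor_loss = -(dist.log_prob(a).sum() * delta.detach()).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```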
(5) The current robot pose and the target position are obtained and the reward function R is calculated as the negative of the Mahalanobis distance:

$$D_M(P, P_0) = \sqrt{(P - P_0)^{T} \Sigma^{-1} (P - P_0)}$$

$$R = \eta \cdot D_M(P, P_0)$$

where P is the current pose, $P_0$ the target pose, $\Sigma$ the covariance matrix of P and $P_0$ (its definition appears only as an equation image in the source), and $\eta < 0$.

If the R value is too low, the current episode also ends, because a low R indicates that the network's output correction is abnormal and useless; ending the episode prevents the network from memorizing and learning from erroneous operating data. If the R value is normal, the current episode continues and R is returned to the reinforcement learning network for further learning.
These steps are repeated to train the structure parameters of the Actor-Critic network model.
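Putting steps (1)-(5) together, one training episode of the interaction loop might look like the following sketch. The `sim` wrapper and its methods stand in for the robot simulation software and its UDP interface, which the patent does not specify at code level; `update` refers to the Actor-Critic sketch above, and `R_MIN` is an assumed cutoff.

```python
R_MIN = -50.0        # assumed reward cutoff below which the episode is abandoned

def run_episode(sim, actor_policy, reward_fn, max_steps=200):
    """One training episode against the simulation environment (illustrative interface)."""
    s = sim.reset()                        # initial state per Table 1
    for _ in range(max_steps):
        a = actor_policy(s)                # joint corrections Delta joint_angle
        if not sim.joints_in_limit(a):     # step (3): joint-limit check fails
            break                          # end the episode and notify the network
        s_next = sim.apply_correction(a)   # execute the motion correction -> S'
        r = reward_fn(sim.pose(), sim.target_pose())   # step (5): reward from poses
        if r < R_MIN:                      # abnormal correction: do not learn from it
            break
        update(s, a, r, s_next)            # Actor-Critic update from the sketch above
        s = s_next
```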
Step 4: for the current robot state, the trained Actor-Critic network model calculates the current pose deviation online to obtain a real-time pose-error compensation value, closing the real-time error-compensation loop and compensating non-systematic errors online, thereby realizing online compensation of the robot's pose positioning accuracy.
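At deployment the trained Actor runs inference only, closing the compensation loop. A minimal sketch, assuming an illustrative `controller` interface standing in for the EtherCAT/TwinCAT loop described earlier:

```python
import torch

def compensate_online(controller, actor, build_state_fn):
    """Closed-loop online compensation using the trained Actor (inference only)."""
    with torch.no_grad():                                      # no learning online
        while controller.running():
            s = build_state_fn(controller.read_state())        # TCP pose, k, T, eta, t
            corrections = actor(torch.as_tensor(s).float())    # predicted joint offsets
            controller.send_joint_offsets(corrections.numpy()) # apply compensation
```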
The pose positioning accuracy online compensation scheme proposed by the invention targets the non-systematic errors in the online input trajectory. Through online error reinforcement learning it compensates non-systematic errors such as stiffness, temperature variation and load, can improve the absolute positioning accuracy of the industrial robot, and realizes real-time compensation and control of the robot's motion pose. The compensation method needs no robot kinematic model, is fast to compute and general, and provides a guarantee for subsequent real-time online calibration of the robot and for improving the accuracy and speed of online calibration.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An industrial robot pose accuracy online compensation method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, operating the robot in different running states, acquiring an actual pose of the robot, and performing error operation on the actual pose and a theoretical pose to serve as a training set;
step 2, constructing a deep reinforcement learning network model, and determining an input and output layer of the deep reinforcement learning network;
step 3, completing the pre-training of the deep reinforcement learning network model to obtain network model parameters;
and step 4, predicting the robot's pose deviation online with the trained deep reinforcement learning network model, closing the real-time error-compensation loop, and compensating non-systematic errors online.
2. The method according to claim 1, wherein in step 1 the actual pose of the robot is measured with a laser tracker, and the laser tracker measurement coordinate system and the robot base coordinate system are related by the coordinate system conversion matrix ${}^{L}T_{B}$:

$${}^{L}T_{B} = \begin{bmatrix} R & Q \\ 0 & 1 \end{bmatrix}$$

where R is the rotation matrix

$$R = (n_{C3},\; n_{C1} \times n_{C3},\; n_{C1})$$

with $n_{C1}$ the normal direction of the trajectory circle $C_1$ and $n_{C3}$ the normal direction of the trajectory circle $C_3$, and Q is the displacement vector, obtained as follows:

the trajectory circles $C_1$ and $C_6$ intersect at the point $P_T$, i.e. the target-ball position at the robot zero position, and $C_1$ has radius $R_1$; from the robot's own readout, the coordinate $P_0 = [X_0, Y_0, Z_0]^T$ of the default tool center point in the robot base coordinate system can be obtained; defining the offset vector of $P_T$ relative to $P_0$ as $\Delta = (\Delta X, \Delta Y, \Delta Z)$, the vector $\overrightarrow{O_6 O_B}$ in the base coordinate system is given by a formula that appears only as an equation image in the source, where $\Delta Y_0 = \overrightarrow{O_6 P_0} \cdot n_{C3}$ and the coordinate vector of the center $O_6$ of circle $C_6$ in the laser tracker measurement coordinate system is ${}^{L}O_6$, from which a displacement vector $Q'$ is obtained (formula likewise given only as an image);

to keep the error of the displacement vector as small as possible, ten points $P_i$ are randomly sampled in the robot workspace, where ${}^{B}P_i$ is the coordinate vector of the target ball in the base coordinate system and ${}^{C}P_i$ is the coordinate vector of the target ball in the robot default tool coordinate system, and a second displacement vector $Q''$ is calculated by least-squares fitting (formula given only as an image in the source);

the displacement-vector errors $\Delta E$ of $Q'$ and $Q''$ are then calculated (formula given only as an image in the source), and the one with the smaller error is selected as the displacement vector Q of the conversion matrix ${}^{L}T_{B}$:

$$Q = \arg\min_{Q_i \in \{Q', Q''\}} \Delta E(Q_i)$$
3. The method according to claim 1, wherein the deep reinforcement learning network model is an Actor-Critic network model: the Actor neural network computes a policy from the current environment state S, generates specific joint motion actions as the input for robot motion, and interacts with the environment; the Critic neural network evaluates the joint-action output generated by the Actor network in state S, judges whether the resulting situation is good or bad, measures it with a value, and returns that value to the Actor neural network for learning and parameter optimization, so that the cost function converges to the global optimum.
4. The method according to claim 3, wherein the robot's end-effector TCP pose, consisting of the coordinate position (x, y, z) and the Euler-angle orientation (α, β, γ), the stiffness k, the temperature change T, the load η, the time signal t and the time-signal functions sin(t) and ln(t) are used as inputs to the deep reinforcement learning network, and the joint angle corrections Δjoint_angle (a1, a2, a3, a4, a5, a6) of the robot are used as the network output.
5. The method according to claim 3 or 4, wherein step 3 comprises the following steps:
(1) taking the industrial-robot state features acquired in step 1 and the corresponding large pose-error data set as training samples and inputting them into the robot simulation software, where at the start of each training episode the actual position is the actual pose from the robot sample data set and the target position is the theoretical pose;
(2) the Actor-Critic network obtains the robot's current TCP (tool center point) pose, stiffness k, temperature change T, load η state values, time signal and time functions from the robot simulation environment, computes the current angle correction for each joint, and sends it back to the robot simulation software;
(3) after receiving the joint angle corrections, the robot simulation software performs a joint-limit check on the robot; if all joints are within their limits it executes the joint motion correction, and if any joint is out of limit it ends the current episode and notifies the Actor-Critic network;
(4) the current robot pose and the target position are obtained and the reward function R is calculated; if the R value is too low, the current episode also ends; if the R value is normal, the episode continues and R is returned to the Actor-Critic network for further learning;
these steps are repeated to train the structure parameters of the Actor-Critic network model.
6. The method according to claim 5, wherein the reward function R is calculated from the theoretical pose and the actual pose of the robot:

$$D_M(P, P_0) = \sqrt{(P - P_0)^{T} \Sigma^{-1} (P - P_0)}$$

$$R = \eta \cdot D_M(P, P_0)$$

where P is the current pose, $P_0$ the target pose, $\Sigma$ the covariance matrix of P and $P_0$ (its definition appears only as an equation image in the source), and $\eta < 0$.
CN202110856844.6A 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning Active CN113510709B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110856844.6A CN113510709B (en) 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110856844.6A CN113510709B (en) 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113510709A true CN113510709A (en) 2021-10-19
CN113510709B CN113510709B (en) 2022-08-19

Family

ID=78068761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110856844.6A Active CN113510709B (en) 2021-07-28 2021-07-28 Industrial robot pose precision online compensation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113510709B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113977429A (en) * 2021-11-17 2022-01-28 长春理工大学 Robot constant-force polishing system based on deep learning and polishing control method
CN114310873A (en) * 2021-12-17 2022-04-12 上海术航机器人有限公司 Pose conversion model generation method, control method, system, device and medium
CN114952849A (en) * 2022-06-01 2022-08-30 浙江大学 Robot trajectory tracking controller design method based on reinforcement learning and dynamics feedforward fusion
CN115556110A (en) * 2022-10-25 2023-01-03 华中科技大学 Robot pose error sensing method based on active semi-supervised transfer learning
CN115673596A (en) * 2022-12-28 2023-02-03 苏芯物联技术(南京)有限公司 Welding abnormity real-time diagnosis method based on Actor-Critic reinforcement learning model
CN116663204A (en) * 2023-07-31 2023-08-29 南京航空航天大学 Offline programming method, system and equipment for robot milling
CN117150425A (en) * 2023-07-10 2023-12-01 郑州轻工业大学 Segment erector motion state prediction method based on mechanism data fusion
CN117331342A (en) * 2023-12-01 2024-01-02 北京航空航天大学 FFRLS algorithm-based machine tool feed shaft parameter identification method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06114768A (en) * 1992-09-29 1994-04-26 Toyoda Mach Works Ltd Robot control device
US5566275A (en) * 1991-08-14 1996-10-15 Kabushiki Kaisha Toshiba Control method and apparatus using two neural networks
CN107421442A (en) * 2017-05-22 2017-12-01 天津大学 A kind of robot localization error online compensation method of externally measured auxiliary
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN110967042A (en) * 2019-12-23 2020-04-07 襄阳华中科技大学先进制造工程研究院 Industrial robot positioning precision calibration method, device and system
CN112497216A (en) * 2020-12-01 2021-03-16 南京航空航天大学 Industrial robot pose precision compensation method based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5566275A (en) * 1991-08-14 1996-10-15 Kabushiki Kaisha Toshiba Control method and apparatus using two neural networks
JPH06114768A (en) * 1992-09-29 1994-04-26 Toyoda Mach Works Ltd Robot control device
CN107421442A (en) * 2017-05-22 2017-12-01 天津大学 A kind of robot localization error online compensation method of externally measured auxiliary
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN110967042A (en) * 2019-12-23 2020-04-07 襄阳华中科技大学先进制造工程研究院 Industrial robot positioning precision calibration method, device and system
CN112497216A (en) * 2020-12-01 2021-03-16 南京航空航天大学 Industrial robot pose precision compensation method based on deep learning

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113977429A (en) * 2021-11-17 2022-01-28 长春理工大学 Robot constant-force polishing system based on deep learning and polishing control method
CN114310873A (en) * 2021-12-17 2022-04-12 上海术航机器人有限公司 Pose conversion model generation method, control method, system, device and medium
CN114310873B (en) * 2021-12-17 2024-05-24 上海术航机器人有限公司 Pose conversion model generation method, control method, system, equipment and medium
CN114952849A (en) * 2022-06-01 2022-08-30 浙江大学 Robot trajectory tracking controller design method based on reinforcement learning and dynamics feedforward fusion
CN114952849B (en) * 2022-06-01 2023-05-16 浙江大学 Robot track tracking controller design method based on reinforcement learning and dynamics feedforward fusion
CN115556110A (en) * 2022-10-25 2023-01-03 华中科技大学 Robot pose error sensing method based on active semi-supervised transfer learning
CN115673596A (en) * 2022-12-28 2023-02-03 苏芯物联技术(南京)有限公司 Welding abnormity real-time diagnosis method based on Actor-Critic reinforcement learning model
CN117150425B (en) * 2023-07-10 2024-04-26 郑州轻工业大学 Segment erector motion state prediction method based on mechanism data fusion
CN117150425A (en) * 2023-07-10 2023-12-01 郑州轻工业大学 Segment erector motion state prediction method based on mechanism data fusion
CN116663204A (en) * 2023-07-31 2023-08-29 南京航空航天大学 Offline programming method, system and equipment for robot milling
CN116663204B (en) * 2023-07-31 2023-10-17 南京航空航天大学 Offline programming method, system and equipment for robot milling
CN117331342B (en) * 2023-12-01 2024-02-02 北京航空航天大学 FFRLS algorithm-based machine tool feed shaft parameter identification method
CN117331342A (en) * 2023-12-01 2024-01-02 北京航空航天大学 FFRLS algorithm-based machine tool feed shaft parameter identification method

Also Published As

Publication number Publication date
CN113510709B (en) 2022-08-19

Similar Documents

Publication Publication Date Title
CN113510709B (en) Industrial robot pose precision online compensation method based on deep reinforcement learning
CN110193829B (en) Robot precision control method for coupling kinematics and rigidity parameter identification
CN110434851B (en) 5-degree-of-freedom mechanical arm inverse kinematics solving method
CN109782601B (en) Design method of self-adaptive neural network synchronous robust controller of coordinated mechanical arm
CN110154024B (en) Assembly control method based on long-term and short-term memory neural network incremental model
CN112536797A (en) Comprehensive compensation method for position and attitude errors of industrial robot
CN112109084A (en) Terminal position compensation method based on robot joint angle compensation and application thereof
CN113733088B (en) Mechanical arm kinematics self-calibration method based on binocular vision
CN113910218B (en) Robot calibration method and device based on kinematic and deep neural network fusion
CN112192614A (en) Man-machine cooperation based shaft hole assembling method for nuclear operation and maintenance robot
CN113878581B (en) Error prediction and real-time compensation method for five-degree-of-freedom hybrid robot
CN109176487A (en) A kind of cooperating joint section scaling method, system, equipment, storage medium
Hu et al. Robot positioning error compensation method based on deep neural network
CN112338913A (en) Trajectory tracking control method and system of multi-joint flexible mechanical arm
CN115139301A (en) Mechanical arm motion planning method based on topological structure adaptive neural network
Gao et al. Kinematic calibration for industrial robots using articulated arm coordinate machines
CN115781685A (en) High-precision mechanical arm control method and system based on reinforcement learning
CN113103262A (en) Robot control device and method for controlling robot
CN114888793B (en) Double-layer cooperative control method for multi-arm double-beam laser welding robot
Li et al. Inverse kinematics study for intelligent agriculture robot development via differential evolution algorithm
CN114030008B (en) Industrial robot practical training energy consumption measurement method based on data driving
CN115609584A (en) Mechanical arm motion planning method based on sigmoid punishment strategy
CN110480641B (en) Recursive distributed rapid convergence robust control method for mechanical arm
Jing et al. Research on neural network PID adaptive control with industrial welding robot in multi-degree of freedom
CN116079730B (en) Control method and system for operation precision of arm of elevator robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant