CN111546035B

CN111546035B - Online rapid gear assembly method based on learning and prediction

Info

Publication number: CN111546035B
Application number: CN202010263884.5A
Authority: CN
Inventors: 刘冬; 丛明; 袁利恒
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2020-04-07
Filing date: 2020-04-07
Publication date: 2021-07-02
Anticipated expiration: 2040-04-07
Also published as: CN111546035A

Abstract

A learning- and prediction-based method for online rapid gear assembly, belonging to the technical field of intelligent robotic assembly. The method establishes a relation between assembly characteristics and robot actions in the form of contact state-robot action pairs, and builds an offline assembly database from a small amount of offline data. Robot actions are predicted online by a gear assembly prediction algorithm based on Gaussian process regression while the assembly performance is evaluated. If the performance does not meet the requirement, an online assembly parameter learning algorithm based on an improved particle swarm optimization algorithm adjusts the robot assembly parameters and updates the robot assembly database, thereby completing the robot's automatic gear assembly task. The invention addresses the low efficiency and heavy up-front workload of the traditional teaching mode, as well as the high cost, long single-assembly time and difficult deployment of deep reinforcement learning approaches, and improves the efficiency and accuracy of the gear assembly task.

Description

Online rapid gear assembly method based on learning and prediction

Technical Field

The invention belongs to the technical field of intelligent assembly of industrial robots, and relates to a learning and prediction-based gear online rapid assembly method.

Background

In recent years, robots have been widely applied in industrial production, medical treatment and other fields. Traditional robot assembly, however, usually relies on teaching or off-line learning to complete the assembly task. This approach requires a large amount of manual design or off-line experimentation at the outset, and suffers from low efficiency and poor generalization. With the development of artificial intelligence, some approaches based on deep reinforcement learning have been applied to robot assembly. Patent CN108161934A discloses "a method for realizing robot multi-shaft-hole assembly by using deep reinforcement learning", in which the network is trained continuously through interaction between the robot and the environment to achieve automatic robot assembly. Building on this, Wang Yongqing et al. of Dalian University of Technology disclose in patent CN110666793A a method for realizing robot square-part assembly based on deep reinforcement learning, which formulates the robot assembly task as a Markov process, constructs a deep reinforcement learning neural network, and trains and transfers that network to realize automatic robot assembly. However, the application range of these methods is very limited. First, a large amount of data is needed to train the neural network continuously, and each robot assembly task takes a long time to execute, so the production takt time is difficult to meet. Second, they require a high-precision vision sensor and a computer with strong computing power, which raises the cost. Finally, in both shaft-hole assembly and square-part assembly the workpiece is relatively simple, and the methods are difficult to apply to gear assembly tasks. Therefore, these methods are difficult to deploy on a practical assembly line.

Disclosure of Invention

The invention mainly aims to overcome the shortcomings of the above methods: the low assembly efficiency of traditional gear assembly methods, and the long single prediction time, high cost and difficult deployment of deep reinforcement learning approaches in existing gear assembly tasks. To this end, the invention provides a learning- and prediction-based method for online rapid gear assembly. Based on machine learning, the invention establishes a relation between assembly characteristics and robot actions in the form of contact state-robot action pairs, and builds an offline assembly database from a small amount of offline data. During experiments on the gear assembly platform, robot actions are predicted by an online gear assembly prediction algorithm based on Gaussian process regression while the assembly performance is evaluated. If the performance does not meet the requirement, an online assembly parameter learning algorithm based on an improved particle swarm optimization algorithm adjusts the robot assembly parameters and updates the robot assembly database, thereby completing the robot's automatic gear assembly task.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A gear online rapid assembly method based on learning and prediction comprises the following steps:

Step 1: establishing a contact state-action pair between the assembly characteristics and the robot action.

The press-in force during press-fitting and the displacement of the end effector of the press-fitting machine are used as the assembly characteristics CS, and the pose of the robot end effector in the tool coordinate system is used to represent the robot action RM. The contact state-robot action pair at moment i is:

t_i＝[f_x,f_y,f_z,d,x,y,z,α,β,γ] (1)

i＝1,2,...,k (2)

wherein f_x, f_y, f_z are the press-in forces measured by the press-fitting machine; d is the displacement of the end effector of the press-fitting machine; and x, y, z, α, β, γ are the pose of the robot end effector, namely the coordinates along the X, Y and Z axes and the rotations about the X, Y and Z axes, respectively.
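As an illustration of the data type in formula (1), the pair can be sketched as a small record that splits the ten components into the contact state (force and displacement) and the robot action (pose). The class name, field names and sample values below are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class ContactActionPair:
    """One sample t_i = [f_x, f_y, f_z, d, x, y, z, alpha, beta, gamma]."""

    force: tuple         # (f_x, f_y, f_z): press-in force components
    displacement: float  # d: displacement of the press end effector
    pose: tuple          # (x, y, z, alpha, beta, gamma): robot end-effector pose

    def contact_state(self) -> np.ndarray:
        # Assembly characteristics CS: force components followed by displacement.
        return np.array([*self.force, self.displacement], dtype=float)

    def robot_action(self) -> np.ndarray:
        # Robot action RM: the six pose coordinates.
        return np.array(self.pose, dtype=float)

    def as_vector(self) -> np.ndarray:
        # Full 10-component vector in the order of formula (1).
        return np.concatenate([self.contact_state(), self.robot_action()])


# Illustrative values only; they are not measurements from the patent.
pair = ContactActionPair(
    force=(0.1, -0.2, 5.0),
    displacement=1.5,
    pose=(10.0, 20.0, 30.0, 0.0, 0.1, 0.2),
)
```

Keeping the contact state and the robot action as separate accessors mirrors their later roles as regression input and regression target.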

Step 2: constructing an assembly experience database. A small number of off-line assembly tests are carried out on the gear assembly platform, and the resulting data are used as an initial training set. An initial assembly experience database is established from this training set, and off-line initial learning is carried out to obtain an initial off-line assembly parameter data set.

Step 3: constructing an online gear assembly prediction algorithm based on Gaussian process regression.

On the assembly platform, the action of the robot end effector is predicted from the current gear contact state by a Gaussian process regression algorithm using the initial assembly parameter data set. The Gaussian process regression model regards the relation between the contact state x and the robot action y as a Gaussian process f, where f(x) is determined by its mean function m(x) and covariance function K(x, x'):

m(x)＝E(f(x)) (3)

K(x,x')＝E[(f(x)-m(x))(f(x')-m(x'))] (4)

wherein x and x' are n-dimensional input vectors; f(x) and f(x') are the Gaussian process regression functions with x and x' as input, respectively; and m(x) and m(x') are the means of the Gaussian process with x and x' as input, respectively.

For the model y = f(x) + ω with ω ~ N(0, σ_n²), where ω is white noise in the Gaussian process and is independent of f(x), the standard Gaussian process regression model can be established as the joint distribution:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X,X') + \sigma_n^2 I_n & K(X,x_*) \\ K(x_*,X) & K(x_*,x_*) \end{bmatrix}\right) \tag{5}$$

wherein x_* and X are the current contact state and the prior contact states in the assembly experience database, respectively; K(X, X') is a positive definite covariance matrix of order n; K(X, x_*) and K(x_*, X) are the covariances between the prior contact states and the current contact state; K(x_*, x_*) is the covariance of the current contact state with itself; I_n is the n-order identity matrix; and σ_n is a hyperparameter of the Gaussian process regression.

To reduce the computational difficulty in the high-dimensional space of the above formula, the commonly used Gaussian kernel function is adopted as the kernel function:

wherein x and x' are n-dimensional input vectors; γ is the hyperparameter of the Gaussian kernel function; and {σ_n, γ} are taken as the assembly parameters.
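A Gaussian kernel with a single hyperparameter γ can be sketched as follows. The exact parameterization is an assumption: this sketch uses the common form exp(-γ‖x − x'‖²), which may differ from the patent's own formula by a rescaling of γ. The function name `gaussian_kernel` is hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, gamma=1.0):
    """Pairwise kernel matrix between the rows of A and the rows of B.

    Assumed form: k(x, x') = exp(-gamma * ||x - x'||^2).
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    B = np.atleast_2d(np.asarray(B, dtype=float))
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))
```

A larger γ makes the kernel decay faster with distance, so only prior contact states very close to the current one influence the prediction.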

By deriving the conditional distribution, the posterior distribution of the predicted robot action f_* is obtained as:

$$f_* \mid X, y, x_* \sim \mathcal{N}(\mu_*, \sigma_*) \tag{7}$$

wherein μ_* is the mean of the predicted robot action and σ_* is the variance of the predicted robot action, given by:

$$\mu_* = K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, y \tag{8}$$

$$\sigma_* = K(x_*,x_*) - K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, K(X,x_*) \tag{9}$$
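Formulas (8) and (9) can be evaluated numerically as in the sketch below. Several choices here are assumptions rather than details from the patent: the kernel is taken as exp(-γ‖x − x'‖²), the identity matrix is sized to the number of stored prior samples so that the matrix sum is well defined, and the linear system is solved directly instead of forming an explicit inverse. The names `rbf` and `gp_predict` are hypothetical.

```python
import numpy as np


def rbf(A, B, gamma):
    """Assumed Gaussian kernel exp(-gamma * ||a - b||^2) for all row pairs."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-gamma * np.sum(diff**2, axis=-1))


def gp_predict(X, y, x_star, sigma_n=1.0, gamma=1.0):
    """Posterior mean (8) and variance (9) for one current contact state."""
    X = np.asarray(X, dtype=float)            # prior contact states, shape (m, n)
    y = np.asarray(y, dtype=float)            # prior robot actions, shape (m,) or (m, p)
    x_star = np.atleast_2d(np.asarray(x_star, dtype=float))  # shape (1, n)

    K = rbf(X, X, gamma) + sigma_n**2 * np.eye(len(X))  # K(X, X') + sigma_n^2 I
    k_star = rbf(x_star, X, gamma)                       # K(x_*, X), shape (1, m)

    mu = k_star @ np.linalg.solve(K, y)                  # formula (8)
    var = rbf(x_star, x_star, gamma) - k_star @ np.linalg.solve(K, k_star.T)  # formula (9)
    return mu[0], var[0, 0]
```

With a very small σ_n the predicted mean at a stored contact state reproduces the stored action almost exactly, while far from all stored states the mean falls back to zero and the variance approaches the prior value of 1.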

Step 4: evaluating the assembly performance.

The assembly action is executed according to the mean of the robot action predicted by the algorithm in step 3, and the assembly performance is evaluated according to the following formula:

wherein y is the assembly time of the current assembly, and the remaining term is the mean of the assembly times in the assembly experience database.

If the current assembly performance between the gear and the spline shaft meets the assembly requirement, prediction continues with the current assembly parameters, the current assembly data are added to the assembly experience database constructed in step 2, and the assembly is finished. Otherwise, the method proceeds to step 5.

Step 5: constructing an assembly parameter adjustment algorithm based on a generative adversarial particle swarm optimization algorithm.

First, to adjust the hyperparameters of the Gaussian process regression algorithm constructed in step 3, a particle swarm is initialized with a random distribution, and a discriminator and a generator are introduced so that the discriminative model guides the strategy selection in the generative model. The parameter adjustment algorithm thus consists of two parts: a discriminator G and a generator D. The discriminator evaluates the quality of the particles according to the prediction performance of the Gaussian process model and encodes the result with one-hot codes. The generator captures the distribution of the assembly parameters and generates different assembly parameter update strategies.

Second, using the generator, different assembly parameter update strategies are formed according to the quality of the particles and their one-hot codes: the inferior population is used for exploration to strengthen its global search capability, and the superior population is used for exploitation to strengthen its local search capability.

Finally, based on the contact state-robot action pairs constructed in step 1, the parameter adjustment algorithm searches again to generate suitable assembly parameters, the assembly parameter data set is updated, and the method returns to step 3.
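The overall shape of step 5 can be sketched as a particle swarm in which each particle is first labelled as superior or inferior and then updated with a strategy chosen from that label. This is a heavily simplified stand-in and not the patent's algorithm: the discriminator here is a plain median threshold, the generator is a fixed lookup of update weights, and all names, weights and default values are hypothetical.

```python
import numpy as np

SUPERIOR = (0, 1)  # hypothetical one-hot label for better particles
INFERIOR = (1, 0)  # hypothetical one-hot label for worse particles


def discriminate(fitness):
    """Label particles by comparing fitness (lower is better) with the median."""
    median = np.median(fitness)
    return [SUPERIOR if f <= median else INFERIOR for f in fitness]


def generate_update(label):
    """Return (inertia, cognitive, social) weights for the given label."""
    if label == SUPERIOR:
        return 0.4, 1.0, 2.0   # exploit: low inertia, strong pull to the best
    return 0.9, 2.0, 0.5       # explore: high inertia, weak pull to the best


def tune_parameters(objective, bounds, n_particles=12, n_iter=40, seed=0):
    """Search for parameters (e.g. sigma_n, gamma) that minimise `objective`."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(n_particles, len(lo)))  # random initialisation
    vel = np.zeros_like(pos)
    fit = np.array([objective(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmin(fit))
    gbest, gbest_fit = pos[g].copy(), float(fit[g])
    history = [gbest_fit]

    for _ in range(n_iter):
        labels = discriminate(fit)
        for i, label in enumerate(labels):
            w, c1, c2 = generate_update(label)
            r1, r2 = rng.random(2)
            vel[i] = w * vel[i] + c1 * r1 * (pbest[i] - pos[i]) + c2 * r2 * (gbest - pos[i])
            pos[i] = np.clip(pos[i] + vel[i], lo, hi)
            fit[i] = objective(pos[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), fit[i]
            if fit[i] < gbest_fit:
                gbest, gbest_fit = pos[i].copy(), float(fit[i])
        history.append(gbest_fit)
    return gbest, gbest_fit, history
```

In an assembly setting the `objective` would presumably be the prediction error of the Gaussian process model under the candidate parameters; here it is left as a user-supplied callable so the sketch stays independent of any particular model.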

The invention has the advantages that:

The invention effectively addresses the low production efficiency, poor generalization and poor quality consistency of manual assembly and offline assembly. It introduces machine learning into the gear assembly task, learning and adjusting the robot's assembly parameters online to form an assembly experience database, and finally completes the gear assembly task. In addition, with only a small amount of training data and at lower cost, the invention improves the gear assembly success rate and reduces the single assembly time, achieving higher assembly efficiency and accuracy than the traditional method.

Drawings

FIG. 1 is a flow chart of a learning and prediction based gear online rapid assembly method provided in an embodiment of the present invention;

FIG. 2 shows the gear assembly task;

FIG. 3 is a flow chart of the particle swarm optimization algorithm based on a generative adversarial approach;

FIG. 4 is a diagram of the convergence of the particle swarm optimization algorithm based on a generative adversarial approach;

FIG. 5 is a schematic view of the gear assembly result.

Detailed Description

The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.

FIG. 1 shows the flow of the learning- and prediction-based gear online rapid assembly method provided by the invention. Referring to FIG. 1, the invention, based on machine learning, establishes a relation between assembly characteristics and robot actions in the form of contact state-robot action pairs, and builds an offline assembly database from a small amount of offline data. During experiments on the gear assembly platform, robot actions are predicted by an online gear assembly prediction algorithm based on Gaussian process regression while the assembly performance is evaluated. If the performance does not meet the requirement, an online assembly parameter learning algorithm based on an improved particle swarm optimization algorithm adjusts the robot assembly parameters and updates the robot assembly database, thereby completing the robot's automatic gear assembly task.

Referring to fig. 1, in the embodiment, taking an assembly process between a gear and a spline shaft as an example, the gear online rapid assembly method includes the following steps:

Step 1: establishing a contact state-action pair between the assembly characteristics and the robot action. First, the robot gear assembly task in FIG. 2 is converted into a contact state-action pair prediction task between assembly characteristics and robot actions. The press-in force during press-fitting and the displacement of the press-fitting machine end effector are adopted as the assembly characteristics CS, and the pose of the robot end effector in the tool coordinate system represents the robot action RM. The contact state-robot action pair at moment i is:

t_i＝[f_x,f_y,f_z,d,x,y,z,α,β,γ] (1)

i＝1,2,...,k (2)

In this embodiment, the press-in force and press-in displacement at 10 moments are selected as the assembly characteristics. The selected moments are uniformly distributed between the moment when the gear and the spline shaft first come into contact and the moment when the assembly force reaches the threshold F_max of 20 kN, where F_max denotes the set maximum press-in force. To ensure safety, the press-in force threshold is set to one third of the full range of the press-fitting machine.
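Selecting 10 uniformly distributed moments between first contact and the force threshold can be sketched as below. The contact-detection tolerance `contact_eps`, the use of a single force channel, and the function name are assumptions made for illustration; forces are taken to be in kN.

```python
import numpy as np


def sample_contact_window(force, displacement, contact_eps=0.05, f_max=20.0, n_samples=10):
    """Pick n_samples evenly spaced readings between first contact and f_max.

    force, displacement: equally long 1-D sequences recorded during one press.
    contact_eps: hypothetical force level (kN) above which contact is assumed.
    f_max: press-in force threshold (kN).
    """
    force = np.asarray(force, dtype=float)
    displacement = np.asarray(displacement, dtype=float)

    in_contact = np.flatnonzero(force > contact_eps)
    at_threshold = np.flatnonzero(force >= f_max)
    if in_contact.size == 0 or at_threshold.size == 0:
        raise ValueError("trace never reaches contact or the force threshold")

    start, stop = in_contact[0], at_threshold[0]
    # Evenly spaced sample indices, including both ends of the window.
    idx = np.linspace(start, stop, n_samples).round().astype(int)
    return idx, force[idx], displacement[idx]
```

The returned force and displacement samples would then form the contact-state part of the contact state-robot action pair.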

Step 2: constructing an assembly experience database. In this example, a small number of off-line assembly tests are performed with a robot operated through a teach pendant. First, the robot is taught to place a gear from the assembly line into the assembly fixture in the press-fitting machine. Next, the robot grasps the spline shaft and randomly searches for a suitable insertion attitude and rotation angle within the search range shown in FIG. 2. Finally, the press-fitting machine moves its pressure head along the Z axis of the user coordinate system to press the gear onto the spline shaft, completing the off-line gear assembly process. The press-in force and press-in displacement of this assembly, together with the pose of the robot end effector, are recorded as the contact state-robot action pair of step 1; the press-fitting time is recorded as the evaluation index of the assembly effect; and the assembly parameters are recorded as the assembly parameter data set. In this embodiment, the contact state-robot action pair data are used as the initial training set; the off-line assembly process between gear and spline shaft is repeated a small number of times, and the resulting off-line assembly data are added to the initial training set to form the initial off-line assembly experience database.
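The records described in this step can be sketched as a small in-memory store that keeps each contact state-robot action pair together with its press-fitting time, plus the current assembly parameters. The class and field names are hypothetical; the default parameter values follow the initial σ_n and γ stated for this example.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AssemblyRecord:
    contact_state: list  # press-in force and displacement features
    robot_action: list   # end-effector pose [x, y, z, alpha, beta, gamma]
    press_time: float    # press-fitting time, used as the evaluation index


@dataclass
class ExperienceDatabase:
    records: list = field(default_factory=list)
    # Assembly parameters {sigma_n, gamma}; initial values as in this example.
    parameters: dict = field(default_factory=lambda: {"sigma_n": 1.0, "gamma": 1.0})

    def add(self, record: AssemblyRecord) -> None:
        self.records.append(record)

    def mean_press_time(self) -> float:
        # Mean assembly time over all stored assemblies.
        return mean(r.press_time for r in self.records)

    def training_set(self):
        # Inputs (contact states) and targets (robot actions) for regression.
        states = [r.contact_state for r in self.records]
        actions = [r.robot_action for r in self.records]
        return states, actions


# Illustrative values only.
db = ExperienceDatabase()
db.add(AssemblyRecord([0.1, 0.0, 5.0, 1.0], [10.0, 20.0, 30.0, 0.0, 0.0, 0.1], 2.0))
db.add(AssemblyRecord([0.2, 0.1, 6.0, 1.2], [10.5, 20.0, 30.0, 0.0, 0.0, 0.2], 4.0))
```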

Step 3: constructing an online gear assembly prediction algorithm based on Gaussian process regression. In this embodiment, the robot is first operated to place a gear into the fixture, and the press-fitting machine then presses the gear onto the spline shaft grasped by the robot. The contact states in the initial off-line assembly experience database constructed in step 2 and the current gear contact state returned by the force and displacement sensors in the press-fitting machine are used as X and x_* in the Gaussian process, respectively, and a suitable robot end-effector action is predicted with the Gaussian process regression algorithm. The Gaussian process regression model regards the relation between the contact state x and the robot action y as a Gaussian process f; f(x) is determined by its mean function m(x) and covariance function K(x, x'):

m(x)＝E(f(x)) (3)

K(x,x')＝E[(f(x)-m(x))(f(x')-m(x'))] (4)

wherein x and x' are n-dimensional input vectors; f(x) and f(x') are the Gaussian process regression functions with x and x' as input, respectively; and m(x) and m(x') are the means of the Gaussian process with x and x' as input, respectively. In this example, n is 7.

For the model y = f(x) + ω with ω ~ N(0, σ_n²), where ω is white noise in the Gaussian process and is independent of f(x), the standard Gaussian process regression model can be established as the joint distribution:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X,X') + \sigma_n^2 I_n & K(X,x_*) \\ K(x_*,X) & K(x_*,x_*) \end{bmatrix}\right) \tag{5}$$

wherein x_* and X are the current contact state and the prior contact states in the assembly experience database, respectively; K(X, X') is a positive definite covariance matrix of order n; K(X, x_*) and K(x_*, X) are the covariances between the prior contact states and the current contact state; K(x_*, x_*) is the covariance of the current contact state with itself; I_n is the n-order identity matrix; and σ_n is a hyperparameter of the Gaussian process regression. In this example, I_n is the identity matrix of order 7.

To reduce the computational difficulty in the high-dimensional space of the above formula and shorten the single assembly time, the Gaussian process regression model is optimized by adopting the commonly used Gaussian kernel function as the kernel function:

wherein x and x' are n-dimensional input vectors; γ is the hyperparameter of the Gaussian kernel function; and {σ_n, γ} are taken as the assembly parameters. In this example n is set to 7, the initial σ_n is 1, and the initial γ is 1.

$$f_* \mid X, y, x_* \sim \mathcal{N}(\mu_*, \sigma_*) \tag{7}$$

$$\mu_* = K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, y \tag{8}$$

$$\sigma_* = K(x_*,x_*) - K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, K(X,x_*) \tag{9}$$

In this example, the current robot end-effector pose is adjusted according to μ_*, after which the gear is pressed onto the spline shaft gripped by the robot end effector, as shown in FIG. 2.

Step 4: evaluating the assembly performance of the gear and the spline shaft. After the assembly action has been executed according to the mean of the robot action predicted by the algorithm in step 3, the assembly performance is evaluated from the assembly time of the current assembly according to the following formula:

wherein y is the assembly time of the current assembly, and the remaining term is the mean of the assembly times in the assembly experience database.

If the current performance meets the assembly requirement, the prediction is carried out according to the current assembly parameters, the current data is added into the assembly experience database in the step 2, the assembly is finished, and the final assembly result between the gear and the spline shaft is shown in fig. 5. Otherwise, entering step 5;
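The decision made in step 4, together with the return from step 5 to step 3, can be sketched as a loop. Every callable below (`predict`, `execute`, `meets_requirement`, `tune`) is a hypothetical placeholder supplied by the caller, and the acceptance criterion is deliberately left to `meets_requirement` because the evaluation formula is not reproduced in this text. The `max_rounds` limit is likewise an assumption.

```python
def assemble_once(contact_state, database, predict, execute, meets_requirement, tune,
                  max_rounds=5):
    """Run predict -> execute -> evaluate, retuning parameters on failure.

    database: dict with "records" (list) and "parameters" (dict).
    Returns (action, press_time, rounds_of_tuning_used).
    """
    for round_index in range(max_rounds):
        action = predict(database, contact_state)       # step 3: predicted mean action
        press_time = execute(action)                     # perform the press-fit, time it
        if meets_requirement(press_time, database):      # step 4: performance check
            # Accepted: store the new experience and finish.
            database["records"].append((contact_state, action, press_time))
            return action, press_time, round_index
        # Rejected: step 5 adjusts the assembly parameters, then back to step 3.
        database["parameters"] = tune(database, contact_state)
    raise RuntimeError("assembly requirement not met within max_rounds")
```

A toy run with stand-in callables shows the control flow: the first prediction is rejected, the parameters are retuned once, and the second attempt is accepted and stored.

```python
db = {"records": [], "parameters": {"gamma": 1.0}}
result = assemble_once(
    contact_state=[0.0, 0.0, 5.0, 1.0],
    database=db,
    predict=lambda d, s: d["parameters"]["gamma"],   # stand-in for the GP prediction
    execute=lambda a: 10.0 / a,                       # stand-in for the timed press-fit
    meets_requirement=lambda t, d: t <= 5.0,          # stand-in criterion
    tune=lambda d, s: {"gamma": d["parameters"]["gamma"] * 2.0},
)
```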

Step 5: to optimize the assembly performance, the assembly parameters of the Gaussian process regression algorithm constructed in step 3 are adjusted online. First, the assembly parameters are initialized with a random distribution, and a discriminator and a generator are introduced so that the discriminative model guides the strategy selection in the generative model. The particle swarm in this example consists of the hyperparameters of the Gaussian process regression. The parameter adjustment algorithm consists of two parts: a discriminator G and a generator D. The discriminator evaluates the quality of the current assembly parameters according to the prediction performance of the Gaussian process model that uses them as hyperparameters, and encodes the result with a one-hot code: the code for better assembly parameters is set to 000001 and the code for worse assembly parameters is set to 000010. The generator captures the distribution of the assembly parameters and generates different assembly parameter update strategies according to the quality of the particles and the one-hot codes assigned by the discriminator.

Second, for the assembly parameters in the particle swarm optimization algorithm, the generator forms different assembly parameter update strategies according to the quality judged by the discriminator and the one-hot codes: the poorer gear assembly parameters are used for exploration to strengthen the global search capability, and the better gear assembly parameters are used for exploitation to strengthen the local search capability.

Finally, based on the contact state-robot action pairs constructed in step 1, the parameter adjustment algorithm searches again according to the current contact state between the gear and the spline shaft and the prior contact states to generate suitable assembly parameters, the assembly parameter data set is updated, and the method returns to step 3. The convergence of the assembly parameter adjustment algorithm in this example is shown in FIG. 4.
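The one-hot labelling used by the discriminator in this step can be sketched as follows. The two code strings are the ones given in this example; the error threshold, the function names and the mapping from codes to strategy names are assumptions added for illustration.

```python
BETTER = "000001"  # one-hot code for better assembly parameters (from this example)
WORSE = "000010"   # one-hot code for worse assembly parameters (from this example)

# Hypothetical mapping from a code to the kind of update it should trigger.
STRATEGY = {BETTER: "exploit", WORSE: "explore"}


def encode_quality(prediction_errors, threshold):
    """Label each candidate parameter set by its Gaussian-process prediction error."""
    return [BETTER if err <= threshold else WORSE for err in prediction_errors]


def to_vector(code):
    """Expand a one-hot code string into a list of integer bits."""
    return [int(bit) for bit in code]


codes = encode_quality([0.1, 0.9], threshold=0.5)
strategies = [STRATEGY[c] for c in codes]
```

The six-bit width leaves room for more than two quality classes, although only the two codes above are specified in this example.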

The above description of exemplary embodiments has been presented only to illustrate the technical solution of the invention and is not intended to be exhaustive or to limit the invention to the precise form described. Obviously, many modifications and variations are possible in light of the above teaching to those skilled in the art. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to thereby enable others skilled in the art to understand, implement and utilize the invention in various exemplary embodiments and with various alternatives and modifications. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims

1. A gear online rapid assembly method based on learning and prediction is characterized by comprising the following steps:

t_i＝[f_x,f_y,f_z,d,x,y,z,α,β,γ] (1)

i＝1,2,...,k (2)

wherein f_x, f_y, f_z are the press-in forces measured by the press-fitting machine; d is the displacement of the end effector of the press-fitting machine; and x, y, z, α, β, γ are the pose of the robot end effector;

step 2: constructing an assembly experience database, performing an off-line assembly test on a gear assembly platform, taking the obtained data as an initial training set, establishing the initial assembly experience database according to the training set, and performing off-line initial learning to obtain an initial off-line assembly parameter data set;

and predicting the action of the robot end effector by using an initial assembly parameter data set and adopting a Gaussian process regression algorithm according to the current gear contact state on an assembly platform, wherein the Gaussian process regression model considers the relation between the contact state x and the robot action y as a Gaussian process f, and the f (x) is determined by a mean function m (x) and a covariance function K (x, x'):

m(x)＝E(f(x)) (3)

K(x,x')＝E[(f(x)-m(x))(f(x')-m(x'))] (4)

wherein x and x' are n-dimensional input vectors; f(x) and f(x') are the Gaussian process regression functions with x and x' as input, respectively; and m(x) and m(x') are the means of the Gaussian process with x and x' as input, respectively;

for the model y = f(x) + ω with ω ~ N(0, σ_n²), where ω is white noise in the Gaussian process and is independent of f(x), the standard Gaussian process regression model can be established as the joint distribution:

$$\begin{bmatrix} y \\ f_* \end{bmatrix} \sim \mathcal{N}\!\left(0,\ \begin{bmatrix} K(X,X') + \sigma_n^2 I_n & K(X,x_*) \\ K(x_*,X) & K(x_*,x_*) \end{bmatrix}\right) \tag{5}$$

wherein x_* and X are the current contact state and the prior contact states in the assembly experience database, respectively; K(X, X') is a positive definite covariance matrix of order n; K(X, x_*) and K(x_*, X) are the covariances between the prior contact states and the current contact state; K(x_*, x_*) is the covariance of the current contact state with itself; I_n is the n-order identity matrix; and σ_n is a hyperparameter of the Gaussian process regression;

the commonly used Gaussian kernel function is adopted as the kernel function:

wherein x and x' are n-dimensional input vectors; γ is the hyperparameter of the Gaussian kernel function; and {σ_n, γ} are taken as the assembly parameters;

by deriving the conditional distribution, the posterior distribution of the predicted robot action f_* is obtained as:

$$f_* \mid X, y, x_* \sim \mathcal{N}(\mu_*, \sigma_*) \tag{7}$$

$$\mu_* = K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, y \tag{8}$$

$$\sigma_* = K(x_*,x_*) - K(x_*,X)\,[K(X,X') + \sigma_n^2 I_n]^{-1}\, K(X,x_*) \tag{9}$$

step 4: evaluating the assembly performance;

performing the assembly action according to the mean of the robot action predicted by the algorithm in step 3, and evaluating the assembly performance according to the following formula;

wherein y is the assembly time of the current assembly, and the remaining term is the mean of the assembly times in the assembly experience database;

if the current assembly performance between the gear and the spline shaft meets the assembly requirement, predicting according to the current assembly parameters, adding the current assembly data into the assembly experience database constructed in the step 2, and finishing the assembly; otherwise, entering step 5;

first, to adjust the hyperparameters of the Gaussian process regression algorithm constructed in step 3, initializing a particle swarm with a random distribution, introducing a discriminator and a generator, and guiding the strategy selection in the generative model through the discriminative model; the parameter adjustment algorithm of the particle swarm optimization algorithm consists of two parts: a discriminator G and a generator D; the discriminator evaluates the quality of the particles according to the prediction performance of the Gaussian process model and encodes the result with one-hot codes; the generator captures the distribution of the assembly parameters and generates different assembly parameter update strategies;

second, forming different assembly parameter update strategies for the assembly parameters of the particle swarm optimization algorithm according to the quality of the particles and their one-hot codes, using the inferior population for exploration to strengthen its global search capability, and using the superior population for exploitation to strengthen its local search capability;