CN114193458B - Robot control method based on Gaussian process online learning - Google Patents


Info

Publication number
CN114193458B
Authority
CN
China
Prior art keywords
gaussian process
online learning
control
learning model
robot
Prior art date
Legal status
Active
Application number
CN202210088894.9A
Other languages
Chinese (zh)
Other versions
CN114193458A (en)
Inventor
潘永平
李威
Current Assignee
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202210088894.9A
Publication of CN114193458A
Application granted
Publication of CN114193458B
Legal status: Active


Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1628 Programme controls characterised by the control loop
    • B25J 9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a robot control method based on Gaussian process online learning. The method obtains initial data through low-gain proportional-derivative (PD) control and constructs an initial Gaussian process online learning model from that data; the initial model performs preliminary control of the robot. The Gaussian process online learning model is then updated in every control period. Taking the desired position, velocity and acceleration as inputs, the latest model predicts a set of torques that serve as the feedforward input of robot control, so as to control the robot. The invention improves tracking accuracy while reducing the model update frequency, and can be widely applied in the technical field of robot control.

Description

Robot control method based on Gaussian process online learning
Technical Field
The invention relates to the technical field of robot control, in particular to a robot control method based on online learning of a Gaussian process.
Background
High-degree-of-freedom manipulators are widely used in industry, medicine, logistics and other fields, and are often required to provide accurate control, compliant sensing, human-machine interaction and similar capabilities, so they must be accurately modeled and controlled. A high-degree-of-freedom manipulator is a nonlinear, highly coupled system that includes unmodeled factors such as friction and motor dynamics, so an accurate dynamics model is difficult to obtain in practice. For control tasks, dynamics model information is very important: in a trajectory tracking task, for example, simple PID control cannot guarantee accurate tracking under high speed and heavy load.
Gaussian process online learning (GPOL) is a data-driven, non-parametric learning method that updates the model in real time using continuously arriving data (also called streaming data). In contrast to neural networks, this learning method is interpretable and provides uncertainty estimates, which are very important in the trajectory tracking task of a manipulator. The main idea of Gaussian process online learning is to maintain a set of basis vectors (BVs) selected from the streaming data and use them for continuous prediction.
For online learning in robot control, the prior art has the following disadvantages:
1. Most current methods (such as neural networks and Gaussian process regression) learn a model offline and then apply it to trajectory tracking. These methods require large amounts of training data and training time, and the trained model may also be affected by real-time factors (e.g., temperature, unknown load), which greatly reduces its utility.
2. Applying current Gaussian process online learning to a manipulator also faces the key challenge that the control frequency is too high for the model to predict in time: when the manipulator requires torque commands at 1 kHz, the model has only 1 ms for each prediction. Current Gaussian process online learning methods also focus on modeling the whole system (global optimality), which leaves insufficient modeling capability under high-frequency control commands and is poorly suited to trajectory tracking.
3. Some current online learning methods learn data that is not worth learning or is even erroneous (e.g., points visited only once, positions that will not be revisited for a long time, streaming data that arrives under sudden, non-stationary disturbances).
Disclosure of Invention
Therefore, the embodiments of the invention provide a robot control method based on Gaussian process online learning with high tracking accuracy.
One aspect of the present invention provides a robot control method based on online learning of a gaussian process, including:
acquiring initial data through low-gain proportional-derivative (PD) control, and constructing an initial Gaussian process online learning model from the initial data, wherein the initial Gaussian process online learning model performs preliminary control of the robot;
updating the Gaussian process online learning model in each control period;
and taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of robot control so as to control the robot.
Optionally, acquiring initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data includes:
configuring initial hyperparameters, wherein the initial hyperparameters comprise the proportional and derivative gains, the basis vector set size, the kernel function, the model noise, the variance threshold and the forgetting speed parameter;
performing low-gain proportional-derivative control according to the initial hyperparameters;
acquiring initial data from the proportional-derivative control;
and constructing an initial Gaussian process online learning model from the initial data, taking the predicted torque output by the Gaussian process online learning model as a feedforward term.
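The initialization steps above can be sketched as follows. This is a hypothetical single-joint illustration, not the patent's implementation; the names `pd_torque`, `collect_initial_data`, `simulate_step` and the gain values are assumptions. It rolls out a low-gain PD controller and logs (position, velocity, acceleration) to torque pairs, the training data for the initial model.

```python
import numpy as np

def pd_torque(q_des, q, qd_des, qd, kp=5.0, kd=0.5):
    """Low-gain proportional-derivative (PD) feedback torque (gains assumed)."""
    return kp * (q_des - q) + kd * (qd_des - qd)

def collect_initial_data(simulate_step, q_des_traj, qd_des_traj, q0, qd0):
    """Roll out the PD controller and log (x, tau) training pairs,
    where x = [q, qd, qdd] and tau is the applied torque."""
    X, y = [], []
    q, qd = q0, qd0
    for q_des, qd_des in zip(q_des_traj, qd_des_traj):
        tau = pd_torque(q_des, q, qd_des, qd)
        q, qd, qdd = simulate_step(q, qd, tau)  # plant step (user-supplied)
        X.append([q, qd, qdd])
        y.append(tau)
    return np.asarray(X), np.asarray(y)
```

The resulting `(X, y)` pairs would then initialize the Gaussian process model and normalize its inputs and outputs.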
Optionally, updating the Gaussian process online learning model in each control period comprises:
taking the reproducing kernel Hilbert space (RKHS) norm as the criterion for evaluating data points, measuring the distance between a new data point and the span of the current basis vectors;
when the computed distance of the new data point is larger than a preset threshold, adding the data point to the basis vector set and updating the corresponding auxiliary variables;
and deleting the least useful point from the basis vector set when the basis vector set grows beyond a preset size.
Optionally, in the step of adding the new data point to the basis vector set and updating the corresponding auxiliary variables when the computed distance of the data point is greater than a preset threshold,
when a data point cannot enter the basis vector set, the basis vector set is adjusted according to the data point, so that the adjusted set accounts for the data point without increasing in size.
Optionally, deleting the least useful point from the basis vector set when the basis vector set grows beyond a preset size includes:
configuring a counter and a forgetting condition, wherein the counter is incremented by 1 whenever a data point is added to the basis vector set;
when the value of the counter reaches a preset value, the oldest data point is deleted and the counter is reset to zero.
Optionally, deleting the least useful point from the basis vector set when the basis vector set grows beyond a preset size further includes:
when the value of the counter has not reached the preset value, deleting, as the least useful point, the point closest to the rest of the current basis vector set.
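The two-branch deletion rule above can be sketched as follows. The function name `choose_deletion_index` and the generic `ages`/`scores` arrays are illustrative assumptions; the patent's concrete redundancy score is given later in the embodiment.

```python
def choose_deletion_index(ages, scores, counter, h):
    """Pick which basis vector to drop when the set overflows.

    ages[i]   : insertion time of point i (smaller = older)
    scores[i] : closeness/redundancy score (smaller = more redundant)
    counter   : points added since the last forgetting step
    h         : forgetting condition
    Returns (index_to_delete, new_counter).
    """
    if counter >= h:
        # forgetting step: delete the oldest point and reset the counter
        return min(range(len(ages)), key=lambda i: ages[i]), 0
    # otherwise delete the most redundant (closest) point
    return min(range(len(scores)), key=lambda i: scores[i]), counter
```

With a small `h` the set turns over quickly and tracks recent dynamics; with `h` near the set size, behavior approaches the purely redundancy-based deletion.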
Optionally, taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of robot control includes:
for a new data point, calculating the predictive mean and predictive variance at that point;
taking the predictive mean as the feedforward term of the control torque;
combining the feedforward term with the corresponding feedback term to obtain a control command;
and inputting the control command to the robot to control the robot.
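The per-cycle command computation above can be sketched as follows. `StubGP` and `control_step` are assumed names; any model object exposing a `predict()` returning a (mean, variance) pair would fit, and the GP's mean supplies the feedforward torque while a PD term supplies the feedback.

```python
class StubGP:
    """Minimal stand-in for the online GP model (assumed interface)."""
    def predict(self, x):
        return 1.0, 0.1  # (predicted mean torque, predictive variance)

def control_step(gp, q_des, qd_des, qdd_des, q, qd, kp, kd):
    """One control cycle: GP feedforward plus PD feedback, u = u_ff + u_fb."""
    u_ff, _var = gp.predict([q_des, qd_des, qdd_des])  # feedforward term
    u_fb = kp * (q_des - q) + kd * (qd_des - qd)       # feedback term
    return u_ff + u_fb
```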
Another aspect of the embodiments of the present invention further provides a robot control device based on Gaussian process online learning, including:
a first module for acquiring initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data, wherein the initial Gaussian process online learning model performs preliminary control of the robot;
a second module for updating the Gaussian process online learning model in a round-robin fashion in each control period;
and a third module for taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques with the latest Gaussian process online learning model, and using the torques as the feedforward input of robot control so as to control the robot.
Another aspect of the embodiment of the invention also provides an electronic device, which includes a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method as described above.
Another aspect of the embodiments of the present invention also provides a computer-readable storage medium storing a program that is executed by a processor to implement a method as described above.
Embodiments of the present invention also disclose a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions may be read from a computer-readable storage medium by a processor of a computer device, and executed by the processor, to cause the computer device to perform the foregoing method.
According to the embodiments of the present invention, initial data is obtained through low-gain proportional-derivative control, an initial Gaussian process online learning model is constructed from the initial data, and the initial model performs preliminary control of the robot; the Gaussian process online learning model is updated in each control period; and with the desired position, velocity and acceleration as inputs, the latest Gaussian process online learning model predicts a plurality of torques that are used as the feedforward input of robot control, so as to control the robot. The invention can improve tracking accuracy and reduce the model update frequency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of robot control provided in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data point deletion strategy according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating overall steps provided in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In order to solve the problems in the prior art, an aspect of the present invention provides a robot control method based on Gaussian process online learning, as shown in fig. 3; the method and the corresponding device, electronic apparatus and computer-readable storage medium are as described above.
The following describes the specific implementation of the present invention in detail with reference to the drawings of the specification:
the invention is in the Gaussian processIn the online learning (Gaussian Process Online Learning, GPOL) process, the frequency of actually updating the model is reduced by adopting a round robin updating mode, and a certain forgetting condition is added, so that the learned model better performs in a complex track tracking task. For an n-joint manipulator, n independent GPOL's are used for modeling, the model inputs are vectors (vector total 3n elements) composed of position, velocity and acceleration, and the outputs are control moments, i.eWherein->Is the actual joint position, and for convenience, some variables are not explicitly written with the function parameter t. The method comprises the following specific steps:
step one: using only low gain proportional-derivative control, a small amount of data is collected in advance for initializing the gaussian process online learning model, and then switching is made to a model-based control mode in which the control law isIs the desired joint position, +.>Is the actual joint position, +.>And->Proportional gain and differential gain, respectively;
step two: in one control period, the online learning model of the Gaussian process is updated in a rotating way (namely, only one online learning model of the Gaussian process is updated at a time);
step three: according to the model learned so far, n moments are predicted as feed-forward inputs to the control with the desired position, speed and acceleration as inputs (control flow chart is shown in fig. 1).
The overall control law of this embodiment is $u = u_{ff} + u_{fb}$. The first term is the feedforward term $u_{ff}$, the prediction output of the Gaussian process online learning model; the second term is the feedback term $u_{fb} = K_p(q_d - q) + K_d(\dot q_d - \dot q)$.
In step one, only a small amount of data is collected using low-gain proportional-derivative control (e.g., data pairs formed by combining the position, velocity and acceleration information $(q, \dot q, \ddot q)$ with the control torque of the corresponding instant), which is used to normalize the inputs and outputs; the torque predicted by the model is then added as a feedforward term. Note the hyperparameters that need to be set: the size $N$ of the BVs (basis vector set), which directly affects the prediction speed; the kernel $k(x, x') = \exp(-0.5\,(x - x')^T \Lambda\,(x - x'))$, in which the dimension of $\Lambda$ matches the input dimension and, if some input component is comparatively noisy, the corresponding entry of $\Lambda$ can be set small; the model noise $\sigma_n$, set according to the noise of the output term, which should not be too small (e.g., less than 0.0001) or zero, otherwise the matrix inversion fails; the variance threshold $\epsilon_{tol}$, which measures whether a data point should be incorporated into the BVs and can be set to 0.01 and then readjusted; and $h$, which sets the forgetting speed and can be set to 10% or 100% of $N$; when the desired trajectory keeps changing, a reasonable $h$ quickly learns the dynamics of the new trajectory while keeping the predicted torque relatively smooth.
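The kernel above can be sketched as follows, assuming (as the formula's quadratic form suggests) a diagonal $\Lambda$ whose per-dimension entries weight each input component; the name `se_kernel` is an assumption.

```python
import numpy as np

def se_kernel(x, xp, lam):
    """Squared-exponential kernel k(x, x') = exp(-0.5 (x-x')^T diag(lam) (x-x')).
    `lam` holds the per-dimension weights of the (assumed diagonal) Lambda;
    a small weight de-emphasizes a noisy input component."""
    d = np.asarray(x, float) - np.asarray(xp, float)
    return float(np.exp(-0.5 * d @ (np.asarray(lam, float) * d)))
```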
In step two, the reproducing kernel Hilbert space (RKHS) norm is used as the criterion for measuring new data points, i.e., the distance between a new data point $x_*$ and the span of the current basis vectors is
$\gamma = K_{**} - K_*^T K_{XX}^{-1} K_*$
where this embodiment defines $K_{XX}$ as the Gram matrix formed by the matrix $X$ of the $N$ stored input vectors, i.e. $[K_{XX}]_{ij} = k(x_i, x_j)$; $K_* = k(X, x_*)$ and $K_{**} = k(x_*, x_*)$ are defined analogously.
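The novelty measure $\gamma$ can be sketched as follows; `novelty_gamma` is an assumed name, and the small `jitter` term is an added numerical safeguard, not part of the patent's formula.

```python
import numpy as np

def novelty_gamma(K_XX, k_star, k_starstar, jitter=1e-10):
    """RKHS 'distance' of x_* from the span of the basis vectors:
    gamma = k(x_*, x_*) - k_*^T K_XX^{-1} k_*.
    Solve a linear system instead of forming the explicit inverse."""
    K = np.asarray(K_XX, float)
    k = np.asarray(k_star, float)
    v = np.linalg.solve(K + jitter * np.eye(len(K)), k)
    return float(k_starstar - k @ v)
```

A point whose $\gamma$ exceeds the variance threshold $\epsilon_{tol}$ would be admitted into the basis vector set; a point already well represented yields $\gamma$ near zero.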
(1) A variance threshold $\epsilon_{tol}$ is defined according to the actual situation (e.g., 0.01). If the $\gamma$ computed for a new data point $x_*$ is larger than the threshold, the point is added to the BVs and the corresponding auxiliary variables are updated:
$\alpha_{m+1} = T_{m+1}(\alpha_m) + q^{(m+1)} s_{m+1}$
$C_{m+1} = U_{m+1}(C_m) + r^{(m+1)} s_{m+1} s_{m+1}^T$
$s_{m+1} = T_{m+1}(C_m K_*) + e_{m+1}$
where $m$ is the current size of the BVs, $\alpha$ can be understood as an information weight vector, $C$ as an auxiliary kernel matrix, and the scalars $q^{(m+1)}$ and $r^{(m+1)}$ are introduced for convenience of presentation. $T_{m+1}(\cdot)$ extends a vector to dimension $m+1$ by appending a 0, $U_{m+1}(\cdot)$ extends a matrix to $(m+1)\times(m+1)$ by appending a zero row and a zero column, $e_{m+1}$ denotes the vector whose $(m+1)$-th element is 1 and all other elements are 0, and $\sigma_n$ is the preset model noise. If a new data point cannot enter the BVs, the BVs are adjusted based on this point alone without increasing their size, which keeps the BVs from growing and the prediction from becoming too slow:
$\alpha_{m+1} = \alpha_m + q^{(m+1)} \hat s_{m+1}$, with $\hat s_{m+1} = C_m K_* + \hat e$,
where $\hat e = K_{XX}^{-1} K_*$ is the weight vector with which the new data point is represented by the original basis points.
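The full (set-growing) update can be sketched as follows. This is a sketch under the standard sparse online Gaussian process assumptions: the garbled original does not spell out $q^{(m+1)}$ and $r^{(m+1)}$, so they are written here in their conventional form, $q = (y - \mu)/(\sigma^2 + \sigma_n^2)$ and $r = -1/(\sigma^2 + \sigma_n^2)$; the function name is also an assumption.

```python
import numpy as np

def gp_online_full_update(alpha, C, k_star, k_starstar, y, sigma_n2):
    """Grow the basis-vector set by one point.
    alpha: (m,) information weight vector; C: (m, m) auxiliary matrix."""
    alpha = np.asarray(alpha, float)
    C = np.asarray(C, float)
    k = np.asarray(k_star, float)
    mean = k @ alpha                      # predictive mean at x_*
    var = k_starstar + k @ C @ k          # predictive variance at x_*
    q = (y - mean) / (var + sigma_n2)     # assumed conventional form
    r = -1.0 / (var + sigma_n2)           # assumed conventional form
    s = np.append(C @ k, 1.0)             # s_{m+1} = T_{m+1}(C k_*) + e_{m+1}
    alpha_new = np.append(alpha, 0.0) + q * s       # T_{m+1}(alpha) + q s
    C_new = np.pad(C, ((0, 1), (0, 1))) + r * np.outer(s, s)  # U_{m+1}(C) + r s s^T
    return alpha_new, C_new
```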
(2) If the BVs exceed the preset size $N$ (i.e., reach $N+1$), the least useful point is selected from the BVs and deleted, using a selection strategy with forgetting (as shown in fig. 2). In this embodiment a counter $c$ and a preset forgetting condition $h$ are maintained; $c$ is incremented by 1 each time a new point enters the BVs. When $c$ reaches $h$, the oldest point is deleted directly and $c$ is reset to 0 to start counting again. Otherwise, the point closest to the other points in the current BVs is selected for deletion, using $\rho_i = \alpha_i / Q_{ii}$ as the distance measure, where $\alpha_i$ is the $i$-th element of $\alpha$ and $Q_{ii}$ is the element in row $i$, column $i$ of $Q$. Then, according to the selected index $i$, the elements at the corresponding positions of $\alpha$, $C$ and $Q$ are deleted (their dimensions drop back to $N$ and $N\times N$) and corrected as
$\alpha = \alpha^{(-i)} - \rho_i\, Q_i^{(-i)}$
with analogous rank-one corrections applied to $C$ and $Q$, where $Q_i$ and $C_i$ denote the $i$-th columns of $Q$ and $C$, respectively. The corresponding data point is then deleted from the BVs.
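The redundancy scoring can be sketched as follows. Taking the absolute value of $\alpha_i$ is an assumption here (the patent writes $\rho_i = \alpha_i / Q_{ii}$ without it); the name `least_useful_index` is also illustrative.

```python
import numpy as np

def least_useful_index(alpha, Q):
    """Score each basis vector by rho_i = |alpha_i| / Q_ii and return the
    index of the smallest score, i.e. the most redundant point.
    (Absolute value is an assumed convention, not stated in the source.)"""
    alpha = np.asarray(alpha, float)
    Q = np.asarray(Q, float)
    scores = np.abs(alpha) / np.diag(Q)
    return int(np.argmin(scores))
```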
In step three, when predicting at a new point $x_*$, the predictive mean is $\mu(x_*) = K_*^T \alpha$ and the predictive variance is $\sigma^2(x_*) = K_{**} + K_*^T C K_*$. The predictive mean is used directly as the feedforward term $u_{ff}$ of the control torque, an appropriate feedback term $u_{fb}$ is added, and the resulting command is sent to the robot. Then the newly measured $(q, \dot q, \ddot q)$ together with the control command $u$ of the last step are used as a new training point to update the Gaussian process online learning model.
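The prediction in step three can be sketched directly from the two formulas above; `gp_predict` is an assumed name.

```python
import numpy as np

def gp_predict(alpha, C, k_star, k_starstar):
    """Predictive mean and variance from the current basis-vector model:
    mu = k_*^T alpha,  sigma^2 = k_** + k_*^T C k_*."""
    k = np.asarray(k_star, float)
    mean = float(k @ np.asarray(alpha, float))
    var = float(k_starstar + k @ np.asarray(C, float) @ k)
    return mean, var
```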
It should be noted that in this embodiment the first 0.6 seconds uses proportional-derivative control only; afterwards the Gaussian process online learning model is continuously updated from the input torque and the measured position, velocity and acceleration, i.e., step two above, while the model simultaneously predicts torques for control from the desired position, velocity and acceleration, realizing learning and application as a whole.
In summary, the invention learns the manipulator inverse dynamics with Gaussian process online learning and proposes a way of applying Gaussian process online learning to trajectory tracking tasks under high control frequency: reducing the model update frequency and updating the models in a round-robin fashion. A forgetting mechanism is added to the Gaussian process online learning to improve short-term tracking accuracy, so that unknown dynamics can be learned quickly when the trajectory tracking task switches.
Compared with the prior art, the invention has the following advantages:
1. The prior art is either offline or limited by the high control frequency required by the manipulator, and is therefore difficult to apply in practice.
2. Existing learning methods either pursue overall performance at the cost of local performance (predicting recent dynamics poorly) or cannot exclude erroneous learned data.
The invention solves the problem that Gaussian process learning is difficult to apply to a high-control-frequency manipulator; Gaussian process online learning has a theoretical basis in probability theory and also provides uncertainty estimates. Furthermore, the invention proposes a strategy that increases the forgetting speed; compared with the prior art it achieves better local performance and higher tracking accuracy when tracking complex trajectories.
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, while the invention is described in the context of functional modules, it should be appreciated that, unless otherwise indicated, one or more of the described functions and/or features may be integrated in a single physical device and/or software module or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary to an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be apparent to those skilled in the art from consideration of their attributes, functions and internal relationships. Accordingly, one of ordinary skill in the art can implement the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative and are not intended to be limiting upon the scope of the invention, which is to be defined in the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A robot control method based on Gaussian process online learning, characterized by comprising the following steps:
obtaining initial data through low-gain proportional-derivative control, and constructing an initial Gaussian process online learning model from the initial data, wherein the initial Gaussian process online learning model is used for preliminary control of a robot; wherein the input parameters of the proportional-derivative control are initialization hyperparameters; the initialization hyperparameters comprise the proportional and derivative gains, the basis vector set size, the kernel function, the model noise, the variance threshold, and the forgetting speed parameter;
updating the Gaussian process online learning models in rotation in each control period;
taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques according to the latest Gaussian process online learning model, and taking the torques as feedforward inputs of the robot control so as to control the robot;
wherein updating the Gaussian process online learning models in rotation in each control period comprises the following steps:
modeling a manipulator with n joints by n independent Gaussian process online learning models to obtain the Gaussian process online learning models;
updating one of the Gaussian process online learning models at a time during each control period.
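The rotating update of claim 1 — n independent joint models, of which exactly one is updated per control period — could be scheduled as in the following minimal sketch. `RotatingGPUpdater` and all names here are illustrative assumptions, not taken from the patent:

```python
class RotatingGPUpdater:
    """Round-robin scheduler: with n independent joint models, only one
    model is updated per control period, spreading the cost of the
    Gaussian process update across periods."""

    def __init__(self, n_joints):
        self.n_joints = n_joints
        self.next_joint = 0               # index of the model updated this period
        self.update_counts = [0] * n_joints

    def step(self, update_fn, sample):
        # Update exactly one joint model in this control period,
        # then advance the rotation pointer.
        j = self.next_joint
        update_fn(j, sample)
        self.update_counts[j] += 1
        self.next_joint = (j + 1) % self.n_joints
        return j

# Usage: over 6 control periods with n = 3 joints, each model is updated twice.
log = []
sched = RotatingGPUpdater(3)
for period in range(6):
    sched.step(lambda j, s: log.append(j), sample=None)
```

Under this scheme each per-period update touches a single model, so the worst-case computation per control period stays bounded regardless of the number of joints.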
2. The robot control method based on Gaussian process online learning according to claim 1, wherein obtaining initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data comprises:
configuring the initialization hyperparameters;
performing low-gain proportional-derivative control according to the initialization hyperparameters;
acquiring the initial data from the proportional-derivative control;
and constructing the initial Gaussian process online learning model from the initial data, with the predicted torque output by the Gaussian process online learning model serving as a feedforward term.
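The initialization hyperparameters listed in claims 1-2 might be collected in a single configuration object; the field names and default values below are illustrative assumptions, since the patent does not prescribe an API:

```python
from dataclasses import dataclass

@dataclass
class GPControlConfig:
    """Initialization hyperparameters named in claims 1-2 (a sketch)."""
    kp: float = 20.0                   # low proportional gain
    kd: float = 2.0                    # low derivative gain
    basis_set_size: int = 100          # capacity of the basis vector set
    kernel: str = "rbf"                # kernel function choice
    model_noise: float = 1e-2          # observation-noise variance
    variance_threshold: float = 1e-3   # novelty threshold for adding points
    forgetting_speed: int = 10         # periods between forced oldest-point deletions

cfg = GPControlConfig()                # defaults, or override per robot
```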
3. The robot control method based on Gaussian process online learning according to claim 1, wherein updating the Gaussian process online learning model in each control period comprises:
taking the reproducing kernel Hilbert space norm as the metric for data points, measuring the distance between a new data point and the original space;
when the calculated distance of the new data point is larger than a preset threshold, adding the data point to the basis vector set and updating the corresponding auxiliary variables;
when the basis vector set is larger than a preset size, deleting useless points from the basis vector set; wherein the useless points are the data points that provide the least position information.
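The RKHS distance test of claim 3 is commonly computed, in sparse online Gaussian process methods, as the residual kernel "novelty" of a point against the span of the current basis set. The following sketch assumes an RBF kernel; the function names are illustrative and not taken from the patent:

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel with length scale ell.
    return np.exp(-np.sum((a - b) ** 2) / (2 * ell ** 2))

def rkhs_distance_sq(x_new, basis, kern=rbf):
    """Squared RKHS distance from x_new to the span of the basis set:
    gamma = k(x,x) - k_x^T K^{-1} k_x  (approaches 0 when x_new is
    already well represented by the basis vectors)."""
    if len(basis) == 0:
        return kern(x_new, x_new)
    K = np.array([[kern(a, b) for b in basis] for a in basis])
    k_x = np.array([kern(x_new, b) for b in basis])
    # A small jitter keeps the kernel matrix numerically invertible.
    alpha = np.linalg.solve(K + 1e-10 * np.eye(len(basis)), k_x)
    return kern(x_new, x_new) - k_x @ alpha

def maybe_add(x_new, basis, threshold=1e-3, kern=rbf):
    # Add the point only if its RKHS distance exceeds the variance threshold.
    if rkhs_distance_sq(x_new, basis, kern) > threshold:
        basis.append(x_new)
        return True
    return False

basis = [np.array([0.0])]
dup_added = maybe_add(np.array([0.0]), basis)   # duplicate: gamma ~ 0, rejected
far_added = maybe_add(np.array([5.0]), basis)   # novel point: gamma ~ 1, accepted
```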
4. The robot control method based on Gaussian process online learning according to claim 3, wherein adding the data point to the basis vector set and updating the corresponding auxiliary variables when the calculated distance of the new data point is larger than the preset threshold further comprises:
when a data point cannot enter the basis vector set, adjusting the basis vector set according to the data point so that the data point can enter the adjusted basis vector set without increasing the size of the basis vector set.
5. The robot control method based on Gaussian process online learning according to claim 3, wherein deleting the useless points from the basis vector set when the basis vector set is larger than the preset size comprises:
configuring a counter and a forgetting condition, wherein the counter is incremented by 1 whenever a data point is newly added to the basis vector set;
when the value of the counter reaches a preset value, deleting the oldest data point and resetting the counter to zero.
6. The robot control method based on Gaussian process online learning according to claim 5, wherein deleting the useless points from the basis vector set when the basis vector set is larger than the preset size further comprises:
when the value of the counter has not reached the preset value, deleting the point closest to the current basis vector set as the useless point.
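The counter-driven deletion of claims 5-6 could be sketched as below: on overflow, if the counter has reached the forgetting value the oldest point is dropped and the counter reset; otherwise the point closest to the rest of the set (the least informative one) is dropped. The class and the nearest-neighbour distance are illustrative assumptions:

```python
class ForgettingBasisSet:
    """Sketch of the counter-driven deletion rule in claims 5-6."""

    def __init__(self, max_size, forget_every):
        self.max_size = max_size
        self.forget_every = forget_every
        self.counter = 0
        self.points = []              # insertion order: oldest first

    def add(self, point, dist_to_rest):
        self.points.append(point)
        self.counter += 1
        if len(self.points) <= self.max_size:
            return
        if self.counter >= self.forget_every:
            # Forgetting condition met: drop the oldest point, reset counter.
            self.points.pop(0)
            self.counter = 0
        else:
            # Otherwise drop the point closest to the rest of the set.
            scores = [dist_to_rest(p, self.points[:i] + self.points[i + 1:])
                      for i, p in enumerate(self.points)]
            self.points.pop(scores.index(min(scores)))

# Usage with 1-D points and a nearest-neighbour distance:
nn = lambda p, others: min(abs(p - q) for q in others)
s = ForgettingBasisSet(max_size=3, forget_every=3)
for p in [0.0, 10.0, 20.0, 21.0]:
    s.add(p, nn)                      # fourth add overflows, drops oldest (0.0)
```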
7. The robot control method based on Gaussian process online learning according to claim 1, wherein taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques according to the latest Gaussian process online learning model, and taking the torques as feedforward inputs of the robot control comprises:
for a new data point, calculating a predicted mean and a predicted variance for the data point;
taking the predicted mean as the feedforward term of the control torque;
combining the feedforward term with the corresponding feedback term to obtain a control command;
and inputting the control command to the robot to control the robot.
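The feedforward-plus-feedback combination of claim 7 could look like the following sketch for a single joint, where the Gaussian process predictive mean at the desired trajectory point is the feedforward torque and a PD term on the tracking error is the feedback torque. The stand-in predictor and all names are illustrative assumptions:

```python
def control_command(q_d, dq_d, ddq_d, q, dq, gp_predict, kp, kd):
    """Claim 7 as a sketch: GP predicted mean torque at the desired
    (position, velocity, acceleration) is the feedforward term; a PD
    term on the tracking error is the feedback term; their sum is the
    control command.  The predictive variance is returned alongside."""
    tau_ff, variance = gp_predict(q_d, dq_d, ddq_d)   # predicted mean + variance
    tau_fb = kp * (q_d - q) + kd * (dq_d - dq)        # feedback term
    return tau_ff + tau_fb, variance

# Usage with a stand-in predictor (a real controller would query the
# online Gaussian process model here):
toy_gp = lambda q, dq, ddq: (2.0 * ddq, 0.1)          # crude inverse-dynamics guess
tau, var = control_command(q_d=1.0, dq_d=0.0, ddq_d=0.5,
                           q=0.9, dq=0.1, gp_predict=toy_gp, kp=10.0, kd=1.0)
```

Because the feedforward term absorbs most of the inverse dynamics, the feedback gains can remain low, which is consistent with the low-gain PD control used to bootstrap the model in claim 1.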
8. A robot control device based on gaussian process online learning, comprising:
a first module for obtaining initial data through low-gain proportional-derivative control and constructing an initial Gaussian process online learning model from the initial data, wherein the initial Gaussian process online learning model is used for preliminary control of a robot; wherein the input parameters of the proportional-derivative control are initialization hyperparameters; the initialization hyperparameters comprise the proportional and derivative gains, the basis vector set size, the kernel function, the model noise, the variance threshold, and the forgetting speed parameter;
a second module for updating the Gaussian process online learning models in rotation in each control period;
a third module for taking the desired position, velocity and acceleration as inputs, predicting a plurality of torques according to the latest Gaussian process online learning model, and taking the torques as feedforward inputs of the robot control so as to control the robot;
wherein updating the Gaussian process online learning models in rotation in each control period comprises the following steps:
modeling a manipulator with n joints by n independent Gaussian process online learning models to obtain the Gaussian process online learning models;
updating one of the Gaussian process online learning models at a time during each control period.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program implements the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program that is executed by a processor to implement the method of any one of claims 1 to 7.
CN202210088894.9A 2022-01-25 2022-01-25 Robot control method based on Gaussian process online learning Active CN114193458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210088894.9A CN114193458B (en) 2022-01-25 2022-01-25 Robot control method based on Gaussian process online learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210088894.9A CN114193458B (en) 2022-01-25 2022-01-25 Robot control method based on Gaussian process online learning

Publications (2)

Publication Number Publication Date
CN114193458A CN114193458A (en) 2022-03-18
CN114193458B true CN114193458B (en) 2024-04-09

Family

ID=80658936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210088894.9A Active CN114193458B (en) 2022-01-25 2022-01-25 Robot control method based on Gaussian process online learning

Country Status (1)

Country Link
CN (1) CN114193458B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117687342B (en) * 2024-01-31 2024-05-17 中国科学技术大学 Robot safety control method based on non-conservative probability error bound of Gaussian process

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
CN112247992A (en) * 2020-11-02 2021-01-22 中国科学院深圳先进技术研究院 Robot feedforward torque compensation method
CN112318509A (en) * 2020-10-30 2021-02-05 东南大学 Trajectory tracking control method for Gaussian process of space robot
CN113352322A (en) * 2021-05-19 2021-09-07 浙江工业大学 Adaptive man-machine cooperation control method based on optimal admittance parameters

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11389957B2 (en) * 2019-09-30 2022-07-19 Mitsubishi Electric Research Laboratories, Inc. System and design of derivative-free model learning for robotic systems

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
CN112318509A (en) * 2020-10-30 2021-02-05 东南大学 Trajectory tracking control method for Gaussian process of space robot
CN112247992A (en) * 2020-11-02 2021-01-22 中国科学院深圳先进技术研究院 Robot feedforward torque compensation method
CN113352322A (en) * 2021-05-19 2021-09-07 浙江工业大学 Adaptive man-machine cooperation control method based on optimal admittance parameters

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An iterative learning control method for robot trajectory tracking; Yao Zhongshu, Wang Hongfei, Yang Chengwu; Acta Armamentarii (No. 03); 330-334 *

Also Published As

Publication number Publication date
CN114193458A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
Pertsch et al. Accelerating reinforcement learning with learned skill priors
Doerr et al. Optimizing long-term predictions for model-based policy search
Lin et al. Event-triggered reinforcement learning control for the quadrotor UAV with actuator saturation
CN111890350A (en) Robot, method of controlling the same, and computer-readable storage medium
JP7301034B2 (en) System and Method for Policy Optimization Using Quasi-Newton Trust Region Method
US20210205984A1 (en) System and methods for pixel based model predictive control
Sharkawy et al. A neural network-based approach for variable admittance control in human–robot cooperation: online adjustment of the virtual inertia
Nguyen-Tuong et al. Learning robot dynamics for computed torque control using local Gaussian processes regression
CN111872934A (en) Mechanical arm control method and system based on hidden semi-Markov model
US11759947B2 (en) Method for controlling a robot device and robot device controller
Fernandez-Gauna et al. Reinforcement learning of ball screw feed drive controllers
Dong et al. Learning and recognition of hybrid manipulation motions in variable environments using probabilistic flow tubes
Levine Exploring deep and recurrent architectures for optimal control
CN114193458B (en) Robot control method based on Gaussian process online learning
Armendariz et al. Neuro-fuzzy self-tuning of PID control for semiglobal exponential tracking of robot arms
CN114529010A (en) Robot autonomous learning method, device, equipment and storage medium
Ollington et al. Incorporating expert advice into reinforcement learning using constructive neural networks
Rajasekhar et al. Decentralized multi-agent control of a three-tank hybrid system based on twin delayed deep deterministic policy gradient reinforcement learning algorithm
massoud Farahmand et al. Model-based and model-free reinforcement learning for visual servoing
CN113419424A (en) Modeling reinforcement learning robot control method and system capable of reducing over-estimation
Verma A survey on machine learning applied to dynamic physical systems
CN111949013A (en) Method for controlling vehicle and device for controlling vehicle
Carlucho et al. A reinforcement learning control approach for underwater manipulation under position and torque constraints
Redjimi et al. On function extrapolation by fixed point iteration for time-delayed systems
Afzali et al. A Modified Convergence DDPG Algorithm for Robotic Manipulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant