CN108153153A - Learning variable impedance control system and control method - Google Patents

Learning variable impedance control system and control method

Info

Publication number
CN108153153A
CN108153153A (application CN201711393308.7A, granted as CN108153153B)
Authority
CN
China
Prior art keywords
control
strategy
variable impedance
gaussian process
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711393308.7A
Other languages
Chinese (zh)
Other versions
CN108153153B (en)
Inventor
夏桂华
李超
张智
谢心如
朱齐丹
蔡成涛
吕晓龙
刘志林
班瑞阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201711393308.7A priority Critical patent/CN108153153B/en
Publication of CN108153153A publication Critical patent/CN108153153A/en
Application granted granted Critical
Publication of CN108153153B publication Critical patent/CN108153153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/048Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention provides a learning variable impedance control system and control method, mainly comprising four parts: a variable impedance controller, a Gaussian process model of the system, a variable impedance control strategy, and a strategy learning algorithm. No prior knowledge of the environment is required: the Gaussian process model of the system is built from interaction data, and long-term inference and planning are carried out on the system in a Bayesian manner. In this way, more useful information is extracted from limited observation data, and complex force control tasks can be learned with minimal interaction time. By adding an energy loss term to the cost function, a trade-off between error and energy is achieved, giving the robot good compliance. Finally, the learned variable impedance control strategy can adjust the target stiffness and damping parameters simultaneously according to the system state at different stages of a task. The invention can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and biped robot gait control, ensuring the safety and robustness of interactive operation.

Description

Learning variable impedance control system and control method
Technical Field
The invention relates to compliance control of a robot, in particular to a high-efficiency learning variable impedance control system and a control method.
Background
With the increasing application of robots to contact operation tasks in unstructured environments, such as compliant assembly and human-robot interaction, it is difficult to establish an accurate dynamic model of the system because tasks are complex and contact environments are variable and unpredictable. How to enable the robot to execute new tasks safely, efficiently and quickly, and to accurately control contact forces in different environments, is therefore a new challenge for robotics. Impedance control is widely applied to robot interaction control tasks owing to its good adaptability and robustness. Since the force control characteristics are determined by the impedance parameters of the robot, the selection of inertia, stiffness and damping parameters is highly task-dependent and often difficult to infer a priori; to obtain good control performance, in-depth knowledge of the controller design and its parameters is usually necessary, and the control parameters still have to be adjusted manually. For complex tasks in particular, a fixed-parameter impedance control method struggles to achieve the target task, because the environmental conditions usually contain nonlinear and time-varying factors. If the impedance control parameters can be dynamically planned and adjusted as tasks and environments change, the control performance is clearly better than with fixed impedance parameters. Learning variable impedance control is therefore key to a modern robot safely and quickly completing complex operation tasks.
For operation tasks requiring force control, the fewer the learning explorations the better: a large number of physical interaction attempts may damage the robot or workpiece, and collecting large amounts of sampled data is time-consuming, expensive and impractical. Improving the learning efficiency of the learning variable impedance control algorithm and reducing the required number of trial-and-error interactions is therefore very important for a robot to quickly learn and complete a new task.
Disclosure of Invention
The invention aims to provide a learning variable impedance control system with high learning efficiency that can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and robot gait control while ensuring the safety and robustness of interactive operation. The invention further aims to provide a control method based on this learning variable impedance control system.
The learning variable impedance control system of the invention comprises a variable impedance controller, a Gaussian process model module of the system, a variable impedance control strategy module and a strategy learning algorithm module,
a Gaussian process model module of the system establishes a Gaussian process model of the system according to the actual position of the tail end of the robot and the information of the force sensor, and the Gaussian process model is used as a transformation dynamic model of the control system;
the strategy learning algorithm module infers and predicts the long-term distribution of the state of the control system through a cascade one-step prediction process according to a Gaussian process model of the system, and then performs internal simulation and predicts the behavior of the control system according to the model;
the variable impedance control strategy module calculates impedance parameters, namely target rigidity and damping coefficient, in real time according to the state of a control system, namely the tail end position of the mechanical arm and the actual contact force, and transmits the impedance parameters to the variable impedance controller;
and the variable impedance controller corrects the expected reference track according to the time-varying target stiffness, the damping coefficient and the current contact force error, and outputs the expected position increment of the tail end of the mechanical arm.
The control method of the learning variable impedance control system based on the invention comprises the following steps:
(1) Randomly initialize the control variable u = [Kd(t), Bd(t)] and apply it to the control system, recording the initial data [X, Fa], where Kd(t) is the target stiffness, Bd(t) the damping coefficient, X the position of the end of the mechanical arm, and Fa the actual contact force;
(2) Based on the historical sample data [X, Fa], establish a Gaussian process dynamic model of the system as the transformation dynamics model of the system;
(3) Search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
(4) Set the strategy π* ← π(θ), apply it to the variable impedance controller for force control, and collect new data [X, Fa];
(5) Repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
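The five steps above can be sketched as a data-efficient learning loop. The callables `run_episode`, `train_gp` and `optimize_policy` are illustrative placeholders (assumptions, not part of the patent) for the real-system rollout, the Gaussian process model fitting, and the strategy search:

```python
import numpy as np

def learn_variable_impedance(run_episode, train_gp, optimize_policy,
                             n_iterations=5, u_dim=2, seed=0):
    """Skeleton of steps (1)-(5): alternate between fitting a GP dynamics
    model on all data collected so far and improving the policy on it."""
    rng = np.random.default_rng(seed)
    # (1) random initial control variable u = [Kd(t), Bd(t)]
    policy = lambda x: 0.01 * rng.standard_normal(u_dim)
    data = [run_episode(policy)]            # record initial data [X, Fa]
    for _ in range(n_iterations):
        gp = train_gp(data)                 # (2) GP dynamics model
        policy = optimize_policy(gp)        # (3)-(4) pi* <- pi(theta)
        data.append(run_episode(policy))    # collect new data [X, Fa]
    return policy, data                     # (5) repeat until satisfactory
```

Each outer iteration adds exactly one new interaction episode, which is what keeps the number of physical trials small.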
The control method of the learning variable impedance control system based on the invention can also comprise the following steps:
1. the establishing of the Gaussian process dynamic model of the system specifically comprises the following steps:
(1) The Gaussian process model is f ~ GP(m, k), where the prior mean is m ≡ 0 and the squared exponential kernel function is selected as k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2) with Λ = diag(l1², …, lD²);
(2) Taking the state and the control quantity as input tuples of a Gaussian process, and taking the state increment as a training target;
(3) Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm,
where xt is the observable state, yt the training target, Δt the state increment, εt independent and identically distributed system noise, α² the signal variance of the underlying function f, and li the characteristic length of each input dimension.
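For illustration, the squared exponential kernel with signal variance α² and per-dimension characteristic lengths li can be written as a short sketch (function name and argument layout are assumptions):

```python
import numpy as np

def se_kernel(xp, xq, alpha2, lengthscales):
    """Squared-exponential kernel k(xp, xq) = alpha2 * exp(-0.5 * (xp-xq)^T
    Lambda^-1 (xp-xq)) with Lambda = diag(l_1^2, ..., l_D^2); alpha2 is the
    signal variance and l_i the characteristic length of input dimension i."""
    d = (np.asarray(xp) - np.asarray(xq)) / np.asarray(lengthscales)
    return alpha2 * np.exp(-0.5 * np.dot(d, d))
```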
2. The policy learning algorithm specifically includes:
(1) Apply the control strategy π to the Gaussian process model of the system, perform internal simulation, and predict the behavior and performance of the system;
(2) Use the learned Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
(3) Evaluate the expected total cost Jπ(θ) over horizon T;
(4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ;
(5) Repeat steps (1)-(4) until the strategy parameter θ converges.
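The gradient-based search in steps (3)-(5) can be sketched as follows. Here `expected_cost(theta)` stands in for the model-based rollout cost Jπ(θ), and the central finite-difference gradient is an illustrative substitute for the analytic dJπ(θ)/dθ used in the patent:

```python
import numpy as np

def policy_search(expected_cost, theta0, lr=0.05, tol=1e-8, max_iter=500, eps=1e-5):
    """Minimize expected_cost over theta by gradient descent; stop when the
    parameter update is below tol (step (5): theta has converged)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        grad = np.array([(expected_cost(theta + eps * e)
                          - expected_cost(theta - eps * e)) / (2 * eps)
                         for e in np.eye(theta.size)])
        theta = theta - lr * grad            # (4) update theta along -dJ/dtheta
        if np.linalg.norm(lr * grad) < tol:  # (5) stop on convergence
            break
    return theta
```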
3. The variable impedance controller is an indirect impedance controller based on a position, and corrects an expected reference track according to a contact force error, time-varying target rigidity and a damping coefficient to obtain an expected position increment delta X of the tail end of the mechanical arm;
the specific form of the variable impedance controller is as follows:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = −8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²
where T is the control period.
The invention provides an efficient learning variable impedance control system and a control method, and aims to realize that a robot learns to complete a force control task efficiently and autonomously.
The technical scheme of the invention is mainly characterized in that:
(1) the variable impedance controller can correct the expected reference track according to the time-varying target rigidity and the damping coefficient;
(2) the Gaussian process model of the system is a probabilistic model of the system established according to actual sampling data and used as a transformation dynamic model of the system;
(3) the variable impedance control strategy is a probabilistic Gaussian process control strategy, expressed by a mean function and a variance function, which according to the system state, namely the end position X of the mechanical arm and the actual contact force Fa, calculates in real time the impedance parameters, i.e. the target stiffness Kd(t) and damping coefficient Bd(t), and transmits them to the variable impedance controller;
(4) the strategy learning algorithm is a model-based reinforcement learning algorithm: it infers and predicts the long-term distribution of the system state through a cascaded one-step prediction process, then performs internal simulation and predicts the behavior of the system according to the model; an energy loss term is added to the cost function, reducing the impedance gain required to complete the task by penalizing the control action and thereby balancing error against energy minimization.
The variable impedance controller is specifically as follows:
(1) the variable impedance controller is an indirect impedance controller based on a position, and corrects an expected reference track according to a contact force error, time-varying target rigidity and a damping coefficient to obtain an expected position increment delta X of the tail end of the mechanical arm;
(2) the control period is T, Md(t)、Kd(t)、Bd(t) is a target inertia, a time-varying target stiffness and a time-varying damping coefficient, respectively, and the specific form of the variable impedance controller is as follows:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = −8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²
the Gaussian process model of the system is specifically as follows:
(1) The Gaussian process model is f ~ GP(m, k), where the prior mean is taken as m ≡ 0 and the squared exponential kernel function is selected as k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2) with Λ = diag(l1², …, lD²);
(2) Taking the state and the control quantity as input tuples of a Gaussian process, and taking the state increment as a training target;
(3) Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm.
The variable impedance control strategy specifically comprises the following steps:
(1) The variable impedance control strategy is a Gaussian process controller ut = π(xt; θ): a probabilistic Gaussian process control strategy expressed by a mean function and a variance function, where xt is the observable state of the robot and the strategy output ut comprises the impedance controller's target stiffness Kd(t) and damping coefficient Bd(t); θ is the control strategy parameter to be learned;
(2) A bounded, differentiable trapezoidal saturation function S(πt) = umin + umax/2 + (umax/2)·[9sin(πt) + sin(3πt)]/8 enforces the physical boundary of the control parameter u, limiting the control variable to the interval [umin, umin + umax], where umax is the maximum clipping and umin the minimum clipping of the control variable.
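The saturation function can be sketched directly. Note that the placement of the 1/2 factors is an assumption reconstructed from the stated output interval [umin, umin + umax], since the formula is garbled in the source; the inner term (9sin x + sin 3x)/8 is exactly bounded in [−1, 1]:

```python
import numpy as np

def trapezoidal_saturation(pi_t, u_min, u_max):
    """Bounded, differentiable saturation of the raw policy output pi_t.
    sigma(x) = (9 sin x + sin 3x)/8 lies in [-1, 1], so the output lies in
    [u_min, u_min + u_max].  The 1/2 factors are a reconstruction (assumption)
    consistent with that interval."""
    sigma = (9.0 * np.sin(pi_t) + np.sin(3.0 * pi_t)) / 8.0
    return u_min + 0.5 * u_max + 0.5 * u_max * sigma
```

Since sin(3x) = 3sin x − 4sin³x, the inner term equals (3s − s³)/2 with s = sin x, which is monotone on [−1, 1] with extremes ±1, so the bound is tight and the function stays differentiable everywhere.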
The strategy learning algorithm specifically comprises the following steps:
(1) Apply the control strategy π to the system model, namely the Gaussian process model, perform internal simulation, and predict the behavior and performance of the system;
(2) Use the Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
(3) Evaluate the expected total cost Jπ(θ) over horizon T, where the instantaneous cost function comprises a state error cost term cb(xt) and an energy loss term ce(ut) = ce(π(xt)) = ζ·(ut/umax)², with d(·) the Euclidean distance, σc the width of the cost function, ζ the energy loss coefficient, and ut the current control quantity;
(4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ.
In order to enable a robot to autonomously learn complex force control tasks in an unstructured environment and accurately control the contact force in contact operation tasks, the invention provides a novel scheme for learning and adjusting the impedance parameters of the robot using a model-based reinforcement learning algorithm. It mainly comprises a variable impedance controller, a Gaussian process model of the system, a variable impedance control strategy and a strategy learning algorithm. No prior knowledge of the environment is needed: a Gaussian process model of the system is constructed from the interaction data, and long-term reasoning and planning are carried out on the system in a Bayesian manner. In this way, more useful information can be extracted from the limited observed data, and complex force control tasks can be completed with the least interaction time. By adding an energy loss term to the cost function, a balance between error and energy is achieved, and the robot gains good compliance. Finally, the obtained variable impedance control strategy can adjust the target stiffness and damping parameters simultaneously according to the system state at different stages of the task. The invention enables the robot to efficiently and autonomously learn complex force control tasks in an unstructured environment, obtains the optimal control strategy after only a few interactions, and is highly data-efficient; it can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and robot gait control, ensuring the safety and robustness of interactive operation.
The invention solves the high efficiency problem in the robot learning variable impedance control, furthest reduces the interaction time required by learning to complete the force control task by extracting more useful information from observation data, has important reference significance for the robot to realize high-efficiency autonomous learning to complete the compliance control, and can be directly applied to the robot needing contact force control.
Drawings
FIG. 1 is a block diagram of the architecture of the system of the present invention;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a flow chart of the policy learning algorithm of the present invention.
Detailed Description
The invention is described in more detail below by way of example.
FIG. 1 shows the system structure diagram of the learning variable impedance control method; the parts inside the dashed frame are the specific structure of the present invention, comprising the variable impedance controller, the Gaussian process model of the system, the variable impedance control strategy, and the strategy learning algorithm. Specifically:
1) the variable impedance controller corrects the expected reference trajectory according to the time-varying target stiffness, damping coefficient and current contact force error Fe, and calculates and outputs the expected position increment ΔX of the end of the mechanical arm;
2) according to the sampled data, namely the actual position X of the robot end and the force sensor information Fa, a Gaussian process model of the system is established as the transformation dynamics model of the system;
3) the strategy learning algorithm infers and predicts the long-term distribution of the system state through a cascaded one-step prediction process according to the Gaussian process model, then performs internal simulation and predicts the behavior of the system according to the model, and obtains the variable impedance control strategy π by minimizing the expected cost with a model-based reinforcement learning algorithm;
4) the variable impedance control strategy is a probabilistic Gaussian process control strategy, expressed by a mean function and a variance function, which according to the system state, namely the end position X of the mechanical arm and the actual contact force Fa, calculates in real time the impedance parameters, i.e. the target stiffness Kd(t) and damping coefficient Bd(t), and transmits them to the variable impedance controller.
In FIG. 1, Fd is the desired contact force, Xd the desired position, Xe the total desired position of the end of the arm, qd the desired joint position calculated from the inverse kinematics of the robot, q the measured actual joint position, and KE, BE the unknown environmental stiffness and damping, respectively.
The method of the present invention as shown in fig. 2 mainly comprises five steps:
1) Randomly initialize the control variable u = [Kd(t), Bd(t)], apply it to the system, and record the initial data [X, Fa];
2) Based on the historical sample data [X, Fa], establish a Gaussian process dynamic model of the system as the transformation dynamics model of the system;
3) Search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
4) Set the strategy π* ← π(θ), apply it to the system for force control, and collect new data [X, Fa];
5) Repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
(1) Variable impedance controller
To achieve the desired dynamic behavior of the end effector, a second order impedance model is used:
Md(t)ΔẌ + Bd(t)ΔẊ + Kd(t)ΔX = E    (1)
where ΔX = Xe − Xd is the position correction and E = Fd − F the contact force error.
where Md(t), Bd(t), Kd(t) respectively represent the time-varying target inertia, damping and stiffness matrices of the impedance model; Ẍ, Ẋ, X are the actual acceleration, velocity and position of the robot end in Cartesian space; Ẍd, Ẋd, Xd are the desired acceleration, velocity and position of the robot end; and Fd and F are respectively the desired and actual contact force between the robot end and the environment.
To obtain the corrected desired position increment, the second order impedance model is Laplace transformed and discretized using the bilinear transformation s = 2T⁻¹(z − 1)(z + 1)⁻¹, giving:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²    (3)
ω2 = −8Md(t) + 2Kd(t)T²    (4)
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²    (5)
where T is the control period. The difference equation of the impedance controller, i.e. the expected position increment of the end, is:
ω1ΔX(n) = T²[E(n) + 2E(n−1) + E(n−2)] − ω2ΔX(n−1) − ω3ΔX(n−2)    (6)
to simplify the calculation, the target inertia matrix is set to a constant Md(t) is I, so the variable impedance controller needs to have a target stiffness K that varies in timed(t) damping coefficient Bd(t) adjusting the desired position with the contact force error E (n).
(2) Gaussian process model for a system
The Gaussian process model is a nonparametric probabilistic model represented by a mean function m(·) and a positive semi-definite covariance function k(·,·). Let the dynamic equation describing the system be:
xt = f(xt−1, ut−1)    (7)
yt = xt + εt    (8)
where xt is the observable state, here the actual position X of the robot end and the actual contact force Fa; ut = [Kd(t), Bd(t)] is the control input, with Kd(t) the target stiffness and Bd(t) the damping coefficient; the training target is the state increment Δt = xt − xt−1; εt is independent, identically distributed system noise; and f is the function modeled by the Gaussian process, with training input tuples (xt−1, ut−1) and independent, identically distributed measurement noise.
In order to account for model uncertainty in prediction and planning, and to avoid the deterministic-equivalence assumption of a learned model, the posterior distribution of the latent function f is inferred with a Gaussian process from the obtained sampled data, describing all plausible dynamic models. For simplicity of calculation, the prior mean is m ≡ 0 and the squared exponential kernel function is selected:
k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2)    (9)
where α² is the signal variance of the underlying function f, Λ = diag(l1², …, lD²), and li is the characteristic length of each input dimension. Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, the hyperparameters of the Gaussian process model can be learned using an evidence maximization algorithm.
Given a deterministic test input x*, the function value f* = f(x*) has a posterior predictive distribution p(f*|x*) that is Gaussian:
p(f*|x*) = N(mf(x*), σf²(x*))    (11)
mf(x*) = k*ᵀ(K + σε²I)⁻¹y    (12)
σf²(x*) = k(x*, x*) − k*ᵀ(K + σε²I)⁻¹k*    (13)
where k* = k(X, x*) and K = k(X, X) is the kernel function matrix.
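The posterior prediction can be sketched in a few lines. The concrete equations are partly illegible in the source, so this follows the standard (textbook) zero-mean GP regression form, which is what the surrounding symbols describe:

```python
import numpy as np

def gp_posterior(X, y, x_star, alpha2, lengthscales, noise_var):
    """Posterior predictive mean and variance of f(x*) for a zero-mean GP
    with squared-exponential kernel:
        m = k*^T (K + sn^2 I)^-1 y
        v = k(x*, x*) - k*^T (K + sn^2 I)^-1 k*"""
    ell = np.asarray(lengthscales, dtype=float)
    def k(a, b):
        d = (np.asarray(a) - np.asarray(b)) / ell
        return alpha2 * np.exp(-0.5 * np.dot(d, d))
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    k_star = np.array([k(xi, x_star) for xi in X])
    w = np.linalg.solve(K + noise_var * np.eye(len(X)), k_star)
    return w @ np.asarray(y), k(x_star, x_star) - w @ k_star
```

With near-zero noise the posterior mean interpolates the training targets, and the predictive variance shrinks toward zero at the training inputs, which is exactly the behavior the planning stage exploits.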
(3) Variable impedance control strategy
The variable impedance control strategy is defined as ut = π(xt; θ), where xt is the observable state of the robot and the strategy output ut comprises the impedance controller's target stiffness Kd(t) and damping coefficient Bd(t); θ is the control strategy parameter to be learned. A Gaussian process controller is selected as the control strategy π:
where n is the number of basis points of the Gaussian process controller, Xπ are the training inputs, and yπ the training targets, initialized to random values close to zero; each state dimension has its own characteristic length, and the controller has a signal variance and a measurement noise variance, making it functionally similar to an RBF network. The hyperparameters θ of the Gaussian process control strategy π therefore comprise the training inputs, training targets, characteristic lengths, signal variance and noise variance.
In a practical control system, the physical boundary of the control parameter u must be considered; the method selects a bounded and differentiable trapezoidal saturation function to limit the control variable u to the interval [umin, umin + umax]:
(4) strategy learning algorithm
Fig. 3 shows a flowchart of the strategy learning algorithm, which mainly includes five steps:
1) Apply the control strategy π to the system model, namely the Gaussian process model, perform internal simulation, and predict the behavior and performance of the system;
2) Use the learned Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
3) Evaluate the expected total cost Jπ(θ) over horizon T;
4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ;
5) Repeat steps (1)-(4) until the strategy parameter θ converges.
To obtain the optimal control strategy π*, the strategy parameter θ* that minimizes the cost Jπ(θ) is found from the long-term predicted evolution of the state. We use a Gaussian process model to represent the transformation dynamics of the real system and obtain the long-term prediction p(x1), …, p(xT) of the state distribution by cascading one-step predictions. The Gaussian process model can propagate input uncertainty, mapping a Gaussian state distribution to the target space, so that model uncertainty is included in long-term planning and the negative effects of model bias are reduced. The one-step state prediction process can be summarized as:
p(xt−1) → p(ut−1) → p(xt−1, ut−1) → p(Δt) → p(xt)    (18)
If p(xt−1) is known, then to predict p(xt) from p(xt−1) the joint distribution p(xt−1, ut−1) is computed from the distribution of the control variable ut−1 = π(xt−1): first the distribution p(ut−1) of the predicted control variable is calculated, then the cross-covariance cov[xt−1, ut−1], finally yielding the approximate Gaussian joint distribution:
The predicted distribution of the training target Δt is:
where the posterior predictive distribution of the transition function can be calculated according to formulas (11)-(13). The distribution p(Δt) of the training target can be approximated by a Gaussian using moment matching; the desired state distribution p(xt) is then approximately Gaussian with:
μt = μt−1 + μΔ    (22)
Σt = Σt−1 + ΣΔ + cov[xt−1, Δt] + cov[Δt, xt−1]    (23)
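A simple way to sanity-check one cascaded prediction step is a Monte-Carlo stand-in for the analytic moment matching: sample from p(xt−1), push each sample through the policy and the one-step model, and fit a Gaussian to the results. Here `policy` and `gp_step` are placeholders for the learned impedance control strategy and GP dynamics model:

```python
import numpy as np

def propagate_state_mc(mu, Sigma, policy, gp_step, n_samples=5000, seed=0):
    """Monte-Carlo approximation of one step of (18): x ~ N(mu, Sigma),
    u = policy(x), Delta = gp_step(x, u), then fit a Gaussian to x + Delta.
    Used only as a numerical check of the analytic moment-matched moments."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, Sigma, size=n_samples)
    nxt = np.array([x + gp_step(x, policy(x)) for x in xs])
    return nxt.mean(axis=0), np.cov(nxt.T)
```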
to evaluate the performance of the control strategy π, the total expected cost J over time T is usedπ(θ) as an evaluation criterion. And (3) applying a control strategy pi to the system, and calculating the total expected cost according to the long-term evolution of the predicted state:
where c(xt) is the instantaneous cost at time t, and its expected value is taken with respect to the predicted state distribution:
in order to enable the robot to realize the balance between errors and energy minimization, have the variable impedance characteristic, restrain the contact force to ensure safety, and have better compliance, an energy loss term is added into a cost function, and the impedance gain required by completing a task is reduced through a punishment control action. Defining the instantaneous cost function as:
ct = cb(xt) + ce(ut)    (27)
cb(xt) = 1 − exp(−d²(xt, xtarget)/(2σc²))    (28)
ce(ut) = ce(π(xt)) = ζ·(ut/umax)²    (29)
The instantaneous cost function ct mainly comprises two terms. cb(xt) is the state error cost, a quadratic saturation cost function that saturates at 1 when the deviation from the target state is large, where d(·) is the Euclidean distance and σc is the width of the cost function. ce(ut) is the energy loss term, i.e. the mean-square energy loss of the impedance gain, where ζ is the energy loss coefficient, ut is the current control quantity, and umax is the maximum clipping of the control quantity.
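The two-term cost can be sketched as follows. The exponential saturating form of the state-error term is an assumed concrete expression consistent with the description (quadratic near the target, saturating at 1 far from it); the energy term ζ·(u/umax)² follows the text directly:

```python
import numpy as np

def instantaneous_cost(x, x_target, u, sigma_c, zeta, u_max):
    """c_t = c_b(x_t) + c_e(u_t): saturating state-error cost plus
    mean-square energy loss of the impedance gain."""
    d2 = float(np.sum((np.asarray(x) - np.asarray(x_target)) ** 2))
    c_b = 1.0 - np.exp(-d2 / (2.0 * sigma_c ** 2))       # saturates at 1
    c_e = zeta * float(np.sum((np.asarray(u) / u_max) ** 2))
    return c_b + c_e
```

Because cb saturates, distant states all look equally bad, which steers learning toward reaching the target region first; the energy term then prefers the smallest stiffness and damping gains that still achieve it.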
Then, according to the chain rule, the gradient of the expected cost with respect to the controller parameter θ is calculated, and a gradient-based strategy search method is used to obtain the controller parameter θ* that minimizes Jπ(θ).

Claims (6)

1. A learning variable impedance control system is characterized in that: comprises a variable impedance controller, a Gaussian process model module of a system, a variable impedance control strategy module and a strategy learning algorithm module,
a Gaussian process model module of the system establishes a Gaussian process model of the system according to the actual position of the tail end of the robot and the information of the force sensor, and the Gaussian process model is used as a transformation dynamic model of the control system;
the strategy learning algorithm module infers and predicts the long-term distribution of the state of the control system through a cascade one-step prediction process according to a Gaussian process model of the system, and then performs internal simulation and predicts the behavior of the control system according to the model;
the variable impedance control strategy module calculates impedance parameters, namely target rigidity and damping coefficient, in real time according to the state of a control system, namely the tail end position of the mechanical arm and the actual contact force, and transmits the impedance parameters to the variable impedance controller;
and the variable impedance controller corrects the expected reference track according to the time-varying target stiffness, the damping coefficient and the current contact force error, and outputs the expected position increment of the tail end of the mechanical arm.
2. A control method of the learning variable impedance control system according to claim 1, characterized by the following steps:
(1) randomly initialize the control variable u = [Kd(t) Bd(t)], apply it to the control system, and record the initial data [X Fa], where Kd(t) is the target stiffness, Bd(t) is the damping coefficient, X is the mechanical arm end position, and Fa is the actual contact force;
(2) from the historical sampled data [X Fa], establish a Gaussian process dynamics model of the system as its transition dynamics model;
(3) search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
(4) set the strategy π* ← π(θ), apply it to the variable impedance controller for force control, and collect new data [X Fa];
(5) repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
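The five steps of claim 2 form a model-based learning loop: roll out the current policy, refit the model, search for a better policy, repeat. The sketch below shows the loop's shape on a hypothetical one-dimensional plant, substituting a least-squares linear model for the Gaussian process of claim 3 and a random candidate search for the gradient-based strategy search of claim 4; the dynamics, gains, and episode counts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant_step(x, u):
    """Stand-in for the real arm/environment (hypothetical 1-D dynamics)."""
    return 0.9 * x + 0.5 * u + 0.01 * rng.standard_normal()

def fit_model(X, U, Xn):
    """Stand-in for the GP fit of claim 3: a least-squares linear model
    x' ~ a*x + b*u identified from the recorded transitions."""
    A = np.column_stack([X, U])
    coef, *_ = np.linalg.lstsq(A, Xn, rcond=None)
    return coef  # [a, b]

def policy_search(coef, x_target, n_candidates=200, horizon=20):
    """Stand-in for the strategy search of claim 4: score candidate
    feedback gains theta (u = theta*(x_target - x)) by internal
    simulation on the identified model and keep the best."""
    best_theta, best_cost = 0.0, np.inf
    for theta in rng.uniform(-2.0, 2.0, n_candidates):
        x, cost = 0.0, 0.0
        for _ in range(horizon):
            u = theta * (x_target - x)
            x = coef[0] * x + coef[1] * u   # internal simulation, not the plant
            cost += (x - x_target) ** 2
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta

# Steps (1)-(5): act with a random initial policy, record data,
# fit the model, search for a better policy, and repeat.
x_target = 1.0
X, U, Xn = [], [], []
x = 0.0
theta = rng.uniform(-1.0, 1.0)          # (1) random initial control policy
for episode in range(5):
    for _ in range(30):                  # (4) apply the policy, collect data
        u = theta * (x_target - x)
        xn = plant_step(x, u)
        X.append(x); U.append(u); Xn.append(xn)
        x = xn
    coef = fit_model(np.array(X), np.array(U), np.array(Xn))  # (2)
    theta = policy_search(coef, x_target)                     # (3)
```

Each pass through the loop improves the model with fresh data and the policy against the improved model, which is the essence of the claimed method.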
3. The control method of the learning variable impedance control system according to claim 2, wherein establishing the Gaussian process dynamics model of the system specifically comprises:
(1) the Gaussian process model is f(x̃) ~ GP(m, k), where the prior mean is m ≡ 0 and the squared exponential kernel is selected, k(x̃, x̃') = α² exp(-½ Σi (x̃i - x̃'i)²/li²);
(2) take the state and the control quantity as the input tuple of the Gaussian process, and take the state increment as the training target;
(3) given N sets of training inputs X̃ = [x̃1, ..., x̃n] and corresponding training targets y = [y1, ..., yn]^T, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm,
where x̃t = (xt, ut) is the observable state-control tuple, yt = Δt + ε is the training target, Δt is the state increment, ε is independent and identically distributed system noise, α² is the signal variance of the underlying function f, and li is the characteristic length-scale of each input dimension.
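The zero-mean Gaussian process regression of claim 3 can be sketched as follows with the squared exponential kernel. Here the hyperparameters (signal variance α², length-scales li, noise variance) are fixed by hand rather than learned by evidence maximization, and the toy training data stand in for the recorded state-control tuples and state increments.

```python
import numpy as np

def se_kernel(A, B, alpha2, lengthscales):
    """Squared exponential kernel
    k(x, x') = alpha2 * exp(-0.5 * sum_i (x_i - x'_i)^2 / l_i^2)."""
    D = (A[:, None, :] - B[None, :, :]) / lengthscales
    return alpha2 * np.exp(-0.5 * np.sum(D ** 2, axis=-1))

def gp_predict(X, y, Xs, alpha2, lengthscales, noise):
    """Zero-prior-mean GP posterior at test inputs Xs, given training
    inputs X and targets y (e.g. observed state increments)."""
    K = se_kernel(X, X, alpha2, lengthscales) + noise * np.eye(len(X))
    Ks = se_kernel(Xs, X, alpha2, lengthscales)
    mean = Ks @ np.linalg.solve(K, y)
    var = alpha2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var
```

The posterior variance is what lets the strategy learning algorithm of claim 4 account for model uncertainty when it cascades one-step predictions into a long-term state distribution.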
4. The control method of the learning variable impedance control system according to claim 2 or 3, wherein the strategy learning algorithm specifically comprises:
(1) apply the control strategy π to the Gaussian process model of the system, carry out internal simulation, and predict the behavior and performance of the system;
(2) perform long-term inferential prediction of the states p(x1|π), ..., p(xT|π) using the learned Gaussian process model;
(3) evaluate the expected total cost J^π(θ) over the horizon T;
(4) calculate the gradient of the cost with respect to the strategy parameters, dJ^π(θ)/dθ, use a gradient-based strategy search algorithm to find the optimal strategy π* ← π(θ), and update the strategy parameters θ;
(5) repeat steps (1)-(4) until the strategy parameters θ converge.
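Steps (1)-(5) above can be sketched as follows. For simplicity this toy propagates only a point state through a hypothetical deterministic model rather than the full distributions p(x1|π), ..., p(xT|π), uses a quadratic cost in place of the saturating cost of the description, and obtains dJ/dθ by central finite differences rather than the chain rule; the model, policy parameterization, and learning rate are all illustrative assumptions.

```python
def rollout_cost(theta, model_step, x0, x_target, horizon):
    """Steps (1)-(3): cascade one-step model predictions under the policy
    u = theta*(x_target - x) and accumulate a (here quadratic) cost J."""
    x, J = x0, 0.0
    for _ in range(horizon):
        u = theta * (x_target - x)
        x = model_step(x, u)        # internal simulation on the model
        J += (x - x_target) ** 2
    return J

def gradient_policy_search(model_step, x0, x_target, horizon,
                           theta0=0.0, lr=0.05, iters=300, eps=1e-5):
    """Steps (4)-(5): estimate dJ/dtheta by central finite differences,
    take a clipped gradient step, and repeat until theta converges."""
    theta = theta0
    for _ in range(iters):
        g = (rollout_cost(theta + eps, model_step, x0, x_target, horizon)
             - rollout_cost(theta - eps, model_step, x0, x_target, horizon)) / (2 * eps)
        step = lr * max(-5.0, min(5.0, g))   # clip for a stable toy descent
        theta -= step
        if abs(step) < 1e-7:
            break
    return theta
```

With an analytic model such as a Gaussian process, the same gradient is available exactly via the chain rule, which is what makes the claimed strategy search data-efficient.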
5. The control method of the learning variable impedance control system according to claim 2 or 3, characterized in that: the variable impedance controller is a position-based indirect impedance controller, which corrects the desired reference trajectory according to the contact force error, the time-varying target stiffness, and the damping coefficient to obtain the desired position increment ΔX of the mechanical arm end;
the specific form of the variable impedance controller is:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = -8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) - 2Bd(t)T + Kd(t)T²
where T is the control period.
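These ω coefficients are what a Tustin (bilinear) discretization of the target impedance Md ΔẌ + Bd ΔẊ + Kd ΔX = E yields for control period T. The sketch below completes them into a runnable controller; the force-error numerator T²(E(k) + 2E(k-1) + E(k-2)) of the recurrence is reconstructed from that discretization and is an assumption, since the claim does not reproduce the full difference equation.

```python
def impedance_coeffs(Md, Bd, Kd, T):
    """Tustin-discretized target impedance Md*x'' + Bd*x' + Kd*x = E."""
    w1 = 4 * Md + 2 * Bd * T + Kd * T ** 2
    w2 = -8 * Md + 2 * Kd * T ** 2
    w3 = 4 * Md - 2 * Bd * T + Kd * T ** 2
    return w1, w2, w3

class VariableImpedanceController:
    """Position-based indirect impedance law: maps the contact force
    error E(k) to a desired end-position increment dX(k), with the
    stiffness Kd and damping Bd allowed to vary at every step."""
    def __init__(self, Md, T):
        self.Md, self.T = Md, T
        self.E = [0.0, 0.0]    # E(k-1), E(k-2)
        self.dX = [0.0, 0.0]   # dX(k-1), dX(k-2)

    def step(self, E, Kd, Bd):
        w1, w2, w3 = impedance_coeffs(self.Md, Bd, Kd, self.T)
        num = self.T ** 2 * (E + 2 * self.E[0] + self.E[1])
        dX = (num - w2 * self.dX[0] - w3 * self.dX[1]) / w1
        self.E = [E, self.E[0]]      # shift the error history
        self.dX = [dX, self.dX[0]]   # shift the increment history
        return dX
```

A quick consistency check: since ω1 + ω2 + ω3 = 4Kd T², a constant force error E drives dX toward the static compliance E/Kd, as a target impedance should.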
6. The control method of the learning variable impedance control system according to claim 4, characterized in that: the variable impedance controller is a position-based indirect impedance controller, which corrects the desired reference trajectory according to the contact force error, the time-varying target stiffness, and the damping coefficient to obtain the desired position increment ΔX of the mechanical arm end;
the specific form of the variable impedance controller is:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = -8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) - 2Bd(t)T + Kd(t)T²
where T is the control period.
CN201711393308.7A 2017-12-19 2017-12-19 Learning variable impedance control system and control method Active CN108153153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711393308.7A CN108153153B (en) 2017-12-19 2017-12-19 Learning variable impedance control system and control method


Publications (2)

Publication Number Publication Date
CN108153153A true CN108153153A (en) 2018-06-12
CN108153153B CN108153153B (en) 2020-09-11

Family

ID=62464705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711393308.7A Active CN108153153B (en) 2017-12-19 2017-12-19 Learning variable impedance control system and control method

Country Status (1)

Country Link
CN (1) CN108153153B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104626168A (en) * 2014-12-16 2015-05-20 苏州大学 Robot force position compliant control method based on intelligent algorithm
CN105213153A (en) * 2015-09-14 2016-01-06 西安交通大学 Based on the lower limb rehabilitation robot control method of brain flesh information impedance
US20170007308A1 (en) * 2015-07-08 2017-01-12 Research & Business Foundation Sungkyunkwan University Apparatus and method for discriminating biological tissue, surgical apparatus using the apparatus
CN106406098A (en) * 2016-11-22 2017-02-15 西北工业大学 Man-machine interaction control method of robot system in unknown environment
CN106938470A (en) * 2017-03-22 2017-07-11 华中科技大学 A kind of device and method of Robot Force control teaching learning by imitation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUIHUA XIA et al.: "Hybrid force/position control of industrial robotic manipulator based on Kalman filter", 2016 IEEE International Conference on Mechatronics and Automation *
LI Erchao et al.: "Fuzzy adaptive impedance control of robots based on neural network visual servoing", Transactions of China Electrotechnical Society *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108972546A (en) * 2018-06-22 2018-12-11 华南理工大学 A kind of robot constant force curved surface tracking method based on intensified learning
CN109062032A (en) * 2018-10-19 2018-12-21 江苏省(扬州)数控机床研究院 A kind of robot PID impedance control method based on Approximate dynamic inversion
CN109702740A (en) * 2018-12-14 2019-05-03 中国科学院深圳先进技术研究院 Robot compliance control method, apparatus, equipment and storage medium
CN109702740B (en) * 2018-12-14 2020-12-04 中国科学院深圳先进技术研究院 Robot compliance control method, device, equipment and storage medium
CN111352384A (en) * 2018-12-21 2020-06-30 罗伯特·博世有限公司 Method and evaluation unit for controlling an automated or autonomous movement mechanism
US12076865B2 (en) 2019-05-17 2024-09-03 Siemens Aktiengesellschaft Method, computer program product and robot control system for the contact-based localization of objects that can be moved when manipulated by robot, and robot
CN113966264A (en) * 2019-05-17 2022-01-21 西门子股份公司 Method, computer program product and robot control device for positioning an object that is movable during manipulation by a robot on the basis of contact, and robot
CN111673733A (en) * 2020-03-26 2020-09-18 华南理工大学 Intelligent self-adaptive compliance control method of robot in unknown environment
CN111673733B (en) * 2020-03-26 2022-03-29 华南理工大学 Intelligent self-adaptive compliance control method of robot in unknown environment
CN111687833A (en) * 2020-04-30 2020-09-22 广西科技大学 Manipulator inverse priority impedance control system and control method
CN111687832A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant manipulator of space manipulator
CN111687835A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant manipulator of underwater manipulator
CN111687834A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant mechanical arm of mobile manipulator
CN111640495B (en) * 2020-05-29 2024-05-31 北京机械设备研究所 Variable force tracking control method and device based on impedance control
CN111640495A (en) * 2020-05-29 2020-09-08 北京机械设备研究所 Variable force tracking control method and device based on impedance control
CN111904795A (en) * 2020-08-28 2020-11-10 中山大学 Variable impedance control method for rehabilitation robot combined with trajectory planning
CN111904795B (en) * 2020-08-28 2022-08-26 中山大学 Variable impedance control method for rehabilitation robot combined with trajectory planning
CN112372630B (en) * 2020-09-24 2022-02-22 哈尔滨工业大学(深圳) Multi-mechanical-arm cooperative polishing force compliance control method and system
CN112372630A (en) * 2020-09-24 2021-02-19 哈尔滨工业大学(深圳) Multi-mechanical-arm cooperative polishing force compliance control method and system
CN112428278A (en) * 2020-10-26 2021-03-02 北京理工大学 Control method and device of mechanical arm and training method of man-machine cooperation model
CN112743540A (en) * 2020-12-09 2021-05-04 华南理工大学 Hexapod robot impedance control method based on reinforcement learning
CN112743540B (en) * 2020-12-09 2022-05-24 华南理工大学 Hexapod robot impedance control method based on reinforcement learning
CN112859868A (en) * 2021-01-19 2021-05-28 武汉大学 KMP (Kernel Key P) -based lower limb exoskeleton rehabilitation robot and motion trajectory planning algorithm
CN113427483A (en) * 2021-05-19 2021-09-24 广州中国科学院先进技术研究所 Double-machine manpower/bit multivariate data driving method based on reinforcement learning
CN113641099A (en) * 2021-07-13 2021-11-12 西北工业大学 Impedance control imitation learning training method for surpassing expert demonstration
CN113641099B (en) * 2021-07-13 2023-02-10 西北工业大学 Impedance control imitation learning training method for surpassing expert demonstration
CN114378820A (en) * 2022-01-18 2022-04-22 中山大学 Robot impedance learning method based on safety reinforcement learning
CN114193458B (en) * 2022-01-25 2024-04-09 中山大学 Robot control method based on Gaussian process online learning
CN114193458A (en) * 2022-01-25 2022-03-18 中山大学 Robot control method based on Gaussian process online learning
CN114789444B (en) * 2022-05-05 2022-12-16 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN114789444A (en) * 2022-05-05 2022-07-26 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN115496099A (en) * 2022-09-20 2022-12-20 哈尔滨工业大学 Filtering and high-order state observation method for mechanical arm sensor
CN115421387A (en) * 2022-09-22 2022-12-02 中国科学院自动化研究所 Variable impedance control system and control method based on inverse reinforcement learning
CN115723139A (en) * 2022-12-02 2023-03-03 哈尔滨工业大学(深圳) Method and device for flexibly controlling operation space of rope-driven flexible mechanical arm
CN116643501A (en) * 2023-07-18 2023-08-25 湖南大学 Variable impedance control method and system for aerial working robot under stability constraint
CN116643501B (en) * 2023-07-18 2023-10-24 湖南大学 Variable impedance control method and system for aerial working robot under stability constraint
CN117817674A (en) * 2024-03-05 2024-04-05 纳博特控制技术(苏州)有限公司 Self-adaptive impedance control method for robot

Also Published As

Publication number Publication date
CN108153153B (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN108153153B (en) Learning variable impedance control system and control method
Carron et al. Data-driven model predictive control for trajectory tracking with a robotic arm
CN114761966A (en) System and method for robust optimization for trajectory-centric model-based reinforcement learning
Tutsoy et al. Design of a completely model free adaptive control in the presence of parametric, non-parametric uncertainties and random control signal delay
CN110647042B (en) Robot robust learning prediction control method based on data driving
Cutler et al. Efficient reinforcement learning for robots using informative simulated priors
JP7301034B2 (en) System and Method for Policy Optimization Using Quasi-Newton Trust Region Method
Qi et al. Stable indirect adaptive control based on discrete-time T–S fuzzy model
CN112571420B (en) Dual-function model prediction control method under unknown parameters
Li Robot target localization and interactive multi-mode motion trajectory tracking based on adaptive iterative learning
CN111399375A (en) Neural network prediction controller based on nonlinear system
McKinnon et al. Learning probabilistic models for safe predictive control in unknown environments
CN116460860A (en) Model-based robot offline reinforcement learning control method
JP5220542B2 (en) Controller, control method and control program
Le et al. ADMM-based adaptive sampling strategy for nonholonomic mobile robotic sensor networks
Komeno et al. Deep koopman with control: Spectral analysis of soft robot dynamics
Hager et al. Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design
He et al. Adaptive robust control of uncertain euler–lagrange systems using gaussian processes
CN116880184A (en) Unmanned ship track tracking prediction control method, unmanned ship track tracking prediction control system and storage medium
CN116048085B (en) Fault estimation and fault-tolerant iterative learning control method for mobile robot
Zhou et al. Launch vehicle adaptive flight control with incremental model based heuristic dynamic programming
Wang et al. A data driven method of feedforward compensator optimization for autonomous vehicle control
CN107894709A Visual servo control of redundant robots based on adaptive critic networks
Tan et al. Edge-Enabled Adaptive Shape Estimation of 3-D Printed Soft Actuators With Gaussian Processes and Unscented Kalman Filters
Yan et al. A neural network approach to nonlinear model predictive control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant