CN108153153A - Learning variable impedance control system and control method - Google Patents

Learning variable impedance control system and control method

Info

Publication number
CN108153153A
CN108153153A (application CN201711393308.7A, granted as CN108153153B)
Authority
CN
China
Prior art keywords
control
strategy
variable impedance
gaussian process
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711393308.7A
Other languages
Chinese (zh)
Other versions
CN108153153B (en)
Inventor
夏桂华
李超
张智
谢心如
朱齐丹
蔡成涛
吕晓龙
刘志林
班瑞阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201711393308.7A priority Critical patent/CN108153153B/en
Publication of CN108153153A publication Critical patent/CN108153153A/en
Application granted granted Critical
Publication of CN108153153B publication Critical patent/CN108153153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/048Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators using a predictor

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention provides a learning variable impedance control system and control method, mainly comprising four parts: a variable impedance controller, a Gaussian process model of the system, a variable impedance control strategy, and a strategy learning algorithm. No prior knowledge of the environment is required: the Gaussian process model of the system is built from interaction data, and long-term inference and planning are carried out on the system in a Bayesian manner. In this way, more useful information is extracted from limited observation data, and complex force control tasks can be learned with minimal interaction time. By adding an energy loss term to the cost function, a trade-off between error and energy is achieved, giving the robot good compliance. Finally, the learned variable impedance control strategy can adjust the target stiffness and damping parameters simultaneously according to the system state at different stages of a task. The invention can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and biped robot gait control, ensuring the safety and robustness of interactive operation.

Description

Learning variable impedance control system and control method
Technical Field
The invention relates to compliance control of a robot, in particular to a high-efficiency learning variable impedance control system and a control method.
Background
With the increasing application of robots to contact operation tasks in unstructured environments, such as compliant assembly and human-robot interaction, it is difficult to establish an accurate dynamic model of the system because tasks are complex and contact environments are variable and unpredictable. How to enable the robot to execute new tasks safely, efficiently and quickly, and to accurately control contact forces in different environments, is therefore a new challenge for robotics. Impedance control is widely applied to robot interaction control tasks owing to its good adaptability and robustness. Since the force control characteristics are determined by the impedance parameters of the robot, the selection of inertia, stiffness and damping parameters is highly task-dependent and often difficult to infer a priori; to obtain good control performance, in-depth knowledge of the controller design and its parameters is usually necessary, and the control parameters still have to be adjusted manually. For complex tasks in particular, a fixed-parameter impedance control method struggles to achieve the target task, because the environmental conditions usually contain nonlinear and time-varying factors. If the impedance control parameters can be dynamically planned and adjusted as tasks and environments change, the control performance is clearly better than with fixed impedance parameters. Learning variable impedance control is therefore key to a modern robot safely and quickly completing complex operation tasks.
For operation tasks requiring force control, the fewer the learning explorations the better: a large number of physical interaction attempts may damage the robot or workpiece, and collecting large amounts of sampled data is time-consuming, expensive and impractical. Improving the learning efficiency of the learning variable impedance control algorithm and reducing the required number of trial-and-error interactions is therefore very important for a robot to quickly learn and complete a new task.
Disclosure of Invention
The invention aims to provide a learning variable impedance control system with high learning efficiency that can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and robot gait control while ensuring the safety and robustness of interactive operation. The invention further aims to provide a control method based on this learning variable impedance control system.
The learning variable impedance control system of the invention comprises a variable impedance controller, a Gaussian process model module of the system, a variable impedance control strategy module and a strategy learning algorithm module,
a Gaussian process model module of the system establishes a Gaussian process model of the system according to the actual position of the tail end of the robot and the information of the force sensor, and the Gaussian process model is used as a transformation dynamic model of the control system;
the strategy learning algorithm module infers and predicts the long-term distribution of the state of the control system through a cascade one-step prediction process according to a Gaussian process model of the system, and then performs internal simulation and predicts the behavior of the control system according to the model;
the variable impedance control strategy module calculates impedance parameters, namely target rigidity and damping coefficient, in real time according to the state of a control system, namely the tail end position of the mechanical arm and the actual contact force, and transmits the impedance parameters to the variable impedance controller;
and the variable impedance controller corrects the expected reference track according to the time-varying target stiffness, the damping coefficient and the current contact force error, and outputs the expected position increment of the tail end of the mechanical arm.
The control method of the learning variable impedance control system based on the invention comprises the following steps:
(1) Randomly initialize the control variable u = [Kd(t), Bd(t)] and apply it to the control system, recording the initial data [X, Fa], where Kd(t) is the target stiffness, Bd(t) the damping coefficient, X the position of the end of the mechanical arm, and Fa the actual contact force;
(2) Based on the historical sample data [X, Fa], establish a Gaussian process dynamic model of the system as the transformation dynamics model of the system;
(3) Search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
(4) Set the strategy π* ← π(θ), apply it to the variable impedance controller for force control, and collect new data [X, Fa];
(5) Repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
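The five steps above can be sketched as a data-efficient learning loop. The callables `run_episode`, `train_gp` and `optimize_policy` are illustrative placeholders (assumptions, not part of the patent) for the real-system rollout, the Gaussian process model fitting, and the strategy search:

```python
import numpy as np

def learn_variable_impedance(run_episode, train_gp, optimize_policy,
                             n_iterations=5, u_dim=2, seed=0):
    """Skeleton of steps (1)-(5): alternate between fitting a GP dynamics
    model on all data collected so far and improving the policy on it."""
    rng = np.random.default_rng(seed)
    # (1) random initial control variable u = [Kd(t), Bd(t)]
    policy = lambda x: 0.01 * rng.standard_normal(u_dim)
    data = [run_episode(policy)]            # record initial data [X, Fa]
    for _ in range(n_iterations):
        gp = train_gp(data)                 # (2) GP dynamics model
        policy = optimize_policy(gp)        # (3)-(4) pi* <- pi(theta)
        data.append(run_episode(policy))    # collect new data [X, Fa]
    return policy, data                     # (5) repeat until satisfactory
```

Each outer iteration adds exactly one new interaction episode, which is what keeps the number of physical trials small.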
The control method of the learning variable impedance control system based on the invention can also comprise the following steps:
1. the establishing of the Gaussian process dynamic model of the system specifically comprises the following steps:
(1) The Gaussian process model is f ~ GP(m, k), where the prior mean is m ≡ 0 and the squared exponential kernel function is selected as k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2) with Λ = diag(l1², …, lD²);
(2) Taking the state and the control quantity as input tuples of a Gaussian process, and taking the state increment as a training target;
(3) Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm,
where xt is the observable state, yt the training target, Δt the state increment, εt independent and identically distributed system noise, α² the signal variance of the underlying function f, and li the characteristic length of each input dimension.
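For illustration, the squared exponential kernel with signal variance α² and per-dimension characteristic lengths li can be written as a short sketch (function name and argument layout are assumptions):

```python
import numpy as np

def se_kernel(xp, xq, alpha2, lengthscales):
    """Squared-exponential kernel k(xp, xq) = alpha2 * exp(-0.5 * (xp-xq)^T
    Lambda^-1 (xp-xq)) with Lambda = diag(l_1^2, ..., l_D^2); alpha2 is the
    signal variance and l_i the characteristic length of input dimension i."""
    d = (np.asarray(xp) - np.asarray(xq)) / np.asarray(lengthscales)
    return alpha2 * np.exp(-0.5 * np.dot(d, d))
```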
2. The policy learning algorithm specifically includes:
(1) Apply the control strategy π to the Gaussian process model of the system, perform internal simulation, and predict the behavior and performance of the system;
(2) Use the learned Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
(3) Evaluate the expected total cost Jπ(θ) over horizon T;
(4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ;
(5) Repeat steps (1)-(4) until the strategy parameter θ converges.
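The gradient-based search in steps (3)-(5) can be sketched as follows. Here `expected_cost(theta)` stands in for the model-based rollout cost Jπ(θ), and the central finite-difference gradient is an illustrative substitute for the analytic dJπ(θ)/dθ used in the patent:

```python
import numpy as np

def policy_search(expected_cost, theta0, lr=0.05, tol=1e-8, max_iter=500, eps=1e-5):
    """Minimize expected_cost over theta by gradient descent; stop when the
    parameter update is below tol (step (5): theta has converged)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        grad = np.array([(expected_cost(theta + eps * e)
                          - expected_cost(theta - eps * e)) / (2 * eps)
                         for e in np.eye(theta.size)])
        theta = theta - lr * grad            # (4) update theta along -dJ/dtheta
        if np.linalg.norm(lr * grad) < tol:  # (5) stop on convergence
            break
    return theta
```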
3. The variable impedance controller is an indirect impedance controller based on a position, and corrects an expected reference track according to a contact force error, time-varying target rigidity and a damping coefficient to obtain an expected position increment delta X of the tail end of the mechanical arm;
the specific form of the variable impedance controller is as follows:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = −8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²
where T is the control period.
The invention provides an efficient learning variable impedance control system and a control method, and aims to realize that a robot learns to complete a force control task efficiently and autonomously.
The technical scheme of the invention is mainly characterized in that:
(1) the variable impedance controller can correct the expected reference track according to the time-varying target rigidity and the damping coefficient;
(2) the Gaussian process model of the system is a probabilistic model of the system established according to actual sampling data and used as a transformation dynamic model of the system;
(3) the variable impedance control strategy is a probabilistic Gaussian process control strategy, expressed by a mean function and a variance function, which according to the system state, namely the end position X of the mechanical arm and the actual contact force Fa, calculates in real time the impedance parameters, i.e. the target stiffness Kd(t) and damping coefficient Bd(t), and transmits them to the variable impedance controller;
(4) the strategy learning algorithm is a model-based reinforcement learning algorithm: it infers and predicts the long-term distribution of the system state through a cascaded one-step prediction process, then performs internal simulation and predicts the behavior of the system according to the model; an energy loss term is added to the cost function, reducing the impedance gain required to complete the task by penalizing the control action and thereby balancing error against energy minimization.
The variable impedance controller is specifically as follows:
(1) the variable impedance controller is an indirect impedance controller based on a position, and corrects an expected reference track according to a contact force error, time-varying target rigidity and a damping coefficient to obtain an expected position increment delta X of the tail end of the mechanical arm;
(2) the control period is T, Md(t)、Kd(t)、Bd(t) is a target inertia, a time-varying target stiffness and a time-varying damping coefficient, respectively, and the specific form of the variable impedance controller is as follows:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = −8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²
the Gaussian process model of the system is specifically as follows:
(1) The Gaussian process model is f ~ GP(m, k), where the prior mean is taken as m ≡ 0 and the squared exponential kernel function is selected as k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2) with Λ = diag(l1², …, lD²);
(2) Taking the state and the control quantity as input tuples of a Gaussian process, and taking the state increment as a training target;
(3) Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm.
The variable impedance control strategy specifically comprises the following steps:
(1) The variable impedance control strategy is a Gaussian process controller ut = π(xt; θ): a probabilistic Gaussian process control strategy expressed by a mean function and a variance function, where xt is the observable state of the robot and the strategy output ut comprises the impedance controller's target stiffness Kd(t) and damping coefficient Bd(t); θ is the control strategy parameter to be learned;
(2) A bounded, differentiable trapezoidal saturation function S(πt) = umin + umax/2 + (umax/2)·[9sin(πt) + sin(3πt)]/8 enforces the physical boundary of the control parameter u, limiting the control variable to the interval [umin, umin + umax], where umax is the maximum clipping and umin the minimum clipping of the control variable.
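The saturation function can be sketched directly. Note that the placement of the 1/2 factors is an assumption reconstructed from the stated output interval [umin, umin + umax], since the formula is garbled in the source; the inner term (9sin x + sin 3x)/8 is exactly bounded in [−1, 1]:

```python
import numpy as np

def trapezoidal_saturation(pi_t, u_min, u_max):
    """Bounded, differentiable saturation of the raw policy output pi_t.
    sigma(x) = (9 sin x + sin 3x)/8 lies in [-1, 1], so the output lies in
    [u_min, u_min + u_max].  The 1/2 factors are a reconstruction (assumption)
    consistent with that interval."""
    sigma = (9.0 * np.sin(pi_t) + np.sin(3.0 * pi_t)) / 8.0
    return u_min + 0.5 * u_max + 0.5 * u_max * sigma
```

Since sin(3x) = 3sin x − 4sin³x, the inner term equals (3s − s³)/2 with s = sin x, which is monotone on [−1, 1] with extremes ±1, so the bound is tight and the function stays differentiable everywhere.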
The strategy learning algorithm specifically comprises the following steps:
(1) Apply the control strategy π to the system model, namely the Gaussian process model, perform internal simulation, and predict the behavior and performance of the system;
(2) Use the Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
(3) Evaluate the expected total cost Jπ(θ) over horizon T, where the instantaneous cost function comprises a state error cost term cb(xt) and an energy loss term ce(ut) = ce(π(xt)) = ζ·(ut/umax)², with d(·) the Euclidean distance, σc the width of the cost function, ζ the energy loss coefficient, and ut the current control quantity;
(4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ.
In order to enable a robot to autonomously learn complex force control tasks in an unstructured environment and accurately control the contact force in contact operation tasks, the invention provides a novel scheme for learning and adjusting the impedance parameters of the robot using a model-based reinforcement learning algorithm. It mainly comprises a variable impedance controller, a Gaussian process model of the system, a variable impedance control strategy and a strategy learning algorithm. No prior knowledge of the environment is needed: a Gaussian process model of the system is constructed from the interaction data, and long-term reasoning and planning are carried out on the system in a Bayesian manner. In this way, more useful information can be extracted from the limited observed data, and complex force control tasks can be completed with the least interaction time. By adding an energy loss term to the cost function, a balance between error and energy is achieved, and the robot gains good compliance. Finally, the obtained variable impedance control strategy can adjust the target stiffness and damping parameters simultaneously according to the system state at different stages of the task. The invention enables the robot to efficiently and autonomously learn complex force control tasks in an unstructured environment, obtains the optimal control strategy after only a few interactions, and is highly data-efficient; it can be widely applied to compliance control tasks such as dual-arm assembly, multi-arm cooperation and robot gait control, ensuring the safety and robustness of interactive operation.
The invention solves the high efficiency problem in the robot learning variable impedance control, furthest reduces the interaction time required by learning to complete the force control task by extracting more useful information from observation data, has important reference significance for the robot to realize high-efficiency autonomous learning to complete the compliance control, and can be directly applied to the robot needing contact force control.
Drawings
FIG. 1 is a block diagram of the architecture of the system of the present invention;
FIG. 2 is a flow chart of the method of the present invention;
FIG. 3 is a flow chart of the policy learning algorithm of the present invention.
Detailed Description
The invention is described in more detail below by way of example.
FIG. 1 shows the system structure diagram of the learning variable impedance control method; the parts inside the dashed frame are the specific structure of the present invention, comprising the variable impedance controller, the Gaussian process model of the system, the variable impedance control strategy, and the strategy learning algorithm. Specifically:
1) the variable impedance controller corrects the expected reference trajectory according to the time-varying target stiffness, damping coefficient and current contact force error Fe, and calculates and outputs the expected position increment ΔX of the end of the mechanical arm;
2) according to the sampled data, namely the actual position X of the robot end and the force sensor information Fa, a Gaussian process model of the system is established as the transformation dynamics model of the system;
3) the strategy learning algorithm infers and predicts the long-term distribution of the system state through a cascaded one-step prediction process according to the Gaussian process model, then performs internal simulation and predicts the behavior of the system according to the model, and obtains the variable impedance control strategy π by minimizing the expected cost with a model-based reinforcement learning algorithm;
4) the variable impedance control strategy is a probabilistic Gaussian process control strategy, expressed by a mean function and a variance function, which according to the system state, namely the end position X of the mechanical arm and the actual contact force Fa, calculates in real time the impedance parameters, i.e. the target stiffness Kd(t) and damping coefficient Bd(t), and transmits them to the variable impedance controller.
In FIG. 1, Fd is the desired contact force, Xd the desired position, Xe the total desired position of the end of the arm, qd the desired joint position calculated from the inverse kinematics of the robot, q the measured actual joint position, and KE, BE the unknown environmental stiffness and damping, respectively.
The method of the present invention as shown in fig. 2 mainly comprises five steps:
1) Randomly initialize the control variable u = [Kd(t), Bd(t)], apply it to the system, and record the initial data [X, Fa];
2) Based on the historical sample data [X, Fa], establish a Gaussian process dynamic model of the system as the transformation dynamics model of the system;
3) Search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
4) Set the strategy π* ← π(θ), apply it to the system for force control, and collect new data [X, Fa];
5) Repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
(1) Variable impedance controller
To achieve the desired dynamic behavior of the end effector, a second order impedance model is used:
Md(t)ΔẌ + Bd(t)ΔẊ + Kd(t)ΔX = E    (1)
where ΔX = Xe − Xd is the position correction and E = Fd − F the contact force error.
where Md(t), Bd(t), Kd(t) respectively represent the time-varying target inertia, damping and stiffness matrices of the impedance model; Ẍ, Ẋ, X are the actual acceleration, velocity and position of the robot end in Cartesian space; Ẍd, Ẋd, Xd are the desired acceleration, velocity and position of the robot end; and Fd and F are respectively the desired and actual contact force between the robot end and the environment.
To obtain the corrected desired position increment, the second order impedance model is Laplace transformed and discretized using the bilinear transformation s = 2T⁻¹(z − 1)(z + 1)⁻¹, giving:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²    (3)
ω2 = −8Md(t) + 2Kd(t)T²    (4)
ω3 = 4Md(t) − 2Bd(t)T + Kd(t)T²    (5)
where T is the control period. The difference equation of the impedance controller, i.e. the expected position increment of the end, is:
ω1ΔX(n) = T²[E(n) + 2E(n−1) + E(n−2)] − ω2ΔX(n−1) − ω3ΔX(n−2)    (6)
to simplify the calculation, the target inertia matrix is set to a constant Md(t) is I, so the variable impedance controller needs to have a target stiffness K that varies in timed(t) damping coefficient Bd(t) adjusting the desired position with the contact force error E (n).
(2) Gaussian process model for a system
The Gaussian process model is a nonparametric probabilistic model represented by a mean function m(·) and a positive semi-definite covariance function k(·,·). Let the dynamic equation describing the system be:
xt = f(xt−1, ut−1)    (7)
yt = xt + εt    (8)
where xt is the observable state, here the actual position X of the robot end and the actual contact force Fa; ut = [Kd(t), Bd(t)] is the control input, with Kd(t) the target stiffness and Bd(t) the damping coefficient; the training target is the state increment Δt = xt − xt−1; εt is independent, identically distributed system noise; and f is the function modeled by the Gaussian process, with training input tuples (xt−1, ut−1) and independent, identically distributed measurement noise.
In order to account for model uncertainty in prediction and planning, and to avoid the deterministic-equivalence assumption of a learned model, the posterior distribution of the latent function f is inferred with a Gaussian process from the obtained sampled data, describing all plausible dynamic models. For simplicity of calculation, the prior mean is m ≡ 0 and the squared exponential kernel function is selected:
k(xp, xq) = α²exp(−(xp − xq)ᵀΛ⁻¹(xp − xq)/2)    (9)
where α² is the signal variance of the underlying function f, Λ = diag(l1², …, lD²), and li is the characteristic length of each input dimension. Given N training inputs X = [x1, …, xN] and corresponding training targets y = [y1, …, yN]ᵀ, the hyperparameters of the Gaussian process model can be learned using an evidence maximization algorithm.
Given a deterministic test input x*, the function value f* = f(x*) has a posterior predictive distribution p(f*|x*) that is Gaussian:
p(f*|x*) = N(mf(x*), σf²(x*))    (11)
mf(x*) = k*ᵀ(K + σε²I)⁻¹y    (12)
σf²(x*) = k(x*, x*) − k*ᵀ(K + σε²I)⁻¹k*    (13)
where k* = k(X, x*) and K = k(X, X) is the kernel function matrix.
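The posterior prediction can be sketched in a few lines. The concrete equations are partly illegible in the source, so this follows the standard (textbook) zero-mean GP regression form, which is what the surrounding symbols describe:

```python
import numpy as np

def gp_posterior(X, y, x_star, alpha2, lengthscales, noise_var):
    """Posterior predictive mean and variance of f(x*) for a zero-mean GP
    with squared-exponential kernel:
        m = k*^T (K + sn^2 I)^-1 y
        v = k(x*, x*) - k*^T (K + sn^2 I)^-1 k*"""
    ell = np.asarray(lengthscales, dtype=float)
    def k(a, b):
        d = (np.asarray(a) - np.asarray(b)) / ell
        return alpha2 * np.exp(-0.5 * np.dot(d, d))
    K = np.array([[k(xi, xj) for xj in X] for xi in X])
    k_star = np.array([k(xi, x_star) for xi in X])
    w = np.linalg.solve(K + noise_var * np.eye(len(X)), k_star)
    return w @ np.asarray(y), k(x_star, x_star) - w @ k_star
```

With near-zero noise the posterior mean interpolates the training targets, and the predictive variance shrinks toward zero at the training inputs, which is exactly the behavior the planning stage exploits.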
(3) Variable impedance control strategy
The variable impedance control strategy is defined as ut = π(xt; θ), where xt is the observable state of the robot and the strategy output ut comprises the impedance controller's target stiffness Kd(t) and damping coefficient Bd(t); θ is the control strategy parameter to be learned. A Gaussian process controller is selected as the control strategy π:
where n is the number of basis points of the Gaussian process controller, Xπ are the training inputs, and yπ the training targets, initialized to random values close to zero; each state dimension has its own characteristic length, and the controller has a signal variance and a measurement noise variance, making it functionally similar to an RBF network. The hyperparameters θ of the Gaussian process control strategy π therefore comprise the training inputs, training targets, characteristic lengths, signal variance and noise variance.
In a practical control system, the physical boundary of the control parameter u must be considered; the method selects a bounded and differentiable trapezoidal saturation function to limit the control variable u to the interval [umin, umin + umax]:
(4) strategy learning algorithm
Fig. 3 shows a flowchart of the strategy learning algorithm, which mainly includes five steps:
1) Apply the control strategy π to the system model, namely the Gaussian process model, perform internal simulation, and predict the behavior and performance of the system;
2) Use the learned Gaussian process model to make long-term inferential predictions of the state, p(x1|π), …, p(xT|π);
3) Evaluate the expected total cost Jπ(θ) over horizon T;
4) Calculate the gradient of the cost with respect to the strategy parameters, dJπ(θ)/dθ, search for the optimal strategy π* ← π(θ) with a gradient-based strategy search algorithm, and update the strategy parameter θ;
5) Repeat steps (1)-(4) until the strategy parameter θ converges.
To obtain the optimal control strategy π*, the strategy parameter θ* that minimizes the cost Jπ(θ) is found from the long-term predicted evolution of the state. We use a Gaussian process model to represent the transformation dynamics of the real system and obtain the long-term prediction p(x1), …, p(xT) of the state distribution by cascading one-step predictions. The Gaussian process model can propagate input uncertainty, mapping a Gaussian state distribution to the target space, so that model uncertainty is included in long-term planning and the negative effects of model bias are reduced. The one-step state prediction process can be summarized as:
p(xt−1) → p(ut−1) → p(xt−1, ut−1) → p(Δt) → p(xt)    (18)
If p(xt−1) is known, then to predict p(xt) from p(xt−1) the joint distribution p(xt−1, ut−1) is computed from the distribution of the control variable ut−1 = π(xt−1): first the distribution p(ut−1) of the predicted control variable is calculated, then the cross-covariance cov[xt−1, ut−1], finally yielding the approximate Gaussian joint distribution:
The predicted distribution of the training target Δt is:
where the posterior predictive distribution of the transition function can be calculated according to formulas (11)-(13). The distribution p(Δt) of the training target can be approximated by a Gaussian using moment matching; the desired state distribution p(xt) is then approximately Gaussian with:
μt = μt−1 + μΔ    (22)
Σt = Σt−1 + ΣΔ + cov[xt−1, Δt] + cov[Δt, xt−1]    (23)
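A simple way to sanity-check one cascaded prediction step is a Monte-Carlo stand-in for the analytic moment matching: sample from p(xt−1), push each sample through the policy and the one-step model, and fit a Gaussian to the results. Here `policy` and `gp_step` are placeholders for the learned impedance control strategy and GP dynamics model:

```python
import numpy as np

def propagate_state_mc(mu, Sigma, policy, gp_step, n_samples=5000, seed=0):
    """Monte-Carlo approximation of one step of (18): x ~ N(mu, Sigma),
    u = policy(x), Delta = gp_step(x, u), then fit a Gaussian to x + Delta.
    Used only as a numerical check of the analytic moment-matched moments."""
    rng = np.random.default_rng(seed)
    xs = rng.multivariate_normal(mu, Sigma, size=n_samples)
    nxt = np.array([x + gp_step(x, policy(x)) for x in xs])
    return nxt.mean(axis=0), np.cov(nxt.T)
```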
to evaluate the performance of the control strategy π, the total expected cost J over time T is usedπ(θ) as an evaluation criterion. And (3) applying a control strategy pi to the system, and calculating the total expected cost according to the long-term evolution of the predicted state:
where c(xt) is the instantaneous cost at time t, and its expected value is taken with respect to the predicted state distribution:
in order to enable the robot to realize the balance between errors and energy minimization, have the variable impedance characteristic, restrain the contact force to ensure safety, and have better compliance, an energy loss term is added into a cost function, and the impedance gain required by completing a task is reduced through a punishment control action. Defining the instantaneous cost function as:
ct = cb(xt) + ce(ut)    (27)
cb(xt) = 1 − exp(−d²(xt, xtarget)/(2σc²))    (28)
ce(ut) = ce(π(xt)) = ζ·(ut/umax)²    (29)
The instantaneous cost function ct mainly comprises two terms. cb(xt) is the state error cost, a quadratic saturation cost function that saturates at 1 when the deviation from the target state is large, where d(·) is the Euclidean distance and σc is the width of the cost function. ce(ut) is the energy loss term, i.e. the mean-square energy loss of the impedance gain, where ζ is the energy loss coefficient, ut is the current control quantity, and umax is the maximum clipping of the control quantity.
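The two-term cost can be sketched as follows. The exponential saturating form of the state-error term is an assumed concrete expression consistent with the description (quadratic near the target, saturating at 1 far from it); the energy term ζ·(u/umax)² follows the text directly:

```python
import numpy as np

def instantaneous_cost(x, x_target, u, sigma_c, zeta, u_max):
    """c_t = c_b(x_t) + c_e(u_t): saturating state-error cost plus
    mean-square energy loss of the impedance gain."""
    d2 = float(np.sum((np.asarray(x) - np.asarray(x_target)) ** 2))
    c_b = 1.0 - np.exp(-d2 / (2.0 * sigma_c ** 2))       # saturates at 1
    c_e = zeta * float(np.sum((np.asarray(u) / u_max) ** 2))
    return c_b + c_e
```

Because cb saturates, distant states all look equally bad, which steers learning toward reaching the target region first; the energy term then prefers the smallest stiffness and damping gains that still achieve it.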
Then, according to the chain rule, the gradient of the expected cost with respect to the controller parameter θ is calculated, and a gradient-based strategy search method is used to obtain the controller parameter θ* that minimizes Jπ(θ).

Claims (6)

1. A learning variable impedance control system is characterized in that: comprises a variable impedance controller, a Gaussian process model module of a system, a variable impedance control strategy module and a strategy learning algorithm module,
a Gaussian process model module of the system establishes a Gaussian process model of the system according to the actual position of the tail end of the robot and the information of the force sensor, and the Gaussian process model is used as a transformation dynamic model of the control system;
the strategy learning algorithm module infers and predicts the long-term distribution of the state of the control system through a cascade one-step prediction process according to a Gaussian process model of the system, and then performs internal simulation and predicts the behavior of the control system according to the model;
the variable impedance control strategy module calculates impedance parameters, namely target rigidity and damping coefficient, in real time according to the state of a control system, namely the tail end position of the mechanical arm and the actual contact force, and transmits the impedance parameters to the variable impedance controller;
and the variable impedance controller corrects the expected reference track according to the time-varying target stiffness, the damping coefficient and the current contact force error, and outputs the expected position increment of the tail end of the mechanical arm.
2. A control method of the learning variable impedance control system according to claim 1, characterized by the following steps:
(1) randomly initialize the control variable u = [Kd(t) Bd(t)], apply it to the control system, and record the initial data [X Fa], where Kd(t) is the target stiffness, Bd(t) is the damping coefficient, X is the mechanical arm end position, and Fa is the actual contact force;
(2) from the historical sampled data [X Fa], establish a Gaussian process dynamics model of the system as its transition dynamics model;
(3) search for the optimal impedance control strategy π(θ) using the strategy learning algorithm;
(4) set the strategy π* ← π(θ), apply it to the variable impedance controller for force control, and collect new data [X Fa];
(5) repeat steps (2)-(4) until a satisfactory force tracking effect is obtained and a satisfactory control strategy is learned.
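The five steps of claim 2 form a model-based learning loop: roll out the current policy, refit the model, search for a better policy, repeat. The sketch below shows the loop's shape on a hypothetical one-dimensional plant, substituting a least-squares linear model for the Gaussian process of claim 3 and a random candidate search for the gradient-based strategy search of claim 4; the dynamics, gains, and episode counts are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def plant_step(x, u):
    """Stand-in for the real arm/environment (hypothetical 1-D dynamics)."""
    return 0.9 * x + 0.5 * u + 0.01 * rng.standard_normal()

def fit_model(X, U, Xn):
    """Stand-in for the GP fit of claim 3: a least-squares linear model
    x' ~ a*x + b*u identified from the recorded transitions."""
    A = np.column_stack([X, U])
    coef, *_ = np.linalg.lstsq(A, Xn, rcond=None)
    return coef  # [a, b]

def policy_search(coef, x_target, n_candidates=200, horizon=20):
    """Stand-in for the strategy search of claim 4: score candidate
    feedback gains theta (u = theta*(x_target - x)) by internal
    simulation on the identified model and keep the best."""
    best_theta, best_cost = 0.0, np.inf
    for theta in rng.uniform(-2.0, 2.0, n_candidates):
        x, cost = 0.0, 0.0
        for _ in range(horizon):
            u = theta * (x_target - x)
            x = coef[0] * x + coef[1] * u   # internal simulation, not the plant
            cost += (x - x_target) ** 2
        if cost < best_cost:
            best_theta, best_cost = theta, cost
    return best_theta

# Steps (1)-(5): act with a random initial policy, record data,
# fit the model, search for a better policy, and repeat.
x_target = 1.0
X, U, Xn = [], [], []
x = 0.0
theta = rng.uniform(-1.0, 1.0)          # (1) random initial control policy
for episode in range(5):
    for _ in range(30):                  # (4) apply the policy, collect data
        u = theta * (x_target - x)
        xn = plant_step(x, u)
        X.append(x); U.append(u); Xn.append(xn)
        x = xn
    coef = fit_model(np.array(X), np.array(U), np.array(Xn))  # (2)
    theta = policy_search(coef, x_target)                     # (3)
```

Each pass through the loop improves the model with fresh data and the policy against the improved model, which is the essence of the claimed method.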
3. The control method of the learning variable impedance control system according to claim 2, wherein establishing the Gaussian process dynamics model of the system specifically comprises:
(1) the Gaussian process model is f(x̃) ~ GP(m, k), where the prior mean is m ≡ 0 and the squared exponential kernel is selected, k(x̃, x̃') = α² exp(-½ Σi (x̃i - x̃'i)²/li²);
(2) take the state and the control quantity as the input tuple of the Gaussian process, and take the state increment as the training target;
(3) given N sets of training inputs X̃ = [x̃1, ..., x̃n] and corresponding training targets y = [y1, ..., yn]^T, learn the hyperparameters of the Gaussian process model using an evidence maximization algorithm,
where x̃t = (xt, ut) is the observable state-control tuple, yt = Δt + ε is the training target, Δt is the state increment, ε is independent and identically distributed system noise, α² is the signal variance of the underlying function f, and li is the characteristic length-scale of each input dimension.
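The zero-mean Gaussian process regression of claim 3 can be sketched as follows with the squared exponential kernel. Here the hyperparameters (signal variance α², length-scales li, noise variance) are fixed by hand rather than learned by evidence maximization, and the toy training data stand in for the recorded state-control tuples and state increments.

```python
import numpy as np

def se_kernel(A, B, alpha2, lengthscales):
    """Squared exponential kernel
    k(x, x') = alpha2 * exp(-0.5 * sum_i (x_i - x'_i)^2 / l_i^2)."""
    D = (A[:, None, :] - B[None, :, :]) / lengthscales
    return alpha2 * np.exp(-0.5 * np.sum(D ** 2, axis=-1))

def gp_predict(X, y, Xs, alpha2, lengthscales, noise):
    """Zero-prior-mean GP posterior at test inputs Xs, given training
    inputs X and targets y (e.g. observed state increments)."""
    K = se_kernel(X, X, alpha2, lengthscales) + noise * np.eye(len(X))
    Ks = se_kernel(Xs, X, alpha2, lengthscales)
    mean = Ks @ np.linalg.solve(K, y)
    var = alpha2 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var
```

The posterior variance is what lets the strategy learning algorithm of claim 4 account for model uncertainty when it cascades one-step predictions into a long-term state distribution.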
4. The control method of the learning variable impedance control system according to claim 2 or 3, wherein the strategy learning algorithm specifically comprises:
(1) apply the control strategy π to the Gaussian process model of the system, carry out internal simulation, and predict the behavior and performance of the system;
(2) perform long-term inferential prediction of the states p(x1|π), ..., p(xT|π) using the learned Gaussian process model;
(3) evaluate the expected total cost J^π(θ) over the horizon T;
(4) calculate the gradient of the cost with respect to the strategy parameters, dJ^π(θ)/dθ, use a gradient-based strategy search algorithm to find the optimal strategy π* ← π(θ), and update the strategy parameters θ;
(5) repeat steps (1)-(4) until the strategy parameters θ converge.
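Steps (1)-(5) above can be sketched as follows. For simplicity this toy propagates only a point state through a hypothetical deterministic model rather than the full distributions p(x1|π), ..., p(xT|π), uses a quadratic cost in place of the saturating cost of the description, and obtains dJ/dθ by central finite differences rather than the chain rule; the model, policy parameterization, and learning rate are all illustrative assumptions.

```python
def rollout_cost(theta, model_step, x0, x_target, horizon):
    """Steps (1)-(3): cascade one-step model predictions under the policy
    u = theta*(x_target - x) and accumulate a (here quadratic) cost J."""
    x, J = x0, 0.0
    for _ in range(horizon):
        u = theta * (x_target - x)
        x = model_step(x, u)        # internal simulation on the model
        J += (x - x_target) ** 2
    return J

def gradient_policy_search(model_step, x0, x_target, horizon,
                           theta0=0.0, lr=0.05, iters=300, eps=1e-5):
    """Steps (4)-(5): estimate dJ/dtheta by central finite differences,
    take a clipped gradient step, and repeat until theta converges."""
    theta = theta0
    for _ in range(iters):
        g = (rollout_cost(theta + eps, model_step, x0, x_target, horizon)
             - rollout_cost(theta - eps, model_step, x0, x_target, horizon)) / (2 * eps)
        step = lr * max(-5.0, min(5.0, g))   # clip for a stable toy descent
        theta -= step
        if abs(step) < 1e-7:
            break
    return theta
```

With an analytic model such as a Gaussian process, the same gradient is available exactly via the chain rule, which is what makes the claimed strategy search data-efficient.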
5. The control method of the learning variable impedance control system according to claim 2 or 3, characterized in that: the variable impedance controller is a position-based indirect impedance controller, which corrects the desired reference trajectory according to the contact force error, the time-varying target stiffness, and the damping coefficient to obtain the desired position increment ΔX of the mechanical arm end;
the specific form of the variable impedance controller is:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = -8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) - 2Bd(t)T + Kd(t)T²
where T is the control period.
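These ω coefficients are what a Tustin (bilinear) discretization of the target impedance Md ΔẌ + Bd ΔẊ + Kd ΔX = E yields for control period T. The sketch below completes them into a runnable controller; the force-error numerator T²(E(k) + 2E(k-1) + E(k-2)) of the recurrence is reconstructed from that discretization and is an assumption, since the claim does not reproduce the full difference equation.

```python
def impedance_coeffs(Md, Bd, Kd, T):
    """Tustin-discretized target impedance Md*x'' + Bd*x' + Kd*x = E."""
    w1 = 4 * Md + 2 * Bd * T + Kd * T ** 2
    w2 = -8 * Md + 2 * Kd * T ** 2
    w3 = 4 * Md - 2 * Bd * T + Kd * T ** 2
    return w1, w2, w3

class VariableImpedanceController:
    """Position-based indirect impedance law: maps the contact force
    error E(k) to a desired end-position increment dX(k), with the
    stiffness Kd and damping Bd allowed to vary at every step."""
    def __init__(self, Md, T):
        self.Md, self.T = Md, T
        self.E = [0.0, 0.0]    # E(k-1), E(k-2)
        self.dX = [0.0, 0.0]   # dX(k-1), dX(k-2)

    def step(self, E, Kd, Bd):
        w1, w2, w3 = impedance_coeffs(self.Md, Bd, Kd, self.T)
        num = self.T ** 2 * (E + 2 * self.E[0] + self.E[1])
        dX = (num - w2 * self.dX[0] - w3 * self.dX[1]) / w1
        self.E = [E, self.E[0]]      # shift the error history
        self.dX = [dX, self.dX[0]]   # shift the increment history
        return dX
```

A quick consistency check: since ω1 + ω2 + ω3 = 4Kd T², a constant force error E drives dX toward the static compliance E/Kd, as a target impedance should.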
6. The control method of the learning variable impedance control system according to claim 4, characterized in that: the variable impedance controller is a position-based indirect impedance controller, which corrects the desired reference trajectory according to the contact force error, the time-varying target stiffness, and the damping coefficient to obtain the desired position increment ΔX of the mechanical arm end;
the specific form of the variable impedance controller is:
ω1 = 4Md(t) + 2Bd(t)T + Kd(t)T²
ω2 = -8Md(t) + 2Kd(t)T²
ω3 = 4Md(t) - 2Bd(t)T + Kd(t)T²
where T is the control period.
CN201711393308.7A 2017-12-19 2017-12-19 Learning variable impedance control system and control method Active CN108153153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711393308.7A CN108153153B (en) 2017-12-19 2017-12-19 Learning variable impedance control system and control method


Publications (2)

Publication Number Publication Date
CN108153153A true CN108153153A (en) 2018-06-12
CN108153153B CN108153153B (en) 2020-09-11

Family

ID=62464705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711393308.7A Active CN108153153B (en) 2017-12-19 2017-12-19 Learning variable impedance control system and control method

Country Status (1)

Country Link
CN (1) CN108153153B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104626168A (en) * 2014-12-16 2015-05-20 苏州大学 Robot force position compliant control method based on intelligent algorithm
CN105213153A (en) * 2015-09-14 2016-01-06 西安交通大学 Based on the lower limb rehabilitation robot control method of brain flesh information impedance
US20170007308A1 (en) * 2015-07-08 2017-01-12 Research & Business Foundation Sungkyunkwan University Apparatus and method for discriminating biological tissue, surgical apparatus using the apparatus
CN106406098A (en) * 2016-11-22 2017-02-15 西北工业大学 Man-machine interaction control method of robot system in unknown environment
CN106938470A (en) * 2017-03-22 2017-07-11 华中科技大学 A kind of device and method of Robot Force control teaching learning by imitation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUIHUA XIA et al.: "Hybrid force/position control of industrial robotic manipulator based on Kalman filter", 2016 IEEE International Conference on Mechatronics and Automation *
LI Erchao et al.: "Fuzzy adaptive impedance control of robots based on neural network visual servoing", Transactions of China Electrotechnical Society *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108972546A (en) * 2018-06-22 2018-12-11 华南理工大学 A kind of robot constant force curved surface tracking method based on intensified learning
CN109062032A (en) * 2018-10-19 2018-12-21 江苏省(扬州)数控机床研究院 A kind of robot PID impedance control method based on Approximate dynamic inversion
CN109702740A (en) * 2018-12-14 2019-05-03 中国科学院深圳先进技术研究院 Robot compliance control method, apparatus, equipment and storage medium
CN109702740B (en) * 2018-12-14 2020-12-04 中国科学院深圳先进技术研究院 Robot compliance control method, device, equipment and storage medium
CN111352384A (en) * 2018-12-21 2020-06-30 罗伯特·博世有限公司 Method and evaluation unit for controlling an automated or autonomous movement mechanism
US12076865B2 (en) 2019-05-17 2024-09-03 Siemens Aktiengesellschaft Method, computer program product and robot control system for the contact-based localization of objects that can be moved when manipulated by robot, and robot
CN113966264A (en) * 2019-05-17 2022-01-21 西门子股份公司 Method, computer program product and robot control device for positioning an object that is movable during manipulation by a robot on the basis of contact, and robot
CN111673733A (en) * 2020-03-26 2020-09-18 华南理工大学 Intelligent self-adaptive compliance control method of robot in unknown environment
CN111673733B (en) * 2020-03-26 2022-03-29 华南理工大学 Intelligent self-adaptive compliance control method of robot in unknown environment
CN111687833A (en) * 2020-04-30 2020-09-22 广西科技大学 Manipulator inverse priority impedance control system and control method
CN111687832A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant manipulator of space manipulator
CN111687835A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant manipulator of underwater manipulator
CN111687834A (en) * 2020-04-30 2020-09-22 广西科技大学 Reverse priority impedance control system and method for redundant mechanical arm of mobile manipulator
CN111640495B (en) * 2020-05-29 2024-05-31 北京机械设备研究所 Variable force tracking control method and device based on impedance control
CN111640495A (en) * 2020-05-29 2020-09-08 北京机械设备研究所 Variable force tracking control method and device based on impedance control
CN111904795A (en) * 2020-08-28 2020-11-10 中山大学 Variable impedance control method for rehabilitation robot combined with trajectory planning
CN111904795B (en) * 2020-08-28 2022-08-26 中山大学 Variable impedance control method for rehabilitation robot combined with trajectory planning
CN112372630B (en) * 2020-09-24 2022-02-22 哈尔滨工业大学(深圳) Multi-mechanical-arm cooperative polishing force compliance control method and system
CN112372630A (en) * 2020-09-24 2021-02-19 哈尔滨工业大学(深圳) Multi-mechanical-arm cooperative polishing force compliance control method and system
CN112428278A (en) * 2020-10-26 2021-03-02 北京理工大学 Control method and device of mechanical arm and training method of man-machine cooperation model
CN112743540A (en) * 2020-12-09 2021-05-04 华南理工大学 Hexapod robot impedance control method based on reinforcement learning
CN112743540B (en) * 2020-12-09 2022-05-24 华南理工大学 Hexapod robot impedance control method based on reinforcement learning
CN112859868A (en) * 2021-01-19 2021-05-28 武汉大学 KMP (Kernel Key P) -based lower limb exoskeleton rehabilitation robot and motion trajectory planning algorithm
CN113427483A (en) * 2021-05-19 2021-09-24 广州中国科学院先进技术研究所 Double-machine manpower/bit multivariate data driving method based on reinforcement learning
CN113641099A (en) * 2021-07-13 2021-11-12 西北工业大学 Impedance control imitation learning training method for surpassing expert demonstration
CN113641099B (en) * 2021-07-13 2023-02-10 西北工业大学 Impedance control imitation learning training method for surpassing expert demonstration
CN114378820A (en) * 2022-01-18 2022-04-22 中山大学 Robot impedance learning method based on safety reinforcement learning
CN114193458B (en) * 2022-01-25 2024-04-09 中山大学 Robot control method based on Gaussian process online learning
CN114193458A (en) * 2022-01-25 2022-03-18 中山大学 Robot control method based on Gaussian process online learning
CN114789444B (en) * 2022-05-05 2022-12-16 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN114789444A (en) * 2022-05-05 2022-07-26 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN115496099A (en) * 2022-09-20 2022-12-20 哈尔滨工业大学 Filtering and high-order state observation method for mechanical arm sensor
CN115421387A (en) * 2022-09-22 2022-12-02 中国科学院自动化研究所 Variable impedance control system and control method based on inverse reinforcement learning
CN115723139A (en) * 2022-12-02 2023-03-03 哈尔滨工业大学(深圳) Method and device for flexibly controlling operation space of rope-driven flexible mechanical arm
CN116643501A (en) * 2023-07-18 2023-08-25 湖南大学 Variable impedance control method and system for aerial working robot under stability constraint
CN116643501B (en) * 2023-07-18 2023-10-24 湖南大学 Variable impedance control method and system for aerial working robot under stability constraint
CN117817674A (en) * 2024-03-05 2024-04-05 纳博特控制技术(苏州)有限公司 Self-adaptive impedance control method for robot

Also Published As

Publication number Publication date
CN108153153B (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN108153153B (en) Learning variable impedance control system and control method
Carron et al. Data-driven model predictive control for trajectory tracking with a robotic arm
CN114761966A (en) System and method for robust optimization for trajectory-centric model-based reinforcement learning
Tutsoy et al. Design of a completely model free adaptive control in the presence of parametric, non-parametric uncertainties and random control signal delay
CN110647042B (en) Robot robust learning prediction control method based on data driving
Cutler et al. Efficient reinforcement learning for robots using informative simulated priors
JP7301034B2 (en) System and Method for Policy Optimization Using Quasi-Newton Trust Region Method
Qi et al. Stable indirect adaptive control based on discrete-time T–S fuzzy model
CN112571420B (en) Dual-function model prediction control method under unknown parameters
Li Robot target localization and interactive multi-mode motion trajectory tracking based on adaptive iterative learning
CN111399375A (en) Neural network prediction controller based on nonlinear system
McKinnon et al. Learning probabilistic models for safe predictive control in unknown environments
CN116460860A (en) Model-based robot offline reinforcement learning control method
JP5220542B2 (en) Controller, control method and control program
Le et al. ADMM-based adaptive sampling strategy for nonholonomic mobile robotic sensor networks
Komeno et al. Deep koopman with control: Spectral analysis of soft robot dynamics
Hager et al. Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design
He et al. Adaptive robust control of uncertain euler–lagrange systems using gaussian processes
CN116880184A (en) Unmanned ship track tracking prediction control method, unmanned ship track tracking prediction control system and storage medium
CN116048085B (en) Fault estimation and fault-tolerant iterative learning control method for mobile robot
Zhou et al. Launch vehicle adaptive flight control with incremental model based heuristic dynamic programming
Wang et al. A data driven method of feedforward compensator optimization for autonomous vehicle control
CN107894709A Visual servo control of redundant robots based on adaptive critic networks
Tan et al. Edge-Enabled Adaptive Shape Estimation of 3-D Printed Soft Actuators With Gaussian Processes and Unscented Kalman Filters
Yan et al. A neural network approach to nonlinear model predictive control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant