CN112894809B - Impedance controller design method and system based on reinforcement learning - Google Patents

Impedance controller design method and system based on reinforcement learning Download PDF

Info

Publication number
CN112894809B
CN112894809B CN202110061914.9A CN202110061914A CN112894809B CN 112894809 B CN112894809 B CN 112894809B CN 202110061914 A CN202110061914 A CN 202110061914A CN 112894809 B CN112894809 B CN 112894809B
Authority
CN
China
Prior art keywords
learning
control
function
impedance
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110061914.9A
Other languages
Chinese (zh)
Other versions
CN112894809A (en
Inventor
赵兴炜
陶波
韩世博
丁汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202110061914.9A priority Critical patent/CN112894809B/en
Publication of CN112894809A publication Critical patent/CN112894809A/en
Application granted granted Critical
Publication of CN112894809B publication Critical patent/CN112894809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/1633Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses an impedance controller design method and system based on reinforcement learning, and belongs to the field of robot control. The method comprehensively considers the control input, the position and the speed of the controlled system and the influence of the external force, designs an effective reward function and a cost function by utilizing the direct proportional relation between the external force and the position of the controlled system, can design an optimal impedance controller by reinforcement learning under the condition that a system model and an environment model are unknown, and can modify the response characteristic of the system by adjusting parameters to generate an ideal robot impedance controller. The method of the invention defines the form of the cost function, greatly reduces the number of undetermined coefficients, does not need a complex deep network to fit the cost function, and greatly accelerates the learning process.

Description

Impedance controller design method and system based on reinforcement learning
Technical Field
The invention belongs to the field of robot control, and particularly relates to an impedance controller design method and system based on reinforcement learning.
Background
With the advent of compliant operation and human-computer interaction scenarios, the goal of robot control is no longer to reduce positional error singly, and its compliance is receiving more and more attention. Impedance control is a very effective robot compliance control method, when external force exists, balance is automatically kept between the external force and a target position, rigid collision and overlarge contact force are avoided, and a robot, a workpiece and a user are protected; when no external force exists, higher position precision can be realized, and the requirements of various kinds of work are met.
Patent CN202010771033.1 proposes a method for controlling adaptive impedance of a mechanical arm based on RBF neural network, but it needs to obtain a nominal dynamical model to design an impedance controller and an error compensation controller, and the structure is complicated. The CN201910352004.9 patent needs to identify the kinetic parameters of the robot through preprocessing. The method for optimizing the multi-axis hole assembly control of the non-model robot by using environment prediction, which is proposed by patent CN201910287227.1, uses reinforcement learning for robot assembly, but the adopted deep reinforcement learning method has low convergence rate, long training time and certain limitation in use.
Disclosure of Invention
In view of the above drawbacks and needs of the prior art, the present invention provides a method and system for designing an impedance controller based on reinforcement learning, which aims to obtain an optimal impedance controller quickly without knowing a system dynamics model.
To achieve the above object, according to one aspect of the present invention, there is provided an impedance controller design method based on reinforcement learning, including:
s1, designing a reward function and a cost function; set up as a reward function
Figure BDA0002903010380000021
Set as a cost function
Figure BDA0002903010380000022
Figure BDA0002903010380000023
Wherein the content of the first and second substances,
Figure BDA0002903010380000024
respectively representing the current position, speed and external force of the controlled system; q f ,Q x ,Q v Weights for external force, position, velocity in the impedance controller design objective, respectively; and u is KX is the control input of the system; k is the impedance control parameter to be designed and optimized,
Figure BDA0002903010380000025
representing the kronecker product of the matrix; theta is a cost function parameter;
and S2, estimating theta by adopting a reinforcement learning method based on the reward function and the cost function to obtain an optimal impedance control parameter K, and finishing the design of the impedance controller.
Further, step S1 specifically includes:
s101, regarding an external force f borne by a controlled system as a part of a system stateObtaining an augmented state vector of
Figure BDA0002903010380000026
Respectively representing the current position, speed and external force of the controlled system; the form of the impedance controller is set as that u is KX, u is the control input of the system, and K is the impedance control parameter to be designed and optimized;
s102, inputting control u, the current position q and the speed of a controlled system
Figure BDA0002903010380000027
And the received external force f is regarded as the cost of the control system, and the cost function is set as:
Figure BDA0002903010380000028
Q 1 ,Q 2 ,Q 3 all are positive and real numbers for controlling the weight;
s103, changing F to F e Substituting q into the cost function to obtain:
Figure BDA0002903010380000029
Figure BDA00029030103800000210
in the form of a matrix
Figure BDA00029030103800000211
S104, designing the reward function as the inverse number of the cost function:
Figure BDA0002903010380000031
the cost function being designed as an accumulation of reward functions
Figure BDA0002903010380000032
Further, the order of arrangement of the elements in the augmented state vector X is arbitrary, K, c k 、r、Q v (X,u)、
Figure BDA0002903010380000033
The specific form of θ varies depending on the order of arrangement of the elements in X.
Further, step S2 is specifically:
s201, learning process initialization: setting K as zero vector and theta as zero vector, and setting updating period i update ,i update Is a positive integer;
s202, learning period initialization: setting a controlled system to an initial state X ΔT Setting a learning parameter P ═ δ H, wherein δ is a positive integer, and H is a unit matrix of n × n;
s203, calculating control input u ═ KX iΔT + σ Rand, Rand being a random number, σ being a weighting factor; x iΔT The system state of the current control period is 1, 2, 3 …, and Δ T is the control period of the controlled system;
s204, calculating a reward function
Figure BDA0002903010380000034
Figure BDA0002903010380000035
S205, obtaining the system state X of the next control cycle (i+1)ΔT Updating a system value function parameter theta and a learning parameter P:
Figure BDA0002903010380000036
Figure BDA0002903010380000037
Figure BDA0002903010380000038
θ=θ+gradient
Figure BDA0002903010380000039
gradient is intermediate quantity, gamma is a prediction factor, and gamma is more than 0 and less than 1;
s206, updating an impedance control parameter K: when i is i update When the number of the H is multiple, the elements of theta are sequentially arranged into a matrix of n x n, and H is partitioned to obtain the H
Figure BDA00029030103800000310
Wherein H 21 Is a matrix with the same dimension as K; let K updated =K-l*(H 21 +KH 22 ),K=K updated (ii) a l is an update weight;
s207, judging termination of the learning period: if i Δ T is not less than T max If not, making i equal to i +1, returning to S203; t is max Is a maximum learning cycle length;
s208, learning termination judgment: the control laws before and after the K-th learning period are respectively K k X and u ═ K k-1 X, if max (abs (K) k -K k-1 ) Is less than or equal to epsilon, the learning process is terminated and the obtained impedance controller is K k X, otherwise, returning to S202; ε is the termination determination threshold.
Corresponding to the implementation process of the method, the invention also provides an impedance controller design system based on reinforcement learning, which comprises the following steps:
the control target design module is used for designing a reward function and a value function; set up as a reward function
Figure BDA0002903010380000041
A cost function set to
Figure BDA0002903010380000042
Figure BDA0002903010380000043
Wherein the content of the first and second substances,
Figure BDA0002903010380000044
respectively representing the current position, speed and external force of the controlled system; q f ,Q x ,Q v Are respectively outsideThe weight of force, position, velocity in the impedance controller design objective; and u is KX is the control input of the system; k is the impedance control parameter to be designed and optimized,
Figure BDA0002903010380000045
representing the kronecker product of the matrix; theta is a cost function parameter;
and the impedance control parameter optimization module is used for estimating theta by adopting a reinforcement learning method based on the reward function and the value function to obtain an optimal impedance control parameter K and complete the design of the impedance controller.
In general, the above technical solutions contemplated by the present invention can achieve the following advantageous effects compared to the prior art.
(1) The method comprehensively considers the control input, the position and the speed of the controlled system and the influence of the external force, designs an effective reward function and a cost function by utilizing the direct proportional relation between the external force and the position of the controlled system, can design an optimal impedance controller by reinforcement learning under the condition that a system model and an environment model are unknown, and can modify the response characteristic of the system by adjusting parameters to generate an ideal robot impedance controller.
(2) The method of the invention defines the form of the cost function, greatly reduces the number of undetermined coefficients, does not need a complex deep network to fit the cost function, and greatly accelerates the learning process.
Drawings
FIG. 1 is a flow chart of a method for designing an impedance controller based on reinforcement learning according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, the method for designing an impedance controller based on reinforcement learning according to the present invention includes:
s1, designing a reward function and a value function;
s101, regarding an external force f borne by a controlled system as a part of a system state, and obtaining an augmented state vector
Figure BDA0002903010380000051
Respectively representing the current position, speed and external force of the controlled system; the impedance controller is configured in the form of u-KX, where u is the control input to the system and K is the impedance control parameter to be designed and optimized.
S102, inputting control u, the position q and the speed of a controlled system
Figure BDA0002903010380000052
And the received external force f is regarded as the cost of the control system, and the cost function is set as:
Figure BDA0002903010380000053
Q 1 ,Q 2 ,Q 3 for controlling the weights, all are positive real numbers.
S103, the external force F borne by the controlled system is a direct proportional function of the position of the controlled system, namely F is F e And q, substituting the cost function to obtain:
Figure BDA0002903010380000061
it is written in matrix form:
Figure BDA0002903010380000062
s104, designing the reward function as the inverse number of the control function:
Figure BDA0002903010380000063
the merit function being designed as an accumulation of reward functions, i.e.
Figure BDA0002903010380000064
Set the cost function to
Figure BDA0002903010380000065
Representing the kronecker product of the matrix. Theta is the control weight to be estimated as Q 1 ,Q 2 ,Q 3 A time-value function parameter.
The reward function represents the design target of the impedance controller, the reward function provided by the invention comprehensively considers the control input, the position and the speed of the controlled system and the influence of the external force, and the form of the reward function has sufficient physical meaning by utilizing the direct proportional relation between the external force and the position of the controlled system, so that the stability and the convergence of the learning process can be ensured in the actual use. At the same time, Q f ,Q x ,Q v And the weights of the external force, the position and the speed in the design target of the impedance controller are respectively, so that a user can flexibly adjust the design target according to the application scene requirement to obtain the optimal impedance controller under the scene.
In the reinforcement learning process, a real value function is finally obtained through continuous learning iteration. A wrong cost function form will lead to learning failure, and a complex cost function form (such as a deep neural network) will make the learning process tedious. The value function form provided by the invention greatly reduces the number of undetermined coefficients, does not need a complex deep network to fit the value function, and greatly accelerates the learning process.
And S2, estimating theta by adopting a reinforcement learning method based on the reward function and the value function to obtain an optimal impedance control parameter K, and finishing the design of the impedance controller.
The method for finishing the design of the controller by adopting the reinforcement learning method comprises the following steps:
s201, initializing a learning process, setting K as a zero vector, setting theta as a zero vector, and setting an updating period i update ,i update Is a positive integer.
S202, initializing a learning period, and setting a controlled system to be in an initial state X ΔT And setting a learning parameter P as δ H, wherein δ is a positive integer, and H is an identity matrix of n by n. Recording system stateIs X iΔT Where i is 1, 2, 3 … and Δ T is the control period of the controlled system. When i is 1.
S203, calculating control input u ═ KX iΔT + σ Rand; rand is a random number and σ is a weighting factor.
S204, calculating a reward function
Figure BDA0002903010380000071
S205, obtaining a system state X (i+1)ΔT Updating a system value function parameter theta and a learning parameter P:
Figure BDA0002903010380000072
Figure BDA0002903010380000073
Figure BDA0002903010380000074
θ=θ+gradient
the method for updating the parameters of the value function has small calculation amount, can update the value function in real time in the learning process, and improves the convergence speed of the value function.
S206, updating an impedance control parameter K: when i is i update When the multiple of the number of the elements is multiple, the elements of theta are sequentially arranged into a matrix of n x n, and H is partitioned to obtain the multiple of the number of the elements of theta
Figure BDA0002903010380000075
Wherein H 21 Is a matrix with the same dimension as K; let K updated =K-l*(H 21 +KH 22 ),K=K updated (ii) a l is the update weight.
The method provides an analytic control parameter updating strategy, and is low in calculation complexity and high in convergence speed. At the same time, i can be adjusted update Adjusting control parameter update frequencyAnd adjusting the updating speed of the control parameters by adjusting the I, realizing effective control on the learning process, and effectively avoiding the situations of redundant learning process caused by too slow learning and incapability of convergence of the control parameters caused by too fast learning.
S207, judging termination of the learning period: if i Δ T is not less than T max If not, i is made to be i +1, and the process returns to S203. T is max Is the maximum length of a learning cycle.
S208, learning termination judgment: the control laws recorded before and after the K-th learning period are respectively u-K k X and u ═ K k-1 X, if max (abs (K) k -K k-1 ) Is less than or equal to epsilon, the learning process is terminated and the obtained impedance controller is K k And X, otherwise, returning to S202.ε is the termination decision threshold.
In order to verify the effectiveness of the method, simulation and experimental verification are carried out according to the method, and the result shows that by adopting the method, the kinetic parameters of a controlled system do not need to be obtained, and the learning length is T after 10 learning lengths max After a learning period of 250 Δ T, an optimal impedance controller can be generated, which has obvious advantages compared with deep reinforcement learning which often requires thousands of training.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (6)

1. An impedance controller design method based on reinforcement learning is characterized by comprising the following steps:
s1, designing a reward function and a value function; set up as a reward function
Figure FDA0003643088280000011
Set as a cost function
Figure FDA0003643088280000012
Figure FDA0003643088280000013
Wherein the content of the first and second substances,
Figure FDA0003643088280000014
q,
Figure FDA0003643088280000015
f respectively represents the current position, speed and external force of the controlled system; q f ,Q x ,Q v Weights for external force, position, velocity in the impedance controller design objective, respectively; and u is KX is the control input of the system; k is the impedance control parameter to be designed and optimized,
Figure FDA0003643088280000016
Figure FDA0003643088280000017
representing a kronecker product of the matrix; theta is a cost function parameter;
s2, estimating theta by adopting a reinforcement learning method based on the reward function and the value function to obtain an optimal impedance control parameter K, and finishing the design of the impedance controller; step S2 specifically includes:
s201, learning process initialization: setting K as zero vector and theta as zero vector, and setting updating period i update ,i update Is a positive integer;
s202, learning period initialization: setting a controlled system to an initial state X ΔT Setting a learning parameter P ═ δ H, wherein δ is a positive integer, and H is a unit matrix of n × n;
s203, calculating control input u ═ KX iΔT + σ Rand, Rand being a random number, σ being a weight factor; x iΔT The system state of the current control period is 1, 2, 3 …, and Δ T is the control period of the controlled system;
s204, calculating a reward function
Figure FDA0003643088280000018
Figure FDA0003643088280000019
S205, obtaining the system state X of the next control cycle (i+1)ΔT Updating a system value function parameter theta and a learning parameter P:
Figure FDA0003643088280000021
Figure FDA0003643088280000022
gradient is the median, gamma is the predictor, 0<γ<1;
S206, updating an impedance control parameter K: when i is i update When the number of the H is multiple, the elements of theta are sequentially arranged into a matrix of n x n, and H is partitioned to obtain the H
Figure FDA0003643088280000023
Wherein H 21 Is a matrix with the same dimension as K; let K updated =K-l*(H 21 +KH 22 ),K=K updated (ii) a l is an update weight;
s207, learning period termination judgment: if i Δ T is not less than T max If not, making i equal to i +1, returning to S203; t is max Is a maximum learning cycle length;
s208, learning termination judgment: the control laws before and after the K-th learning period are respectively K k X and u ═ K k-1 X, if max (abs (K) k -K k-1 ) Is less than or equal to epsilon, the learning process is terminated and the obtained impedance controller is K k X, otherwise, returning to S202; ε is the termination decision threshold.
2. The method as claimed in claim 1, wherein the step S1 specifically includes:
s101, regarding an external force f borne by a controlled system as one of system statesIn part, obtaining an augmented state vector of
Figure FDA0003643088280000024
q,
Figure FDA0003643088280000025
f respectively represents the current position, speed and external force of the controlled system; the form of the impedance controller is set as that u is KX, u is the control input of the system, and K is the impedance control parameter to be designed and optimized;
s102, inputting control u, the current position q and the speed of a controlled system
Figure FDA0003643088280000026
And the received external force f is regarded as the cost of the control system, and the cost function is set as:
Figure FDA0003643088280000027
Q 1 ,Q 2 ,Q 3 for controlling the weight, the values are positive and real;
s103, changing F to F e Substituting q into the cost function to obtain:
Figure FDA0003643088280000031
Figure FDA0003643088280000032
in the form of a matrix of
Figure FDA0003643088280000033
S104, designing the reward function as the inverse number of the cost function:
Figure FDA0003643088280000034
the cost function being designed as an accumulation of reward functions
Figure FDA0003643088280000035
3. The method of claim 2, wherein the element in the augmented state vector X is arranged in an arbitrary order, K, c k 、r、Q v (X,u)、
Figure FDA0003643088280000036
The specific form of θ varies depending on the order of arrangement of the elements in X.
4. An impedance controller design system based on reinforcement learning, comprising:
the control target design module is used for designing a reward function and a value function; set up as a reward function
Figure FDA0003643088280000037
Set as a cost function
Figure FDA0003643088280000038
Figure FDA0003643088280000039
Wherein the content of the first and second substances,
Figure FDA00036430882800000310
q,
Figure FDA00036430882800000311
f respectively represents the current position, speed and external force of the controlled system; q f ,Q x ,Q v Weights for external force, position, velocity in the impedance controller design objective, respectively; and u is KX is the control input of the system; k is the impedance control parameter to be designed and optimized,
Figure FDA00036430882800000312
Figure FDA00036430882800000313
representing the kronecker product of the matrix; theta is a cost function parameter;
the impedance control parameter optimization module is used for estimating theta by adopting a reinforcement learning method based on the reward function and the value function to obtain an optimal impedance control parameter K and complete the design of the impedance controller; the implementation process of the impedance control parameter optimization module specifically comprises the following steps:
s201, learning process initialization: setting K as zero vector and theta as zero vector, and setting updating period i update ,i update Is a positive integer;
s202, learning period initialization: setting a controlled system to an initial state X ΔT Setting a learning parameter P ═ δ H, wherein δ is a positive integer, and H is a unit matrix of n × n;
s203, calculating control input u ═ KX iΔT + σ Rand, Rand being a random number, σ being a weighting factor; x iΔT The system state of the current control period is 1, 2, 3 …, and Δ T is the control period of the controlled system;
s204, calculating a reward function
Figure FDA0003643088280000041
S205, obtaining the system state X of the next control cycle (i+1)ΔT Updating a system value function parameter theta and a learning parameter P:
Figure FDA0003643088280000042
Figure FDA0003643088280000043
Figure FDA0003643088280000044
θ=θ+gradient
Figure FDA0003643088280000045
gradient is the median, gamma is the predictor, 0<γ<1;
S206, updating an impedance control parameter K: when i is i update When the number of the H is multiple, the elements of theta are sequentially arranged into a matrix of n x n, and H is partitioned to obtain the H
Figure FDA0003643088280000046
Wherein H 21 Is a matrix with the same dimension as K; let K updated =K-l*(H 21 +KH 22 ),K=K updated (ii) a l is an update weight;
s207, judging termination of the learning period: if i Δ T is not less than T max If not, making i equal to i +1, returning to S203; t is max Is a maximum learning cycle length;
s208, learning termination judgment: the control laws before and after the K-th learning period are respectively K k X and u ═ K k-1 X, if max (abs (K) k -K k-1 ) Is less than or equal to epsilon, the learning process is terminated and the obtained impedance controller is K k X, otherwise, returning to S202; ε is the termination decision threshold.
5. The reinforcement learning-based impedance controller design system according to claim 4, wherein the control target design module is implemented by:
the external force f borne by the controlled system is regarded as a part of the system state, and an augmented state vector is obtained
Figure FDA0003643088280000051
q,
Figure FDA0003643088280000052
f respectively represents the current position, speed and external force of the controlled system; the form of the impedance controller is set as that u is KX, u is the control input of the system, and K is the impedance control parameter to be designed and optimized;
inputting control u, current position q and speed of the controlled system
Figure FDA0003643088280000053
And the received external force f is regarded as the cost of the control system, and the cost function is set as:
Figure FDA0003643088280000054
Q 1 ,Q 2 ,Q 3 for controlling the weight, the values are positive and real;
changing F to F e Substituting q into the cost function to obtain:
Figure FDA0003643088280000055
Figure FDA0003643088280000056
in the form of a matrix
Figure FDA0003643088280000057
The reward function is designed as the inverse of the cost function:
Figure FDA0003643088280000058
the cost function being designed as an accumulation of reward functions
Figure FDA0003643088280000059
6. The reinforcement learning-based impedance controller design system of claim 5, wherein the arrangement order of the elements in the augmented state vector X is arbitrary, K, c k 、r、Q v (X,u)、
Figure FDA00036430882800000510
The specific form of θ varies depending on the order of arrangement of the elements in X.
CN202110061914.9A 2021-01-18 2021-01-18 Impedance controller design method and system based on reinforcement learning Active CN112894809B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110061914.9A CN112894809B (en) 2021-01-18 2021-01-18 Impedance controller design method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110061914.9A CN112894809B (en) 2021-01-18 2021-01-18 Impedance controller design method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112894809A CN112894809A (en) 2021-06-04
CN112894809B true CN112894809B (en) 2022-08-02

Family

ID=76114670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110061914.9A Active CN112894809B (en) 2021-01-18 2021-01-18 Impedance controller design method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112894809B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114789444B (en) * 2022-05-05 2022-12-16 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107020636A (en) * 2017-05-09 2017-08-08 重庆大学 A kind of Learning Control Method for Robot based on Policy-Gradient
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN111401556A (en) * 2020-04-22 2020-07-10 清华大学深圳国际研究生院 Selection method of opponent type imitation learning winning incentive function
CN111531543A (en) * 2020-05-12 2020-08-14 中国科学院自动化研究所 Robot self-adaptive impedance control method based on biological heuristic neural network
CN111613200A (en) * 2020-05-26 2020-09-01 辽宁工程技术大学 Noise reduction method based on reinforcement learning
US10766136B1 (en) * 2017-11-03 2020-09-08 Amazon Technologies, Inc. Artificial intelligence system for modeling and evaluating robotic success at task performance
CN111708355A (en) * 2020-06-19 2020-09-25 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning
CN111782870A (en) * 2020-06-18 2020-10-16 湖南大学 Antagonistic video time retrieval method and device based on reinforcement learning, computer equipment and storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107020636A (en) * 2017-05-09 2017-08-08 重庆大学 A kind of Learning Control Method for Robot based on Policy-Gradient
US10766136B1 (en) * 2017-11-03 2020-09-08 Amazon Technologies, Inc. Artificial intelligence system for modeling and evaluating robotic success at task performance
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
CN111401556A (en) * 2020-04-22 2020-07-10 清华大学深圳国际研究生院 Selection method of opponent type imitation learning winning incentive function
CN111531543A (en) * 2020-05-12 2020-08-14 中国科学院自动化研究所 Robot self-adaptive impedance control method based on biological heuristic neural network
CN111613200A (en) * 2020-05-26 2020-09-01 辽宁工程技术大学 Noise reduction method based on reinforcement learning
CN111782870A (en) * 2020-06-18 2020-10-16 湖南大学 Antagonistic video time retrieval method and device based on reinforcement learning, computer equipment and storage medium
CN111708355A (en) * 2020-06-19 2020-09-25 中国人民解放军国防科技大学 Multi-unmanned aerial vehicle action decision method and device based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
复杂结构柔顺加工的人机示教编程与机器人力控研究;李科霖;《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》;20200115(第01期);第43-46页 *

Also Published As

Publication number Publication date
CN112894809A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
US11537897B2 (en) Artificial neural network circuit training method, training program, and training device
EP3136304A1 (en) Methods and systems for performing reinforcement learning in hierarchical and temporally extended environments
CN108375907B (en) Adaptive compensation control method of hypersonic aircraft based on neural network
CN110647042A (en) Robot robust learning prediction control method based on data driving
CN110286595B (en) Fractional order system self-adaptive control method influenced by saturated nonlinear input
CN111665853A (en) Unmanned vehicle motion planning method for planning control joint optimization
CN112894809B (en) Impedance controller design method and system based on reinforcement learning
CN113406886B (en) Fuzzy self-adaptive control method and system for single-link mechanical arm and storage medium
CN110488603B (en) Rigid aircraft adaptive neural network tracking control method considering actuator limitation problem
CN112085050A (en) Antagonistic attack and defense method and system based on PID controller
CN113043251A (en) Robot teaching reproduction track learning method
CN114326405B (en) Neural network backstepping control method based on error training
CN112904726B (en) Neural network backstepping control method based on error reconstruction weight updating
CN113627075B (en) Projectile pneumatic coefficient identification method based on adaptive particle swarm optimization extreme learning
CN113346552A (en) Self-adaptive optimal AGC control method based on integral reinforcement learning
CN110991606B (en) Piezoelectric ceramic driver composite control method based on radial basis function neural network
CN112947090A (en) Data-driven iterative learning control method for wheeled robot under DOS attack
CN109709809B (en) Modeling method and tracking method of electromagnetic/magneto-rheological actuator based on hysteresis kernel
CN114559429B (en) Neural network control method of flexible mechanical arm based on self-adaptive iterative learning
CN110554605A (en) complex mechanical system adaptive robust control method based on constraint tracking
CN112685835B (en) Elastic event trigger control method and system for autonomous driving of vehicle
CN112346342B (en) Single-network self-adaptive evaluation design method of non-affine dynamic system
CN115047769A (en) Unmanned combat platform obstacle avoidance-arrival control method based on constraint following
CN112305916B (en) Self-adaptive control method and system for mobile robot based on barrier function
CN114139282A (en) Underwater impact load modeling method of cross-medium aircraft

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant