CN114789444B - Compliant human-computer contact method based on deep reinforcement learning and impedance control - Google Patents

Compliant human-computer contact method based on deep reinforcement learning and impedance control

Info

Publication number
CN114789444B
Authority
CN
China
Prior art keywords
actuator
axis direction
mechanical arm
contact
contact force
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210484043.6A
Other languages
Chinese (zh)
Other versions
CN114789444A (en)
Inventor
舒明雷
张铁译
陈超
王若同
刘照阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Institute of Artificial Intelligence filed Critical Qilu University of Technology
Priority to CN202210484043.6A priority Critical patent/CN114789444B/en
Publication of CN114789444A publication Critical patent/CN114789444A/en
Application granted granted Critical
Publication of CN114789444B publication Critical patent/CN114789444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/1633Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Abstract

A compliant human-machine contact method based on deep reinforcement learning and impedance control. According to the requirements of the task, the relevant state space, action space and reward function are established. To achieve compliance, a virtual contact surface is established above the surface of the body, and, combined with impedance control, the virtual contact force of the actuator approaching the target part is obtained in advance. The state space is taken as input and the actuator action is adjusted through a deep reinforcement learning algorithm, realising force adjustment and completing the task. The method combines deep reinforcement learning with compliance control: by establishing a virtual contact surface, the contact force of the actuator can be acquired in advance and adjusted, so that the method adapts to complex and variable compliant human-machine contact tasks.

Description

Compliant human-computer contact method based on deep reinforcement learning and impedance control
Technical Field
The invention relates to the technical field of compliance control, in particular to a compliant human-machine contact method based on deep reinforcement learning and impedance control.
Background
In recent years, with progress in artificial intelligence technology, the perception and interactive communication capabilities of robots have become stronger. As machine systems that emulate human behaviour, intelligent robots can assist humans in completing various tasks, and contact between humans and robots is inevitable, so "human-machine integration" has become an important development trend for the close combination of humans and robots. Under this trend, the demands on robot operation are ever higher. In terms of position control and force control of the robot, conventional compliance control technology has developed to a very mature level. However, the design of such control systems relies on an accurate mathematical model, which is difficult to obtain in a compliant human-machine contact task because of the complexity, time-varying nature and uncertainty of the environment. Conventional control techniques therefore still face challenges and limitations in handling such tasks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a compliant human-machine contact method based on deep reinforcement learning and impedance control.
The technical scheme adopted by the invention to solve the technical problem is as follows:
a compliant human-computer contact method based on deep reinforcement learning and impedance control comprises the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part;
b) A state space S and an action space A are established, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator;
c) The pose of the mechanical arm is initialised; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained;
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i;
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N;
f) Setting a virtual contact surface at a distance gamma from the body;
g) The judgment condition value l for whether the virtual contact surface is contacted is calculated by the formula l = |P'_z - O_z|;
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0;
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S;
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped.
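Steps g) to i) amount to a simple gate on the force entries of the state space: zero force until the actuator crosses the virtual surface, then the impedance-derived virtual force. A minimal sketch, in which the threshold value and the impedance hook `virtual_force` are placeholders rather than values from the patent:

```python
GAMMA = 0.01  # assumed virtual-surface offset from the body, in metres


def contact_state(p, o, virtual_force):
    """Return the force components (F_x, F_y, F_z) for the state space S.

    p: real-time actuator position (P'_x, P'_y, P'_z)
    o: target-part position (O_x, O_y, O_z)
    virtual_force: hypothetical callable returning the impedance-based
                   virtual contact force (F'_x, F'_y, F'_z)
    """
    l = abs(p[2] - o[2])           # step g): l = |P'_z - O_z|
    if l > GAMMA:                  # step h): virtual surface not yet reached
        return (0.0, 0.0, 0.0)
    return virtual_force(p, o)     # step i): impedance-derived virtual force


# Far from the body the force entries of S stay zero; inside the
# γ band the virtual force is reported instead.
far = contact_state((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), lambda p, o: (0.0, 0.0, 3.0))
near = contact_state((0.0, 0.0, 0.005), (0.0, 0.0, 0.0), lambda p, o: (0.0, 0.0, 3.0))
```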
Further, in step a) the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
Further, in step b) the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system.
Further, in step c) the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
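The distances above and the reward r_1 = (d_i - d_c)/d_i from step d) can be sketched as follows (coordinate values are illustrative only):

```python
import math


def distance(a, b):
    """Euclidean distance between two 3-D points, as used for d_i and d_c."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def reward_r1(p_init, p_now, target):
    """r_1 = (d_i - d_c) / d_i: grows towards 1 as the actuator nears the target."""
    d_i = distance(p_init, target)
    d_c = distance(p_now, target)
    return (d_i - d_c) / d_i


# Actuator started 1 m above the target and has closed 75% of the gap.
r = reward_r1((0.0, 0.0, 1.0), (0.0, 0.0, 0.25), (0.0, 0.0, 0.0))  # 0.75
```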
Preferably, m in step e) is 1 and n is 7.
Further, in step i) the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, λ is a constant, I is a vector, I = [0, 0, 1]^T, and Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T.
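A minimal numerical sketch of the impedance law. The patented formula itself is rendered as an image in the source, so the form F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx) is a reconstruction from the symbols the text defines, and every parameter value below is an illustrative assumption, not a value from the patent:

```python
import numpy as np


def virtual_contact_force(acc, vel, p, o, M_d, B_d, K_d, lam):
    """Impedance-style virtual force: F_v = M_d*acc + B_d*vel + K_d*(lam*I - dx)."""
    I = np.array([0.0, 0.0, 1.0])           # only the Z-direction force matters
    dx = np.abs(np.array(p) - np.array(o))  # Δx, element-wise |P' - O|
    return M_d * np.asarray(acc, dtype=float) \
        + B_d * np.asarray(vel, dtype=float) \
        + K_d * (lam * I - dx)


# Actuator at rest 8 mm above the target; assumed gains M_d=1, B_d=5, K_d=400, λ=0.02.
F_v = virtual_contact_force(acc=(0, 0, 0), vel=(0, 0, 0),
                            p=(0.0, 0.0, 0.008), o=(0.0, 0.0, 0.0),
                            M_d=1.0, B_d=5.0, K_d=400.0, lam=0.02)
# Z component: 400 * (0.02 - 0.008) = 4.8 N; X and Y components stay 0.
```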
Preferably, the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
Further, the method also comprises a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
Further, the method also comprises the step of training the deep reinforcement learning algorithm by using the reward function r.
Further, the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction. The maximum step number of the mechanical arm is set to 1500 steps; when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised. If the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
The invention has the following beneficial effects: according to the requirements of the task, the relevant state space, action space and reward function are established; to achieve task compliance, a virtual contact surface is established above the surface of the body, and the virtual contact force of the actuator approaching the target part is obtained in advance in combination with impedance control; the state space is taken as input and the actuator action is adjusted through the deep reinforcement learning algorithm, realising force adjustment and completing the task. The method combines deep reinforcement learning with compliance control: by establishing a virtual contact surface, the contact force of the actuator can be acquired in advance and adjusted, so that the method adapts to complex and variable compliant human-machine contact tasks.
Drawings
FIG. 1 is a schematic diagram of a control structure of the present invention.
Detailed Description
The invention is further illustrated with reference to fig. 1.
A compliant human-computer contact method based on deep reinforcement learning and impedance control comprises the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part.
b) A state space S and an action space A are established, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator.
c) The compliant human-machine contact task is mainly handled in two parts: the first part controls the actuator to reach the target position, and the second part controls the actuator to contact the target part within a suitable force range. The specific steps are as follows: the pose of the mechanical arm is initialised with the actuator always kept in a vertically downward pose; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained.
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i. This reward function is set to complete the reaching task: the closer the actuator is to the target part, the greater the reward value.
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N. In this range a good sound signal is obtained, and at the same time the pressure caused by the contact force is ensured to remain in a comfortable range. Here m N represents the minimum contact force and n N the maximum contact force.
f) A virtual contact surface is provided at a distance γ from the body.
g) When the actuator approaches the target part, whether it contacts the virtual contact surface is judged as follows: the judgment condition value l is calculated by the formula l = |P'_z - O_z|.
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0.
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S.
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped.
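The two-part control flow described above (reach the target, then regulate force via the virtual surface) could be wired into one training episode roughly as follows. `env`, `policy` and the info key `real_contact_force_z` are hypothetical stand-ins following the usual RL environment convention, not names from the patent:

```python
def run_episode(env, policy, n_max=7.0, max_steps=1500):
    """Sketch of one episode: the actor picks (a_x, a_y, a_z) offsets each step,
    the environment reports state, reward and the real Z-axis contact force."""
    state = env.reset()                          # step c): initialise the arm pose
    for _ in range(max_steps):                   # step budget from the reward design
        action = policy(state)                   # PPO-clip actor picks an offset
        state, reward, done, info = env.step(action)
        if info.get("real_contact_force_z", 0.0) > n_max:
            env.reset()                          # safety stop: F_e exceeded n N
            break
        if done:                                 # step j): real contact reached
            break
    return state


# Tiny stub environment just to exercise the loop shape.
class _StubEnv:
    def reset(self):
        self.t = 0
        return (0,) * 9

    def step(self, a):
        self.t += 1
        return (0,) * 9, 0.0, self.t >= 3, {"real_contact_force_z": 0.0}


final = run_episode(_StubEnv(), lambda s: (0.0, 0.0, -0.01))
```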
Example 1:
In step a), the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
Example 2:
In step b), the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system.
Example 3:
In step c), the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
Example 4:
In step e), m is 1 and n is 7. Since human muscle is elastic, force contact inevitably produces a depression in the body; the end force of the mechanical arm is finally controlled within the range of 1-7 N so that the human body suffers no discomfort or injury.
Example 5:
In step i), the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, and λ is a constant introduced to prevent an abnormal virtual contact force F_v during contact. I is a vector: although a partial arc of the target surface causes the virtual contact surface to be arced, the actuator keeps its direction of contact with the target part unchanged, so the component forces in the X and Y directions are small when the actuator contacts the virtual contact surface; only the force of the actuator in the Z direction needs to be controlled within a safe range, hence I = [0, 0, 1]^T. Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T. While the actuator approaches the target part, Δx is adjusted by adjusting the offsets in the X, Y and Z directions, and ẋ, ẍ and thus F_v are adjusted accordingly.
Example 6:
the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
Example 7:
Because errors exist, the method further comprises a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
Example 8:
and training the deep reinforcement learning algorithm by using the reward function r.
Example 9:
Specifically, the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction. The maximum step number of the mechanical arm is set to 1500 steps; when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised. If the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A compliant human-computer contact method based on deep reinforcement learning and impedance control is characterized by comprising the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part;
b) A state space S and an action space A are established through a deep reinforcement learning algorithm, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator;
c) The pose of the mechanical arm is initialised; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained;
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i;
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N;
f) Setting a virtual contact surface at a distance gamma from the body;
g) The judgment condition value l for whether the virtual contact surface is contacted is calculated by the formula l = |P'_z - O_z|;
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0;
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S;
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped; in step b) the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system;
in step i) the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, λ is a constant, I is a vector, I = [0, 0, 1]^T, and Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T.
2. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: in step a) the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
3. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 2, wherein: in step c) the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
4. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: in step e) m is 1 and n is 7.
5. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
6. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, further comprising a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
7. The method of claim 1, wherein the method comprises the steps of: and training the deep reinforcement learning algorithm by using the reward function r.
8. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 7, wherein: the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction; the maximum step number of the mechanical arm is set to 1500 steps, and when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised; if the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
CN202210484043.6A 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control Active CN114789444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210484043.6A CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210484043.6A CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Publications (2)

Publication Number Publication Date
CN114789444A CN114789444A (en) 2022-07-26
CN114789444B true CN114789444B (en) 2022-12-16

Family

ID=82462197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210484043.6A Active CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Country Status (1)

Country Link
CN (1) CN114789444B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105291102A (en) * 2009-12-17 2016-02-03 库卡机器人有限公司 Method and device for controlling a manipulator
CN106483964A (en) * 2015-08-31 2017-03-08 中南大学 A kind of robot Shared control method based on contact force observer
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
JP2019020826A (en) * 2017-07-12 2019-02-07 国立大学法人九州大学 Force control device, force control method and force control program
CN111290269A (en) * 2020-02-11 2020-06-16 西北工业大学深圳研究院 Self-adaptive compliance stable control method of space robot
CN111716361A (en) * 2020-07-03 2020-09-29 深圳市优必选科技股份有限公司 Robot control method and device and surface-surface contact model construction method
CN111975746A (en) * 2019-05-24 2020-11-24 精工爱普生株式会社 Robot control method
WO2020239181A1 (en) * 2019-05-29 2020-12-03 Universal Robots A/S Detection of change in contact between robot arm and an object
CN112506044A (en) * 2020-09-10 2021-03-16 上海交通大学 Flexible arm control and planning method based on visual feedback and reinforcement learning
KR20210065738A (en) * 2019-11-27 2021-06-04 한국생산기술연구원 Method for controlling 7-axis robot using reinforcement learning
CN112894809A (en) * 2021-01-18 2021-06-04 华中科技大学 Impedance controller design method and system based on reinforcement learning
CN112975977A (en) * 2021-03-05 2021-06-18 西北大学 Efficient mechanical arm grabbing depth reinforcement learning reward training method and system
CN113134839A (en) * 2021-04-26 2021-07-20 湘潭大学 Robot precision flexible assembly method based on vision and force position image learning
CN113319857A (en) * 2021-08-03 2021-08-31 季华实验室 Mechanical arm force and position hybrid control method and device, electronic equipment and storage medium
CN113427483A (en) * 2021-05-19 2021-09-24 广州中国科学院先进技术研究所 Dual-manipulator force/position multivariate data-driven method based on reinforcement learning
CN113635297A (en) * 2021-07-05 2021-11-12 武汉库柏特科技有限公司 Robot adaptive force contact control method and system based on rigidity detection
CN113967909A (en) * 2021-09-13 2022-01-25 中国人民解放军军事科学院国防科技创新研究院 Mechanical arm intelligent control method based on direction reward
CN114131617A (en) * 2021-12-30 2022-03-04 华中科技大学 Intelligent compliance control method and device for industrial robot

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013054629A (en) * 2011-09-06 2013-03-21 Honda Motor Co Ltd Control apparatus and method
CN105583824B (en) * 2016-01-26 2017-05-24 清华大学 Force control traction and swinging multi-degree-of-freedom mechanical arm control device and method
JP6423815B2 (en) * 2016-03-30 2018-11-14 ファナック株式会社 Human collaborative robot system
JP6431017B2 (en) * 2016-10-19 2018-11-28 ファナック株式会社 Human cooperative robot system with improved external force detection accuracy by machine learning
JP7427358B2 (en) * 2017-07-20 2024-02-05 キヤノン株式会社 Robot system, article manufacturing method, control method, control program, and recording medium
WO2022054947A1 (en) * 2020-09-14 2022-03-17 株式会社アイシン Robot device and control method for same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Human-machine interaction method for a minimally invasive surgical manipulator based on fuzzy reinforcement learning; Du Zhijiang et al.; Robot; 2017-05-31; full text *
Research on a compliant capture control method for a space manipulator based on deep reinforcement learning; Wen Wen et al.; Aerospace Control and Application; 2022-02-28; full text *

Also Published As

Publication number Publication date
CN114789444A (en) 2022-07-26

Similar Documents

Publication Publication Date Title
CN111660306B (en) Robot variable admittance control method and system based on operator comfort
CN108241339B (en) Motion solving and configuration control method of humanoid mechanical arm
Yang et al. Haptics electromyography perception and learning enhanced intelligence for teleoperated robot
CN110039542B (en) Visual servo tracking control method with speed and direction control function and robot system
CN109249394B (en) Robot control method and system based on admittance control algorithm
CN107053179B (en) A kind of mechanical arm Compliant Force Control method based on Fuzzy Reinforcement Learning
CN110597072B (en) Robot admittance compliance control method and system
US20170348858A1 (en) Multiaxial motion control device and method, in particular control device and method for a robot arm
CN110000795A (en) A kind of method of Visual servoing control, system and equipment
CN112631128A (en) Robot assembly skill learning method and system based on multi-mode heterogeneous information fusion
Zeng et al. A unified parametric representation for robotic compliant skills with adaptation of impedance and force
CN108427282A (en) A kind of solution of Inverse Kinematics method based on learning from instruction
CN115469576A (en) Teleoperation system based on human-mechanical arm heterogeneous motion space hybrid mapping
CN110181517B (en) Double teleoperation training method based on virtual clamp
CN114789444B (en) Compliant human-computer contact method based on deep reinforcement learning and impedance control
Wu et al. Learning from demonstration and interactive control of variable-impedance to cut soft tissues
Lee et al. Physical human robot interaction in imitation learning
Liu et al. Multi-fingered tactile servoing for grasping adjustment under partial observation
Steil et al. Guiding attention for grasping tasks by gestural instruction: The gravis-robot architecture
CN116587275A (en) Mechanical arm intelligent impedance control method and system based on deep reinforcement learning
CN113967909B (en) Direction rewarding-based intelligent control method for mechanical arm
CN111546035B (en) Online rapid gear assembly method based on learning and prediction
CN105467841B (en) A kind of class nerve control method of humanoid robot upper extremity exercise
CN110919650A (en) Low-delay grabbing teleoperation system based on SVM (support vector machine)
CN116852397B (en) Self-adaptive adjusting method for physiotherapy force and physiotherapy path of negative pressure physiotherapy robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant