CN114789444B - Compliant human-computer contact method based on deep reinforcement learning and impedance control - Google Patents

Compliant human-computer contact method based on deep reinforcement learning and impedance control

Info

Publication number
CN114789444B
Authority
CN
China
Prior art keywords
actuator
axis direction
mechanical arm
contact
contact force
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210484043.6A
Other languages
Chinese (zh)
Other versions
CN114789444A (en)
Inventor
舒明雷
张铁译
陈超
王若同
刘照阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Original Assignee
Qilu University of Technology
Shandong Institute of Artificial Intelligence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Institute of Artificial Intelligence filed Critical Qilu University of Technology
Priority to CN202210484043.6A priority Critical patent/CN114789444B/en
Publication of CN114789444A publication Critical patent/CN114789444A/en
Application granted granted Critical
Publication of CN114789444B publication Critical patent/CN114789444B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/1633Programme controls characterised by the control loop compliant, force, torque control, e.g. combined with position control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Abstract

A compliant human-machine contact method based on deep reinforcement learning and impedance control. According to the requirements of the task, the relevant state space, action space and reward function are established. To achieve compliance, a virtual contact surface is established above the surface of the body, and, combined with impedance control, the virtual contact force of the actuator approaching the target part is obtained in advance. The state space is taken as input and the actuator action is adjusted through a deep reinforcement learning algorithm, realising force adjustment and completing the task. The method combines deep reinforcement learning with compliance control: by establishing a virtual contact surface, the contact force of the actuator can be acquired in advance and adjusted, so that the method adapts to complex and variable compliant human-machine contact tasks.

Description

Compliant human-computer contact method based on deep reinforcement learning and impedance control
Technical Field
The invention relates to the technical field of compliance control, in particular to a compliant human-machine contact method based on deep reinforcement learning and impedance control.
Background
In recent years, with progress in artificial intelligence technology, the perception and interactive communication capabilities of robots have become stronger. As machine systems that emulate human behaviour, intelligent robots can assist humans in completing various tasks, and contact between humans and robots is inevitable, so "human-machine integration" has become an important development trend for the close combination of humans and robots. Under this trend, the demands on robot operation are ever higher. In terms of position control and force control of the robot, conventional compliance control technology has developed to a very mature level. However, the design of such control systems relies on an accurate mathematical model, which is difficult to obtain in a compliant human-machine contact task because of the complexity, time-varying nature and uncertainty of the environment. Conventional control techniques therefore still face challenges and limitations in handling such tasks.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a compliant human-machine contact method based on deep reinforcement learning and impedance control.
The technical scheme adopted by the invention to solve the technical problem is as follows:
a compliant human-computer contact method based on deep reinforcement learning and impedance control comprises the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part;
b) A state space S and an action space A are established, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator;
c) The pose of the mechanical arm is initialised; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained;
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i;
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N;
f) Setting a virtual contact surface at a distance gamma from the body;
g) The judgment condition value l for whether the virtual contact surface is contacted is calculated by the formula l = |P'_z - O_z|;
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0;
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S;
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped.
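Steps g) to i) amount to a simple gate on the force entries of the state space: zero force until the actuator crosses the virtual surface, then the impedance-derived virtual force. A minimal sketch, in which the threshold value and the impedance hook `virtual_force` are placeholders rather than values from the patent:

```python
GAMMA = 0.01  # assumed virtual-surface offset from the body, in metres


def contact_state(p, o, virtual_force):
    """Return the force components (F_x, F_y, F_z) for the state space S.

    p: real-time actuator position (P'_x, P'_y, P'_z)
    o: target-part position (O_x, O_y, O_z)
    virtual_force: hypothetical callable returning the impedance-based
                   virtual contact force (F'_x, F'_y, F'_z)
    """
    l = abs(p[2] - o[2])           # step g): l = |P'_z - O_z|
    if l > GAMMA:                  # step h): virtual surface not yet reached
        return (0.0, 0.0, 0.0)
    return virtual_force(p, o)     # step i): impedance-derived virtual force


# Far from the body the force entries of S stay zero; inside the
# γ band the virtual force is reported instead.
far = contact_state((0.0, 0.0, 0.5), (0.0, 0.0, 0.0), lambda p, o: (0.0, 0.0, 3.0))
near = contact_state((0.0, 0.0, 0.005), (0.0, 0.0, 0.0), lambda p, o: (0.0, 0.0, 3.0))
```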
Further, in step a) the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
Further, in step b) the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system.
Further, in step c) the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
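The distances above and the reward r_1 = (d_i - d_c)/d_i from step d) can be sketched as follows (coordinate values are illustrative only):

```python
import math


def distance(a, b):
    """Euclidean distance between two 3-D points, as used for d_i and d_c."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def reward_r1(p_init, p_now, target):
    """r_1 = (d_i - d_c) / d_i: grows towards 1 as the actuator nears the target."""
    d_i = distance(p_init, target)
    d_c = distance(p_now, target)
    return (d_i - d_c) / d_i


# Actuator started 1 m above the target and has closed 75% of the gap.
r = reward_r1((0.0, 0.0, 1.0), (0.0, 0.0, 0.25), (0.0, 0.0, 0.0))  # 0.75
```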
Preferably, m in step e) is 1 and n is 7.
Further, in step i) the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, λ is a constant, I is a vector, I = [0, 0, 1]^T, and Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T.
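A minimal numerical sketch of the impedance law. The patented formula itself is rendered as an image in the source, so the form F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx) is a reconstruction from the symbols the text defines, and every parameter value below is an illustrative assumption, not a value from the patent:

```python
import numpy as np


def virtual_contact_force(acc, vel, p, o, M_d, B_d, K_d, lam):
    """Impedance-style virtual force: F_v = M_d*acc + B_d*vel + K_d*(lam*I - dx)."""
    I = np.array([0.0, 0.0, 1.0])           # only the Z-direction force matters
    dx = np.abs(np.array(p) - np.array(o))  # Δx, element-wise |P' - O|
    return M_d * np.asarray(acc, dtype=float) \
        + B_d * np.asarray(vel, dtype=float) \
        + K_d * (lam * I - dx)


# Actuator at rest 8 mm above the target; assumed gains M_d=1, B_d=5, K_d=400, λ=0.02.
F_v = virtual_contact_force(acc=(0, 0, 0), vel=(0, 0, 0),
                            p=(0.0, 0.0, 0.008), o=(0.0, 0.0, 0.0),
                            M_d=1.0, B_d=5.0, K_d=400.0, lam=0.02)
# Z component: 400 * (0.02 - 0.008) = 4.8 N; X and Y components stay 0.
```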
Preferably, the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
Further, the method also comprises a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
Further, the method also comprises the step of training the deep reinforcement learning algorithm by using the reward function r.
Further, the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction. The maximum step number of the mechanical arm is set to 1500 steps; when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised. If the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
The invention has the following beneficial effects: according to the requirements of the task, the relevant state space, action space and reward function are established; to achieve task compliance, a virtual contact surface is established above the surface of the body, and the virtual contact force of the actuator approaching the target part is obtained in advance in combination with impedance control; the state space is taken as input and the actuator action is adjusted through the deep reinforcement learning algorithm, realising force adjustment and completing the task. The method combines deep reinforcement learning with compliance control: by establishing a virtual contact surface, the contact force of the actuator can be acquired in advance and adjusted, so that the method adapts to complex and variable compliant human-machine contact tasks.
Drawings
FIG. 1 is a schematic diagram of a control structure of the present invention.
Detailed Description
The invention is further illustrated with reference to fig. 1.
A compliant human-computer contact method based on deep reinforcement learning and impedance control comprises the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part.
b) A state space S and an action space A are established, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator.
c) The compliant human-machine contact task is mainly handled in two parts: the first part controls the actuator to reach the target position, and the second part controls the actuator to contact the target part within a suitable force range. The specific steps are as follows: the pose of the mechanical arm is initialised with the actuator always kept in a vertically downward pose; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained.
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i. This reward function is set to complete the reaching task: the closer the actuator is to the target part, the greater the reward value.
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N. In this range a good sound signal is obtained, and at the same time the pressure caused by the contact force is ensured to remain in a comfortable range. Here m N represents the minimum contact force and n N the maximum contact force.
f) A virtual contact surface is provided at a distance γ from the body.
g) When the actuator approaches the target part, whether it contacts the virtual contact surface is judged as follows: the judgment condition value l is calculated by the formula l = |P'_z - O_z|.
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0.
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S.
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped.
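The two-part control flow described above (reach the target, then regulate force via the virtual surface) could be wired into one training episode roughly as follows. `env`, `policy` and the info key `real_contact_force_z` are hypothetical stand-ins following the usual RL environment convention, not names from the patent:

```python
def run_episode(env, policy, n_max=7.0, max_steps=1500):
    """Sketch of one episode: the actor picks (a_x, a_y, a_z) offsets each step,
    the environment reports state, reward and the real Z-axis contact force."""
    state = env.reset()                          # step c): initialise the arm pose
    for _ in range(max_steps):                   # step budget from the reward design
        action = policy(state)                   # PPO-clip actor picks an offset
        state, reward, done, info = env.step(action)
        if info.get("real_contact_force_z", 0.0) > n_max:
            env.reset()                          # safety stop: F_e exceeded n N
            break
        if done:                                 # step j): real contact reached
            break
    return state


# Tiny stub environment just to exercise the loop shape.
class _StubEnv:
    def reset(self):
        self.t = 0
        return (0,) * 9

    def step(self, a):
        self.t += 1
        return (0,) * 9, 0.0, self.t >= 3, {"real_contact_force_z": 0.0}


final = run_episode(_StubEnv(), lambda s: (0.0, 0.0, -0.01))
```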
Example 1:
In step a), the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
Example 2:
In step b), the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system.
Example 3:
In step c), the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
Example 4:
In step e), m is 1 and n is 7. Since human muscle is elastic, force contact inevitably produces a depression in the body; the end force of the mechanical arm is finally controlled within the range of 1-7 N so that the human body suffers no discomfort or injury.
Example 5:
In step i), the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, and λ is a constant introduced to prevent an abnormal virtual contact force F_v during contact. I is a vector: although a partial arc of the target surface causes the virtual contact surface to be arced, the actuator keeps its direction of contact with the target part unchanged, so the component forces in the X and Y directions are small when the actuator contacts the virtual contact surface; only the force of the actuator in the Z direction needs to be controlled within a safe range, hence I = [0, 0, 1]^T. Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T. While the actuator approaches the target part, Δx is adjusted by adjusting the offsets in the X, Y and Z directions, and ẋ, ẍ and thus F_v are adjusted accordingly.
Example 6:
the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
Example 7:
Because errors exist, the method further comprises a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
Example 8:
and training the deep reinforcement learning algorithm by using the reward function r.
Example 9:
Specifically, the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction. The maximum step number of the mechanical arm is set to 1500 steps; when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised. If the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A compliant human-computer contact method based on deep reinforcement learning and impedance control is characterized by comprising the following steps:
a) According to the compliant human-machine contact task, a mechanical arm coordinate system is established at the mechanical arm base, and the initial position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained, where P_x, P_y, P_z are the X-, Y- and Z-axis coordinates of the actuator and O_x, O_y, O_z are the X-, Y- and Z-axis coordinates of the target part;
b) A state space S and an action space A are established through a deep reinforcement learning algorithm, with S = {P'_x, P'_y, P'_z, O_x, O_y, O_z, F_x, F_y, F_z}, where F_x, F_y, F_z are the contact force components in the X-, Y- and Z-axis directions of the actuator, and P'_x, P'_y, P'_z are the X-, Y- and Z-axis coordinates of the real-time position of the actuator;
c) The pose of the mechanical arm is initialised; the real-time position coordinates of the initialised actuator are {P'_x, P'_y, P'_z}; the distance d_i between the initial position of the actuator and the target part and the distance d_c between the current position of the actuator and the target part are obtained;
d) The distance-based reward function r_1 is calculated by the formula r_1 = (d_i - d_c)/d_i;
e) The contact force between the actuator and the target in the compliant human-machine contact task is set within m-n N;
f) Setting a virtual contact surface at a distance gamma from the body;
g) The judgment condition value l for whether the virtual contact surface is contacted is calculated by the formula l = |P'_z - O_z|;
h) When l > γ, it is judged that the actuator has not contacted the virtual contact surface, and F_x, F_y, F_z in the state space are all 0;
i) When 0 < l ≤ γ, it is judged that the actuator is in contact with the virtual contact surface, and the virtual contact force F_v is obtained through impedance control, F_v = (F'_x, F'_y, F'_z)^T, where F'_x, F'_y, F'_z are the virtual contact force components of the actuator in the X-, Y- and Z-axis directions and T denotes the transpose; F'_x, F'_y and F'_z are respectively equivalent to F_x, F_y and F_z in the state space S;
j) When the actuator is in contact with the body, the real contact force F_e ≠ 0; at this moment, control of the actuator is stopped; in step b) the action space is established by the formula A = {a_x, a_y, a_z}, where a_x, a_y, a_z are the offsets of the actuator in the X-, Y- and Z-axis directions in the mechanical arm coordinate system;
in step i) the virtual contact force F_v is calculated by the formula F_v = M_d·ẍ + B_d·ẋ + K_d·(λI - Δx), where M_d, B_d, K_d are the impedance parameters, ẋ is the velocity of the actuator, ẍ is the acceleration of the actuator, λ is a constant, I is a vector, I = [0, 0, 1]^T, and Δx is the distance difference between the actuator and the target part, Δx = (|P'_x - O_x|, |P'_y - O_y|, |P'_z - O_z|)^T.
2. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: in step a) the position coordinates {P_x, P_y, P_z} of the actuator and the position coordinates {O_x, O_y, O_z} of the target part in the mechanical arm coordinate system are obtained through a vision module in the compliant human-machine contact task.
3. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 2, wherein: in step c) the distance d_c between the current position of the actuator and the target part is calculated by the formula d_c = √((P'_x - O_x)² + (P'_y - O_y)² + (P'_z - O_z)²), and the distance d_i between the initial position of the actuator and the target part by the formula d_i = √((P_x - O_x)² + (P_y - O_y)² + (P_z - O_z)²).
4. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: in step e) m is 1 and n is 7.
5. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, wherein: the deep reinforcement learning algorithm in the step b) is a PPO-clip algorithm.
6. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 1, further comprising a step after step j): when the real contact force F_e is greater than n N, the operation of the actuator is stopped immediately and the mechanical arm is initialised.
7. The method of claim 1, wherein the method comprises the steps of: and training the deep reinforcement learning algorithm by using the reward function r.
8. The compliant human-computer contact method based on deep reinforcement learning and impedance control of claim 7, wherein: the reward function r = r_1 + r_2 + r_3 + r_4, where r_2 = 0 when the exploration range of the mechanical arm satisfies X_min ≤ P'_x ≤ X_max, Y_min ≤ P'_y ≤ Y_max and Z_min ≤ P'_z ≤ Z_max, and r_2 = -1 otherwise; here X_min and X_max are the minimum and maximum values the actuator can reach in the X-axis direction of the mechanical arm coordinate system, Y_min and Y_max the corresponding values in the Y-axis direction, and Z_min and Z_max the corresponding values in the Z-axis direction; the maximum step number of the mechanical arm is set to 1500 steps, and when the adjusted step number of the mechanical arm exceeds 1500, r_3 = -1, the operation is stopped and the mechanical arm is initialised; if the component of the real contact force F_e in the Z-axis direction is between m N and n N, r_4 = 1; otherwise r_4 = -1.
CN202210484043.6A 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control Active CN114789444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210484043.6A CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210484043.6A CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Publications (2)

Publication Number Publication Date
CN114789444A CN114789444A (en) 2022-07-26
CN114789444B true CN114789444B (en) 2022-12-16

Family

ID=82462197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210484043.6A Active CN114789444B (en) 2022-05-05 2022-05-05 Compliant human-computer contact method based on deep reinforcement learning and impedance control

Country Status (1)

Country Link
CN (1) CN114789444B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105291102A (en) * 2009-12-17 2016-02-03 库卡机器人有限公司 Method and device for controlling a manipulator
CN106483964A (en) * 2015-08-31 2017-03-08 中南大学 A kind of robot Shared control method based on contact force observer
CN108052004A (en) * 2017-12-06 2018-05-18 湖北工业大学 Industrial machinery arm autocontrol method based on depth enhancing study
CN108153153A (en) * 2017-12-19 2018-06-12 哈尔滨工程大学 A kind of study impedance control system and control method
JP2019020826A (en) * 2017-07-12 2019-02-07 国立大学法人九州大学 Force control device, force control method and force control program
CN111290269A (en) * 2020-02-11 2020-06-16 西北工业大学深圳研究院 Self-adaptive compliance stable control method of space robot
CN111716361A (en) * 2020-07-03 2020-09-29 深圳市优必选科技股份有限公司 Robot control method and device and surface-surface contact model construction method
CN111975746A (en) * 2019-05-24 2020-11-24 精工爱普生株式会社 Robot control method
WO2020239181A1 (en) * 2019-05-29 2020-12-03 Universal Robots A/S Detection of change in contact between robot arm and an object
CN112506044A (en) * 2020-09-10 2021-03-16 上海交通大学 Flexible arm control and planning method based on visual feedback and reinforcement learning
KR20210065738A (en) * 2019-11-27 2021-06-04 한국생산기술연구원 Method for controlling 7-axis robot using reinforcement learning
CN112894809A (en) * 2021-01-18 2021-06-04 华中科技大学 Impedance controller design method and system based on reinforcement learning
CN112975977A (en) * 2021-03-05 2021-06-18 西北大学 Efficient mechanical arm grabbing depth reinforcement learning reward training method and system
CN113134839A (en) * 2021-04-26 2021-07-20 湘潭大学 Robot precision flexible assembly method based on vision and force position image learning
CN113319857A (en) * 2021-08-03 2021-08-31 季华实验室 Mechanical arm force and position hybrid control method and device, electronic equipment and storage medium
CN113427483A (en) * 2021-05-19 2021-09-24 广州中国科学院先进技术研究所 Dual-manipulator force/position multivariate data-driven method based on reinforcement learning
CN113635297A (en) * 2021-07-05 2021-11-12 武汉库柏特科技有限公司 Robot adaptive force contact control method and system based on rigidity detection
CN113967909A (en) * 2021-09-13 2022-01-25 中国人民解放军军事科学院国防科技创新研究院 Mechanical arm intelligent control method based on direction reward
CN114131617A (en) * 2021-12-30 2022-03-04 华中科技大学 Intelligent compliance control method and device for industrial robot

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013054629A (en) * 2011-09-06 2013-03-21 Honda Motor Co Ltd Control apparatus and method
CN105583824B (en) * 2016-01-26 2017-05-24 清华大学 Force control traction and swinging multi-degree-of-freedom mechanical arm control device and method
JP6423815B2 (en) * 2016-03-30 2018-11-14 ファナック株式会社 Human collaborative robot system
JP6431017B2 (en) * 2016-10-19 2018-11-28 ファナック株式会社 Human cooperative robot system with improved external force detection accuracy by machine learning
JP7427358B2 (en) * 2017-07-20 2024-02-05 キヤノン株式会社 Robot system, article manufacturing method, control method, control program, and recording medium
WO2022054947A1 (en) * 2020-09-14 2022-03-17 株式会社アイシン Robot device and control method for same

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Human-machine interaction method for a minimally invasive surgical manipulator based on fuzzy reinforcement learning; Du Zhijiang et al.; Robot; 2017-05-31; full text *
Research on a compliant capture control method for a space manipulator based on deep reinforcement learning; Wen Wen et al.; Aerospace Control and Application; 2022-02-28; full text *

Also Published As

Publication number Publication date
CN114789444A (en) 2022-07-26

Similar Documents

Publication Publication Date Title
CN111660306B (en) Robot variable admittance control method and system based on operator comfort
CN108241339B (en) Motion solving and configuration control method of humanoid mechanical arm
Yang et al. Haptics electromyography perception and learning enhanced intelligence for teleoperated robot
CN110039542B (en) Visual servo tracking control method with speed and direction control function and robot system
CN109249394B (en) Robot control method and system based on admittance control algorithm
CN107053179B (en) A kind of mechanical arm Compliant Force Control method based on Fuzzy Reinforcement Learning
CN110597072B (en) Robot admittance compliance control method and system
US20170348858A1 (en) Multiaxial motion control device and method, in particular control device and method for a robot arm
CN110000795A (en) A kind of method of Visual servoing control, system and equipment
CN112631128A (en) Robot assembly skill learning method and system based on multi-mode heterogeneous information fusion
Zeng et al. A unified parametric representation for robotic compliant skills with adaptation of impedance and force
CN108427282A (en) A kind of solution of Inverse Kinematics method based on learning from instruction
CN115469576A (en) Teleoperation system based on human-mechanical arm heterogeneous motion space hybrid mapping
CN110181517B (en) Double teleoperation training method based on virtual clamp
CN114789444B (en) Compliant human-computer contact method based on deep reinforcement learning and impedance control
Wu et al. Learning from demonstration and interactive control of variable-impedance to cut soft tissues
Lee et al. Physical human robot interaction in imitation learning
Liu et al. Multi-fingered tactile servoing for grasping adjustment under partial observation
Steil et al. Guiding attention for grasping tasks by gestural instruction: The gravis-robot architecture
CN116587275A (en) Mechanical arm intelligent impedance control method and system based on deep reinforcement learning
CN113967909B (en) Direction rewarding-based intelligent control method for mechanical arm
CN111546035B (en) Online rapid gear assembly method based on learning and prediction
CN105467841B (en) A kind of class nerve control method of humanoid robot upper extremity exercise
CN110919650A (en) Low-delay grabbing teleoperation system based on SVM (support vector machine)
CN116852397B (en) Self-adaptive adjusting method for physiotherapy force and physiotherapy path of negative pressure physiotherapy robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant