CN110861084A

CN110861084A - Four-legged robot falling self-resetting control method based on deep reinforcement learning

Info

Publication number: CN110861084A
Application number: CN201911128299.8A
Authority: CN
Inventors: 宋光明; 何淼; 韦中; 宋爱国
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2019-11-18
Filing date: 2019-11-18
Publication date: 2020-03-06
Anticipated expiration: 2039-11-18
Also published as: CN110861084B

Abstract

The invention provides a fall self-resetting control method for a quadruped robot based on deep reinforcement learning, and belongs to the technical field of machine learning and robot control. The method comprises four steps: establishing a quadruped robot model, building the learning framework and learning an actuator network, training a control strategy, and execution by the low-level system. Using a deep reinforcement learning algorithm, the robot can autonomously reset itself on flat ground from any fallen posture without pre-programming or human intervention, which improves the intelligence, flexibility and environmental adaptability of the robot.

Description

Four-legged robot falling self-resetting control method based on deep reinforcement learning

Technical Field

The invention belongs to the technical field of machine learning and robot control, and particularly relates to a four-legged robot falling self-resetting control method based on deep reinforcement learning.

Background

Legged robots are an important branch of robotics. They can replace humans in searching and operating in unknown, complex and harsh environments such as earthquakes, nuclear radiation and fires, and have broad application prospects. Most large land animals are quadrupeds, and they can be found on cliffs, hills, grasslands and deserts, which shows that natural selection favors quadrupedal locomotion. The quadruped robot takes quadruped animals as its bionic model and has the potential to move as flexibly as they do.

In recent years, quadruped robots have made clear progress in gait planning, obstacle crossing and similar tasks, but a large gap remains before they can move as autonomously as quadruped animals. One such gap is the ability to reset quickly and flexibly after a fall. Existing control methods are mostly model-based and task-specific, so almost every new operation has to be developed from scratch.

Disclosure of Invention

The invention provides a fall self-resetting control method for a quadruped robot based on deep reinforcement learning, which enables the quadruped robot to reset autonomously without human assistance. By simply replacing the neural network parameter configuration, different tasks can be executed independently and efficiently as required, which greatly shortens the development cycle.

The invention provides a fall self-resetting control method for a quadruped robot based on deep reinforcement learning, which comprises four steps: establishing a quadruped robot model, building the learning framework and learning an actuator network, training a control strategy, and execution by the low-level system. Specifically:

step 1, establishing a quadruped robot model: determining the physical parameters of the robot; the key to the fall self-resetting function lies in the coordination among the legs and among the joints of each leg;

step 2, building a deep reinforcement learning framework and learning an actuator network: learning an actuator network on the system through self-supervised learning, and using the actuator network in the simulation modeling of the 12 joints of the quadruped robot;

step 3, training the controller: training a simple parameterized controller using the models generated in steps 1 and 2, generating foot trajectories in sine-wave form, determining the coordinate system of each joint and the centroid coordinate system by coordinate transformation, and calculating the corresponding joint positions during the resetting process by inverse kinematics;

step 4, execution by the low-level system: randomly setting the initial fallen position and posture of the robot, taking the output of the neural network trained in step 3 as the actions to be executed by the 12 joints of the robot, and determining the motion scheme of each joint so as to drive the joints and complete the fall self-resetting task.

The invention is further improved in that: the step 2 specifically comprises the following steps:

2.1: the state is the measurement of the robot's condition provided to the controller. The state space S is described as a 9-dimensional vector space whose components are, in order:

the robot direction vector measured by an IMU (inertial measurement unit);

the robot base height r_z;

the base linear velocity v;

the base angular velocity w;

the joint positions;

the joint velocities Φ;

the joint state history Θ (at t = t_k - 0.01 s and t = t_k - 0.02 s);

the previous action of the robot α_{k-1};

a constant C.
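The nine components listed above can be flattened into a single observation vector for the controller. The following is a minimal sketch in Python, not the patent's implementation: the class name `RobotState`, the field names and every component size (3-D base vectors, a history of joint positions at two past instants, a flattened 2 × 12 previous action, the value of the constant) are assumptions, since the source names only the components themselves.

```python
from dataclasses import dataclass
from typing import List

N_JOINTS = 12  # 4 legs x 3 joints, as stated in the source


@dataclass
class RobotState:
    """Hypothetical container for the nine state components."""
    direction: List[float]      # IMU direction vector (assumed 3-D)
    base_height: float          # r_z
    base_lin_vel: List[float]   # v (assumed 3-D)
    base_ang_vel: List[float]   # w (assumed 3-D)
    joint_pos: List[float]      # one entry per joint
    joint_vel: List[float]      # one entry per joint
    joint_history: List[float]  # joint positions at t_k-0.01 s and t_k-0.02 s (assumed content)
    prev_action: List[float]    # previous 2 x 12 action, flattened (assumed layout)
    constant: float = 1.0       # C (value assumed)

    def to_vector(self) -> List[float]:
        """Concatenate all components in the order used in the text."""
        return (
            list(self.direction)
            + [self.base_height]
            + list(self.base_lin_vel)
            + list(self.base_ang_vel)
            + list(self.joint_pos)
            + list(self.joint_vel)
            + list(self.joint_history)
            + list(self.prev_action)
            + [self.constant]
        )


state = RobotState(
    direction=[0.0, 0.0, -1.0],
    base_height=0.1,
    base_lin_vel=[0.0] * 3,
    base_ang_vel=[0.0] * 3,
    joint_pos=[0.0] * N_JOINTS,
    joint_vel=[0.0] * N_JOINTS,
    joint_history=[0.0] * (2 * N_JOINTS),
    prev_action=[0.0] * (2 * N_JOINTS),
)
observation = state.to_vector()
print(len(observation))  # prints 83 under the sizes assumed above
```

Under these assumed sizes, "9-dimensional" refers to nine named components rather than nine scalars; the flattened vector is considerably longer.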

An action is a command provided to an actuator. The action space A is described as a two-dimensional discrete vector space whose two components represent the joint torque speed and the joint torque position, respectively.

The reward is specified to induce the robot to produce the desired behavior. A reward function π is set, and the strategy that maximizes the discounted sum of rewards is selected; the robot chooses and executes its actions according to this strategy.

The reward function is:

where γ ∈ (0,1) is the discount factor and τ (π) is the trajectory distribution under the reward function π.
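The formula itself is not reproduced in this text. A standard discounted-return objective that is consistent with the surrounding definitions (discount factor γ, trajectory distribution τ(π), maximization over π) would read as follows. This is an assumed form, not the patent's verbatim equation: the per-step reward symbol r_t and the infinite horizon are assumptions, and in conventional reinforcement learning notation π denotes the policy being optimized.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\tau(\pi)} \left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \right]
```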

2.2: constructing a deep neural network N for evaluating the fall self-resetting return of the robot, specifically:

constructing a four-layer MLP (Multi-Layer Perceptron) neural network N for evaluating the fall self-resetting return of the robot, comprising an input layer L_i, two hidden layers L_h and an output layer L_o. The inputs of the input layer are the historical states of the robot in terms of the generalized coordinates q and the generalized velocity v.

The output of the output layer L_o has two dimensions, which respectively represent the speed estimation deviation S and the position estimation deviation P of each joint torque of the robot. The speed estimation deviation S is the deviation between the actual speed and the target speed of the current joint torque, and the position estimation deviation P is the deviation between the actual position and the target position of the current joint torque. Assuming each leg of the robot has 3 degrees of freedom, there are 3 × 4 = 12 joint torques in total, and the output of the output layer is a 2 × 12 matrix;

setting an activation function of the deep neural network N:

setting the input layer activation function of the deep neural network N as the ReLU function:

f(x) = max(0, x)

the output layer activation function is:

the input layer is the vector X, and the output of hidden layer 1 is:

f(w₁X + b₁)

the output of hidden layer 2 is:

f(w₂ f(w₁X + b₁) + b₂)

the final output of the output layer is then:

f(X) = f(b₂ + w₂ f(b₁ + w₁X))

wherein the function f is the tanh function:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

w is the weight and b is the bias.
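The layer-by-layer computation described above can be illustrated with a small forward pass. This is a sketch under stated assumptions, not the patent's network: the layer widths (`n_in`, `n_hidden`), the weight initialisation and the assignment of ReLU to the first layer and tanh to the remaining layers are all hypothetical choices, because the source does not make these details unambiguous. Only the 2 × 12 output shape is taken from the text.

```python
import math
import random
from typing import Callable, List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def relu(x: float) -> float:
    return max(0.0, x)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases (initialisation scheme assumed)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer: Layer, x: List[float], act: Callable[[float], float]) -> List[float]:
    """Compute act(w x + b) for one fully connected layer."""
    weights, biases = layer
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def forward(layers: List[Layer], x: List[float], n_joints: int = 12) -> List[List[float]]:
    """Input -> hidden 1 (ReLU) -> hidden 2 (tanh) -> output (tanh), reshaped to 2 x 12."""
    h1 = dense(layers[0], x, relu)
    h2 = dense(layers[1], h1, math.tanh)
    out = dense(layers[2], h2, math.tanh)
    # Row 0: speed deviation per joint, row 1: position deviation per joint.
    return [out[:n_joints], out[n_joints:]]


rng = random.Random(0)
n_in, n_hidden = 48, 32  # hypothetical layer sizes
layers = [init_layer(n_hidden, n_in, rng),
          init_layer(n_hidden, n_hidden, rng),
          init_layer(2 * 12, n_hidden, rng)]
output = forward(layers, [0.5] * n_in)
print(len(output), len(output[0]))  # prints: 2 12
```

Because the last layer uses tanh, every output entry lies in [-1, 1]; a real controller would scale these values to the actuator's command range.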

The invention is further improved in that: the step 4 specifically comprises the following steps:

4.1 setting the falling initial position and the falling initial posture of the robot randomly.

4.2 the deep neural network N outputs the execution actions of 12 joints of the robot.

4.3 the output position trajectory is simulated assuming that the robot fully follows the joint torque speed command and the joint torque position command.

4.4 judging whether the joint movement exceeds the available range. If so, the sample is rejected, the position is reset to the previous position, and the output command is sampled again; if not, the action is performed.

4.5 judging whether the robot has recovered its initial normal state. If not, the output command is sampled and executed again; if so, the robot has completed the fall self-resetting task.
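Steps 4.1 to 4.5 amount to a loop with rejection of out-of-range commands. The sketch below illustrates that control flow only. The joint range, the tolerance, the definition of the "normal state" as all joints near zero, and `stand_in_policy` (a placeholder that replaces the trained network N) are assumptions made for this example.

```python
import random
from typing import Callable, List, Tuple

JOINT_MIN, JOINT_MAX = -1.5, 1.5  # hypothetical joint range in radians
N_JOINTS = 12


def within_limits(positions: List[float]) -> bool:
    return all(JOINT_MIN <= p <= JOINT_MAX for p in positions)


def step(previous: List[float], command: List[float]) -> Tuple[List[float], bool]:
    """Step 4.4: accept the command if it is in range, else keep the previous pose."""
    if within_limits(command):
        return command, True
    return previous, False


def is_reset(positions: List[float], tol: float = 0.05) -> bool:
    """Step 4.5: the normal pose is assumed here to be all joints near zero."""
    return all(abs(p) <= tol for p in positions)


def stand_in_policy(positions: List[float], rng: random.Random) -> List[float]:
    """Placeholder for network N: pull joints toward zero with small noise."""
    return [0.8 * p + rng.uniform(-0.005, 0.005) for p in positions]


def self_reset(initial: List[float],
               policy: Callable[[List[float], random.Random], List[float]],
               rng: random.Random,
               max_steps: int = 1000) -> Tuple[List[float], int]:
    positions = list(initial)
    for n in range(max_steps):
        if is_reset(positions):                           # step 4.5
            return positions, n
        command = policy(positions, rng)                  # steps 4.2 and 4.3
        positions, _accepted = step(positions, command)   # step 4.4
    return positions, max_steps


rng = random.Random(0)
start = [rng.uniform(JOINT_MIN, JOINT_MAX) for _ in range(N_JOINTS)]  # step 4.1
final, n_steps = self_reset(start, stand_in_policy, rng)
print(is_reset(final), n_steps)
```

The placeholder policy always contracts toward zero, so it never triggers the rejection branch; calling `step` directly with an out-of-range command shows that the previous pose is kept in that case.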

Compared with the prior art, the invention has the following beneficial effects:

According to the invention, deep reinforcement learning is applied to the fall self-resetting function of the quadruped robot, which avoids the tedious manual tuning required when humans are involved; the autonomous reset reduces the time needed to complete the task and is highly flexible; the robot can continuously learn and accumulate experience, and can complete the task smoothly in unknown environments it was not trained in and from different fallen states.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a schematic view of the hip coordinate system of the present invention;

FIG. 3 is a schematic view of the thigh and calf coordinate system of the present invention;

FIG. 4 is a schematic diagram of the fall reset procedure of the invention;

FIG. 5 is a diagram of a neural network architecture of the present invention;

FIG. 6 is a schematic of the overall control strategy of the present invention;

Detailed Description

As shown in FIG. 1, the fall self-resetting method for a quadruped robot based on deep reinforcement learning provided by this embodiment comprises four steps: establishing a quadruped robot model, building the learning framework and learning an actuator network, training a control strategy, and execution by the low-level system. The specific contents are as follows:

Step 1, establishing a quadruped robot model and determining the physical parameters of the robot. The key to the fall self-resetting function lies in the coordination among the legs and among the joints of each leg. The physical parameters of the quadruped robot comprise the lengths of the hip, thigh and shank; the motion parameters comprise the degrees of freedom of each joint, and the available range of each joint is limited to conform to actual biological motion. Each leg of the quadruped robot provided by the invention has three joints (hip, thigh and shank), that is, three degrees of freedom.
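The model parameters named in step 1 (link lengths, degrees of freedom, joint ranges) could be collected in a small configuration object. The sketch below is illustrative only: the class names `LegModel` and `QuadrupedModel` and all numeric values are hypothetical, since the source gives no dimensions or limits.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class LegModel:
    """Link lengths in metres and joint ranges in radians (all values hypothetical)."""
    hip_length: float = 0.05
    thigh_length: float = 0.20
    shank_length: float = 0.20
    joint_limits: Dict[str, Tuple[float, float]] = field(
        default_factory=lambda: {
            "hip": (-0.8, 0.8),
            "thigh": (-1.0, 3.0),
            "shank": (-2.6, -0.5),
        }
    )


@dataclass(frozen=True)
class QuadrupedModel:
    leg: LegModel = field(default_factory=LegModel)
    num_legs: int = 4

    @property
    def num_joints(self) -> int:
        # One degree of freedom per joint, three joints per leg.
        return self.num_legs * len(self.leg.joint_limits)

    def in_range(self, joint: str, angle: float) -> bool:
        """Check a joint angle against its available range."""
        low, high = self.leg.joint_limits[joint]
        return low <= angle <= high


model = QuadrupedModel()
print(model.num_joints)  # prints 12
```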

Step 2, building a deep reinforcement learning framework, learning an actuator network on the system through self-supervised learning, and using the actuator network in the simulation modeling of the 12 joints of the quadruped robot.

2.1: the state is the measurement of the robot's condition provided to the controller. The state space S is described as a 9-dimensional vector space whose components are, in order:

the robot direction vector measured by an IMU (inertial measurement unit);

the robot base height r_z;

the base linear velocity v;

the base angular velocity w;

the joint positions;

the joint velocities Φ;

the joint state history Θ (at t = t_k - 0.01 s and t = t_k - 0.02 s);

the previous action of the robot α_{k-1};

a constant C.

An action is a command provided to an actuator. The action space A is described as a two-dimensional discrete vector space whose two components represent the joint torque speed and the joint torque position, respectively.

The reward function is:

2.2: constructing a deep neural network N for evaluating the fall self-resetting return of the robot, specifically:

The output of the output layer L_o has two dimensions, which respectively represent the speed estimation deviation S and the position estimation deviation P of each joint torque of the robot. Assuming each leg of the robot has 3 degrees of freedom, there are 3 × 4 = 12 joint torques in total, and the output of the output layer is a 2 × 12 matrix.

2.3 set activation function of the deep neural network N:

setting the input layer activation function of the deep neural network N as the ReLU function f(x) = max(0, x); the output layer activation function is

The input layer is the vector X, and the output of hidden layer 1 is f(w₁X + b₁); the output of hidden layer 2 is f(w₂ f(w₁X + b₁) + b₂); the final output of the output layer is f(X) = f(b₂ + w₂ f(b₁ + w₁X)), wherein the function f is the tanh function:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

w is the weight and b is the bias.

Step 3, training a simple parameterized controller using the models generated in steps 1 and 2, generating foot trajectories in sine-wave form, establishing the coordinate system of each joint and the centroid coordinate system by coordinate transformation, and calculating the corresponding joint positions during the resetting process by inverse kinematics; the coordinate systems are established as shown in FIG. 2 and FIG. 3.
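The combination of a sine-wave foot trajectory with inverse kinematics can be illustrated for a simplified leg. The sketch below assumes a planar two-link leg (thigh and shank only, hip abduction omitted), with angles measured from the downward vertical; the link lengths, trajectory parameters and function names are hypothetical and are not taken from the patent, whose coordinate frames are defined in its figures.

```python
import math
from typing import Tuple

THIGH, SHANK = 0.20, 0.20  # link lengths in metres (hypothetical)


def foot_target(t: float, period: float = 1.0,
                height: float = 0.30, amplitude: float = 0.05) -> Tuple[float, float]:
    """Sine-wave foot position (x, z) relative to the hip; z points up."""
    z = -height + amplitude * math.sin(2.0 * math.pi * t / period)
    return 0.0, z


def inverse_kinematics(x: float, z: float) -> Tuple[float, float]:
    """Return (thigh angle, knee angle) that place the foot at (x, z)."""
    d2 = x * x + z * z
    cos_knee = (d2 - THIGH ** 2 - SHANK ** 2) / (2.0 * THIGH * SHANK)
    if not -1.0 <= cos_knee <= 1.0:
        raise ValueError("foot target is outside the reachable workspace")
    knee = math.acos(cos_knee)
    thigh = math.atan2(x, -z) - math.atan2(SHANK * math.sin(knee),
                                           THIGH + SHANK * math.cos(knee))
    return thigh, knee


def forward_kinematics(thigh: float, knee: float) -> Tuple[float, float]:
    """Foot position for given joint angles; used here to check the IK result."""
    x = THIGH * math.sin(thigh) + SHANK * math.sin(thigh + knee)
    z = -THIGH * math.cos(thigh) - SHANK * math.cos(thigh + knee)
    return x, z


# Sample one period of the trajectory and convert each target to joint angles.
joint_trajectory = [inverse_kinematics(*foot_target(k / 10.0)) for k in range(11)]
max_error = max(
    math.dist(forward_kinematics(*angles), foot_target(k / 10.0))
    for k, angles in enumerate(joint_trajectory)
)
print(max_error < 1e-9)  # prints True
```

Running the forward kinematics on each solution and comparing with the target is a simple consistency check; a full three-joint leg would add the hip rotation as a preceding coordinate transformation.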

Step 4, in the execution stage of the low-level system, randomly setting the initial fallen position and posture of the robot, taking the output of the deep neural network trained in step 3 as the actions to be executed by the 12 joints of the robot, and determining the motion scheme of each joint so as to drive the joints and complete the fall self-resetting task.

The step 4 specifically comprises the following steps:

4.4 judging whether the joint movement exceeds the available range. If so, the sample is rejected, the position is reset to the previous position, and the output command is sampled again; if not, the action is performed, as shown in the process of FIG. 4.

4.5 judging whether the robot has recovered its initial normal state. If not, the output command is sampled and executed again; if so, the robot has completed the fall self-resetting task, and the final reset completion state is shown in FIG. 5.

Claims

1. A four-footed robot falling self-resetting control method based on deep reinforcement learning is characterized by comprising the following steps:

step 3, training the controller: training a simple parameterized controller by using the model generated in the steps 1 and 2, generating foot tracks in a sine wave form, determining coordinate systems and a centroid coordinate system of each joint by using a coordinate transformation method, and calculating corresponding joint positions in a resetting process by using inverse kinematics;

2. The four-footed robot falling self-resetting control method based on deep reinforcement learning of claim 1 is characterized in that the four-footed robot in step 1 has three joints of hip, thigh and shank, namely three degrees of freedom, on each leg; each physical parameter of the quadruped robot comprises the length of a hip, thigh and shank of the robot, and the motion parameter comprises the degree of freedom of each joint and limits the available space position of each joint to accord with the actual biological motion condition.

3. The four-footed robot fall self-resetting control method based on deep reinforcement learning of claim 1,

the specific steps of building the deep reinforcement learning framework in the step 2 are as follows:

2.1 the state is the measurement of the robot's condition provided to the controller; the state space S is described as a 9-dimensional vector space whose components are, in order:

the robot direction vector measured by an IMU (inertial measurement unit);

the robot base height r_z; the base linear velocity v; the base angular velocity w;

the joint positions; the joint velocities Φ; the joint state history Θ, sparsely sampled at t = t_k - 0.01 s and t = t_k - 0.02 s; the previous action of the robot α_{k-1}; a constant C;

2.2 an action is a command provided to the actuator; the action space A is described as a two-dimensional discrete vector space whose two components respectively represent the joint torque speed and the joint torque position;

2.3, the reward is specified to induce the robot to produce the desired behavior; a reward function π is set, and the strategy that maximizes the discounted sum of rewards is selected, the robot choosing and executing its actions according to this strategy;

the reward function is:

wherein gamma belongs to (0,1) as a discount factor, and tau (pi) is the track distribution under the reward function pi;

the specific steps of learning the actuator network in step 2 are as follows:

2.4, constructing a four-layer MLP (Multi-Layer Perceptron) neural network N for evaluating the fall self-resetting return of the robot, comprising an input layer L_i, two hidden layers L_h and an output layer L_o; the inputs of the input layer are the historical states of the robot in terms of the generalized coordinates q and the generalized velocity v;

2.5, setting an activation function of the neural network N:

setting the input layer activation function of the neural network N as the ReLU function:

f(x) = max(0, x)

the output layer activation function is

The input layer is the vector X, and the output of hidden layer 1 is:

f(w₁X + b₁)

the output of hidden layer 2 is:

f(w₂ f(w₁X + b₁) + b₂)

the final output of the output layer is then:

f(X) = f(b₂ + w₂ f(b₁ + w₁X))

wherein the function f is the tanh function:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

w is the weight and b is the bias.

4. The four-footed robot falling self-resetting control method based on deep reinforcement learning of claim 1 is characterized in that the specific steps executed by the bottom layer system in step 4 are as follows:

4.1: randomly setting the falling initial position and the falling initial posture of the robot;

4.2: the deep neural network N outputs the execution actions of 12 joints of the robot;

4.3: assuming that the robot completely follows the joint torque speed command and the joint torque position command, and simulating an output position track;

4.4: judging whether the joint movement exceeds the available space range, if so, refusing to sample, resetting the position to the previous position, and sampling the output command again; if not, executing the action;

4.5: judging whether the robot has recovered its initial normal state; if not, the output command is sampled and executed again; if so, the robot has completed the fall self-resetting task.