CN113156925A

CN113156925A - Biped robot walking control method based on an adversarial network, and electronic device

Info

Publication number: CN113156925A
Application number: CN202010015274.3A
Authority: CN
Inventors: 王宇
Original assignee: Sichuan Sunrain Sign & Display System Co ltd
Current assignee: Sichuan Sunrain Sign & Display System Co ltd
Priority date: 2020-01-07
Filing date: 2020-01-07
Publication date: 2021-07-23
Anticipated expiration: 2040-01-07
Also published as: CN113156925B

Abstract

The invention discloses a biped robot walking control method based on an adversarial network, and an electronic device. The method comprises the following steps: S1, collecting multiple groups of data from the biped robot, each group comprising state data, action data, and post-action state data; S2, inputting the data collected in step S1 into the adversarial network for training; and S3, controlling the walking of the biped robot according to the action data output by the trained adversarial network. The invention adopts an unsupervised neural network optimization algorithm (the adversarial network), which can obtain an optimal dynamic model of the robot through the game played between the generator and the discriminator, and a swarm-intelligence search optimization algorithm (particle swarm optimization), which can obtain optimal action parameters.

Description

Biped robot walking control method based on an adversarial network, and electronic device

Technical Field

The invention relates to the field of walking robots, and in particular to a biped robot walking control method based on an adversarial network, and an electronic device.

Background

Existing biped robot walking control methods require the parameters of each structure of the biped robot to be analyzed and modeled. Because each part of a biped robot is nonlinear and the robot has many degrees of freedom, it is more complex than other systems, and the coupling between its parts is extremely strong. Manually building and analyzing models and parameters for every component, and for the system those components form, is difficult and laborious, and the stability and applicability of the resulting models are hard to guarantee. The prior art therefore relies on traditional engineering mechanics analysis with simplified model parameters, plus extensive empirical tuning of control parameters. The resulting control models have low precision and are difficult to build, some problems cannot be solved at all, and the modeling analysis is prone to omissions.

Disclosure of Invention

The invention aims to solve the problems of low precision and high modeling difficulty caused by simplifying model parameters in the prior art, and provides a biped robot walking control method based on an adversarial network, and an electronic device, which can obtain optimal action data for the walking of a biped robot.

To achieve the above objective, the invention provides the following technical solution:

A biped robot walking control method based on an adversarial network comprises the following steps:

S1, collecting multiple groups of data from the biped robot, each group comprising state data, action data, and post-action state data;

S2, inputting the data collected in step S1 into the adversarial network for training;

and S3, controlling the walking of the biped robot according to the action data output by the trained adversarial network.

Preferably, the state data include: the pitch, yaw, and roll angles of the pelvis; the rotation angle and rotation speed of each joint; and the pressure values of the left and right feet. The pitch, yaw, and roll angles of the pelvis are measured with an IMU (inertial measurement unit).

Preferably, the action data include the rotation angle of each joint motor that performs the action.
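To make the data format concrete, the following Python sketch shows one possible record layout for a collected group (state, action, post-action state). The class and field names are hypothetical, and the joint count of 12 is an assumption inferred from the 12-element action vector x0 to x11 used in the particle swarm section; the source does not fix units or field ordering.

```python
from dataclasses import dataclass
from typing import List

NUM_JOINTS = 12  # assumption: matches the 12-element action vector x0..x11


@dataclass
class RobotState:
    """Hypothetical container for the state data listed above."""
    pelvis_pitch: float          # measured by the IMU
    pelvis_yaw: float            # measured by the IMU
    pelvis_roll: float           # measured by the IMU
    joint_angles: List[float]    # rotation angle of each joint
    joint_speeds: List[float]    # rotation speed of each joint
    left_foot_pressure: float
    right_foot_pressure: float

    def to_vector(self) -> List[float]:
        """Flatten the state into a fixed-order list for a neural network."""
        return ([self.pelvis_pitch, self.pelvis_yaw, self.pelvis_roll]
                + list(self.joint_angles)
                + list(self.joint_speeds)
                + [self.left_foot_pressure, self.right_foot_pressure])


@dataclass
class Transition:
    """One collected group: state, action, and the state measured after the action."""
    state: RobotState
    action: List[float]          # commanded rotation angle of each joint motor
    next_state: RobotState

    def __post_init__(self) -> None:
        # Reject actions that do not provide one angle per joint motor.
        if len(self.action) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joint angles, got {len(self.action)}")
```

Under the 12-joint assumption, the flattened state has 3 + 12 + 12 + 2 = 29 entries.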

Preferably, step S2 includes:

training the discriminator with a set of real data from the biped robot: the collected state data, action data, and post-action state data (the state actually measured after the real action) are fed into the discriminator model to obtain the discriminator output Dx;

at the same time, feeding the state data and action data into the generator, the post-action state produced by the generator being called the generated state data, and feeding the state data, action data, and generated state data into the discriminator to obtain the discriminator output Dg;

and repeating the training with the next group of data until the discriminator output falls within the range 0.4 to 0.6, at which point training ends and the trained adversarial network is obtained.
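The stopping rule can be sketched as a loop that consumes one collected group per training step and stops once the discriminator output settles in the band. This is a minimal sketch: `train_step` is a hypothetical callable standing in for the actual network updates, and requiring both Dx and Dg to fall in the band is an assumption, since the text only refers to "the output value of the discriminator".

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def train_until_balanced(
    groups: Iterable[Any],
    train_step: Callable[[Any], Tuple[float, float]],
    low: float = 0.4,
    high: float = 0.6,
) -> Optional[int]:
    """Train on one collected group at a time until the discriminator is balanced.

    train_step is a hypothetical callable that updates the networks on one group
    and returns the discriminator outputs (Dx, Dg) for that group.
    Returns how many groups were used, or None if the data ran out first.
    """
    for count, group in enumerate(groups, start=1):
        dx, dg = train_step(group)
        # Assumption: both outputs must lie in the band before training stops.
        if low <= dx <= high and low <= dg <= high:
            return count
    return None
```

Centering the band on 0.5 matches the usual GAN equilibrium, in which the discriminator assigns roughly equal probability to real and generated samples.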

Preferably, step S2 includes:

feeding the collected state data, action data, and post-action state data into the discriminator model for training, with a first loss function loss1 = Dx;

and feeding the state data, action data, and generated state data into the discriminator for training, with a second loss function loss2 = Dx - Dg and a third loss function loss3 = norm(Sg - Sa), where Sg is the generated state data and Sa is the post-action state data.
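The three loss terms can be written directly as functions. This sketch computes only the quantities named in the text; it assumes that norm means the Euclidean (L2) norm, and it does not decide which network minimizes or maximizes each term, because the source does not say.

```python
import math
from typing import Sequence


def loss1(dx: float) -> float:
    """First loss as stated: the discriminator output Dx on real data."""
    return dx


def loss2(dx: float, dg: float) -> float:
    """Second loss as stated: gap between outputs on real (Dx) and generated (Dg) data."""
    return dx - dg


def loss3(sg: Sequence[float], sa: Sequence[float]) -> float:
    """Third loss: L2 norm of generated state Sg minus measured post-action state Sa.

    math.dist raises ValueError if the two state vectors differ in length.
    """
    return math.dist(sg, sa)
```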

Preferably, step S3 further includes:

using the generator of the trained adversarial network to obtain generated state data for randomly generated action data, and screening the randomly generated action data with a particle swarm optimization algorithm according to the scores of the corresponding generated state data;

and using the screened action data as input to the generator of the trained adversarial network, iterating these steps a set number of times, and selecting the optimal action data to control the walking of the biped robot.

Preferably, the particle swarm optimization algorithm is as follows:

the position of a particle is defined as action = [x0, x1, x2, …, x10, x11], where x_i corresponds to the rotation angle of each joint motor; the velocity of a particle is v = [v0, v1, v2, …, v10, v11], where v_i corresponds to the change in x_i; 20 particles are defined, and the fitness function is score_i = f_θ(state, action_i), where f_θ is the trained generator model, i ∈ [0, 19], state is the state data, and action_i is the i-th action data.
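The particle swarm search described above can be sketched in Python as follows. The swarm size (20), the dimension (12), and the 50-iteration cap used in the embodiment come from the text; the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the joint-angle bounds, and the velocity clamp are assumptions, since the source does not give them. `fitness` stands for score_i = f_θ(state, action_i) and can be any callable that maps an action vector to a score to be maximized.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_search(
    fitness: Callable[[Sequence[float]], float],
    dim: int = 12,                # x0..x11: one rotation angle per joint motor
    n_particles: int = 20,        # particles i = 0..19
    iterations: int = 50,         # iteration cap used in the embodiment
    bounds: Tuple[float, float] = (-1.0, 1.0),  # assumed joint-angle range
    w: float = 0.7,               # assumed inertia weight
    c1: float = 1.5,              # assumed pull toward the individual extremum
    c2: float = 1.5,              # assumed pull toward the global optimum
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Maximize fitness(action) with an inertia-weight particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = hi - lo  # assumed velocity clamp: at most one full range per step

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                 # individual extrema
    p_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: p_score[i])
    g_best, g_score = p_best[g][:], p_score[g]   # global optimum of the swarm

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (p_best[i][d] - pos[i][d])
                     + c2 * r2 * (g_best[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            score = fitness(pos[i])
            # Update the individual extremum, then share it if it beats the swarm.
            if score > p_score[i]:
                p_best[i], p_score[i] = pos[i][:], score
                if score > g_score:
                    g_best, g_score = pos[i][:], score
    return g_best, g_score
```

Wrapped around the trained generator, `fitness` would score the predicted post-action state for each candidate action, and the returned vector would be the action sent to the joint motors. This sketch updates the global optimum as soon as any particle improves on it (an asynchronous update), which is one common PSO variant.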

Preferably, the score corresponding to the generated state data is:

score = h - θ²

where h is the height of the pelvis and θ is the elevation angle of the pelvis.
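The score rewards a high pelvis and penalizes pelvis elevation in either direction, because θ is squared. The sketch below implements the score and shows one way to wrap it around a generator so that it can serve as a particle-swarm fitness. The listed state data do not explicitly include pelvis height, so `extract_h_theta`, which pulls h and θ out of the generator's predicted state, is a hypothetical helper, as is the toy generator in the usage lines; units (for example metres and radians) are likewise assumed.

```python
from typing import Callable, Sequence, Tuple


def walking_score(h: float, theta: float) -> float:
    """score = h - theta**2 for pelvis height h and pelvis elevation angle theta."""
    return h - theta ** 2


def make_fitness(
    generator: Callable[[Sequence[float], Sequence[float]], Sequence[float]],
    state: Sequence[float],
    extract_h_theta: Callable[[Sequence[float]], Tuple[float, float]],
) -> Callable[[Sequence[float]], float]:
    """Build fitness(action) = score of the state the generator predicts for that action."""
    def fitness(action: Sequence[float]) -> float:
        predicted_state = generator(state, action)   # plays the role of f_theta(state, action)
        h, theta = extract_h_theta(predicted_state)  # hypothetical field lookup
        return walking_score(h, theta)
    return fitness


def toy_generator(state: Sequence[float], action: Sequence[float]) -> Tuple[float, float]:
    # Hypothetical stand-in for f_theta: height 1.0, elevation = sum of action entries
    return (1.0, sum(action))


fitness = make_fitness(toy_generator, state=[], extract_h_theta=lambda s: (s[0], s[1]))
print(fitness([0.25, 0.25]))  # 0.75
```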

According to another aspect of the present invention, there is provided an electronic device, comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.

Compared with the prior art, the invention has the following beneficial effects:

the biped robot walking control method and electronic device based on an adversarial network adopt an unsupervised neural network optimization algorithm (the adversarial network), so that an optimal dynamic model of the robot can be obtained through the game played between the generator and the discriminator, and adopt a swarm-intelligence search optimization algorithm (particle swarm optimization), so that optimal action parameters can be obtained.

Description of the drawings:

Fig. 1 is a schematic flowchart of the method of the present invention.

Fig. 2 is a schematic diagram of the process of training with real data.

Fig. 3 is a schematic diagram of the process of training with generated state data.

Fig. 4 is a schematic flowchart of optimization with the particle swarm optimization algorithm.

Fig. 5 is a schematic structural diagram of an electronic device according to the present invention.

Detailed Description

The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.

As shown in Fig. 1, a biped robot walking control method based on an adversarial network includes:

S1, collecting multiple groups of data from the biped robot, each group comprising state data, action data, and post-action state data. The state data include the pitch, yaw, and roll angles of the pelvis; the rotation angle and rotation speed of each joint; and the pressure values of the left and right feet. The pitch, yaw, and roll angles of the pelvis are measured with an IMU (inertial measurement unit). The action data include the rotation angle of each joint motor that performs the action.

Specifically, the adversarial network is trained as follows:

as shown in Fig. 2, the discriminator is trained with a set of real data from the biped robot: the collected state data, action data, and post-action state data are fed into the discriminator model to obtain the discriminator output Dx, and the discriminator is trained with the first loss function loss1 = Dx.

Meanwhile, as shown in Fig. 3, the state data and action data are fed into the generator, and the post-action state produced by the generator is the generated state data. The state data, action data, and generated state data are fed into the discriminator to obtain its output Dg, and the discriminator is trained with the second loss function loss2 = Dx - Dg and the third loss function loss3 = norm(Sg - Sa), where Sg is the generated state data and Sa is the post-action state data.

As shown in Fig. 4, the generator of the trained adversarial network is used to obtain generated state data for randomly generated action data, and the randomly generated action data are screened with a particle swarm optimization algorithm according to the score of the corresponding generated state data. The score of the generated state data is:

score = h - θ²

where h is the height of the pelvis and θ is the elevation angle of the pelvis.

The screened action data are then used as input to the generator of the trained adversarial network, these steps are iterated at most 50 times, and the optimal action data are selected to control the walking of the biped robot.

Particle swarm optimization models the birds in a flock as massless particles, each of which has only two attributes: velocity, which represents how fast it moves, and position, which represents the direction of movement. Each particle searches the search space independently for an optimal solution and records it as its current individual extremum. The individual extrema are shared across the whole swarm, and the best of them is taken as the current global optimum of the swarm. Each particle then adjusts its velocity and position according to its own current individual extremum and the current global optimum shared by the swarm. In this embodiment, the position of a particle is defined as action = [x0, x1, x2, …, x10, x11], where x_i corresponds to the rotation angle of each joint motor; the velocity of a particle is v = [v0, v1, v2, …, v10, v11], where v_i corresponds to the change in x_i; 20 particles are defined, and the fitness function is score_i = f_θ(state, action_i), where f_θ is the trained generator model, i ∈ [0, 19], state is the state data, and action_i is the i-th action data.
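The description above corresponds to the standard inertia-weight form of particle swarm optimization, in which particle i updates each dimension d as shown below. The source gives no update coefficients, so the inertia weight w, the acceleration coefficients c_1 and c_2, and the uniform random numbers r_1, r_2 in [0, 1] are the usual textbook symbols rather than values taken from this invention; p_{i,d} is the particle's individual extremum and g_d is the swarm's global optimum.

```latex
\begin{aligned}
v_{i,d}^{(t+1)} &= w\,v_{i,d}^{(t)} + c_1 r_1\left(p_{i,d} - x_{i,d}^{(t)}\right) + c_2 r_2\left(g_d - x_{i,d}^{(t)}\right),\\
x_{i,d}^{(t+1)} &= x_{i,d}^{(t)} + v_{i,d}^{(t+1)}, \qquad i \in [0, 19],\; d \in [0, 11].
\end{aligned}
```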

Fig. 5 shows an electronic device (for example, a computer server capable of executing programs) according to an exemplary embodiment of the invention, comprising at least one processor, a power supply, and a memory and an input/output interface communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method disclosed in any of the preceding embodiments; the input/output interface may include a display, a keyboard, a mouse, and a USB interface for inputting and outputting data; and the power supply supplies electrical energy to the electronic device.

Those skilled in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.

When the integrated unit of the present invention is implemented as a software functional unit and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied as a software product that is stored in a storage medium and includes several instructions causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disk, or any other medium that can store program code.

Claims

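1. A biped robot walking control method based on an adversarial network, characterized by comprising the following steps:

S1, collecting multiple groups of data from the biped robot, each group comprising state data, action data, and post-action state data;

S2, inputting the data collected in step S1 into the adversarial network for training;

and S3, controlling the walking of the biped robot according to the action data output by the trained adversarial network.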

2. The biped robot walking control method based on an adversarial network according to claim 1, characterized in that the state data comprise: the pitch, yaw, and roll angles of the pelvis; the rotation angle and rotation speed of each joint; and the pressure values of the left and right feet; and the pitch, yaw, and roll angles of the pelvis are measured with an IMU (inertial measurement unit).

3. The biped robot walking control method based on an adversarial network according to claim 1, characterized in that the action data comprise the rotation angle of each joint motor that performs the action.

4. The biped robot walking control method based on an adversarial network according to claim 1, wherein step S2 comprises:

training a discriminator with a set of real data from the biped robot by feeding the collected state data, action data, and post-action state data into a discriminator model to obtain the output Dx of the discriminator;

simultaneously feeding the state data and action data into a generator, the post-action state data produced by the generator being generated state data, and feeding the state data, action data, and generated state data into the discriminator to obtain the output Dg of the discriminator;

and repeating the training with the next group of data until the output value of the discriminator is within the range 0.4 to 0.6, at which point training ends and the trained adversarial network is obtained.

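5. The biped robot walking control method based on an adversarial network according to claim 4, wherein step S2 comprises:

feeding the collected state data, action data, and post-action state data into the discriminator model for training, with a first loss function loss1 = Dx;

and feeding the state data, action data, and generated state data into the discriminator for training, with a second loss function loss2 = Dx - Dg and a third loss function loss3 = norm(Sg - Sa), where Sg is the generated state data and Sa is the post-action state data.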

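6. The biped robot walking control method based on an adversarial network according to claim 4, wherein step S3 further comprises:

using the generator of the trained adversarial network to obtain generated state data for randomly generated action data, and screening the randomly generated action data with a particle swarm optimization algorithm according to the scores of the corresponding generated state data;

and using the screened action data as input to the generator of the trained adversarial network, iterating these steps a set number of times, and selecting the optimal action data to control the walking of the biped robot.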

7. The biped robot walking control method based on an adversarial network according to claim 6, wherein the particle swarm optimization algorithm is as follows:

the position of a particle is defined as action = [x0, x1, x2, …, x10, x11], where x_i corresponds to the rotation angle of each joint motor; the velocity of a particle is v = [v0, v1, v2, …, v10, v11], where v_i corresponds to the change in x_i; 20 particles are defined, and the fitness function is score_i = f_θ(state, action_i), where f_θ is the trained generator model, i ∈ [0, 19], state is the state data, and action_i is the i-th action data.

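8. The biped robot walking control method based on an adversarial network according to claim 6, wherein the score corresponding to the generated state data is:

score = h - θ²

where h is the height of the pelvis and θ is the elevation angle of the pelvis.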

9. An electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.