CN113156925A - Biped robot walking control method based on countermeasure network and electronic equipment - Google Patents


Info

Publication number
CN113156925A
Authority
CN
China
Prior art keywords
data
action
state data
biped robot
discriminator
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010015274.3A
Other languages
Chinese (zh)
Other versions
CN113156925B (en)
Inventor
王宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Sunrain Sign & Display System Co ltd
Original Assignee
Sichuan Sunrain Sign & Display System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Sunrain Sign & Display System Co ltd
Priority to CN202010015274.3A
Publication of CN113156925A
Application granted
Publication of CN113156925B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B62 LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62D MOTOR VEHICLES; TRAILERS
    • B62D57/00 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track
    • B62D57/02 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track with ground-engaging propulsion means, e.g. walking members
    • B62D57/032 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track with ground-engaging propulsion means, e.g. walking members with alternately or sequentially lifted supporting base and legs; with alternately or sequentially lifted feet or skid
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a biped robot walking control method based on an adversarial network, and an electronic device. The method comprises the following steps: S1, collecting multiple groups of data of the biped robot, wherein each group of data comprises state data, action data, and state data after the action; S2, inputting the data collected in step S1 into the adversarial network for training; and S3, controlling the biped robot to walk according to the action data output by the trained adversarial network. The invention uses an unsupervised neural network optimization algorithm (the adversarial network) to obtain an optimal dynamics model of the robot through the mutual game between the generator and the discriminator, and uses a swarm-intelligence search algorithm (particle swarm optimization) to obtain optimal action parameters.

Description

Biped robot walking control method based on adversarial network and electronic equipment
Technical Field
The invention relates to the field of walking robots, and in particular to a biped robot walking control method based on an adversarial network, and to an electronic device.
Background
Existing biped robot walking control methods require analyzing and modeling the parameters of each structure of the biped robot. Because every part of the robot is nonlinear, and the robot has more degrees of freedom and is more complex than other systems, the degree of coupling is extremely high. Manually establishing and analyzing system models and parameters for each component, and for the system composed of those components, is very difficult and complicated, and the stability and applicability of the resulting models are hard to guarantee. The prior art therefore relies on traditional engineering mechanics analysis with simplified model parameters, together with experience-based tuning of a large number of control parameters. The resulting system control models have low precision and are difficult to build; some problems cannot be solved at all, and the modeling analysis is prone to omissions.
Disclosure of Invention
The invention aims to solve the problems of low precision and high modeling difficulty caused by the simplification of model parameters in the prior art, and provides a biped robot walking control method based on an adversarial network, and an electronic device, which can obtain optimal action data for the walking of a biped robot.
In order to achieve the above purpose, the invention provides the following technical scheme:
A biped robot walking control method based on an adversarial network comprises the following steps:
S1, collecting multiple groups of data of the biped robot, wherein each group of data comprises state data, action data, and state data after the action;
S2, inputting the data collected in step S1 into the adversarial network for training;
and S3, controlling the biped robot to walk according to the action data output by the trained adversarial network.
Preferably, the state data includes: the pitch angle, yaw angle and roll angle of the pelvis, the rotation angle and rotation speed of each joint, and the pressure values of the left foot and the right foot; the pitch angle, yaw angle and roll angle of the pelvis are measured by an IMU (inertial measurement unit).
Preferably, the action data includes the rotation angle of each joint motor that completes the action.
Preferably, the step S2 includes:
training a discriminator with a group of real data of the biped robot, i.e., sending the collected state data, the collected action data, and the collected state data after the action (the real post-action state) into the discriminator model to obtain the discriminator output $D_x$;
at the same time, sending the state data and the action data into a generator, which generates the state data after the action, referred to as the generated state data; sending the state data, the action data, and the generated state data into the discriminator to obtain the discriminator output $D_g$;
and repeating the training with the next group of data until the output value of the discriminator falls within the range 0.4 to 0.6, at which point training ends and the trained adversarial network is obtained.
Preferably, the step S2 includes:
sending the collected state data, action data, and state data after the action into the discriminator model for training, wherein the first loss function is $\mathrm{loss}_1 = D_x$;
and sending the state data, the action data, and the generated state data into the discriminator for training, wherein the second loss function is $\mathrm{loss}_2 = D_x - D_g$ and the third loss function is $\mathrm{loss}_3 = \operatorname{norm}(S_g - S_a)$, where $S_g$ is the generated state data and $S_a$ is the state data after the action.
Preferably, the step S3 further includes:
using the generator of the trained adversarial network to obtain generated state data for randomly generated action data, and screening the randomly generated action data with a particle swarm optimization algorithm according to the score corresponding to the generated state data;
and using the screened action data as the input of the generator of the trained adversarial network, iterating the above steps a set number of times, and selecting the optimal action data for controlling the biped robot to walk.
Preferably, in the particle swarm optimization algorithm:
the position of a particle is defined as $\mathrm{action} = [x_0, x_1, x_2, \ldots, x_{10}, x_{11}]$, where $x_i$ corresponds to the rotation angle of each joint motor; the velocity of a particle is $v = [v_0, v_1, v_2, \ldots, v_{10}, v_{11}]$, where $v_i$ corresponds to the change in $x_i$; 20 particles are defined, and the fitness function is $\mathrm{score}_i = f_\theta(\mathrm{state}, \mathrm{action}_i)$, where $f_\theta$ is the trained generator model, $i \in [0, 19]$, $\mathrm{state}$ is the state data, and $\mathrm{action}_i$ is the $i$-th action data.
Preferably, the score corresponding to the generated state data is:
$\mathrm{score} = h - \theta^2$
where $h$ is the height of the pelvis and $\theta$ is the elevation angle of the pelvis.
According to another aspect of the present invention, there is provided an electronic device, comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any of the methods described above.
Compared with the prior art, the invention has the following beneficial effects:
the biped robot walking control method based on an adversarial network, and the electronic device, use an unsupervised neural network optimization algorithm (the adversarial network) to obtain an optimal dynamics model of the robot through the mutual game between the generator and the discriminator, and use a swarm-intelligence search algorithm (particle swarm optimization) to obtain optimal action parameters.
Description of the drawings:
FIG. 1 is a schematic flow chart of the present invention.
Fig. 2 is a schematic diagram of a process for training with real data.
FIG. 3 is a schematic diagram of a process flow for training with generated state data.
Fig. 4 is a schematic diagram of a flow chart of optimization using a particle swarm optimization algorithm.
Fig. 5 is a schematic diagram of a structure of an electronic device according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to test examples and specific embodiments. It should be understood that the scope of the above-described subject matter is not limited to the following examples, and any techniques implemented based on the disclosure of the present invention are within the scope of the present invention.
As shown in fig. 1, a biped robot walking control method based on an adversarial network includes:
S1, collecting multiple groups of data of the biped robot, wherein each group of data comprises state data, action data, and state data after the action. The state data includes: the pitch angle, yaw angle and roll angle of the pelvis, the rotation angle and rotation speed of each joint, and the pressure values of the left foot and the right foot; the pitch angle, yaw angle and roll angle of the pelvis are measured by an IMU (inertial measurement unit). The action data includes the rotation angle of each joint motor that completes the action.
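One possible way to hold a group of data from step S1 in software is sketched below in Python. The class and field names, the assumption of 12 joints (taken from the 12-element action vector used later in this embodiment), and the use of a single pressure value per foot are illustrative assumptions, not details given in this description.

```python
from dataclasses import dataclass
from typing import List

NUM_JOINTS = 12  # assumption: matches the 12-element action vector x0..x11


@dataclass
class State:
    """State data: pelvis attitude from the IMU, joint readings, foot pressures."""
    pelvis_pitch: float
    pelvis_yaw: float
    pelvis_roll: float
    joint_angles: List[float]   # rotation angle of each joint
    joint_speeds: List[float]   # rotation speed of each joint
    left_foot_pressure: float   # assumption: one value per foot
    right_foot_pressure: float

    def to_vector(self) -> List[float]:
        """Flatten the state into a single list, e.g. as a network input."""
        return [self.pelvis_pitch, self.pelvis_yaw, self.pelvis_roll,
                *self.joint_angles, *self.joint_speeds,
                self.left_foot_pressure, self.right_foot_pressure]


@dataclass
class Sample:
    """One group of data: state, action, and state after the action."""
    state: State
    action: List[float]         # rotation angle of each joint motor
    next_state: State


# Build one placeholder sample with all-zero readings.
zero_state = State(0.0, 0.0, 0.0, [0.0] * NUM_JOINTS, [0.0] * NUM_JOINTS, 0.0, 0.0)
sample = Sample(state=zero_state, action=[0.0] * NUM_JOINTS, next_state=zero_state)
```

Under these assumptions each flattened state has 3 + 12 + 12 + 2 = 29 entries.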
S2, inputting the data collected in step S1 into the adversarial network for training.
Specifically, the adversarial network is trained as follows:
As shown in fig. 2, the discriminator is trained with a group of real data of the biped robot, i.e., the collected state data, action data, and state data after the action are sent into the discriminator model to obtain the discriminator output $D_x$; for this training pass the first loss function is $\mathrm{loss}_1 = D_x$.
Meanwhile, as shown in fig. 3, the state data and the action data are sent into the generator, and the post-action state produced by the generator is the generated state data; the state data, the action data, and the generated state data are sent into the discriminator to obtain the discriminator output $D_g$; for this training pass the second loss function is $\mathrm{loss}_2 = D_x - D_g$ and the third loss function is $\mathrm{loss}_3 = \operatorname{norm}(S_g - S_a)$, where $S_g$ is the generated state data and $S_a$ is the state data after the action.
Training then continues with the next group of data until the output value of the discriminator falls within the range 0.4 to 0.6, at which point training ends and the trained adversarial network is obtained.
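The three loss expressions and the stopping condition can be written out as a small Python sketch. It only evaluates the quantities named above. How each loss is minimized or maximized, the network architectures, and the choice of the Euclidean norm for the third loss are not specified in this description and are assumptions of the sketch; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gan_losses(dx: float, dg: float,
               s_gen: Sequence[float],
               s_act: Sequence[float]) -> Tuple[float, float, float]:
    """Return (loss1, loss2, loss3) as written in the description.

    dx    -- discriminator output for real (state, action, state-after-action)
    dg    -- discriminator output for (state, action, generated state)
    s_gen -- generated state data Sg
    s_act -- measured state data after the action Sa
    """
    loss1 = dx
    loss2 = dx - dg
    # Assumption: "norm" is taken to be the Euclidean (L2) norm.
    loss3 = math.sqrt(sum((g - a) ** 2 for g, a in zip(s_gen, s_act)))
    return loss1, loss2, loss3


def training_finished(d_out: float, low: float = 0.4, high: float = 0.6) -> bool:
    """Stopping rule from the description: discriminator output within [0.4, 0.6]."""
    return low <= d_out <= high


loss1, loss2, loss3 = gan_losses(0.9, 0.2, [3.0, 4.0], [0.0, 0.0])
```

If the discriminator output is read as a probability that its input is real, which this description does not state explicitly, a value near 0.5 would mean the discriminator can no longer tell real post-action states from generated ones.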
S3, controlling the biped robot to walk according to the action data output by the trained adversarial network.
As shown in fig. 4, the generator of the trained adversarial network is used to obtain generated state data for randomly generated action data, and the randomly generated action data is screened with a particle swarm optimization algorithm according to the score corresponding to the generated state data. The score corresponding to the generated state data is:
$\mathrm{score} = h - \theta^2$
where $h$ is the height of the pelvis and $\theta$ is the elevation angle of the pelvis.
The screened action data is then used as the input of the generator of the trained adversarial network; these steps are iterated up to a maximum of 50 times, and the optimal action data is selected for controlling the biped robot to walk.
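As a minimal illustration of the scoring step, the Python sketch below scores a few predicted post-action states and picks the best one. The function names and the example numbers are hypothetical, units are not given in this description, and treating a higher score as better is an assumption (it is consistent with rewarding a high pelvis and penalizing tilt).

```python
from typing import List, Tuple


def score(pelvis_height: float, pelvis_angle: float) -> float:
    """score = h - theta^2, with h the pelvis height and theta the pelvis elevation angle."""
    return pelvis_height - pelvis_angle ** 2


def best_candidate(predicted: List[Tuple[float, float]]) -> int:
    """Return the index of the (height, angle) pair with the highest score."""
    return max(range(len(predicted)), key=lambda i: score(*predicted[i]))


# Hypothetical generator predictions for three candidate actions: (h, theta).
predictions = [(0.80, 0.30), (0.78, 0.05), (0.60, 0.00)]
best = best_candidate(predictions)
```

In this made-up example the second candidate wins: its pelvis is slightly lower than the first candidate's, but the squared-angle penalty is much smaller.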
Particle swarm optimization simulates the birds in a flock with massless particles, each of which has only two attributes: velocity, which represents how fast it moves, and position, which represents the direction of movement. Each particle searches for an optimal solution in the search space independently and records it as its current individual best; the individual bests are shared with the other particles in the swarm, and the best of them is taken as the current global best of the whole swarm. Every particle then adjusts its velocity and position according to its own current individual best and the current global best shared by the whole swarm. In this embodiment, the position of a particle is defined as $\mathrm{action} = [x_0, x_1, x_2, \ldots, x_{10}, x_{11}]$, where $x_i$ corresponds to the rotation angle of each joint motor; the velocity of a particle is $v = [v_0, v_1, v_2, \ldots, v_{10}, v_{11}]$, where $v_i$ corresponds to the change in $x_i$; 20 particles are defined, and the fitness function is $\mathrm{score}_i = f_\theta(\mathrm{state}, \mathrm{action}_i)$, where $f_\theta$ is the trained generator model, $i \in [0, 19]$, $\mathrm{state}$ is the state data, and $\mathrm{action}_i$ is the $i$-th action data.
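The search described above can be sketched in Python with a textbook particle swarm update. Only the 12-dimensional position, the 20 particles, and the 50 iterations come from this embodiment; the inertia weight `w`, the coefficients `c1` and `c2`, the joint-angle bounds, and maximizing the fitness are assumptions, and `toy_fitness` is a hypothetical stand-in for scoring the output of the trained generator.

```python
import random
from typing import Callable, List, Sequence


def pso(fitness: Callable[[Sequence[float]], float],
        dim: int = 12, n_particles: int = 20, iterations: int = 50,
        lo: float = -1.0, hi: float = 1.0,
        w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> List[float]:
    """Maximize `fitness` over [lo, hi]^dim with a basic particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # individual best positions
    pbest_score = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = pbest[g][:], pbest_score[g]   # global best

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity: inertia + pull toward own best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Position update, clamped to the assumed joint-angle bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            s = fitness(pos[i])
            if s > pbest_score[i]:
                pbest[i], pbest_score[i] = pos[i][:], s
                if s > gbest_score:
                    gbest, gbest_score = pos[i][:], s
    return gbest


def toy_fitness(action: Sequence[float]) -> float:
    """Hypothetical stand-in for score(generator(state, action)); peaks at 0.3 per joint."""
    return -sum((x - 0.3) ** 2 for x in action)


best = pso(toy_fitness)
```

In a real pipeline `fitness` would call the trained generator on the current state and the candidate action and then score the generated state; that wiring is omitted because the generator interface is not described here.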
FIG. 5 illustrates an electronic device (e.g., a computer server with program execution functionality) including at least one processor, a power source, and a memory and input-output interface communicatively coupled to the at least one processor, according to an exemplary embodiment of the invention; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method disclosed in any one of the preceding embodiments; the input and output interface can comprise a display, a keyboard, a mouse and a USB interface and is used for inputting and outputting data; the power supply is used for supplying electric energy to the electronic equipment.
Those skilled in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
When the integrated unit of the present invention is implemented in the form of a software functional unit and sold or used as a separate product, it may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.

Claims (9)

1. A biped robot walking control method based on an adversarial network, characterized by comprising the following steps:
S1, collecting multiple groups of data of the biped robot, wherein each group of data comprises state data, action data, and state data after the action;
S2, inputting the data collected in step S1 into the adversarial network for training;
and S3, controlling the biped robot to walk according to the action data output by the trained adversarial network.
2. The biped robot walking control method based on an adversarial network as claimed in claim 1, characterized in that the state data comprises: the pitch angle, yaw angle and roll angle of the pelvis, the rotation angle and rotation speed of each joint, and the pressure values of the left foot and the right foot; the pitch angle, yaw angle and roll angle of the pelvis are measured by an IMU (inertial measurement unit).
3. The biped robot walking control method based on an adversarial network as claimed in claim 1, characterized in that the action data includes the rotation angle of each joint motor that completes the action.
4. The biped robot walking control method based on an adversarial network as claimed in claim 1, wherein the step S2 comprises:
training a discriminator with a group of real data of the biped robot, i.e., sending the collected state data, the collected action data, and the collected state data after the action into the discriminator model to obtain the discriminator output $D_x$;
at the same time, sending the state data and the action data into a generator, wherein the post-action state produced by the generator is the generated state data; sending the state data, the action data, and the generated state data into the discriminator to obtain the discriminator output $D_g$;
and repeating the training with the next group of data until the output value of the discriminator falls within the range 0.4 to 0.6, at which point training ends and the trained adversarial network is obtained.
5. The biped robot walking control method based on an adversarial network as claimed in claim 4, wherein the step S2 comprises:
sending the collected state data, action data, and state data after the action into the discriminator model for training, wherein the first loss function is $\mathrm{loss}_1 = D_x$;
and sending the state data, the action data, and the generated state data into the discriminator for training, wherein the second loss function is $\mathrm{loss}_2 = D_x - D_g$ and the third loss function is $\mathrm{loss}_3 = \operatorname{norm}(S_g - S_a)$, where $S_g$ is the generated state data and $S_a$ is the state data after the action.
6. The biped robot walking control method based on an adversarial network as claimed in claim 4, wherein the step S3 further comprises:
using the generator of the trained adversarial network to obtain generated state data for randomly generated action data, and screening the randomly generated action data with a particle swarm optimization algorithm according to the score corresponding to the generated state data;
and using the screened action data as the input of the generator of the trained adversarial network, iterating the above steps a set number of times, and selecting the optimal action data for controlling the biped robot to walk.
7. The biped robot walking control method based on an adversarial network as claimed in claim 6, wherein in the particle swarm optimization algorithm:
the position of a particle is defined as $\mathrm{action} = [x_0, x_1, x_2, \ldots, x_{10}, x_{11}]$, where $x_i$ corresponds to the rotation angle of each joint motor; the velocity of a particle is $v = [v_0, v_1, v_2, \ldots, v_{10}, v_{11}]$, where $v_i$ corresponds to the change in $x_i$; 20 particles are defined, and the fitness function is $\mathrm{score}_i = f_\theta(\mathrm{state}, \mathrm{action}_i)$, where $f_\theta$ is the trained generator model, $i \in [0, 19]$, $\mathrm{state}$ is the state data, and $\mathrm{action}_i$ is the $i$-th action data.
8. The method as claimed in claim 6, wherein the score corresponding to the generated state data is:
$\mathrm{score} = h - \theta^2$
where $h$ is the height of the pelvis and $\theta$ is the elevation angle of the pelvis.
9. An electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
CN202010015274.3A 2020-01-07 2020-01-07 Biped robot walking control method based on confrontation network and electronic equipment Active CN113156925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010015274.3A CN113156925B (en) 2020-01-07 2020-01-07 Biped robot walking control method based on confrontation network and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010015274.3A CN113156925B (en) 2020-01-07 2020-01-07 Biped robot walking control method based on confrontation network and electronic equipment

Publications (2)

Publication Number Publication Date
CN113156925A (en) 2021-07-23
CN113156925B (en) 2022-11-29

Family

ID=76881473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010015274.3A Active CN113156925B (en) 2020-01-07 2020-01-07 Biped robot walking control method based on confrontation network and electronic equipment

Country Status (1)

Country Link
CN (1) CN113156925B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9440353B1 (en) * 2014-12-29 2016-09-13 Google Inc. Offline determination of robot behavior
CN108068113A (en) * 2017-11-13 2018-05-25 苏州大学 7-DOF humanoid arm flying object operation minimum acceleration trajectory optimization
US10059392B1 (en) * 2016-06-27 2018-08-28 Boston Dynamics, Inc. Control of robotic devices with non-constant body pitch
CN109032142A (en) * 2018-08-14 2018-12-18 浙江大学 A kind of biped robot's design and feedback containing waist structure
CN109483540A (en) * 2018-11-21 2019-03-19 南京邮电大学 Anthropomorphic robot based on Gauss punishment is layered the optimization method for Optimized model of playing football
CN109753071A (en) * 2019-01-10 2019-05-14 上海物景智能科技有限公司 A kind of robot welt traveling method and system
CN110262511A (en) * 2019-07-12 2019-09-20 同济人工智能研究院(苏州)有限公司 Biped robot's adaptivity ambulation control method based on deeply study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
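吕骥图: "Research on a robot Chinese-character stroke writing method based on generative adversarial networks and inverse reinforcement learning", China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology series *
尹一伊, et al.: "Solving robot inverse kinematics based on a GAN", Journal of Suzhou University of Science and Technology (Natural Science Edition) *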

Also Published As

Publication number Publication date
CN113156925B (en) 2022-11-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant