CN114660947A - Robot gait autonomous learning method and device, electronic equipment and storage medium - Google Patents

Robot gait autonomous learning method and device, electronic equipment and storage medium

Info

Publication number
CN114660947A
CN114660947A
Authority
CN
China
Prior art keywords
decision network
decision
motion capture
discriminator
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210544154.1A
Other languages
Chinese (zh)
Other versions
CN114660947B (en)
Inventor
邓涛
张晟东
张立华
李志建
古家威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Hua Laboratory
Original Assignee
Ji Hua Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji Hua Laboratory filed Critical Ji Hua Laboratory
Priority to CN202210544154.1A priority Critical patent/CN114660947B/en
Publication of CN114660947A publication Critical patent/CN114660947A/en
Application granted granted Critical
Publication of CN114660947B publication Critical patent/CN114660947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to the technical field of robot control, and particularly discloses a robot gait autonomous learning method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring motion capture information of a living being to be simulated; acquiring scene state information of the living being to be simulated; constructing a decision network according to the scene state information; constructing a countermeasure discriminator according to the decision network and the motion capture information; fixing the decision network and adversarially training the countermeasure discriminator so that it optimally distinguishes the motion capture information from the output result of the decision network; and training the decision network according to the trained countermeasure discriminator to generate a motion decision model for controlling the gait of the robot. The motion decision model obtained by the method can directly generate, from the scene state information, motion decisions close to the motion capture information without relying on a Markov chain or latent-variable inference, which greatly simplifies the training and deployment of the model and effectively improves training efficiency.

Description

Robot gait autonomous learning method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of robot control, in particular to a robot gait autonomous learning method, a robot gait autonomous learning device, electronic equipment and a storage medium.
Background
Ensuring the agility and stability of robot motion and controlling leg movement have long been difficult problems. Currently, the industry generally generates robot gait strategies through predefined gaits, trajectory optimization, model predictive control and other methods.
However, although the legged-motion decision models trained by these methods have a certain robustness advantage, the motions they generate are far from the agility and flexibility of real animals, the motion decisions lack sufficient flexibility and stability, and the training process relies on Markov chains and latent-variable inference, which makes training tedious.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
The application aims to provide a robot gait autonomous learning method, a robot gait autonomous learning device, electronic equipment and a storage medium, so that an action decision model has both flexibility and stability, and training and deployment processes of the model are simplified.
In a first aspect, the present application provides a method for autonomous learning of gait of a robot, for enabling the robot to autonomously simulate learning of gait of a living being, the method comprising the steps of:
acquiring motion capture information of a living being to be simulated;
acquiring scene state information of the creature to be imitated;
constructing a decision network according to the scene state information;
constructing a countermeasure discriminator according to the decision network and the motion capture information;
fixing the decision network, and training the countermeasure discriminator in a countermeasure mode so as to optimally distinguish the motion capture information from the output result of the decision network;
and training the decision network according to the trained confrontation discriminator to enable the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
According to the robot gait autonomous learning method, the countermeasure discriminator is adversarially trained using the decision network constructed from the scene state information together with the motion capture information, and the decision network is then trained with the countermeasure discriminator, so that the motion decision model is trained quickly.
The robot gait autonomous learning method, wherein the step of acquiring the motion capture information of the living being to be simulated comprises:
capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
generating the motion capture information based on pose information of the marker points.
By acquiring the motion capture information through the above process, the method can quickly and accurately determine the pose information of each joint point of the robot; since the output result of the decision network has the same dimensions as the motion capture information, it can be used directly to control the poses of the robot's joint points, sparing the complicated environment transplantation and construction and effectively shortening the time from algorithm to deployment.
The robot gait autonomous learning method comprises the steps that the decision network comprises a plurality of convolution layers, a batchnorm layer and a full connection layer which are sequentially connected, and is used for generating an output result according to the scene state information, wherein the output result is a regularized action vector.
In the robot gait autonomous learning method of this example, the decision network having the network layer structure described above can output an output result having a dimension that is consistent with the dimension of the motion capture information, based on the scene state information.
The method for gait self-learning of a robot, wherein the step of fixing the decision network and training the confrontation discriminator in confrontation to optimally distinguish the motion capture information from the output result of the decision network, comprises:
fixing network parameters of each network layer in the decision network;
judging the difference degree of the output results of the action capture information and the fixed decision network by using the confrontation discriminator;
training the confrontation discriminator to maximize the degree of difference, so as to optimally distinguish the motion capture information from the output result of the decision network.
The robot gait autonomous learning method, wherein the step of discriminating, by the confrontation discriminator, a degree of difference between the motion capture information and the output result of the fixed decision network includes:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))

wherein θ is the network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
In the robot gait autonomous learning method of this example, the first objective function is calculated from the logarithmic difference of the discrimination results of the confrontation discriminator, and the degree of difference between the two discrimination results can be reflected intuitively based on the digitization.
The method for autonomous learning gait of robot, wherein the step of training the decision network according to the trained confrontation discriminator to make the output result of the decision network gradually approach the motion capture information to generate a motion decision model for controlling the gait of robot includes:
judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
training the decision network to minimize the degree of difference to generate an action decision model for controlling robot gait.
The robot gait autonomous learning method, wherein the step of discriminating a degree of difference between the motion capture information and the output result of the decision network using the trained confrontation discriminator includes:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is the network parameter of the decision network.
In a second aspect, the present application further provides a device for autonomous gait learning of a robot, for enabling the robot to autonomously simulate learning of a biological gait, the device comprising:
the motion acquisition module is used for acquiring motion capture information of a living being to be simulated;
the scene acquisition module is used for acquiring the scene state information of the creature to be imitated;
the decision module is used for constructing a decision network according to the scene state information;
the judgment module is used for constructing a confrontation discriminator according to the decision network and the motion capture information;
the first training module is used for fixing the decision network and training the confrontation discriminator in a confrontation way so as to optimally distinguish the motion capture information from the output result of the decision network;
and the second training module is used for training the decision network according to the trained confrontation discriminator so as to enable the output result of the decision network to gradually approach the motion capture information, so as to generate a motion decision model for controlling the gait of the robot.
According to the robot gait autonomous learning device, the confrontation discriminator is trained in confrontation by utilizing the decision network and the motion capture information which are constructed based on the scene state information, and then the decision network is trained by the confrontation discriminator, so that the rapid training of the motion decision model is realized.
In a third aspect, the present application further provides an electronic device, comprising a processor and a memory, where the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, perform the steps of the method as provided in the first aspect.
In a fourth aspect, the present application also provides a storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the method as provided in the first aspect above.
From the above, the application provides a robot gait autonomous learning method, a device, an electronic device and a storage medium, wherein the method utilizes a decision network constructed based on scene state information and action capture information to train a confrontation discriminator, and then trains the decision network through the confrontation discriminator, so that the action decision model is rapidly trained.
Drawings
Fig. 1 is a flowchart of a robot gait autonomous learning method according to an embodiment of the present disclosure.
Fig. 2 is a schematic network layer structure diagram of a decision network in the embodiment of the present application.
Fig. 3 is a schematic network layer structure diagram of the countermeasure arbiter in the embodiment of the present application.
Fig. 4 is a structural framework diagram of model training of a robot gait autonomous learning method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a robot gait autonomous learning device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 201. an action acquisition module; 202. a scene acquisition module; 203. a decision-making module; 204. a discrimination module; 205. a first training module; 206. a second training module; 301. a processor; 302. a memory; 303. a communication bus.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
In the prior art, a motion capture system is adopted to acquire biological motion capture data and deploy the biological motion capture data to a robot for gait learning of the robot, but the learning mode is only to simulate the biological motion process and is equivalent to a teaching process, and a generated decision model still lacks flexibility and stability.
In a first aspect, please refer to fig. 1-4, fig. 1-4 are diagrams of a robot gait autonomous learning method for enabling a robot to autonomously simulate learning of a biological gait, in some embodiments of the present application, the method includes the following steps:
s1, acquiring motion capture information of the creature to be imitated;
specifically, the motion capture information is data information capable of representing the movement of the biological body to be simulated, and the motion capture information at least comprises gait movement data information generated corresponding to the movement of the biological body to be simulated, because the method of the embodiment of the application is used for enabling the robot to simulate the gait of the biological body to be studied autonomously.
More specifically, to clearly reflect the biological movement behavior to be simulated, the gait motion data information generated by the movement should contain dynamic data of joint angles of the trunk and four feet (hip, knee, ankle, foot).
More specifically, the creature to be simulated is a biped creature or a quadruped creature, and a quadruped creature is preferable in the present embodiment so that the method of the present embodiment can be applied to a quadruped robot.
S2, acquiring scene state information of the creature to be imitated;
specifically, the creature has different gait motions in different scene terrains, such as the effect of an undulating field on the height of the raised foot, and the like, and therefore, the gait of the creature has relevance to the scene.
More specifically, the scene state information is used for reflecting morphological characteristics of the field where the creature to be simulated is located, including but not limited to terrain state characteristics.
More specifically, the scene state information may be generated by a visual sensor, lidar, or other device acquisition for environmental data acquisition.
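Purely as an illustration of how such sensor data might be packaged for the decision network described later, the sketch below normalizes one depth frame into a single-channel scene-state tensor; the 5 m clipping range and the tensor layout are assumptions of the example, not details from the patent.

```python
# Minimal sketch (assumption, not from the patent): packaging one depth frame
# from a visual sensor / lidar projection as a single-channel scene-state tensor.
import numpy as np
import torch

def scene_state_from_depth(depth_m: np.ndarray) -> torch.Tensor:
    """depth_m: HxW depth image in metres; returns a (1, 1, H, W) float tensor."""
    d = np.nan_to_num(depth_m, nan=0.0)     # drop invalid returns
    d = np.clip(d, 0.0, 5.0) / 5.0          # normalise to [0, 1]; 5 m max range assumed
    return torch.from_numpy(d).float().unsqueeze(0).unsqueeze(0)
```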
S3, constructing a decision network according to the scene state information;
specifically, the decision network is an algorithm network capable of generating an action decision according to scene state information, and belongs to an action decision model which is not optimized yet, and therefore, the decision network is a control algorithm capable of enabling the robot to give the action decision of each key point according to the current scene state, namely, the control algorithm is a prototype of the action decision model.
S4, constructing a countermeasure discriminator according to the decision network and the motion capture information;
specifically, the countermeasure discriminator is a discriminator established based on the generation countermeasure network, and is capable of outputting a discrimination result based on the input output result of the decision network and the motion capture information.
More specifically, to ensure that the countermeasure discriminator is applied to the decision network and the motion capture information, the step S3 should construct the decision network such that the output result of the decision network is the same dimension as the motion capture information.
S5, fixing a decision network, and training a confrontation discriminator in confrontation so as to optimally distinguish the output results of the motion capture information and the decision network;
specifically, the method of the embodiment of the present application is designed based on the structural frame shown in fig. 4, and specifically includes: designing a confrontation discriminator capable of clearly distinguishing the output results of the motion capture information and the decision network, and guiding the decision network to carry out learning training by using the confrontation discriminator so that the output result of the decision network is as close to the motion capture information as possible, and the robot can generate a gait motion decision which is as similar as possible to the creature to be simulated according to the scene state information; therefore, before training the decision network, a confrontation discriminator capable of clearly distinguishing the motion capture information from the output result of the decision network needs to be obtained, and the stronger the discrimination capability of the confrontation discriminator is, the better the imitation effect generated by the subsequent training of the decision network can be optimized.
More specifically, to reduce the amount of operation data and simplify the optimization process of the countermeasure discriminator, the step first fixes a decision network, and then puts the motion capture information and the output result of the fixed decision network into the countermeasure discriminator, so as to realize rapid optimization of the countermeasure discriminator.
And S6, training a decision network according to the trained confrontation discriminator, and enabling the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
Specifically, as can be seen from the foregoing, the second stage of the training process is to train the decision network, and therefore, based on the step S5, an confrontation discriminator capable of clearly distinguishing the output result of the motion capture information from the output result of the decision network is obtained, and the trained confrontation discriminator is used to discriminate whether the output result of the decision network is close to the motion capture information in the step S6, so as to gradually update the decision network, so that the output result of the decision network is as close to the motion capture information as possible, and after the training is completed, the decision network can generate an output result (i.e., a motion decision) nearly identical to the motion capture information according to the scene state information, so that the decision network can be used as a motion decision model for controlling the gait of the robot at this time, so that the robot can autonomously simulate and learn biological gait.
According to the robot gait autonomous learning method, the countermeasure discriminator is adversarially trained using the decision network constructed from the scene state information together with the motion capture information, and the decision network is then trained by the countermeasure discriminator, so that rapid training of the motion decision model is realized.
In some preferred embodiments, the step of obtaining motion capture information of the living being to be mimicked comprises:
s11, capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
specifically, the optical and inertial fusion method is a capturing method of a kinetic capture system, and therefore, this step can also be understood as capturing pose information of marker points of a living organism to be simulated by the kinetic capture system.
And S12, generating motion capture information based on the pose information of the mark points.
Specifically, the optical and inertial fusion method can capture the motion of the living being to be simulated with high precision and high reliability. Assuming n reflective points are arranged on the living being to be simulated, a given motion is tracked and located through these n reflective points to acquire their pose information (including three-dimensional spatial coordinates and degrees of freedom). Based on the placement of the reflective points on the living being, dynamic data such as the joint angles of the trunk and the four legs (hip, knee, ankle, foot) and spatial coordinates are then calculated to form the motion capture information, which clearly reflects the gait characteristics of the living being to be simulated and provides the motion data basis for the subsequent autonomous gait learning.
More specifically, the method of the embodiment of the application can quickly and accurately confirm the pose information of each joint of the robot by acquiring the motion capture information based on the process, the output result of the decision network is consistent with the dimension of the motion capture information, the method can be directly used for controlling the pose of the joint of the robot, the complex environment transplantation and construction can be omitted, and the time from algorithm to deployment can be effectively shortened.
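To make the kind of quantity contained in the motion capture information concrete, the following is a small illustrative sketch (not from the patent) that recovers one joint angle from three hypothetical reflective-marker positions; real processing would cover all marked joints and degrees of freedom.

```python
# Minimal sketch: recovering one joint angle from captured marker positions.
# Assumes three hypothetical 3D marker positions for hip, knee and ankle; the
# knee angle is the angle between the thigh and shank vectors at the knee.
import numpy as np

def joint_angle(p_hip, p_knee, p_ankle):
    """Return the knee joint angle (radians) from three 3D marker positions."""
    thigh = p_hip - p_knee      # vector from knee to hip
    shank = p_ankle - p_knee    # vector from knee to ankle
    cos_a = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))

# Example: one frame of (hypothetical) marker coordinates in metres.
hip, knee, ankle = np.array([0.0, 0.0, 0.5]), np.array([0.05, 0.0, 0.25]), np.array([0.0, 0.0, 0.0])
print(joint_angle(hip, knee, ankle))  # ~2.75 rad for this nearly extended leg
```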
In some preferred embodiments, the decision network includes a plurality of convolutional layers, batchnorm layers, and fully-connected layers connected in sequence, and is configured to generate an output result according to the scene state information, where the output result is a regularized action vector.
Specifically, as shown in fig. 2, in the embodiment of the present application the decision network preferably comprises four convolutional layers (Conv1, Conv2, Conv3 and Conv4), a batchnorm layer and a fully-connected layer (full connect) arranged in sequence, wherein the result of each of the four convolutional layers is passed through a ReLU function, and the output of the fully-connected layer is passed through a tanh function.
More specifically, four convolutional layers are arranged to perform gradual information subdivision and feature extraction on scene state information (generally, scene pictures) to obtain subdivision feature information, the subdivision feature information can be normalized by using a batchnorm layer, the subdivision feature information is integrated into one action vector by using a full connection layer, and finally the action vector is normalized by using a tanh function to generate an output result.
More specifically, the decision network having the network layer structure can output an output result having a dimension that is consistent with the dimension of the motion capture information based on the scene state information.
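As an illustration only, the following is a minimal PyTorch sketch of a decision network with this layout; the channel counts, kernel sizes, the assumed 64x64 single-channel scene image and the 12-dimensional action vector are not specified in the patent and are chosen here purely for the example.

```python
# Minimal PyTorch sketch of the decision-network layout described above
# (four conv layers -> batchnorm -> fully connected), with ReLU after each
# convolution and tanh on the output. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    def __init__(self, action_dim: int = 12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.bn = nn.BatchNorm2d(64)                  # batchnorm layer
        self.fc = nn.Linear(64 * 4 * 4, action_dim)   # fully connected layer

    def forward(self, s):                 # s: (B, 1, 64, 64) scene state
        x = self.bn(self.conv(s))
        x = x.flatten(1)
        return torch.tanh(self.fc(x))     # regularized action vector in [-1, 1]
```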
In some preferred embodiments, as shown in fig. 3, the countermeasure discriminator comprises a convolutional layer (conv 1), a convolutional layer (conv 2), a batchnorm layer, a convolutional layer (conv 3), a flatten layer and a fully-connected layer (full connect) arranged in sequence, wherein the result of each of the three convolutional layers is passed through an LReLU function, and the fully-connected layer (full connect) is output through a sigmoid function.
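Likewise, a minimal PyTorch sketch of a discriminator with this layout is given below; treating the input as a one-channel 1D action vector, and the channel counts and the LReLU slope, are assumptions of the example.

```python
# Minimal PyTorch sketch of the countermeasure discriminator described above
# (conv -> conv -> batchnorm -> conv -> flatten -> fully connected), with
# LeakyReLU after each convolution and sigmoid on the output.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, action_dim: int = 12):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv1d(16, 32, 3, padding=1)
        self.bn = nn.BatchNorm1d(32)
        self.conv3 = nn.Conv1d(32, 32, 3, padding=1)
        self.fc = nn.Linear(32 * action_dim, 1)
        self.act = nn.LeakyReLU(0.05)       # slope 'a' is a design parameter (assumed value)

    def forward(self, y):                   # y: (B, action_dim) action / mocap vector
        x = y.unsqueeze(1)                  # (B, 1, action_dim)
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        x = self.bn(x)                      # batchnorm layer
        x = self.act(self.conv3(x))
        x = torch.flatten(x, 1)             # flatten layer
        return torch.sigmoid(self.fc(x))    # probability that the input is real mocap data
```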
In some preferred embodiments, ReLU, tanh, LReLU and sigmoid are expressed as:

ReLU(x) = max(0, x)   (1)

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))   (2)

LReLU(x) = max(ax, x)   (3)

sigmoid(x) = 1 / (1 + e^(-x))   (4)
wherein a is a design parameter and x is the feature information output by the previous network layer. In the embodiment of the present application, ReLU, tanh, LReLU and sigmoid are denoted f_G, O_G, f_D and O_D respectively, and the convolutional layers are denoted Conv(·). Defining D(·) as the countermeasure discriminator, s as the scene state information and G(·) as the decision network, then:

G(s) = O_G(W_G · f_G(Conv_G(s)) + b_1)   (5)

D(·) = O_D(W_D · f_D(Conv_D(·)) + b_2)   (6)

wherein, as shown in figures 2 and 3, W_G denotes the network parameters of the convolutional layers and the fully-connected layer of the decision network, and W_D denotes the network parameters of the convolutional layers and the fully-connected layer of the countermeasure discriminator; these are recorded as θ = {W_D, b_2}, the network parameters of the countermeasure discriminator, and φ = {W_G, b_1}, the network parameters of the decision network. Step S5 can therefore be understood as training to obtain θ, and step S6 as training to obtain φ. In the training process, the parameter matrices are generally initialized with elements drawn from N(0, 1); b_1 and b_2 are the compensation (bias) parameters of the tanh and sigmoid functions respectively.
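As a small illustrative sketch of the initialization mentioned above, the helper below draws weights from N(0, 1); whether this is applied to every layer, and the zeroed biases, are assumptions of the example rather than details from the patent.

```python
# Sketch: initialize conv / fully-connected weights from the standard normal N(0, 1).
import torch.nn as nn

def init_normal(module):
    if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=1.0)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage (with the hypothetical sketches above):
# decision_net.apply(init_normal); discriminator.apply(init_normal)
```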
In some preferred embodiments, the step of fixing the decision network and training the confrontation discriminant against to optimally distinguish the motion capture information from the output of the decision network comprises:
s51, fixing network parameters of each network layer in the decision network;
specifically, the network parameters of the fixed decision network can fix the decision network, i.e. in the embodiment of the present application, the fixed decision network
Figure 713922DEST_PATH_IMAGE015
S52, judging the difference degree of the output results of the action capture information and the fixed decision network by using a confrontation discriminator;
specifically, the motion capture information and the output result of the fixed decision network may be input into the countermeasure arbiter, respectively, and the comparison of the magnitudes of the discrimination results may be performed, and an objective function for comparing the discrimination results may also be established to perform the comparison, so as to reflect the degree of difference between the motion capture information and the output result of the fixed decision network.
S53, training the confrontation discriminator to maximize the degree of difference so as to optimally distinguish the motion capture information from the output result of the decision network.
Specifically, maximization of the degree of difference indicates that the discriminator can distinguish the motion capture information from the output result of the decision network to the greatest extent. In the embodiment of the present application, this training process may therefore be regarded as a 0-1 binary classification problem whose objective is to make the countermeasure discriminator judge the motion capture information as true (i.e. 1) and the output result of the decision network as false (i.e. 0), thereby establishing a training basis for the subsequent training of the decision network.
In some preferred embodiments, the step of using the confrontation discriminator to discriminate the degree of difference between the output of the motion capture information and the fixed decision network comprises:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))   (7)

wherein θ is the network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
Specifically, the first objective function is calculated by the logarithmic difference of the discrimination results of the confrontation discriminator, and the degree of difference between the two discrimination results can be reflected intuitively based on the numeralization.
In some preferred embodiments, the step of training the confrontation discriminator to maximize the degree of difference comprises:

updating θ according to a gradient ascent method so that the first objective function J_1(θ) is maximized.

Specifically, according to the gradient ascent method:

θ_{t+1} = θ_t + α · ∇_θ J_1(θ_t)   (8)

wherein α is the learning rate, generally set to 0.05, and θ_t is the network parameter of the countermeasure discriminator at step t of the gradient ascent;

the first objective function J_1(θ) is continuously updated with θ_t; after θ_t converges, J_1(θ) reaches its maximum, and the converged θ_t is taken as θ, i.e. the network parameter of the countermeasure discriminator that maximizes the first objective function J_1(θ) is determined.
In this case, the countermeasure discriminator can maximally distinguish the motion capture information from the output result of the decision network, i.e., can accurately obtain the difference between the motion capture information and the output result of the decision network, and can be used as a judgment means for judging whether the robot decision motion is accurate or not.
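The following is a minimal sketch, under the assumptions of the network sketches above, of one such discriminator update: J_1(θ) in the logarithmic-difference form of formula (7) is maximized by plain SGD (ascent realized as descent on -J_1), with the learning rate of 0.05 mentioned in the text; the function and variable names are illustrative.

```python
# Sketch of one countermeasure-discriminator update for step S5, assuming
# J1(theta) = log D(y) - log D(G1(s)) and the DecisionNetwork / Discriminator
# sketches above. Gradient ascent on J1 is realized by descending on -J1.
import torch

def train_discriminator_step(disc, frozen_policy, opt_d, s, y, eps=1e-8):
    with torch.no_grad():                 # decision network is fixed in this phase
        fake = frozen_policy(s)           # G1(s)
    j1 = torch.log(disc(y) + eps).mean() - torch.log(disc(fake) + eps).mean()
    loss = -j1                            # maximize J1  <=>  minimize -J1
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return j1.item()

# usage (hypothetical): opt_d = torch.optim.SGD(disc.parameters(), lr=0.05)
```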
In some preferred embodiments, the step of training the decision network according to the trained confrontation discriminant so that the output result of the decision network gradually approaches the motion capture information to generate the motion decision model for controlling the gait of the robot includes:
s61, judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
specifically, the step is to apply a first objective function
Figure 246404DEST_PATH_IMAGE001
Fixing (i.e. fixing) the confrontation criterion at maximum
Figure 255948DEST_PATH_IMAGE014
) While de-pinning the decision network, and thenAnd inputting the output result of the decision network and the motion capture information into the fixed confrontation discriminator to obtain the difference degree of the output result of the decision network and the motion capture information.
And S62, training the decision network to minimize the difference degree so as to generate an action decision model for controlling the gait of the robot.
Specifically, minimizing the degree of difference indicates that the decision network can, to the greatest extent, output from the scene state information the output result closest to the motion capture information, i.e. φ is adjusted so that the discrimination result of the countermeasure discriminator for the output result of the decision network is driven as close to true (i.e. 1) as possible.
In some preferred embodiments, the step of using the trained confrontation discriminator to discriminate the degree of difference between the output of the decision network and the motion capture information comprises:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is the network parameter of the decision network.

In some embodiments, J_2(φ) satisfies:

J_2(φ) = log D(y) - log D(G(s))

i.e. θ in the first objective function is fixed and φ is taken as the variable. For this objective function, the smaller the output value, the better the imitation effect of the output result of the decision network; J_2(φ) can therefore be further simplified. In the embodiment of the present application, J_2(φ) is preferably simplified to:

J_2(φ) = -log D(G(s))   (9)

Formula (9) omits the discrimination term of the motion capture information; the imitation effect of the decision network can be obtained quickly by substituting only the discrimination result of the countermeasure discriminator on the output of the decision network, which effectively simplifies the operation logic of step S6, reduces the amount of data calculation and improves the training efficiency of the decision network.
In some preferred embodiments, the step of training the decision network to minimise the degree of discrepancy comprises:
updating φ according to a gradient descent method so that the second objective function J_2(φ) is minimized.
Specifically, according to the gradient descent method:

φ_{t+1} = φ_t - β · ∇_φ J_2(φ_t)   (10)

wherein β is the learning rate, generally set to 0.05, and φ_t is the network parameter of the decision network at step t of the gradient descent;

the second objective function J_2(φ) is continuously updated with φ_t; after φ_t converges, J_2(φ) reaches its minimum, so that for the trained countermeasure discriminator the output result of the decision network is most similar to the motion capture information, i.e. the output result of the decision network is automatically and gradually updated to approach the motion capture information, realizing a rapid approach of the motion decision to the motion capture information. The value of φ_t at which the second objective function J_2(φ) reaches its minimum is taken as φ, i.e. the network parameter of the decision network that minimizes J_2(φ) is determined; at this point the decision network can output, to the greatest extent, an output result similar to the motion capture information, so it can be used as the motion decision model.
More specifically, the gradient descent updating of the second objective function can quickly obtain the optimal decision network, so that the countermeasure arbiter cannot distinguish between the motion capture information and the output result of the decision network, which indicates that the output result of the decision network is approximately consistent with the motion capture information.
More specifically, the loss function shown in fig. 4 is the first objective function J_1(θ) or the second objective function J_2(φ); that is, the method of the embodiment of the present application successively establishes the first objective function J_1(θ) and the second objective function J_2(φ) as the loss function to implement the training of the action decision model.
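Analogously to the discriminator sketch above, the following minimal sketch shows one decision-network update for step S6 under the same assumptions, minimizing the simplified objective of formula (9) by SGD with the learning rate of 0.05 mentioned in the text; names and shapes are illustrative.

```python
# Sketch of one decision-network update for step S6, assuming the simplified
# objective of formula (9), J2(phi) = -log D(G(s)), with the trained
# discriminator held fixed (it is not in the policy optimizer, so it is not updated).
import torch

def train_policy_step(policy, frozen_disc, opt_g, s, eps=1e-8):
    action = policy(s)                                   # G(s): decision-network output
    j2 = -torch.log(frozen_disc(action) + eps).mean()    # formula (9)
    opt_g.zero_grad()
    j2.backward()                                        # gradients reach the policy through D
    opt_g.step()
    return j2.item()

# usage (hypothetical): opt_g = torch.optim.SGD(policy.parameters(), lr=0.05)
```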
In some preferred embodiments, the method further comprises the steps of:
and S7, deploying the action decision model on the robot, and performing robot gait control by combining the dynamic capturing system to verify the gait control effect of the action decision model.
Specifically, when the verification result of step S7 is not good, steps S1-S7 are repeated to train the optimization action decision model.
More specifically, step S7 is mainly to verify the control agility and flexibility of the action decision model.
More specifically, the trained data can be deployed on the robot through calculation processes such as coordinate transformation, inverse kinematics and dynamics, so that the gait generated by the robot can be normally applied outside the space of the motion capture system.
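As a small illustration of the coordinate transformation mentioned here (not the patent's actual deployment pipeline), the sketch below maps a point captured in the motion-capture frame into the robot base frame using an assumed rotation and translation.

```python
# Illustrative sketch (assumption): express a motion-capture (world-frame) point
# in the robot base frame with a rotation R and translation t that describe the
# world frame as seen from the base frame.
import numpy as np

def world_to_base(p_world: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """p_world: (3,) point; R: (3, 3) rotation; t: (3,) translation."""
    return R @ p_world + t
```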
In a second aspect, please refer to fig. 5, fig. 5 is a device for autonomous gait learning of a robot provided in some embodiments of the present application, wherein the device is used for enabling the robot to autonomously simulate learning of a biological gait, and the device includes:
a motion acquisition module 201, configured to acquire motion capture information of a living being to be simulated;
a scene acquisition module 202, configured to acquire scene state information of an organism to be simulated;
a decision module 203, configured to construct a decision network according to the scene state information;
the judgment module 204 is used for constructing a countermeasure judgment device according to the decision network and the motion capture information;
a first training module 205, configured to fix the decision network and adversarially train the confrontation discriminator so as to optimally distinguish the motion capture information from the output result of the decision network;
the second training module 206 is configured to train a decision network according to the trained confrontation discriminator, so that an output result of the decision network gradually approaches the motion capture information, so as to generate a motion decision model for controlling a gait of the robot.
The robot gait autonomous learning device provided by the embodiment of the application adversarially trains a countermeasure discriminator using the decision network constructed from the scene state information together with the motion capture information, and then trains the decision network through the countermeasure discriminator, thereby realizing rapid training of the motion decision model.
In some preferred embodiments, the apparatus further comprises:
and the verification module is used for deploying the action decision model on the robot and performing robot gait control in combination with the motion capture system, so as to verify the gait control effect of the action decision model.
In some preferred embodiments, the robot gait autonomous learning apparatus of the embodiments of the present application is configured to perform the robot gait autonomous learning method provided in the first aspect.
In a third aspect, referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the present application provides an electronic device including: the processor 301 and the memory 302, the processor 301 and the memory 302 being interconnected and communicating with each other via a communication bus 303 and/or other form of connection mechanism (not shown), the memory 302 storing a computer program executable by the processor 301, the processor 301 executing the computer program when the computing device is running to perform the method of any of the alternative implementations of the embodiments described above.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the method in any optional implementation manner of the foregoing embodiments. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In summary, the embodiment of the application provides a robot gait autonomous learning method, a device, an electronic device and a storage medium, wherein the method utilizes a decision network constructed based on scene state information and action capture information to train a confrontation discriminator, and then trains the decision network through the confrontation discriminator, so that the action decision model is rapidly trained.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for autonomous gait learning of a robot for the robot to autonomously simulate learning a biological gait, the method comprising the steps of:
acquiring motion capture information of a living being to be simulated;
acquiring scene state information of the creature to be imitated;
constructing a decision network according to the scene state information;
constructing a countermeasure discriminator according to the decision network and the motion capture information;
fixing the decision network, and training the countermeasure discriminator in a countermeasure mode so as to optimally distinguish the motion capture information from the output result of the decision network;
and training the decision network according to the trained confrontation discriminator to enable the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
2. The robotic gait autonomous learning method according to claim 1, wherein the step of acquiring motion capture information of a living being to be simulated includes:
capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
generating the motion capture information based on pose information of the marker points.
3. The robot gait autonomous learning method according to claim 1, wherein the decision network includes a plurality of convolutional layers, batchnorm layers, and full-link layers connected in sequence, and is configured to generate an output result according to the scene state information, where the output result is a regularized motion vector.
4. The method of claim 1, wherein the step of fixing the decision network and training the countermeasure discriminator in a countermeasure mode to optimally distinguish the motion capture information from the output result of the decision network comprises:
fixing network parameters of each network layer in the decision network;
judging the difference degree of the output results of the action capture information and the fixed decision network by using the confrontation discriminator;
training the confrontation discriminator to maximize the degree of difference, so as to optimally distinguish the motion capture information from the output result of the decision network.
5. The robot gait autonomous learning method according to claim 4, wherein the step of discriminating, with the confrontation discriminator, a degree of difference between the motion capture information and the output result of the decision network after fixation includes:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))

wherein θ is a network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
6. The method of claim 1, wherein the step of training the decision network according to the trained confrontation discriminant to gradually bring the output result of the decision network close to the motion capture information to generate a motion decision model for controlling a gait of the robot comprises:
judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
training the decision network to minimize the degree of difference to generate an action decision model for controlling robot gait.
7. The robot gait autonomous learning method according to claim 6, wherein the step of discriminating the degree of difference between the motion capture information and the output result of the decision network using the trained confrontation discriminator includes:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is a network parameter of the decision network.
8. A robotic gait self-learning apparatus for enabling a robot to autonomously simulate learning a biological gait, the apparatus comprising:
the motion acquisition module is used for acquiring motion capture information of a living being to be simulated;
the scene acquisition module is used for acquiring the scene state information of the creature to be imitated;
the decision module is used for constructing a decision network according to the scene state information;
the judgment module is used for constructing a confrontation discriminator according to the decision network and the motion capture information;
the first training module is used for fixing the decision network and training the confrontation discriminator in a confrontation way so as to optimally distinguish the motion capture information from the output result of the decision network;
and the second training module is used for training the decision network according to the trained confrontation discriminator so as to enable the output result of the decision network to gradually approach the motion capture information, so as to generate a motion decision model for controlling the gait of the robot.
9. An electronic device comprising a processor and a memory, said memory storing computer readable instructions which, when executed by said processor, perform the steps of the method according to any one of claims 1 to 7.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1-7.
CN202210544154.1A 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium Active CN114660947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210544154.1A CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210544154.1A CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114660947A true CN114660947A (en) 2022-06-24
CN114660947B CN114660947B (en) 2022-07-29

Family

ID=82037534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210544154.1A Active CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114660947B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017106132U1 (en) * 2016-10-10 2017-11-13 Google Llc Neural networks for selecting actions to be performed by a robot agent
CN108803874A (en) * 2018-05-30 2018-11-13 广东省智能制造研究所 A kind of human-computer behavior exchange method based on machine vision
WO2018206504A1 (en) * 2017-05-10 2018-11-15 Telefonaktiebolaget Lm Ericsson (Publ) Pre-training system for self-learning agent in virtualized environment
US20180345495A1 (en) * 2017-05-30 2018-12-06 Sisu Devices Llc Robotic point capture and motion control
CN109968355A (en) * 2019-03-08 2019-07-05 北京工业大学 A kind of method that humanoid robot gait's balance model is established
US20200090042A1 (en) * 2017-05-19 2020-03-19 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
US20200086483A1 (en) * 2018-09-15 2020-03-19 X Development Llc Action prediction networks for robotic grasping
CN110991027A (en) * 2019-11-27 2020-04-10 华南理工大学 Robot simulation learning method based on virtual scene training
CN111136659A (en) * 2020-01-15 2020-05-12 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111461437A (en) * 2020-04-01 2020-07-28 北京工业大学 Data-driven crowd movement simulation method based on generation of confrontation network
CN111868759A (en) * 2018-04-11 2020-10-30 三星电子株式会社 System and method for active machine learning
CN111856925A (en) * 2020-06-02 2020-10-30 清华大学 State trajectory-based confrontation type imitation learning method and device
WO2021009293A1 (en) * 2019-07-17 2021-01-21 Deepmind Technologies Limited Training a neural network to control an agent using task-relevant adversarial imitation learning
CN113298252A (en) * 2021-05-31 2021-08-24 浙江工业大学 Strategy abnormity detection method and device for deep reinforcement learning
CN113777917A (en) * 2021-07-12 2021-12-10 山东建筑大学 Bionic robot fish scene perception system based on Mobilenet network
CN113970030A (en) * 2021-10-25 2022-01-25 季华实验室 Six-foot pipeline robot system with self-adjusting function

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017106132U1 (en) * 2016-10-10 2017-11-13 Google Llc Neural networks for selecting actions to be performed by a robot agent
WO2018206504A1 (en) * 2017-05-10 2018-11-15 Telefonaktiebolaget Lm Ericsson (Publ) Pre-training system for self-learning agent in virtualized environment
US20200090042A1 (en) * 2017-05-19 2020-03-19 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
US20180345495A1 (en) * 2017-05-30 2018-12-06 Sisu Devices Llc Robotic point capture and motion control
CN111868759A (en) * 2018-04-11 2020-10-30 三星电子株式会社 System and method for active machine learning
CN108803874A (en) * 2018-05-30 2018-11-13 广东省智能制造研究所 A kind of human-computer behavior exchange method based on machine vision
US20200086483A1 (en) * 2018-09-15 2020-03-19 X Development Llc Action prediction networks for robotic grasping
CN109968355A (en) * 2019-03-08 2019-07-05 北京工业大学 A kind of method that humanoid robot gait's balance model is established
WO2021009293A1 (en) * 2019-07-17 2021-01-21 Deepmind Technologies Limited Training a neural network to control an agent using task-relevant adversarial imitation learning
CN110991027A (en) * 2019-11-27 2020-04-10 华南理工大学 Robot simulation learning method based on virtual scene training
CN111136659A (en) * 2020-01-15 2020-05-12 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111461437A (en) * 2020-04-01 2020-07-28 北京工业大学 Data-driven crowd movement simulation method based on generation of confrontation network
CN111856925A (en) * 2020-06-02 2020-10-30 清华大学 State trajectory-based confrontation type imitation learning method and device
CN113298252A (en) * 2021-05-31 2021-08-24 浙江工业大学 Strategy abnormity detection method and device for deep reinforcement learning
CN113777917A (en) * 2021-07-12 2021-12-10 山东建筑大学 Bionic robot fish scene perception system based on Mobilenet network
CN113970030A (en) * 2021-10-25 2022-01-25 季华实验室 Six-foot pipeline robot system with self-adjusting function

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BARHATE, N.等: "Offline Imitation Learning for Robotic Control using Contrastive methods", 《2021 INTERNATIONAL CONFERENCE ON COMMUNICATION INFORMATION AND COMPUTING TECHNOLOGY》 *
LIU, JY等: "Compliant Control of a Space Robot with Multi Arms for Capturing Large Tumbling Target", 《2017 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION》 *
WU, Lei et al.: "Application of a BP neural network based on the LM algorithm to solving the inverse kinematics of the NAO model", 《软件导刊》 (Software Guide) *
SONG, Zixuan: "Multimodal abnormal gait recognition based on generative adversarial networks", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) *
SHI, Lei: "Research on gait motion and environmental adaptation based on CPG networks", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) *

Also Published As

Publication number Publication date
CN114660947B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
US9019278B2 (en) Systems and methods for animating non-humanoid characters with human motion data
CN111325318B (en) Neural network training method, neural network training device and electronic equipment
JP2009157948A (en) Robot apparatus, face recognition method, and face recognition apparatus
Leiva et al. Robust rl-based map-less local planning: Using 2d point clouds as observations
US20110208685A1 (en) Motion Capture Using Intelligent Part Identification
Pang et al. Efficient hybrid-supervised deep reinforcement learning for person following robot
CN109940614A (en) A kind of quick motion planning method of the more scenes of mechanical arm merging memory mechanism
Wahby et al. A robot to shape your natural plant: the machine learning approach to model and control bio-hybrid systems
Saeedvand et al. Hierarchical deep reinforcement learning to drag heavy objects by adult-sized humanoid robot
Buonamente et al. Hierarchies of self-organizing maps for action recognition
CN115761905A (en) Diver action identification method based on skeleton joint points
Hirose et al. ExAug: Robot-conditioned navigation policies via geometric experience augmentation
CN114660947B (en) Robot gait autonomous learning method and device, electronic equipment and storage medium
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
Yellapantula Synthesizing realistic data for vision based drone-to-drone detection
Buonamente et al. Recognizing actions with the associative self-organizing map
CN117238026B (en) Gesture reconstruction interactive behavior understanding method based on skeleton and image features
Wang et al. Transfer Reinforcement Learning of Robotic Grasping Training using Neural Networks with Lateral Connections
Tavella et al. Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
Hu et al. Hybrid learning architecture for fuzzy control of quadruped walking robots
Luo et al. Traffic sign recognition in outdoor environments using reconfigurable neural networks
Lakaemper et al. Using virtual scans for improved mapping and evaluation
CN115454096B (en) Course reinforcement learning-based robot strategy training system and training method
Gao Sensor fusion and stroke learning in robotic table tennis
Ruud Reinforcement learning with the TIAGo research robot: manipulator arm control with actor-critic reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant