CN114660947A - Robot gait autonomous learning method and device, electronic equipment and storage medium - Google Patents

Robot gait autonomous learning method and device, electronic equipment and storage medium

Info

Publication number
CN114660947A
CN114660947A
Authority
CN
China
Prior art keywords
decision network
decision
motion capture
discriminator
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210544154.1A
Other languages
Chinese (zh)
Other versions
CN114660947B (en)
Inventor
邓涛
张晟东
张立华
李志建
古家威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ji Hua Laboratory
Original Assignee
Ji Hua Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ji Hua Laboratory filed Critical Ji Hua Laboratory
Priority to CN202210544154.1A priority Critical patent/CN114660947B/en
Publication of CN114660947A publication Critical patent/CN114660947A/en
Application granted granted Critical
Publication of CN114660947B publication Critical patent/CN114660947B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to the technical field of robot control, and particularly discloses a robot gait autonomous learning method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring motion capture information of a living being to be simulated; acquiring scene state information of the living being to be simulated; constructing a decision network according to the scene state information; constructing a countermeasure discriminator according to the decision network and the motion capture information; fixing the decision network and adversarially training the countermeasure discriminator so that it optimally distinguishes the motion capture information from the output result of the decision network; and training the decision network according to the trained countermeasure discriminator to generate a motion decision model for controlling the gait of the robot. The motion decision model obtained by the method can directly generate, from the scene state information, motion decisions close to the motion capture information without relying on a Markov chain or latent-variable inference, which greatly simplifies the training and deployment of the model and effectively improves training efficiency.

Description

Robot gait autonomous learning method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of robot control, in particular to a robot gait autonomous learning method, a robot gait autonomous learning device, electronic equipment and a storage medium.
Background
Ensuring the agility and stability of robot motion and controlling leg movement have long been difficult problems. Currently, the industry generally generates robot gait strategies through predefined gaits, trajectory optimization, model predictive control and other methods.
However, although the legged-motion decision models trained by these methods have a certain robustness advantage, the motions they generate are far from the agility and flexibility of real animals, the motion decisions lack sufficient flexibility and stability, and the training process relies on Markov chains and latent-variable inference, which makes training tedious.
In view of the above problems, no effective technical solution exists at present.
Disclosure of Invention
The application aims to provide a robot gait autonomous learning method, a robot gait autonomous learning device, electronic equipment and a storage medium, so that an action decision model has both flexibility and stability, and training and deployment processes of the model are simplified.
In a first aspect, the present application provides a method for autonomous learning of gait of a robot, for enabling the robot to autonomously simulate learning of gait of a living being, the method comprising the steps of:
acquiring motion capture information of a living being to be simulated;
acquiring scene state information of the creature to be imitated;
constructing a decision network according to the scene state information;
constructing a countermeasure discriminator according to the decision network and the motion capture information;
fixing the decision network, and training the countermeasure discriminator in a countermeasure mode so as to optimally distinguish the motion capture information from the output result of the decision network;
and training the decision network according to the trained confrontation discriminator to enable the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
According to the robot gait autonomous learning method, the countermeasure discriminator is adversarially trained using the decision network constructed from the scene state information together with the motion capture information, and the decision network is then trained with the countermeasure discriminator, so that the motion decision model is trained quickly.
The robot gait autonomous learning method, wherein the step of acquiring the motion capture information of the living being to be simulated comprises:
capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
generating the motion capture information based on pose information of the marker points.
By acquiring the motion capture information through the above process, the method can quickly and accurately determine the pose information of each joint point of the robot; since the output result of the decision network has the same dimensions as the motion capture information, it can be used directly to control the poses of the robot's joint points, sparing the complicated environment transplantation and construction and effectively shortening the time from algorithm to deployment.
The robot gait autonomous learning method comprises the steps that the decision network comprises a plurality of convolution layers, a batchnorm layer and a full connection layer which are sequentially connected, and is used for generating an output result according to the scene state information, wherein the output result is a regularized action vector.
In the robot gait autonomous learning method of this example, the decision network having the network layer structure described above can output an output result having a dimension that is consistent with the dimension of the motion capture information, based on the scene state information.
The method for gait self-learning of a robot, wherein the step of fixing the decision network and training the confrontation discriminator in confrontation to optimally distinguish the motion capture information from the output result of the decision network, comprises:
fixing network parameters of each network layer in the decision network;
judging the difference degree of the output results of the action capture information and the fixed decision network by using the confrontation discriminator;
training the confrontation discriminator to maximize the degree of difference, so as to optimally distinguish the motion capture information from the output result of the decision network.
The robot gait autonomous learning method, wherein the step of discriminating, by the confrontation discriminator, a degree of difference between the motion capture information and the output result of the fixed decision network includes:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))

wherein θ is the network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
In the robot gait autonomous learning method of this example, the first objective function is calculated from the logarithmic difference of the discrimination results of the confrontation discriminator, and the degree of difference between the two discrimination results can be reflected intuitively based on the digitization.
The method for autonomous learning gait of robot, wherein the step of training the decision network according to the trained confrontation discriminator to make the output result of the decision network gradually approach the motion capture information to generate a motion decision model for controlling the gait of robot includes:
judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
training the decision network to minimize the degree of difference to generate an action decision model for controlling robot gait.
The robot gait autonomous learning method, wherein the step of discriminating a degree of difference between the motion capture information and the output result of the decision network using the trained confrontation discriminator includes:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is the network parameter of the decision network.
In a second aspect, the present application further provides a device for autonomous gait learning of a robot, for enabling the robot to autonomously simulate learning of a biological gait, the device comprising:
the motion acquisition module is used for acquiring motion capture information of a living being to be simulated;
the scene acquisition module is used for acquiring the scene state information of the creature to be imitated;
the decision module is used for constructing a decision network according to the scene state information;
the judgment module is used for constructing a confrontation discriminator according to the decision network and the motion capture information;
the first training module is used for fixing the decision network and training the confrontation discriminator in a confrontation way so as to optimally distinguish the motion capture information from the output result of the decision network;
and the second training module is used for training the decision network according to the trained confrontation discriminator so as to enable the output result of the decision network to gradually approach the motion capture information, so as to generate a motion decision model for controlling the gait of the robot.
According to the robot gait autonomous learning device, the confrontation discriminator is trained in confrontation by utilizing the decision network and the motion capture information which are constructed based on the scene state information, and then the decision network is trained by the confrontation discriminator, so that the rapid training of the motion decision model is realized.
In a third aspect, the present application further provides an electronic device, comprising a processor and a memory, where the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, perform the steps of the method as provided in the first aspect.
In a fourth aspect, the present application also provides a storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the method as provided in the first aspect above.
From the above, the application provides a robot gait autonomous learning method, a device, an electronic device and a storage medium, wherein the method utilizes a decision network constructed based on scene state information and action capture information to train a confrontation discriminator, and then trains the decision network through the confrontation discriminator, so that the action decision model is rapidly trained.
Drawings
Fig. 1 is a flowchart of a robot gait autonomous learning method according to an embodiment of the present disclosure.
Fig. 2 is a schematic network layer structure diagram of a decision network in the embodiment of the present application.
Fig. 3 is a schematic network layer structure diagram of the countermeasure arbiter in the embodiment of the present application.
Fig. 4 is a structural framework diagram of model training of a robot gait autonomous learning method according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a robot gait autonomous learning device according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 201. an action acquisition module; 202. a scene acquisition module; 203. a decision-making module; 204. a discrimination module; 205. a first training module; 206. a second training module; 301. a processor; 302. a memory; 303. a communication bus.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
In the prior art, a motion capture system is adopted to acquire biological motion capture data and deploy the biological motion capture data to a robot for gait learning of the robot, but the learning mode is only to simulate the biological motion process and is equivalent to a teaching process, and a generated decision model still lacks flexibility and stability.
In a first aspect, please refer to fig. 1-4, fig. 1-4 are diagrams of a robot gait autonomous learning method for enabling a robot to autonomously simulate learning of a biological gait, in some embodiments of the present application, the method includes the following steps:
s1, acquiring motion capture information of the creature to be imitated;
specifically, the motion capture information is data information capable of representing the movement of the biological body to be simulated, and the motion capture information at least comprises gait movement data information generated corresponding to the movement of the biological body to be simulated, because the method of the embodiment of the application is used for enabling the robot to simulate the gait of the biological body to be studied autonomously.
More specifically, to clearly reflect the biological movement behavior to be simulated, the gait motion data information generated by the movement should contain dynamic data of joint angles of the trunk and four feet (hip, knee, ankle, foot).
More specifically, the creature to be simulated is a biped creature or a quadruped creature, and a quadruped creature is preferable in the present embodiment so that the method of the present embodiment can be applied to a quadruped robot.
S2, acquiring scene state information of the creature to be imitated;
specifically, the creature has different gait motions in different scene terrains, such as the effect of an undulating field on the height of the raised foot, and the like, and therefore, the gait of the creature has relevance to the scene.
More specifically, the scene state information is used for reflecting morphological characteristics of the field where the creature to be simulated is located, including but not limited to terrain state characteristics.
More specifically, the scene state information may be generated by a visual sensor, lidar, or other device acquisition for environmental data acquisition.
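Purely as an illustration of how such sensor data might be packaged for the decision network described later, the sketch below normalizes one depth frame into a single-channel scene-state tensor; the 5 m clipping range and the tensor layout are assumptions of the example, not details from the patent.

```python
# Minimal sketch (assumption, not from the patent): packaging one depth frame
# from a visual sensor / lidar projection as a single-channel scene-state tensor.
import numpy as np
import torch

def scene_state_from_depth(depth_m: np.ndarray) -> torch.Tensor:
    """depth_m: HxW depth image in metres; returns a (1, 1, H, W) float tensor."""
    d = np.nan_to_num(depth_m, nan=0.0)     # drop invalid returns
    d = np.clip(d, 0.0, 5.0) / 5.0          # normalise to [0, 1]; 5 m max range assumed
    return torch.from_numpy(d).float().unsqueeze(0).unsqueeze(0)
```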
S3, constructing a decision network according to the scene state information;
specifically, the decision network is an algorithm network capable of generating an action decision according to scene state information, and belongs to an action decision model which is not optimized yet, and therefore, the decision network is a control algorithm capable of enabling the robot to give the action decision of each key point according to the current scene state, namely, the control algorithm is a prototype of the action decision model.
S4, constructing a countermeasure discriminator according to the decision network and the motion capture information;
specifically, the countermeasure discriminator is a discriminator established based on the generation countermeasure network, and is capable of outputting a discrimination result based on the input output result of the decision network and the motion capture information.
More specifically, to ensure that the countermeasure discriminator is applied to the decision network and the motion capture information, the step S3 should construct the decision network such that the output result of the decision network is the same dimension as the motion capture information.
S5, fixing a decision network, and training a confrontation discriminator in confrontation so as to optimally distinguish the output results of the motion capture information and the decision network;
specifically, the method of the embodiment of the present application is designed based on the structural frame shown in fig. 4, and specifically includes: designing a confrontation discriminator capable of clearly distinguishing the output results of the motion capture information and the decision network, and guiding the decision network to carry out learning training by using the confrontation discriminator so that the output result of the decision network is as close to the motion capture information as possible, and the robot can generate a gait motion decision which is as similar as possible to the creature to be simulated according to the scene state information; therefore, before training the decision network, a confrontation discriminator capable of clearly distinguishing the motion capture information from the output result of the decision network needs to be obtained, and the stronger the discrimination capability of the confrontation discriminator is, the better the imitation effect generated by the subsequent training of the decision network can be optimized.
More specifically, to reduce the amount of operation data and simplify the optimization process of the countermeasure discriminator, the step first fixes a decision network, and then puts the motion capture information and the output result of the fixed decision network into the countermeasure discriminator, so as to realize rapid optimization of the countermeasure discriminator.
And S6, training a decision network according to the trained confrontation discriminator, and enabling the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
Specifically, as can be seen from the foregoing, the second stage of the training process is to train the decision network, and therefore, based on the step S5, an confrontation discriminator capable of clearly distinguishing the output result of the motion capture information from the output result of the decision network is obtained, and the trained confrontation discriminator is used to discriminate whether the output result of the decision network is close to the motion capture information in the step S6, so as to gradually update the decision network, so that the output result of the decision network is as close to the motion capture information as possible, and after the training is completed, the decision network can generate an output result (i.e., a motion decision) nearly identical to the motion capture information according to the scene state information, so that the decision network can be used as a motion decision model for controlling the gait of the robot at this time, so that the robot can autonomously simulate and learn biological gait.
According to the robot gait autonomous learning method, the countermeasure discriminator is adversarially trained using the decision network constructed from the scene state information together with the motion capture information, and the decision network is then trained by the countermeasure discriminator, so that rapid training of the motion decision model is realized.
In some preferred embodiments, the step of obtaining motion capture information of the living being to be mimicked comprises:
s11, capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
specifically, the optical and inertial fusion method is a capturing method of a kinetic capture system, and therefore, this step can also be understood as capturing pose information of marker points of a living organism to be simulated by the kinetic capture system.
And S12, generating motion capture information based on the pose information of the mark points.
Specifically, the optical and inertial fusion method can capture the motion of the living being to be simulated with high precision and high reliability. Assuming n reflective points are arranged on the living being to be simulated, a given motion is tracked and located through these n reflective points to acquire their pose information (including three-dimensional spatial coordinates and degrees of freedom). Based on the placement of the reflective points on the living being, dynamic data such as the joint angles of the trunk and the four legs (hip, knee, ankle, foot) and spatial coordinates are then calculated to form the motion capture information, which clearly reflects the gait characteristics of the living being to be simulated and provides the motion data basis for the subsequent autonomous gait learning.
More specifically, the method of the embodiment of the application can quickly and accurately confirm the pose information of each joint of the robot by acquiring the motion capture information based on the process, the output result of the decision network is consistent with the dimension of the motion capture information, the method can be directly used for controlling the pose of the joint of the robot, the complex environment transplantation and construction can be omitted, and the time from algorithm to deployment can be effectively shortened.
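To make the kind of quantity contained in the motion capture information concrete, the following is a small illustrative sketch (not from the patent) that recovers one joint angle from three hypothetical reflective-marker positions; real processing would cover all marked joints and degrees of freedom.

```python
# Minimal sketch: recovering one joint angle from captured marker positions.
# Assumes three hypothetical 3D marker positions for hip, knee and ankle; the
# knee angle is the angle between the thigh and shank vectors at the knee.
import numpy as np

def joint_angle(p_hip, p_knee, p_ankle):
    """Return the knee joint angle (radians) from three 3D marker positions."""
    thigh = p_hip - p_knee      # vector from knee to hip
    shank = p_ankle - p_knee    # vector from knee to ankle
    cos_a = np.dot(thigh, shank) / (np.linalg.norm(thigh) * np.linalg.norm(shank))
    return np.arccos(np.clip(cos_a, -1.0, 1.0))

# Example: one frame of (hypothetical) marker coordinates in metres.
hip, knee, ankle = np.array([0.0, 0.0, 0.5]), np.array([0.05, 0.0, 0.25]), np.array([0.0, 0.0, 0.0])
print(joint_angle(hip, knee, ankle))  # ~2.75 rad for this nearly extended leg
```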
In some preferred embodiments, the decision network includes a plurality of convolutional layers, batchnorm layers, and fully-connected layers connected in sequence, and is configured to generate an output result according to the scene state information, where the output result is a regularized action vector.
Specifically, as shown in fig. 2, in the embodiment of the present application the decision network preferably comprises four convolutional layers (Conv1, Conv2, Conv3 and Conv4), a batchnorm layer and a fully-connected layer (full connect) arranged in sequence, wherein the result of each of the four convolutional layers is passed through a ReLU function, and the output of the fully-connected layer is passed through a tanh function.
More specifically, four convolutional layers are arranged to perform gradual information subdivision and feature extraction on scene state information (generally, scene pictures) to obtain subdivision feature information, the subdivision feature information can be normalized by using a batchnorm layer, the subdivision feature information is integrated into one action vector by using a full connection layer, and finally the action vector is normalized by using a tanh function to generate an output result.
More specifically, the decision network having the network layer structure can output an output result having a dimension that is consistent with the dimension of the motion capture information based on the scene state information.
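As an illustration only, the following is a minimal PyTorch sketch of a decision network with this layout; the channel counts, kernel sizes, the assumed 64x64 single-channel scene image and the 12-dimensional action vector are not specified in the patent and are chosen here purely for the example.

```python
# Minimal PyTorch sketch of the decision-network layout described above
# (four conv layers -> batchnorm -> fully connected), with ReLU after each
# convolution and tanh on the output. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    def __init__(self, action_dim: int = 12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.bn = nn.BatchNorm2d(64)                  # batchnorm layer
        self.fc = nn.Linear(64 * 4 * 4, action_dim)   # fully connected layer

    def forward(self, s):                 # s: (B, 1, 64, 64) scene state
        x = self.bn(self.conv(s))
        x = x.flatten(1)
        return torch.tanh(self.fc(x))     # regularized action vector in [-1, 1]
```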
In some preferred embodiments, as shown in fig. 3, the countermeasure discriminator comprises a convolutional layer (conv 1), a convolutional layer (conv 2), a batchnorm layer, a convolutional layer (conv 3), a flatten layer and a fully-connected layer (full connect) arranged in sequence, wherein the result of each of the three convolutional layers is passed through an LReLU function, and the fully-connected layer (full connect) is output through a sigmoid function.
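Likewise, a minimal PyTorch sketch of a discriminator with this layout is given below; treating the input as a one-channel 1D action vector, and the channel counts and the LReLU slope, are assumptions of the example.

```python
# Minimal PyTorch sketch of the countermeasure discriminator described above
# (conv -> conv -> batchnorm -> conv -> flatten -> fully connected), with
# LeakyReLU after each convolution and sigmoid on the output.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, action_dim: int = 12):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv1d(16, 32, 3, padding=1)
        self.bn = nn.BatchNorm1d(32)
        self.conv3 = nn.Conv1d(32, 32, 3, padding=1)
        self.fc = nn.Linear(32 * action_dim, 1)
        self.act = nn.LeakyReLU(0.05)       # slope 'a' is a design parameter (assumed value)

    def forward(self, y):                   # y: (B, action_dim) action / mocap vector
        x = y.unsqueeze(1)                  # (B, 1, action_dim)
        x = self.act(self.conv1(x))
        x = self.act(self.conv2(x))
        x = self.bn(x)                      # batchnorm layer
        x = self.act(self.conv3(x))
        x = torch.flatten(x, 1)             # flatten layer
        return torch.sigmoid(self.fc(x))    # probability that the input is real mocap data
```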
In some preferred embodiments, ReLU, tanh, LReLU and sigmoid are expressed as:

ReLU(x) = max(0, x)   (1)

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))   (2)

LReLU(x) = max(ax, x)   (3)

sigmoid(x) = 1 / (1 + e^(-x))   (4)
wherein a is a design parameter and x is the feature information output by the previous network layer. In the embodiment of the present application, ReLU, tanh, LReLU and sigmoid are denoted f_G, O_G, f_D and O_D respectively, and the convolutional layers are denoted Conv(·). Defining D(·) as the countermeasure discriminator, s as the scene state information and G(·) as the decision network, then:

G(s) = O_G(W_G · f_G(Conv_G(s)) + b_1)   (5)

D(·) = O_D(W_D · f_D(Conv_D(·)) + b_2)   (6)

wherein, as shown in figures 2 and 3, W_G denotes the network parameters of the convolutional layers and the fully-connected layer of the decision network, and W_D denotes the network parameters of the convolutional layers and the fully-connected layer of the countermeasure discriminator; these are recorded as θ = {W_D, b_2}, the network parameters of the countermeasure discriminator, and φ = {W_G, b_1}, the network parameters of the decision network. Step S5 can therefore be understood as training to obtain θ, and step S6 as training to obtain φ. In the training process, the parameter matrices are generally initialized with elements drawn from N(0, 1); b_1 and b_2 are the compensation (bias) parameters of the tanh and sigmoid functions respectively.
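As a small illustrative sketch of the initialization mentioned above, the helper below draws weights from N(0, 1); whether this is applied to every layer, and the zeroed biases, are assumptions of the example rather than details from the patent.

```python
# Sketch: initialize conv / fully-connected weights from the standard normal N(0, 1).
import torch.nn as nn

def init_normal(module):
    if isinstance(module, (nn.Conv1d, nn.Conv2d, nn.Linear)):
        nn.init.normal_(module.weight, mean=0.0, std=1.0)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage (with the hypothetical sketches above):
# decision_net.apply(init_normal); discriminator.apply(init_normal)
```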
In some preferred embodiments, the step of fixing the decision network and training the confrontation discriminant against to optimally distinguish the motion capture information from the output of the decision network comprises:
s51, fixing network parameters of each network layer in the decision network;
specifically, the network parameters of the fixed decision network can fix the decision network, i.e. in the embodiment of the present application, the fixed decision network
Figure 713922DEST_PATH_IMAGE015
S52, judging the difference degree of the output results of the action capture information and the fixed decision network by using a confrontation discriminator;
specifically, the motion capture information and the output result of the fixed decision network may be input into the countermeasure arbiter, respectively, and the comparison of the magnitudes of the discrimination results may be performed, and an objective function for comparing the discrimination results may also be established to perform the comparison, so as to reflect the degree of difference between the motion capture information and the output result of the fixed decision network.
S53, training the confrontation discriminator to maximize the degree of difference so as to optimally distinguish the motion capture information from the output result of the decision network.
Specifically, maximization of the degree of difference indicates that the discriminator can distinguish the motion capture information from the output result of the decision network to the greatest extent. In the embodiment of the present application, this training process may therefore be regarded as a 0-1 binary classification problem whose objective is to make the countermeasure discriminator judge the motion capture information as true (i.e. 1) and the output result of the decision network as false (i.e. 0), thereby establishing a training basis for the subsequent training of the decision network.
In some preferred embodiments, the step of using the confrontation discriminator to discriminate the degree of difference between the output of the motion capture information and the fixed decision network comprises:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))   (7)

wherein θ is the network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
Specifically, the first objective function is calculated by the logarithmic difference of the discrimination results of the confrontation discriminator, and the degree of difference between the two discrimination results can be reflected intuitively based on the numeralization.
In some preferred embodiments, the step of training the confrontation discriminator to maximize the degree of difference comprises:

updating θ according to a gradient ascent method so that the first objective function J_1(θ) is maximized.

Specifically, according to the gradient ascent method:

θ_{t+1} = θ_t + α · ∇_θ J_1(θ_t)   (8)

wherein α is the learning rate, generally set to 0.05, and θ_t is the network parameter of the countermeasure discriminator at step t of the gradient ascent;

the first objective function J_1(θ) is continuously updated with θ_t; after θ_t converges, J_1(θ) reaches its maximum, and the converged θ_t is taken as θ, i.e. the network parameter of the countermeasure discriminator that maximizes the first objective function J_1(θ) is determined.
In this case, the countermeasure discriminator can maximally distinguish the motion capture information from the output result of the decision network, i.e., can accurately obtain the difference between the motion capture information and the output result of the decision network, and can be used as a judgment means for judging whether the robot decision motion is accurate or not.
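The following is a minimal sketch, under the assumptions of the network sketches above, of one such discriminator update: J_1(θ) in the logarithmic-difference form of formula (7) is maximized by plain SGD (ascent realized as descent on -J_1), with the learning rate of 0.05 mentioned in the text; the function and variable names are illustrative.

```python
# Sketch of one countermeasure-discriminator update for step S5, assuming
# J1(theta) = log D(y) - log D(G1(s)) and the DecisionNetwork / Discriminator
# sketches above. Gradient ascent on J1 is realized by descending on -J1.
import torch

def train_discriminator_step(disc, frozen_policy, opt_d, s, y, eps=1e-8):
    with torch.no_grad():                 # decision network is fixed in this phase
        fake = frozen_policy(s)           # G1(s)
    j1 = torch.log(disc(y) + eps).mean() - torch.log(disc(fake) + eps).mean()
    loss = -j1                            # maximize J1  <=>  minimize -J1
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return j1.item()

# usage (hypothetical): opt_d = torch.optim.SGD(disc.parameters(), lr=0.05)
```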
In some preferred embodiments, the step of training the decision network according to the trained confrontation discriminant so that the output result of the decision network gradually approaches the motion capture information to generate the motion decision model for controlling the gait of the robot includes:
s61, judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
specifically, the step is to apply a first objective function
Figure 246404DEST_PATH_IMAGE001
Fixing (i.e. fixing) the confrontation criterion at maximum
Figure 255948DEST_PATH_IMAGE014
) While de-pinning the decision network, and thenAnd inputting the output result of the decision network and the motion capture information into the fixed confrontation discriminator to obtain the difference degree of the output result of the decision network and the motion capture information.
And S62, training the decision network to minimize the difference degree so as to generate an action decision model for controlling the gait of the robot.
Specifically, minimizing the degree of difference indicates that the decision network can, to the greatest extent, output from the scene state information the output result closest to the motion capture information, i.e. φ is adjusted so that the discrimination result of the countermeasure discriminator for the output result of the decision network is driven as close to true (i.e. 1) as possible.
In some preferred embodiments, the step of using the trained confrontation discriminator to discriminate the degree of difference between the output of the decision network and the motion capture information comprises:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is the network parameter of the decision network.

In some embodiments, J_2(φ) satisfies:

J_2(φ) = log D(y) - log D(G(s))

i.e. θ in the first objective function is fixed and φ is taken as the variable. For this objective function, the smaller the output value, the better the imitation effect of the output result of the decision network; J_2(φ) can therefore be further simplified. In the embodiment of the present application, J_2(φ) is preferably simplified to:

J_2(φ) = -log D(G(s))   (9)

Formula (9) omits the discrimination term of the motion capture information; the imitation effect of the decision network can be obtained quickly by substituting only the discrimination result of the countermeasure discriminator on the output of the decision network, which effectively simplifies the operation logic of step S6, reduces the amount of data calculation and improves the training efficiency of the decision network.
In some preferred embodiments, the step of training the decision network to minimise the degree of discrepancy comprises:
updating φ according to a gradient descent method so that the second objective function J_2(φ) is minimized.
Specifically, according to the gradient descent method:

φ_{t+1} = φ_t - β · ∇_φ J_2(φ_t)   (10)

wherein β is the learning rate, generally set to 0.05, and φ_t is the network parameter of the decision network at step t of the gradient descent;

the second objective function J_2(φ) is continuously updated with φ_t; after φ_t converges, J_2(φ) reaches its minimum, so that for the trained countermeasure discriminator the output result of the decision network is most similar to the motion capture information, i.e. the output result of the decision network is automatically and gradually updated to approach the motion capture information, realizing a rapid approach of the motion decision to the motion capture information. The value of φ_t at which the second objective function J_2(φ) reaches its minimum is taken as φ, i.e. the network parameter of the decision network that minimizes J_2(φ) is determined; at this point the decision network can output, to the greatest extent, an output result similar to the motion capture information, so it can be used as the motion decision model.
More specifically, the gradient descent updating of the second objective function can quickly obtain the optimal decision network, so that the countermeasure arbiter cannot distinguish between the motion capture information and the output result of the decision network, which indicates that the output result of the decision network is approximately consistent with the motion capture information.
More specifically, the loss function shown in fig. 4 is the first objective function J_1(θ) or the second objective function J_2(φ); that is, the method of the embodiment of the present application successively establishes the first objective function J_1(θ) and the second objective function J_2(φ) as the loss function to implement the training of the action decision model.
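Analogously to the discriminator sketch above, the following minimal sketch shows one decision-network update for step S6 under the same assumptions, minimizing the simplified objective of formula (9) by SGD with the learning rate of 0.05 mentioned in the text; names and shapes are illustrative.

```python
# Sketch of one decision-network update for step S6, assuming the simplified
# objective of formula (9), J2(phi) = -log D(G(s)), with the trained
# discriminator held fixed (it is not in the policy optimizer, so it is not updated).
import torch

def train_policy_step(policy, frozen_disc, opt_g, s, eps=1e-8):
    action = policy(s)                                   # G(s): decision-network output
    j2 = -torch.log(frozen_disc(action) + eps).mean()    # formula (9)
    opt_g.zero_grad()
    j2.backward()                                        # gradients reach the policy through D
    opt_g.step()
    return j2.item()

# usage (hypothetical): opt_g = torch.optim.SGD(policy.parameters(), lr=0.05)
```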
In some preferred embodiments, the method further comprises the steps of:
and S7, deploying the action decision model on the robot, and performing robot gait control by combining the dynamic capturing system to verify the gait control effect of the action decision model.
Specifically, when the verification result of step S7 is not good, steps S1-S7 are repeated to train the optimization action decision model.
More specifically, step S7 is mainly to verify the control agility and flexibility of the action decision model.
More specifically, the trained data can be deployed on the robot through calculation processes such as coordinate transformation, inverse kinematics and dynamics, so that the gait generated by the robot can be normally applied outside the space of the motion capture system.
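As a small illustration of the coordinate transformation mentioned here (not the patent's actual deployment pipeline), the sketch below maps a point captured in the motion-capture frame into the robot base frame using an assumed rotation and translation.

```python
# Illustrative sketch (assumption): express a motion-capture (world-frame) point
# in the robot base frame with a rotation R and translation t that describe the
# world frame as seen from the base frame.
import numpy as np

def world_to_base(p_world: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """p_world: (3,) point; R: (3, 3) rotation; t: (3,) translation."""
    return R @ p_world + t
```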
In a second aspect, please refer to fig. 5, fig. 5 is a device for autonomous gait learning of a robot provided in some embodiments of the present application, wherein the device is used for enabling the robot to autonomously simulate learning of a biological gait, and the device includes:
a motion acquisition module 201, configured to acquire motion capture information of a living being to be simulated;
a scene acquisition module 202, configured to acquire scene state information of an organism to be simulated;
a decision module 203, configured to construct a decision network according to the scene state information;
the judgment module 204 is used for constructing a countermeasure judgment device according to the decision network and the motion capture information;
a first training module 205, configured to fix the decision network and adversarially train the confrontation discriminator so as to optimally distinguish the motion capture information from the output result of the decision network;
the second training module 206 is configured to train a decision network according to the trained confrontation discriminator, so that an output result of the decision network gradually approaches the motion capture information, so as to generate a motion decision model for controlling a gait of the robot.
The robot gait autonomous learning device provided by the embodiment of the application adversarially trains a countermeasure discriminator using the decision network constructed from the scene state information together with the motion capture information, and then trains the decision network through the countermeasure discriminator, thereby realizing rapid training of the motion decision model.
In some preferred embodiments, the apparatus further comprises:
and the verification module is used for deploying the action decision model on the robot and performing robot gait control in combination with the motion capture system, so as to verify the gait control effect of the action decision model.
In some preferred embodiments, the robot gait autonomous learning apparatus of the embodiments of the present application is configured to perform the robot gait autonomous learning method provided in the first aspect.
In a third aspect, referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, where the present application provides an electronic device including: the processor 301 and the memory 302, the processor 301 and the memory 302 being interconnected and communicating with each other via a communication bus 303 and/or other form of connection mechanism (not shown), the memory 302 storing a computer program executable by the processor 301, the processor 301 executing the computer program when the computing device is running to perform the method of any of the alternative implementations of the embodiments described above.
In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program performs the method in any optional implementation manner of the foregoing embodiments. The storage medium may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
In summary, the embodiment of the application provides a robot gait autonomous learning method, a device, an electronic device and a storage medium, wherein the method utilizes a decision network constructed based on scene state information and action capture information to train a confrontation discriminator, and then trains the decision network through the confrontation discriminator, so that the action decision model is rapidly trained.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for autonomous gait learning of a robot for the robot to autonomously simulate learning a biological gait, the method comprising the steps of:
acquiring motion capture information of a living being to be simulated;
acquiring scene state information of the creature to be imitated;
constructing a decision network according to the scene state information;
constructing a countermeasure discriminator according to the decision network and the motion capture information;
fixing the decision network, and training the countermeasure discriminator in a countermeasure mode so as to optimally distinguish the motion capture information from the output result of the decision network;
and training the decision network according to the trained confrontation discriminator to enable the output result of the decision network to gradually approach the motion capture information so as to generate a motion decision model for controlling the gait of the robot.
2. The robotic gait autonomous learning method according to claim 1, wherein the step of acquiring motion capture information of a living being to be simulated includes:
capturing pose information of mark points of the living beings to be simulated based on an optical and inertial fusion method, wherein the mark points are reflecting points arranged on the living beings to be simulated;
generating the motion capture information based on pose information of the marker points.
3. The robot gait autonomous learning method according to claim 1, wherein the decision network includes a plurality of convolutional layers, batchnorm layers, and full-link layers connected in sequence, and is configured to generate an output result according to the scene state information, where the output result is a regularized motion vector.
4. The method of claim 1, wherein the step of fixing the decision network and training the countermeasure discriminator in a countermeasure mode to optimally distinguish the motion capture information from the output result of the decision network comprises:
fixing network parameters of each network layer in the decision network;
judging the difference degree of the output results of the action capture information and the fixed decision network by using the confrontation discriminator;
training the confrontation discriminator to maximize the degree of difference, so as to optimally distinguish the motion capture information from the output result of the decision network.
5. The robot gait autonomous learning method according to claim 4, wherein the step of discriminating, with the confrontation discriminator, a degree of difference between the motion capture information and the output result of the decision network after fixation includes:
setting a first objective function for representing the degree of difference according to the discrimination result of the countermeasure discriminator on the motion capture information and the discrimination result of the countermeasure discriminator on the output result of the fixed decision network, wherein the first objective function is J_1(θ) and satisfies:

J_1(θ) = log D(y) - log D(G_1(s))

wherein θ is a network parameter of the countermeasure discriminator, D(·) is the countermeasure discriminator, y is the motion capture information, s is the scene state information, and G_1(·) is the fixed decision network.
6. The method of claim 1, wherein the step of training the decision network according to the trained confrontation discriminant to gradually bring the output result of the decision network close to the motion capture information to generate a motion decision model for controlling a gait of the robot comprises:
judging the difference degree of the output results of the motion capture information and the decision network by using the trained confrontation discriminator;
training the decision network to minimize the degree of difference to generate an action decision model for controlling robot gait.
7. The robot gait autonomous learning method according to claim 6, wherein the step of discriminating the degree of difference between the motion capture information and the output result of the decision network using the trained confrontation discriminator includes:
setting a second objective function for representing the degree of difference according to the discrimination result of the trained countermeasure discriminator on the motion capture information and the discrimination result of the trained countermeasure discriminator on the output result of the decision network, wherein the second objective function is J_2(φ), and φ is a network parameter of the decision network.
8. A robotic gait self-learning apparatus for enabling a robot to autonomously simulate learning a biological gait, the apparatus comprising:
the motion acquisition module is used for acquiring motion capture information of a living being to be simulated;
the scene acquisition module is used for acquiring the scene state information of the creature to be imitated;
the decision module is used for constructing a decision network according to the scene state information;
the judgment module is used for constructing a confrontation discriminator according to the decision network and the motion capture information;
the first training module is used for fixing the decision network and training the confrontation discriminator in a confrontation way so as to optimally distinguish the motion capture information from the output result of the decision network;
and the second training module is used for training the decision network according to the trained confrontation discriminator so as to enable the output result of the decision network to gradually approach the motion capture information, so as to generate a motion decision model for controlling the gait of the robot.
9. An electronic device comprising a processor and a memory, said memory storing computer readable instructions which, when executed by said processor, perform the steps of the method according to any one of claims 1 to 7.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1-7.
CN202210544154.1A 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium Active CN114660947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210544154.1A CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210544154.1A CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114660947A true CN114660947A (en) 2022-06-24
CN114660947B CN114660947B (en) 2022-07-29

Family

ID=82037534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210544154.1A Active CN114660947B (en) 2022-05-19 2022-05-19 Robot gait autonomous learning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114660947B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017106132U1 (en) * 2016-10-10 2017-11-13 Google Llc Neural networks for selecting actions to be performed by a robot agent
CN108803874A (en) * 2018-05-30 2018-11-13 广东省智能制造研究所 A kind of human-computer behavior exchange method based on machine vision
WO2018206504A1 (en) * 2017-05-10 2018-11-15 Telefonaktiebolaget Lm Ericsson (Publ) Pre-training system for self-learning agent in virtualized environment
US20180345495A1 (en) * 2017-05-30 2018-12-06 Sisu Devices Llc Robotic point capture and motion control
CN109968355A (en) * 2019-03-08 2019-07-05 北京工业大学 A kind of method that humanoid robot gait's balance model is established
US20200090042A1 (en) * 2017-05-19 2020-03-19 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
US20200086483A1 (en) * 2018-09-15 2020-03-19 X Development Llc Action prediction networks for robotic grasping
CN110991027A (en) * 2019-11-27 2020-04-10 华南理工大学 Robot simulation learning method based on virtual scene training
CN111136659A (en) * 2020-01-15 2020-05-12 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111461437A (en) * 2020-04-01 2020-07-28 北京工业大学 Data-driven crowd movement simulation method based on generation of confrontation network
CN111868759A (en) * 2018-04-11 2020-10-30 三星电子株式会社 System and method for active machine learning
CN111856925A (en) * 2020-06-02 2020-10-30 清华大学 State trajectory-based confrontation type imitation learning method and device
WO2021009293A1 (en) * 2019-07-17 2021-01-21 Deepmind Technologies Limited Training a neural network to control an agent using task-relevant adversarial imitation learning
CN113298252A (en) * 2021-05-31 2021-08-24 浙江工业大学 Strategy abnormity detection method and device for deep reinforcement learning
CN113777917A (en) * 2021-07-12 2021-12-10 山东建筑大学 Bionic robot fish scene perception system based on Mobilenet network
CN113970030A (en) * 2021-10-25 2022-01-25 季华实验室 Six-foot pipeline robot system with self-adjusting function

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE202017106132U1 (en) * 2016-10-10 2017-11-13 Google Llc Neural networks for selecting actions to be performed by a robot agent
WO2018206504A1 (en) * 2017-05-10 2018-11-15 Telefonaktiebolaget Lm Ericsson (Publ) Pre-training system for self-learning agent in virtualized environment
US20200090042A1 (en) * 2017-05-19 2020-03-19 Deepmind Technologies Limited Data efficient imitation of diverse behaviors
US20180345495A1 (en) * 2017-05-30 2018-12-06 Sisu Devices Llc Robotic point capture and motion control
CN111868759A (en) * 2018-04-11 2020-10-30 三星电子株式会社 System and method for active machine learning
CN108803874A (en) * 2018-05-30 2018-11-13 广东省智能制造研究所 A kind of human-computer behavior exchange method based on machine vision
US20200086483A1 (en) * 2018-09-15 2020-03-19 X Development Llc Action prediction networks for robotic grasping
CN109968355A (en) * 2019-03-08 2019-07-05 北京工业大学 A kind of method that humanoid robot gait's balance model is established
WO2021009293A1 (en) * 2019-07-17 2021-01-21 Deepmind Technologies Limited Training a neural network to control an agent using task-relevant adversarial imitation learning
CN110991027A (en) * 2019-11-27 2020-04-10 华南理工大学 Robot simulation learning method based on virtual scene training
CN111136659A (en) * 2020-01-15 2020-05-12 南京大学 Mechanical arm action learning method and system based on third person scale imitation learning
CN111461437A (en) * 2020-04-01 2020-07-28 北京工业大学 Data-driven crowd movement simulation method based on generation of confrontation network
CN111856925A (en) * 2020-06-02 2020-10-30 清华大学 State trajectory-based confrontation type imitation learning method and device
CN113298252A (en) * 2021-05-31 2021-08-24 浙江工业大学 Strategy abnormity detection method and device for deep reinforcement learning
CN113777917A (en) * 2021-07-12 2021-12-10 山东建筑大学 Bionic robot fish scene perception system based on Mobilenet network
CN113970030A (en) * 2021-10-25 2022-01-25 季华实验室 Six-foot pipeline robot system with self-adjusting function

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BARHATE, N.等: "Offline Imitation Learning for Robotic Control using Contrastive methods", 《2021 INTERNATIONAL CONFERENCE ON COMMUNICATION INFORMATION AND COMPUTING TECHNOLOGY》 *
LIU, JY等: "Compliant Control of a Space Robot with Multi Arms for Capturing Large Tumbling Target", 《2017 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION》 *
WU, Lei et al.: "Application of a BP neural network based on the LM algorithm to solving the inverse kinematics of the NAO model", 《软件导刊》 (Software Guide) *
SONG, Zixuan: "Multimodal abnormal gait recognition based on generative adversarial networks", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) *
SHI, Lei: "Research on gait motion and environmental adaptation based on CPG networks", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Masters' Theses Full-text Database, Information Science and Technology) *

Also Published As

Publication number Publication date
CN114660947B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
US9019278B2 (en) Systems and methods for animating non-humanoid characters with human motion data
CN111325318B (en) Neural network training method, neural network training device and electronic equipment
JP2009157948A (en) Robot apparatus, face recognition method, and face recognition apparatus
Leiva et al. Robust rl-based map-less local planning: Using 2d point clouds as observations
US20110208685A1 (en) Motion Capture Using Intelligent Part Identification
Pang et al. Efficient hybrid-supervised deep reinforcement learning for person following robot
CN109940614A (en) A kind of quick motion planning method of the more scenes of mechanical arm merging memory mechanism
Wahby et al. A robot to shape your natural plant: the machine learning approach to model and control bio-hybrid systems
Saeedvand et al. Hierarchical deep reinforcement learning to drag heavy objects by adult-sized humanoid robot
Buonamente et al. Hierarchies of self-organizing maps for action recognition
CN115761905A (en) Diver action identification method based on skeleton joint points
Hirose et al. ExAug: Robot-conditioned navigation policies via geometric experience augmentation
CN114660947B (en) Robot gait autonomous learning method and device, electronic equipment and storage medium
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
Yellapantula Synthesizing realistic data for vision based drone-to-drone detection
Buonamente et al. Recognizing actions with the associative self-organizing map
CN117238026B (en) Gesture reconstruction interactive behavior understanding method based on skeleton and image features
Wang et al. Transfer Reinforcement Learning of Robotic Grasping Training using Neural Networks with Lateral Connections
Tavella et al. Signs of Language: Embodied Sign Language Fingerspelling Acquisition from Demonstrations for Human-Robot Interaction
Hu et al. Hybrid learning architecture for fuzzy control of quadruped walking robots
Luo et al. Traffic sign recognition in outdoor environments using reconfigurable neural networks
Lakaemper et al. Using virtual scans for improved mapping and evaluation
CN115454096B (en) Course reinforcement learning-based robot strategy training system and training method
Gao Sensor fusion and stroke learning in robotic table tennis
Ruud Reinforcement learning with the TIAGo research robot: manipulator arm control with actor-critic reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant