CN112454390B - Humanoid robot facial expression simulation method based on deep reinforcement learning - Google Patents
- Publication number
- CN112454390B (application CN202011355989.XA)
- Authority
- CN
- China
- Prior art keywords
- humanoid robot
- reinforcement learning
- facial
- deep reinforcement
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
- B25J11/0015—Face robots, animated artificial faces for imitating human expressions
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Fuzzy Systems (AREA)
- Manipulator (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for imitating the facial expressions of a humanoid robot based on deep reinforcement learning, comprising the following steps. Step 1: acquire a face picture of the target to be imitated, and use a deep reinforcement learning algorithm running on the humanoid robot entity to perform facial action unit vector prediction on the face picture, obtaining the corresponding facial action unit vector and the corresponding motor action vector. Step 2: apply the motor action vector to the physical humanoid robot, and capture the resulting facial expression of the humanoid robot. Step 3: taking the captured facial expression as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the robot according to the initial state to imitate the facial expression of the target human face, until the imitation of the target expression is complete. The method reduces the number of training runs on the humanoid robot entity, avoids shortening the service life of the robot hardware, and ensures accurate imitation of the target facial expression.
Description
Technical Field
The invention relates to the field of robot facial expression simulation, in particular to a method for simulating facial expression of a humanoid robot based on deep reinforcement learning.
Background
In face-to-face human communication, over 55% of the information is conveyed through the face. Because a humanoid robot has a human-like face, its facial expressions also play a very important role in human-computer interaction, for example in emotion expression. To enable a humanoid robot to produce facial expressions that humans can recognize, one line of research is to have the robot imitate human facial expressions and present them through the linkage of a rigid head motion structure and a flexible facial material.
At present, methods for imitating humanoid robot facial expressions include manual presetting, feature point mapping, and end-to-end network training. In manual presetting, a series of motor action vectors corresponding to basic expressions such as "happy" and "sad", closely tied to the robot's rigid motion structure, must be arranged in advance; the number of expressions the robot can imitate is therefore limited and fixed, and the human facial expressions to be presented must also fall within the arranged expression categories, so this method greatly limits the robot's imitation capability. Feature point mapping uses a motion capture system to achieve real-time imitation of facial expressions, but it requires pasting marker points on both the human face and the robot face to establish a linear mapping between the positions of corresponding facial feature points, or further uses machine learning to learn the mapping between facial feature points and motor values. The machine learning variant must be trained on the humanoid robot entity to obtain the mapping, and pasting special marker points on the robot's face is impractical during real human-computer interaction, so this method is inconvenient in actual application scenarios. End-to-end network training achieves end-to-end learning by manually arranging a large set of real samples related to the humanoid robot entity and constructing a network model.
Thereby, the mapping between facial action units (or facial feature points) and motor values is obtained. Compared with manual presetting and feature point mapping, end-to-end network training can enrich the facial expressions the humanoid robot generates, but it requires manually arranging a large set of real samples related to the robot, the entire training process runs on the robot entity, and for a new humanoid robot the real sample data set must be rearranged and a model constructed and trained end-to-end all over again. Moreover, the output values of the motors driving the rigid motion structure are continuous; if each motor's output is discretized, the number of motor action vectors that can be arranged manually grows exponentially (for example, with N motors and each motor's rotation range discretized into M steps, M^N motor action vectors can be arranged). One must also consider whether each arranged motor action vector can produce a corresponding real facial expression, which makes arranging the data set very time-consuming.
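The exponential growth described above can be made concrete with a short sketch; the example motor count and discretization level below are illustrative, not taken from the patent.

```python
# Size of the manually arranged action space: N motors, each motor's rotation
# range discretized into M steps, gives M**N possible motor action vectors.
def action_space_size(n_motors: int, steps_per_motor: int) -> int:
    return steps_per_motor ** n_motors

# With 11 motors (matching the 11 action units in the embodiment) and only
# 10 steps per motor, the space already has 10**11 = 100 billion vectors.
print(action_space_size(11, 10))  # 100000000000
```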
Therefore, how to provide a method for simulating the facial expression of the humanoid robot, which avoids manual arrangement, reduces the training times of the humanoid robot entity and reduces the hardware life consumption, is a problem to be solved.
Disclosure of Invention
Based on the problems in the prior art, the invention aims to provide a method for imitating humanoid robot facial expressions based on deep reinforcement learning. It addresses the shortcomings of the existing end-to-end network training approach to expression imitation: the data set must be arranged manually, which is time-consuming; many training runs are needed on the humanoid robot entity; and hardware service life is consumed.
The purpose of the invention is realized by the following technical scheme:
the embodiment of the invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, which comprises the following steps:
the initialization parameters of the prediction module of the deep reinforcement learning algorithm adopt parameters transferred from a pre-trained neural network model, which is pre-trained while running outside the humanoid robot entity;
and 3, taking the captured corresponding facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot according to the initial state to imitate the facial expression of the target human face, until the humanoid robot finishes imitating the target facial expression.
According to the technical scheme provided by the invention, the method for simulating the facial expression of the humanoid robot based on the deep reinforcement learning has the following beneficial effects:
because a pre-trained neural network model that does not run on the humanoid robot entity is pre-trained on a real human face data set, and the pre-trained model and its parameters are then transferred to the deep reinforcement learning algorithm running on the humanoid robot entity, the number of actual training runs on the robot entity is reduced. In addition, because the pre-trained neural network model takes a real human face data set as training data, no manual arrangement of a data set is needed. The method thus greatly reduces manual arrangement work and the number of actual training runs on the humanoid robot entity, and requires no special auxiliary equipment to achieve imitation of facial expressions. According to the relation between the rigid motion structure and the facial action units, the method can be conveniently applied to different humanoid robot entities.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a training flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a pre-trained neural network model in the method according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a building block of a deep reinforcement learning algorithm in the method according to the embodiment of the present invention;
fig. 5 is an overall architecture diagram of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific contents of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, including:
the initialization parameters of the prediction module of the deep reinforcement learning algorithm adopt parameters transferred from a pre-trained neural network model, which is pre-trained while running outside the humanoid robot entity;
and 3, taking the captured corresponding facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot according to the initial state to imitate the facial expression of the target human face, until the humanoid robot finishes imitating the target facial expression.
In step 1 of the above method, the facial action unit vector prediction performed on the target face picture by the deep reinforcement learning algorithm running on the humanoid robot entity is:
performing facial action unit vector prediction, with the deep reinforcement learning algorithm, on the picture of the target face region cropped from the target face picture.
Referring to fig. 2, the method includes the step of pre-training the pre-training neural network model, specifically:
in the method, before step 1, the method further comprises the steps of pre-training the pre-trained neural network model and training a deep reinforcement learning algorithm, and specifically comprises the following steps:
step 11, adopting a real human face data set consisting of face pictures and corresponding facial action unit vectors as the training data set, and screening out the label dimensions required for imitating humanoid robot facial expressions according to the correspondence between the rigid motion structure inside the robot's head and the facial action units;
step 12, pre-training the pre-trained neural network model with the face pictures in the real face data set as input and the corresponding facial action unit vectors as output, the output of the model being determined by the label dimensions screened in step 11;
step 13, migrating the pre-trained neural network model and its parameters to the deep reinforcement learning algorithm running on the humanoid robot entity;
step 14, training the deep reinforcement learning algorithm on the humanoid robot entity; after training, facial expression imitation by the humanoid robot can be carried out.
In step 11 of the method, according to the corresponding relationship between the rigid motion structure inside the head of the humanoid robot and the facial action unit, the label dimensions required for simulating the facial expression of the humanoid robot are screened out as follows:
and if the humanoid robot can realize a certain action unit described in the facial action coding system, selecting a label corresponding to the action dimension in the real human face data set. Specifically, the generation of the facial expression of the humanoid robot is realized by driving an external flexible material by a rigid motion structure positioned in the head of the robot, so that deformation is generated and presented, the facial expression of the human face is generated by pulling skin by muscle tissues positioned under the skin, and the facial expression and the skin have certain similarity, so that the label dimension is screened according to the above.
In step 12 of the above method, the output of the pre-trained neural network model is determined by the label dimensions screened in step 11 as follows:
and determining the output dimension size and the corresponding meaning of each dimension of the pre-training neural network model according to the dimension size and the meaning of the label screened in the step 11.
In step 13 of the method, migrating the neural network model and parameters pre-trained in step 12 to a deep reinforcement learning algorithm running on the humanoid robot entity is:
the Actor module of the deep reinforcement learning algorithm adopts the structure and parameters completely same as those of the pre-trained neural network model;
and migrating the pre-trained neural network model and the parameters to an Actor module of the deep reinforcement learning algorithm.
In step 14 of the method, the training of the deep reinforcement learning algorithm on the entity of the humanoid robot is as follows:
in the training process, the corresponding motor action vector predicted by the pre-training neural network model at each time acts on a humanoid robot entity, and after the humanoid robot executes the facial action corresponding to the motor action vector, the next training of the deep reinforcement learning algorithm is carried out.
In step 2 of the method, a camera is used for capturing a facial picture of the humanoid robot to obtain a corresponding facial expression from the facial picture.
In step 3 of the method, the deep reinforcement learning algorithm running on the humanoid robot controls the facial motion of the humanoid robot according to the initial state until the humanoid robot finishes imitating the expression of the target human face as follows:
and determining the next facial action taken by the humanoid robot according to the initial state by a deep reinforcement learning algorithm operated on the humanoid robot, displaying a new facial expression on the humanoid robot after executing the facial action, giving a corresponding reward according to the similarity between the new facial expression and the target facial expression, and realizing the simulation of the facial expression of the humanoid robot in a limited step under the guidance of the reward. Specifically, the facial expression of the human face is set as a target, and the facial expression of the humanoid robot at a certain time t is set as an initial state StIn the initial state StThe physical action acted on the humanoid robot by the face action unit of the humanoid robot is atThe facial expression of the humanoid robot after the action is executed is changed under the driving of the facial action unit, and the changed facial expression is a state St+1At the moment, corresponding reward r is given according to the similarity between the facial expression of the human face and the facial expression of the humanoid robottWill then be further based on the award rtAnd the state determines the next facial action unit action to be performed.
Referring to fig. 3, in the above method, the pre-trained neural network model is formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer and an output layer.
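Assuming the standard VGG16 convolutional base on 224×224 RGB input (a 7×7×512 feature map) and the fully-connected sizes given later in the embodiment (512, 128, and an 11-dimensional sigmoid output), the layer widths after flattening can be sketched as follows; the input size is an assumption, since the patent does not state it:

```python
# Widths of the pre-trained model's head: flatten -> fc1 -> fc2 -> output.
VGG16_FLAT = 7 * 7 * 512  # flattened VGG16 feature-map size for 224x224 input

def head_widths(n_action_units: int = 11, fc1: int = 512, fc2: int = 128):
    return [VGG16_FLAT, fc1, fc2, n_action_units]

print(head_widths())  # [25088, 512, 128, 11]
```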
In the method, the deep reinforcement learning algorithm adopts the deep deterministic policy gradient (DDPG) algorithm.
In this method for imitating humanoid robot facial expressions, a collected real human face data set is used as training data to pre-train a neural network model that does not run on the humanoid robot entity. This avoids manual arrangement while reducing the number of training runs on the robot entity, requires no special auxiliary equipment, and achieves accurate imitation of facial expressions. According to the relation between the rigid motion structure and the facial action units, the method can be conveniently applied to different humanoid robot entities.
The embodiments of the present invention are described in further detail below.
Referring to fig. 1, the invention provides a method for simulating facial expressions of a humanoid robot based on a deep reinforcement learning algorithm, which comprises the following steps:
and 3, taking the captured facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot (this embodiment adopts the deep deterministic policy gradient algorithm, i.e. DDPG) determines, according to the initial state, the next facial action the robot should take; after the action is executed a new facial expression appears on the robot, and under the guidance of the obtained reward the robot achieves imitation of the target human facial expression within a finite number of steps.
Referring to fig. 2, since the pre-trained neural network model needs to be pre-trained and the deep reinforcement learning algorithm needs to be trained on the humanoid robot entity, the specific training process of the two models is shown in fig. 2 and includes the following steps:
Step 4, deep reinforcement learning algorithm training: the deep reinforcement learning algorithm is trained on the humanoid robot entity, and under the control of the trained algorithm the robot's facial expression reaches, in a finite number of steps, the same or nearly the same effect as the human facial expression, thereby achieving imitative generation of facial expressions. Specifically, in this step the human facial expression is regarded as the goal, and the robot's facial expression at some time t is the state s_t. In this state, the action applied to the humanoid robot entity by the bottom-layer driving module is a_t; driven by the motors, the robot's facial expression changes after the action is executed, and the changed expression is the state s_{t+1}. At this moment, a reward r_t is given according to the similarity between the human facial expression and the robot's facial expression, and the next motor action to execute is determined from the reward and the state.
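The s_t / a_t / r_t loop above can be sketched with a toy one-dimensional "expression", a placeholder policy, and a placeholder similarity function; the real method uses face images, motor vectors, and an FECNet-based similarity, so everything below is a stand-in:

```python
def similarity(expr, target):
    # Toy similarity on scalar "expressions"; the patent compares face images.
    return 1.0 - abs(expr - target)

def imitate(target, policy, max_steps=50, threshold=0.9):
    """Run the reward-guided loop until the expression is close enough."""
    state = 0.0  # initial robot expression s_t (scalar stand-in)
    for t in range(max_steps):
        action = policy(state, target)        # a_t from the (trained) actor
        state = state + action                # robot executes a_t -> s_{t+1}
        reward = similarity(state, target)    # r_t from expression similarity
        if reward >= threshold:
            return t + 1, state               # imitation achieved in t+1 steps
    return max_steps, state

# Placeholder proportional policy standing in for the trained DDPG actor.
steps_used, final_expr = imitate(0.8, lambda s, g: 0.5 * (g - s))
```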
For ease of understanding, the methods described below in connection with the specific examples are described.
The embodiment provides a humanoid robot facial expression imitation method based on a deep reinforcement learning algorithm, which comprises the following steps:
the real face data set used in this step is FEAFA (see http://www.iiplab.net/FEAFA/), a face data set collected in real environments, consisting of face pictures and, for each picture, floating-point intensity value labels for 24 redefined facial action units. According to the characteristics of the humanoid robot's rigid motion structure, 11 label dimensions of the original FEAFA face data set are selected, and the selected label dimensions are redefined and described; under motor actuation, the redefined facial action units can be displayed on the humanoid robot's face. Table 1 shows the redefined facial action units and their descriptions adopted in this embodiment.
| AU number | Description | FACS reference |
| --- | --- | --- |
| 1 | Left Eye Close | AU43, Eye Closure |
| 2 | Right Eye Close | AU43, Eye Closure |
| 3 | Left Lid Raise | AU5, Lid Raise |
| 4 | Right Lid Raise | AU5, Lid Raise |
| 5 | Left Brow Lower | AU4, Brow Lower |
| 6 | Right Brow Lower | AU4, Brow Lower |
| 7 | Jaw Drop | AU26, Jaw Drop |
| 8 | Left Lip Corner Pull | AU12, Lip Corner Pull |
| 9 | Right Lip Corner Pull | AU12, Lip Corner Pull |
| 10 | Left Lip Corner Stretch | AU20, Lip Corner Stretch |
| 11 | Right Lip Corner Stretch | AU20, Lip Corner Stretch |
It should be noted that the required data set label dimensions are screened according to the linkage relation between the facial action units and the rigid motion structure of the robot head. The embodiment of the invention uses the FEAFA face data set, but other similar face data sets can be used with the method, and using another face data set should not be regarded as an essential difference from the invention.
In step 11, the screened dimensions of the face data set are redefined and described; the screening and redefinition of 11 labels (shown in Table 1) is only an example, and other numbers of label dimensions may be screened and redefined.
the pre-trained neural network model adopted in this step is shown in fig. 3. It consists of a feature extraction layer composed of convolutional layers and a fully-connected part: the feature extraction layer is the convolutional feature extractor of a VGG-16 model pre-trained on the ImageNet large-scale data set, and the fully-connected part consists of two hidden layers (using the ReLU activation function) and an output layer (using the sigmoid activation function). In this embodiment the numbers of neurons in the fully-connected layers are 512 and 128, which can be adjusted according to actual requirements. The training data set used for the pre-trained model is the screened FEAFA real face data set: face pictures and the 11 facial action units corresponding to the robot's rigid motion structure. The cropped face picture is input to the pre-trained network, the extracted features are fed to the fully-connected layers, and the output of the whole network is a vector of floating-point intensity values for the designated facial action units (each intensity value ranging from 0 to 1). Each element of this vector is linearly converted to the rotation angle of the motor corresponding to the rigid motion structure (for example, if the mouth opening/closing intensity value is 0.5 and the motor rotation range is 0-40 degrees, the corresponding motor angle is 20 degrees);
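The linear conversion from action-unit intensity to motor angle can be sketched as follows; the per-motor angle range is a parameter, and the 0-40 degree range is just the jaw example from the text:

```python
def intensity_to_angle(intensity: float, angle_min: float, angle_max: float) -> float:
    """Linearly map an AU intensity in [0, 1] to a motor rotation angle."""
    intensity = min(max(intensity, 0.0), 1.0)  # clamp out-of-range predictions
    return angle_min + intensity * (angle_max - angle_min)

# Mouth open/close intensity 0.5 over a 0-40 degree motor range -> 20 degrees.
print(intensity_to_angle(0.5, 0.0, 40.0))  # 20.0
```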
it should be noted that in step 12 the screened face data set is used to pre-train the neural network model in order to improve training convergence speed. Using the feature extraction layer of a pre-trained VGG16 model together with self-defined fully-connected layers is only an example; other kinds of pre-trained neural network models can also be used, and doing so should not be regarded as an essential difference from the invention.
migrating the pre-trained neural network model and parameters of step 12 to the deep reinforcement learning algorithm. The deep reinforcement learning algorithm used is the Deep Deterministic Policy Gradient (DDPG) algorithm, and the pre-trained model and parameters shown in fig. 4 are migrated as follows. The Actor module of the DDPG algorithm adopts a structure and parameters identical to those of the pre-trained neural network model. The Critic module evaluates the action output by the Actor module in a given state, so the feature extraction layer and the two fully-connected layers of the pre-trained model are migrated into the Critic module; the output of the feature extraction layer is flattened and fused with the floating-point intensity value vector of the facial action units to serve as the input of the fully-connected layers, and the final output of the Critic module (its structure is shown schematically in fig. 4) is the evaluation of the current state and action. In the DDPG algorithm, the Actor module takes a face picture of the humanoid robot as input and outputs the facial action unit floating-point intensity value vector (i.e., the facial action unit vector), which is converted into the corresponding motor action vector; the Critic module takes as input a face picture of the humanoid robot and the motor action vector to be executed in that facial state;
guided by the reward obtained after each motor action vector is executed, the humanoid robot brings the similarity between its facial expression and the target facial expression to within an allowable range, thereby achieving the goal of imitating the target facial expression.
It can be understood that, in this step 13, the pre-trained neural network model and the parameters are migrated to the deep reinforcement learning algorithm structure according to a certain requirement, or other methods may be used for migration, and the migration operation performed by using other methods should not be considered as a main difference from the present invention.
In addition, in this step 13 the example uses the Google FECNet network to compare the similarity between the facial expression of the humanoid robot and the target human facial expression; other methods may also be used to implement the similarity comparison, and using other methods to implement this function should not be regarded as a substantive departure from the present invention.
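One plausible way to turn the similarity comparison above into the reward that drives training is to embed both faces and reward small embedding distances. FECNet itself is not reproduced here: the example embeddings, the negative-distance reward, and the 0.2 "allowable range" threshold are all illustrative assumptions, not values from the patent.

```python
import numpy as np

def expression_reward(robot_embedding, target_embedding, threshold=0.2):
    # Smaller embedding distance means a more similar expression, hence a
    # larger (less negative) reward. `done` flags that the similarity has
    # reached the allowable range and the imitation episode can stop.
    dist = float(np.linalg.norm(np.asarray(robot_embedding)
                                - np.asarray(target_embedding)))
    done = dist < threshold
    return -dist, done

# Toy 3-d embeddings standing in for FECNet outputs of the robot face
# and of the target human face.
reward, done = expression_reward([0.1, 0.9, 0.0], [0.0, 1.0, 0.0])
```

Any model mapping a face image to a fixed-length expression embedding could stand in for FECNet under this reward design.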
and training the deep reinforcement learning algorithm on the humanoid robot entity; after training is complete, the humanoid robot can carry out facial expression imitation.
In this embodiment, the architecture implementing the method is shown in fig. 5. Both the pre-training of the neural network and the training of the deep reinforcement learning algorithm are performed on an upper computer running the Ubuntu operating system (16 GB of memory, an Intel i7 CPU, and an RTX 2080 Ti graphics card). The deep learning framework used is Keras with a TensorFlow backend. The camera used by the upper computer to capture human faces is an ordinary high-definition auto-focusing webcam, and the upper computer applies the action values output by the algorithm to the robot through a 16-channel servo driver board produced by Torobot.
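As a hypothetical illustration of the last step, applying the algorithm's action values through a multi-channel servo driver board, one might map each normalised action value to a servo pulse width. The 500-2500 microsecond pulse range and the linear mapping are common servo conventions assumed here for illustration; the actual command protocol of the Torobot board is not described in the source and is not reproduced.

```python
def action_to_pulses(action, lo_us=500, hi_us=2500):
    # Clamp each normalised action value to [0, 1], then map it linearly to
    # a pulse width in microseconds for the corresponding servo channel.
    pulses = []
    for a in action:
        a = min(max(float(a), 0.0), 1.0)
        pulses.append(int(round(lo_us + a * (hi_us - lo_us))))
    return pulses

# Three hypothetical channels at minimum, mid, and maximum deflection.
pulses = action_to_pulses([0.0, 0.5, 1.0])
```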
It can be known that the example in step 14 uses the DDPG (Deep Deterministic Policy Gradient) algorithm as the deep reinforcement learning algorithm. DDPG addresses deep reinforcement learning problems over continuous action spaces: first, it approximates the Q function with a deep neural network; second, its policy is deterministic, outputting for any state the single optimal action corresponding to that state rather than a probability distribution over actions; finally, it introduces the policy gradient method. Other deep reinforcement learning algorithms suited to continuous action-space control, such as the NAF and A3C algorithms, can also be applied to the method of the present invention; any deep reinforcement learning algorithm capable of controlling the facial expressions of the humanoid robot may be used, and the use of other such algorithms should not be regarded as a substantive departure from the present invention.
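The deterministic-policy property described above can be seen in a toy deterministic policy gradient update: the actor outputs one concrete action per state (not a distribution), and its parameters are moved along the product of the critic's action gradient and the actor's parameter gradient. The linear actor and quadratic Q below are stand-ins chosen so both gradients are analytic; nothing in this sketch comes from the patent itself.

```python
import numpy as np

theta = np.array([0.0, 0.0])          # actor parameters
s = np.array([1.0, 2.0])              # a fixed toy state
a_star = 1.5                          # action maximising the toy Q

def mu(theta, s):                     # deterministic actor: one action per state
    return float(theta @ s)

def q(s, a):                          # toy critic, peaked at a_star
    return -(a - a_star) ** 2

lr = 0.05
for _ in range(200):
    a = mu(theta, s)
    dq_da = -2.0 * (a - a_star)       # gradient of Q with respect to the action
    dmu_dtheta = s                    # gradient of the linear actor's output
    theta = theta + lr * dq_da * dmu_dtheta  # deterministic policy gradient ascent

# After training, the actor's action approaches the Q-maximising action a_star.
```

In real DDPG the same update uses neural networks for both modules, plus a replay buffer and target networks, but the gradient flow is the one shown here.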
Because the simulation method uses a pre-trained neural network model for identifying the intensities of human facial action units, the number of training runs required on the physical humanoid robot is greatly reduced during training of the deep-reinforcement-learning-based facial expression imitation method. In addition, the method makes the facial expression of the humanoid robot observably approximate the target facial expression within a finite number of steps, thereby overcoming the problem that the strongly nonlinear flexible material of the humanoid robot's face cannot be modeled or simulated. Moreover, when the pre-trained neural network model is trained, its network output includes facial action units that are not currently used; therefore, if degrees of freedom are added to the existing structure or a new humanoid robot is adopted, the method can be generalized according to the relationship between the new structure and the facial action units, extending to facial expression imitation on any humanoid robot whose rigid actuation structure is designed with reference to the Facial Action Coding System (FACS).
Those of ordinary skill in the art will understand that: all or part of the processes of the methods according to the embodiments may be implemented by a program, which may be stored in a computer-readable storage medium, and when executed, may include the processes according to the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (9)
1. A facial expression imitation method for a humanoid robot based on deep reinforcement learning, characterized by comprising the following steps:
step 1, acquiring a target human face picture to be imitated, performing facial action unit vector prediction on the target human face picture through a deep reinforcement learning algorithm running on the humanoid robot entity to obtain a corresponding facial action unit vector, and obtaining a corresponding motor action vector according to the facial action unit vector;
the initialization parameters of a prediction module of the deep reinforcement learning algorithm adopt parameters migrated from a pre-trained neural network model, the pre-trained neural network model running outside the humanoid robot entity and being trained in advance; the pre-training of the pre-trained neural network model and the training of the deep reinforcement learning algorithm specifically comprise the following steps:
step 11, screening the labels of the real face data set:
adopting a real human face data set consisting of human face pictures and corresponding facial action unit vectors as a training data set, and screening out label dimensions required for simulating the facial expression of the humanoid robot according to the corresponding relation between a rigid motion structure in the head of the humanoid robot and the facial action units;
step 12, pre-training the neural network model:
pre-training the pre-trained neural network model by taking a face picture in the real face data set as input and the facial action unit vector corresponding to that face picture as output, wherein the output of the pre-trained neural network model is determined by the label dimensions screened in the step 11;
step 13, migrating the pre-trained neural network model and parameters:
migrating the pre-trained neural network model and the pre-trained parameters to a deep reinforcement learning algorithm running on the humanoid robot entity;
step 14, training a deep reinforcement learning algorithm:
training the deep reinforcement learning algorithm on the humanoid robot entity, and after the deep reinforcement learning algorithm is trained, simulating the facial expression of the humanoid robot;
step 2, the obtained motor motion vector acts on a humanoid robot of an entity, and the corresponding facial expression of the humanoid robot is captured;
and step 3, taking the captured corresponding facial expression of the humanoid robot as an initial state, and controlling, by the deep reinforcement learning algorithm running on the humanoid robot according to the initial state, the facial actions of the humanoid robot to imitate the facial expression of the target human face until the humanoid robot finishes the expression imitation of the target human face.
2. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 1, the vector prediction of facial action units is performed on the target human face picture by a deep reinforcement learning algorithm running on a humanoid robot entity as follows:
and performing facial action unit vector prediction, by the deep reinforcement learning algorithm, on the picture of the target face region obtained by cropping the target human face picture.
3. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 11, label dimensions screened for simulating facial expressions of the humanoid robot are:
and if the humanoid robot can realize a certain action unit described in the facial action coding system, selecting the label corresponding to that action unit's dimension in the real human face data set.
4. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 12, the label dimensions screened out in the step 11 determine the output of the pre-trained neural network model as:
and determining the output dimension size and the corresponding meaning of each dimension of the pre-training neural network model according to the dimension size and the meaning of the label screened in the step 11.
5. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 13, the migration of the neural network model and parameters pre-trained in the step 12 into the deep reinforcement learning algorithm running on the humanoid robot entity is:
the Actor module of the deep reinforcement learning algorithm adopts the structure and parameters completely same as those of the pre-trained neural network model;
and migrating the pre-trained neural network model and the parameters to an Actor module of the deep reinforcement learning algorithm.
6. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 14, the deep reinforcement learning algorithm is trained on the body of the humanoid robot as follows:
in the training process, the corresponding motor action vector predicted each time by the pre-trained neural network model acts on the humanoid robot entity, and after the humanoid robot executes the facial action corresponding to the motor action vector, the next round of training of the deep reinforcement learning algorithm is carried out.
7. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning as claimed in any one of claims 1 to 2, wherein in the step 2, a camera is used for capturing a picture of the face of the humanoid robot to obtain the corresponding facial expressions.
8. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to any one of claims 1 to 2, wherein in the step 3, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot to simulate the target facial expressions according to the initial state until the humanoid robot finishes simulating the target facial expressions:
and determining, by the deep reinforcement learning algorithm running on the humanoid robot entity, the next facial action to be taken by the humanoid robot according to the initial state; after the facial action is executed, a new facial expression is displayed on the humanoid robot's face, a corresponding reward is given according to the similarity between the new facial expression and the target facial expression, and under the guidance of the reward the humanoid robot achieves imitation of the target facial expression within a finite number of steps.
9. The method for simulating facial expressions of the humanoid robot based on deep reinforcement learning of any one of claims 1 to 2, wherein the pre-trained neural network model adopts a neural network model formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer and an output layer;
the deep reinforcement learning algorithm adopts any one of a deep deterministic policy gradient algorithm, an NAF algorithm and an A3C algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011355989.XA CN112454390B (en) | 2020-11-27 | 2020-11-27 | Humanoid robot facial expression simulation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112454390A CN112454390A (en) | 2021-03-09 |
CN112454390B true CN112454390B (en) | 2022-05-17 |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||