CN112454390B - Humanoid robot facial expression simulation method based on deep reinforcement learning - Google Patents

Humanoid robot facial expression simulation method based on deep reinforcement learning

Info

Publication number
CN112454390B
CN112454390B (application CN202011355989.XA)
Authority
CN
China
Prior art keywords
humanoid robot
reinforcement learning
facial
deep reinforcement
neural network
Prior art date
Legal status
Active
Application number
CN202011355989.XA
Other languages
Chinese (zh)
Other versions
CN112454390A (en)
Inventor
唐冰 (Tang Bing)
吴锋 (Wu Feng)
Current Assignee
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN202011355989.XA
Publication of CN112454390A
Application granted
Publication of CN112454390B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00Manipulators not otherwise provided for
    • B25J11/0005Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • B25J11/0015Face robots, animated artificial faces for imitating human expressions
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Automation & Control Theory (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Fuzzy Systems (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, which comprises the following steps. Step 1: acquire a picture of the target face to be imitated, and use a deep reinforcement learning algorithm running on the physical humanoid robot to perform facial action unit vector prediction on the picture, obtaining the corresponding facial action unit vector and the corresponding motor action vector. Step 2: apply the motor action vector to the physical humanoid robot and capture the resulting facial expression of the robot. Step 3: taking the captured facial expression as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the robot's facial actions from that state until the robot completes its imitation of the target facial expression. The method reduces the number of training runs required on the physical humanoid robot, avoids shortening the service life of the robot hardware, and still ensures accurate imitation of the target facial expression.

Description

Humanoid robot facial expression simulation method based on deep reinforcement learning
Technical Field
The invention relates to the field of robot facial expression simulation, in particular to a method for simulating facial expression of a humanoid robot based on deep reinforcement learning.
Background
In face-to-face human communication, over 55% of the information is conveyed through the face. Because a humanoid robot has a human-like face, its facial expressions play a correspondingly important role in human-robot interaction, for example in expressing emotion. To enable a humanoid robot to produce facial expressions that humans can recognize, one line of research is to have the robot imitate human facial expressions and present them through the joint action of the rigid motion structure in its head and the flexible material of its face.
Current methods for simulating humanoid robot facial expressions include manual presetting, feature point mapping, and end-to-end network training. In manual presetting, a series of motor action vectors corresponding to basic expressions such as 'happy' and 'sad', closely tied to the robot's rigid motion structure, must be arranged in advance; the number of expressions the robot can imitate is therefore limited and fixed, and the expressions to be presented must also fall within the pre-arranged categories, which greatly restricts the robot's imitation capability. Feature point mapping achieves real-time imitation of facial expressions using a motion capture system, but it requires pasting marker points on both the human face and the robot face to establish a linear mapping between the positions of corresponding feature points, or further learning the mapping between facial feature points and motor values with a machine learning method, which in turn must be trained on the physical robot; pasting special marker points on faces is impractical during real human-robot interaction, so this method is inconvenient in actual application scenarios. End-to-end network training builds a network model on top of a large, manually arranged data set of real samples tied to the specific robot, and thereby learns the mapping from facial action units or facial feature points to motor values. Compared with manual presetting and feature point mapping, end-to-end training enriches the expressions the robot can imitate, but it requires manually arranging a large real-sample data set, the whole training process runs on the physical robot, and for every new robot the data set must be rearranged and the model rebuilt and retrained end to end. Moreover, the motor outputs driving the rigid motion structure are continuous values: if each motor's output is discretized, the number of motor action vectors to arrange grows exponentially (with N motors and each motor's rotation range discretized into M values, there are M^N possible motor action vectors; for example, 10 motors with 10 positions each already yield 10^10 vectors). Whether each arranged motor action vector can actually produce a corresponding real facial expression must also be considered, so arranging the data set consumes a great deal of time.
Therefore, a method for simulating humanoid robot facial expressions that avoids manual arrangement, reduces the number of training runs on the physical robot, and lowers hardware wear is urgently needed.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide a method for simulating humanoid robot facial expressions based on deep reinforcement learning, which addresses the shortcomings of existing end-to-end network training methods: the need to manually arrange a data set, the long time this takes, the large number of training runs on the physical robot, and the resulting consumption of hardware service life.
The purpose of the invention is realized by the following technical scheme:
the embodiment of the invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, which comprises the following steps:
step 1, acquiring a picture of the target face to be imitated, predicting its facial action unit vector with a deep reinforcement learning algorithm running on the physical humanoid robot to obtain the corresponding facial action unit vector, and deriving the corresponding motor action vector from that facial action unit vector;
wherein the initialization parameters of the prediction module of the deep reinforcement learning algorithm are migrated from a pre-trained neural network model, and that model runs outside the physical humanoid robot and is pre-trained there;
step 2, applying the obtained motor action vector to the physical humanoid robot and capturing the resulting facial expression of the robot;
and step 3, taking the captured facial expression of the humanoid robot as the initial state, and, starting from that state, letting the deep reinforcement learning algorithm running on the robot control the robot's facial actions to imitate the target facial expression until the imitation is complete.
According to the technical scheme provided by the invention, the method for simulating the facial expression of the humanoid robot based on the deep reinforcement learning has the following beneficial effects:
because the pre-training neural network model which does not run on the humanoid robot entity is adopted, the real face data set is pre-trained, and then the pre-trained neural network model and the parameters are transferred to the deep reinforcement learning algorithm running on the humanoid robot entity, so that the actual training times on the humanoid robot entity are reduced; in addition, the pre-training neural network model can take a real face data set as training data, so that the work of manually arranging the data set is not needed; the method can greatly reduce manual arrangement work and actual training times on the humanoid robot entity, and does not need special auxiliary equipment to realize the simulation of the facial expression of the humanoid robot. The method can be conveniently applied to different humanoid robot entities according to the relation between the rigid motion structure and the face action unit.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a training flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a pre-trained neural network model in the method according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a building block of a deep reinforcement learning algorithm in the method according to the embodiment of the present invention;
fig. 5 is an overall architecture diagram of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific contents of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, including:
step 1, acquiring a picture of the target face to be imitated, predicting its facial action unit vector with a deep reinforcement learning algorithm running on the physical humanoid robot to obtain the corresponding facial action unit vector, and deriving the corresponding motor action vector from that facial action unit vector;
wherein the initialization parameters of the prediction module of the deep reinforcement learning algorithm are migrated from a pre-trained neural network model, and that model runs outside the physical humanoid robot and is pre-trained there;
step 2, applying the obtained motor action vector to the physical humanoid robot and capturing the resulting facial expression of the robot;
and step 3, taking the captured facial expression of the humanoid robot as the initial state, and, starting from that state, letting the deep reinforcement learning algorithm running on the robot control the robot's facial actions to imitate the target facial expression until the imitation is complete.
In step 1 of the above method, the facial action unit vector prediction performed on the target face picture by the deep reinforcement learning algorithm running on the physical humanoid robot is:
performing facial action unit vector prediction, with the deep reinforcement learning algorithm, on the picture of the target face region obtained by cropping the target face picture.
Referring to fig. 2, before step 1 the method further comprises pre-training the pre-trained neural network model and training the deep reinforcement learning algorithm, specifically as follows:
step 11, screening the labels of the real face data set:
adopting as the training data set a real face data set consisting of face pictures and their corresponding facial action unit vectors, and screening out the label dimensions needed for imitating humanoid robot facial expressions according to the correspondence between the rigid motion structure inside the robot's head and the facial action units;
step 12, pre-training the neural network model:
pre-training the neural network model with the face pictures in the real face data set as input and their corresponding facial action unit vectors as output, the output of the model being determined by the label dimensions screened in step 11;
step 13, migrating the pre-trained neural network model and parameters:
migrating the pre-trained neural network model and its parameters into the deep reinforcement learning algorithm running on the physical humanoid robot;
step 14, training a deep reinforcement learning algorithm:
and training the deep reinforcement learning algorithm on the humanoid robot entity, and after the deep reinforcement learning algorithm is trained, simulating the facial expression of the humanoid robot.
In step 11 of the method, the label dimensions needed for imitating humanoid robot facial expressions are screened according to the correspondence between the rigid motion structure inside the robot's head and the facial action units as follows:
if the humanoid robot can realize a given action unit described in the Facial Action Coding System, the label corresponding to that action dimension is selected from the real face data set. Specifically, a humanoid robot generates facial expressions by having the rigid motion structure inside its head drive and deform the external flexible material, while a human face generates expressions by having the muscle tissue under the skin pull on the skin; the two mechanisms are similar to a certain degree, which is the basis on which the label dimensions are screened.
In step 12 of the above method, the output of the pre-trained neural network model is determined by the label dimensions screened in step 11 as follows:
the output dimension of the pre-trained neural network model, and the meaning of each output dimension, are determined by the size and meaning of the labels screened in step 11.
In step 13 of the method, migrating the neural network model and parameters pre-trained in step 12 into the deep reinforcement learning algorithm running on the physical humanoid robot means:
the Actor module of the deep reinforcement learning algorithm adopts exactly the same structure and parameters as the pre-trained neural network model;
and the pre-trained neural network model and its parameters are migrated into the Actor module of the deep reinforcement learning algorithm.
In step 14 of the method, the deep reinforcement learning algorithm is trained on the physical humanoid robot as follows:
during training, each motor action vector predicted by the pre-trained neural network model is applied to the physical humanoid robot, and the next training step of the deep reinforcement learning algorithm is carried out only after the robot has executed the facial action corresponding to that motor action vector.
In step 2 of the method, a camera is used for capturing a facial picture of the humanoid robot to obtain a corresponding facial expression from the facial picture.
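As a concrete illustration of this capture step, the following is a minimal sketch assuming OpenCV with its bundled Haar cascade face detector and a generic webcam; the function name and the 224x224 output size are illustrative assumptions, not values from the patent.

```python
import cv2

def capture_robot_face(cam_index: int = 0):
    """Grab one frame and return the largest detected face, resized to 224x224."""
    cap = cv2.VideoCapture(cam_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("camera read failed")
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    return cv2.resize(frame[y:y + h, x:x + w], (224, 224))
```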
In step 3 of the method, the deep reinforcement learning algorithm running on the humanoid robot controls the robot's facial actions from the initial state until the imitation of the target facial expression is complete, as follows:
the deep reinforcement learning algorithm running on the robot determines, from the initial state, the next facial action the robot should take; after the action is executed, a new facial expression appears on the robot's face, and a reward is given according to the similarity between this new expression and the target facial expression. Guided by the reward, the robot achieves the imitation within a finite number of steps. Specifically, the human facial expression is taken as the target, and the robot's facial expression at a time t is the initial state S_t. In state S_t, the action applied by the facial action units to the physical robot is a_t; driven by the facial action units, the robot's facial expression changes after the action is executed, and the changed expression is the state S_{t+1}. A reward r_t is then given according to the similarity between the human facial expression and the robot's facial expression, and the next facial action unit action to execute is determined from the reward r_t and the new state.
Referring to fig. 3, in the above method the pre-trained neural network model is formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer, and an output layer.
In the method, the deep reinforcement learning algorithm adopts the deep deterministic policy gradient (DDPG) algorithm.
In this method of simulating humanoid robot facial expressions, a collected real face data set serves as the training data and is used, together with a pre-trained neural network model that does not run on the physical robot, for pre-training. This avoids manual arrangement while reducing the number of training runs on the physical robot, requires no special auxiliary equipment, and achieves accurate imitation of facial expressions. The method can be conveniently transferred to different humanoid robots according to the relation between each robot's rigid motion structure and the facial action units.
The embodiments of the present invention are described in further detail below.
Referring to fig. 1, the invention provides a method for simulating humanoid robot facial expressions based on a deep reinforcement learning algorithm, comprising the following steps:
step 1, first obtain a picture of the target face, and predict the corresponding facial action unit vector for the cropped target face with the pre-trained neural network model, thereby obtaining the motor action vector for the humanoid robot;
step 2, apply the obtained motor action vector to the physical humanoid robot, and capture the robot's facial expression with a camera;
and step 3, take the captured facial expression of the robot as the initial state; the deep reinforcement learning algorithm running on the robot (this embodiment adopts the deep deterministic policy gradient algorithm, i.e. the DDPG algorithm) determines from that state the facial action the robot should take next; after the action is executed a new facial expression appears on the robot's face, and, guided by the rewards obtained, the robot achieves imitation of the target facial expression within a finite number of steps.
Referring to fig. 2, since the pre-trained neural network model must be pre-trained and the deep reinforcement learning algorithm must be trained on the physical humanoid robot, the training process for the two models proceeds as follows:
step 11, screening the real face data set labels: adopt a real face data set consisting of face pictures and their corresponding facial action unit value vectors, and screen out the label dimensions that the invention needs for imitating humanoid robot facial expressions according to the correspondence between the rigid motion structure inside the robot's head and the facial action units;
step 12, pre-training the pre-trained neural network model: the input of the model is a cropped face picture and the output is the facial action unit value vector corresponding to that picture, with the output determined by the result of the label screening on the real face data set. The model's main purpose is to build a mapping from facial expression pictures to facial action unit value vectors, so that given a face picture it can output the screened facial action unit value vector for that face. Using the pre-trained model reduces the number of training runs that act directly on the physical robot while realizing imitation of facial expressions, thereby protecting the robot hardware;
step 13, migrating the pre-trained network model and parameters: the pre-trained neural network model and its parameters are migrated, wholly or partially, into the deep reinforcement learning algorithm, which then runs and trains on the physical humanoid robot, making full use of the pre-trained model. The specific migration scheme is determined by the model structure of the deep reinforcement learning algorithm: the structure used for action prediction can be migrated in full, while the structure used for action evaluation is migrated in part. Migration reduces the number and duration of training runs of the algorithm on the physical robot, and thereby the risk of damaging the robot hardware. Because the pre-trained model and the model used by the deep reinforcement learning algorithm share a certain degree of structural similarity, migration only requires loading the relevant network layer parameters of the pre-trained model into the corresponding network layers of the algorithm's model.
Step 14, training the deep reinforcement learning algorithm: the algorithm is trained on the physical humanoid robot, and under the control of the trained algorithm the robot's facial expression reaches, within a finite number of steps, the same or nearly the same effect as the human facial expression, achieving imitation. Specifically, in this step the human facial expression is taken as the target and the robot's facial expression at a time t is the state S_t; in this state, the action applied to the physical robot by the bottom-layer driving module is a_t; driven by the motors, the robot's facial expression changes after the action is executed, giving the state S_{t+1}; a reward r_t is then given according to the similarity between the human facial expression and the robot's facial expression, and the next motor action to execute is determined from the reward and the state.
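The loop just described can be summarized in a short sketch. All names here are assumptions for illustration: env wraps camera capture, motor execution, and the similarity-based reward, and agent is any DDPG-style learner with act/remember/update methods; neither name comes from the patent.

```python
def imitate_episode(env, agent, max_steps: int = 20):
    """One imitation episode: drive the robot face toward the target expression."""
    state = env.reset()                        # S_t: current robot facial expression
    for t in range(max_steps):
        action = agent.act(state)              # a_t: AU vector, converted to motor angles
        next_state, reward, done = env.step(action)  # S_{t+1}, r_t from similarity
        agent.remember(state, action, reward, next_state)
        agent.update()                         # one DDPG training step
        state = next_state
        if done:                               # similarity within the allowed tolerance
            break
```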
For ease of understanding, the method is described below in connection with a specific example.
The embodiment provides a humanoid robot facial expression imitation method based on a deep reinforcement learning algorithm, which comprises the following steps:
step 11, screening real face data set labels:
the real face data set used in this step is a face data set FEAFA (see http: www.iiplab.net/FEAFA /) collected in the real environment calculated by the Chinese academy, and is composed of face pictures and floating point intensity value labels corresponding to 24 redefined face action units in each face picture. According to the characteristics of a rigid operation structure of the humanoid robot, 11 label dimensions in the original FEAFA face data set are selected, the selected label dimensions are redefined and described, and the redefined face action units can be displayed through the humanoid robot face under the action of the motor. Table 1 shows the redefined facial action units and their associated description taken in this embodiment.
AU number | Redefined description     | Original FACS description
1         | Left Eye Close            | AU43, Eye Close
2         | Right Eye Close           | AU43, Eye Close
3         | Left Lid Raise            | AU5, Lid Raise
4         | Right Lid Raise           | AU5, Lid Raise
5         | Left Brow Lower           | AU4, Brow Lower
6         | Right Brow Lower          | AU4, Brow Lower
7         | Jaw Drop                  | AU26, Jaw Drop
8         | Left Lip Corner Pull      | AU12, Lip Corner Pull
9         | Right Lip Corner Pull     | AU12, Lip Corner Pull
10        | Left Lip Corner Stretch   | AU20, Lip Corner Stretch
11        | Right Lip Corner Stretch  | AU20, Lip Corner Stretch
It can be seen that the required data set label dimensions are screened according to the linkage between the facial action units and the rigid motion structure of the robot's head. This embodiment of the present invention uses the FEAFA face data set, but other similar face data sets can also be used with the method of the present invention, and the use of another face data set should not be considered a substantive difference from the present invention.
In step 11, the screening of the data set dimensions and the redefinition and description of the screening results are only exemplified by the 11 labels shown in Table 1; other numbers of label dimensions may be screened, redefined, and described. A sketch of this screening step follows.
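The FEAFA labels form a 24-dimensional intensity vector per picture, from which only the robot-realizable dimensions are kept. The column indices below are hypothetical placeholders; the real selection depends on the specific robot's rigid motion structure.

```python
import numpy as np

# Hypothetical example indices of the 11 FEAFA action-unit columns that the
# robot head can realize (the true mapping is robot-specific).
SELECTED_AU_COLUMNS = [0, 1, 4, 5, 8, 9, 12, 15, 16, 19, 20]

def screen_labels(labels_24d: np.ndarray) -> np.ndarray:
    """Reduce (num_samples, 24) FEAFA intensity labels to the screened 11 dims."""
    assert labels_24d.shape[1] == 24
    return labels_24d[:, SELECTED_AU_COLUMNS]
```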
Step 12, pre-training the pre-training neural network model:
the pre-trained neural network model adopted in this step is shown in fig. 3, and is composed of a feature extraction layer composed of convolutional layers and two full-connection layers, where the feature extraction layer is a feature extraction layer composed of convolutional layers in a VGG-16 neural network model pre-trained on an ImageNet large-scale data set, and the full-connection layer is composed of two hidden layers (the activation function uses a relu function) and an output layer (the activation function uses a sigmoid function), and in this embodiment, the number of neurons in the full-connection layers is 512 and 128, and the number can be defined by itself according to actual requirements. The training data set used for the pre-training neural network model is a filtered FEAFA real face data set: the human face picture and 11 face action units corresponding to the rigid motion structure of the humanoid robot; the cut face picture is input into a pre-training neural network, the output after feature extraction is input into a full connection layer, the output of the whole pre-training neural network is a floating point intensity value vector of a designated face action unit (each floating point intensity value range is 0-1), and each dimensional element in the floating point intensity value vector is linearly converted into a motor rotation angle corresponding to a rigid motion structure (if the mouth opening and closing floating point intensity value is 0.5, the motor rotation range is 0-40 degrees, the corresponding motor rotation angle is 20 degrees);
it can be known that, in this step 12, the screened face data set is used for the pre-trained neural network model, so as to improve the convergence speed of the neural network model training, only the feature extraction layer of the VGG16 pre-trained network model and the self-defined full connection are used for constructing the full connection layer as an example, other types of pre-trained neural network models may also be used, and the use of other types of pre-trained neural network models should not be considered as the main difference from the present invention.
Step 13, pre-training neural network model and parameter migration:
migrating the pre-trained neural network model and parameters in the step 12 to a Deep reinforcement learning algorithm, wherein a Deep determination strategy Gradient algorithm (DDPG) is used in the Deep reinforcement learning algorithm, and the pre-trained neural network model and parameters shown in fig. 4 are migrated in the following manner: an Actor module in the depth determination strategy gradient algorithm adopts a structure and parameters completely identical to those of a pre-trained neural network model, a criticic module is used for evaluating actions output by the Actor module in a characteristic state, so that a characteristic extraction layer and two layers of full connection layers of the pre-trained neural network model are migrated into the criticic module, floating point strength value vectors of a face action unit are fused after the output result of the characteristic extraction layer is subjected to a flattening operation and further serve as the input of the full connection layers, and the final output result of the criticic module (the structure of the criticic module is schematically shown in figure 4) is the evaluation of the current state and the actions. In the depth determination strategy gradient algorithm, the Actor module inputs a face picture of the humanoid robot and outputs a face action unit floating point strength value vector (namely a face action unit vector), and the vector is converted into a corresponding motor action vector; the Critic module inputs a face picture of the humanoid robot and a motor action vector to be executed by the humanoid robot in the face state;
according to the reward obtained after the motor action vector is executed, the humanoid robot enables the similarity between the facial expression of the humanoid robot and the target facial expression to be continuously close to an allowable range, and therefore the purpose of simulating the target facial expression is achieved.
It can be understood that in step 13 the pre-trained neural network model and parameters are migrated into the deep reinforcement learning algorithm structure in the manner described here; other migration methods may also be used, and doing so should not be considered a substantive difference from the present invention.
In addition, in step 13 this example uses Google's FECNet network to compare the similarity between the humanoid robot's facial expression and the target facial expression; other methods can also implement the similarity comparison, and using another method for this function should not be considered a substantive difference from the present invention.
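A minimal sketch of a similarity-based reward under that design: embed both faces with an expression embedding network (FECNet in the example; any embedding whose distances track expression similarity would serve) and reward small distances. The threshold and the sign convention are assumptions, not values from the patent.

```python
import numpy as np

def expression_reward(robot_emb: np.ndarray, target_emb: np.ndarray,
                      done_threshold: float = 0.1):
    """Negative embedding distance as reward; 'done' once within tolerance."""
    dist = float(np.linalg.norm(robot_emb - target_emb))
    return -dist, dist < done_threshold
```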
Step 14, deep reinforcement learning algorithm training:
and training the deep reinforcement learning algorithm on the humanoid robot entity, and after the deep reinforcement learning algorithm is trained, simulating the facial expression of the humanoid robot.
In this embodiment, the architecture implementing the method is shown in fig. 5. Both the pre-training of the neural network and the training of the deep reinforcement learning algorithm are carried out on an upper computer running Ubuntu (16 GB of memory, an Intel i7 CPU, and an RTX 2080Ti graphics card). The deep learning framework is Keras with a TensorFlow backend. The camera used to capture faces is an ordinary high-definition auto-focus webcam, and the upper computer applies the action values output by the algorithm to the motors through a 16-channel servo driver board produced by Torobot.
It can be seen that the example in step 14 uses the DDPG (deep deterministic policy gradient) algorithm as the deep reinforcement learning algorithm. DDPG addresses deep reinforcement learning over continuous action spaces: first, it approximates the Q function with a deep neural network; second, its policy is deterministic, outputting for any state the single optimal action for that state rather than a probability distribution over actions; finally, it updates this policy by the policy gradient method. Other deep reinforcement learning algorithms suited to continuous action-space control, such as the NAF algorithm or the A3C algorithm, can also be applied in the method of the present invention; any deep reinforcement learning algorithm capable of controlling the robot's facial expression can be used, and using another such algorithm should not be considered a substantive difference from the present invention.
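For concreteness, here is a compact sketch of one DDPG update in TensorFlow/Keras, stated over a generic state vector rather than the raw face image to keep it short; the dimensions, learning rates, and network sizes are illustrative assumptions.

```python
import tensorflow as tf

STATE_DIM, ACTION_DIM, GAMMA, TAU = 64, 11, 0.99, 0.005

def make_actor():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(STATE_DIM,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(ACTION_DIM, activation="sigmoid"),  # AU intensities in [0, 1]
    ])

def make_critic():
    s = tf.keras.Input(shape=(STATE_DIM,))
    a = tf.keras.Input(shape=(ACTION_DIM,))
    x = tf.keras.layers.Concatenate()([s, a])
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    return tf.keras.Model([s, a], tf.keras.layers.Dense(1)(x))

actor, critic = make_actor(), make_critic()
target_actor, target_critic = make_actor(), make_critic()
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())
actor_opt, critic_opt = tf.keras.optimizers.Adam(1e-4), tf.keras.optimizers.Adam(1e-3)

def ddpg_update(s, a, r, s2):
    # s, a, r, s2 are batched float32 tensors; r has shape (batch, 1).
    # Critic: regress Q(s, a) toward the one-step TD target.
    with tf.GradientTape() as tape:
        y = r + GAMMA * target_critic([s2, target_actor(s2)])
        critic_loss = tf.reduce_mean(tf.square(y - critic([s, a])))
    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_opt.apply_gradients(zip(grads, critic.trainable_variables))
    # Actor: deterministic policy gradient, i.e. ascend Q(s, mu(s)).
    with tf.GradientTape() as tape:
        actor_loss = -tf.reduce_mean(critic([s, actor(s)]))
    grads = tape.gradient(actor_loss, actor.trainable_variables)
    actor_opt.apply_gradients(zip(grads, actor.trainable_variables))
    # Slowly track the online networks with the targets (Polyak averaging).
    for target, online in ((target_actor, actor), (target_critic, critic)):
        target.set_weights([TAU * w + (1 - TAU) * tw
                            for w, tw in zip(online.get_weights(), target.get_weights())])
```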
Because the simulation method uses a pre-trained neural network model for recognizing facial action unit intensities, the training process of this deep-reinforcement-learning-based facial expression imitation method greatly reduces the number of training runs on the physical humanoid robot. In addition, within a finite number of steps the robot's facial expression becomes observably close to the target facial expression, which addresses the difficulty of modelling the strongly nonlinear behaviour of the robot's flexible facial material. Furthermore, when the pre-trained neural network model is trained, its network output includes facial action units not yet used, so if degrees of freedom are added to the existing structure or a new humanoid robot is adopted, the method can be generalized according to the relation between the new structure and the facial action units; it thus generalizes to facial expression imitation for any humanoid robot whose rigid motion structure is designed with reference to the Facial Action Coding System (FACS).
Those of ordinary skill in the art will understand that: all or part of the processes of the methods according to the embodiments may be implemented by a program, which may be stored in a computer-readable storage medium, and when executed, may include the processes according to the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. A facial expression imitation method of a humanoid robot based on deep reinforcement learning is characterized by comprising the following steps:
step 1, acquiring a picture of the target face to be imitated, predicting its facial action unit vector with a deep reinforcement learning algorithm running on the physical humanoid robot to obtain the corresponding facial action unit vector, and deriving the corresponding motor action vector from that facial action unit vector;
wherein the initialization parameters of the prediction module of the deep reinforcement learning algorithm are migrated from a pre-trained neural network model, and that model runs outside the physical humanoid robot and is pre-trained there; the method comprises pre-training the pre-trained neural network model and training the deep reinforcement learning algorithm, specifically comprising:
step 11, screening the labels of the real face data set:
adopting as the training data set a real face data set consisting of face pictures and their corresponding facial action unit vectors, and screening out the label dimensions needed for imitating humanoid robot facial expressions according to the correspondence between the rigid motion structure inside the robot's head and the facial action units;
step 12, pre-training the neural network model:
pre-training the neural network model with the face pictures in the real face data set as input and their corresponding facial action unit vectors as output, the output of the model being determined by the label dimensions screened in step 11;
step 13, migrating the pre-trained neural network model and parameters:
migrating the pre-trained neural network model and its parameters into the deep reinforcement learning algorithm running on the physical humanoid robot;
step 14, training a deep reinforcement learning algorithm:
training the deep reinforcement learning algorithm on the physical humanoid robot; once the algorithm is trained, performing facial expression imitation;
step 2, applying the obtained motor action vector to the physical humanoid robot and capturing the resulting facial expression of the robot;
and step 3, taking the captured facial expression of the humanoid robot as the initial state, and, starting from that state, letting the deep reinforcement learning algorithm running on the robot control the robot's facial actions to imitate the target facial expression until the imitation is complete.
2. The method for simulating humanoid robot facial expressions based on deep reinforcement learning of claim 1, wherein in step 1, the facial action unit vector prediction performed on the target face picture by the deep reinforcement learning algorithm running on the physical humanoid robot is:
performing facial action unit vector prediction, with the deep reinforcement learning algorithm, on the picture of the target face region obtained by cropping the target face picture.
3. The method for simulating humanoid robot facial expressions based on deep reinforcement learning of claim 1, wherein in step 11, the label dimensions screened for imitating humanoid robot facial expressions are:
if the humanoid robot can realize a given action unit described in the Facial Action Coding System, the label corresponding to that action dimension is selected from the real face data set.
4. The method for simulating humanoid robot facial expressions based on deep reinforcement learning of claim 1, wherein in step 12, the output of the pre-trained neural network model is determined by the label dimensions screened in step 11 as:
the output dimension of the pre-trained neural network model, and the meaning of each output dimension, are determined by the size and meaning of the labels screened in step 11.
5. The method for simulating humanoid robot facial expressions based on deep reinforcement learning of claim 1, wherein in step 13, migrating the neural network model and parameters pre-trained in step 12 into the deep reinforcement learning algorithm running on the physical humanoid robot means:
the Actor module of the deep reinforcement learning algorithm adopts exactly the same structure and parameters as the pre-trained neural network model;
and the pre-trained neural network model and its parameters are migrated into the Actor module of the deep reinforcement learning algorithm.
6. The method for simulating humanoid robot facial expressions based on deep reinforcement learning of claim 1, wherein in step 14, the deep reinforcement learning algorithm is trained on the physical humanoid robot as follows:
during training, each motor action vector predicted by the pre-trained neural network model is applied to the physical humanoid robot, and the next training step of the deep reinforcement learning algorithm is carried out after the robot has executed the facial action corresponding to that motor action vector.
7. The method for simulating humanoid robot facial expressions based on deep reinforcement learning according to any one of claims 1 to 2, wherein in step 2 a camera is used to capture a picture of the humanoid robot's face, from which the corresponding facial expression is obtained.
8. The method for simulating humanoid robot facial expressions based on deep reinforcement learning according to any one of claims 1 to 2, wherein in step 3 the deep reinforcement learning algorithm running on the humanoid robot controls the robot's facial actions from the initial state until the imitation of the target facial expression is complete, as follows:
the deep reinforcement learning algorithm running on the physical humanoid robot determines, from the initial state, the next facial action the robot should take; after the action is executed a new facial expression appears on the robot's face, a reward is given according to the similarity between the new expression and the target facial expression, and, guided by the reward, the robot achieves imitation of the target facial expression within a finite number of steps.
9. The method for simulating humanoid robot facial expressions based on deep reinforcement learning according to any one of claims 1 to 2, wherein the pre-trained neural network model is formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer, and an output layer;
and the deep reinforcement learning algorithm adopts any one of the deep deterministic policy gradient (DDPG) algorithm, the NAF algorithm, and the A3C algorithm.
CN202011355989.XA 2020-11-27 2020-11-27 Humanoid robot facial expression simulation method based on deep reinforcement learning Active CN112454390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011355989.XA CN112454390B (en) 2020-11-27 2020-11-27 Humanoid robot facial expression simulation method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011355989.XA CN112454390B (en) 2020-11-27 2020-11-27 Humanoid robot facial expression simulation method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112454390A CN112454390A (en) 2021-03-09
CN112454390B (en) 2022-05-17

Family

ID=74809713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011355989.XA Active CN112454390B (en) 2020-11-27 2020-11-27 Humanoid robot facial expression simulation method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112454390B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113156892B (en) * 2021-04-16 2022-04-08 西湖大学 Four-footed robot simulated motion control method based on deep reinforcement learning
CN113724367A (en) * 2021-07-13 2021-11-30 北京理工大学 Robot expression driving method and device
CN114789470A (en) * 2022-01-25 2022-07-26 北京萌特博智能机器人科技有限公司 Method and device for adjusting simulation robot
CN116089611B (en) * 2023-01-13 2023-07-18 北京控制工程研究所 Spacecraft fault diagnosis method and device based on performance-fault relation map

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7113848B2 (en) * 2003-06-09 2006-09-26 Hanson David F Human emulation robot system
CN105437247B (en) * 2016-01-27 2017-06-20 龙卷风机电科技(昆山)有限公司 A kind of expression robot
CN109640785A (en) * 2016-04-08 2019-04-16 维扎瑞尔股份公司 For obtaining, assembling and analyzing vision data with the method and system of the eyesight performance of evaluator
KR102102685B1 (en) * 2018-04-18 2020-04-23 한국생산기술연구원 A method for robotic facial expressions by learning human facial demonstrations
CN108908353B (en) * 2018-06-11 2021-08-13 安庆师范大学 Robot expression simulation method and device based on smooth constraint reverse mechanical model
CN109800864B (en) * 2019-01-18 2023-05-30 中山大学 Robot active learning method based on image input
CN109773807B (en) * 2019-03-04 2024-03-12 苏州塔米机器人有限公司 Motion control method and robot
CN111260762B (en) * 2020-01-19 2023-03-28 腾讯科技(深圳)有限公司 Animation implementation method and device, electronic equipment and storage medium
CN111814713A (en) * 2020-07-15 2020-10-23 陕西科技大学 Expression recognition method based on BN parameter transfer learning

Also Published As

Publication number Publication date
CN112454390A (en) 2021-03-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant