CN112454390B - Humanoid robot facial expression simulation method based on deep reinforcement learning - Google Patents
- Publication number
- CN112454390B (application CN202011355989.XA)
- Authority
- CN
- China
- Prior art keywords
- humanoid robot
- reinforcement learning
- facial
- deep reinforcement
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
- B25J11/0015—Face robots, animated artificial faces for imitating human expressions
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Human Computer Interaction (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Automation & Control Theory (AREA)
- Evolutionary Biology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Fuzzy Systems (AREA)
- Manipulator (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for imitating the facial expressions of a humanoid robot based on deep reinforcement learning, comprising the following steps. Step 1: acquire a face picture of the target to be imitated, and use a deep reinforcement learning algorithm running on the humanoid robot entity to perform facial action unit vector prediction on the face picture, obtaining the corresponding facial action unit vector and the corresponding motor action vector. Step 2: apply the motor action vector to the physical humanoid robot, and capture the resulting facial expression of the humanoid robot. Step 3: taking the captured facial expression as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the robot according to the initial state to imitate the facial expression of the target human face, until the imitation of the target expression is complete. The method reduces the number of training runs on the humanoid robot entity, avoids shortening the service life of the robot hardware, and ensures accurate imitation of the target facial expression.
Description
Technical Field
The invention relates to the field of robot facial expression simulation, in particular to a method for simulating facial expression of a humanoid robot based on deep reinforcement learning.
Background
In face-to-face human communication, over 55% of the information is conveyed through the face. Because a humanoid robot has a human-like face, its facial expressions also play a very important role in human-computer interaction, for example in emotion expression. To enable a humanoid robot to produce facial expressions that humans can recognize, one line of research is to have the robot imitate human facial expressions and present them through the linkage of a rigid head motion structure and a flexible facial material.
At present, methods for imitating humanoid robot facial expressions include manual presetting, feature point mapping, and end-to-end network training. In manual presetting, a series of motor action vectors corresponding to basic expressions such as "happy" and "sad", closely tied to the robot's rigid motion structure, must be arranged in advance; the number of expressions the robot can imitate is therefore limited and fixed, and the human facial expressions to be presented must also fall within the arranged expression categories, so this method greatly limits the robot's imitation capability. Feature point mapping uses a motion capture system to achieve real-time imitation of facial expressions, but it requires pasting marker points on both the human face and the robot face to establish a linear mapping between the positions of corresponding facial feature points, or further uses machine learning to learn the mapping between facial feature points and motor values. The machine learning variant must be trained on the humanoid robot entity to obtain the mapping, and pasting special marker points on the robot's face is impractical during real human-computer interaction, so this method is inconvenient in actual application scenarios. End-to-end network training achieves end-to-end learning by manually arranging a large set of real samples related to the humanoid robot entity and constructing a network model.
Thereby, the mapping between facial action units (or facial feature points) and motor values is obtained. Compared with manual presetting and feature point mapping, end-to-end network training can enrich the facial expressions the humanoid robot generates, but it requires manually arranging a large set of real samples related to the robot, the entire training process runs on the robot entity, and for a new humanoid robot the real sample data set must be rearranged and a model constructed and trained end-to-end all over again. Moreover, the output values of the motors driving the rigid motion structure are continuous; if each motor's output is discretized, the number of motor action vectors that can be arranged manually grows exponentially (for example, with N motors and each motor's rotation range discretized into M steps, M^N motor action vectors can be arranged). One must also consider whether each arranged motor action vector can produce a corresponding real facial expression, which makes arranging the data set very time-consuming.
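The exponential growth described above can be made concrete with a short sketch; the example motor count and discretization level below are illustrative, not taken from the patent.

```python
# Size of the manually arranged action space: N motors, each motor's rotation
# range discretized into M steps, gives M**N possible motor action vectors.
def action_space_size(n_motors: int, steps_per_motor: int) -> int:
    return steps_per_motor ** n_motors

# With 11 motors (matching the 11 action units in the embodiment) and only
# 10 steps per motor, the space already has 10**11 = 100 billion vectors.
print(action_space_size(11, 10))  # 100000000000
```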
Therefore, how to provide a method for simulating the facial expression of the humanoid robot, which avoids manual arrangement, reduces the training times of the humanoid robot entity and reduces the hardware life consumption, is a problem to be solved.
Disclosure of Invention
Based on the problems in the prior art, the invention aims to provide a method for imitating humanoid robot facial expressions based on deep reinforcement learning. It addresses the shortcomings of the existing end-to-end network training approach to expression imitation: the data set must be arranged manually, which is time-consuming; many training runs are needed on the humanoid robot entity; and hardware service life is consumed.
The purpose of the invention is realized by the following technical scheme:
the embodiment of the invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, which comprises the following steps:
the initialization parameters of the prediction module of the deep reinforcement learning algorithm adopt parameters transferred from a pre-trained neural network model, which is pre-trained while running outside the humanoid robot entity;
and 3, taking the captured corresponding facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot according to the initial state to imitate the facial expression of the target human face, until the humanoid robot finishes imitating the target facial expression.
According to the technical scheme provided by the invention, the method for simulating the facial expression of the humanoid robot based on the deep reinforcement learning has the following beneficial effects:
because a pre-trained neural network model that does not run on the humanoid robot entity is pre-trained on a real human face data set, and the pre-trained model and its parameters are then transferred to the deep reinforcement learning algorithm running on the humanoid robot entity, the number of actual training runs on the robot entity is reduced. In addition, because the pre-trained neural network model takes a real human face data set as training data, no manual arrangement of a data set is needed. The method thus greatly reduces manual arrangement work and the number of actual training runs on the humanoid robot entity, and requires no special auxiliary equipment to achieve imitation of facial expressions. According to the relation between the rigid motion structure and the facial action units, the method can be conveniently applied to different humanoid robot entities.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
fig. 2 is a training flowchart of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a pre-trained neural network model in the method according to the embodiment of the present invention;
FIG. 4 is a schematic diagram of a building block of a deep reinforcement learning algorithm in the method according to the embodiment of the present invention;
fig. 5 is an overall architecture diagram of a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the specific contents of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. Details which are not described in detail in the embodiments of the invention belong to the prior art which is known to the person skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a method for simulating facial expressions of a humanoid robot based on deep reinforcement learning, including:
the initialization parameters of the prediction module of the deep reinforcement learning algorithm adopt parameters transferred from a pre-trained neural network model, which is pre-trained while running outside the humanoid robot entity;
and 3, taking the captured corresponding facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot according to the initial state to imitate the facial expression of the target human face, until the humanoid robot finishes imitating the target facial expression.
In step 1 of the above method, the facial action unit vector prediction performed on the target face picture by the deep reinforcement learning algorithm running on the humanoid robot entity is:
performing facial action unit vector prediction, with the deep reinforcement learning algorithm, on the picture of the target face region cropped from the target face picture.
Referring to fig. 2, the method includes the step of pre-training the pre-training neural network model, specifically:
in the method, before step 1, the method further comprises the steps of pre-training the pre-trained neural network model and training a deep reinforcement learning algorithm, and specifically comprises the following steps:
step 11, adopting a real human face data set consisting of face pictures and corresponding facial action unit vectors as the training data set, and screening out the label dimensions required for imitating humanoid robot facial expressions according to the correspondence between the rigid motion structure inside the robot's head and the facial action units;
step 12, pre-training the pre-trained neural network model with the face pictures in the real face data set as input and the corresponding facial action unit vectors as output, the output of the model being determined by the label dimensions screened in step 11;
step 13, migrating the pre-trained neural network model and its parameters to the deep reinforcement learning algorithm running on the humanoid robot entity;
step 14, training the deep reinforcement learning algorithm on the humanoid robot entity; after training, facial expression imitation by the humanoid robot can be carried out.
In step 11 of the method, according to the corresponding relationship between the rigid motion structure inside the head of the humanoid robot and the facial action unit, the label dimensions required for simulating the facial expression of the humanoid robot are screened out as follows:
and if the humanoid robot can realize a certain action unit described in the facial action coding system, selecting a label corresponding to the action dimension in the real human face data set. Specifically, the generation of the facial expression of the humanoid robot is realized by driving an external flexible material by a rigid motion structure positioned in the head of the robot, so that deformation is generated and presented, the facial expression of the human face is generated by pulling skin by muscle tissues positioned under the skin, and the facial expression and the skin have certain similarity, so that the label dimension is screened according to the above.
In step 12 of the above method, the output of the pre-trained neural network model is determined by the label dimensions screened in step 11 as follows:
and determining the output dimension size and the corresponding meaning of each dimension of the pre-training neural network model according to the dimension size and the meaning of the label screened in the step 11.
In step 13 of the method, migrating the neural network model and parameters pre-trained in step 12 to a deep reinforcement learning algorithm running on the humanoid robot entity is:
the Actor module of the deep reinforcement learning algorithm adopts the structure and parameters completely same as those of the pre-trained neural network model;
and migrating the pre-trained neural network model and the parameters to an Actor module of the deep reinforcement learning algorithm.
In step 14 of the method, the training of the deep reinforcement learning algorithm on the entity of the humanoid robot is as follows:
in the training process, the corresponding motor action vector predicted by the pre-training neural network model at each time acts on a humanoid robot entity, and after the humanoid robot executes the facial action corresponding to the motor action vector, the next training of the deep reinforcement learning algorithm is carried out.
In step 2 of the method, a camera is used for capturing a facial picture of the humanoid robot to obtain a corresponding facial expression from the facial picture.
In step 3 of the method, the deep reinforcement learning algorithm running on the humanoid robot controls the facial motion of the humanoid robot according to the initial state until the humanoid robot finishes imitating the expression of the target human face as follows:
and determining the next facial action taken by the humanoid robot according to the initial state by a deep reinforcement learning algorithm operated on the humanoid robot, displaying a new facial expression on the humanoid robot after executing the facial action, giving a corresponding reward according to the similarity between the new facial expression and the target facial expression, and realizing the simulation of the facial expression of the humanoid robot in a limited step under the guidance of the reward. Specifically, the facial expression of the human face is set as a target, and the facial expression of the humanoid robot at a certain time t is set as an initial state StIn the initial state StThe physical action acted on the humanoid robot by the face action unit of the humanoid robot is atThe facial expression of the humanoid robot after the action is executed is changed under the driving of the facial action unit, and the changed facial expression is a state St+1At the moment, corresponding reward r is given according to the similarity between the facial expression of the human face and the facial expression of the humanoid robottWill then be further based on the award rtAnd the state determines the next facial action unit action to be performed.
Referring to fig. 3, in the above method, the pre-trained neural network model is formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer and an output layer.
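Assuming the standard VGG16 convolutional base on 224×224 RGB input (a 7×7×512 feature map) and the fully-connected sizes given later in the embodiment (512, 128, and an 11-dimensional sigmoid output), the layer widths after flattening can be sketched as follows; the input size is an assumption, since the patent does not state it:

```python
# Widths of the pre-trained model's head: flatten -> fc1 -> fc2 -> output.
VGG16_FLAT = 7 * 7 * 512  # flattened VGG16 feature-map size for 224x224 input

def head_widths(n_action_units: int = 11, fc1: int = 512, fc2: int = 128):
    return [VGG16_FLAT, fc1, fc2, n_action_units]

print(head_widths())  # [25088, 512, 128, 11]
```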
In the method, the deep reinforcement learning algorithm adopts the deep deterministic policy gradient (DDPG) algorithm.
In this method for imitating humanoid robot facial expressions, a collected real human face data set is used as training data to pre-train a neural network model that does not run on the humanoid robot entity. This avoids manual arrangement while reducing the number of training runs on the robot entity, requires no special auxiliary equipment, and achieves accurate imitation of facial expressions. According to the relation between the rigid motion structure and the facial action units, the method can be conveniently applied to different humanoid robot entities.
The embodiments of the present invention are described in further detail below.
Referring to fig. 1, the invention provides a method for simulating facial expressions of a humanoid robot based on a deep reinforcement learning algorithm, which comprises the following steps:
and 3, taking the captured facial expression of the humanoid robot as the initial state, the deep reinforcement learning algorithm running on the humanoid robot (this embodiment adopts the deep deterministic policy gradient algorithm, i.e. DDPG) determines, according to the initial state, the next facial action the robot should take; after the action is executed a new facial expression appears on the robot, and under the guidance of the obtained reward the robot achieves imitation of the target human facial expression within a finite number of steps.
Referring to fig. 2, since the pre-trained neural network model needs to be pre-trained and the deep reinforcement learning algorithm needs to be trained on the humanoid robot entity, the specific training process of the two models is shown in fig. 2 and includes the following steps:
Step 4, deep reinforcement learning algorithm training: the deep reinforcement learning algorithm is trained on the humanoid robot entity, and under the control of the trained algorithm the robot's facial expression reaches, in a finite number of steps, the same or nearly the same effect as the human facial expression, thereby achieving imitative generation of facial expressions. Specifically, in this step the human facial expression is regarded as the goal, and the robot's facial expression at some time t is the state s_t. In this state, the action applied to the humanoid robot entity by the bottom-layer driving module is a_t; driven by the motors, the robot's facial expression changes after the action is executed, and the changed expression is the state s_{t+1}. At this moment, a reward r_t is given according to the similarity between the human facial expression and the robot's facial expression, and the next motor action to execute is determined from the reward and the state.
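The s_t / a_t / r_t loop above can be sketched with a toy one-dimensional "expression", a placeholder policy, and a placeholder similarity function; the real method uses face images, motor vectors, and an FECNet-based similarity, so everything below is a stand-in:

```python
def similarity(expr, target):
    # Toy similarity on scalar "expressions"; the patent compares face images.
    return 1.0 - abs(expr - target)

def imitate(target, policy, max_steps=50, threshold=0.9):
    """Run the reward-guided loop until the expression is close enough."""
    state = 0.0  # initial robot expression s_t (scalar stand-in)
    for t in range(max_steps):
        action = policy(state, target)        # a_t from the (trained) actor
        state = state + action                # robot executes a_t -> s_{t+1}
        reward = similarity(state, target)    # r_t from expression similarity
        if reward >= threshold:
            return t + 1, state               # imitation achieved in t+1 steps
    return max_steps, state

# Placeholder proportional policy standing in for the trained DDPG actor.
steps_used, final_expr = imitate(0.8, lambda s, g: 0.5 * (g - s))
```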
For ease of understanding, the methods described below in connection with the specific examples are described.
The embodiment provides a humanoid robot facial expression imitation method based on a deep reinforcement learning algorithm, which comprises the following steps:
the real face data set used in this step is FEAFA (see http://www.iiplab.net/FEAFA/), a face data set collected in real environments, consisting of face pictures and, for each picture, floating-point intensity value labels for 24 redefined facial action units. According to the characteristics of the humanoid robot's rigid motion structure, 11 label dimensions of the original FEAFA face data set are selected, and the selected label dimensions are redefined and described; under motor actuation, the redefined facial action units can be displayed on the humanoid robot's face. Table 1 shows the redefined facial action units and their descriptions adopted in this embodiment.
| AU number | Description | FACS reference |
| --- | --- | --- |
| 1 | Left Eye Close | AU43, Eye Closure |
| 2 | Right Eye Close | AU43, Eye Closure |
| 3 | Left Lid Raise | AU5, Lid Raise |
| 4 | Right Lid Raise | AU5, Lid Raise |
| 5 | Left Brow Lower | AU4, Brow Lower |
| 6 | Right Brow Lower | AU4, Brow Lower |
| 7 | Jaw Drop | AU26, Jaw Drop |
| 8 | Left Lip Corner Pull | AU12, Lip Corner Pull |
| 9 | Right Lip Corner Pull | AU12, Lip Corner Pull |
| 10 | Left Lip Corner Stretch | AU20, Lip Corner Stretch |
| 11 | Right Lip Corner Stretch | AU20, Lip Corner Stretch |
It should be noted that the required data set label dimensions are screened according to the linkage relation between the facial action units and the rigid motion structure of the robot head. The embodiment of the invention uses the FEAFA face data set, but other similar face data sets can be used with the method, and using another face data set should not be regarded as an essential difference from the invention.
In step 11, the screened dimensions of the face data set are redefined and described; the screening and redefinition of 11 labels (shown in Table 1) is only an example, and other numbers of label dimensions may be screened and redefined.
the pre-trained neural network model adopted in this step is shown in fig. 3. It consists of a feature extraction layer composed of convolutional layers and a fully-connected part: the feature extraction layer is the convolutional feature extractor of a VGG-16 model pre-trained on the ImageNet large-scale data set, and the fully-connected part consists of two hidden layers (using the ReLU activation function) and an output layer (using the sigmoid activation function). In this embodiment the numbers of neurons in the fully-connected layers are 512 and 128, which can be adjusted according to actual requirements. The training data set used for the pre-trained model is the screened FEAFA real face data set: face pictures and the 11 facial action units corresponding to the robot's rigid motion structure. The cropped face picture is input to the pre-trained network, the extracted features are fed to the fully-connected layers, and the output of the whole network is a vector of floating-point intensity values for the designated facial action units (each intensity value ranging from 0 to 1). Each element of this vector is linearly converted to the rotation angle of the motor corresponding to the rigid motion structure (for example, if the mouth opening/closing intensity value is 0.5 and the motor rotation range is 0-40 degrees, the corresponding motor angle is 20 degrees);
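The linear conversion from action-unit intensity to motor angle can be sketched as follows; the per-motor angle range is a parameter, and the 0-40 degree range is just the jaw example from the text:

```python
def intensity_to_angle(intensity: float, angle_min: float, angle_max: float) -> float:
    """Linearly map an AU intensity in [0, 1] to a motor rotation angle."""
    intensity = min(max(intensity, 0.0), 1.0)  # clamp out-of-range predictions
    return angle_min + intensity * (angle_max - angle_min)

# Mouth open/close intensity 0.5 over a 0-40 degree motor range -> 20 degrees.
print(intensity_to_angle(0.5, 0.0, 40.0))  # 20.0
```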
it should be noted that in step 12 the screened face data set is used to pre-train the neural network model in order to improve training convergence speed. Using the feature extraction layer of a pre-trained VGG16 model together with self-defined fully-connected layers is only an example; other kinds of pre-trained neural network models can also be used, and doing so should not be regarded as an essential difference from the invention.
migrating the pre-trained neural network model and parameters of step 12 to the deep reinforcement learning algorithm. The deep reinforcement learning algorithm used is the Deep Deterministic Policy Gradient (DDPG) algorithm, and the pre-trained model and parameters shown in fig. 4 are migrated as follows. The Actor module of the DDPG algorithm adopts a structure and parameters identical to those of the pre-trained neural network model. The Critic module evaluates the action output by the Actor module in a given state, so the feature extraction layer and the two fully-connected layers of the pre-trained model are migrated into the Critic module; the output of the feature extraction layer is flattened and fused with the floating-point intensity value vector of the facial action units to serve as the input of the fully-connected layers, and the final output of the Critic module (its structure is shown schematically in fig. 4) is the evaluation of the current state and action. In the DDPG algorithm, the Actor module takes a face picture of the humanoid robot as input and outputs the facial action unit floating-point intensity value vector (i.e., the facial action unit vector), which is converted into the corresponding motor action vector; the Critic module takes as input a face picture of the humanoid robot and the motor action vector to be executed in that facial state;
guided by the reward obtained after each motor action vector is executed, the humanoid robot brings the similarity between its facial expression and the target facial expression to within an allowable range, thereby achieving the goal of imitating the target facial expression.
It can be understood that, in this step 13, the pre-trained neural network model and the parameters are migrated to the deep reinforcement learning algorithm structure according to a certain requirement, or other methods may be used for migration, and the migration operation performed by using other methods should not be considered as a main difference from the present invention.
In addition, in this step 13 the example uses the Google FECNet network to compare the similarity between the facial expression of the humanoid robot and the target human facial expression; other methods may also be used to implement the similarity comparison, and using other methods to implement this function should not be regarded as a substantive departure from the present invention.
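One plausible way to turn the similarity comparison above into the reward that drives training is to embed both faces and reward small embedding distances. FECNet itself is not reproduced here: the example embeddings, the negative-distance reward, and the 0.2 "allowable range" threshold are all illustrative assumptions, not values from the patent.

```python
import numpy as np

def expression_reward(robot_embedding, target_embedding, threshold=0.2):
    # Smaller embedding distance means a more similar expression, hence a
    # larger (less negative) reward. `done` flags that the similarity has
    # reached the allowable range and the imitation episode can stop.
    dist = float(np.linalg.norm(np.asarray(robot_embedding)
                                - np.asarray(target_embedding)))
    done = dist < threshold
    return -dist, done

# Toy 3-d embeddings standing in for FECNet outputs of the robot face
# and of the target human face.
reward, done = expression_reward([0.1, 0.9, 0.0], [0.0, 1.0, 0.0])
```

Any model mapping a face image to a fixed-length expression embedding could stand in for FECNet under this reward design.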
and training the deep reinforcement learning algorithm on the humanoid robot entity; after training is complete, the humanoid robot can carry out facial expression imitation.
In this embodiment, the architecture implementing the method is shown in fig. 5. Both the pre-training of the neural network and the training of the deep reinforcement learning algorithm are performed on an upper computer running the Ubuntu operating system (16 GB of memory, an Intel i7 CPU, and an RTX 2080 Ti graphics card). The deep learning framework used is Keras with a TensorFlow backend. The camera used by the upper computer to capture human faces is an ordinary high-definition auto-focusing webcam, and the upper computer applies the action values output by the algorithm to the robot through a 16-channel servo driver board produced by Torobot.
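As a hypothetical illustration of the last step, applying the algorithm's action values through a multi-channel servo driver board, one might map each normalised action value to a servo pulse width. The 500-2500 microsecond pulse range and the linear mapping are common servo conventions assumed here for illustration; the actual command protocol of the Torobot board is not described in the source and is not reproduced.

```python
def action_to_pulses(action, lo_us=500, hi_us=2500):
    # Clamp each normalised action value to [0, 1], then map it linearly to
    # a pulse width in microseconds for the corresponding servo channel.
    pulses = []
    for a in action:
        a = min(max(float(a), 0.0), 1.0)
        pulses.append(int(round(lo_us + a * (hi_us - lo_us))))
    return pulses

# Three hypothetical channels at minimum, mid, and maximum deflection.
pulses = action_to_pulses([0.0, 0.5, 1.0])
```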
It can be known that the example in step 14 uses the DDPG (Deep Deterministic Policy Gradient) algorithm as the deep reinforcement learning algorithm. DDPG addresses deep reinforcement learning problems over continuous action spaces: first, it approximates the Q function with a deep neural network; second, its policy is deterministic, outputting for any state the single optimal action corresponding to that state rather than a probability distribution over actions; finally, it introduces the policy gradient method. Other deep reinforcement learning algorithms suited to continuous action-space control, such as the NAF and A3C algorithms, can also be applied to the method of the present invention; any deep reinforcement learning algorithm capable of controlling the facial expressions of the humanoid robot may be used, and the use of other such algorithms should not be regarded as a substantive departure from the present invention.
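The deterministic-policy property described above can be seen in a toy deterministic policy gradient update: the actor outputs one concrete action per state (not a distribution), and its parameters are moved along the product of the critic's action gradient and the actor's parameter gradient. The linear actor and quadratic Q below are stand-ins chosen so both gradients are analytic; nothing in this sketch comes from the patent itself.

```python
import numpy as np

theta = np.array([0.0, 0.0])          # actor parameters
s = np.array([1.0, 2.0])              # a fixed toy state
a_star = 1.5                          # action maximising the toy Q

def mu(theta, s):                     # deterministic actor: one action per state
    return float(theta @ s)

def q(s, a):                          # toy critic, peaked at a_star
    return -(a - a_star) ** 2

lr = 0.05
for _ in range(200):
    a = mu(theta, s)
    dq_da = -2.0 * (a - a_star)       # gradient of Q with respect to the action
    dmu_dtheta = s                    # gradient of the linear actor's output
    theta = theta + lr * dq_da * dmu_dtheta  # deterministic policy gradient ascent

# After training, the actor's action approaches the Q-maximising action a_star.
```

In real DDPG the same update uses neural networks for both modules, plus a replay buffer and target networks, but the gradient flow is the one shown here.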
Because the simulation method uses a pre-trained neural network model for identifying the intensities of human facial action units, the number of training runs required on the physical humanoid robot is greatly reduced during training of the deep-reinforcement-learning-based facial expression imitation method. In addition, the method makes the facial expression of the humanoid robot observably approximate the target facial expression within a finite number of steps, thereby overcoming the problem that the strongly nonlinear flexible material of the humanoid robot's face cannot be modeled or simulated. Moreover, when the pre-trained neural network model is trained, its network output includes facial action units that are not currently used; therefore, if degrees of freedom are added to the existing structure or a new humanoid robot is adopted, the method can be generalized according to the relationship between the new structure and the facial action units, extending to facial expression imitation on any humanoid robot whose rigid actuation structure is designed with reference to the Facial Action Coding System (FACS).
Those of ordinary skill in the art will understand that: all or part of the processes of the methods according to the embodiments may be implemented by a program, which may be stored in a computer-readable storage medium, and when executed, may include the processes according to the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (9)
1. A facial expression imitation method for a humanoid robot based on deep reinforcement learning, characterized by comprising the following steps:
step 1, acquiring a target human face picture to be imitated, performing facial action unit vector prediction on the target human face picture through a deep reinforcement learning algorithm running on the humanoid robot entity to obtain a corresponding facial action unit vector, and obtaining a corresponding motor action vector according to the facial action unit vector;
the initialization parameters of a prediction module of the deep reinforcement learning algorithm adopt parameters migrated from a pre-trained neural network model, the pre-trained neural network model running outside the humanoid robot entity and being trained in advance; the pre-training of the pre-trained neural network model and the training of the deep reinforcement learning algorithm specifically comprise the following steps:
step 11, screening the labels of the real face data set:
adopting a real human face data set consisting of human face pictures and corresponding facial action unit vectors as a training data set, and screening out label dimensions required for simulating the facial expression of the humanoid robot according to the corresponding relation between a rigid motion structure in the head of the humanoid robot and the facial action units;
step 12, pre-training the neural network model:
pre-training the pre-trained neural network model by taking a face picture in the real face data set as input and the facial action unit vector corresponding to that face picture as output, wherein the output of the pre-trained neural network model is determined by the label dimensions screened in the step 11;
step 13, migrating the pre-trained neural network model and parameters:
migrating the pre-trained neural network model and the pre-trained parameters to a deep reinforcement learning algorithm running on the humanoid robot entity;
step 14, training a deep reinforcement learning algorithm:
training the deep reinforcement learning algorithm on the humanoid robot entity, and after the deep reinforcement learning algorithm is trained, simulating the facial expression of the humanoid robot;
step 2, the obtained motor motion vector acts on a humanoid robot of an entity, and the corresponding facial expression of the humanoid robot is captured;
and step 3, taking the captured corresponding facial expression of the humanoid robot as an initial state, and controlling, by the deep reinforcement learning algorithm running on the humanoid robot according to the initial state, the facial actions of the humanoid robot to imitate the facial expression of the target human face until the humanoid robot finishes the expression imitation of the target human face.
2. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 1, the vector prediction of facial action units is performed on the target human face picture by a deep reinforcement learning algorithm running on a humanoid robot entity as follows:
and performing facial action unit vector prediction, by the deep reinforcement learning algorithm, on the picture of the target face region obtained by cropping the target human face picture.
3. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 11, label dimensions screened for simulating facial expressions of the humanoid robot are:
and if the humanoid robot can realize a certain action unit described in the facial action coding system, selecting the label corresponding to that action unit's dimension in the real human face data set.
4. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 12, the label dimensions screened out in the step 11 determine the output of the pre-trained neural network model as:
and determining the output dimension size and the corresponding meaning of each dimension of the pre-training neural network model according to the dimension size and the meaning of the label screened in the step 11.
5. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 13, the migration of the neural network model and parameters pre-trained in the step 12 into the deep reinforcement learning algorithm running on the humanoid robot entity is:
the Actor module of the deep reinforcement learning algorithm adopts the structure and parameters completely same as those of the pre-trained neural network model;
and migrating the pre-trained neural network model and the parameters to an Actor module of the deep reinforcement learning algorithm.
6. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning of claim 1, wherein in the step 14, the deep reinforcement learning algorithm is trained on the body of the humanoid robot as follows:
in the training process, the corresponding motor action vector predicted each time by the pre-trained neural network model acts on the humanoid robot entity, and after the humanoid robot executes the facial action corresponding to the motor action vector, the next round of training of the deep reinforcement learning algorithm is carried out.
7. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning as claimed in any one of claims 1 to 2, wherein in the step 2, a camera is used for capturing a picture of the face of the humanoid robot to obtain the corresponding facial expressions.
8. The method for simulating facial expressions of a humanoid robot based on deep reinforcement learning according to any one of claims 1 to 2, wherein in the step 3, the deep reinforcement learning algorithm running on the humanoid robot controls the facial actions of the humanoid robot to simulate the target facial expressions according to the initial state until the humanoid robot finishes simulating the target facial expressions:
and determining, by the deep reinforcement learning algorithm running on the humanoid robot entity, the next facial action to be taken by the humanoid robot according to the initial state; after the facial action is executed, a new facial expression is displayed on the humanoid robot's face, a corresponding reward is given according to the similarity between the new facial expression and the target facial expression, and under the guidance of the reward the humanoid robot achieves imitation of the target facial expression within a finite number of steps.
9. The method for simulating facial expressions of the humanoid robot based on deep reinforcement learning of any one of claims 1 to 2, wherein the pre-trained neural network model adopts a neural network model formed by sequentially connecting a VGG16 neural network model, a flattening layer, a first fully-connected layer, a second fully-connected layer and an output layer;
the deep reinforcement learning algorithm adopts any one of a deep deterministic policy gradient algorithm, an NAF algorithm and an A3C algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011355989.XA CN112454390B (en) | 2020-11-27 | 2020-11-27 | Humanoid robot facial expression simulation method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112454390A CN112454390A (en) | 2021-03-09 |
CN112454390B true CN112454390B (en) | 2022-05-17 |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||