WO2021196389A1 - Facial action unit recognition method and apparatus, electronic device, and storage medium

Facial action unit recognition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021196389A1
Authority
WO
WIPO (PCT)
Prior art keywords
recognized
face image
face
feature map
key points
Application number
PCT/CN2020/092805
Other languages
French (fr)
Chinese (zh)
Inventor
胡艺飞 (Hu Yifei)
徐国强 (Xu Guoqiang)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021196389A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Definitions

  • This application relates to the field of computer vision technology, and in particular to a facial action unit recognition method, apparatus, electronic device, and storage medium.
  • Facial expression recognition and facial emotion analysis are currently popular areas of computer vision research, and the results of these studies depend to varying degrees on the recognition accuracy of facial action units (AUs).
  • A facial action unit refers to a muscle action at a specific part of the face, such as blinking, frowning, or pouting, and facial action unit recognition determines whether such actions appear.
  • With the development of computer information technology, deep learning has come to be widely applied to facial action unit recognition, that is, recognition is performed by constructing a network model.
  • However, the inventor realized that most existing facial action unit recognition models support only a small number of facial action units and describe subtle facial expression changes rather coarsely.
  • In addition, when the face in a picture is at a different rotation angle, when the picture contains interference information that does not affect the face, or when certain attributes of the picture are changed, the output of the facial action unit recognition model is affected, resulting in lower recognition accuracy.
  • The embodiments of the present application provide a facial action unit recognition method, apparatus, electronic device, and storage medium, which help improve the accuracy of facial action unit recognition in face images.
  • In the first aspect, an embodiment of the present application provides a facial action unit recognition method, which includes:
  • acquiring the first face image to be recognized uploaded by the terminal;
  • using a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
  • performing face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
  • inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
  • outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
  • In the second aspect, an embodiment of the present application provides a facial action unit recognition apparatus, which includes:
  • the image acquisition module is used to acquire the first face image to be recognized uploaded by the terminal;
  • a face detection module configured to use a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of key points of the face in the first face image to be recognized;
  • a face correction module configured to perform face correction on the first face image to be recognized by using the position information of the key points of the face to obtain a second face image to be recognized;
  • the facial action unit recognition module, configured to input the second face image to be recognized into a pre-trained facial action unit recognition model and, through the processing of the model's main body network part, attention mechanism, and fully connected layer, obtain the facial action unit recognition result of the first face image to be recognized, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
  • the recognition result output module is configured to output the facial action unit recognition result of the first face image to be recognized to the terminal.
  • In the third aspect, an embodiment of the present application provides an electronic device that includes an input device, an output device, and a processor adapted to implement one or more instructions; and a computer-readable storage medium.
  • The computer-readable storage medium stores one or more instructions, and the one or more instructions are suitable for being loaded by the processor to execute the following steps:
  • acquiring the first face image to be recognized uploaded by the terminal; performing face detection on it with a pre-trained convolutional neural network model to obtain position information of the face key points; performing face correction using that position information to obtain a second face image to be recognized;
  • inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and outputting the recognition result to the terminal.
  • In the fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores one or more instructions suitable for being loaded by a processor to execute the following steps:
  • acquiring the first face image to be recognized uploaded by the terminal; performing face detection on it with a pre-trained convolutional neural network model to obtain position information of the face key points; performing face correction using that position information to obtain a second face image to be recognized;
  • inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and outputting the recognition result to the terminal.
  • In the embodiments of this application, when the terminal uploads a first face image to be recognized, the position information of the face key points in that image is first obtained and used to correct the face so as to straighten it.
  • The straightened second face image to be recognized is then input into the facial action unit recognition model composed of the main body network part, the attention mechanism module, and the fully connected layer for recognition.
  • The resulting facial action unit recognition is more accurate than in the prior art.
  • FIG. 1 is a network architecture diagram provided by an embodiment of the application;
  • FIG. 2a is an example diagram of acquiring a face image provided by an embodiment of the application;
  • FIG. 2b is another example diagram of acquiring a face image provided by an embodiment of the application;
  • FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the application;
  • FIG. 4 is a schematic structural diagram of a convolutional neural network model provided by an embodiment of the application;
  • FIG. 5 is a schematic structural diagram of a facial action unit recognition model provided by an embodiment of the application;
  • FIG. 6 is a schematic structural diagram of a deep residual dense network provided by an embodiment of the application;
  • FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the application;
  • FIG. 8 is a schematic structural diagram of a facial action unit recognition apparatus provided by an embodiment of the application;
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • The embodiments of this application propose a facial action unit recognition scheme that can be applied in many scenarios, such as face-to-face review when handling business (for example, loan or insurance business), customer expression analysis, and psychological activity analysis.
  • The facial action unit recognition model used in the scheme combines a deep residual network and a deep dense network, which ensures that high-order features can be learned, thereby improving the accuracy of facial action unit recognition on the face image input by the terminal.
  • At the same time, because the features of different facial action units are similar at the low-order feature stage, training a separate model for each facial action unit would produce a large amount of repetitive work. This scheme instead branches the facial action unit recognition model at the high-order feature stage, so that a single trained model can recognize 39 facial action units.
  • This reduces the difficulty of deploying the facial action unit recognition model on a device and increases the model's running speed.
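  • As an illustration of this multi-label design, the following is a minimal PyTorch training-step sketch, assuming a model that emits one logit per action unit; the tensor shapes, loss choice, and helper name are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

NUM_AUS = 39  # number of facial action units the single model recognizes

def training_step(model, optimizer, images, au_labels):
    """One multi-label training step (illustrative sketch).

    images:    float tensor of shape (batch, 3, H, W), RGB input
    au_labels: float tensor of shape (batch, NUM_AUS); 1.0 if the AU
               appears in the image, 0.0 otherwise
    """
    logits = model(images)  # (batch, NUM_AUS), one logit per action unit
    # One binary cross-entropy term per action unit: a single model learns
    # all 39 AUs jointly instead of training 39 separate models.
    loss = nn.functional.binary_cross_entropy_with_logits(logits, au_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```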
  • Specifically, the scheme can be implemented based on the network architecture shown in FIG. 1.
  • As shown in FIG. 1, the network architecture includes at least a terminal and a server.
  • The terminal and the server communicate through a network, which includes but is not limited to a virtual private network, a local area network, or a metropolitan area network. The terminal can collect face images directly, or it can rely on an external image collection tool and then obtain the face images from that tool.
  • The terminal can be a mobile phone, tablet, laptop computer, handheld computer, or similar device.
  • In some embodiments of the present application, as shown in FIG. 2a, the terminal can automatically complete the collection of a face image when a face is detected and then send the collected face image to the server.
  • In other embodiments, as shown in FIG. 2b, the terminal can start collecting face images only after a control on the screen is triggered, and then send the collected face images to the server.
  • The control can appear in a fixed form or in a floating form, and the triggering method can be a light touch, long press, slide, and so on, which is not limited here.
  • After the server obtains the face image sent by the terminal, its processor performs a series of operations such as face key point detection, face correction, and calling the facial action unit recognition model for facial action unit recognition, and finally outputs the recognition result to the terminal for presentation to the user.
  • The server can be a single server, a server cluster, or a cloud server, and is the execution body of the entire facial action unit recognition scheme. It can be seen that the network architecture shown in FIG. 1 enables this scheme to be implemented; of course, the network architecture can also include more components, such as a database.
  • Referring to FIG. 3, FIG. 3 is a schematic flowchart of a facial action unit recognition method according to an embodiment of the application. As shown in FIG. 3, the method includes steps S31-S35:
  • S31: Acquire the first face image to be recognized uploaded by the terminal.
  • In this embodiment, the first face image to be recognized is the original face image uploaded by the terminal, without face detection or face correction. It can be a face image from any open-source database at home or abroad, a customer's face image collected by a bank, insurance company, communication company, or similar institution when handling business, or an image collected by monitoring equipment in any monitored area such as a residential community or a shopping mall.
  • S32: Use a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized.
  • In this embodiment, the face key points are the five key points of the detected face: the two eyes, the nose, and the left and right corners of the mouth.
  • The position information consists of the coordinates of the key points, for example: the coordinates of the centers of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the left and right mouth corners.
  • The pre-trained convolutional neural network model refers to a Multi-task Cascaded Convolutional Network (MTCNN), which, as shown in FIG. 4, uses a three-layer cascade architecture combined with convolutional neural network algorithms to perform face detection and key point localization, and includes the neural networks P-Net, R-Net, and O-Net.
  • The first face image to be recognized is first input into P-Net for recognition; the output of P-Net serves as the input of R-Net, and the output of R-Net in turn serves as the input of O-Net.
  • The input size of each network is different: P-Net takes 12*12*3, R-Net takes 24*24*3, and O-Net takes 48*48*3. The processing in P-Net is mainly 3*3 convolution and 2*2 pooling; the processing in R-Net is mainly 3*3 convolution, 3*3 pooling, and 2*2 pooling; and O-Net performs similar 3*3 convolution and 2*2 pooling but with one more convolutional layer than R-Net.
  • After each network, a face classifier judges whether a region is a face, while bounding-box regression and a key point locator detect the face region.
  • Specifically, the processing of the multi-task cascaded convolutional network is as follows: the first face image to be recognized is input into P-Net for recognition to obtain first candidate windows and bounding regression boxes; the first candidate windows are calibrated according to the bounding regression boxes, and non-maximum suppression removes the overlapping calibrated first candidate windows to obtain second candidate windows; the second candidate windows are input into R-Net for recognition, and false second candidate windows are filtered out to obtain third candidate windows; the third candidate windows are input into O-Net for recognition, which outputs the face region through bounding-box regression and outputs the position information of the face key points in the first face image to be recognized through key point localization.
  • It should be noted that P-Net does not use a fully connected layer, whereas R-Net and O-Net use 128-channel and 256-channel fully connected layers respectively, and O-Net has one more layer of convolution processing than R-Net.
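  • As a concrete illustration, the open-source `mtcnn` Python package implements the same P-Net/R-Net/O-Net cascade and returns the five key points described above. The following sketch assumes that package and an illustrative input file name; it is not the patent's own implementation.

```python
import cv2
from mtcnn import MTCNN  # pip install mtcnn; an open-source MTCNN implementation

detector = MTCNN()
# "face.jpg" is an assumed input path; MTCNN expects an RGB array.
image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)

# detect_faces runs the P-Net -> R-Net -> O-Net cascade and returns, per face,
# a bounding box plus the five key points used here: both eyes, the nose tip,
# and the two mouth corners.
for face in detector.detect_faces(image):
    box = face["box"]              # [x, y, width, height] of the face region
    keypoints = face["keypoints"]  # dict: left_eye, right_eye, nose,
                                   #       mouth_left, mouth_right -> (x, y)
    print(box, keypoints)
```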
  • S33: Perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized.
  • In this embodiment, the second face image to be recognized is the straightened face image obtained after face correction, which involves scaling, rotation, translation, and so on, is performed on the first face image to be recognized.
  • Specifically, the position information of the face key points in the first face image to be recognized, obtained with MTCNN, and the position information of the face key points in a pre-stored standard face image are first acquired.
  • The standard face image is an image in which the face is frontal and the head is not tilted, that is, a face that does not need correction.
  • The position information (coordinate information) of the face key points in the standard face image has been obtained in advance and stored in a preset database.
  • The position information of the face key points in the first face image to be recognized is compared with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H, which is solved according to the following similarity transformation equation:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}, \qquad H = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix}$$

  • where (x, y) represents the position information of a face key point in the first face image to be recognized, (x', y') represents the position information of the corresponding face key point in the standard face image, s represents the scaling factor, θ represents the rotation angle (usually a counterclockwise rotation), and (t_x, t_y) represents the translation parameters.
  • The position information of each pixel in the first face image to be recognized is then multiplied by the solved similarity transformation matrix H, to obtain the second face image to be recognized with the face straightened.
  • In a specific implementation, the SimilarityTransform class in the transform module of the Python scikit-image library (an image processing library) can be used to solve for the similarity transformation matrix H.
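  • A minimal sketch of this correction step using scikit-image follows; the five landmark coordinate pairs are illustrative placeholder values, and estimate() performs the least-squares solve for H described above.

```python
import numpy as np
from skimage import transform

# Five detected key points (x, y) in the first face image to be recognized,
# e.g. as returned by MTCNN: eyes, nose tip, mouth corners (placeholder values).
src = np.array([[71, 80], [121, 78], [96, 110], [78, 135], [118, 133]], dtype=float)
# Corresponding key points in the pre-stored standard (frontal) face image
# (placeholder values for illustration).
dst = np.array([[70, 75], [122, 75], [96, 108], [77, 138], [115, 138]], dtype=float)

tform = transform.SimilarityTransform()
tform.estimate(src, dst)  # solves for scale s, rotation angle and (tx, ty)
H = tform.params          # the 3x3 similarity transformation matrix H

def straighten(image):
    # Warp every pixel of the input image (H x W x 3 RGB array) with H to
    # obtain the straightened second face image to be recognized.
    return transform.warp(image, tform.inverse)
```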
  • S34: Input the second face image to be recognized into the pre-trained facial action unit recognition model, and obtain the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer; the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network.
  • In this embodiment, the structure of the facial action unit recognition model is shown in FIG. 5 and mainly includes the main body network part, the attention mechanism module, and the final fully connected layer.
  • The input of the model is a color image in RGB format, that is, the image depth of the input is 3, and the recognition result of the model is the probability value of each of the 39 facial action units appearing: a probability greater than or equal to 0.5 means the facial action unit appears, and a probability less than 0.5 means it does not.
  • For example, if the output value of AU45 (blink) is 0.8 and the output value of AU04 (frown) is 0.3, the face in the input image exhibits AU45 but not AU04.
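  • In other words, the model's 39 output probabilities are thresholded at 0.5. A minimal sketch mirroring the example above (only two of the 39 entries are shown):

```python
# Map each output probability to a present/absent decision; the two entries
# below mirror the example in the text, and the other 37 AUs are analogous.
probabilities = {"AU45": 0.8, "AU04": 0.3}  # AU45 = blink, AU04 = frown

detected = [au for au, p in probabilities.items() if p >= 0.5]
print(detected)  # ['AU45'] -> the face shows AU45 (blink) but not AU04 (frown)
```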
  • In some embodiments, inputting the aforementioned second face image to be recognized into the pre-trained facial action unit recognition model and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer includes:
  • inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map; performing maximum pooling and average pooling on the high-order feature map to obtain a first feature map and a second feature map;
  • splicing the first feature map and the second feature map in the depth direction, and performing a 1*1 convolution on the spliced feature map to obtain a third feature map; multiplying the width and height of the third feature map correspondingly with the width and height of the high-order feature map to obtain a target feature map; and
  • using the target feature map as the input of the fully connected layer, which performs binary classification and finally outputs the facial action unit recognition result of the first face image to be recognized.
  • In this embodiment, the main body network part of the facial action unit recognition model is composed of four deep residual dense networks, with a total of 92 hidden layers. As shown in FIG. 6, each deep residual dense network is formed by stacking a deep residual module and a deep dense module.
  • A deep residual dense network starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer, and after the last 1*1 convolutional layer it is divided into two parts. One part is connected to the deep residual module in an additive manner over the corresponding width and height, exploiting the property of residual networks that good features already learned are not forgotten as the network deepens; for example, the features obtained by the second hidden layer are added to the features obtained by the fifth hidden layer in the width and height dimensions, with the depth dimension unchanged.
  • The other part is connected to the path of the deep dense module: the depth dimension of the features obtained by the second hidden layer is spliced with the depth of the features obtained by the fifth hidden layer; for example, if each has a depth of 25, the spliced feature depth will be 50, while the width and height remain unchanged.
  • The main body network part thus combines a deep residual network and a deep dense network; compared with prior art that uses only a deep residual network, this is more conducive to maintaining the diversity of high-order features and to accurately recognizing the 39 facial action units.
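  • The following PyTorch sketch shows one way to realize such a deep residual dense block: a 1*1 / 3*3 / 1*1 convolution stem whose output feeds both an element-wise residual addition (depth unchanged) and a depth-wise concatenation. The channel counts and the final fusion layer are assumptions for illustration, since the patent does not publish exact hyperparameters.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of one deep residual dense network: a 1*1 / 3*3 / 1*1 stem whose
    output feeds a residual (additive) path and a dense (concatenation) path."""

    def __init__(self, channels: int):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),             # 1*1 conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # 3*3 conv
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),             # last 1*1 conv
        )
        # Fuse the two paths back to `channels` so blocks can be stacked; this
        # fusion layer is an assumption, as the patent does not spell out how
        # the two paths are merged again.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.stem(x)
        residual = x + out                  # residual path: element-wise addition
                                            # over width/height, depth unchanged
        dense = torch.cat([x, out], dim=1)  # dense path: splice along depth,
                                            # e.g. 25 + 25 -> 50 channels
        return self.fuse(torch.cat([residual, dense], dim=1))

# e.g. ResidualDenseBlock(64) maps (N, 64, H, W) to (N, 64, H, W), so four such
# blocks can be stacked to form the main body network part.
```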
  • The function of the attention mechanism module is to assign weights to the high-order features extracted by the main body network part so that these high-order features can be recombined. It uses a combination of maximum pooling, average pooling, and 1*1 convolution. Its input is the output of the main body network part; after maximum pooling and average pooling, two feature maps with the same width and height as the input feature map and a depth of 1 are obtained, namely the first feature map and the second feature map. The two feature maps are spliced in depth, and the output feature map of the attention mechanism module, that is, the third feature map, is obtained through a 1*1 convolution.
  • The width and height of this output feature map are multiplied correspondingly with the width and height of the input feature map of the attention mechanism module (that is, the high-order feature map) to obtain the input feature map of the fully connected layer, that is, the target feature map.
  • The fully connected layer performs binary classification and outputs the two-class probability value of each facial action unit; finally, the two-class probability values of the 39 facial action units are output to the terminal, where the facial action unit recognition result of the first image to be recognized is displayed.
  • Using maximum pooling and average pooling at different scales helps capture feature information of different scales, focusing on obtaining weights over the width and height dimensions and clarifying which positions of the input face carry more feature information, which benefits facial action unit recognition.
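  • A PyTorch sketch of this attention module follows: the depth-wise maximum and average of the high-order feature map give two maps of depth 1, which are spliced and passed through a 1*1 convolution to produce a spatial weight map that multiplies the input over its width and height. The sigmoid used to squash the weights is an assumed choice, not stated in the text.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the attention module: max pooling and average pooling along
    the depth dimension, depth-wise splicing, then a 1*1 convolution."""

    def __init__(self):
        super().__init__()
        # 2 input channels (max map + average map) -> 1 output weight map.
        self.conv = nn.Conv2d(2, 1, kernel_size=1)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (N, C, H, W) high-order feature map from the main network.
        max_map = features.max(dim=1, keepdim=True).values  # first feature map, depth 1
        avg_map = features.mean(dim=1, keepdim=True)        # second feature map, depth 1
        spliced = torch.cat([max_map, avg_map], dim=1)      # splice in depth: (N, 2, H, W)
        weights = torch.sigmoid(self.conv(spliced))         # third feature map as weights
                                                            # (sigmoid is an assumption)
        # Multiply over width and height with the input feature map to obtain
        # the target feature map that feeds the fully connected layer.
        return features * weights
```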
  • In the embodiments of this application, when the terminal uploads a first face image to be recognized, the position information of the face key points in that image is first obtained and used to correct the face so as to straighten it.
  • The straightened second face image to be recognized is then input into the facial action unit recognition model composed of the main body network part, the attention mechanism module, and the fully connected layer for recognition.
  • The resulting facial action unit recognition is more accurate than in the prior art.
  • Referring to FIG. 7, FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the application. As shown in FIG. 7, the method includes steps S71-S76, of which S71-S73 correspond to acquiring the first face image to be recognized, performing face detection on it to obtain the position information of its face key points, and obtaining the position information of the face key points in the pre-stored standard face image.
  • S74: Perform face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized.
  • In some embodiments, performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized, includes:
  • comparing the two sets of position information to solve for the similarity transformation matrix H; and
  • multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
  • In this embodiment, the key points detected by MTCNN are used to perform face correction, so that the model can judge accurately when the face in the first face image to be recognized is rotated at different angles, which ensures the stability of the model.
  • S75: Input the second face image to be recognized into the pre-trained facial action unit recognition model, and obtain the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer; the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network.
  • In some embodiments, this processing includes: inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map; performing maximum pooling and average pooling on the high-order feature map to obtain a first feature map and a second feature map; and obtaining a target feature map according to the first feature map and the second feature map.
  • Obtaining the target feature map according to the first feature map and the second feature map includes: splicing the two feature maps in the depth direction, performing a 1*1 convolution on the spliced feature map to obtain a third feature map, and multiplying the width and height of the third feature map correspondingly with the width and height of the high-order feature map.
  • The target feature map is then input into the fully connected layer of the facial action unit recognition model for binary classification, and the facial action unit recognition result of the first face image to be recognized is output.
  • In some embodiments, inputting the second face image to be recognized into the main body network part for feature extraction to obtain the high-order feature map includes:
  • inputting the second face image to be recognized into the main body network part and performing feature extraction through the plurality of deep residual dense networks to obtain the high-order feature map. In each deep residual dense network, the convolution processing starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and another 1*1 convolutional layer, after which the processing is divided into two parts: one part is connected to the deep residual network, where the features output by two hidden layers are added in the width and height dimensions with the depth unchanged; the other part is connected to the path of the deep dense network, where the features output by two hidden layers are spliced in depth with the width and height unchanged.
  • In this embodiment, the main body network part of the facial action unit recognition model is formed by stacking deep residual networks and deep dense networks to ensure that high-order features are learned; together with the attention mechanism module composed of maximum pooling, average pooling, and 1*1 convolution, which helps remove redundant features, this improves the recognition accuracy of the 39 facial action units.
  • S76: Output the facial action unit recognition result of the first face image to be recognized to the terminal.
  • The present application also provides a facial action unit recognition apparatus.
  • The facial action unit recognition apparatus may be a computer program (including program code) running in a terminal.
  • The facial action unit recognition apparatus can execute the method shown in FIG. 3 or FIG. 7.
  • Referring to FIG. 8, the apparatus includes:
  • the image acquisition module 81, configured to acquire the first face image to be recognized uploaded by the terminal;
  • the face detection module 82, configured to use a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
  • the face correction module 83, configured to perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
  • the facial action unit recognition module 84, configured to input the second face image to be recognized into a pre-trained facial action unit recognition model and, through the processing of the model's main body network part, attention mechanism, and fully connected layer, obtain the facial action unit recognition result of the first face image to be recognized, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
  • the recognition result output module 85, configured to output the facial action unit recognition result of the first face image to be recognized to the terminal.
  • In some embodiments, the face correction module 83 is specifically configured to:
  • compare the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to solve for the similarity transformation matrix H; and
  • multiply the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
  • In some embodiments, the facial action unit recognition module 84 is specifically configured to:
  • input the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map; perform maximum pooling and average pooling on the high-order feature map to obtain a first feature map and a second feature map; splice the two feature maps in the depth direction and perform a 1*1 convolution to obtain a third feature map; and
  • obtain the target feature map by correspondingly multiplying the width and height of the third feature map with the width and height of the high-order feature map.
  • In some embodiments, the facial action unit recognition module 84 is specifically configured to:
  • input the second face image to be recognized into the main body network part and perform feature extraction through the plurality of deep residual dense networks to obtain the high-order feature map. In each deep residual dense network, the convolution processing starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and another 1*1 convolutional layer, after which the processing is divided into two parts: one part is connected to the deep residual network, where the features output by two hidden layers are added in the width and height dimensions with the depth unchanged; the other part is connected to the path of the deep dense network, where the features output by two hidden layers are spliced in depth with the width and height unchanged.
  • The various modules of the facial action unit recognition apparatus shown in FIG. 8 can be separately or completely combined into one or several other units, or some of the modules can be split into multiple functionally smaller units. This can achieve the same operation without affecting the realization of the technical effects of the embodiments of the present application.
  • The above units are divided based on logical functions.
  • In practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit.
  • In other embodiments, the facial action unit recognition apparatus may also include other units; in practical applications, these functions may be implemented with the assistance of other units and may be implemented by multiple units in cooperation.
  • In other embodiments of the present application, the facial action unit recognition apparatus as shown in FIG. 8 may be constructed by running a computer program (including program code) capable of executing the steps of the corresponding method shown in FIG. 3 or FIG. 7 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM), thereby implementing the facial action unit recognition method of the embodiments of the present application.
  • The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above-mentioned computing device through the computer-readable recording medium, and run therein.
  • Referring to FIG. 9, FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application. As shown in FIG. 9, the electronic device includes a processor 901, an input device 902, an output device 903, and a computer-readable storage medium 904, which may be connected by a bus or in other ways.
  • The computer-readable storage medium 904 may be stored in the memory of the electronic device.
  • The computer-readable storage medium 904 is used to store a computer program, and the computer program includes program instructions.
  • The processor 901 is used to execute the program instructions stored in the computer-readable storage medium 904.
  • The processor 901 (or CPU, Central Processing Unit) is the computing core and control core of the electronic device; it is suitable for implementing one or more instructions, and specifically suitable for loading and executing one or more instructions to realize the corresponding method flow or corresponding function.
  • In some embodiments, the processor 901 of the electronic device provided in the embodiments of the present application may be used to perform a series of facial action unit recognition operations on an acquired face image:
  • acquiring the first face image to be recognized uploaded by the terminal; performing face detection on it with a pre-trained convolutional neural network model to obtain position information of the face key points; performing face correction using that position information to obtain a second face image to be recognized;
  • inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and outputting the recognition result to the terminal.
  • In some embodiments, when the processor 901 performs face correction on the first face image to be recognized by using the position information of the face key points to obtain the second face image to be recognized, the processing includes:
  • comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to solve for the similarity transformation matrix H; and
  • multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
  • In some embodiments, when the processor 901 inputs the second face image to be recognized into the pre-trained facial action unit recognition model and obtains the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, the processing includes: inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map; performing maximum pooling and average pooling on the high-order feature map to obtain a first feature map and a second feature map; and obtaining a target feature map according to the first feature map and the second feature map.
  • When the processor 901 obtains the target feature map according to the first feature map and the second feature map, the processing includes: splicing the two feature maps in the depth direction and performing a 1*1 convolution to obtain a third feature map; and
  • obtaining the target feature map by correspondingly multiplying the width and height of the third feature map with the width and height of the high-order feature map.
  • In some embodiments, when the processor 901 inputs the second face image to be recognized into the main body network part for feature extraction to obtain the high-order feature map, the processing includes:
  • inputting the second face image to be recognized into the main body network part and performing feature extraction through the plurality of deep residual dense networks to obtain the high-order feature map. In each deep residual dense network, the convolution processing starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and another 1*1 convolutional layer, after which the processing is divided into two parts: one part is connected to the deep residual network, where the features output by two hidden layers are added in the width and height dimensions with the depth unchanged; the other part is connected to the path of the deep dense network, where the features output by two hidden layers are spliced in depth with the width and height unchanged.
  • the above-mentioned electronic device may be a server, a computer host, a cloud server and other devices.
  • The electronic device may include, but is not limited to, a processor 901, an input device 902, an output device 903, and a computer-readable storage medium 904.
  • Those skilled in the art can understand that the schematic diagram is only an example of the electronic device and does not constitute a limitation on it; the electronic device may include more or fewer components than shown in the figure, a combination of certain components, or different components.
  • Since the processor 901 of the electronic device executes the computer program to implement the steps of the facial action unit recognition method described above, the embodiments of the facial action unit recognition method are all applicable to the electronic device and can achieve the same or similar beneficial effects.
  • the embodiment of the present application also provides a computer-readable storage medium (Memory).
  • the computer-readable storage medium is a memory device in an electronic device for storing programs and data. It can be understood that the computer-readable storage medium herein may include a built-in storage medium in the terminal, and of course, may also include an extended storage medium supported by the terminal.
  • the computer-readable storage medium provides storage space, and the storage space stores the operating system of the terminal.
  • one or more instructions suitable for being loaded and executed by the processor 901 are stored in the storage space, and these instructions may be one or more computer programs (including program codes).
  • The computer-readable storage medium here can be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located far away from the aforementioned processor 901.
  • In a specific implementation, the processor 901 can load and execute one or more instructions stored in the computer-readable storage medium to implement the corresponding steps of the facial action unit recognition method described above; specifically, the one or more instructions in the computer-readable storage medium are loaded by the processor 901 to execute the following steps:
  • acquiring the first face image to be recognized uploaded by the terminal; performing face detection on it with a pre-trained convolutional neural network model to obtain position information of the face key points; performing face correction using that position information to obtain a second face image to be recognized;
  • inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and outputting the recognition result to the terminal.
  • the position information of each pixel in the first face image to be recognized is multiplied by the similarity transformation matrix H obtained after the solution, to obtain the second face image to be recognized that is straightened.
  • the target feature map is obtained by correspondingly multiplying the width and height of the third feature map with the width and height of the higher-order feature map.
  • the second face image to be recognized is input into the main body network part, and feature extraction is performed through the plurality of deep residual dense networks to obtain the high-order feature map. In each deep residual dense network, the convolution processing starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and another 1*1 convolutional layer, after which the processing is divided into two parts: one part is connected to the deep residual network, where the features output by two hidden layers are added in the width and height dimensions with the depth unchanged; the other part is connected to the path of the deep dense network, where the features output by two hidden layers are spliced in depth with the width and height unchanged.
  • the computer program in the computer-readable storage medium includes computer program code
  • the computer program code may be in the form of source code, object code, executable file, or some intermediate form, etc.
  • The computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.
  • The program can be stored in a computer-readable storage medium, and when the program is executed, the procedures of the above-mentioned method embodiments may be included.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

A facial action unit recognition method and apparatus, an electronic device and a storage medium. Said method comprises: acquiring a first facial image to be recognized, uploaded by a terminal; using a pre-trained convolutional neural network model to perform face detection on said first facial image, to obtain position information of key facial points in said first facial image; using the position information of key facial points to perform face correction on said first facial image, to obtain a second facial image to be recognized; inputting said second facial image into a pre-trained facial action unit recognition model, and processing same by means of a main network part of the facial action unit recognition model, an attention mechanism and a fully-connected layer, so as to obtain a facial action unit recognition result of said first facial image; and outputting to the terminal the facial action unit recognition result of said first facial image. Said method is beneficial for improving the accuracy of recognition of a facial action unit in a facial image.

Description

Facial action unit recognition method, apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 3, 2020, with application number 202010262740.8 and entitled "Facial Action Unit Recognition Method, Apparatus, Electronic Equipment, and Storage Medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of computer vision technology, and in particular to a facial action unit recognition method, apparatus, electronic device, and storage medium.
Background
Facial expression recognition and facial emotion analysis are currently popular areas of computer vision research, and the results of these studies depend to varying degrees on the recognition accuracy of facial action units (AUs). A facial action unit refers to a muscle action at a specific part of the face, such as blinking, frowning, or pouting, and facial action unit recognition determines whether such actions appear. With the development of computer information technology, deep learning has come to be widely applied to facial action unit recognition, that is, recognition is performed by constructing a network model. However, the inventor realized that most existing facial action unit recognition models support only a small number of facial action units and describe subtle facial expression changes rather coarsely. In addition, when the face in a picture is at a different rotation angle, when the picture contains interference information that does not affect the face, or when certain attributes of the picture are changed, the output of the facial action unit recognition model is affected, resulting in lower recognition accuracy.
Summary of the invention
The embodiments of the present application provide a facial action unit recognition method, apparatus, electronic device, and storage medium, which help improve the accuracy of facial action unit recognition in face images.
In the first aspect, an embodiment of the present application provides a facial action unit recognition method, which includes:
acquiring the first face image to be recognized uploaded by the terminal;
using a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
performing face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
In the second aspect, an embodiment of the present application provides a facial action unit recognition apparatus, which includes:
an image acquisition module, configured to acquire the first face image to be recognized uploaded by the terminal;
a face detection module, configured to use a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
a face correction module, configured to perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
a facial action unit recognition module, configured to input the second face image to be recognized into a pre-trained facial action unit recognition model and, through the processing of the model's main body network part, attention mechanism, and fully connected layer, obtain the facial action unit recognition result of the first face image to be recognized, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
a recognition result output module, configured to output the facial action unit recognition result of the first face image to be recognized to the terminal.
In the third aspect, an embodiment of the present application provides an electronic device that includes an input device, an output device, and a processor adapted to implement one or more instructions; and a computer-readable storage medium that stores one or more instructions suitable for being loaded by the processor to execute the following steps:
acquiring the first face image to be recognized uploaded by the terminal;
using a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
performing face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
In the fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores one or more instructions suitable for being loaded by a processor to execute the following steps:
acquiring the first face image to be recognized uploaded by the terminal;
using a pre-trained convolutional neural network model to perform face detection on the first face image to be recognized, to obtain position information of the face key points in the first face image to be recognized;
performing face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the model's main body network part, attention mechanism, and fully connected layer, where the main body network part includes a plurality of deep residual dense networks, each formed by stacking a deep residual network and a deep dense network; and
outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
In the embodiments of this application, when the terminal uploads a first face image to be recognized, the position information of the face key points in that image is first obtained and used to correct the face so as to straighten it; the straightened second face image to be recognized is then input into the facial action unit recognition model composed of the main body network part, the attention mechanism module, and the fully connected layer for recognition, and the resulting facial action unit recognition is more accurate than in the prior art.
Description of the drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a network architecture diagram provided by an embodiment of the application;
FIG. 2a is an example diagram of acquiring a face image provided by an embodiment of the application;
FIG. 2b is another example diagram of acquiring a face image provided by an embodiment of the application;
FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the application;
FIG. 4 is a schematic structural diagram of a convolutional neural network model provided by an embodiment of the application;
FIG. 5 is a schematic structural diagram of a facial action unit recognition model provided by an embodiment of the application;
FIG. 6 is a schematic structural diagram of a deep residual dense network provided by an embodiment of the application;
FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the application;
FIG. 8 is a schematic structural diagram of a facial action unit recognition apparatus provided by an embodiment of the application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of this application.
The embodiments of this application propose a facial action unit recognition scheme that can be applied in many scenarios, such as face-to-face review when handling business (for example, loan or insurance business), customer expression analysis, and psychological activity analysis. The facial action unit recognition model used in the scheme combines a deep residual network and a deep dense network, which ensures that high-order features can be learned, thereby improving the accuracy of facial action unit recognition on the face image input by the terminal. At the same time, because the features of different facial action units are similar at the low-order feature stage, training a separate model for each facial action unit would produce a large amount of repetitive work; this scheme branches the facial action unit recognition model at the high-order feature stage, so that a single trained model can recognize 39 facial action units, which reduces the difficulty of deploying the model on a device and increases its running speed. Specifically, the scheme can be implemented based on the network architecture shown in FIG. 1. As shown in FIG. 1, the network architecture includes at least a terminal and a server, which communicate through a network including but not limited to a virtual private network, a local area network, or a metropolitan area network. The terminal can collect face images directly, or rely on an external image collection tool and then obtain the face images from that tool; the terminal can be a mobile phone, tablet, laptop computer, handheld computer, or similar device. In some embodiments of the present application, as shown in FIG. 2a, the terminal can automatically complete the collection of a face image when a face is detected and then send the collected face image to the server; in other embodiments, as shown in FIG. 2b, the terminal can start collecting face images only after a control on the screen is triggered and then send the collected face images to the server. The control can appear in a fixed or floating form, and the triggering method can be a light touch, long press, slide, and so on, which is not limited here. After the server obtains the face image sent by the terminal, its processor performs a series of operations such as face key point detection, face correction, and calling the facial action unit recognition model for facial action unit recognition, and finally outputs the recognition result to the terminal for presentation to the user.
The server can be a single server, a server cluster, or a cloud server, and is the execution body of the entire facial action unit recognition scheme. It can be seen that the network architecture shown in FIG. 1 enables this scheme to be implemented; of course, the network architecture can also include more components, such as a database.
Please refer to FIG. 3, which is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the present application. As shown in FIG. 3, the method includes steps S31-S35:
S31: Acquire the first face image to be recognized uploaded by the terminal.
In the specific embodiments of the present application, the first face image to be recognized is the original face image uploaded by the terminal, on which no face detection or face correction has been performed. It can be a face image from any open-source database at home or abroad, a customer's face image collected when a bank, insurance company, or communication company handles business, or an image collected by monitoring equipment in any monitored area such as a residential community or a shopping mall.
S32: Perform face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain the position information of the face key points in the first face image to be recognized.
In the specific embodiments of the present application, the face key points are the five key points of a detected face: the two eyes, the nose, and the left and right mouth corners. The position information consists of the coordinates of these key points, for example, the coordinates of the center points of the two eye ellipses, the coordinates of the nose tip, and the coordinates of the left and right mouth corners.
The pre-trained convolutional neural network model refers to the Multi-task Cascaded Convolutional Networks (MTCNN). As shown in FIG. 4, MTCNN uses a three-level cascaded architecture combined with a convolutional neural network algorithm to perform face detection and key point localization, and comprises the neural networks P-Net, R-Net, and O-Net. The first face image to be recognized is first input into P-Net for recognition; the output of P-Net serves as the input of R-Net, and in turn the output of R-Net serves as the input of O-Net. The input size of each network is different: the input size of P-Net is 12*12*3, that of R-Net is 24*24*3, and that of O-Net is 48*48*3. The processing in P-Net mainly consists of 3*3 convolutions and 2*2 pooling; the processing in R-Net mainly consists of 3*3 convolutions, 3*3 pooling, and 2*2 pooling; and the processing in O-Net is similar to the 3*3 convolutions and 2*2 pooling of R-Net. After each network, a face classifier judges whether the region is a face, while bounding box regression and a key point locator are used to detect the face region. Specifically, the processing procedure of the multi-task convolutional neural network is as follows: input the first face image to be recognized into P-Net for recognition to obtain first candidate windows and bounding box regressions; calibrate the first candidate windows according to the bounding box regressions, and apply non-maximum suppression to remove the first candidate windows that overlap after calibration, obtaining second candidate windows; input the second candidate windows into R-Net for recognition and filter out false second candidate windows, obtaining third candidate windows; and input the third candidate windows into O-Net for recognition, output the face region through bounding box regression, and output the position information of the face key points in the first face image to be recognized through key point localization. It should be noted that P-Net does not use a fully connected layer, whereas R-Net and O-Net use 128-channel and 256-channel fully connected layers respectively, and O-Net has one more convolutional layer than R-Net.
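As a minimal illustration of this detection stage, the sketch below uses the open-source mtcnn Python package to obtain the five key points described above; the package choice and the file name are assumptions, since the text does not prescribe a particular implementation.

```python
# A minimal sketch of MTCNN key point detection, assuming the
# open-source `mtcnn` package (pip install mtcnn); the file name
# "face.jpg" is a hypothetical placeholder.
import cv2
from mtcnn import MTCNN

detector = MTCNN()  # internally runs the P-Net -> R-Net -> O-Net cascade
image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)

results = detector.detect_faces(image)
if results:
    kp = results[0]["keypoints"]
    # Five key points: two eyes, nose tip, left/right mouth corners.
    print(kp["left_eye"], kp["right_eye"], kp["nose"],
          kp["mouth_left"], kp["mouth_right"])
```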
S33: Perform face correction on the first face image to be recognized using the position information of the face key points to obtain a second face image to be recognized.
In the specific embodiments of the present application, the second face image to be recognized is the straightened face image obtained after performing face correction on the first face image to be recognized, where face correction involves operations such as scaling, rotation, and translation. After the position information of the face key points in the first face image to be recognized is obtained with MTCNN, the position information of the face key points in a pre-stored standard face image is acquired. A standard face image is an image in which the face is frontal, the head is not rotated, and no correction is needed; the position information (coordinate information) of the face key points in the standard face image has been obtained in advance and stored in a preset database. The position information of the face key points in the first face image to be recognized is compared with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H, which is solved according to the following similarity transformation matrix equation:
$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$
Thereafter, the position information of each pixel in the first face image to be recognized is multiplied by the solved similarity transformation matrix H to obtain the straightened second face image to be recognized. In the above similarity transformation matrix equation, (x, y) denotes the position information of a face key point in the first face image to be recognized, (x', y') denotes the position information of the corresponding face key point in the standard face image, and

$$H = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix}$$

is the similarity transformation matrix, where s denotes the scaling factor, θ denotes the rotation angle (usually a counterclockwise rotation), and (t_x, t_y) denotes the translation parameters. Specifically, the transform.SimilarityTransform function can be used to iteratively solve the similarity transformation matrix H; this function is provided by the Python scikit-image library.
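A minimal sketch of this correction step using scikit-image follows; the key point coordinates are hypothetical placeholders, since the actual standard key points stored in the database are not given in the text.

```python
# A minimal sketch of face correction via a similarity transform,
# assuming scikit-image; all coordinate values are hypothetical.
import numpy as np
from skimage import io, transform as trans

# Five detected key points (x, y) from MTCNN: left eye, right eye,
# nose tip, left mouth corner, right mouth corner.
src = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                [41.5, 92.4], [70.7, 92.2]])
# Pre-stored standard-face key points (hypothetical values).
dst = np.array([[38.0, 52.0], [74.0, 52.0], [56.0, 72.0],
                [42.0, 92.0], [70.0, 92.0]])

tform = trans.SimilarityTransform()
tform.estimate(src, dst)   # least-squares solve for the transform
H = tform.params           # 3x3 homogeneous similarity matrix

image = io.imread("face.jpg")            # first face image to be recognized
aligned = trans.warp(image, tform.inverse)  # apply H to every pixel
```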
S34: Input the second face image to be recognized into a pre-trained facial action unit recognition model, and obtain the facial action unit recognition result of the first face image to be recognized after processing by the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, where the main body network part includes multiple deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network.
S35: Output the facial action unit recognition result of the first face image to be recognized to the terminal.
In the specific embodiments of the present application, the structure of the facial action unit recognition model is shown in FIG. 5. It mainly includes the main body network part, the attention mechanism module, and a final fully connected layer. The input of the model is a color image in RGB format, i.e., the input image depth is 3, and the recognition result of the model consists of the probability values of the occurrence of 39 facial action units. A value greater than or equal to 0.5 indicates that the facial action unit occurs, and a value less than 0.5 indicates that it does not. For example, if the output value of AU45 (blink) is 0.8 and that of AU04 (frown) is 0.3, the face in the input image exhibits AU45 but not AU04.
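A minimal sketch of this 0.5 decision rule, reusing the two example values from the text (the model in fact outputs all 39 probabilities):

```python
# Hypothetical sigmoid outputs for two of the 39 AUs, as in the example.
probs = {"AU45": 0.8, "AU04": 0.3}
present = {au: p >= 0.5 for au, p in probs.items()}
print(present)  # {'AU45': True, 'AU04': False}
```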
Specifically, inputting the second face image to be recognized into the pre-trained facial action unit recognition model and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model includes:
inputting the second face image to be recognized into the main body network part of the pre-trained facial action unit recognition model and performing feature extraction through multiple deep residual dense networks to obtain a high-order feature map; performing maximum pooling and average pooling operations on the high-order feature map using the attention mechanism of the facial action unit recognition model to obtain a first feature map and a second feature map whose width and height are the same as those of the high-order feature map and whose depth is 1; concatenating the first feature map and the second feature map in the depth direction and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map; multiplying the width and height of the third feature map with the width and height of the high-order feature map to obtain a target feature map; and taking the target feature map as the input of the fully connected layer, which performs binary classification and finally outputs the facial action unit recognition result of the first face image to be recognized.
The main body network part of the facial action unit recognition model is composed of four deep residual dense networks, with 92 hidden layers in total. As shown in FIG. 6, each deep residual dense network is formed by stacking a deep residual module and a deep dense module. A deep residual dense network starts with a 1*1 convolutional layer, followed by a 3*3 convolutional layer, and splits into two parts after the last 1*1 convolutional layer. One part is connected to the deep residual module by element-wise addition over the width and height dimensions; using the characteristics of the residual network, good features that have already been learned are not forgotten as the network deepens. For example, the width and height dimensions of the features obtained by the second hidden layer are added to the width and height of the features obtained by the fifth hidden layer, while the depth dimension remains unchanged. The other part is connected to the path of the deep dense module; for example, the depth dimension of the features obtained by the second hidden layer is concatenated with the depth of the features obtained by the fifth hidden layer, preserving the diversity of the high-order features. For instance, for two features with depths of 20 and 30, the concatenated feature has a depth of 50, while the width and height remain unchanged. It should be noted that the main body network part adopts a structure combining a deep residual network with a deep dense network, which, compared with the prior art that uses only a deep residual network, is more conducive to preserving the diversity of high-order features and thus to accurately recognizing the 39 facial action units.
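A minimal PyTorch sketch of one such residual-dense block is given below, under the assumption that the addition branch and the concatenation branch both start from the block input; the channel sizes and activation functions are hypothetical, since the text does not specify them.

```python
# A minimal sketch of one deep residual dense block, assuming PyTorch;
# channel counts and ReLU activations are assumptions.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """1*1 conv -> 3*3 conv -> 1*1 conv, then a residual (add) branch
    and a dense (concat-along-depth) branch, as described above."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor):
        y = self.body(x)
        residual = x + y                  # add over width/height, depth unchanged
        dense = torch.cat([x, y], dim=1)  # concat along depth, width/height unchanged
        return residual, dense

x = torch.randn(1, 64, 56, 56)
res, den = ResidualDenseBlock()(x)
print(res.shape, den.shape)  # (1, 64, 56, 56) and (1, 128, 56, 56)
```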
In addition, the role of the attention mechanism module is to assign weights to the high-order features extracted by the main body network part so that these high-order features are recombined. It adopts a combination of maximum pooling, average pooling, and 1*1 convolution. Its input is the output of the main body network part; after maximum pooling and average pooling, two feature maps with the same width and height as the input and a depth of 1 are obtained, namely the first feature map and the second feature map. These two feature maps are concatenated along the depth dimension, and a 1*1 convolution produces the output feature map of the attention mechanism module, i.e., the third feature map. The width and height of this output feature map are multiplied with the corresponding width and height of the input feature map of the attention mechanism module (i.e., the high-order feature map) to obtain the input feature map of the fully connected layer, i.e., the target feature map. The target feature map is input into the fully connected layer for matrix multiplication to obtain the binary classification probability values of the 39 facial action units, and finally the binary classification probability values of the 39 facial action units are output to the terminal to display the facial action unit recognition result of the first image to be recognized. Here, using maximum pooling and average pooling at different scales helps capture feature information at different scales; focusing on obtaining weights over the width and height dimensions makes it clear which positions of the input face carry feature information that is more conducive to recognizing facial action units.
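A minimal PyTorch sketch of this attention module and classification head follows; interpreting the pooling as channel-wise max/mean (so that the result has depth 1) is an assumption, as are the feature map sizes and the sigmoid used to produce per-AU probabilities.

```python
# A minimal sketch of the attention module plus fully connected head,
# assuming PyTorch; tensor sizes and the sigmoid are assumptions.
import torch
import torch.nn as nn

class SpatialAttentionHead(nn.Module):
    def __init__(self, channels: int, spatial: int, num_aus: int = 39):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=1)  # 1*1 conv on the 2-map stack
        self.fc = nn.Linear(channels * spatial * spatial, num_aus)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        max_map = feat.max(dim=1, keepdim=True).values  # first feature map, depth 1
        avg_map = feat.mean(dim=1, keepdim=True)        # second feature map, depth 1
        third = self.conv(torch.cat([max_map, avg_map], dim=1))  # third feature map
        target = feat * third                 # reweight over width/height
        logits = self.fc(target.flatten(1))   # fully connected binary classification
        return torch.sigmoid(logits)          # probabilities for the 39 AUs

feat = torch.randn(1, 128, 14, 14)  # hypothetical high-order feature map
probs = SpatialAttentionHead(128, 14)(feat)
print(probs.shape)  # torch.Size([1, 39])
```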
In the embodiments of the present application, when the terminal inputs the first face image to be recognized, the position information of the face key points of the first face image to be recognized is first obtained; this position information is used to correct the face in the first face image to be recognized so as to straighten it; and the straightened second face image to be recognized is then input into the facial action unit recognition model composed of the main body network part, the attention mechanism module, and the fully connected layer for recognition. The obtained facial action unit recognition result is more accurate than that of the prior art.
Please refer to FIG. 7, which is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the present application. As shown in FIG. 7, the method includes steps S71-S76:
S71: Acquire the first face image to be recognized uploaded by the terminal.
S72: Perform face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain the position information of the face key points in the first face image to be recognized.
S73: Acquire the pre-stored position information of the face key points in a standard face image from a database.
S74: Perform face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain a second face image to be recognized.
In a possible implementation, performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized includes:
comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H;
solving the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H to obtain the straightened second face image to be recognized.
In this implementation, MTCNN is used for face correction, so that the model can judge accurately even when the face in the first face image to be recognized is rotated by different angles, which guarantees the stability of the model.
S75: Input the second face image to be recognized into a pre-trained facial action unit recognition model, and obtain the facial action unit recognition result of the first face image to be recognized after processing by the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, where the main body network part includes multiple deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network.
In a possible implementation, inputting the second face image to be recognized into the pre-trained facial action unit recognition model and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model includes:
inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map;
performing maximum pooling and average pooling operations on the high-order feature map using the attention mechanism of the facial action unit recognition model to obtain a first feature map and a second feature map; and
obtaining a target feature map according to the first feature map and the second feature map.
Here, obtaining the target feature map according to the first feature map and the second feature map includes:
concatenating the first feature map and the second feature map in the depth direction and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map;
obtaining the target feature map according to the high-order feature map and the third feature map; and
inputting the target feature map into the fully connected layer of the facial action unit recognition model for binary classification, and outputting the facial action unit recognition result of the first face image to be recognized.
Here, inputting the second face image to be recognized into the main body network part for feature extraction to obtain the high-order feature map includes:
inputting the second face image to be recognized into the main body network part and performing feature extraction through multiple deep residual dense networks to obtain the high-order feature map, where each deep residual dense network starts convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is connected to the deep residual network, in which the features output by two hidden layers are added over the width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by two hidden layers are concatenated along the depth while the width and height remain unchanged.
In this implementation, the main body network part of the facial action unit recognition model is formed by stacking deep residual networks and deep dense networks, which guarantees that higher-order features are learned; together with the attention mechanism module of maximum pooling, average pooling, and 1*1 convolution, this helps remove redundant features and improves the recognition accuracy of the 39 facial action units.
S76: Output the facial action unit recognition result of the first face image to be recognized to the terminal.
It should be noted that the specific implementations of the above steps S71-S76 have been described in detail in the embodiment shown in FIG. 3, can achieve the same or similar beneficial effects, and are not repeated here.
The present application also provides a facial action unit recognition apparatus, which may be a computer program (including program code) running in a terminal. The facial action unit recognition apparatus can execute the method shown in FIG. 3 or FIG. 7. Referring to FIG. 8, the apparatus includes:
an image acquisition module 81, configured to acquire the first face image to be recognized uploaded by the terminal;
a face detection module 82, configured to perform face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain the position information of the face key points in the first face image to be recognized;
a face correction module 83, configured to perform face correction on the first face image to be recognized using the position information of the face key points to obtain a second face image to be recognized;
a facial action unit recognition module 84, configured to input the second face image to be recognized into a pre-trained facial action unit recognition model and obtain the facial action unit recognition result of the first face image to be recognized after processing by the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, where the main body network part includes multiple deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network; and
a recognition result output module 85, configured to output the facial action unit recognition result of the first face image to be recognized to the terminal.
In one embodiment, in terms of performing face correction on the first face image to be recognized using the position information of the face key points to obtain the second face image to be recognized, the face correction module 83 is specifically configured to:
acquire the pre-stored position information of the face key points in a standard face image from a database; and
perform face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized.
In one embodiment, in terms of performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized, the face correction module 83 is specifically configured to:
compare the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H;
solve the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
multiply the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H to obtain the straightened second face image to be recognized.
In one embodiment, in terms of inputting the second face image to be recognized into the pre-trained facial action unit recognition model and obtaining the facial action unit recognition result of the first face image to be recognized through the processing of the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, the facial action unit recognition module 84 is specifically configured to:
input the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map;
perform maximum pooling and average pooling operations on the high-order feature map using the attention mechanism to obtain a first feature map and a second feature map whose width and height are the same as those of the high-order feature map and whose depth is 1; and
obtain a target feature map according to the first feature map and the second feature map, input the target feature map into the fully connected layer for binary classification, and obtain the facial action unit recognition result of the first face image to be recognized.
In one embodiment, in terms of obtaining the target feature map according to the first feature map and the second feature map, the facial action unit recognition module 84 is specifically configured to:
concatenate the first feature map and the second feature map in the depth direction and perform a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
multiply the width and height of the third feature map with the corresponding width and height of the high-order feature map to obtain the target feature map.
In one embodiment, in terms of inputting the second face image to be recognized into the main body network part for feature extraction to obtain the high-order feature map, the facial action unit recognition module 84 is specifically configured to:
input the second face image to be recognized into the main body network part and perform feature extraction through multiple deep residual dense networks to obtain the high-order feature map, where each deep residual dense network starts convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is connected to the deep residual network, in which the features output by two hidden layers are added over the width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by two hidden layers are concatenated along the depth while the width and height remain unchanged.
According to an embodiment of the present application, the modules of the facial action unit recognition apparatus shown in FIG. 8 can be separately or wholly combined into one or several additional units, or one or more of the modules can be further split into multiple functionally smaller units; this achieves the same operations without affecting the realization of the technical effects of the embodiments of the present application. The above units are divided based on logical functions; in practical applications, the function of one unit can also be realized by multiple units, or the functions of multiple units can be realized by one unit. In other embodiments of the present application, the facial action unit recognition apparatus can also include other units; in practical applications, these functions can also be realized with the assistance of other units, and can be realized by multiple units in cooperation.
According to another embodiment of the present application, the facial action unit recognition apparatus shown in FIG. 8 can be constructed, and the facial action unit recognition method of the embodiments of the present application can be implemented, by running a computer program (including program code) capable of executing the steps involved in the corresponding method shown in FIG. 3 or FIG. 7 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above computing device through the computer-readable recording medium, and run therein.
Based on the descriptions of the above method and apparatus embodiments, please refer to FIG. 9, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 9, the electronic device includes at least a processor 901, an input device 902, an output device 903, and a computer-readable storage medium 904, which can be connected by a bus or in other ways.
The computer-readable storage medium 904 can be stored in the memory of the electronic device and is used to store a computer program that includes program instructions; the processor 901 is used to execute the program instructions stored in the computer-readable storage medium 904. The processor 901 (or CPU (Central Processing Unit)) is the computing core and control core of the electronic device; it is suitable for implementing one or more instructions, and specifically for loading and executing one or more instructions to realize the corresponding method flow or corresponding function.
In one embodiment, the processor 901 of the electronic device provided in the embodiments of the present application can be used to perform a series of facial action unit recognition processes on the acquired face image:
acquiring the first face image to be recognized uploaded by the terminal;
performing face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain the position information of the face key points in the first face image to be recognized;
performing face correction on the first face image to be recognized using the position information of the face key points to obtain a second face image to be recognized;
inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized after processing by the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, where the main body network part includes multiple deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network; and
outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
In a possible implementation, when the processor 901 performs face correction on the first face image to be recognized using the position information of the face key points to obtain the second face image to be recognized, the processing includes:
acquiring the pre-stored position information of the face key points in a standard face image from a database; and
performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized.
In a possible implementation, when the processor 901 performs face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized, the processing includes:
comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H;
solving the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H to obtain the straightened second face image to be recognized.
In a possible implementation, when the processor 901 inputs the second face image to be recognized into the pre-trained facial action unit recognition model and obtains the facial action unit recognition result of the first face image to be recognized through the processing of the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, the processing includes:
inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map;
performing maximum pooling and average pooling operations on the high-order feature map using the attention mechanism to obtain a first feature map and a second feature map whose width and height are the same as those of the high-order feature map and whose depth is 1; and
obtaining a target feature map according to the first feature map and the second feature map, inputting the target feature map into the fully connected layer for binary classification, and obtaining the facial action unit recognition result of the first face image to be recognized.
In a possible implementation, when the processor 901 obtains the target feature map according to the first feature map and the second feature map, the processing includes:
concatenating the first feature map and the second feature map in the depth direction and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
multiplying the width and height of the third feature map with the corresponding width and height of the high-order feature map to obtain the target feature map.
In a possible implementation, when the processor 901 inputs the second face image to be recognized into the main body network part for feature extraction to obtain the high-order feature map, the processing includes:
inputting the second face image to be recognized into the main body network part and performing feature extraction through multiple deep residual dense networks to obtain the high-order feature map, where each deep residual dense network starts convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is connected to the deep residual network, in which the features output by two hidden layers are added over the width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by two hidden layers are concatenated along the depth while the width and height remain unchanged.
Exemplarily, the above electronic device can be a server, a computer host, a cloud server, or a similar device. The electronic device can include, but is not limited to, the processor 901, the input device 902, the output device 903, and the computer-readable storage medium 904. Those skilled in the art can understand that the schematic diagram is only an example of the electronic device and does not constitute a limitation on it; the device can include more or fewer components than shown, a combination of certain components, or different components.
It should be noted that since the processor 901 of the electronic device implements the steps in the above facial action unit recognition method when executing the computer program, the embodiments of the facial action unit recognition method are all applicable to the electronic device and can achieve the same or similar beneficial effects.
The embodiments of the present application also provide a computer-readable storage medium (memory), which is a memory device in an electronic device and is used to store programs and data. It can be understood that the computer-readable storage medium here can include both a built-in storage medium in the terminal and an extended storage medium supported by the terminal. The computer-readable storage medium provides storage space that stores the operating system of the terminal. In addition, one or more instructions suitable for being loaded and executed by the processor 901 are stored in the storage space; these instructions can be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here can be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory; optionally, it can also be at least one computer-readable storage medium located away from the aforementioned processor 901. In one embodiment, one or more instructions stored in the computer-readable storage medium can be loaded and executed by the processor 901 to implement the corresponding steps of the above facial action unit recognition method. In a specific implementation, one or more instructions in the computer-readable storage medium are loaded by the processor 901 to execute the following steps:
acquiring the first face image to be recognized uploaded by the terminal;
performing face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain the position information of the face key points in the first face image to be recognized;
performing face correction on the first face image to be recognized using the position information of the face key points to obtain a second face image to be recognized;
inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized after processing by the main body network part, attention mechanism, and fully connected layer of the facial action unit recognition model, where the main body network part includes multiple deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network; and
outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
In one example, when the one or more instructions in the computer-readable storage medium are loaded by the processor 901, the following steps are also executed:
acquiring the pre-stored position information of the face key points in a standard face image from a database; and
performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized.
In one example, when the one or more instructions in the computer-readable storage medium are loaded by the processor 901, the following steps are also executed:
comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image to obtain a similarity transformation matrix H;
solving the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H to obtain the straightened second face image to be recognized.
In one example, when the one or more instructions in the computer-readable storage medium are loaded by the processor 901, the following steps are also executed:
inputting the second face image to be recognized into the main body network part for feature extraction to obtain a high-order feature map;
performing maximum pooling and average pooling operations on the high-order feature map using the attention mechanism to obtain a first feature map and a second feature map whose width and height are the same as those of the high-order feature map and whose depth is 1; and
obtaining a target feature map according to the first feature map and the second feature map, inputting the target feature map into the fully connected layer for binary classification, and obtaining the facial action unit recognition result of the first face image to be recognized.
In one example, when the one or more instructions in the computer-readable storage medium are loaded by the processor 901, the following steps are also executed:
concatenating the first feature map and the second feature map in the depth direction and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
multiplying the width and height of the third feature map with the corresponding width and height of the high-order feature map to obtain the target feature map.
In one example, when the one or more instructions in the computer-readable storage medium are loaded by the processor 901, the following steps are also executed:
inputting the second face image to be recognized into the main body network part and performing feature extraction through multiple deep residual dense networks to obtain the high-order feature map, where each deep residual dense network starts convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is connected to the deep residual network, in which the features output by two hidden layers are added over the width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by two hidden layers are concatenated along the depth while the width and height remain unchanged.
Exemplarily, the computer program of the computer-readable storage medium includes computer program code, which can be in the form of source code, object code, an executable file, some intermediate form, or the like; the computer-readable storage medium can be non-volatile or volatile. The computer-readable storage medium can include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that since the computer program of the computer-readable storage medium implements the steps in the above facial action unit recognition method when executed by the processor, all the embodiments of the facial action unit recognition method are applicable to the computer-readable storage medium and can achieve the same or similar beneficial effects.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium can be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is only part of the embodiments of the present application and of course cannot be used to limit the scope of rights of the present application. A person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made in accordance with the claims of the present application still fall within the scope covered by the present application.

Claims (20)

  1. A facial action unit recognition method, wherein the method comprises:
    acquiring a first face image to be recognized uploaded by a terminal;
    performing face detection on the first face image to be recognized using a pre-trained convolutional neural network model to obtain position information of face key points in the first face image to be recognized;
    performing face correction on the first face image to be recognized using the position information of the face key points to obtain a second face image to be recognized;
    inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining a facial action unit recognition result of the first face image to be recognized after processing by a main body network part, an attention mechanism, and a fully connected layer of the facial action unit recognition model, wherein the main body network part comprises a plurality of deep residual dense networks, each of which is formed by stacking a deep residual network and a deep dense network; and
    outputting the facial action unit recognition result of the first face image to be recognized to the terminal.
  2. The method according to claim 1, wherein performing face correction on the first face image to be recognized using the position information of the face key points to obtain the second face image to be recognized comprises:
    acquiring pre-stored position information of face key points in a standard face image from a database; and
    performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized.
  3. The method according to claim 2, wherein the performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized, comprises:
    comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image, to obtain a similarity transformation matrix H;
    solving the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
    multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
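As a concrete reading of claims 2 and 3, the minimal sketch below estimates the similarity transformation matrix H from the detected key points and the pre-stored standard key points, then applies it to every pixel by warping the image. The application does not spell out the preset similarity transformation matrix equation; a least-squares similarity fit via OpenCV's estimateAffinePartial2D is assumed here, and all names are placeholders.

import cv2
import numpy as np

def align_face(image, keypoints, standard_keypoints):
    # keypoints, standard_keypoints: float32 arrays of shape (N, 2).
    # Least-squares similarity fit (rotation + uniform scale + translation);
    # this stands in for the claim's "preset similarity transformation matrix equation".
    H, _ = cv2.estimateAffinePartial2D(np.float32(keypoints), np.float32(standard_keypoints))
    h, w = image.shape[:2]
    # Multiplying each pixel position by H is equivalent to warping the image with H.
    return cv2.warpAffine(image, H, (w, h))

Because a similarity transform only rotates, uniformly scales, and translates, the straightened face keeps its proportions, which is what makes the corrected image suitable for the downstream recognition model.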
  4. The method according to any one of claims 1-3, wherein the inputting the second face image to be recognized into a pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized after processing by the main network part, the attention mechanism, and the fully connected layer of the facial action unit recognition model, comprises:
    inputting the second face image to be recognized into the main network part for feature extraction, to obtain a high-level feature map;
    performing max pooling and average pooling operations on the high-level feature map by using the attention mechanism, to obtain a first feature map and a second feature map that have the same width and height as the high-level feature map and a depth of 1; and
    obtaining a target feature map according to the first feature map and the second feature map, and inputting the target feature map into the fully connected layer for binary classification, to obtain the facial action unit recognition result of the first face image to be recognized.
  5. The method according to claim 4, wherein the obtaining a target feature map according to the first feature map and the second feature map comprises:
    concatenating the first feature map and the second feature map in the depth direction, and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
    multiplying the third feature map by the high-level feature map correspondingly in width and height, to obtain the target feature map.
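A minimal PyTorch sketch of the attention step of claims 4 and 5, assuming an (N, C, H, W) tensor layout: depth-wise max and mean pooling give two depth-1 maps, which are concatenated in depth, passed through a 1*1 convolution, and multiplied element-wise onto the high-level feature map. The sigmoid squashing is an added assumption, not stated in the claims.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=1)  # 1*1 convolution over the 2-deep stack

    def forward(self, feat):                              # feat: (N, C, H, W) high-level map
        first = feat.max(dim=1, keepdim=True).values      # max pooling map, depth 1
        second = feat.mean(dim=1, keepdim=True)           # average pooling map, depth 1
        third = torch.sigmoid(self.conv(torch.cat([first, second], dim=1)))  # (N, 1, H, W)
        return feat * third                               # target feature map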
  6. The method according to claim 4, wherein the inputting the second face image to be recognized into the main network part for feature extraction, to obtain a high-level feature map, comprises:
    inputting the second face image to be recognized into the main network part, and performing feature extraction through the plurality of deep residual dense networks, to obtain the high-level feature map; wherein each deep residual dense network starts its convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is fed into the deep residual network, in which the features output by the two hidden layers are added in width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by the two hidden layers are concatenated in depth while the width and height remain unchanged.
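One plausible reading of the deep residual dense network of claim 6, again in PyTorch: a 1*1 -> 3*3 -> 1*1 bottleneck feeds a residual path (element-wise addition, depth unchanged) and a dense path (depth-wise concatenation, width and height unchanged). The claim does not say how the two paths are merged afterwards; fusing them with a further 1*1 convolution is an assumption of this sketch, as are all channel counts.

import torch
import torch.nn as nn

class DeepResidualDenseBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bottleneck = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )
        # Hypothetical fusion: residual output (C channels) + dense output (2C) -> C.
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        f = self.bottleneck(x)
        residual = x + f                   # residual path: add, depth unchanged
        dense = torch.cat([x, f], dim=1)   # dense path: concatenate in depth
        return self.fuse(torch.cat([residual, dense], dim=1))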
  7. A facial action unit recognition apparatus, wherein the apparatus comprises:
    an image acquisition module, configured to acquire a first face image to be recognized uploaded by a terminal;
    a face detection module, configured to perform face detection on the first face image to be recognized by using a pre-trained convolutional neural network model, to obtain position information of face key points in the first face image to be recognized;
    a face correction module, configured to perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized;
    a facial action unit recognition module, configured to input the second face image to be recognized into a pre-trained facial action unit recognition model, and obtain a facial action unit recognition result of the first face image to be recognized after processing by a main network part, an attention mechanism, and a fully connected layer of the facial action unit recognition model, wherein the main network part comprises a plurality of deep residual dense networks, and each deep residual dense network is formed by stacking a deep residual network and a deep dense network; and
    a recognition result output module, configured to output the facial action unit recognition result of the first face image to be recognized to the terminal.
  8. The apparatus according to claim 7, wherein, in performing face correction on the first face image to be recognized by using the position information of the face key points to obtain a second face image to be recognized, the face correction module is specifically configured to:
    obtain, from a database, position information of face key points in a pre-stored standard face image; and
    perform face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized.
  9. The apparatus according to claim 8, wherein, in performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image to obtain the second face image to be recognized, the face correction module is specifically configured to:
    compare the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image, to obtain a similarity transformation matrix H;
    solve the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
    multiply the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
  10. The apparatus according to any one of claims 7-9, wherein, in inputting the second face image to be recognized into the pre-trained facial action unit recognition model and obtaining the facial action unit recognition result of the first face image to be recognized after processing by the main network part, the attention mechanism, and the fully connected layer of the facial action unit recognition model, the facial action unit recognition module is specifically configured to:
    input the second face image to be recognized into the main network part for feature extraction, to obtain a high-level feature map;
    perform max pooling and average pooling operations on the high-level feature map by using the attention mechanism, to obtain a first feature map and a second feature map that have the same width and height as the high-level feature map and a depth of 1; and
    obtain a target feature map according to the first feature map and the second feature map, and input the target feature map into the fully connected layer for binary classification, to obtain the facial action unit recognition result of the first face image to be recognized.
  11. The apparatus according to claim 10, wherein, in obtaining the target feature map according to the first feature map and the second feature map, the facial action unit recognition module is specifically configured to:
    concatenate the first feature map and the second feature map in the depth direction, and perform a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
    multiply the third feature map by the high-level feature map correspondingly in width and height, to obtain the target feature map.
  12. The apparatus according to claim 10, wherein, in inputting the second face image to be recognized into the main network part for feature extraction to obtain the high-level feature map, the facial action unit recognition module is specifically configured to:
    input the second face image to be recognized into the main network part, and perform feature extraction through the plurality of deep residual dense networks, to obtain the high-level feature map; wherein each deep residual dense network starts its convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is fed into the deep residual network, in which the features output by the two hidden layers are added in width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by the two hidden layers are concatenated in depth while the width and height remain unchanged.
  13. An electronic device, comprising an input device and an output device, and further comprising:
    a processor, adapted to implement one or more instructions; and
    a computer-readable storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by the processor and executed to: acquire a first face image to be recognized uploaded by a terminal; perform face detection on the first face image to be recognized by using a pre-trained convolutional neural network model, to obtain position information of face key points in the first face image to be recognized; perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized; input the second face image to be recognized into a pre-trained facial action unit recognition model, and obtain a facial action unit recognition result of the first face image to be recognized after processing by a main network part, an attention mechanism, and a fully connected layer of the facial action unit recognition model, wherein the main network part comprises a plurality of deep residual dense networks, and each deep residual dense network is formed by stacking a deep residual network and a deep dense network; and output the facial action unit recognition result of the first face image to be recognized to the terminal.
  14. The electronic device according to claim 13, wherein the processor performing the face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized, comprises:
    obtaining, from a database, position information of face key points in a pre-stored standard face image; and
    performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized.
  15. The electronic device according to claim 14, wherein the processor performing the face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized, comprises:
    comparing the position information of the face key points in the first face image to be recognized with the position information of the face key points in the standard face image, to obtain a similarity transformation matrix H;
    solving the similarity transformation matrix H according to a preset similarity transformation matrix equation; and
    multiplying the position information of each pixel in the first face image to be recognized by the solved similarity transformation matrix H, to obtain the straightened second face image to be recognized.
  16. The electronic device according to any one of claims 13-15, wherein the processor performing the inputting of the second face image to be recognized into the pre-trained facial action unit recognition model, and obtaining the facial action unit recognition result of the first face image to be recognized after processing by the main network part, the attention mechanism, and the fully connected layer of the facial action unit recognition model, comprises:
    inputting the second face image to be recognized into the main network part for feature extraction, to obtain a high-level feature map;
    performing max pooling and average pooling operations on the high-level feature map by using the attention mechanism, to obtain a first feature map and a second feature map that have the same width and height as the high-level feature map and a depth of 1; and
    obtaining a target feature map according to the first feature map and the second feature map, and inputting the target feature map into the fully connected layer for binary classification, to obtain the facial action unit recognition result of the first face image to be recognized.
  17. The electronic device according to claim 16, wherein the processor performing the obtaining of the target feature map according to the first feature map and the second feature map comprises:
    concatenating the first feature map and the second feature map in the depth direction, and performing a 1*1 convolution on the concatenated feature map to obtain a third feature map; and
    multiplying the third feature map by the high-level feature map correspondingly in width and height, to obtain the target feature map.
  18. The electronic device according to claim 16, wherein the processor performing the inputting of the second face image to be recognized into the main network part for feature extraction, to obtain the high-level feature map, comprises:
    inputting the second face image to be recognized into the main network part, and performing feature extraction through the plurality of deep residual dense networks, to obtain the high-level feature map; wherein each deep residual dense network starts its convolution processing with a 1*1 convolutional layer, followed by a 3*3 convolutional layer and then another 1*1 convolutional layer, after which the processing splits into two parts: one part is fed into the deep residual network, in which the features output by the two hidden layers are added in width and height while the depth remains unchanged; the other part is connected to the path of the deep dense network, in which the features output by the two hidden layers are concatenated in depth while the width and height remain unchanged.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more instructions, the one or more instructions being adapted to be loaded by a processor and executed to: acquire a first face image to be recognized uploaded by a terminal; perform face detection on the first face image to be recognized by using a pre-trained convolutional neural network model, to obtain position information of face key points in the first face image to be recognized; perform face correction on the first face image to be recognized by using the position information of the face key points, to obtain a second face image to be recognized; input the second face image to be recognized into a pre-trained facial action unit recognition model, and obtain a facial action unit recognition result of the first face image to be recognized after processing by a main network part, an attention mechanism, and a fully connected layer of the facial action unit recognition model, wherein the main network part comprises a plurality of deep residual dense networks, and each deep residual dense network is formed by stacking a deep residual network and a deep dense network; and output the facial action unit recognition result of the first face image to be recognized to the terminal.
  20. The computer-readable storage medium according to claim 19, wherein the one or more instructions in the computer-readable storage medium, when loaded by the processor, further perform the following steps: obtaining, from a database, position information of face key points in a pre-stored standard face image; and performing face correction on the first face image to be recognized according to the position information of the face key points in the first face image to be recognized and the position information of the face key points in the standard face image, to obtain the second face image to be recognized.
PCT/CN2020/092805 2020-04-03 2020-05-28 Facial action unit recognition method and apparatus, electronic device, and storage medium WO2021196389A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010262740.8A CN111597884A (en) 2020-04-03 2020-04-03 Facial action unit identification method and device, electronic equipment and storage medium
CN2020102627408 2020-04-03

Publications (1)

Publication Number Publication Date
WO2021196389A1

Family

ID=72185476

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092805 WO2021196389A1 (en) 2020-04-03 2020-05-28 Facial action unit recognition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111597884A (en)
WO (1) WO2021196389A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112116355A (en) * 2020-09-18 2020-12-22 支付宝(杭州)信息技术有限公司 Method, system and device for confirming whether payment is finished or not based on willingness recognition
CN113542527B (en) * 2020-11-26 2023-08-18 腾讯科技(深圳)有限公司 Face image transmission method and device, electronic equipment and storage medium
CN112861752B (en) * 2021-02-23 2022-06-14 东北农业大学 DCGAN and RDN-based crop disease identification method and system
CN114821747A (en) * 2022-05-26 2022-07-29 深圳市科荣软件股份有限公司 Method and device for identifying abnormal state of construction site personnel
CN115067945A (en) * 2022-08-22 2022-09-20 深圳市海清视讯科技有限公司 Fatigue detection method, device, equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921061B (en) * 2018-06-20 2022-08-26 腾讯科技(深圳)有限公司 Expression recognition method, device and equipment
CN110059593B (en) * 2019-04-01 2022-09-30 华侨大学 Facial expression recognition method based on feedback convolutional neural network
CN110633665B (en) * 2019-09-05 2023-01-10 卓尔智联(武汉)研究院有限公司 Identification method, device and storage medium
CN110889325B (en) * 2019-10-12 2023-05-23 平安科技(深圳)有限公司 Multitasking facial motion recognition model training and multitasking facial motion recognition method
CN110796643A (en) * 2019-10-18 2020-02-14 四川大学 Rail fastener defect detection method and system
CN110929583A (en) * 2019-10-26 2020-03-27 湖北讯獒信息工程有限公司 High-detection-precision face recognition method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN105654049A (en) * 2015-12-29 2016-06-08 中国科学院深圳先进技术研究院 Facial expression recognition method and device
CN108460343A (en) * 2018-02-06 2018-08-28 北京达佳互联信息技术有限公司 Image processing method, system and server
CN110263673A (en) * 2019-05-31 2019-09-20 合肥工业大学 Human facial expression recognition method, apparatus, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI ZHENG; ZHANG TONG; ZHU GUOTAO; WANG XIN; WANG WEI: "A Super-resolution Image Reconstruction Method Based on Deep Learning", vol. 28, no. 6, 30 November 2019 (2019-11-30), pages 59-63, XP055855544, ISSN: 1672-7304, DOI: 10.3969/j.issn.1672-7304.2019.06.0011 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049250A (en) * 2022-01-13 2022-02-15 广州卓腾科技有限公司 Method, device and medium for correcting face pose of certificate photo
CN114596624A (en) * 2022-04-20 2022-06-07 深圳市海清视讯科技有限公司 Human eye state detection method and device, electronic equipment and storage medium
CN114842542A (en) * 2022-05-31 2022-08-02 中国矿业大学 Facial action unit identification method and device based on self-adaptive attention and space-time correlation
CN116486464A (en) * 2023-06-20 2023-07-25 齐鲁工业大学(山东省科学院) Attention mechanism-based face counterfeiting detection method for convolution countermeasure network
CN116486464B (en) * 2023-06-20 2023-09-01 齐鲁工业大学(山东省科学院) Attention mechanism-based face counterfeiting detection method for convolution countermeasure network

Also Published As

Publication number Publication date
CN111597884A (en) 2020-08-28

Similar Documents

Publication Publication Date Title
WO2021196389A1 (en) Facial action unit recognition method and apparatus, electronic device, and storage medium
CN110532984B (en) Key point detection method, gesture recognition method, device and system
Han et al. Two-stage learning to predict human eye fixations via SDAEs
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
TWI754887B (en) Method, device and electronic equipment for living detection and storage medium thereof
WO2021139324A1 (en) Image recognition method and apparatus, computer-readable storage medium and electronic device
CN107545241A (en) Neural network model is trained and biopsy method, device and storage medium
US20220237812A1 (en) Item display method, apparatus, and device, and storage medium
Wang et al. Dynamic attention guided multi-trajectory analysis for single object tracking
CN106874826A (en) Face key point-tracking method and device
CN110163111A (en) Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN110232318A (en) Acupuncture point recognition methods, device, electronic equipment and storage medium
CN111222433B (en) Automatic face auditing method, system, equipment and readable storage medium
WO2021047587A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
WO2021218238A1 (en) Image processing method and image processing apparatus
WO2022188697A1 (en) Biological feature extraction method and apparatus, device, medium, and program product
CN110222572A (en) Tracking, device, electronic equipment and storage medium
CN111127309A (en) Portrait style transfer model training method, portrait style transfer method and device
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
Baggio et al. Mastering OpenCV 3
Yu Emotion monitoring for preschool children based on face recognition and emotion recognition algorithms
CN110163095B (en) Loop detection method, loop detection device and terminal equipment
WO2021217919A1 (en) Facial action unit recognition method and apparatus, and electronic device, and storage medium
CN111353325A (en) Key point detection model training method and device
Lu et al. Cost-effective real-time recognition for human emotion-age-gender using deep learning with normalized facial cropping preprocess

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20929378
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20929378
    Country of ref document: EP
    Kind code of ref document: A1