WO2021217919A1 - Facial action unit recognition method and apparatus, and electronic device, and storage medium

Facial action unit recognition method and apparatus, and electronic device, and storage medium

Info

Publication number
WO2021217919A1
Authority
WO
WIPO (PCT)
Prior art keywords
action unit
face
feature
target
sub
Application number
PCT/CN2020/104042
Other languages
French (fr)
Chinese (zh)
Inventor
胡艺飞
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021217919A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and storage medium for recognizing face action units.
  • With the development of computer vision technology in artificial intelligence, facial action units have shown great potential for exploration in the field of human-computer interaction, attracting the attention of more and more enterprises and researchers.
  • The recognition of facial action units is the basis of facial expression analysis, emotion analysis, and deeper behavioral analysis such as whether a subject is lying or committing fraud, and usually needs to be implemented by building a neural network model on an annotated face image data set.
  • To improve recognition accuracy, existing facial action unit recognition models adopt relatively complex network structures, and the trained models are generally too large; they are therefore not suitable for mobile devices. Even if such a model can be deployed on a mobile device, the inventor realized that, because the performance of a mobile device's processor is far below that of a server's, a single run of the model consumes a great deal of time, which makes facial action unit recognition inefficient.
  • The embodiments of the present application provide a facial action unit recognition method, device, electronic device, and storage medium, which help improve the efficiency of facial action unit recognition in face images.
  • In a first aspect, an embodiment of the present application provides a facial action unit recognition method, the method including:
  • obtaining a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
  • performing feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain a first-target-category facial action unit sub-feature, a second-target-category facial action unit sub-feature, and a third-target-category facial action unit sub-feature;
  • inputting the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first-target-category sub-feature, a second output feature of the second-target-category sub-feature, and a third output feature of the third-target-category sub-feature;
  • according to the first, second, and third output features, respectively obtaining the recognition result of the first-target-category facial action units, the recognition result of the second-target-category facial action units, and the recognition result of the third-target-category facial action units.
  • In a second aspect, an embodiment of the present application provides a facial action unit recognition device, which includes:
  • a face correction module, configured to obtain a face image to be recognized and perform face correction on it to obtain a target face image to be recognized;
  • a feature extraction module, configured to perform feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • a feature processing module, configured to input the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • a facial action unit classification module, configured to obtain, according to the first, second, and third output features, the recognition results of the first-, second-, and third-target-category facial action units respectively.
  • In a third aspect, an embodiment of the present application provides an electronic device.
  • The electronic device includes a processor, a memory, and a computer program stored on the memory and runnable on the processor; when the processor executes the computer program, the following is implemented:
  • obtaining a face image to be recognized and performing face correction on it to obtain a target face image to be recognized;
  • performing feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • inputting the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • and, according to the first, second, and third output features, respectively obtaining the recognition results of the first-, second-, and third-target-category facial action units.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the following is implemented:
  • obtaining a face image to be recognized and performing face correction on it to obtain a target face image to be recognized;
  • performing feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • inputting the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • and, according to the first, second, and third output features, respectively obtaining the recognition results of the first-, second-, and third-target-category facial action units.
  • In the embodiments of the present application, the backbone network of the facial action unit recognition model uses a stack of separable convolution blocks and inverted residual blocks to extract the sub-features. The separable convolution reduces the model's processing parameters severalfold, the inverted residual block is smaller than a standard residual structure, and the attention mechanism is computed with matrix multiplication, which guarantees the running speed of the facial action unit recognition model. The entire model is therefore lighter in structure and computes quickly, which helps improve the efficiency of facial action unit recognition in face images.
  • FIG. 1 is an example diagram of an application scenario provided by an embodiment of the application;
  • FIG. 2 is a network architecture diagram provided by an embodiment of the application;
  • FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the application;
  • FIG. 4 is a schematic structural diagram of a multi-task convolutional neural network model provided by an embodiment of the application;
  • FIG. 5 is a schematic structural diagram of a facial action unit recognition model provided by an embodiment of the application;
  • FIG. 6 is an example diagram of separable convolution provided by an embodiment of the application;
  • FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the application;
  • FIG. 8 is a schematic structural diagram of a facial action unit recognition device provided by an embodiment of the application;
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • The embodiment of the present application proposes a facial action unit recognition solution, which can be applied to scenarios in which staff handle business for customers or the public, as shown in Figure 1.
  • Staff usually need to use a terminal to collect videos or photos, for example when bank staff handle loan business for customers, insurance company staff handle insurance business for customers, or a government affairs center handles related business for the public. Of course, the scene shown in Figure 1 is only illustrative and does not limit this application.
  • The facial action unit recognition proposed in this application can also be used in many other scenarios, such as facial expression analysis, psychological activity analysis, and interviews.
  • The facial action unit recognition model adopted in this solution uses separable convolution in its convolution processing, which greatly reduces the parameter count of the model, and uses inverted residual modules to extract deeper features; compared with a standard residual module, the inverted residual module is lighter.
  • The operations in the model's backbone network and in its attention mechanism use matrix multiplication.
  • The overall design keeps the model under 7 MB in size while still recognizing 39 facial action units, so it runs faster and more efficiently and can be deployed not only on the server side but also on mobile terminals.
  • The facial action unit recognition solution can be implemented based on the network architecture shown in FIG. 2.
  • The network architecture includes at least a terminal and a server.
  • The terminal and the server communicate through a network, which includes but is not limited to a virtual private network, a local area network, or a metropolitan area network. The terminal is mainly used for capturing and uploading face images and for displaying the final recognition result.
  • The terminal can be a mobile phone, a tablet, a notebook computer, a handheld computer, or a similar device.
  • After obtaining the face image sent by the terminal, the server performs a series of facial action unit recognition operations and finally outputs the recognition results to the terminal.
  • The server, which can be a single server, a server cluster, or a cloud server, is the execution body of the entire facial action unit recognition scheme.
  • The execution body may also be the terminal, in which case related models or algorithms such as face detection and face correction are also deployed on the terminal.
  • FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the application, applied to a server; as shown in FIG. 3, the method includes steps S31-S34:
  • S31: Obtain a face image to be recognized, and perform face correction on the face image to be recognized to obtain a target face image to be recognized.
  • The face image to be recognized is collected by the terminal and uploaded to the server in real time. It may come from a short video or be a single picture, which is not limited here.
  • After the server obtains the image to be recognized, it first inputs it into a pre-trained multi-task convolutional neural network model for face detection and face key point positioning.
  • The multi-task convolutional neural network model is composed of three sub-networks: P-Net, R-Net, and O-Net.
  • The input size (i.e., width, height, and depth) of P-Net is 12*12*3.
  • The input size of R-Net is 24*24*3.
  • The input size of O-Net is 48*48*3, and O-Net is followed by a 256-channel fully connected layer.
  • The face image to be recognized is first input to P-Net for processing.
  • The output of P-Net serves as the input of R-Net, and the output of R-Net serves as the input of O-Net, forming a cascaded structure.
  • Each sub-network uses 3*3 or 2*2 convolutions and 3*3 or 2*2 pooling for processing, and finally a face classifier gives the confidence that a region is a face.
  • Bounding box regression and a key point locator are then used to calibrate the face region and locate the face key points.
  • The face key points are the five key points of the face in the face image to be recognized: the two eyes, the nose, and the left and right corners of the mouth; locating them yields the coordinate information of the five key points.
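  • As a hedged illustration of the cascade described above, the following Python sketch uses the open-source mtcnn package, an independent implementation of the same P-Net/R-Net/O-Net design rather than the patent's own model; the file name face.jpg is a placeholder.

```python
import cv2
from mtcnn import MTCNN  # pip install mtcnn

detector = MTCNN()  # builds the P-Net -> R-Net -> O-Net cascade internally

image = cv2.cvtColor(cv2.imread("face.jpg"), cv2.COLOR_BGR2RGB)
for face in detector.detect_faces(image):
    box = face["box"]                # [x, y, width, height] of the face region
    confidence = face["confidence"]  # face classifier confidence
    keypoints = face["keypoints"]    # dict: left_eye, right_eye, nose,
                                     # mouth_left, mouth_right -> (x, y)
```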
  • The pre-stored coordinate information of the face key points of a standard face image is obtained from a database.
  • The standard face image is a face image in which the face has no rotation and needs no correction. The coordinate information of the five face key points in the face image to be recognized is compared with the coordinate information of the face key points in the standard face image to obtain a similarity transformation matrix T, which is solved according to the following similarity transformation equation:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

  • In the similarity transformation equation, (x, y) represents the coordinates of a face key point in the face image to be recognized, (x', y') represents the coordinates of the corresponding face key point in the standard face image, s represents the scaling factor, θ represents the rotation angle (usually a counterclockwise rotation), and (t_x, t_y) represents the translation parameters; together these determine the similarity transformation matrix T.
  • The coordinate information of the five face key points in the face image to be recognized is then multiplied by the similarity transformation matrix T to obtain the target face image to be recognized; that is, the correction of the face in the face image to be recognized is completed.
  • The transform.SimilarityTransform function can be used to solve for the similarity transformation matrix T.
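  • A minimal sketch of this correction step, assuming scikit-image's transform.SimilarityTransform (the function the text names) and OpenCV for warping; the key point coordinates and the 112*112 output size are illustrative assumptions, not values from the patent.

```python
import numpy as np
import cv2
from skimage import transform

# Five key points (x, y) detected in the face image to be recognized;
# placeholder values standing in for a real detection result.
src = np.array([[38.0, 52.0], [74.0, 51.0], [56.0, 72.0],
                [41.0, 92.0], [71.0, 91.0]])
# Pre-stored key points of the standard (upright) face image; an assumed
# 112x112 alignment template, not the patent's stored coordinates.
dst = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                [41.5, 92.4], [70.7, 92.2]])

tform = transform.SimilarityTransform()
tform.estimate(src, dst)              # solves for s, theta, (t_x, t_y)
T = tform.params                      # 3x3 similarity transformation matrix

image = cv2.imread("face.jpg")        # placeholder input image
aligned = cv2.warpAffine(image, T[:2, :], (112, 112))  # corrected target face
```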
  • To improve the processing efficiency of the model, a more lightweight convolutional neural network is used.
  • Its specific structure is shown in Figure 5.
  • The backbone network part of the facial action unit recognition model is a stack of 7 separable convolution blocks and inverted residual modules, 17 layers in total, and is mainly used for feature extraction on the input target face image to be recognized.
  • The convolution kernels of all standard convolutional layers in the facial action unit recognition model are replaced with separable convolutions.
  • Suppose the input feature map size is d*d*m (where d is the width and height of the feature map and m is the number of channels),
  • the output feature map is d*d*n,
  • and the convolution kernel size is k*k.
  • The computational complexity of a standard convolution is then d*d*m*n*k*k,
  • while the computational complexity of a separable convolution is d*d*m*(k*k+n).
  • For example, a 10*10*3 feature map is first processed channel by channel by the depthwise convolution to produce a 10*10*3 feature map, which is then convolved with a 1*1*3 convolution kernel to obtain a 10*10*1 feature map.
  • The inverted residual module is built on top of separable convolution: the depth of the feature map is first expanded and then compressed, following an "expansion-convolution-compression" processing pattern, in order to extract deeper features.
  • Compared with a standard residual module, the inverted residual module has a smaller structure, which is more conducive to improving the computational efficiency of the model.
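  • A minimal PyTorch sketch of the two building blocks named above; the layer widths, expansion factor, and activations are illustrative assumptions, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class SeparableConv(nn.Module):
    """Depthwise k*k convolution followed by a 1*1 pointwise convolution.

    For a d*d*m input and d*d*n output, the cost is d*d*m*k*k (depthwise)
    plus d*d*m*n (pointwise), i.e. d*d*m*(k*k + n), versus d*d*m*n*k*k for
    a standard convolution, matching the complexity given in the text.
    """
    def __init__(self, m, n, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(m, m, k, padding=k // 2, groups=m)
        self.pointwise = nn.Conv2d(m, n, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class InvertedResidual(nn.Module):
    """'Expansion-convolution-compression' block built on separable convolution."""
    def __init__(self, channels, expand=4, k=3):
        super().__init__()
        hidden = channels * expand
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.ReLU6(),              # expansion
            nn.Conv2d(hidden, hidden, k, padding=k // 2, groups=hidden),
            nn.ReLU6(),                                              # depthwise conv
            nn.Conv2d(hidden, channels, 1),                          # compression
        )

    def forward(self, x):
        return x + self.block(x)  # residual connection around the thin ends

x = torch.randn(1, 3, 112, 112)
y = InvertedResidual(16)(SeparableConv(3, 16)(x))
print(y.shape)  # torch.Size([1, 16, 112, 112])
```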
  • The first target category of facial action units is the pre-divided facial action units around the eyes,
  • the second target category is the facial action units of the face and nose,
  • and the third target category is the mouth-related facial action units.
  • The data set used to train the facial action unit recognition model is an annotated data set in which 39 facial action units are divided into these 3 categories: the area around the eyes, the face and nose, and the mouth.
  • Changes in facial action units around the eyes are generally slight tightening or stretching of the skin, changes around the nose are generally folds, and changes around the mouth are generally bulging of the skin caused by the lips or tongue, and so on.
  • For example, AU45 (blinking) belongs to the eye-area category,
  • AU18 belongs to the mouth category,
  • and AU04 (frowning) belongs to the eye-area category.
  • Through training, the facial action unit recognition model learns to extract sub-features for each of the above three categories of facial action units;
  • that is, after processing by the separable convolution blocks and inverted residual blocks, the model outputs the first-target-category facial action unit sub-feature, the second-target-category facial action unit sub-feature, and the third-target-category facial action unit sub-feature.
  • The first output feature is the feature map output after the first-target-category facial action unit sub-feature is processed by the convolution in the attention mechanism module; the second and third output features are obtained in the same way.
  • After the backbone network, the facial action unit recognition model splits into three branches, which respectively process the sub-features of the eye area, the face and nose, and the mouth.
  • An attention mechanism module is added to each branch, and each attention mechanism module is composed of three layers of 1*1 convolution. The first-, second-, and third-target-category facial action unit sub-features each undergo three 1*1 convolutions to obtain the output feature of the corresponding sub-feature.
  • The three consecutive 1*1 convolution layers of the attention mechanism module in each branch learn two-dimensional weights, which make clear which positions of the input face carry feature information more conducive to recognizing facial action units.
  • The attention mechanism module is computed with matrix multiplication, which guarantees the model's running speed and strengthens its ability to extract high-level features of facial action units.
  • The output feature is used as a weight: its width and height are multiplied element-wise with the corresponding width and height of the sub-feature it was computed from, so that the useful features of each target category receive more attention. That is, the width and height of the first output feature are multiplied with the width and height of the first-target-category facial action unit sub-feature, and the same operation is performed with the second and third output features, yielding the first feature to be classified of the first-target-category facial action units, the second feature to be classified, and the third feature to be classified.
  • The feature to be classified of each category of facial action unit is the input feature of the fully connected layer.
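  • A minimal PyTorch sketch of one branch's attention module as described above: three stacked 1*1 convolutions produce a two-dimensional weight map that is multiplied element-wise with the branch's sub-feature. The channel sizes and the sigmoid gating are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BranchAttention(nn.Module):
    """Three stacked 1x1 convolutions learn a 2-D spatial weight map that is
    multiplied, across width and height, with the branch's sub-feature."""
    def __init__(self, channels):
        super().__init__()
        self.weights = nn.Sequential(
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, 1, 1), nn.Sigmoid(),   # one weight per position
        )

    def forward(self, sub_feature):
        w = self.weights(sub_feature)       # the branch's "output feature"
        return sub_feature * w              # width/height-wise multiplication

sub = torch.randn(1, 64, 14, 14)            # one target category's sub-feature
to_classify = BranchAttention(64)(sub)      # input to the fully connected layer
print(to_classify.shape)                    # torch.Size([1, 64, 14, 14])
```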
  • The first, second, and third features to be classified are input to the fully connected layer, which performs classification for each branch and finally outputs the recognition result of the first-target-category facial action units, the recognition result of the second-target-category facial action units, and the recognition result of the third-target-category facial action units;
  • that is, the outputs are the recognition results of the eye-area facial action units, the face-and-nose facial action units, and the mouth facial action units.
  • Each result is a probability value, and a threshold can be set for it.
  • If the recognition result of a specific facial action unit is greater than or equal to the threshold, it indicates that this facial action unit appears in the face of the face image to be recognized.
  • For example, suppose the value output for AU45 (blinking) is 0.8
  • and the value output for AU18 is 0.3.
  • If the threshold is 0.5, this indicates that the face in the image to be recognized shows AU45 but not AU18.
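  • A minimal sketch of turning the per-unit probabilities into detected action units, using the AU45/AU18 values and the 0.5 threshold stated above.

```python
# Per-unit probabilities as in the example above (AU45 = 0.8, AU18 = 0.3).
probs = {"AU45": 0.8, "AU18": 0.3}
THRESHOLD = 0.5  # threshold stated in the text

detected = [au for au, p in probs.items() if p >= THRESHOLD]
print(detected)  # ['AU45']: the face shows AU45 but not AU18
```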
  • The embodiment of the present application obtains a face image to be recognized and performs face correction on it to obtain a target face image to be recognized; uses the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model to perform feature extraction on the target face image, obtaining the first-, second-, and third-target-category facial action unit sub-features;
  • inputs the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first, second, and third output features; and, according to these output features, respectively obtains the recognition results of the first-, second-, and third-target-category facial action units.
  • Because the backbone network uses a stack of separable convolution blocks and inverted residual blocks to extract the sub-features, the separable convolution reduces the model's processing parameters severalfold,
  • the inverted residual block is smaller than a standard residual structure, and the attention mechanism is computed with matrix multiplication, which guarantees the running speed of the facial action unit recognition model. The entire model is therefore lighter in structure and computes quickly, which helps improve the efficiency of facial action unit recognition in face images.
  • FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the application; as shown in FIG. 7, it includes steps S71-S75:
  • The above-mentioned performing of face correction on the face image to be recognized to obtain the target face image to be recognized includes:
  • performing face correction on the face image to be recognized based on the face key points, which includes:
  • multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  • In this embodiment, the face image to be recognized is not directly input into the facial action unit recognition model for processing; instead, a multi-task convolutional neural network model is first used to support face correction of the face image to be recognized, so that even when the face is rotated at various angles,
  • the model can still judge accurately, which guarantees the stability of the model.
  • The above-mentioned inputting of the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first output feature of the first-target-category sub-feature, the second output feature of the second-target-category sub-feature, and
  • the third output feature of the third-target-category sub-feature, includes:
  • inputting the first-, second-, and third-target-category facial action unit sub-features respectively into the corresponding branches of the facial action unit recognition model;
  • and obtaining the first output feature, the second output feature, and the third output feature.
  • The backbone network is followed by 3 branches, which respectively process the sub-features of the eye-area facial action units, the face-and-nose facial action units, and the mouth facial action units.
  • This ensures that all 39 facial action units can be recognized, and the attention mechanism module in each branch adopts a stack of three 1*1 convolution layers, which makes the model pay more attention to useful features.
  • Obtaining the recognition results of the first-, second-, and third-target-category facial action units includes:
  • multiplying the width and height of the first, second, and third output features respectively with the width and height of the first-, second-, and third-target-category facial action unit sub-features, to obtain the first, second, and third features to be classified, and classifying these features to obtain the respective recognition results.
  • The above-mentioned recognition results can also be stored in a node of a blockchain.
  • The features output by the attention mechanism modules are used as weights and applied to their respective input features to obtain the input features of the fully connected layer; the features to be classified of the three target categories of facial action units are then input to the fully connected layer for binary classification, which helps the model pay more attention to the differences between the three target categories of facial action units.
  • The present application also provides a facial action unit recognition device, which can execute the method shown in FIG. 3 or FIG. 7; see Figure 8.
  • The device includes:
  • the face correction module 81, configured to obtain a face image to be recognized and perform face correction on it to obtain a target face image to be recognized;
  • the feature extraction module 82, configured to perform feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of the pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • the feature processing module 83, configured to input the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • the facial action unit classification module 84, configured to obtain, according to the first, second, and third output features, the recognition results of the first-, second-, and third-target-category facial action units respectively.
  • The feature extraction module 82 is specifically configured to:
  • perform feature extraction on the target face image to be recognized through the separable convolution blocks and inverted residual blocks of the backbone network.
  • In terms of inputting the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first, second, and third output features of the respective sub-features,
  • the feature processing module 83 is specifically configured to:
  • input the first-, second-, and third-target-category facial action unit sub-features respectively into the corresponding branches of the facial action unit recognition model;
  • and obtain the first output feature, the second output feature, and the third output feature.
  • The facial action unit classification module 84 is specifically configured to:
  • multiply the width and height of the first, second, and third output features respectively with the width and height of the first-, second-, and third-target-category facial action unit sub-features, to obtain the first, second, and third features to be classified, and classify these features to obtain the respective recognition results.
  • The face correction module 81 is specifically configured to perform face correction on the face image to be recognized based on the face key points.
  • The face correction module 81 is specifically further configured to:
  • multiply the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  • The facial action unit recognition device of the embodiment of the application obtains a face image to be recognized and performs face correction on it to obtain a target face image to be recognized; uses the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model to perform feature extraction on the target face image, obtaining the first-, second-, and third-target-category facial action unit sub-features; inputs the three sub-features into the attention mechanism of the facial action unit recognition model for convolution
  • processing to obtain the first, second, and third output features; and, according to these output features, respectively obtains the recognition results of the first-, second-, and third-target-category facial action units.
  • Because the backbone network of the facial action unit recognition model uses a stack of separable convolution blocks and inverted residual blocks to extract the sub-features, the separable convolution reduces the model's processing parameters severalfold,
  • the inverted residual block is smaller than a standard residual structure, and the attention mechanism is computed with matrix multiplication, which guarantees the running speed of the facial action unit recognition model. The entire model is lighter in structure and computes quickly, which helps improve the efficiency of facial action unit recognition in face images.
  • The various modules of the facial action unit recognition device shown in FIG. 8 may be separately or wholly combined into one or several other units, or one of the modules may be further divided into multiple functionally smaller units; this achieves the same operations without affecting the technical effects of the embodiments of the present application.
  • The above-mentioned units are divided based on logical functions.
  • In practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit.
  • In other embodiments, the facial action unit recognition device may also include other units; in practical applications, these functions may be implemented with the assistance of other units, and may be implemented by multiple units in cooperation.
  • The facial action unit recognition device may be implemented by running a computer program capable of executing the steps of the above method on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM).
  • The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above-mentioned computing device through the computer-readable recording medium, and run therein.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • The electronic device includes at least: a memory 901 for storing a computer program; a processor 902 for calling the computer program stored in the memory 901 to implement the steps of the facial action unit recognition method embodiments described above; and an input/output interface 903 for performing input and output, of which there may be one or more. It is understandable that each part of the electronic device is connected to a bus.
  • A computer-readable storage medium may be stored in the memory 901 of the electronic device.
  • The computer-readable storage medium is used to store a computer program,
  • and the computer program includes program instructions.
  • The processor 902 is used to execute the program instructions stored in the computer-readable storage medium.
  • The processor 902 (or CPU, Central Processing Unit) is the computing core and control core of the electronic device; it is suitable for implementing one or more instructions, and specifically for loading and executing one or more instructions to realize the corresponding method flow or function.
  • The processor 902 is specifically configured to call the computer program to execute the following steps:
  • obtaining a face image to be recognized and performing face correction on it to obtain a target face image to be recognized;
  • performing feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • inputting the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • and, according to the first, second, and third output features, respectively obtaining the recognition results of the first-, second-, and third-target-category facial action units.
  • When the processor 902 performs the feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of the pre-trained facial action unit recognition model, this includes:
  • performing feature extraction on the target face image to be recognized through the separable convolution blocks and inverted residual blocks of the backbone network.
  • When the processor 902 inputs the first-, second-, and third-target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first, second, and third output features of the respective sub-features, this includes:
  • inputting the first-, second-, and third-target-category facial action unit sub-features respectively into the corresponding branches of the facial action unit recognition model;
  • and obtaining the first output feature, the second output feature, and the third output feature.
  • When the processor 902 obtains, according to the first, second, and third output features, the recognition results of the first-, second-, and third-target-category facial action units, this includes:
  • multiplying the width and height of the first, second, and third output features respectively with the width and height of the first-, second-, and third-target-category facial action unit sub-features, to obtain the first, second, and third features to be classified, and classifying these features to obtain the respective recognition results.
  • When the processor 902 performs the face correction on the face image to be recognized, this includes:
  • performing face correction on the face image to be recognized based on the face key points, which includes:
  • multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  • The foregoing electronic device may be one of various servers, hosts, or other devices.
  • The electronic device may include, but is not limited to, a processor 902, a memory 901, and an input/output interface 903.
  • The schematic diagram is only an example of the electronic device and does not constitute a limitation on it; the electronic device may include more or fewer components than those shown in the figure, a combination of certain components, or different components.
  • The processor 902 of the electronic device executes the computer program to implement the steps in the above-mentioned facial action unit recognition method;
  • the above-mentioned embodiments of the facial action unit recognition method are all applicable to the electronic device, and all can achieve the same or similar beneficial effects.
  • The embodiment of the present application also provides a computer-readable storage medium that stores a computer program; when the computer program is executed by a processor, the following steps are implemented:
  • obtaining a face image to be recognized and performing face correction on it to obtain a target face image to be recognized;
  • performing feature extraction on the target face image to be recognized using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain the first-, second-, and third-target-category facial action unit sub-features;
  • inputting the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first, second, and third output features of the respective sub-features;
  • and, according to the first, second, and third output features, respectively obtaining the recognition results of the first-, second-, and third-target-category facial action units.
  • The feature extraction is performed on the target face image to be recognized through the separable convolution blocks and inverted residual blocks of the backbone network.
  • The first-, second-, and third-target-category facial action unit sub-features are respectively input into the corresponding branches of the facial action unit recognition model,
  • and the first output feature, the second output feature, and the third output feature are obtained.
  • The width and height of the first, second, and third output features are respectively multiplied with the width and height of the first-, second-, and third-target-category facial action unit sub-features to obtain the first, second, and third features to be classified, which are classified to obtain the respective recognition results.
  • The coordinate information of the face key points is multiplied by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  • The computer program in the computer-readable storage medium includes computer program code,
  • and the computer program code may be in the form of source code, object code, an executable file, or some intermediate form.
  • The computer-readable storage medium may be non-volatile or volatile, and may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, software distribution media, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence, and provide a facial action unit recognition method. The method comprises: obtaining a face image to be recognized, and performing face correction on said face image to obtain a target face image to be recognized; performing feature extraction on said target face image by using a separable convolutional block and an inverted residual block of a pretrained facial action unit recognition model to obtain sub-features of three target-class facial action units; obtaining outputs of the sub-features of the three target-class facial action units by means of an attention mechanism of the facial action unit recognition model; and respectively obtaining a recognition result of each target-class facial action unit according to the outputs of the sub-features of the three target-class facial action units. Implementation of embodiments of the facial action unit recognition method of the present application facilitates improving the efficiency of facial action unit recognition in face images. In addition, the present application further relates to a blockchain technology, and the recognition results can be stored in a blockchain node.

Description

人脸动作单元识别方法、装置、电子设备及存储介质Face action unit recognition method, device, electronic equipment and storage medium
本申请要求于2020年4月29日提交中国专利局、申请号为202010359833.2,发明名称为“人脸动作单元识别方法、装置、电子设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 29, 2020, the application number is 202010359833.2, and the invention title is "Face Action Unit Recognition Method, Device, Electronic Equipment and Storage Medium", and its entire content Incorporated in this application by reference.
技术领域Technical field
本申请涉及人工智能技术领域,尤其涉及一种人脸动作单元识别方法、装置、电子设备及存储介质。This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and storage medium for recognizing face action units.
背景技术Background technique
随着人工智能中计算机视觉技术的发展,人脸动作单元在人机交互领域表现出了巨大的可挖掘性,吸引了越来越多的企业或研究者的关注。人脸动作单元的识别是人脸表情分析、情绪分析以及考察对象是否有撒谎、欺诈等更深层次行为分析的基础,通常需要采用经过标注的人脸图像数据集构建神经网络模型实现。现有的人脸动作单元识别模型为了提高识别精度,采用的网络结构较为复杂,训练出来的模型量级普遍偏大,因此,并不适合于移动设备,即使能够部署在移动设备上,发明人意识到由于移动设备处理器的性能远低于服务器的处理器性能,模型运行一次需要消耗大量的时间,这就使得人脸动作单元识别的效率偏低。With the development of computer vision technology in artificial intelligence, the face action unit has shown great excavability in the field of human-computer interaction, attracting more and more enterprises or researchers. The recognition of facial action units is the basis of facial expression analysis, emotion analysis, and deeper behavioral analysis of whether the subject has lied, fraud, etc. It usually needs to be implemented by building a neural network model using annotated facial image data sets. In order to improve the recognition accuracy of the existing facial action unit recognition model, the network structure adopted is relatively complex, and the model level after training is generally too large. Therefore, it is not suitable for mobile devices. Even if it can be deployed on mobile devices, the inventor Realize that because the performance of the processor of the mobile device is much lower than that of the server, the model requires a lot of time to run once, which makes the recognition efficiency of the facial action unit low.
发明内容Summary of the invention
本申请实施例提供了一种人脸动作单元识别方法、装置、电子设备及存储介质,有利于提高人脸图像中人脸动作单元识别的效率。The embodiments of the present application provide a method, device, electronic device, and storage medium for recognizing a face action unit, which are beneficial to improve the efficiency of recognition of a face action unit in a face image.
第一方面,本申请实施例提供了一种人脸动作单元识别方法,该方法包括:In the first aspect, an embodiment of the present application provides a method for recognizing a facial action unit, the method including:
获取待识别人脸图像,对所述待识别人脸图像进行人脸矫正,得到待识别目标人脸图像;Acquiring a face image to be recognized, performing face correction on the face image to be recognized, to obtain a target face image to be recognized;
采用预训练的人脸动作单元识别模型的可分离卷积块和反残差块对所述待识别目标人脸图像进行特征提取,得到第一目标类人脸动作单元子特征、第二目标类人脸动作单元子特征及第三目标类人脸动作单元子特征;The separable convolution block and the de-residual block of the pre-trained face action unit recognition model are used to perform feature extraction on the target face image to be recognized to obtain the first target category face action unit sub-features and the second target category Face action unit sub-features and the third target category face action unit sub-features;
将所述第一目标类人脸动作单元子特征、所述第二目标类人脸动作单元子特征及所述第三目标类人脸动作单元子特征输入所述人脸动作单元识别模型的注意力机制进行卷积处理,得到所述第一目标类人脸动作单元子特征的第一输出特征、所述第二目标类人脸动作单元子特征的第二输出特征以及所述第三目标类人脸动作单元子特征的第三输出特征;Input the first target type face action unit sub-feature, the second target type face action unit sub-feature, and the third target type face action unit sub-feature into the attention of the face action unit recognition model The force mechanism performs convolution processing to obtain the first output feature of the face action unit sub-feature of the first target category, the second output feature of the face action unit sub-feature of the second target category, and the third target category The third output feature of the sub-feature of the face action unit;
根据所述第一输出特征、所述第二输出特征及所述第三输出特征,分别获取所述第一目标类人脸动作单元的识别结果、所述第二目标类人脸动作单元的识别结果及所述第三目标类人脸动作单元的识别结果。According to the first output feature, the second output feature, and the third output feature, the recognition result of the first target type face action unit and the recognition of the second target type face action unit are respectively obtained Result and the recognition result of the third target type face action unit.
第二方面,本申请实施例提供了一种人脸动作单元识别装置,该装置包括:In the second aspect, an embodiment of the present application provides a face action unit recognition device, which includes:
人脸矫正模块,用于获取待识别人脸图像,对所述待识别人脸图像进行人脸矫正,得到待识别目标人脸图像;The face correction module is used to obtain a face image to be recognized, perform face correction on the face image to be recognized, to obtain a target face image to be recognized;
特征提取模块,用于采用预训练的人脸动作单元识别模型的可分离卷积块和反残差块对所述待识别目标人脸图像进行特征提取,得到第一目标类人脸动作单元子特征、第二目标类人脸动作单元子特征及第三目标类人脸动作单元子特征;The feature extraction module is used to extract features of the target face image to be recognized by using the separable convolution block and the inverse residual block of the pre-trained face action unit recognition model to obtain the first target type face action unit sub Features, sub-features of the second target type of facial action unit, and sub-features of the third target type of facial action unit;
特征处理模块,用于将所述第一目标类人脸动作单元子特征、所述第二目标类人脸动作单元子特征及所述第三目标类人脸动作单元子特征输入所述人脸动作单元识别模型的注意力机制进行卷积处理,得到所述第一目标类人脸动作单元子特征的第一输出特征、所述第二目标类人脸动作单元子特征的第二输出特征以及所述第三目标类人脸动作单元子特征 的第三输出特征;A feature processing module, configured to input the sub-features of the first target-type face action unit, the sub-features of the second target-type face action unit, and the sub-features of the third target-type face action unit into the face The attention mechanism of the action unit recognition model performs convolution processing to obtain the first output feature of the sub-feature of the first target type of face action unit, the second output feature of the sub-feature of the second target type of face action unit, and The third output feature of the sub-feature of the face action unit of the third target category;
人脸动作单元分类模块,用于根据所述第一输出特征、所述第二输出特征及所述第三输出特征,分别获取所述第一目标类人脸动作单元的识别结果、所述第二目标类人脸动作单元的识别结果及所述第三目标类人脸动作单元的识别结果。The facial action unit classification module is configured to obtain the recognition result of the first target type facial action unit and the first target facial action unit according to the first output feature, the second output feature, and the third output feature. Two recognition results of the target face action unit and the recognition result of the third target face action unit.
In a third aspect, an embodiment of the present application provides an electronic device. The electronic device includes a processor, a memory, and a computer program stored on the memory and runnable on the processor, and the processor, when executing the computer program, implements:
acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
performing feature extraction on the target face image to be recognized by using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain a first target-category facial action unit sub-feature, a second target-category facial action unit sub-feature, and a third target-category facial action unit sub-feature;
inputting the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first target-category facial action unit sub-feature, a second output feature of the second target-category facial action unit sub-feature, and a third output feature of the third target-category facial action unit sub-feature;
obtaining, according to the first output feature, the second output feature, and the third output feature, a recognition result of the first target category of facial action units, a recognition result of the second target category of facial action units, and a recognition result of the third target category of facial action units respectively.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, and the computer program, when executed by a processor, implements:
acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
performing feature extraction on the target face image to be recognized by using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain a first target-category facial action unit sub-feature, a second target-category facial action unit sub-feature, and a third target-category facial action unit sub-feature;
inputting the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first target-category facial action unit sub-feature, a second output feature of the second target-category facial action unit sub-feature, and a third output feature of the third target-category facial action unit sub-feature;
obtaining, according to the first output feature, the second output feature, and the third output feature, a recognition result of the first target category of facial action units, a recognition result of the second target category of facial action units, and a recognition result of the third target category of facial action units respectively.
In the embodiments of the present application, the backbone network of the facial action unit recognition model extracts sub-features with a stack of separable convolution blocks and inverted residual blocks. Separable convolution reduces the model's processing parameters severalfold, an inverted residual block is structurally smaller than a conventional residual block, and the attention mechanism computes with matrix multiplication, which preserves the model's running speed. The facial action unit recognition model as a whole is therefore lighter in structure and faster in computation, which helps improve the efficiency of facial action unit recognition in face images.
Description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is an example diagram of an application scenario provided by an embodiment of the present application;
FIG. 2 is a network architecture diagram provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a multi-task convolutional neural network model provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a facial action unit recognition model provided by an embodiment of the present application;
FIG. 6 is an example diagram of separable convolution provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a facial action unit recognition apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "including" and "having" in the specification, claims, and drawings of the present application, and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes unlisted steps or units, or optionally further includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", and "third" are used to distinguish different objects, not to describe a specific order.
An embodiment of the present application proposes a facial action unit recognition solution, which can be applied to scenarios in which staff handle business for customers or the public, as shown in FIG. 1. Staff usually need to use a terminal to capture videos or photos, for example when bank staff handle a loan for a customer, when an insurance company handles a policy for a customer, or when a government service center handles business for the public. Of course, the scenario shown in FIG. 1 is only an illustration and does not limit the present application; the facial action unit recognition proposed here can also be applied to many other scenarios such as expression analysis, psychological activity analysis, and interviews. The facial action unit recognition model adopted in this solution uses separable convolution for all convolution processing, which greatly reduces the model's parameter count, and uses inverted residual modules, which are lighter than residual modules, to extract deeper features. Meanwhile, both the backbone network and the operations in the attention mechanism are matrix-multiplication-like computations. The overall design keeps the model under 7 MB in size; while maintaining recognition accuracy on 39 facial action units, it runs faster and more efficiently, and can be deployed not only on the server side but also on mobile terminals.
The facial action unit recognition solution can be implemented on the network architecture shown in FIG. 2. As shown in FIG. 2, the network architecture includes at least a terminal and a server, which communicate over a network including, but not limited to, a virtual private network, a local area network, or a metropolitan area network. The terminal is mainly used to capture and upload face images and to display the final recognition results, and may be a mobile phone, a tablet, a laptop, a handheld computer, or a similar device. After obtaining a face image sent by the terminal, the server performs a series of facial action unit recognition operations and finally outputs the recognition result to the terminal. The server may be a single server, a server cluster, or a cloud server, and is the execution body of the entire facial action unit recognition solution. In some embodiments of the present application, when the facial action unit recognition model is deployed on the terminal, the execution body may also be the terminal, in which case related models or algorithms such as face detection and face correction are also deployed on the terminal.
Based on the above description, the facial action unit recognition method provided by the embodiments of the present application is described in detail below with reference to the other drawings. Referring to FIG. 3, FIG. 3 is a schematic flowchart of a facial action unit recognition method provided by an embodiment of the present application, applied to a server. As shown in FIG. 3, the method includes steps S31-S34:
S31: acquire a face image to be recognized, and perform face correction on the face image to be recognized to obtain a target face image to be recognized.
In the specific embodiments of the present application, the face image to be recognized is a face image collected by the terminal and uploaded to the server in real time; it may be a short video or a single picture, which is not limited here. After obtaining the image to be recognized, the server first inputs it into a pre-trained multi-task convolutional neural network model for face detection and facial keypoint localization. As shown in FIG. 4, the multi-task convolutional neural network model consists of three sub-networks, P-Net, R-Net, and O-Net. The input size (i.e., width, height, and depth) of P-Net is 12*12*3; the input size of R-Net is 24*24*3, followed by a 128-channel fully connected layer; and the input size of O-Net is 48*48*3, followed by a 256-channel fully connected layer. The face image to be recognized is first processed by P-Net, the output of P-Net serves as the input of R-Net, and the output of R-Net serves as the input of O-Net, forming a cascaded structure. Each sub-network uses 3*3 or 2*2 convolutions and 3*3 or 2*2 pooling for processing. Finally, a face classifier gives the confidence that a region is a face, while bounding-box regression and a keypoint locator are used to calibrate the face region and locate the facial keypoints. The facial keypoints are five keypoints of the face in the image to be recognized: the two eyes, the nose, and the left and right corners of the mouth; locating them yields the coordinate information of the five keypoints.
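For concreteness, below is a minimal PyTorch sketch of the P-Net stage described above. Only the 12*12*3 input size, the 3*3 convolutions, the 2*2 pooling, and the three output heads (face confidence, bounding-box regression, five-keypoint location) follow the text; the channel widths and exact layer count are illustrative assumptions, not the configuration of FIG. 4.

```python
import torch
import torch.nn as nn

class PNet(nn.Module):
    """Minimal P-Net-style sub-network: 12x12x3 in, three heads out."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3), nn.PReLU(),   # 12x12 -> 10x10
            nn.MaxPool2d(kernel_size=2, stride=2),         # 10x10 -> 5x5
            nn.Conv2d(10, 16, kernel_size=3), nn.PReLU(),  # 5x5  -> 3x3
            nn.Conv2d(16, 32, kernel_size=3), nn.PReLU(),  # 3x3  -> 1x1
        )
        self.face_cls = nn.Conv2d(32, 2, kernel_size=1)    # face / non-face confidence
        self.box_reg = nn.Conv2d(32, 4, kernel_size=1)     # bounding-box regression
        self.landmarks = nn.Conv2d(32, 10, kernel_size=1)  # 5 keypoints, (x, y) each

    def forward(self, x):
        f = self.features(x)
        return self.face_cls(f), self.box_reg(f), self.landmarks(f)

cls_out, box_out, pts_out = PNet()(torch.randn(1, 3, 12, 12))
# shapes: (1, 2, 1, 1), (1, 4, 1, 1), (1, 10, 1, 1)
```

R-Net and O-Net would follow the same pattern at 24*24*3 and 48*48*3, with 128- and 256-channel fully connected layers respectively, each consuming the candidate regions passed on by the previous stage.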
In addition, after the coordinate information of the five facial keypoints is obtained, the pre-stored coordinate information of the facial keypoints of a standard face image is retrieved from a database. A standard face image is one in which the face is not rotated and needs no correction. The coordinate information of the five facial keypoints in the face image to be recognized is compared with that of the facial keypoints in the standard face image to obtain a similarity transformation matrix T, which is solved according to the following similarity transformation matrix equation:
$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = T \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad T = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{bmatrix}$$

Afterwards, the coordinate information of the five facial keypoints in the face image to be recognized is multiplied by the similarity transformation matrix T to obtain the target face image to be recognized, which completes the correction of the face in the face image to be recognized. In the similarity transformation matrix equation above, (x, y) denotes the coordinates of a facial keypoint in the face image to be recognized, (x', y') denotes the coordinates of the corresponding facial keypoint in the standard face image, T is the similarity transformation matrix, s is the scale factor, θ is the rotation angle (usually a counterclockwise rotation), and (t_x, t_y) are the translation parameters. Specifically, the transform.SimilarityTransform function can be used to solve the similarity transformation matrix T iteratively.
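As a usage sketch, the transform.SimilarityTransform function mentioned above is available in scikit-image; the keypoint coordinates below are placeholder values for illustration only.

```python
import numpy as np
from skimage import transform

# Five detected keypoints (x, y) and the stored standard-face keypoints (x', y')
src = np.array([[30.3, 51.7], [65.5, 51.5], [48.0, 71.7],
                [33.5, 92.4], [62.7, 92.2]])
dst = np.array([[30.3, 51.7], [65.5, 51.7], [48.0, 71.7],
                [33.5, 92.4], [62.7, 92.4]])

tform = transform.SimilarityTransform()
tform.estimate(src, dst)   # fits s, theta, (tx, ty) from the point pairs
T = tform.params           # the 3x3 matrix T in the equation above

# Applying T to homogeneous keypoint coordinates reproduces the multiplication step
pts_h = np.column_stack([src, np.ones(len(src))])
aligned = (T @ pts_h.T).T[:, :2]

# In practice the whole image is warped accordingly, e.g.
# corrected = transform.warp(image, tform.inverse)
```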
S32: perform feature extraction on the target face image to be recognized by using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain a first target-category facial action unit sub-feature, a second target-category facial action unit sub-feature, and a third target-category facial action unit sub-feature.
In the specific embodiments of the present application, after the target face image to be recognized is obtained by the method described in step S31, it is input into a pre-trained facial action unit recognition model for facial action unit recognition. To improve the model's processing efficiency, a more lightweight convolutional neural network is adopted. The specific structure is shown in FIG. 5: the backbone of the facial action unit recognition model is a stack of 7 separable convolution blocks and inverted residual modules, 17 layers in total, mainly used to extract features from the input target face image. All standard convolution kernels in the facial action unit recognition model are replaced with separable convolutions. If the input feature map is d*d*m (where d is the width and height of the feature map and m is the number of channels), the output feature map is d*d*n, and the kernel size is k*k, then the computational complexity of a standard convolution is d*d*m*n*k*k, whereas that of a separable convolution is d*d*m*(n+k*k). For example, for a 12*12*3 feature map of the target face image, as shown in FIG. 6, a 3*3*1 depthwise kernel is first applied to each channel, yielding a 10*10*3 feature map; each 1*1*3 pointwise kernel then convolves this 10*10*3 feature map into a 10*10*1 feature map, and with three such pointwise kernels the model's processing parameters drop from the original 3*3*3*3=81 to 3*3*3+1*1*3*3=36, so the computation is markedly faster than an ordinary convolution. Second, inverted residual modules are built on top of the separable convolutions, expanding and compressing the depth of the feature map with an "expand-convolve-compress" processing pattern so as to extract deeper features. Compared with a conventional residual module, an inverted residual module has a smaller structure, which further improves the model's computational efficiency.
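The two building blocks can be sketched in PyTorch as follows. The channel sizes and the expansion factor are illustrative assumptions; the parameter count at the end reproduces the 81-versus-36 comparison from the example above.

```python
import torch
import torch.nn as nn

def separable_conv(in_ch, out_ch):
    """3*3 depthwise convolution followed by a 1*1 pointwise convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),  # per-channel k*k
        nn.Conv2d(in_ch, out_ch, 1, bias=False),                          # mixes channels
    )

class InvertedResidual(nn.Module):
    """Expand-convolve-compress block with a residual around the thin ends."""
    def __init__(self, ch, expand=6):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1), nn.ReLU6(),                    # expand depth
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),  # depthwise conv
            nn.ReLU6(),
            nn.Conv2d(hidden, ch, 1),                                # compress back
        )

    def forward(self, x):
        return x + self.block(x)

# Parameter comparison for the 3-in / 3-out, 3*3-kernel example in the text:
std = nn.Conv2d(3, 3, 3, bias=False)   # 3*3*3*3 = 81 weights
sep = separable_conv(3, 3)             # 3*3*3 + 1*1*3*3 = 36 weights
print(sum(p.numel() for p in std.parameters()),
      sum(p.numel() for p in sep.parameters()))  # 81 36
```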
The first target category of facial action units is the pre-divided eye-region facial action units, the second target category is the face-and-nose facial action units, and the third target category is the mouth facial action units. The dataset used to train the above facial action unit recognition model is an annotated dataset that divides 39 facial action units into these 3 categories: the eye region, the face and nose, and the mouth. Facial action unit changes around the eyes are generally subtle skin tightening or stretching, those around the nose are generally folds, and those around the mouth are generally bulges of the skin caused by the lips or tongue. For example, AU45 (blink) belongs to the eye-region category, AU18 (lip pucker) belongs to the mouth category, and AU04 (brow lowerer) again belongs to the eye-region category. The facial action unit recognition model therefore learns to extract the sub-features of these three categories separately, i.e., the first, second, and third target-category facial action unit sub-features output after processing by the separable convolution blocks and inverted residual blocks.
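By way of illustration only, such a three-category annotation can be kept as a simple mapping. Only AU45, AU18, and AU04 are assigned by the text above; the remaining entries are common FACS action units added here as assumptions.

```python
# Hypothetical grouping of a few of the 39 AUs into the three target categories
AU_GROUPS = {
    "eye_region":    ["AU01", "AU02", "AU04", "AU45"],  # e.g. AU04 brow lowerer, AU45 blink
    "face_and_nose": ["AU06", "AU09", "AU11"],          # e.g. AU09 nose wrinkler
    "mouth":         ["AU12", "AU18", "AU25"],          # e.g. AU18 lip pucker
}
```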
S33: input the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first target-category facial action unit sub-feature, a second output feature of the second target-category facial action unit sub-feature, and a third output feature of the third target-category facial action unit sub-feature.
In the specific embodiments of the present application, the first output feature is the feature map output after the first target-category facial action unit sub-feature has passed through the convolution processing of the attention mechanism module, and likewise for the second and third output features. Referring again to FIG. 5, after the backbone network the facial action unit recognition model splits into three branches, which process the sub-features of the eye-region, face-and-nose, and mouth categories respectively. An attention mechanism module is added to each branch, and each attention mechanism module consists of three layers of 1*1 convolution; the first, second, and third target-category facial action unit sub-features each pass through three 1*1 convolutions to obtain the output feature of each category of sub-features.
Feeding the sub-features of different regions into their corresponding branches reduces the learning difficulty of the network and allows the network to be shallower, improving processing efficiency. The attention mechanism module in each branch learns two-dimensional weights through three consecutive 1*1 convolution layers, making explicit which locations of the input face carry feature information more useful for facial action unit recognition. Meanwhile, the attention mechanism module computes with matrix multiplication, which preserves the model's computation speed and strengthens its ability to extract high-order features of facial action units.
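A minimal sketch of one branch's attention module follows. The three consecutive 1*1 convolutions come from the text; the intermediate channel widths, the activations, and the single-channel sigmoid weight map are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BranchAttention(nn.Module):
    """Three stacked 1*1 convolutions producing a 2-D spatial weight map."""
    def __init__(self, ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(ch, ch, 1), nn.ReLU(),
            nn.Conv2d(ch, ch, 1), nn.ReLU(),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),  # one weight per spatial position
        )

    def forward(self, sub_feature):
        return self.attn(sub_feature)           # the branch's "output feature"

sub = torch.randn(1, 64, 14, 14)                # e.g. the eye-region sub-feature
weight_map = BranchAttention(64)(sub)           # shape (1, 1, 14, 14)
```

Because a 1*1 convolution is equivalent to a matrix multiplication across the channel dimension at every spatial position, this construction matches the matrix-multiplication property emphasized above.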
S34: obtain, according to the first output feature, the second output feature, and the third output feature, a recognition result of the first target category of facial action units, a recognition result of the second target category of facial action units, and a recognition result of the third target category of facial action units respectively.
In the specific embodiments of the present application, after the first, second, and third output features are obtained, each output feature is used as a weight: its width and height are multiplied elementwise with the width and height of the corresponding first, second, or third target-category facial action unit sub-feature, so that the model attends more to the features useful for each target category of facial action units. That is, the width and height of the first output feature are multiplied with the width and height of the first target-category facial action unit sub-feature, and the same operation is applied to the second and third output features, yielding a first to-be-classified feature of the first target category of facial action units, a second to-be-classified feature of the second target category, and a third to-be-classified feature of the third target category; the to-be-classified feature of each category is the input feature of the fully connected layer. The first, second, and third to-be-classified features are input into the fully connected layer, which classifies each of them, and finally the recognition results of the first, second, and third target categories of facial action units are output, i.e., the recognition results of the eye-region, face-and-nose, and mouth facial action units. Each result is a probability value for which a threshold can be set: when the recognition result of a specific facial action unit is greater than or equal to the threshold, the face in the image to be recognized exhibits that facial action unit; when it is below the threshold, it does not. For example, if AU45 (blink) has a value of 0.8 and AU18 (lip pucker) has a value of 0.3, then with a threshold of 0.5 the face in the image to be recognized exhibits AU45 but not AU18.
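The weighting, classification, and thresholding steps can be sketched as follows. The feature sizes, the number of action units per branch, and the sigmoid scoring are illustrative assumptions; the 0.5 threshold and the AU45/AU18 example values follow the text.

```python
import torch
import torch.nn as nn

ch, h, w, num_aus = 64, 14, 14, 13
sub_feature = torch.randn(1, ch, h, w)     # a branch's sub-feature
weight_map = torch.rand(1, 1, h, w)        # 2-D weights from the attention module

# Elementwise weighting over width and height gives the to-be-classified feature
to_classify = sub_feature * weight_map

# Fully connected layer scores each AU of the branch's category
fc = nn.Linear(ch * h * w, num_aus)
probs = torch.sigmoid(fc(to_classify.flatten(1)))  # one probability per AU

present = probs >= 0.5  # e.g. AU45 at 0.8 -> present, AU18 at 0.3 -> absent
```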
It can be seen that in the embodiments of the present application, a face image to be recognized is acquired and face-corrected to obtain a target face image to be recognized; feature extraction is performed on the target face image with the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model to obtain the first, second, and third target-category facial action unit sub-features; the three sub-features are input into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first, second, and third output features; and the recognition results of the first, second, and third target categories of facial action units are obtained from these output features respectively. Because the backbone network of the facial action unit recognition model extracts sub-features with a stack of separable convolution blocks and inverted residual blocks, the separable convolution reduces the model's processing parameters severalfold, the inverted residual block is smaller than a conventional residual structure, and the attention mechanism computes with matrix multiplication, preserving the model's running speed. The whole facial action unit recognition model is thus lighter in structure and faster in computation, which helps improve the efficiency of facial action unit recognition in face images.
Based on the description of the facial action unit recognition method embodiment shown in FIG. 3, refer to FIG. 7, which is a schematic flowchart of another facial action unit recognition method provided by an embodiment of the present application. As shown in FIG. 7, the method includes steps S71-S75:
S71: acquire a face image to be recognized;
S72: perform face correction on the face image to be recognized to obtain a target face image to be recognized;
Optionally, performing face correction on the face image to be recognized to obtain the target face image to be recognized includes:
performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating the facial keypoints in the face image to be recognized;
performing face correction on the face image to be recognized based on the facial keypoints.
Optionally, performing face correction on the face image to be recognized based on the facial keypoints includes:
comparing the coordinate information of the facial keypoints with pre-stored coordinate information of facial keypoints in a standard face image to obtain a similarity transformation matrix T;
solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
multiplying the coordinate information of the facial keypoints by the solved similarity transformation matrix T to obtain the target face image to be recognized.
In this implementation, the face image to be recognized is not fed directly into the facial action unit recognition model; instead, a multi-task convolutional neural network model first performs face correction on it, so that the model can judge accurately even when the face is rotated at different angles, which guarantees the model's stability.
S73: input the target face image to be recognized into the backbone network of the pre-trained facial action unit recognition model, and perform feature extraction on the target face image through the separable convolution blocks and inverted residual blocks of the backbone network, to obtain a first target-category facial action unit sub-feature, a second target-category facial action unit sub-feature, and a third target-category facial action unit sub-feature;
S74: input the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first target-category facial action unit sub-feature, a second output feature of the second target-category facial action unit sub-feature, and a third output feature of the third target-category facial action unit sub-feature;
Optionally, inputting the first, second, and third target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain the first output feature of the first target-category facial action unit sub-feature, the second output feature of the second target-category facial action unit sub-feature, and the third output feature of the third target-category facial action unit sub-feature, includes:
inputting the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the corresponding branches of the facial action unit recognition model respectively;
obtaining the first output feature, the second output feature, and the third output feature after multiple 1*1 convolution passes through the attention mechanism in each branch.
In this implementation, the backbone network is followed by 3 branches, which respectively process the sub-features of the eye-region facial action units, the face-and-nose facial action units, and the mouth facial action units, ensuring that all 39 facial action units can be recognized; and the attention mechanism module in each branch adopts a stack of three 1*1 convolution layers, making the model attend more to useful features.
S75: obtain, according to the first output feature, the second output feature, and the third output feature, a recognition result of the first target category of facial action units, a recognition result of the second target category of facial action units, and a recognition result of the third target category of facial action units respectively.
Optionally, obtaining, according to the first output feature, the second output feature, and the third output feature, the recognition results of the first, second, and third target categories of facial action units respectively includes:
multiplying the width and height of the first output feature, the second output feature, and the third output feature respectively with the width and height of the first, second, and third target-category facial action unit sub-features, to obtain a first to-be-classified feature of the first target category of facial action units, a second to-be-classified feature of the second target category of facial action units, and a third to-be-classified feature of the third target category of facial action units;
inputting the first to-be-classified feature, the second to-be-classified feature, and the third to-be-classified feature into the fully connected layer of the facial action unit recognition model for separate classification, to obtain the recognition result of the first target category of facial action units, the recognition result of the second target category of facial action units, and the recognition result of the third target category of facial action units, where the recognition results are stored in a blockchain.
It should be emphasized that, to further ensure the privacy and security of the above recognition results, the recognition results may also be stored in a node of a blockchain.
In this implementation, the features output by the attention mechanism module are used as weights and multiplied with its input features to obtain the input features of the fully connected layer, and the to-be-classified features of the three target categories of facial action units are then input into the fully connected layer for binary classification, which helps the model attend more to the differences among the three target categories of facial action units.
The specific implementations of the above steps S71-S75 have been described in detail in the embodiment shown in FIG. 3 and can achieve the same or similar beneficial effects; to avoid repetition, they are not repeated here.
Based on the description of the above facial action unit recognition method embodiments, the present application further provides a facial action unit recognition apparatus that can execute the method shown in FIG. 3 or FIG. 7. Referring to FIG. 8, the apparatus includes:
a face correction module 81, configured to acquire a face image to be recognized and perform face correction on the face image to be recognized to obtain a target face image to be recognized;
a feature extraction module 82, configured to perform feature extraction on the target face image to be recognized by using the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model, to obtain a first target-category facial action unit sub-feature, a second target-category facial action unit sub-feature, and a third target-category facial action unit sub-feature;
a feature processing module 83, configured to input the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the attention mechanism of the facial action unit recognition model for convolution processing, to obtain a first output feature of the first target-category facial action unit sub-feature, a second output feature of the second target-category facial action unit sub-feature, and a third output feature of the third target-category facial action unit sub-feature;
a facial action unit classification module 84, configured to obtain, according to the first output feature, the second output feature, and the third output feature, a recognition result of the first target category of facial action units, a recognition result of the second target category of facial action units, and a recognition result of the third target category of facial action units respectively.
In one embodiment, in performing feature extraction on the target face image to be recognized by using the separable convolution blocks and inverted residual blocks of the pre-trained facial action unit recognition model, the feature extraction module 82 is specifically configured to:
input the target face image to be recognized into the backbone network;
perform feature extraction on the target face image to be recognized through the separable convolution blocks and the inverted residual blocks of the backbone network.
In one embodiment, in inputting the first, second, and third target-category facial action unit sub-features into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first output feature of the first target-category facial action unit sub-feature, the second output feature of the second target-category facial action unit sub-feature, and the third output feature of the third target-category facial action unit sub-feature, the feature processing module 83 is specifically configured to:
input the first target-category facial action unit sub-feature, the second target-category facial action unit sub-feature, and the third target-category facial action unit sub-feature into the corresponding branches of the facial action unit recognition model respectively;
obtain the first output feature, the second output feature, and the third output feature after multiple 1*1 convolution passes through the attention mechanism in each branch.
In one embodiment, in obtaining, according to the first output feature, the second output feature, and the third output feature, the recognition results of the first, second, and third target categories of facial action units respectively, the facial action unit classification module 84 is specifically configured to:
multiply the width and height of the first output feature, the second output feature, and the third output feature respectively with the width and height of the first, second, and third target-category facial action unit sub-features, to obtain a first to-be-classified feature of the first target category of facial action units, a second to-be-classified feature of the second target category of facial action units, and a third to-be-classified feature of the third target category of facial action units;
input the first to-be-classified feature, the second to-be-classified feature, and the third to-be-classified feature into the fully connected layer of the facial action unit recognition model for separate classification, to obtain the recognition result of the first target category of facial action units, the recognition result of the second target category of facial action units, and the recognition result of the third target category of facial action units, where the recognition results are stored in a blockchain.
In one embodiment, in performing face correction on the face image to be recognized, the face correction module 81 is specifically configured to:
perform face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locate the facial keypoints in the face image to be recognized;
perform face correction on the face image to be recognized based on the facial keypoints.
In one embodiment, in performing face correction on the face image to be recognized based on the facial keypoints, the face correction module 81 is specifically further configured to:
compare the coordinate information of the facial keypoints with pre-stored coordinate information of facial keypoints in a standard face image to obtain a similarity transformation matrix T;
solve the similarity transformation matrix T according to a preset similarity transformation matrix equation;
multiply the coordinate information of the facial keypoints by the solved similarity transformation matrix T to obtain the target face image to be recognized.
The facial action unit recognition apparatus provided by the embodiments of the present application acquires a face image to be recognized and face-corrects it to obtain a target face image to be recognized; performs feature extraction on the target face image with the separable convolution blocks and inverted residual blocks of a pre-trained facial action unit recognition model to obtain the first, second, and third target-category facial action unit sub-features; inputs the three sub-features into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first, second, and third output features; and obtains, from these output features, the recognition results of the first, second, and third target categories of facial action units respectively. Because the backbone network of the facial action unit recognition model extracts sub-features with a stack of separable convolution blocks and inverted residual blocks, the separable convolution reduces the model's processing parameters severalfold, the inverted residual block is smaller than a conventional residual structure, and the attention mechanism computes with matrix multiplication, preserving the model's running speed. The whole facial action unit recognition model is thus lighter in structure and faster in computation, which helps improve the efficiency of facial action unit recognition in face images.
According to an embodiment of the present application, the modules of the facial action unit recognition apparatus shown in FIG. 8 may be combined, separately or all together, into one or several additional units, or one or more of the modules may be further split into multiple functionally smaller units. This can achieve the same operations without affecting the technical effects of the embodiments of the present application. The above units are divided on the basis of logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the facial action unit recognition apparatus may also include other units; in practical applications, these functions may also be implemented with the assistance of other units and may be implemented by multiple units in cooperation.
According to another embodiment of the present application, the facial action unit recognition apparatus shown in FIG. 8 may be constructed, and the facial action unit recognition method of the embodiments of the present application may be implemented, by running a computer program (including program code) capable of executing the steps of the corresponding method shown in FIG. 3 or FIG. 7 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access storage medium (RAM), and a read-only storage medium (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device via the computer-readable recording medium, and run therein.
Based on the description of the above method and apparatus embodiments, refer to FIG. 9, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. As shown in FIG. 9, the electronic device includes at least a memory 901 for storing a computer program; a processor 902 for invoking the computer program stored in the memory 901 to implement the steps in the above facial action unit recognition method embodiments; and an input/output interface 903 for input and output, of which there may be one or more. It can be understood that each part of the electronic device is connected to a bus.
A computer-readable storage medium may be stored in the memory 901 of the electronic device. The computer-readable storage medium is used to store a computer program, the computer program includes program instructions, and the processor 902 is used to execute the program instructions stored in the computer-readable storage medium. The processor 902 (or CPU (Central Processing Unit)) is the computing core and control core of the electronic device; it is suited to implementing one or more instructions, and specifically to loading and executing one or more instructions so as to realize the corresponding method flow or function.
The processor 902 is specifically configured to invoke the computer program to perform the following steps:
获取待识别人脸图像,对所述待识别人脸图像进行人脸矫正,得到待识别目标人脸图像;Acquiring a face image to be recognized, performing face correction on the face image to be recognized, to obtain a target face image to be recognized;
采用预训练的人脸动作单元识别模型的可分离卷积块和反残差块对所述待识别目标人脸图像进行特征提取,得到第一目标类人脸动作单元子特征、第二目标类人脸动作单元子特征及第三目标类人脸动作单元子特征;The separable convolution block and the de-residual block of the pre-trained face action unit recognition model are used to perform feature extraction on the target face image to be recognized to obtain the first target category face action unit sub-features and the second target category Face action unit sub-features and the third target category face action unit sub-features;
将所述第一目标类人脸动作单元子特征、所述第二目标类人脸动作单元子特征及所述第三目标类人脸动作单元子特征输入所述人脸动作单元识别模型的注意力机制进行卷积处 理,得到所述第一目标类人脸动作单元子特征的第一输出特征、所述第二目标类人脸动作单元子特征的第二输出特征以及所述第三目标类人脸动作单元子特征的第三输出特征;Input the first target type face action unit sub-feature, the second target type face action unit sub-feature, and the third target type face action unit sub-feature into the attention of the face action unit recognition model The force mechanism performs convolution processing to obtain the first output feature of the face action unit sub-feature of the first target category, the second output feature of the face action unit sub-feature of the second target category, and the third target category The third output feature of the sub-feature of the face action unit;
根据所述第一输出特征、所述第二输出特征及所述第三输出特征,分别获取所述第一目标类人脸动作单元的识别结果、所述第二目标类人脸动作单元的识别结果及所述第三目标类人脸动作单元的识别结果。According to the first output feature, the second output feature, and the third output feature, the recognition result of the first target type face action unit and the recognition of the second target type face action unit are respectively obtained Result and the recognition result of the third target type face action unit.
在一种可能的实施方式中,处理器902执行所述采用预训练的人脸动作单元识别模型的可分离卷积块和反残差块对所述待识别目标人脸图像进行特征提取,包括:In a possible implementation manner, the processor 902 executes the feature extraction of the target face image to be recognized by using the separable convolution block and the inverse residual block of the pre-trained face action unit recognition model, including :
将所述待识别目标人脸图像输入所述骨干网络;Inputting the face image of the target to be recognized into the backbone network;
通过所述骨干网络的所述可分离卷积块和所述反残差块对所述待识别目标人脸图像进行特征提取。The feature extraction of the target face image to be recognized is performed through the separable convolution block and the inverse residual block of the backbone network.
在一种可能的实施方式中,处理器902执行所述将所述第一目标类人脸动作单元子特征、所述第二目标类人脸动作单元子特征及所述第三目标类人脸动作单元子特征输入所述人脸动作单元识别模型的注意力机制进行卷积处理,得到所述第一目标类人脸动作单元子特征的第一输出特征、所述第二目标类人脸动作单元子特征的第二输出特征以及所述第三目标类人脸动作单元子特征的第三输出特征,包括:In a possible implementation manner, the processor 902 executes the combination of the first target type face action unit sub-feature, the second target type face action unit sub-feature, and the third target type face The action unit sub-features are input into the attention mechanism of the facial action unit recognition model for convolution processing to obtain the first output feature of the first target-type face action unit sub-features, and the second target-type face action The second output feature of the unit sub-feature and the third output feature of the third target-type face action unit sub-feature include:
将所述第一目标类人脸动作单元子特征、所述第二目标类人脸动作单元子特征及所述第三目标类人脸动作单元子特征,分别输入所述人脸动作单元识别模型中对应的分支中;The first target type face action unit sub-feature, the second target type face action unit sub-feature, and the third target type face action unit sub-feature are respectively input into the face action unit recognition model In the corresponding branch;
经过每个分支中的所述注意力机制多次1*1的卷积处理,得到所述第一输出特征、所述第二输出特征及所述第三输出特征。After multiple times of 1*1 convolution processing by the attention mechanism in each branch, the first output feature, the second output feature, and the third output feature are obtained.
In a possible implementation manner, the processor 902 performing the obtaining, according to the first output feature, the second output feature and the third output feature, of the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, respectively, includes:
multiplying the first output feature, the second output feature and the third output feature, element-wise over the width and height dimensions, with the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature, respectively, to obtain the first to-be-classified feature of the first target-type face action unit, the second to-be-classified feature of the second target-type face action unit and the third to-be-classified feature of the third target-type face action unit;
inputting the first to-be-classified feature, the second to-be-classified feature and the third to-be-classified feature into the fully connected layer of the face action unit recognition model for classification, respectively, to obtain the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, wherein the recognition results are stored in a blockchain.
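Combining the two sketches above, the per-branch classification step could plausibly be implemented as below. The global average pooling before the fully connected layer and the sigmoid multi-label output are assumptions about details the description leaves open.

import torch
import torch.nn as nn

class AUClassifierHead(nn.Module):
    # Weights a sub-feature by its attention map over width and height, then classifies with a fully connected layer.
    def __init__(self, channels, num_action_units):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(channels, num_action_units)

    def forward(self, sub_feature, attention_map):
        weighted = sub_feature * attention_map   # (N, C, H, W) * (N, 1, H, W): element-wise over H and W
        pooled = self.pool(weighted).flatten(1)  # (N, C) feature to be classified
        return torch.sigmoid(self.fc(pooled))    # per-action-unit activation scores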
In a possible implementation manner, the processor 902 performing the face correction on the face image to be recognized includes:
performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating face key points in the face image to be recognized;
performing face correction on the face image to be recognized based on the face key points.
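For context, one way to obtain such key points is the pre-trained MTCNN implementation in the facenet-pytorch package; this is an assumed tooling choice, as the application does not name a library.

from PIL import Image
from facenet_pytorch import MTCNN

mtcnn = MTCNN(keep_all=False)  # pre-trained multi-task cascaded CNN for face detection
img = Image.open("face.jpg")   # hypothetical input file
boxes, probs, landmarks = mtcnn.detect(img, landmarks=True)
# landmarks[0] holds five (x, y) key points: both eye centers, the nose tip and both mouth corners.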
In a possible implementation manner, the processor 902 performing the face correction on the face image to be recognized based on the face key points includes:
comparing the coordinate information of the face key points with the coordinate information of face key points in a pre-stored standard face image to obtain a similarity transformation matrix T;
solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
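A minimal alignment sketch of this correction step, assuming five detected landmarks and using scikit-image to estimate the similarity transformation matrix T and OpenCV to apply it; the standard landmark coordinates below are illustrative assumptions, not values disclosed in this application.

import cv2
import numpy as np
from skimage.transform import SimilarityTransform

# Illustrative reference landmarks for a 112x112 standard face (assumed values).
STANDARD_LANDMARKS = np.array([
    [38.3, 51.7], [73.5, 51.5],   # eye centers
    [56.0, 71.7],                 # nose tip
    [41.5, 92.4], [70.7, 92.2],   # mouth corners
], dtype=np.float32)

def align_face(image, landmarks):
    # Estimate the similarity transformation matrix T between the detected and
    # standard landmarks, then warp the image into the corrected target face image.
    tform = SimilarityTransform()
    tform.estimate(np.asarray(landmarks, dtype=np.float32), STANDARD_LANDMARKS)
    T = tform.params[:2]  # 2x3 matrix accepted by cv2.warpAffine
    return cv2.warpAffine(image, T, (112, 112))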
Exemplarily, the foregoing electronic device may be any of various servers, hosts and similar devices. The electronic device may include, but is not limited to, the processor 902, the memory 901 and the input/output interface 903. Those skilled in the art can understand that the schematic diagram is merely an example of the electronic device and does not constitute a limitation on it; the electronic device may include more or fewer components than those shown, combine certain components, or use different components.
It should be noted that, since the processor 902 of the electronic device implements the steps of the above face action unit recognition method when executing the computer program, all embodiments of the above face action unit recognition method are applicable to the electronic device and can achieve the same or similar beneficial effects.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
performing feature extraction on the target face image to be recognized by using the separable convolution block and the inverted residual block of a pre-trained face action unit recognition model, to obtain a first target-type face action unit sub-feature, a second target-type face action unit sub-feature and a third target-type face action unit sub-feature;
inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into the attention mechanism of the face action unit recognition model for convolution processing, to obtain a first output feature of the first target-type face action unit sub-feature, a second output feature of the second target-type face action unit sub-feature and a third output feature of the third target-type face action unit sub-feature;
according to the first output feature, the second output feature and the third output feature, obtaining the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, respectively.
In yet another example, when the computer program is executed by the processor, the following steps are further implemented:
inputting the target face image to be recognized into the backbone network;
performing feature extraction on the target face image to be recognized through the separable convolution block and the inverted residual block of the backbone network.
In yet another example, when the computer program is executed by the processor, the following steps are further implemented:
inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into the corresponding branches of the face action unit recognition model, respectively;
obtaining the first output feature, the second output feature and the third output feature after multiple 1×1 convolution operations performed by the attention mechanism in each branch.
In yet another example, when the computer program is executed by the processor, the following steps are further implemented:
multiplying the first output feature, the second output feature and the third output feature, element-wise over the width and height dimensions, with the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature, respectively, to obtain the first to-be-classified feature of the first target-type face action unit, the second to-be-classified feature of the second target-type face action unit and the third to-be-classified feature of the third target-type face action unit;
inputting the first to-be-classified feature, the second to-be-classified feature and the third to-be-classified feature into the fully connected layer of the face action unit recognition model for classification, respectively, to obtain the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, wherein the recognition results are stored in a blockchain.
In yet another example, when the computer program is executed by the processor, the following steps are further implemented:
performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating face key points in the face image to be recognized;
performing face correction on the face image to be recognized based on the face key points.
In yet another example, when the computer program is executed by the processor, the following steps are further implemented:
comparing the coordinate information of the face key points with the coordinate information of face key points in a pre-stored standard face image to obtain a similarity transformation matrix T;
solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
Exemplarily, the computer program in the computer-readable storage medium includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may be non-volatile or volatile, and may include any entity or apparatus capable of carrying the computer program code: a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that, since the computer program in the computer-readable storage medium implements the steps of the above face action unit recognition method when executed by the processor 902, all embodiments of the above face action unit recognition method are applicable to the computer-readable storage medium and can achieve the same or similar beneficial effects.
The embodiments of the present application have been described in detail above, and specific examples are used herein to explain the principles and implementations of the present application. The description of the above embodiments is only intended to help understand the method of the present application and its core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementations and the scope of application based on the ideas of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims (20)

  1. A face action unit recognition method, wherein the method comprises:
    acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
    performing feature extraction on the target face image to be recognized by using a separable convolution block and an inverted residual block of a pre-trained face action unit recognition model, to obtain a first target-type face action unit sub-feature, a second target-type face action unit sub-feature and a third target-type face action unit sub-feature;
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into an attention mechanism of the face action unit recognition model for convolution processing, to obtain a first output feature of the first target-type face action unit sub-feature, a second output feature of the second target-type face action unit sub-feature and a third output feature of the third target-type face action unit sub-feature;
    according to the first output feature, the second output feature and the third output feature, obtaining a recognition result of the first target-type face action unit, a recognition result of the second target-type face action unit and a recognition result of the third target-type face action unit, respectively.
  2. The method according to claim 1, wherein the performing feature extraction on the target face image to be recognized by using the separable convolution block and the inverted residual block of the pre-trained face action unit recognition model comprises:
    inputting the target face image to be recognized into a backbone network of the face action unit recognition model;
    performing feature extraction on the target face image to be recognized through the separable convolution block and the inverted residual block of the backbone network.
  3. The method according to claim 1, wherein the inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into the attention mechanism of the face action unit recognition model for convolution processing, to obtain the first output feature of the first target-type face action unit sub-feature, the second output feature of the second target-type face action unit sub-feature and the third output feature of the third target-type face action unit sub-feature, comprises:
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into corresponding branches of the face action unit recognition model, respectively;
    obtaining the first output feature, the second output feature and the third output feature after multiple 1×1 convolution operations performed by the attention mechanism in each branch.
  4. The method according to any one of claims 1 to 3, wherein the obtaining, according to the first output feature, the second output feature and the third output feature, the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, respectively, comprises:
    multiplying the first output feature, the second output feature and the third output feature, element-wise over the width and height dimensions, with the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature, respectively, to obtain a first to-be-classified feature of the first target-type face action unit, a second to-be-classified feature of the second target-type face action unit and a third to-be-classified feature of the third target-type face action unit;
    inputting the first to-be-classified feature, the second to-be-classified feature and the third to-be-classified feature into a fully connected layer of the face action unit recognition model for classification, respectively, to obtain the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, wherein the recognition results are stored in a blockchain.
  5. The method according to any one of claims 1 to 3, wherein the performing face correction on the face image to be recognized comprises:
    performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating face key points in the face image to be recognized;
    performing face correction on the face image to be recognized based on the face key points.
  6. The method according to claim 5, wherein the performing face correction on the face image to be recognized based on the face key points comprises:
    comparing coordinate information of the face key points with coordinate information of face key points in a pre-stored standard face image to obtain a similarity transformation matrix T;
    solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
    multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  7. The method according to claim 1, wherein the first target-type face action unit refers to a pre-divided eye-region face action unit, the second target-type face action unit refers to a pre-divided face-and-nose-region face action unit, and the third target-type face action unit refers to a pre-divided mouth-region face action unit.
  8. A face action unit recognition apparatus, wherein the apparatus comprises:
    a face correction module, configured to acquire a face image to be recognized and perform face correction on the face image to be recognized to obtain a target face image to be recognized;
    a feature extraction module, configured to perform feature extraction on the target face image to be recognized by using a separable convolution block and an inverted residual block of a pre-trained face action unit recognition model, to obtain a first target-type face action unit sub-feature, a second target-type face action unit sub-feature and a third target-type face action unit sub-feature;
    a feature processing module, configured to input the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into an attention mechanism of the face action unit recognition model for convolution processing, to obtain a first output feature of the first target-type face action unit sub-feature, a second output feature of the second target-type face action unit sub-feature and a third output feature of the third target-type face action unit sub-feature; and
    a face action unit classification module, configured to obtain, according to the first output feature, the second output feature and the third output feature, a recognition result of the first target-type face action unit, a recognition result of the second target-type face action unit and a recognition result of the third target-type face action unit, respectively.
  9. An electronic device, wherein the electronic device comprises a processor, a memory, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements:
    acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
    performing feature extraction on the target face image to be recognized by using a separable convolution block and an inverted residual block of a pre-trained face action unit recognition model, to obtain a first target-type face action unit sub-feature, a second target-type face action unit sub-feature and a third target-type face action unit sub-feature;
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into an attention mechanism of the face action unit recognition model for convolution processing, to obtain a first output feature of the first target-type face action unit sub-feature, a second output feature of the second target-type face action unit sub-feature and a third output feature of the third target-type face action unit sub-feature;
    according to the first output feature, the second output feature and the third output feature, obtaining a recognition result of the first target-type face action unit, a recognition result of the second target-type face action unit and a recognition result of the third target-type face action unit, respectively.
  10. The electronic device according to claim 9, wherein the processor performing the feature extraction on the target face image to be recognized by using the separable convolution block and the inverted residual block of the pre-trained face action unit recognition model comprises:
    inputting the target face image to be recognized into the backbone network;
    performing feature extraction on the target face image to be recognized through the separable convolution block and the inverted residual block of the backbone network.
  11. The electronic device according to claim 9, wherein the processor performing the inputting of the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into the attention mechanism of the face action unit recognition model for convolution processing, to obtain the first output feature of the first target-type face action unit sub-feature, the second output feature of the second target-type face action unit sub-feature and the third output feature of the third target-type face action unit sub-feature, comprises:
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into corresponding branches of the face action unit recognition model, respectively;
    obtaining the first output feature, the second output feature and the third output feature after multiple 1×1 convolution operations performed by the attention mechanism in each branch.
  12. The electronic device according to any one of claims 9 to 11, wherein the processor performing the obtaining, according to the first output feature, the second output feature and the third output feature, of the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, respectively, comprises:
    multiplying the first output feature, the second output feature and the third output feature, element-wise over the width and height dimensions, with the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature, respectively, to obtain a first to-be-classified feature of the first target-type face action unit, a second to-be-classified feature of the second target-type face action unit and a third to-be-classified feature of the third target-type face action unit;
    inputting the first to-be-classified feature, the second to-be-classified feature and the third to-be-classified feature into a fully connected layer of the face action unit recognition model for classification, respectively, to obtain the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, wherein the recognition results are stored in a blockchain.
  13. The electronic device according to any one of claims 9 to 11, wherein the processor performing the face correction on the face image to be recognized comprises:
    performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating face key points in the face image to be recognized;
    performing face correction on the face image to be recognized based on the face key points.
  14. The electronic device according to claim 13, wherein the processor performing the face correction on the face image to be recognized based on the face key points comprises:
    comparing coordinate information of the face key points with coordinate information of face key points in a pre-stored standard face image to obtain a similarity transformation matrix T;
    solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
    multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
  15. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements:
    acquiring a face image to be recognized, and performing face correction on the face image to be recognized to obtain a target face image to be recognized;
    performing feature extraction on the target face image to be recognized by using a separable convolution block and an inverted residual block of a pre-trained face action unit recognition model, to obtain a first target-type face action unit sub-feature, a second target-type face action unit sub-feature and a third target-type face action unit sub-feature;
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into an attention mechanism of the face action unit recognition model for convolution processing, to obtain a first output feature of the first target-type face action unit sub-feature, a second output feature of the second target-type face action unit sub-feature and a third output feature of the third target-type face action unit sub-feature;
    according to the first output feature, the second output feature and the third output feature, obtaining a recognition result of the first target-type face action unit, a recognition result of the second target-type face action unit and a recognition result of the third target-type face action unit, respectively.
  16. The computer-readable storage medium according to claim 15, wherein the computer program, when executed by the processor, further implements:
    inputting the target face image to be recognized into the backbone network;
    performing feature extraction on the target face image to be recognized through the separable convolution block and the inverted residual block of the backbone network.
  17. The computer-readable storage medium according to claim 15, wherein the computer program, when executed by the processor, further implements:
    inputting the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature into corresponding branches of the face action unit recognition model, respectively;
    obtaining the first output feature, the second output feature and the third output feature after multiple 1×1 convolution operations performed by the attention mechanism in each branch.
  18. The computer-readable storage medium according to any one of claims 15 to 17, wherein the computer program, when executed by the processor, further implements:
    multiplying the first output feature, the second output feature and the third output feature, element-wise over the width and height dimensions, with the first target-type face action unit sub-feature, the second target-type face action unit sub-feature and the third target-type face action unit sub-feature, respectively, to obtain a first to-be-classified feature of the first target-type face action unit, a second to-be-classified feature of the second target-type face action unit and a third to-be-classified feature of the third target-type face action unit;
    inputting the first to-be-classified feature, the second to-be-classified feature and the third to-be-classified feature into a fully connected layer of the face action unit recognition model for classification, respectively, to obtain the recognition result of the first target-type face action unit, the recognition result of the second target-type face action unit and the recognition result of the third target-type face action unit, wherein the recognition results are stored in a blockchain.
  19. The computer-readable storage medium according to any one of claims 15 to 17, wherein the computer program, when executed by the processor, further implements:
    performing face detection on the face image to be recognized by using a pre-trained multi-task convolutional neural network model, and locating face key points in the face image to be recognized;
    performing face correction on the face image to be recognized based on the face key points.
  20. The computer-readable storage medium according to claim 19, wherein the computer program, when executed by the processor, further implements:
    comparing coordinate information of the face key points with coordinate information of face key points in a pre-stored standard face image to obtain a similarity transformation matrix T;
    solving the similarity transformation matrix T according to a preset similarity transformation matrix equation;
    multiplying the coordinate information of the face key points by the solved similarity transformation matrix T to obtain the target face image to be recognized.
PCT/CN2020/104042 2020-04-29 2020-07-24 Facial action unit recognition method and apparatus, and electronic device, and storage medium WO2021217919A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010359833.2 2020-04-29
CN202010359833.2A CN111639537A (en) 2020-04-29 2020-04-29 Face action unit identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021217919A1 true WO2021217919A1 (en) 2021-11-04

Family

ID=72332439

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104042 WO2021217919A1 (en) 2020-04-29 2020-07-24 Facial action unit recognition method and apparatus, and electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111639537A (en)
WO (1) WO2021217919A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631525B (en) * 2022-10-26 2023-06-23 万才科技(杭州)有限公司 Face edge point identification-based insurance instant matching method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019213459A1 (en) * 2018-05-04 2019-11-07 Northeastern University System and method for generating image landmarks
CN110399788A (en) * 2019-06-13 2019-11-01 平安科技(深圳)有限公司 AU detection method, device, electronic equipment and the storage medium of image
CN110427867A (en) * 2019-07-30 2019-11-08 华中科技大学 Human facial expression recognition method and system based on residual error attention mechanism
CN110889325A (en) * 2019-10-12 2020-03-17 平安科技(深圳)有限公司 Multitask facial motion recognition model training and multitask facial motion recognition method
CN110929603A (en) * 2019-11-09 2020-03-27 北京工业大学 Weather image identification method based on lightweight convolutional neural network
CN111310705A (en) * 2020-02-28 2020-06-19 深圳壹账通智能科技有限公司 Image recognition method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114025198A (en) * 2021-11-08 2022-02-08 深圳万兴软件有限公司 Video cartoon method, device, equipment and medium based on attention mechanism
CN114025198B (en) * 2021-11-08 2023-06-27 深圳万兴软件有限公司 Video cartoon method, device, equipment and medium based on attention mechanism

Also Published As

Publication number Publication date
CN111639537A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
Rao et al. Deep convolutional neural networks for sign language recognition
Sun et al. Lattice long short-term memory for human action recognition
WO2020125623A1 (en) Method and device for live body detection, storage medium, and electronic device
WO2020103700A1 (en) Image recognition method based on micro facial expressions, apparatus and related device
Haider et al. Deepgender: real-time gender classification using deep learning for smartphones
JP7386545B2 (en) Method for identifying objects in images and mobile device for implementing the method
WO2022000420A1 (en) Human body action recognition method, human body action recognition system, and device
WO2021196389A1 (en) Facial action unit recognition method and apparatus, electronic device, and storage medium
WO2019227479A1 (en) Method and apparatus for generating face rotation image
Deng et al. MVF-Net: A multi-view fusion network for event-based object classification
TW202038191A (en) Method, device and electronic equipment for living detection and storage medium thereof
CN110222718B (en) Image processing method and device
CN111292262B (en) Image processing method, device, electronic equipment and storage medium
WO2021169754A1 (en) Photographic composition prompting method and apparatus, storage medium, and electronic device
WO2021047587A1 (en) Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN111108508B (en) Face emotion recognition method, intelligent device and computer readable storage medium
WO2021217919A1 (en) Facial action unit recognition method and apparatus, and electronic device, and storage medium
WO2022052782A1 (en) Image processing method and related device
CN113361387A (en) Face image fusion method and device, storage medium and electronic equipment
Neverova Deep learning for human motion analysis
CN112528978B (en) Face key point detection method and device, electronic equipment and storage medium
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
Vernikos et al. Fusing handcrafted and contextual features for human activity recognition
CN117237547A (en) Image reconstruction method, reconstruction model processing method and device
WO2023142886A1 (en) Expression transfer method, model training method, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932996

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20932996

Country of ref document: EP

Kind code of ref document: A1