WO2024001095A1 - Facial expression recognition method, terminal device and storage medium - Google Patents

Facial expression recognition method, terminal device and storage medium

Info

Publication number
WO2024001095A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
area map
local
face image
classification probability
Prior art date
Application number
PCT/CN2022/140931
Other languages
English (en)
Chinese (zh)
Inventor
韦燕华
Original Assignee
闻泰通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 闻泰通讯股份有限公司
Publication of WO2024001095A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/169Holistic features and representations, i.e. based on the facial image taken as a whole
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • the present disclosure relates to facial expression recognition methods, terminal devices and storage media.
  • Expression recognition identifies the facial expression currently shown on a person's face. Different facial expressions convey the user's different emotional states and current physiological and psychological reactions.
  • Current facial expression recognition methods are mainly based on either geometric features or appearance features. Methods based on geometric features perform poorly under complex lighting and variable facial movements, while methods based on appearance features adapt poorly to environmental changes and are very sensitive to unbalanced lighting, complex imaging conditions and noise, causing the loss of a large amount of texture and edge information in the image and thus reducing recognition accuracy. Therefore, how to accurately detect users' facial expressions is an urgent problem to be solved.
  • the present disclosure provides a facial expression recognition method, terminal device and storage medium.
  • a facial expression recognition method includes: obtaining a face image; performing global feature extraction on the face image to obtain a global feature vector, and determining a global expression classification probability corresponding to the face image according to the global feature vector; performing local feature extraction on the face image through a trained neural network model to obtain a local feature vector, and determining a local expression classification probability corresponding to the face image according to the local feature vector; and determining a target expression classification probability corresponding to the face image according to the global expression classification probability and the local expression classification probability, and determining the facial expression corresponding to the face image according to the target expression classification probability.
  • the trained neural network model includes a first neural network model and a second neural network model, and extracting the local features of the face image through the trained neural network model to obtain the local feature vector includes: performing super-resolution processing and noise reduction processing on the face image through the first neural network model to obtain a first image; and performing local feature extraction on the first image through the second neural network model to obtain the local feature vector.
  • performing local feature extraction on the first image through the second neural network model to obtain the local feature vector includes: performing local key point detection on the first image to obtain eye key points and mouth key points; extracting from the first image according to the eye key points and the mouth key points to obtain an eye area map and a mouth area map; and performing local feature extraction on the eye area map and the mouth area map respectively through the second neural network model to obtain local feature vectors corresponding to the eye area map and the mouth area map respectively.
  • performing local feature extraction on the eye area map and the mouth area map respectively through the second neural network model to obtain the local feature vectors respectively corresponding to the eye area map and the mouth area map includes: sliding a preset sliding window in the second neural network model over the eye area map and the mouth area map respectively to multiple preset positions according to a preset sliding distance, and performing local feature extraction at each preset position to obtain multiple local feature vectors corresponding to the eye area map and the mouth area map respectively; wherein the size of the preset sliding window is determined according to the width and height of the eye area map and the mouth area map respectively.
  • performing local feature extraction at each preset position to obtain multiple local feature vectors respectively corresponding to the eye area map and the mouth area map includes: at each preset position, cropping the eye area map and the mouth area map respectively to obtain multiple eye feature maps and multiple mouth feature maps; and performing local feature extraction on each eye feature map and each mouth feature map to obtain multiple eye feature vectors corresponding to the eye area map and multiple mouth feature vectors corresponding to the mouth area map.
  • determining the local expression classification probability corresponding to the face image according to the local feature vector includes: inputting the multiple eye feature vectors and the multiple mouth feature vectors into a fully connected layer network model to obtain multiple expression classification probabilities corresponding to the multiple eye feature vectors and the multiple mouth feature vectors, where each expression classification probability corresponds to one eye feature vector and one mouth feature vector; and averaging the multiple expression classification probabilities to determine the local expression classification probability corresponding to the face image.
  • performing super-resolution processing and noise reduction processing on the face image through the first neural network model to obtain the first image includes: enlarging the face image, and cropping the enlarged face image according to a preset direction and a preset size to obtain multiple first sub-images; performing super-resolution processing and noise reduction processing on the multiple first sub-images respectively through the first neural network model to obtain multiple second sub-images, the multiple second sub-images corresponding to the multiple first sub-images; and splicing the multiple second sub-images to obtain the first image.
  • splicing the plurality of second sub-images to obtain the first image includes: obtaining the position identifier of each first sub-image in the face image; and splicing the plurality of second sub-images according to the position identifiers to obtain the first image; wherein the position identifier of a target first sub-image in the face image is the same as the position identifier of the corresponding target second sub-image in the first image, the target first sub-image is any one of the plurality of first sub-images, and the target second sub-image is the image among the plurality of second sub-images that corresponds to the target first sub-image.
  • determining the target expression classification probability corresponding to the face image according to the global expression classification probability and the local expression classification probability includes: obtaining a first weight corresponding to the global expression classification probability and a second weight corresponding to the local expression classification probability; and determining the target expression classification probability corresponding to the face image according to the global expression classification probability, the first weight, the local expression classification probability and the second weight.
  • the method further includes: determining a target rendering corresponding to the facial expression; rendering the face image through the target rendering to obtain a target rendered facial image; and outputting the target rendered facial image.
  • a facial expression recognition device includes: an acquisition module, used to acquire facial images;
  • a feature extraction module used to extract global features from the face image to obtain a global feature vector
  • a processing module configured to determine the global expression classification probability corresponding to the face image according to the global feature vector
  • the feature extraction module is also used to extract local features of the face image through the trained neural network model to obtain a local feature vector
  • the processing module is also used to determine the local expression classification probability corresponding to the face image according to the local feature vector;
  • the processing module is further configured to determine the target expression classification probability corresponding to the face image according to the global expression classification probability and the local expression classification probability, and to determine the facial expression corresponding to the face image according to the target expression classification probability.
  • the processing module is specifically used to perform super-resolution processing and noise reduction processing on the face image through the first neural network model to obtain the first image;
  • the feature extraction module is specifically used to perform local feature extraction on the first image through the second neural network model to obtain the local feature vector.
  • the processing module is specifically configured to perform local key point detection on the first image to obtain eye key points and mouth key points; the processing module is specifically configured to extract from the first image according to the eye key points and the mouth key points to obtain the eye area map and the mouth area map; the feature extraction module is specifically used to perform local feature extraction on the eye area map and the mouth area map respectively through the second neural network model to obtain local feature vectors corresponding to the eye area map and the mouth area map respectively.
  • the feature extraction module is specifically configured to slide a preset sliding window in the second neural network model over the eye area map and the mouth area map respectively to multiple preset positions according to a preset sliding distance, and to perform local feature extraction at each preset position to obtain multiple local feature vectors corresponding to the eye area map and the mouth area map respectively; wherein the size of the preset sliding window is determined according to the width and height of the eye area map and the mouth area map respectively.
  • the processing module is specifically configured to crop the eye area map and the mouth area map respectively at each preset position to obtain multiple eye feature maps and multiple mouth feature maps; the feature extraction module is specifically used to perform local feature extraction on each eye feature map and each mouth feature map to obtain multiple eye feature vectors corresponding to the eye area maps and multiple mouth feature vectors corresponding to the mouth area maps.
  • the processing module is specifically configured to input the multiple eye feature vectors and the multiple mouth feature vectors into the fully connected layer network model to obtain multiple expression classification probabilities corresponding to the multiple eye feature vectors and the multiple mouth feature vectors, where each expression classification probability corresponds to one eye feature vector and one mouth feature vector; the processing module is specifically used to average the multiple expression classification probabilities to determine the local expression classification probability corresponding to the face image.
  • the processing module is specifically configured to enlarge the face image, and crop the enlarged face image according to the preset direction and preset size to obtain multiple first sub-images;
  • the processing module is specifically configured to perform super-resolution processing and noise reduction processing on the multiple first sub-images through the first neural network model to obtain multiple second sub-images, the multiple second sub-images corresponding one-to-one to the multiple first sub-images; the processing module is specifically used to splice the multiple second sub-images to obtain the first image.
  • the acquisition module is specifically used to obtain the position identifier of each first sub-image in the face image; the processing module is specifically used to splice the multiple second sub-images according to the position identifiers to obtain the first image; wherein the position identifier of the target first sub-image in the face image is the same as the position identifier of the target second sub-image in the first image, the target first sub-image is any one of the multiple first sub-images, and the target second sub-image is the image among the multiple second sub-images that corresponds to the target first sub-image.
  • the acquisition module is specifically configured to acquire the first weight corresponding to the global expression classification probability and the second weight corresponding to the local expression classification probability, where the sum of the first weight and the second weight is 1;
  • the processing module is specifically used to determine the target expression classification probability corresponding to the face image based on the global expression classification probability, the first weight, the local expression classification probability and the second weight.
  • the processing module is also used to determine the target rendering image corresponding to the facial expression; the processing module is also used to render the face image through the target rendering image to obtain the target rendered facial image; the processing module , also used to output target rendered facial images.
  • a terminal device, the terminal device includes:
  • a memory that stores executable program code
  • a processor coupled to said memory
  • the processor calls the executable program code stored in the memory to execute the steps of the facial expression recognition method in the first aspect of the present disclosure.
  • a computer-readable storage medium stores a computer program that causes a computer to execute the steps of the facial expression recognition method in the present disclosure.
  • the computer-readable storage medium includes ROM/RAM, magnetic disk or optical disk, etc.
  • a computer program product that, when run on a computer, causes the computer to perform part or all of the steps of any method.
  • An application publishing platform is used to publish computer program products, wherein when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of any method.
  • Figure 1A is a schematic scene diagram of a facial expression recognition method provided by the present disclosure
  • Figure 1B is a schematic flow chart 1 of a facial expression recognition method provided by the present disclosure
  • Figure 1C is a facial schematic diagram of a facial expression recognition method provided by the present disclosure.
  • Figure 2 is a schematic flow chart 2 of a facial expression recognition method provided by the present disclosure
  • Figure 3 is a schematic flow chart 3 of a facial expression recognition method provided by the present disclosure
  • Figure 4 is a cropping schematic diagram 1 of a facial expression recognition method provided by the present disclosure
  • Figure 5 is a cropping schematic diagram 2 of a facial expression recognition method provided by the present disclosure
  • Figure 6 is a schematic rendering of a facial expression recognition method provided by the present disclosure.
  • Figure 7 is a schematic structural diagram of a facial expression recognition device provided by the present disclosure.
  • Figure 8 is a schematic structural diagram of a terminal device provided by the present disclosure.
  • first, second, etc. in the description and claims of the present disclosure are used to distinguish different objects, rather than to describe a specific order of objects.
  • first neural network model and the second neural network model are used to distinguish different neural network models, rather than describing a specific order of the neural network models.
  • the facial expression recognition device involved in the present disclosure may be a terminal device, or may be a functional module and/or functional entity provided in the terminal device that can implement the facial expression recognition method.
  • the specific details may be determined according to actual usage requirements, which is not limited in this disclosure.
  • the terminal device can be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted terminal device, a wearable device, an ultra-mobile personal computer (Ultra-Mobile Personal Computer, UMPC), a netbook, a personal digital assistant (Personal Digital Assistant, PDA) or other electronic equipment.
  • the wearable device can be a smart watch, a smart bracelet, a watch phone, etc., which is not limited in this disclosure.
  • FIG. 1A is a schematic diagram of a scene of the facial expression recognition method disclosed in the present disclosure.
  • the facial expression recognition method provided by the present disclosure can be applied in the application environment as shown in FIG. 1A .
  • This facial expression recognition method is used in facial expression recognition systems.
  • the facial expression recognition system includes a user 11, a terminal device 12 and a server 13. Among them, the terminal device 12 and the server 13 communicate through the network.
  • the terminal device 12 can first obtain the face image of the user 11, then perform global feature extraction on the face image to obtain a global feature vector and determine the global expression classification probability corresponding to the face image according to the global feature vector; it can then extract the local features of the face image through the neural network model trained by the server 13 to obtain the local feature vector, and determine the local expression classification probability corresponding to the face image based on the local feature vector; finally, it determines the target expression classification probability corresponding to the face image based on the global expression classification probability and the local expression classification probability, and determines the facial expression corresponding to the face image based on the target expression classification probability.
  • the terminal device 12 can be, but is not limited to, various personal computers, laptops, smart phones, tablets, and portable wearable devices.
  • the server 13 can be implemented as an independent server or a server cluster composed of multiple servers.
  • the server 13 can train the neural network model and store the trained neural network model in the server 13.
  • the trained neural network model can be downloaded from the server 13 so that the terminal device 12 performs facial expression recognition on face images with it; the neural network model can also be trained on the terminal device 12 and stored in the terminal device 12.
  • the terminal device 12 can respectively calculate the global expression classification probability and the local expression classification probability through the two branch architectures of global features and local features, and fuse the global expression classification probability and the local expression classification probability to determine the facial expression, which can effectively reduce the impact of environmental factors on global features and local features respectively and improve the accuracy of facial expression detection.
  • the present disclosure provides a facial expression recognition method, which can be applied to the terminal device 12 or the server 13 in Figure 1A.
  • the terminal device 12 is used as an example for illustration.
  • the method may include the following steps:
  • the terminal device can obtain the user's face image.
  • the face image is an image including the user's facial features.
  • the face image may be captured by the terminal device through a camera, or may be obtained by the terminal device from a pre-stored picture library.
  • the terminal device can perform global feature extraction on the entire facial area in the human face image to obtain a global feature vector.
  • global features refer to the overall attributes of the face image.
  • Common global features include color features, texture features, shape features, etc., such as intensity and histograms. Since they are pixel-level low-level visual features, global features have good invariance, simple calculation, and intuitive representation.
  • the global feature vector can be expressed as a grayscale value, a red-green-blue (Red, Green, Blue, RGB) value, a hue-saturation-intensity (Hue, Saturation, Intensity, HSI) value, etc.
  • global feature extraction methods may include: Principal Components Analysis (PCA), Linear Discriminant Analysis (LDA), etc.
  • the terminal device can mainly determine the global feature vector through PCA, which uses dimensionality reduction. Specifically, the face image can be recognized to determine each facial feature point contained in it, and each facial feature point determines a corresponding n-dimensional vector that describes the feature point; a covariance matrix is then calculated over the n-dimensional vectors of these feature points. The covariance matrix captures the covariance between the multiple dimensions of the vector corresponding to each feature point, rather than between different feature points. Based on the covariance matrix, the eigenvalues and eigenvectors can be calculated, the eigenvalues are sorted from large to small, the top k are selected, the k corresponding eigenvectors are used as column vectors to form an eigenvector matrix, and the feature points are projected onto the selected eigenvectors, thereby changing each feature point from the original n-dimensional vector to a k-dimensional vector. Both n and k are integers, and n is greater than k.
  • n can be 128 and k can be 31, that is, the feature points in the face image can be reduced from 128 dimensions to 31 dimensions through PCA.
  • When extracting global features from a face image, PCA can thus be used to reduce the dimensionality of the feature points in the face image to obtain the global feature vector, which reduces the amount of computation on the feature vectors while achieving global feature extraction.
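  • As a non-limiting illustration, the following is a minimal sketch of the PCA dimensionality reduction described above, assuming the facial feature points are stacked as rows of a matrix; the helper name pca_reduce and the 128-to-31-dimension figures follow the example in the text and are not part of the disclosure itself.

```python
import numpy as np

def pca_reduce(feature_points: np.ndarray, k: int = 31) -> np.ndarray:
    """Reduce n-dimensional feature-point vectors to k dimensions with PCA.

    feature_points: array of shape (num_points, n), e.g. n = 128.
    Returns an array of shape (num_points, k).
    """
    # Center the data so the covariance matrix is meaningful.
    centered = feature_points - feature_points.mean(axis=0)
    # Covariance between the n dimensions (not between feature points).
    cov = np.cov(centered, rowvar=False)            # shape (n, n)
    # Eigen-decomposition of the symmetric covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
    # Sort eigenvalues from large to small and keep the top k eigenvectors.
    order = np.argsort(eigvals)[::-1][:k]
    projection = eigvecs[:, order]                  # shape (n, k)
    # Project each feature point from n dimensions down to k dimensions.
    return centered @ projection

# Example: facial feature points, each a 128-dimensional vector, reduced to 31 dimensions.
points = np.random.rand(68, 128)
global_features = pca_reduce(points, k=31)          # shape (68, 31)
```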
  • the global expression classification probability can be used to represent the probability of the facial expression category corresponding to the global feature vector.
  • facial expressions can include a variety of categories, for example, they can include categories such as calmness, happiness, sadness, surprise, fear, anger, and disgust.
  • Feature classification can be performed based on the global feature vector, and the probability that the global feature vector belongs to each category can be determined.
  • the probability corresponding to each facial expression category can be any value between [0, 1].
  • the highest probability can be regarded as the global expression classification probability, and the facial expression category corresponding to the highest probability is the facial expression determined by global feature vector classification.
  • the global expression classification probability may also include the probability corresponding to each facial expression category.
  • For example, the terminal device analyzes the global feature vector and obtains a probability of sadness of 68%, a probability of anger of 32%, a probability of disgust of 26%, a probability of fear of 33%, a probability of calm of 8%, a probability of happiness of 2%, and a probability of surprise of 16%; since the probability of sadness is the highest among all expressions, the global expression classification probability is 68%, and sadness can be regarded as the facial expression determined by global feature vector classification.
  • the terminal device can pre-train a neural network model, and perform local feature extraction on local areas in the face image through the trained neural network model to obtain a local feature vector.
  • local features are features extracted from local areas of the image, including edges, corners, lines, curves, and areas with special attributes.
  • Common local features include two major description methods: corner points and area types. Local image features are abundant in the image, have low correlation between features, and will not affect the detection and matching of other features due to the disappearance of some features under occlusion.
  • the terminal device can extract the user's eye local feature vector from the eye area map of the face image, and extract the mouth local feature vector from the mouth area map of the face image.
  • the eye area map refers to an image including the user's eyes, eyebrows and nose bridge area, and can be divided into a left eye area map and a right eye area map.
  • the mouth area map refers to an image including the user's mouth and nostril area.
  • In Figure 1C, a is the user's face image; b and c are the user's eye area maps, where b is the right eye area map and c is the left eye area map; and d is the user's mouth area map.
  • the local feature vector can be expressed as a grayscale value, a red-green-blue (Red, Green, Blue, RGB) value, a hue-saturation-intensity (Hue, Saturation, Intensity, HSI) value, etc.
  • the local expression classification probability can be used to represent the probability of the facial expression category corresponding to the local feature vector.
  • For example, the terminal device analyzes the local feature vector of the mouth and obtains a probability of happiness of 68%, a probability of surprise of 32%, a probability of calm of 26%, a probability of fear of 13%, a probability of anger of 8%, a probability of sadness of 2%, and a probability of disgust of 16%; since the probability of happiness is the highest among all expressions, the local expression classification probability is the 68% corresponding to happiness, and happiness can be regarded as the facial expression determined by local feature vector classification.
  • the terminal device can fuse the global expression classification probability and the local expression classification probability to determine the target expression classification probability, and the target expression classification probability can be used to represent the user's current facial expression.
  • Since both the global expression classification probability and the local expression classification probability can include the probabilities corresponding to multiple facial expression categories, in the process of fusing the global expression classification probability and the local expression classification probability the probabilities corresponding to each facial expression category need to be fused to obtain the target expression classification probability.
  • the target expression classification probability may also include the probability corresponding to each facial expression category.
  • For example, the target expression classification probabilities obtained by the terminal device are: a probability of surprise of 56%, a probability of fear of 40%, a probability of anger of 38%, a probability of disgust of 26%, a probability of sadness of 22%, a probability of calm of 2%, and a probability of happiness of 13%. Since the probability of surprise is the highest among all expressions, the target expression classification probability is 56%, and surprise can be regarded as the facial expression determined after fusing the global expression classification probability and the local expression classification probability, that is, the facial expression determined by the joint classification of the global feature vector and the local feature vector.
  • the terminal device can calculate the global expression classification probability and the local expression classification probability respectively through the two branch architectures of global features and local features, and fuse the global expression classification probability and the local expression classification probability to determine the facial expression, which can effectively reduce the impact of environmental factors on global features and local features respectively and improve the accuracy of facial expression detection.
  • the present disclosure provides a facial expression recognition method, which can be applied to the terminal device 12 or the server 13 in Figure 1A.
  • the terminal device 12 is used as an example for illustration.
  • the method may also include the following steps:
  • steps 201 to 202 please refer to the detailed description of steps 101 to 102 in Embodiment 1, which will not be described again in this disclosure.
  • the terminal device can input the global feature vector into the fully connected layer network model, so that the fully connected layer network model performs feature classification on the global feature vector, thereby determining the global expression classification probability corresponding to the face image.
  • the fully connected layer network model includes a fully connected layer and an activation function.
  • the activation function is used to improve the nonlinear expression ability of the fully connected layer network model. Each layer of the fully connected layer is a tiled structure composed of many neurons; the main function of the fully connected layer is classification, that is, classifying the global feature vector to determine the corresponding global expression classification probability.
  • the fully connected layer network can be trained on samples in advance, that is, on a large number of face images and the labeled facial expression category corresponding to each face image, so that the fully connected layer network learns the correspondence between face images and facial expression categories. Because a face image contains many features, it may correspond to more than one facial expression category, so each facial expression category can be assigned a classification probability; the higher the classification probability, the more likely the corresponding facial expression category is the facial expression of the face image. Therefore, when using the fully connected layer network, a face image only needs to be input to obtain the classification probability of each facial expression category corresponding to that face image.
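  • As a non-limiting illustration, the following is a minimal sketch of such a fully connected layer network model with an activation function, assuming PyTorch and the seven expression categories used in the examples; the layer sizes and the class name ExpressionClassifier are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Fully connected layers plus activation that map a feature vector
    to per-category expression classification probabilities."""
    def __init__(self, feature_dim: int = 31, num_classes: int = 7):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, 64),
            nn.ReLU(),                     # activation improves non-linear expressiveness
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Softmax turns the scores into classification probabilities.
        return torch.softmax(self.fc(x), dim=-1)

# Usage: classify a 31-dimensional global feature vector.
model = ExpressionClassifier()
probs = model(torch.randn(1, 31))          # shape (1, 7), sums to 1
predicted = probs.argmax(dim=-1)           # index of the most likely expression
```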
  • the terminal device can input the face image into the first neural network model, so that the first neural network model performs super-resolution processing and noise reduction processing on the face image, thereby obtaining the first image.
  • the trained neural network model includes a first neural network model and a second neural network model.
  • super-resolution processing is a method of amplifying the resolution of an image, which can transform a low-definition, small-size image into a high-definition, large-size image.
  • There are many specific methods of super-resolution processing, such as interpolation, reconstruction and model training; among them, interpolation is the most common and intuitive, using the pixel value of a target pixel in the original image and the pixel values of multiple surrounding pixels to determine the pixel value of the pixel corresponding to the target pixel in the new image after super-resolution processing.
  • Noise reduction processing is a relatively common processing method in current image processing. Filters can usually be used directly to reduce noise on images to remove the effects caused by equipment factors and environmental interference.
  • the first neural network model is pre-trained on a large number of face test images. During the training process, model parameters can be generated based on the preset correspondence between low-resolution small-size images and high-resolution large-size images, so as to obtain a convolutional neural network that achieves super-resolution and noise reduction at the same time.
  • the structure of the first neural network model may be a preset number of sequentially connected convolution blocks, each including a convolution layer followed by an activation layer, where the activation function of the activation layer may be one or a combination of the Sigmoid, ReLU, Leaky ReLU, Tanh and softmax functions.
  • the first neural network model can process the pixels in the face image through the convolution and activation functions to obtain a higher-resolution, clearer first image, and the mapping in the first neural network model between low-definition small-size pictures and high-definition large-size pictures is updated in real time. This can improve the accuracy of the first neural network model in processing face images and increase its convergence speed, further improving the efficiency of image processing.
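  • As a non-limiting illustration, the following is a minimal PyTorch sketch of a stack of sequentially connected convolution blocks that performs super-resolution and noise reduction together; the number of blocks, channel counts, upscale factor and input size are illustrative assumptions rather than values given in the disclosure.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One convolution block: a convolution layer followed by an activation layer."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)       # could also be Sigmoid/ReLU/Tanh per the text

    def forward(self, x):
        return self.act(self.conv(x))

class SuperResDenoiseNet(nn.Module):
    """Sequentially connected convolution blocks that upscale and denoise a face image."""
    def __init__(self, num_blocks: int = 4, channels: int = 32, scale: int = 2):
        super().__init__()
        blocks = [ConvBlock(3, channels)]
        blocks += [ConvBlock(channels, channels) for _ in range(num_blocks - 1)]
        self.features = nn.Sequential(*blocks)
        # Pixel-shuffle upsampling produces the higher-resolution output.
        self.upsample = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, x):
        return self.upsample(self.features(x))

# A 48x48 low-resolution face crop becomes a 96x96 first-image patch.
net = SuperResDenoiseNet()
out = net(torch.randn(1, 3, 48, 48))       # shape (1, 3, 96, 96)
```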
  • the terminal device can input the first image to the second neural network model, so that the second neural network model performs local feature extraction on the first image, thereby obtaining a local feature vector.
  • the second neural network model can be a convolutional neural network model, and it can be trained on a large number of sample images and the feature vectors corresponding to the sample images, so that when used, the second neural network model can quickly extract the local feature vector corresponding to the input first image.
  • the training method of the second neural network model may include: obtaining multiple sample images and corresponding sample categories; performing feature extraction on the multiple sample images through the second neural network model to be trained to obtain the reference features of the corresponding sample images; determining the loss value between the reference features and the corresponding sample categories; and adjusting the model parameters of the second neural network model according to the loss value until the determined loss value reaches the training stop condition.
  • the reference features are sample image features obtained after feature extraction of the sample image by the second neural network model to be trained. As the number of training times of the second neural network model increases, the reference features will also change.
  • the second neural network model can use deep learning, neural network and other methods to train and learn the network.
  • the training stop condition is that the loss value of the reference features in each sample image and the corresponding known sample category reaches a preset range, that is, the prediction accuracy of each sample image reaches a preset range.
  • the terminal device obtains multiple sample images and corresponding sample categories, and extracts the image features of each sample image through the second neural network model running on the terminal device to obtain the reference features of the corresponding sample image; the reference features are related to the expression classification probability corresponding to the second neural network model and can better characterize the features belonging to the corresponding expression classification probability.
  • the terminal uses the loss function to determine the loss value between the reference features and the known sample categories, and adjusts the model parameters of the second neural network model according to the loss value until the loss value falls within the preset range, at which point training of the second neural network model stops.
  • the loss function can be the mean square error loss function, the mean absolute error loss function, the cross entropy loss function, etc.
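  • As a non-limiting illustration, the following is a minimal sketch of the training loop just described, assuming PyTorch, a cross-entropy loss and a generic data loader of sample images and sample categories; the optimizer, loss threshold and epoch limit are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train_second_model(model: nn.Module, loader: DataLoader,
                       loss_threshold: float = 0.05, max_epochs: int = 50):
    """Adjust model parameters from the loss between the model's predictions
    (derived from its reference features) and the known sample categories."""
    criterion = nn.CrossEntropyLoss()          # could also be MSE / MAE per the text
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for sample_images, sample_categories in loader:
            optimizer.zero_grad()
            predictions = model(sample_images)               # class scores per sample
            loss = criterion(predictions, sample_categories)
            loss.backward()                                  # adjust model parameters
            optimizer.step()
            epoch_loss += loss.item()
        epoch_loss /= len(loader)
        # Training stop condition: loss value reaches the preset range.
        if epoch_loss < loss_threshold:
            break
    return model
```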
  • the terminal device when the terminal device determines the local expression classification probability corresponding to the face image based on the local feature vector, the terminal device can input the local feature vector into the fully connected layer network model, so that the fully connected layer network model performs on the local feature vector. Feature classification to determine the local expression classification probability corresponding to the face image.
  • the fully connected layer network model is the same as the fully connected layer network model in step 203 above, and is used to classify feature vectors to obtain expression classification probabilities.
  • After the terminal device determines the global expression classification probability and the local expression classification probability, it can obtain the first weight corresponding to the global expression classification probability and the second weight corresponding to the local expression classification probability. It should be noted that the first weight and the second weight may be determined based on experience or based on historical training data.
  • the sum of the first weight and the second weight is 1.
  • the terminal device can pre-construct the mapping relationship between the first weight and the global expression classification probability and the mapping relationship between the second weight and the local expression classification probability, and store them in the database of the terminal device or the server, so that after the terminal device determines the global expression classification probability and the local expression classification probability, the corresponding first weight and second weight can be obtained from the database.
  • If the global expression classification probability corresponding to the global feature vector is more accurate for facial expression recognition, the first weight is greater than the second weight; for example, the first weight may be 0.7 and the second weight may be 0.3. If the local expression classification probability corresponding to the local feature vector is more accurate for facial expression recognition, the second weight is greater than the first weight; for example, the first weight may be 0.26 and the second weight may be 0.74, but this is not limiting.
  • In the process of fusing the global expression classification probability and the local expression classification probability, the terminal device combines their respective weights.
  • Since both the global expression classification probability and the local expression classification probability include a probability for each facial expression category, the probabilities corresponding to each facial expression category are brought into the first formula respectively to calculate the target expression classification probability for that category, and the target expression classification probability with the highest value is then selected, so as to determine the target expression classification probability corresponding to the face image.
  • For example, the global expression classification probabilities include: a probability of sadness of 99%, a probability of anger of 32%, a probability of disgust of 26%, and a probability of fear of 33%; the local expression classification probabilities include: a probability of sadness of 95%, a probability of anger of 41%, a probability of disgust of 33%, and a probability of fear of 29%. The global expression classification probability of 99% corresponding to sadness and the corresponding local expression classification probability of 95% can be brought into the first formula to obtain a target expression classification probability of 99% for sadness; similarly, the target expression classification probability for anger is 35%, for disgust 28%, and for fear 31%. Finally, the highest of these four target expression classification probabilities, 99%, is determined as the target expression classification probability corresponding to the face image.
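  • The first formula is not written out in this excerpt; the following non-limiting sketch assumes it is a per-category weighted sum of the two probabilities, with a first weight of 0.7 and a second weight of 0.3 chosen so that the example figures above are approximately reproduced.

```python
def fuse_probabilities(global_probs: dict, local_probs: dict,
                       w_global: float = 0.7, w_local: float = 0.3) -> dict:
    """Per-category weighted fusion; w_global + w_local must equal 1."""
    assert abs(w_global + w_local - 1.0) < 1e-9
    return {cat: w_global * global_probs[cat] + w_local * local_probs[cat]
            for cat in global_probs}

global_probs = {"sadness": 0.99, "anger": 0.32, "disgust": 0.26, "fear": 0.33}
local_probs  = {"sadness": 0.95, "anger": 0.41, "disgust": 0.33, "fear": 0.29}

target = fuse_probabilities(global_probs, local_probs)
# {'sadness': ~0.98, 'anger': ~0.35, 'disgust': ~0.28, 'fear': ~0.32}
expression = max(target, key=target.get)   # 'sadness' -> the recognized facial expression
```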
  • step 209 please refer to the detailed description of step 104 in Embodiment 1, which will not be described again in this disclosure.
  • Through the two branch architectures of global features and local features, the global expression classification probability and the local expression classification probability can be calculated respectively and fused with their respective weights to determine the facial expression; combining the neural network models for image feature extraction can effectively improve the resolution of the face image and its robustness to the environment, extract global and local features more accurately, and better express the facial expression, thereby improving classification accuracy, effectively reducing the impact of environmental factors on global features and local features respectively, and improving the accuracy of facial expression detection.
  • the present disclosure provides a facial expression recognition method, which can be applied to the terminal device 12 or the server 13 in Figure 1A.
  • the terminal device 12 is used as an example for illustration.
  • the method may also include the following steps:
  • the terminal device can take pictures through a camera provided on the terminal device to obtain the initial image.
  • the terminal device can perform face recognition on the initial image and determine the facial area map in the initial image, and the part other than the facial area map is the background area map.
  • facial feature points in the initial image can be detected; if a certain number of facial feature points are detected in the initial image, it can be determined that the initial image contains a facial area map.
  • The initial image is then cropped according to the facial area map to obtain a face image including the facial area; that is, the terminal device can crop the initial image and remove the background area map other than the facial area map to obtain a face image including the facial area map.
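  • As a non-limiting illustration, the following is a minimal sketch of this detection-and-cropping step using OpenCV's bundled Haar cascade face detector; the disclosure does not name a particular detector, so this choice is an assumption.

```python
import cv2

def crop_face(initial_image_path: str):
    """Detect the facial area in the initial image and crop away the background."""
    image = cv2.imread(initial_image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Haar cascade shipped with OpenCV; any face detector could be substituted.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                       # no facial area map detected
    x, y, w, h = faces[0]                 # take the first detected facial area
    return image[y:y + h, x:x + w]        # face image without the background area map

face_image = crop_face("initial.jpg")     # path is illustrative
```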
  • steps 304 to 305 please refer to the detailed description of steps 202 to 203 in Embodiment 2, which will not be described again in this disclosure.
  • When cropping a face image, the terminal device can first enlarge the face image and then crop it according to the preset direction and preset size to obtain multiple first sub-images.
  • the preset size is the preset number of pixels
  • the preset direction is the fixed cropping direction.
  • the cropping direction is the order in which the face image is cropped, which can be traversal and cropping along parallel boundaries from left to right, traversal and cropping along parallel boundaries from top to bottom, or traversal and cropping from the top-left corner to the bottom-right corner.
  • When cropping a face image, the cropping can overlap, that is, one pixel may exist in multiple first sub-images, or it can be non-overlapping, that is, one pixel exists in only one first sub-image.
  • For example, the face image is a 12*12-pixel image, each grid represents a pixel, the preset size is 4*4, and the preset direction is from left to right and from top to bottom.
  • Figure 4 only shows part of the cropping steps: cropping starts from the upper left corner of the face image with the preset size of 4*4 and moves in steps of 2 pixels, that is, after cropping at the upper left corner as shown in a in Figure 4 to obtain a first sub-image as shown in h in Figure 4, the window moves two pixels to the right for cropping as shown in b in Figure 4 to obtain another first sub-image as shown in h in Figure 4, and it continues to move and crop to the right until, after cropping at the upper right corner as shown in c in Figure 4, a further first sub-image as shown in h in Figure 4 is obtained.
  • Figure 5 shows all the cropping steps: cropping starts from the upper left corner of the face image with the preset size of 4*4, that is, after cropping at the upper left corner as shown in a in Figure 5 to obtain a first sub-image as shown in j in Figure 5, the window moves four pixels (i.e. the preset size) to the right for cropping as shown in b in Figure 5 to obtain another first sub-image as shown in j in Figure 5; after cropping at the upper right corner as shown in c in Figure 5 to obtain a first sub-image as shown in j in Figure 5, the window moves down four pixels (i.e. the preset size) from the position of a in Figure 5 for cropping, as shown in d in Figure 5.
  • Before cropping the face image according to the preset direction and preset size, the face image can first be enlarged to obtain an enlarged image of the preset multiple, and the enlarged image is then cropped according to the preset direction and preset size; that is, the terminal device can first enlarge the face image by a preset factor and then crop it.
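  • As a non-limiting illustration, the following is a minimal sketch of cropping the (already enlarged) face image into first sub-images, assuming NumPy arrays, non-overlapping 4*4 crops and a left-to-right, top-to-bottom direction as in the Figure 5 example; it also records the position identifier of each first sub-image for the later splicing step.

```python
import numpy as np

def crop_into_sub_images(face_image: np.ndarray, patch: int = 4):
    """Crop an enlarged face image into first sub-images of size patch*patch,
    left to right and top to bottom, keeping each sub-image's position identifier."""
    h, w = face_image.shape[:2]
    sub_images = {}
    for row in range(0, h - patch + 1, patch):        # top to bottom
        for col in range(0, w - patch + 1, patch):    # left to right
            position_id = (row // patch + 1, col // patch + 1)   # e.g. (1, 1), (1, 2), ...
            sub_images[position_id] = face_image[row:row + patch, col:col + patch]
    return sub_images

# 12*12 face image cropped into nine 4*4 first sub-images, identified (1,1) .. (3,3).
face = np.arange(12 * 12).reshape(12, 12)
first_sub_images = crop_into_sub_images(face, patch=4)
```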
  • After the terminal device crops the face image to obtain a plurality of first sub-images, it can input each first sub-image into the first neural network model so that the first neural network model performs super-resolution processing and noise reduction processing on each first sub-image, thereby obtaining multiple second sub-images; each first sub-image yields one second sub-image, that is, the multiple second sub-images correspond one-to-one to the multiple first sub-images.
  • After the terminal device obtains the plurality of second sub-images, it can splice the plurality of second sub-images together to obtain the first image.
  • Splicing the multiple second sub-images to obtain the first image may include: obtaining the position identifier of each first sub-image in the face image, and splicing the multiple second sub-images according to the position identifiers to obtain the first image.
  • the position identifier of the first sub-image in the face image can be used to represent the corresponding image area position of the first sub-image in the face image.
  • The position identifier of the target first sub-image in the face image is the same as the position identifier of the target second sub-image in the first image; the target first sub-image is any one of the plurality of first sub-images, and the target second sub-image is the image among the plurality of second sub-images that corresponds to the target first sub-image.
  • Since each first sub-image yields a second sub-image after the super-resolution processing and noise reduction processing of the first neural network model, the positions of the first sub-image and the second sub-image should correspond. Therefore, when the terminal device crops the face image, it can simultaneously obtain the position identifier of the target first sub-image in the face image; after obtaining, through the first neural network model, the target second sub-image corresponding to the target first sub-image, it can place the target second sub-image in the first image according to the position identifier, thereby obtaining the first image.
  • The position identifier of the target first sub-image in the face image can be the position coordinate of a certain pixel point of the target first sub-image in the face image, or the average of the position coordinates of each of its pixel points in the face image, which is not limited in this disclosure.
  • For example, if each preset-size area is regarded as a whole, the position identifier corresponding to the first sub-image obtained as shown in a in Figure 5 can be (1, 1), the one obtained as shown in b in Figure 5 can be (1, 2), the one obtained as shown in c in Figure 5 can be (1, 3), the one obtained as shown in d in Figure 5 can be (2, 1), the one obtained as shown in e in Figure 5 can be (2, 2), the one obtained as shown in f in Figure 5 can be (2, 3), the one obtained as shown in g in Figure 5 can be (3, 1), the one obtained as shown in h in Figure 5 can be (3, 2), and the one obtained as shown in i in Figure 5 can be (3, 3).
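  • As a non-limiting illustration, the following is a minimal sketch of splicing the second sub-images back into the first image by position identifier, assuming the dictionary layout produced in the cropping sketch above and a uniform sub-image size.

```python
import numpy as np

def splice_sub_images(second_sub_images: dict) -> np.ndarray:
    """Place each second sub-image into the first image according to its
    position identifier (row, col), which matches the identifier recorded
    for the corresponding first sub-image at cropping time."""
    sample = next(iter(second_sub_images.values()))
    ph, pw = sample.shape[:2]
    rows = max(r for r, _ in second_sub_images)
    cols = max(c for _, c in second_sub_images)
    first_image = np.zeros((rows * ph, cols * pw), dtype=sample.dtype)
    for (r, c), sub in second_sub_images.items():
        first_image[(r - 1) * ph:r * ph, (c - 1) * pw:c * pw] = sub
    return first_image

# Example: nine 8*8 second sub-images identified (1,1)..(3,3) become a 24*24 first image.
second = {(r, c): np.full((8, 8), r * 10 + c) for r in range(1, 4) for c in range(1, 4)}
first_image = splice_sub_images(second)     # shape (24, 24)
```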
  • the terminal device can identify the key points in the first image. Since the changes in the eyes and mouth when the user changes facial expression are generally greater than the changes in other facial features, the terminal device may identify only the eye key points and the mouth key points.
  • the terminal device can detect the first image through a face 68-key-point algorithm. The face 68-key-point algorithm is a commonly used algorithm for identifying facial key points and can determine 68 key points in a facial image; through it, multiple key points can be marked in the first image, and these key points describe the user's eye contour, eyebrow contour, nose contour, mouth contour, and facial contour.
  • After the terminal device obtains the eye key points and the mouth key points, it can extract from the first image the local areas represented by the eye key points and the mouth key points, based on the local contours drawn by these key points, to obtain the eye area map and the mouth area map.
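  • As a non-limiting illustration, the following is a minimal sketch of this step using dlib's publicly available 68-landmark predictor as one concrete face 68-key-point algorithm; the landmark index ranges follow that model's convention and the crop margin is an illustrative assumption.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_eye_and_mouth_maps(first_image: np.ndarray, margin: int = 10):
    """Detect 68 facial key points, then crop the eye and mouth area maps."""
    gray = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)],
                   dtype=np.int32)

    def crop(indices):
        x, y, w, h = cv2.boundingRect(pts[indices])
        return first_image[max(0, y - margin):y + h + margin,
                           max(0, x - margin):x + w + margin]

    right_eye = crop(list(range(36, 42)) + list(range(17, 22)))  # eye plus eyebrow
    left_eye  = crop(list(range(42, 48)) + list(range(22, 27)))
    mouth     = crop(list(range(48, 68)))
    return right_eye, left_eye, mouth
```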
  • Using the second neural network model to extract local features of the eye area map and the mouth area map respectively to obtain their corresponding local feature vectors may include: sliding the preset sliding window in the second neural network model to multiple preset positions on the eye area map and the mouth area map according to the preset sliding distance, and extracting local features at each preset position to obtain multiple local feature vectors corresponding to the eye area map and the mouth area map respectively; the size of the preset sliding window is determined based on the width and height of the eye area map and the mouth area map respectively.
  • A preset sliding window is set in the second neural network model; the preset sliding window can be the convolution kernel of the second neural network model, and it can slide over the eye area map and the mouth area map respectively according to a certain preset sliding distance and extract local features at each position.
  • The preset sliding distance is the sliding step size of the preset sliding window. The larger the preset sliding distance, the smaller the calculation workload of the second neural network model; the smaller the preset sliding distance, the smaller the calculation error of the second neural network model and the higher its accuracy. Therefore, the preset sliding distance can be determined based on experience or on historical training data.
  • the preset window threshold can be determined based on experience or based on historical training data.
  • Performing local feature extraction at each preset position to obtain multiple local feature vectors respectively corresponding to the eye area map and the mouth area map may specifically include: at each preset position, cropping the eye area map and the mouth area map respectively to obtain multiple eye feature maps and multiple mouth feature maps; and performing local feature extraction on each eye feature map and each mouth feature map to obtain multiple eye feature vectors corresponding to the eye area map and multiple mouth feature vectors corresponding to the mouth area map.
  • The second neural network model extracts features from the eye area map and the mouth area map in the same way; taking the eye area map as an example, at each preset position the eye area map is cropped according to the size of the preset sliding window to obtain an eye feature map.
  • In this way, multiple eye feature maps can be obtained, and multiple mouth feature maps can be obtained in the same way. The terminal device then extracts local features from the multiple eye feature maps and multiple mouth feature maps to obtain multiple eye feature vectors corresponding to the eye area map and multiple mouth feature vectors corresponding to the mouth area map.
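  • The sliding-window cropping described above can be sketched as follows. Here the window size is simply taken as half of the region's width and height and the sliding distance as a quarter of them; these values are illustrative assumptions, since the disclosure leaves them to experience or historical training data.

```python
import numpy as np

def sliding_window_crops(area_map, win_frac=0.5, stride_frac=0.25):
    """Crop an eye or mouth area map with a preset sliding window at multiple preset positions."""
    h, w = area_map.shape[:2]
    win_h, win_w = max(1, int(h * win_frac)), max(1, int(w * win_frac))          # window size from region size
    step_h, step_w = max(1, int(h * stride_frac)), max(1, int(w * stride_frac))  # preset sliding distance
    crops = []
    for top in range(0, h - win_h + 1, step_h):
        for left in range(0, w - win_w + 1, step_w):
            crops.append(area_map[top:top + win_h, left:left + win_w])
    return crops  # e.g. multiple eye feature maps when area_map is the eye area map

eye_area_map = np.zeros((60, 120, 3), dtype=np.uint8)   # dummy eye area map
eye_feature_maps = sliding_window_crops(eye_area_map)
print(len(eye_feature_maps))                            # number of preset positions
```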
  • For steps 312 to 315, please refer to the detailed description of steps 206 to 209 in Embodiment 2; the details are not repeated in this disclosure.
  • After the terminal device determines the facial expression, it can determine, among multiple pre-stored rendering images, the target rendering image corresponding to that facial expression.
  • Rendering the face image through the target rendering image to obtain the target rendered facial image may include, but is not limited to, the following implementations:
  • Implementation one: a target rendering image 61 includes a right eye rendering image 611, a left eye rendering image 612 and a mouth rendering image 613, where the right eye rendering image 611 corresponds to the right eye area map b in FIG. 1C, the left eye rendering image 612 corresponds to the left eye area map c in FIG. 1C, and the mouth rendering image 613 corresponds to the mouth area map d in FIG. 1C.
  • The terminal device can replace the right eye rendering image 611 in the target rendering image 61 with the right eye area map b in FIG. 1C, replace the left eye rendering image 612 with the left eye area map c in FIG. 1C, and replace the mouth rendering image 613 in the target rendering image 61 with the mouth area map d in FIG. 1C, thereby obtaining the target rendered facial image 62.
  • Implementation two: perform facial recognition on the target rendering image, determine the facial-feature region maps in the target rendering image that correspond to the eye area map and the mouth area map in the face image, and replace the eye area map and the mouth area map in the face image with those facial-feature region maps in the target rendering image to obtain the target rendered facial image.
  • Replacing regions with the facial-feature region maps in the target rendering image can apply the effect of the target rendering image to the face image, enriching the application scenarios of facial expression recognition and adding an element of fun.
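  • Implementation one essentially amounts to pasting cropped region maps into the counterpart regions of the other image. A minimal sketch is shown below; the bounding box coordinates and the OpenCV resizing step are assumptions for illustration only.

```python
import cv2
import numpy as np

def paste_region(base_image, region_map, box):
    """Replace the area of base_image given by box = (x, y, w, h) with a region map."""
    x, y, w, h = box
    out = base_image.copy()
    out[y:y + h, x:x + w] = cv2.resize(region_map, (w, h))  # fit the region map to the box
    return out

# Illustrative only: paste the right eye area map b into the right eye rendering region of image 61
target_rendering_61 = np.zeros((200, 200, 3), dtype=np.uint8)
right_eye_area_b = np.zeros((20, 40, 3), dtype=np.uint8)
rendered = paste_region(target_rendering_61, right_eye_area_b, (110, 60, 50, 25))
```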
  • In this embodiment, the face image can be cropped into multiple first sub-images, each first sub-image can be processed separately, and the results can then be spliced into the first image, so that the face image is processed in finer detail.
  • Processing the multiple first sub-images separately makes the super-resolution processing and the noise reduction processing more refined. Through a two-branch architecture of global features and local features, the global expression classification probability and the local expression classification probability are computed respectively and then combined with their respective weights to determine the facial expression; combined with neural network models for extracting image features, this can effectively improve the resolution of the face image, the extraction efficiency and the robustness to the environment, enable more accurate extraction of global and local features and better expression of facial expressions, and thereby improve classification accuracy. It effectively reduces the respective impact of environmental factors on global features and local features, and improves the accuracy of facial expression detection.
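  • The weighted combination of the two branches can be summarized in a few lines. The label set and the weight value below (for example, a first weight of 0.6) are placeholders; the disclosure only requires that the two weights sum to 1.

```python
import numpy as np

EXPRESSIONS = ["neutral", "happy", "sad", "surprised", "angry"]  # illustrative label set

def fuse_probabilities(global_prob, local_prob, w1=0.6):
    """Target expression classification probability = w1 * global + (1 - w1) * local."""
    target_prob = w1 * np.asarray(global_prob) + (1.0 - w1) * np.asarray(local_prob)
    return target_prob, EXPRESSIONS[int(np.argmax(target_prob))]

global_p = [0.1, 0.6, 0.1, 0.1, 0.1]   # global expression classification probability
local_p = [0.2, 0.5, 0.1, 0.1, 0.1]    # local expression classification probability
target_p, expression = fuse_probabilities(global_p, local_p)
print(expression)  # "happy"
```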
  • A facial expression recognition device is provided, which includes:
  • an acquisition module 701, used to acquire a face image;
  • a feature extraction module 702, used to perform global feature extraction on the face image to obtain a global feature vector;
  • the processing module 703 is used to determine the global expression classification probability corresponding to the face image according to the global feature vector;
  • the feature extraction module 702 is also used to extract local features of the face image through the trained neural network model to obtain local feature vectors;
  • the processing module 703 is also used to determine the local expression classification probability corresponding to the face image according to the local feature vector;
  • the processing module 703 is also used to determine the target expression classification probability corresponding to the face image based on the global expression classification probability and the local expression classification probability, and determine the facial expression corresponding to the face image based on the target expression classification probability.
  • the processing module 703 is specifically configured to perform super-resolution processing and noise reduction processing on the face image through the first neural network model to obtain the first image;
  • the feature extraction module 702 is specifically configured to extract local features of the first image through the second neural network model to obtain local feature vectors.
  • the processing module 703 is specifically used to detect local key points on the first image to obtain eye key points and mouth key points;
  • the processing module 703 is specifically configured to extract regions from the first image according to the eye key points and the mouth key points to obtain the eye area map and the mouth area map;
  • the feature extraction module 702 is specifically configured to extract local features from the eye area map and the mouth area map respectively through the second neural network model to obtain local feature vectors corresponding to the eye area map and the mouth area map respectively.
  • the feature extraction module 702 is specifically configured to slide to multiple preset positions on the eye area map and the mouth area map according to the preset sliding distance through the preset sliding window in the second neural network model, and Perform local feature extraction at each preset position to obtain multiple local feature vectors corresponding to the eye area map and mouth area map;
  • the size of the preset sliding window is determined based on the width and height of the eye area map and the mouth area map respectively.
  • the processing module 703 is specifically configured to crop the eye area map and the mouth area map respectively at each preset position to obtain multiple eye feature maps and multiple mouth feature maps;
  • the feature extraction module 702 is specifically used to perform local feature extraction on each eye feature map and each mouth feature map to obtain multiple eye feature vectors corresponding to the eye area map and multiple mouth feature vectors corresponding to the mouth area map.
  • the processing module 703 is specifically configured to input the multiple eye feature vectors and the multiple mouth feature vectors respectively into a fully connected layer network model to obtain multiple expression classification probabilities corresponding to the multiple eye feature vectors and the multiple mouth feature vectors, where each expression classification probability corresponds to one eye feature vector and one mouth feature vector;
  • the processing module 703 is specifically used to average multiple expression classification probabilities to determine the local expression classification probability corresponding to the face image.
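  • As an illustration of this step only, the sketch below pairs each eye feature vector with a mouth feature vector, concatenates the pair before the fully connected layers (the pairing and concatenation, the layer sizes and the number of expression classes are assumptions), applies softmax to obtain one expression classification probability per pair, and averages the results to obtain the local expression classification probability.

```python
import torch
import torch.nn as nn

class PairClassifier(nn.Module):
    """Fully connected network over concatenated (eye, mouth) feature vectors."""
    def __init__(self, feat_dim=128, num_classes=5):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * feat_dim, 64), nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),
        )

    def forward(self, eye_vecs, mouth_vecs):
        logits = self.fc(torch.cat([eye_vecs, mouth_vecs], dim=1))  # one (eye, mouth) pair per row
        return torch.softmax(logits, dim=1)                         # expression classification probabilities

clf = PairClassifier()
eye_vecs = torch.rand(6, 128)     # multiple eye feature vectors
mouth_vecs = torch.rand(6, 128)   # multiple mouth feature vectors (paired by position)
pair_probs = clf(eye_vecs, mouth_vecs)        # one probability vector per pair
local_probability = pair_probs.mean(dim=0)    # average -> local expression classification probability
```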
  • the processing module 703 is specifically configured to perform super-resolution processing and noise reduction processing on the plurality of first sub-images through the first neural network model to obtain a plurality of second sub-images, where the plurality of second sub-images correspond one-to-one to the plurality of first sub-images;
  • the processing module 703 is specifically used to splice multiple second sub-images to obtain a first image.
  • the processing module 703 is specifically used to splice multiple second sub-images respectively according to the position identifier to obtain the first image;
  • the position identifier of the target first sub-image in the face image is the same as the position identifier of the target second sub-image in the first image, where the target first sub-image is any one of the plurality of first sub-images, and the target second sub-image is the image corresponding to the target first sub-image among the plurality of second sub-images.
  • the acquisition module 701 is specifically used to obtain the first weight corresponding to the global expression classification probability and the second weight corresponding to the local expression classification probability, where the sum of the first weight and the second weight is 1;
  • the processing module 703 is also used to output the target rendered facial image.
  • each module can implement the steps of the facial expression recognition method provided in the above method embodiment, and can achieve the same technical effect. To avoid duplication, the details will not be described here.
  • the present disclosure also provides a terminal device, which may include:
  • Memory 801 storing executable program code
  • processor 802 coupled to memory 801;
  • the present disclosure provides a computer-readable storage medium that stores a computer program, wherein the computer program causes the computer to perform some or all steps of the method in each of the above method embodiments.
  • the present disclosure also provides a computer program product, wherein when the computer program product is run on a computer, the computer is caused to perform part or all of the steps of the method in each of the above method embodiments.
  • The present disclosure also provides an application publishing platform, wherein the application publishing platform is used to publish a computer program product, and when the computer program product is run on a computer, the computer is caused to execute some or all of the steps of the methods in the above method embodiments.
  • The size of the sequence numbers of the above processes does not necessarily indicate their execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
  • the units described above as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-accessible memory.
  • The technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to cause a computer device (which can be a personal computer, a server or a network device, etc., and specifically a processor in the computer device) to execute some or all of the steps of the above methods in various embodiments of the present disclosure.
  • The program can be stored in a computer-readable storage medium, and the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
  • The present disclosure discloses a facial expression recognition method that includes: obtaining a face image; performing global feature extraction on the face image to obtain a global feature vector, and determining the global expression classification probability corresponding to the face image based on the global feature vector; extracting local features of the face image through a trained neural network model to obtain a local feature vector, and determining the local expression classification probability corresponding to the face image based on the local feature vector; and determining the target expression classification probability corresponding to the face image based on the global expression classification probability and the local expression classification probability, and determining the facial expression corresponding to the face image based on the target expression classification probability.
  • This can effectively reduce the respective impact of environmental factors on global features and local features, improve the accuracy of facial expression detection, and has strong industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses a facial expression recognition method, a terminal device and a storage medium. The method comprises: acquiring a face image; performing global feature extraction on the face image to obtain a global feature vector and, according to the global feature vector, determining a global expression classification probability corresponding to the face image; extracting a local feature of the face image by means of a trained neural network model to obtain a local feature vector and, according to the local feature vector, determining a local expression classification probability corresponding to the face image; and, according to the global expression classification probability and the local expression classification probability, determining a target expression classification probability corresponding to the face image and, according to the target expression classification probability, determining a facial expression corresponding to the face image.
PCT/CN2022/140931 2022-06-27 2022-12-22 Procédé de reconnaissance d'expression faciale, dispositif terminal et support de stockage WO2024001095A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210738438.4 2022-06-27
CN202210738438.4A CN115035581A (zh) 2022-06-27 2022-06-27 面部表情识别方法、终端设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024001095A1 true WO2024001095A1 (fr) 2024-01-04

Family

ID=83126260

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140931 WO2024001095A1 (fr) 2022-06-27 2022-12-22 Procédé de reconnaissance d'expression faciale, dispositif terminal et support de stockage

Country Status (2)

Country Link
CN (1) CN115035581A (fr)
WO (1) WO2024001095A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035581A (zh) * 2022-06-27 2022-09-09 闻泰通讯股份有限公司 面部表情识别方法、终端设备及存储介质
CN116128734B (zh) * 2023-04-17 2023-06-23 湖南大学 一种基于深度学习的图像拼接方法、装置、设备和介质
CN117315749A (zh) * 2023-09-25 2023-12-29 惠州市沃生照明有限公司 用于台灯的灯光智能调控方法及系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629321A (zh) * 2012-03-29 2012-08-08 天津理工大学 基于证据理论的人脸表情识别方法
CN105095827A (zh) * 2014-04-18 2015-11-25 汉王科技股份有限公司 人脸表情识别装置和方法
CN109934173A (zh) * 2019-03-14 2019-06-25 腾讯科技(深圳)有限公司 表情识别方法、装置及电子设备
CN110580461A (zh) * 2019-08-29 2019-12-17 桂林电子科技大学 一种结合多级卷积特征金字塔的人脸表情识别算法
US20200110927A1 (en) * 2018-10-09 2020-04-09 Irene Rogan Shaffer Method and apparatus to accurately interpret facial expressions in american sign language
CN111144348A (zh) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 图像处理方法、装置、电子设备及存储介质
CN115035581A (zh) * 2022-06-27 2022-09-09 闻泰通讯股份有限公司 面部表情识别方法、终端设备及存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102629321A (zh) * 2012-03-29 2012-08-08 天津理工大学 基于证据理论的人脸表情识别方法
CN105095827A (zh) * 2014-04-18 2015-11-25 汉王科技股份有限公司 人脸表情识别装置和方法
US20200110927A1 (en) * 2018-10-09 2020-04-09 Irene Rogan Shaffer Method and apparatus to accurately interpret facial expressions in american sign language
CN109934173A (zh) * 2019-03-14 2019-06-25 腾讯科技(深圳)有限公司 表情识别方法、装置及电子设备
CN110580461A (zh) * 2019-08-29 2019-12-17 桂林电子科技大学 一种结合多级卷积特征金字塔的人脸表情识别算法
CN111144348A (zh) * 2019-12-30 2020-05-12 腾讯科技(深圳)有限公司 图像处理方法、装置、电子设备及存储介质
CN115035581A (zh) * 2022-06-27 2022-09-09 闻泰通讯股份有限公司 面部表情识别方法、终端设备及存储介质

Also Published As

Publication number Publication date
CN115035581A (zh) 2022-09-09

Similar Documents

Publication Publication Date Title
US11182591B2 (en) Methods and apparatuses for detecting face, and electronic devices
CN109359538B (zh) 卷积神经网络的训练方法、手势识别方法、装置及设备
CN110532984B (zh) 关键点检测方法、手势识别方法、装置及系统
JP6636154B2 (ja) 顔画像処理方法および装置、ならびに記憶媒体
WO2024001095A1 (fr) Procédé de reconnaissance d'expression faciale, dispositif terminal et support de stockage
US20210174072A1 (en) Microexpression-based image recognition method and apparatus, and related device
US9547908B1 (en) Feature mask determination for images
WO2020078119A1 (fr) Procédé, dispositif et système de simulation d'utilisateur portant des vêtements et des accessoires
CN102332095B (zh) 一种人脸运动跟踪方法和系统以及一种增强现实方法
EP3338217A1 (fr) Détection et masquage de caractéristique dans des images sur la base de distributions de couleurs
KR101141643B1 (ko) 캐리커쳐 생성 기능을 갖는 이동통신 단말기 및 이를 이용한 생성 방법
CN112800903B (zh) 一种基于时空图卷积神经网络的动态表情识别方法及系统
JP6207210B2 (ja) 情報処理装置およびその方法
CN109685713B (zh) 化妆模拟控制方法、装置、计算机设备及存储介质
JP2024500896A (ja) 3d頭部変形モデルを生成するための方法、システム及び方法
WO2019142127A1 (fr) Procédé et système de création d'émoticônes d'expression multiples
WO2024021742A9 (fr) Procédé d'estimation de point de fixation et dispositif associé
CN112836625A (zh) 人脸活体检测方法、装置、电子设备
JP2024503794A (ja) 2次元(2d)顔画像から色を抽出するための方法、システム及びコンピュータプログラム
CN111507149B (zh) 基于表情识别的交互方法、装置和设备
US20160140748A1 (en) Automated animation for presentation of images
Chien et al. Detecting nonexistent pedestrians
KR101189043B1 (ko) 영상통화 서비스 및 그 제공방법, 이를 위한 영상통화서비스 제공서버 및 제공단말기
CN112132750A (zh) 一种视频处理方法与装置
US20230093827A1 (en) Image processing framework for performing object depth estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22949171

Country of ref document: EP

Kind code of ref document: A1