WO2023065503A1 - Facial expression classification method and electronic device - Google Patents

Facial expression classification method and electronic device

Info

Publication number
WO2023065503A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification
feature
layer
expression
feature extraction
Prior art date
Application number
PCT/CN2021/138099
Other languages
English (en)
French (fr)
Inventor
叶欣婷
谢耀钦
胡嘉尼
梁晓坤
秦文健
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2023065503A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • The present application relates to the field of image processing, and in particular to a facial expression classification method and an electronic device.
  • Facial expression classification is a research hotspot in the field of image processing.
  • The classification of facial pain expressions is one of the research hotspots in the medical field.
  • Convolutional neural networks are usually used to classify the pain degree of the facial pain expressions of newborns, critically ill patients, and patients with aphasia.
  • However, existing convolutional neural networks extract pain features from facial pain expressions poorly, which lowers the accuracy of pain degree classification performed on the basis of those extracted features.
  • The present application provides a facial expression classification method and an electronic device, which can solve the problem of low facial expression classification accuracy.
  • In a first aspect, a method for classifying facial expressions is provided, including: acquiring a target image, the target image including the facial expression of a target object; and inputting the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression.
  • The expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module. Local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; the second feature extraction network is used to extract a global feature of the target image to obtain a second feature; and the fusion classification module performs feature fusion and classification on the first feature and the second feature to obtain the classification result.
  • The above method can be executed by a chip on an electronic device.
  • This application adopts an expression classification model formed by a first feature extraction network and a second feature extraction network working in parallel to perform local feature extraction and global feature extraction on the facial expression of the target object.
  • As a result, the extraction rate of emotional features of facial expressions is improved, and the accuracy of classifying the degree of emotional expression from the extracted features is improved accordingly.
  • the first feature extraction network is VGG16
  • the input layer of the VGG16 includes: a local attention layer
  • The local attention layer is used to perform an information attenuation operation on areas of the target image other than the area where the facial expression is located.
  • By attenuating irrelevant information outside the area where the facial expression is located, the local attention layer inversely enhances the important relevant information of the area where the facial expression is located in the target image, which helps improve the accuracy with which the expression classification model classifies based on the extracted emotional features.
  • The input layer of the VGG16 includes a first convolutional layer, a first batch normalization layer, a first activation layer, the local attention layer and a first max pooling layer connected in sequence. The local attention layer performs the information attenuation operation on areas other than the area where the facial expression is located as follows: after receiving the output information of the first activation layer, it determines a two-dimensional image mask from that output information and multiplies the two-dimensional image mask by the output information of the first activation layer to obtain the output information of the local attention layer; the output information of the local attention layer is then fed to the network layer connected after the local attention layer for local feature extraction.
  • Determining the two-dimensional image mask from the output information of the first activation layer includes: calculating the average activation value of the feature map of each channel in the output information of the first activation layer to obtain N average activation values; determining a first channel according to the N average activation values, the first channel being the channel corresponding to the largest of the N average activation values; and setting a mask value for each pixel of the first channel, where the mask value corresponding to a first pixel is set to 1 when that pixel is greater than or equal to the maximum average activation value and set to 0 when it is smaller. The first pixel is any pixel in the first channel, and N is a positive integer.
  • the second feature extraction network is ResNet18.
  • The fusion classification module includes an orthogonal module and a classification module. The orthogonal module performs an orthogonal operation on the first feature of the region where the facial expression is located and the second feature, using a preset orthogonal function, to obtain an orthogonal result; the classification module performs feature aggregation and classification on the orthogonal result, using a preset classification function, to obtain the classification result.
  • the target image is a pain expression image.
  • a facial expression classification device including a module for performing any one of the methods in the first aspect.
  • an electronic device including a module for performing any one of the methods in the first aspect.
  • A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the processor executes any one of the methods of the first aspect.
  • Fig. 1 is a schematic diagram of the implementation steps of the facial expression classification method in an embodiment of the present invention.
  • Fig. 2 is a schematic structural diagram of the expression classification model in an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a partial structure of a residual network in an embodiment of the present invention.
  • Fig. 4 is a schematic diagram of the result of classifying the pain degree of pain expression images by the expression classification model in the embodiment of the present invention.
  • Fig. 5 is a schematic diagram of specific process steps of a method for classifying facial expressions in an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a classification device for facial expressions in an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • References to "one embodiment" or "some embodiments" and the like in the specification of the present application mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Accordingly, the phrases "in one embodiment", "in some embodiments", "in other embodiments" and so on appearing in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
  • the terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless specifically stated otherwise.
  • facial expression classification has become a research hotspot in the field of image processing.
  • convolutional neural networks are usually used to classify the facial pain expressions of newborns, critically ill patients, and patients with aphasia.
  • the existing convolutional neural network loses a lot of key feature information when extracting pain features from facial pain expressions, resulting in a low accuracy rate of classification results for facial pain expressions.
  • This application uses a dual-parallel expression classification model to classify facial expressions: the first feature extraction network extracts the first feature of the area where the facial expression is located in the expression image, and the second feature extraction network extracts the global features of the facial expression image to make up for feature information not captured by the first feature extraction network. The dual-parallel expression classification model thus improves the extraction rate of the emotional features of facial expressions and solves the problem of low facial expression classification accuracy.
  • The application proposes a method for classifying facial expressions. As shown in Figure 1, the method is executed by an electronic device and includes the following steps:
  • the electronic device acquires a target image (that is, a facial expression image of a human face), wherein the target object includes a newborn, aphasia patient, and normal person; the target image includes: a happy expression image, a fear expression image, an angry expression image and pain expression images.
  • This application only uses the pain degree classification of pain expression images as an example to illustrate the pain degree classification method for facial pain expressions.
  • the classification methods for other types of expression images are similar and will not be repeated here.
  • The electronic device can obtain a pain expression dataset of human faces from the UNBC-McMaster Shoulder Pain Expression Archive Database (referred to as the UNBC database).
  • the data set contains shoulder pain video data of 25 volunteers, and the shoulder pain video data has a total of 200 video sequence data; the 200 video sequence data contains a total of 48198 frames of pain expression images.
  • The 48198 frames of pain expression images are all stored in PNG format, and the resolution of each frame is about 352×240 pixels. In practical applications, each frame can be cropped to obtain target data with a dimension of 3×200×200 (that is, image data of size 200×200 with 3 channels).
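  • For illustration only, a minimal preprocessing sketch using torchvision (the PyTorch ecosystem named later in this document); the patent only states that each ~352×240 frame is cropped to 3×200×200, so the center-crop choice and the file name below are assumptions.

```python
from PIL import Image
import torchvision.transforms as T

# Assumed preprocessing: crop each ~352x240 frame to 3x200x200.
# A center crop is a placeholder choice; the exact crop is not specified in the text.
preprocess = T.Compose([
    T.CenterCrop(200),   # 200x200 spatial crop (assumption)
    T.ToTensor(),        # HWC uint8 -> CHW float in [0, 1], i.e. 3x200x200
])

frame = Image.open("pain_frame_0001.png").convert("RGB")  # hypothetical file name
x = preprocess(frame)
print(x.shape)  # torch.Size([3, 200, 200])
```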
  • The existing pain expression data set divides the pain degree of each frame according to the PSPI standard into 16 levels, where a higher level indicates more severe pain. However, the amount of data at the different pain levels is unevenly distributed.
  • The pain degrees of the existing pain expression data set are therefore re-clustered and reduced in number: the existing levels 0, 1, 2 and 3 are retained; the pain expression data with original levels 4 and 5 are merged into a new level, namely level 4; and the pain expression data with original level 6 and above are merged into another new level, namely level 5.
  • The pain degree of the existing pain expression dataset is thus reclassified into 6 levels.
  • Even after re-division, the number of pain expression images at some pain levels is much larger than at others. For example, there are 31200 frames at level 0, an extremely large amount, so 1/10 of the level-0 data can be randomly extracted, i.e. 3120 (31200 divided by 10) frames of level-0 pain expression images are used in practical applications.
  • Because the pain expression images within each re-divided pain level are stored in the order of the different volunteers, the storage order of the images of the different volunteers within each pain level is randomly shuffled to avoid extracting images of only some volunteers; the pain expression image data of each pain level is then divided into a training dataset and a test dataset.
  • the training data set and the test data set are used to train the expression classification model and test the expression classification model respectively.
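  • A small Python sketch of the re-levelling and sampling scheme described above: PSPI levels 0-3 are kept, levels 4-5 merge into level 4, levels 6 and above become level 5, level-0 frames are downsampled to one tenth, and frames are shuffled before the train/test split. The function names, the 80/20 split ratio and the seed are assumptions, not taken from the text.

```python
import random

def remap_pspi(level: int) -> int:
    """Map the original 16 PSPI levels to the 6 levels used in this application."""
    if level <= 3:
        return level          # levels 0-3 are kept as-is
    if level in (4, 5):
        return 4              # original levels 4 and 5 merge into new level 4
    return 5                  # original levels 6 and above merge into new level 5

def build_dataset(samples, keep_ratio_level0=0.1, train_ratio=0.8, seed=0):
    """samples: list of (image_path, pspi_level). Returns (train, test) lists."""
    rng = random.Random(seed)
    relabelled = [(path, remap_pspi(lvl)) for path, lvl in samples]
    # Downsample the over-represented level 0 (e.g. 31200 -> 3120 frames).
    level0 = [s for s in relabelled if s[1] == 0]
    others = [s for s in relabelled if s[1] != 0]
    level0 = rng.sample(level0, int(len(level0) * keep_ratio_level0))
    data = level0 + others
    rng.shuffle(data)          # shuffle so no split contains only a few volunteers
    cut = int(len(data) * train_ratio)   # train/test ratio is an assumption
    return data[:cut], data[cut:]
```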
  • The expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module. Local feature extraction is performed on the target image by the first feature extraction network to obtain the first feature of the area where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain the second feature; and the fusion classification module fuses and classifies the first feature and the second feature to obtain the classification result.
  • The classification result is the result of classifying the emotion degree of the input target image by the expression classification model, and it indicates the degree of emotional expression of the facial expression, where the emotion includes pain, joy, fear and anger.
  • Correspondingly, the classification results include degree of pain, degree of fear, degree of anger and degree of pleasure. For example, the pain degree of a pain expression image can be divided into 6 grades, namely grade 0, grade 1, grade 2, grade 3, grade 4 and grade 5.
  • For example, the expression classification model extracts pain features from an input pain expression image (i.e. the target image) and outputs a pain degree classification result; as another example, the expression classification model extracts anger features from an input angry expression image and outputs an anger degree classification result.
  • the expression classification model will output the pain degree corresponding to the image of pain expression as level 1.
  • the expression classification model outputs the fear degree corresponding to the fear expression image as level 2.
  • The above expression classification model includes a first feature extraction network 202, a second feature extraction network 203 and a fusion classification module 204.
  • The first feature extraction network 202 is used to extract the locally important features (i.e. the emotional features of the region where the facial expression is located in the target image) to obtain the first feature.
  • The second feature extraction network 203 is used to supplement this with the global features of the target image to obtain the second feature; the fusion and classification module 204 then performs feature fusion and classification on the first feature of the area where the facial expression is located and the second feature to obtain a classification result.
  • Specifically, the pain expression image 201 is input into the first feature extraction network 202, which extracts the important feature information of the area where the facial pain expression is located (i.e. the local feature extraction process) to obtain the first feature of that area; meanwhile, the pain expression image 201 is input into the second feature extraction network 203, which extracts the global feature information of the pain expression image 201 (i.e. the global feature extraction process) to obtain the second feature. The fusion classification module 204 then performs feature fusion on the first feature and the second feature and classifies the pain degree of the fusion result, obtaining the pain degree corresponding to the pain expression image 201 (that is, the classification result).
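  • The dual-parallel structure described above can be summarised in a hedged PyTorch sketch; LocalBranch, GlobalBranch and FusionClassifier are placeholders for the VGG16-based network 202, the ResNet18-based network 203 and the fusion classification module 204 detailed below, and the 2048/512 feature dimensions are the figures quoted further on.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Sketch of the dual-parallel expression classification model (Fig. 2)."""

    def __init__(self, local_branch: nn.Module, global_branch: nn.Module,
                 fusion_classifier: nn.Module):
        super().__init__()
        self.local_branch = local_branch            # first feature extraction network 202
        self.global_branch = global_branch          # second feature extraction network 203
        self.fusion_classifier = fusion_classifier  # fusion classification module 204

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Both branches see the same 200x200x3 target image in parallel.
        local_feat = self.local_branch(image)    # e.g. 1x2048 first feature
        global_feat = self.global_branch(image)  # e.g. 1x512 second feature
        return self.fusion_classifier(local_feat, global_feat)  # classification result
```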
  • the first feature extraction network is VGG16
  • the input layer of the VGG16 includes: a local attention layer
  • the local attention layer is used to perform an information attenuation operation on areas other than the area where the facial expression is located in the target image.
  • The local attention layer attenuates irrelevant information outside the area where the pain expression of the face is located in the pain expression image 201, removing non-important information unrelated to the facial pain expression and thereby inversely enhancing the important relevant information of the area where the facial pain expression is located in the pain expression image 201 (i.e. the target image).
  • The input layer of the above VGG16 includes a first convolutional layer, a first batch normalization layer, a first activation layer, the above local attention layer and a first max pooling layer connected in sequence.
  • The local attention layer performs the information attenuation operation on areas other than the area where the facial expression is located as follows: after receiving the output information of the first activation layer, it determines a two-dimensional image mask from that output information and multiplies the mask by the output information of the first activation layer to obtain the output information of the local attention layer; this output is fed to the network layer connected after the local attention layer for local feature extraction.
  • The convolution kernel size of the first convolutional layer is 3×3, the size of the first batch normalization layer is 64, and the kernel size of the first max pooling layer is 2×2.
  • The pain expression image 201, with a size of 200×200 and 3 channels (i.e. 200×200×3), is input to the first convolutional layer, which outputs a first convolution result of size 200×200 with 64 channels (i.e. 200×200×64). The first batch normalization layer normalizes the 200×200×64 first convolution result and outputs a 200×200×64 first batch normalization result; the first activation layer activates this result and outputs a 200×200×64 first activation result.
  • After the local attention layer receives the 200×200×64 first activation result (that is, the output information of the first activation layer), it generates a two-dimensional image mask from the first activation result and multiplies the mask by the first activation result to obtain the output information of the local attention layer, which is fed to the network layer connected after the local attention layer for local feature extraction.
  • The first max pooling layer performs a max pooling operation on the output information of the local attention layer and outputs a feature map with a dimension of 100×100×64.
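  • A minimal sketch of the input layer described above; the local attention layer is replaced here by nn.Identity() as a placeholder (its mask and attenuation logic is sketched separately after the mask description below), and padding=1 is assumed so that the 3×3 convolution preserves the 200×200 spatial size.

```python
import torch
import torch.nn as nn

# Input layer of the VGG16 variant: conv -> BN -> ReLU -> local attention -> max pool.
input_layer = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # 200x200x3 -> 200x200x64
    nn.BatchNorm2d(64),                          # first batch normalization layer (size 64)
    nn.ReLU(inplace=True),
    nn.Identity(),                               # placeholder for the local attention layer
    nn.MaxPool2d(kernel_size=2),                 # 200x200x64 -> 100x100x64
)

x = torch.randn(1, 3, 200, 200)
print(input_layer(x).shape)  # torch.Size([1, 64, 100, 100])
```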
  • Determining the two-dimensional image mask from the output information of the first activation layer includes: calculating the average activation value of the feature map of each channel in the output information of the first activation layer to obtain N average activation values; determining the first channel, i.e. the channel corresponding to the largest of the N average activation values; and setting a mask value for each pixel of the first channel, where the mask value corresponding to a pixel is set to 1 when the pixel is greater than or equal to the maximum average activation value and set to 0 when it is smaller. The first pixel may be any pixel in the first channel, and N is a positive integer.
  • For example, suppose the maximum average activation value is 0.6 and the pixels A, B, C and D of the first channel are 0.71, 0.52, 0.64 and 0.42 respectively.
  • Then the mask value corresponding to pixel A (0.71, greater than 0.6) is set to 1; the mask value for pixel B (0.52, less than 0.6) is set to 0; the mask value for pixel C (0.64, greater than 0.6) is set to 1; and the mask value for pixel D (0.42, less than 0.6) is set to 0. Every pixel of the first channel is masked in this manner, generating a 200×200 two-dimensional image mask containing 200×200 mask values. This mask serves as the reference template (i.e. reference basis) for the subsequent pixel value attenuation operation that the local attention layer performs on the 200×200×64 first activation result.
  • The local attention layer performs the pixel value attenuation operation on the 200×200×64 first activation result according to the 200×200 two-dimensional image mask to obtain a 200×200×64 first feature map.
  • A random pixel value attenuation factor r is set to a value in (0, 1) through a random mechanism. When the mask value at a position of the reference two-dimensional image mask is 0, the pixel value at the corresponding position in the 200×200×64 first activation result is multiplied by r (i.e. the attenuation operation), producing the attenuated first activation result (i.e. a 200×200×64 second feature map); when the mask value at a position is 1, the pixel value at the corresponding position remains unchanged.
  • The local attention layer traverses every pixel value in the 200×200×64 first activation result according to the two-dimensional image mask, that is, the attenuation of non-important feature information is applied to each of the 200×200 pixel values of each of the 64 layers, so as to extract the 64×200×200 important feature information of the area where the facial pain expression is located.
  • By performing this pixel value attenuation on the 200×200×64 first activation result according to the 200×200 two-dimensional image mask, the local attention layer not only weakens feature information that is weakly related to the pain features, but also inversely enhances feature information that is strongly correlated with the pain features (that is, it extracts the important feature information of the area where the painful expression of the face is located).
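  • Under the assumptions stated here, the mask generation and attenuation described above can be written roughly as follows: the single strongest channel supplies the mask and a random factor r attenuates masked-out pixels, as in the text, while broadcasting the 200×200 mask over all 64 channels is one straightforward reading of "traverses each pixel value".

```python
import torch
import torch.nn as nn

class LocalAttention(nn.Module):
    """Sketch of the local attention layer: attenuate pixels outside the expression region."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: output of the first activation layer, shape (B, 64, 200, 200).
        avg = x.mean(dim=(2, 3))                     # N average activation values per sample
        best = avg.argmax(dim=1)                     # channel with the largest average activation
        idx = torch.arange(x.size(0), device=x.device)
        first_channel = x[idx, best]                 # (B, 200, 200)
        threshold = avg[idx, best].view(-1, 1, 1)    # maximum average activation value
        mask = (first_channel >= threshold).float()  # 1 where pixel >= threshold, else 0
        r = torch.rand(1, device=x.device)           # random attenuation factor, drawn per pass
        scale = mask + (1.0 - mask) * r              # keep masked-in pixels, attenuate the rest
        return x * scale.unsqueeze(1)                # broadcast the 2-D mask over all channels

y = LocalAttention()(torch.randn(2, 64, 200, 200))
print(y.shape)  # torch.Size([2, 64, 200, 200])
```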
  • The above VGG16 also includes a convolution processing module, which includes two convolutional activation layers and multiple double convolutional activation layers: the first of the two convolutional activation layers connects to the first double convolutional activation layer, and the last double convolutional activation layer connects to the second of the two convolutional activation layers.
  • Each of the two convolutional activation layers includes, connected in sequence, a second convolutional layer, a second batch normalization layer, a second ReLU activation layer and a second max pooling layer, where the convolution kernel size of the second convolutional layer is 3×3 and the kernel size of the second max pooling layer is 2×2.
  • Each of the multiple double convolutional activation layers includes, connected in sequence, a third convolutional layer, a third batch normalization layer, a third ReLU activation layer, a fourth convolutional layer, a fourth batch normalization layer, a fourth ReLU activation layer and a third max pooling layer, where the convolution kernel sizes of the third and fourth convolutional layers are 3×3 and the kernel size of the third max pooling layer is 2×2.
  • The 100×100×64 feature map output by the first max pooling layer passes through the second convolutional layer, the second batch normalization layer (size 128), the second ReLU activation layer and the second max pooling layer of the first convolutional activation layer, and the final output is a feature map of 50×50×128.
  • The convolution processing module includes 3 double convolutional activation layers connected in sequence: a first, a second and a third double convolutional activation layer. In the first double convolutional activation layer, the third convolutional layer convolves the 50×50×128 feature map output by the second max pooling layer to obtain a 50×50×256 feature map; the third batch normalization layer (size 256) normalizes it to a 50×50×256 third batch normalization result; the third ReLU activation layer activates it to a 50×50×256 third activation result; and the fourth convolutional layer, fourth batch normalization layer, fourth ReLU activation layer and third max pooling layer then process the third activation result, with a final output feature map of 25×25×256.
  • The 25×25×256 feature map output by the first double convolutional activation layer passes through the third convolutional layer, the third batch normalization layer (size 512), the third ReLU activation layer, the fourth convolutional layer, the fourth batch normalization layer (size 512), the fourth ReLU activation layer and the third max pooling layer of the second double convolutional activation layer, and the final output is a feature map of 12×12×512.
  • The 12×12×512 feature map output by the second double convolutional activation layer passes through the corresponding layers of the third double convolutional activation layer (batch normalization size 512), and the output is a feature map of 6×6×512.
  • The 6×6×512 feature map output by the third double convolutional activation layer passes through the second convolutional layer, the second batch normalization layer (size 512), the second ReLU activation layer and the second max pooling layer of the second convolutional activation layer, and the final output is a feature vector of dimension 1×2048 (that is, the first feature of the area where the pain expression is located).
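  • As a sketch only, the two kinds of blocks described above can be expressed as small factory functions; the channel progression (64→128→256→512→512) follows the dimensions quoted in the text, while padding=1 on the 3×3 convolutions and the adaptive pooling step that reduces the final map to the quoted 1×2048 vector are assumptions.

```python
import torch
import torch.nn as nn

def conv_act_layer(c_in: int, c_out: int) -> nn.Sequential:
    """'Convolutional activation layer': conv 3x3 -> BN -> ReLU -> max pool 2x2."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2),
    )

def double_conv_act_layer(c_in: int, c_out: int) -> nn.Sequential:
    """'Double convolutional activation layer': two conv/BN/ReLU groups then max pool 2x2."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2),
    )

conv_module = nn.Sequential(
    conv_act_layer(64, 128),          # 100x100x64 -> 50x50x128
    double_conv_act_layer(128, 256),  # -> 25x25x256
    double_conv_act_layer(256, 512),  # -> 12x12x512
    double_conv_act_layer(512, 512),  # -> 6x6x512
    conv_act_layer(512, 512),         # second convolutional activation layer
    nn.AdaptiveMaxPool2d(2),          # assumption: pool to 2x2x512 so flattening gives 1x2048
    nn.Flatten(),                     # -> 1x2048 first feature
)

print(conv_module(torch.randn(1, 64, 100, 100)).shape)  # torch.Size([1, 2048])
```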
  • The above second feature extraction network 203 may be, for example, ResNet18 or ResNet50.
  • The above ResNet18 includes an input module, a residual network and an output module. The input module performs convolution and activation processing on the input 200×200×3 pain expression image 201 to obtain first output information; the residual network performs global feature extraction (supplementary extraction of global features) on the first output information to obtain second output information; and the output module performs average pooling on the second output information to obtain the second feature.
  • The input module includes a fifth convolutional layer, a fifth batch normalization layer, a fifth ReLU activation layer and a fourth max pooling layer connected in sequence, where the convolution kernel size of the fifth convolutional layer is 7×7, the size of the fifth batch normalization layer is 64, and the kernel size of the fourth max pooling layer is 3×3.
  • The 200×200×3 pain expression image 201 passes in sequence through the fifth convolutional layer (output dimension 100×100×64), the fifth batch normalization layer (output dimension 100×100×64), the fifth ReLU activation layer (output dimension 100×100×64) and the fourth max pooling layer, and the final output is a feature map of 50×50×64.
  • The above residual network includes a direct mapping subnetwork and a plurality of residual subnetworks, where the direct mapping subnetwork connects to the first of the residual subnetworks and the last residual subnetwork connects to the output module.
  • The direct mapping subnetwork performs convolution processing on the 50×50×64 feature map output by the fourth max pooling layer to obtain its output information, whose dimension is 50×50×64.
  • The direct mapping subnetwork includes two cascaded residual modules 301 (i.e. weight layer 301) and 302 (i.e. weight layer 302) and a first direct mapping branch 303. Residual module 301 includes, connected in sequence, a sixth convolutional layer, a sixth batch normalization layer, a sixth ReLU activation layer, a seventh convolutional layer and a seventh batch normalization layer; the convolution kernel sizes of the sixth and seventh convolutional layers are both 3×3, and the sizes of the sixth and seventh batch normalization layers are both 64.
  • Residual module 302 includes, connected in sequence, an eighth convolutional layer, an eighth batch normalization layer, a seventh ReLU activation layer, a ninth convolutional layer and a ninth batch normalization layer; the convolution kernel sizes of the eighth and ninth convolutional layers are both 3×3, and the sizes of the eighth and ninth batch normalization layers are both 64.
  • Residual module 301 processes the 50×50×64 feature map output by the fourth max pooling layer through the sixth convolutional layer, the sixth batch normalization layer, the sixth ReLU activation layer, the seventh convolutional layer and the seventh batch normalization layer, obtaining output information with a dimension of 50×50×64.
  • The 50×50×64 output of module 301 is then processed by module 302 sequentially through the eighth convolutional layer, the eighth batch normalization layer, the seventh ReLU activation layer, the ninth convolutional layer and the ninth batch normalization layer (convolution and normalization processing), obtaining the output information of residual module 302, whose dimension is 50×50×64.
  • The first direct mapping branch 303 directly maps the 50×50×64 feature map output by the fourth max pooling layer to obtain a first mapping result, which is still the 50×50×64 feature map output by the fourth max pooling layer.
  • The output information of residual module 302 is spliced with the first mapping result to obtain a first splicing result with a dimension of 50×50×64.
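  • A hedged sketch of residual modules 301/302 with the first direct mapping branch 303: two conv/BN pairs with a ReLU in between in each module, and the input added back to the output. Because the 50×50×64 dimension is unchanged, "splicing" is read here as element-wise addition, and the post-addition ReLU placement is an assumption.

```python
import torch
import torch.nn as nn

class DirectMappingSubnetwork(nn.Module):
    """Sketch of residual modules 301/302 plus the first direct mapping branch 303."""

    def __init__(self, channels: int = 64):
        super().__init__()
        def residual_module():
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.BatchNorm2d(channels),
            )
        self.module_301 = residual_module()
        self.module_302 = residual_module()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.module_302(self.module_301(x))
        # "Splicing" with the direct mapping branch read as element-wise addition,
        # since the output keeps the 50x50x64 dimension of the input.
        return torch.relu(out + x)   # post-addition ReLU is an assumption

print(DirectMappingSubnetwork()(torch.randn(1, 64, 50, 50)).shape)  # [1, 64, 50, 50]
```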
  • Each of the above residual subnetworks includes a residual module 304 (i.e. weight layer 304), a residual module 305 (i.e. weight layer 305) and a residual branch 306. Residual module 304 includes, connected in sequence, a tenth convolutional layer, a tenth batch normalization layer, a seventh ReLU activation layer, an eleventh convolutional layer and an eleventh batch normalization layer; the convolution kernel sizes of the tenth and eleventh convolutional layers are 3×3.
  • Residual module 305 includes, connected in sequence, a twelfth convolutional layer, a twelfth batch normalization layer, an eighth ReLU activation layer, a thirteenth convolutional layer and a thirteenth batch normalization layer; the convolution kernel sizes of the twelfth and thirteenth convolutional layers are 3×3.
  • The first of the 3 residual subnetworks processes the 50×50×64 first splicing result as follows: the first splicing result passes in turn through residual module 304 and residual module 305 of the first residual subnetwork to obtain first residual information of 25×25×128; at the same time, the first splicing result passes through residual branch 306 of the first residual subnetwork (the size of the fourteenth batch normalization layer in branch 306 is 128) for sampling processing (i.e. a dimension-raising operation) to obtain first sampling information of 25×25×128, the sampling processing including at least one of upsampling and downsampling, selected according to the actual situation. The 25×25×128 first residual information and the 25×25×128 first sampling information are spliced to obtain a second splicing result of 25×25×128.
  • Residual module 304 in the first residual subnetwork processes the 50×50×64 first splicing result sequentially through the tenth convolutional layer, the tenth batch normalization layer (size 128), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (size 128), obtaining output information with a dimension of 25×25×128.
  • Residual module 305 in the first residual subnetwork processes the output of module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (size 128), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (size 128) for convolution and normalization processing, obtaining the output information of module 305 (that is, the first residual information of 25×25×128).
  • The second residual subnetwork processes the 25×25×128 second splicing result as follows: the second splicing result passes in turn through residual module 304 and residual module 305 of the second residual subnetwork to obtain second residual information of 13×13×256; at the same time, the second splicing result passes through residual branch 306 of the second residual subnetwork (the size of the fourteenth batch normalization layer in branch 306 is 256) for sampling processing (i.e. a dimension-raising operation) to obtain second sampling information of 13×13×256, the sampling processing including at least one of upsampling and downsampling, selected according to the actual situation. The 13×13×256 second residual information and the 13×13×256 second sampling information are spliced to obtain a third splicing result of 13×13×256.
  • Residual module 304 in the second residual subnetwork processes the 25×25×128 second splicing result sequentially through the tenth convolutional layer, the tenth batch normalization layer (size 256), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (size 256), obtaining output information with a dimension of 13×13×256.
  • Residual module 305 in the second residual subnetwork processes the output of module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (size 256), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (size 256) for convolution and normalization processing, obtaining the output information of module 305 (that is, the second residual information of 13×13×256).
  • The third residual subnetwork processes the 13×13×256 third splicing result as follows: the third splicing result passes in turn through residual module 304 and residual module 305 of the third residual subnetwork to obtain third residual information of 7×7×512; at the same time, the third splicing result passes through residual branch 306 of the third residual subnetwork (the size of the fourteenth batch normalization layer in branch 306 is 512) for sampling processing (i.e. a dimension-raising operation) to obtain third sampling information of 7×7×512, the sampling processing including at least one of upsampling and downsampling, selected according to the actual situation. The 7×7×512 third residual information and the 7×7×512 third sampling information are spliced to obtain a fourth splicing result of 7×7×512.
  • Residual module 304 in the third residual subnetwork processes the 13×13×256 third splicing result sequentially through the tenth convolutional layer, the tenth batch normalization layer (size 512), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (size 512), obtaining output information with a dimension of 7×7×512.
  • Residual module 305 in the third residual subnetwork processes the output of module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (size 512), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (size 512) for convolution and normalization processing, obtaining the output information of module 305 (that is, the third residual information of 7×7×512).
  • The above output module includes an average pooling layer with a size of 3×3.
  • The average pooling layer performs an average pooling operation on the 7×7×512 fourth splicing result to obtain a 1×1×512 feature vector (i.e. the second feature).
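  • Since the text states that the second feature extraction network may simply be ResNet18, one hedged shortcut for experimentation is to reuse torchvision's ResNet-18 up to its global average pooling and drop the final fully connected layer, which likewise yields a 512-dimensional second feature (the layer-by-layer dimensions will differ slightly from the 200×200 walkthrough above).

```python
import torch
import torch.nn as nn
from torchvision import models

# Assumption: a standard ResNet-18 backbone can serve as the global branch;
# replacing the classification head with Identity leaves the 512-d pooled feature.
global_branch = models.resnet18(weights=None)
global_branch.fc = nn.Identity()

second_feature = global_branch(torch.randn(1, 3, 200, 200))
print(second_feature.shape)  # torch.Size([1, 512])
```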
  • The fusion classification module 204 includes an orthogonal module 2041 and a classification module 2042. The orthogonal module 2041 performs an orthogonal operation on the first feature of the area where the facial expression is located and the second feature, using a preset orthogonal function, to obtain an orthogonal result; the classification module 2042 performs feature aggregation and classification on the orthogonal result, using a preset classification function, to obtain the classification result.
  • The orthogonal module 2041 includes the Bilinear function provided in the Pytorch deep learning library; the classification module 2042 includes the Linear function and the Softmax classification function provided in the Pytorch deep learning library, connected in sequence.
  • The Bilinear function performs an orthogonal operation on the feature vector of dimension 1×2048 (that is, the first feature of the area where the pain expression is located) and the feature vector of dimension 1×1×512 (that is, the second feature) to obtain an orthogonal result, which realizes the fusion of the first feature of the area where the pain expression is located and the second feature.
  • The Linear function performs feature aggregation (that is, a dimensionality reduction operation) on the orthogonal result, producing an aggregation result with an output dimension of 6; the Softmax classification function classifies the aggregation result and finally produces the classification result (for example, that the pain degree corresponding to a 200×200×3 pain expression image 201 is grade 3).
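  • Under the dimensions quoted above (a 1×2048 first feature, a 512-dimensional second feature, and 6 pain levels), the fusion classification module can be sketched with the PyTorch Bilinear, Linear and Softmax functions named in the text; the 512-dimensional output size of the bilinear layer is an assumption, since the text does not state the dimension of the orthogonal result.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Sketch of fusion classification module 204: orthogonal module + classification module."""

    def __init__(self, local_dim=2048, global_dim=512, fused_dim=512, num_levels=6):
        super().__init__()
        self.orthogonal = nn.Bilinear(local_dim, global_dim, fused_dim)  # orthogonal module 2041
        self.aggregate = nn.Linear(fused_dim, num_levels)                # feature aggregation
        self.softmax = nn.Softmax(dim=1)                                 # classification

    def forward(self, first_feature, second_feature):
        fused = self.orthogonal(first_feature, second_feature)  # orthogonal (bilinear) fusion
        return self.softmax(self.aggregate(fused))              # per-level probabilities

probs = FusionClassifier()(torch.randn(1, 2048), torch.randn(1, 512))
print(probs.shape)  # torch.Size([1, 6]); rows sum to ~1.0
```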
  • the cross-entropy function provided in the Pytorch deep learning library can be selected as the loss function for training, and the stochastic gradient descent method can be selected as the training optimizer.
  • During training, the loss after Softmax classification is calculated through forward propagation, and the weights of the expression classification model are updated by backpropagating the calculated loss value until the expression classification model converges; training is then stopped and the weights of the expression classification model are saved.
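  • A minimal training-loop sketch matching the description above (cross-entropy loss and stochastic gradient descent from PyTorch); note that nn.CrossEntropyLoss already applies log-softmax internally, so the model would return raw logits here rather than Softmax outputs. The learning rate, momentum, epoch count and data loader are assumptions.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=30, lr=1e-3, momentum=0.9, device="cpu"):
    """Hedged training sketch: forward propagation, loss, backpropagation, weight update."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()   # cross-entropy loss function named in the text
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)  # SGD optimizer
    for epoch in range(epochs):
        for images, levels in train_loader:        # levels: pain grades 0..5
            images, levels = images.to(device), levels.to(device)
            optimizer.zero_grad()
            logits = model(images)                 # forward propagation
            loss = criterion(logits, levels)
            loss.backward()                        # backpropagation of the loss
            optimizer.step()                       # update model weights
    torch.save(model.state_dict(), "expression_classifier.pt")  # save the trained weights
```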
  • Acc is the prediction accuracy of the expression classification model, that is, the proportion of predicted label values of the expression images that match the actual true label values.
  • RMSE is calculated as the error between the predicted and true values, i.e., the error between the predicted and true label values of the expression images.
  • the PCC coefficient is used to reflect the performance of the expression classification model in predicting the results of expression images in different time dimensions.
  • Acc can be used to indicate the proportion of the predicted label value of the pain expression image that matches the actual real label value.
  • RMSE can be used to indicate the error between the predicted value and the true value of the pain expression image.
  • the PCC coefficient is used to reflect the performance of the expression classification model in predicting the results of pain expression images in different time dimensions.
  • After the expression classification model classifies the pain degree of 80 frames of facial pain expression images, only 6 frames are misclassified (the black '+' marks that do not lie on the black solid line in Fig. 4). It can be seen that the dual-parallel expression classification model proposed by the present invention can effectively classify the pain degree of pain expression images.
  • M_i is the number of correctly classified samples in each classification result tested by the expression classification model, so Acc = (Σ_i M_i) / N.
  • N represents the total number of expression images in the experiment, and y_i and ŷ_i respectively represent the true label value and the predicted label value of an expression image (for example, a pain expression image), with ȳ and ŷ̄ denoting the means of the sequences {y_1, y_2, ..., y_N} and {ŷ_1, ŷ_2, ..., ŷ_N}. With these symbols, RMSE = sqrt((1/N) Σ_i (y_i − ŷ_i)^2), and PCC is the Pearson correlation coefficient between the two sequences, PCC = Σ_i (y_i − ȳ)(ŷ_i − ŷ̄) / (sqrt(Σ_i (y_i − ȳ)^2) · sqrt(Σ_i (ŷ_i − ŷ̄)^2)).
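  • For reference, a small NumPy/SciPy sketch of the three evaluation indices as defined above, assumed here to be computed over integer pain-level labels; scipy.stats.pearsonr supplies the PCC.

```python
import numpy as np
from scipy.stats import pearsonr

def evaluate(y_true, y_pred):
    """Return (Acc, RMSE, PCC) for true and predicted pain-level labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    acc = np.mean(y_true == y_pred)                    # fraction of correctly classified frames
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))    # error between predicted and true labels
    pcc, _ = pearsonr(y_true, y_pred)                  # Pearson correlation coefficient
    return acc, rmse, pcc

print(evaluate([0, 1, 2, 3, 4, 5], [0, 1, 2, 3, 5, 5]))  # toy example, not the paper's data
```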
  • The expression classification model provided by this application is trained and verified on publicly available data sets, and its performance (that is, its classification accuracy) is quantitatively evaluated with the three evaluation indices of accuracy (Acc), root mean square error (RMSE) and Pearson correlation coefficient (PCC): the accuracy (Acc) is 92.11%, the root mean square error (RMSE) is 0.48, and the Pearson correlation coefficient (PCC) is 0.95.
  • These performance evaluation results are compared with an existing, relatively advanced experimental method (for example, the 3D deep network model SCN, which uses multiple convolutional layers of different temporal depths to capture extensive spatio-temporal variation of facial expressions, and which achieves a root mean square error (RMSE) of 0.57 and a Pearson correlation coefficient (PCC) of 0.92). The results are very close, which also illustrates the effectiveness of the expression classification model provided by this application.
  • Preparation stage 501: prepare a data set of facial pain expression images containing pain level labels; for example, the pain expression images have been divided into 6 pain levels according to the degree of pain.
  • Modeling stage 502: construct a model for pain expression classification based on a dual-parallel expression classification model combined with a local attention mechanism.
  • The dual-parallel expression classification model can then be used to classify the pain degree of pain expression images.
  • Training stage 503: build a training data set from the prepared pain expression image data and use it to iteratively train the constructed expression classification model; for example, divide the pain expression image data set into a training data set and a test data set, where the training data set is used to iteratively train the constructed expression classification model until an expression classification model that meets the requirements is obtained.
  • Classification stage 504: use the trained expression classification model to classify the pain degree of the facial pain expression images in the test set, obtaining the final classification result (that is, the specific pain level corresponding to each pain expression image).
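  • A hedged usage sketch of classification stage 504: load the saved weights and predict the pain level for one preprocessed frame. ExpressionClassifier, preprocess and the weight file name refer back to the earlier sketches in this document and are assumptions, not artefacts of the patent itself.

```python
import torch
from PIL import Image

def classify_frame(model, image_path, preprocess, device="cpu"):
    """Return the predicted pain level (0-5) for a single pain expression image."""
    model.load_state_dict(torch.load("expression_classifier.pt", map_location=device))
    model.to(device).eval()
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        probs = model(x)                 # classification result over the 6 pain levels
    return int(probs.argmax(dim=1))      # e.g. 3 means pain level 3
```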
  • In summary, this application adopts an expression classification model formed by a first feature extraction network and a second feature extraction network working in parallel to perform local feature extraction and global feature extraction on the facial expressions of the target object.
  • The second feature extraction network performs global feature extraction on the emotional features of the facial expressions to make up for the important feature information that the first feature extraction network misses when extracting local features of the facial expressions.
  • The extraction rate of emotional features of facial expressions is therefore improved, and the accuracy of classifying the degree of emotional expression from the feature extraction results is improved accordingly.
  • the local attention layer of this application performs irrelevant information attenuation operations on areas other than the area where facial expressions are located in the target image, and at the same time reversely enhances important relevant information in the area where facial expressions are located in the target image, which is conducive to improving the basis of the expression classification model. Accuracy of facial expression emotion feature extraction results for classification.
  • FIG. 6 is a schematic structural diagram of a facial expression classification device provided by the present application.
  • the classification device 600 includes an acquisition module 601 and a processing module 602 .
  • the obtaining module 601 is used for: obtaining the target image, the target image including the facial expression of the target object;
  • the processing module 602 is used to: input the target image into the expression classification model to obtain a classification result, the classification result is used to indicate the degree of emotional expression of the facial expression;
  • the expression classification model includes: the first feature extraction network, the second feature extraction network and the fusion classification module; through the first feature extraction network, the local feature extraction of the target image is carried out to obtain the first feature of the area where the facial expression is located; through the second feature extraction The network performs global feature extraction on the target image to obtain the second feature; the fusion and classification module performs feature fusion and classification on the first feature and the second feature to obtain the classification result.
  • For the manner in which the classification apparatus 600 implements the method for classifying facial expressions and the beneficial effects produced, refer to the relevant descriptions in the method embodiments.
  • FIG. 7 shows a schematic structural diagram of an electronic device provided by the present application.
  • the dotted line in Fig. 7 indicates that the unit or the module is optional.
  • the electronic device 700 may be used to implement the methods described in the foregoing method embodiments.
  • the electronic device 700 may be a terminal device or a server or a chip.
  • the electronic device 700 includes one or more processors 701, and the one or more processors 701 can support the electronic device 700 to implement the method in the method embodiment corresponding to FIG. 1 .
  • Processor 701 may be a general purpose processor or a special purpose processor.
  • the processor 701 may be a central processing unit (central processing unit, CPU).
  • the CPU can be used to control the electronic device 700, execute software programs, and process data of the software programs.
  • the electronic device 700 may further include a communication unit 705, configured to implement input (reception) and output (send) of signals.
  • the electronic device 700 may be a chip, and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, and the chip may serve as a component of a terminal device.
  • the electronic device 700 may be a terminal device, and the communication unit 705 may be a transceiver of the terminal device, or the communication unit 705 may be a transceiver circuit of the terminal device.
  • the electronic device 700 may include one or more memories 702, on which a program 704 is stored, and the program 704 may be run by the processor 701 to generate instructions 703, so that the processor 701 executes the methods described in the above method embodiments according to the instructions 703.
  • data may also be stored in the memory 702 .
  • the processor 701 may also read the data stored in the memory 702, the data may be stored in the same storage address as the program 704, and the data may also be stored in a different storage address from the program 704.
  • the processor 701 and the memory 702 may be set separately, or may be integrated together, for example, integrated on a system-on-chip (system on chip, SOC) of the terminal device.
  • the steps in the foregoing method embodiments may be implemented by logic circuits in the form of hardware or instructions in the form of software in the processor 701 .
  • the processor 701 may be a CPU, a digital signal processor (digital signal processor, DSP), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, such as discrete gates, transistor logic devices or discrete hardware components.
  • the present application also provides a computer program product, which implements the method described in any method embodiment in the present application when the computer program product is executed by the processor 701 .
  • the computer program product can be stored in the memory 702, such as a program 704, and the program 704 is finally converted into an executable object file that can be executed by the processor 701 through processes such as preprocessing, compiling, assembling and linking.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the method described in any method embodiment in the present application is implemented.
  • the computer program may be a high-level language program or an executable object program.
  • the computer readable storage medium is, for example, the memory 702 .
  • the memory 702 may be a volatile memory or a nonvolatile memory, or, the memory 702 may include both a volatile memory and a nonvolatile memory.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory.
  • Volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct Rambus RAM (DR RAM).
  • the disclosed systems, devices and methods may be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not implemented.
  • the device embodiments described above are only illustrative, and the division of units is only a logical function division. In actual implementation, there may be other division methods, and multiple units or components may be combined or integrated into another system.
  • the coupling between the various units or the coupling between the various components may be direct coupling or indirect coupling, and the above coupling includes electrical, mechanical or other forms of connection.

Abstract

The field of image processing. A facial expression classification method, comprising: acquiring a target image, the target image including a facial expression of a target object (S101); inputting the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression; the expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result (S102). The above method can improve the extraction rate of the emotional features of facial expressions, thereby solving the problem of low facial expression classification accuracy.

Description

Facial expression classification method and electronic device. Technical Field
The present application relates to the field of image processing, and in particular to a facial expression classification method and an electronic device.
Background Art
In recent years, facial expression classification has been a research hotspot in the field of image processing; for example, the classification of facial pain expressions is one of the research hotspots in the medical field. Convolutional neural networks are usually used to classify the degree of pain in the facial pain expressions of newborns, critically ill patients and aphasia patients. However, the pain features extracted from facial pain expressions by existing convolutional neural networks are unsatisfactory, which reduces the accuracy of pain-degree classification based on those extracted features.
Therefore, how to improve the accuracy of facial expression classification is an urgent problem to be solved.
Summary of the Invention
The present application provides a facial expression classification method and an electronic device, which can solve the problem of low facial expression classification accuracy.
In a first aspect, a facial expression classification method is provided, comprising: acquiring a target image, the target image including a facial expression of a target object; inputting the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression; the expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result.
The above method may be executed by a chip in an electronic device. Compared with existing convolutional neural networks that only extract local features from the facial expression of the target object, the present application uses an expression classification model formed by a first feature extraction network and a second feature extraction network connected in parallel to perform both local and global feature extraction on the facial expression of the target object. The global feature extraction performed by the second feature extraction network on the emotional features of the facial expression can compensate for important feature information missed by the first feature extraction network when extracting local features, thereby improving the extraction rate of the emotional features of the facial expression and, in turn, the accuracy of classifying the degree of emotional expression based on the extracted features.
Optionally, the first feature extraction network is VGG16, and the input layer of the VGG16 includes a local attention layer, which is used to perform an information attenuation operation on regions of the target image outside the region where the facial expression is located.
The local attention layer attenuates irrelevant information in regions of the target image outside the region where the facial expression is located, and in turn enhances the important, relevant information in the region where the facial expression is located, thereby helping to improve the accuracy with which the expression classification model classifies expressions based on the extracted emotional features.
Optionally, the input layer of the VGG16 includes a first convolutional layer, a first batch normalization layer, a first activation layer, the local attention layer and a first max pooling layer connected in sequence. The local attention layer performs the information attenuation operation on regions of the target image outside the region where the facial expression is located as follows: after receiving the output information of the first activation layer, the local attention layer determines a two-dimensional image mask according to that output information and multiplies the two-dimensional image mask by the output information of the first activation layer to obtain the output information of the local attention layer; the output information of the local attention layer is input to the network layers connected after the local attention layer for local feature extraction.
Optionally, determining the two-dimensional image mask according to the output information of the first activation layer includes: computing the average activation value of the feature map of each channel in the output information of the first activation layer to obtain N average activation values; determining a first channel according to the N average activation values, the first channel being the channel corresponding to the largest of the N average activation values; and setting a mask value for each pixel of the first channel, where the mask value at the position corresponding to a first pixel is set to 1 when the first pixel of the first channel is greater than or equal to the largest average activation value, and set to 0 when the first pixel is smaller than the largest average activation value; the first pixel is any pixel of the first channel, and N is a positive integer.
Optionally, the second feature extraction network is ResNet18.
Optionally, the fusion classification module includes an orthogonal module and a classification module. The orthogonal module is used to perform an orthogonal operation on the first feature of the region where the facial expression is located and the second feature using a preset orthogonal function to obtain an orthogonal result; the classification module is used to perform feature aggregation and classification on the orthogonal result using a preset classification function to obtain the classification result.
Optionally, the target image is a pain expression image.
In a second aspect, a facial expression classification apparatus is provided, including modules for executing any one of the methods of the first aspect.
In a third aspect, an electronic device is provided, including modules for executing any one of the methods of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute any one of the methods of the first aspect.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the implementation steps of the facial expression classification method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the expression classification model in an embodiment of the present invention;
FIG. 3 is a schematic diagram of part of the structure of the residual network in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the results of pain-degree classification of pain expression images by the expression classification model in an embodiment of the present invention;
FIG. 5 is a schematic diagram of the specific process steps of the facial expression classification method in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the facial expression classification apparatus in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the electronic device in an embodiment of the present invention.
Detailed Description of the Embodiments
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses and methods are omitted so that unnecessary details do not obscure the description of the present application.
It should be understood that, when used in the specification and the appended claims of the present application, the term "comprising" indicates the presence of the described features, integers, steps, operations, pixels and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, pixels, components and/or combinations thereof.
It should also be understood that the term "and/or" used in the specification and the appended claims of the present application refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third" and the like are used only to distinguish the descriptions and cannot be understood as indicating or implying relative importance.
Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments" and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized. The terms "comprising", "including", "having" and their variants all mean "including but not limited to", unless otherwise specifically emphasized.
In recent years, facial expression classification has been a research hotspot in the field of image processing. For example, in medical research, convolutional neural networks are usually used to classify the degree of pain in the facial pain expressions of newborns, critically ill patients and aphasia patients. However, existing convolutional neural networks lose a great deal of key feature information when extracting pain features from facial pain expressions, resulting in low accuracy of facial pain expression classification.
The present application uses a dual parallel expression classification model to classify facial expressions, in which the first feature extraction network extracts the first feature of the region where the facial expression is located in the expression image, and the second feature extraction network extracts the global features of the expression image to compensate for other feature information of the expression image not extracted by the first feature extraction network. The dual parallel expression classification model can thus improve the extraction rate of the emotional features of facial expressions, thereby solving the problem of low facial expression classification accuracy.
The present application is further described in detail below with reference to the drawings and specific embodiments.
In order to improve the extraction rate of the emotional features of facial expressions and thereby solve the problem of low facial expression classification accuracy, the present application proposes a facial expression classification method, as shown in FIG. 1. The method is executed by an electronic device and includes:
S101: Acquire a target image, the target image including a facial expression of a target object.
Exemplarily, the electronic device acquires the target image (i.e., a facial expression image of a human face), where the target object includes newborns, aphasia patients and ordinary healthy people, and the target image includes happy expression images, fear expression images, anger expression images and pain expression images. The present application only takes the pain-degree classification of pain expression images as an example to illustrate the pain-degree classification method for facial pain expressions; other types of expression images are classified in a similar manner and are not described again here.
For example, the electronic device may obtain a facial pain expression dataset from the UNBC-McMaster Shoulder Pain Expression Archive Database (UNBC database). The dataset contains shoulder pain video data of 25 volunteers, with a total of 200 video sequences comprising 48,198 frames of pain expression images. The pain expression images are stored in PNG format, and the resolution of each frame is about 352×240 pixels. In practical applications, each frame can be cropped to obtain target data with image dimensions of 3×200×200 (i.e., image data with a size of 200×200 and 3 channels).
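As an illustration of this preprocessing step, the following is a minimal Python/OpenCV sketch; the file path, the center-crop strategy and the normalization are assumptions, since the patent only states that frames are cropped to 3×200×200.

```python
import cv2
import numpy as np

def preprocess_frame(path: str, size: int = 200) -> np.ndarray:
    """Load one PNG frame and produce a 3x200x200 float array (channels first)."""
    img = cv2.imread(path, cv2.IMREAD_COLOR)           # HxWx3, roughly 240x352
    h, w = img.shape[:2]
    side = min(h, w)                                    # square center crop
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    crop = cv2.resize(crop, (size, size))               # 200x200x3
    return crop.transpose(2, 0, 1).astype(np.float32) / 255.0  # 3x200x200

# Example usage (hypothetical path):
# x = preprocess_frame("unbc/frames/volunteer01_000123.png")
```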
The above existing pain expression dataset has already been labeled frame by frame with pain degrees according to the PSPI standard and divided into 16 levels according to the severity of pain; a higher level indicates greater pain. However, the amount of data is unevenly distributed across the pain levels. Therefore, on the basis of the existing pain-level division, the pain expression data of the different pain levels are re-clustered and reduced in dimension. For example, the existing divisions for levels 0, 1, 2 and 3 are retained; the pain expression data whose original pain levels are 4 and 5 are merged into a new level, i.e., level 4; and the pain expression data whose original pain levels are 6 and above are merged into a new level, i.e., level 5. The pain degrees of the existing pain expression dataset are finally re-divided into 6 levels.
Since the numbers of pain expression images in the re-divided pain levels differ, the number of images for some levels is far greater than for others. For example, level 0 has 31,200 frames, level 1 has 4,000 frames, level 2 has 3,409 frames, level 3 has 1,989 frames, level 4 has 3,600 frames and level 5 has 4,100 frames. Clearly, the number of level-0 images is extremely large; in this case, 1/10 of the level-0 images can be randomly sampled in practical applications, i.e., 3,120 (31,200 divided by 10) level-0 frames are used. Of course, 1/8 or 1/11 of the images of an over-represented pain level (e.g., level 0) may also be randomly sampled; the present application does not limit the sampling ratio, and the user can choose it according to actual needs.
In practical applications, because the pain expression images within a re-divided pain level are stored in the order of the different volunteers, the storage order of the images of different volunteers within each pain level is randomly shuffled to avoid sampling only some volunteers during data extraction. Then, the pain expression images of each pain level are divided into a training dataset and a test dataset according to a certain ratio (for example 8:2; other ratios are also possible, and the present application does not limit this). The training dataset and the test dataset are used to train the expression classification model and to test the expression classification model, respectively.
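A minimal Python sketch of this relabeling, rebalancing, shuffling and splitting step is shown below for illustration; the 1/10 sampling ratio for level 0 and the 8:2 split follow the example above, while the frame list, the label source and the helper names are assumptions.

```python
import random
from collections import defaultdict

def remap_pspi(level: int) -> int:
    """Cluster the 16 PSPI levels into the 6 levels described above."""
    if level <= 3:
        return level
    if level in (4, 5):
        return 4
    return 5

def build_splits(frames, labels, keep_ratio_level0=0.1, train_ratio=0.8, seed=0):
    """frames: list of image paths; labels: original PSPI levels (assumed to be given)."""
    rng = random.Random(seed)
    per_level = defaultdict(list)
    for f, lv in zip(frames, labels):
        per_level[remap_pspi(lv)].append(f)
    train, test = [], []
    for lv, items in per_level.items():
        rng.shuffle(items)                      # mix volunteers within one pain level
        if lv == 0:
            items = items[: int(len(items) * keep_ratio_level0)]   # downsample level 0
        cut = int(len(items) * train_ratio)
        train += [(f, lv) for f in items[:cut]]
        test += [(f, lv) for f in items[cut:]]
    return train, test
```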
S102: Input the target image into the expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression. The expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result.
Exemplarily, the classification result refers to the result of the expression classification model classifying the emotional degree of the input target image, and it can indicate the degree of emotional expression of the facial expression, where the emotions include pain, pleasure, fear and anger, and the classification results include pain degree, fear degree, anger degree and pleasure degree. For example, the pain degree of a pain expression image can be divided into 6 levels, i.e., levels 0 to 5. For example, the expression classification model extracts pain features from an input pain expression image (i.e., the target image) and outputs a pain-degree classification result; likewise, it extracts anger features from an input anger expression image and outputs an anger-degree classification result. For instance, if a pain expression image of unknown pain degree is input into the expression classification model, the model outputs a pain degree of level 1 for that image; if a fear expression image of unknown fear degree is input, the model outputs a fear degree of level 2.
Exemplarily, as shown in FIG. 2, the expression classification model includes a first feature extraction network 202, a second feature extraction network 203 and a fusion classification module 204. The electronic device extracts the important local features of the region where the facial expression is located in the target image (i.e., the emotional features of the face region) through the first feature extraction network 202 to obtain the first feature of that region; at the same time, it performs supplementary global feature extraction on the target image through the second feature extraction network 203 to obtain the second feature; then, feature fusion and classification are performed on the first feature and the second feature through the fusion classification module 204 to obtain the classification result. For example, the pain expression image 201 is input into the first feature extraction network 202, which extracts the important feature information of the region where the facial pain expression is located (i.e., the local feature extraction process) to obtain the first feature of that region; at the same time, the pain expression image 201 is input into the second feature extraction network 203, which extracts the global feature information of the facial pain expression (i.e., the global feature extraction process) to obtain the second feature; subsequently, the fusion classification module 204 fuses the first feature of the region where the facial pain expression is located with the second feature and classifies the fused result by pain degree, obtaining the pain degree (i.e., the classification result) corresponding to the pain expression image 201.
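To make the two-branch structure concrete, the following is a minimal PyTorch sketch of the parallel model. The class names LocalBranch, GlobalBranch and FusionClassifier are assumptions (not from the patent) and refer to the illustrative modules sketched further below.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Dual parallel model: local branch + global branch + fusion classification module."""
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.local_branch = LocalBranch()      # VGG16-style branch with local attention -> 2048-d
        self.global_branch = GlobalBranch()    # ResNet18-style branch -> 512-d
        self.fusion = FusionClassifier(2048, 512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_local = self.local_branch(x)         # first feature (face region)
        f_global = self.global_branch(x)       # second feature (global)
        return self.fusion(f_local, f_global)  # class scores for the 6 pain levels
```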
Exemplarily, the first feature extraction network is VGG16, and the input layer of the VGG16 includes a local attention layer, which is used to perform an information attenuation operation on regions of the target image outside the region where the facial expression is located. For example, the local attention layer attenuates irrelevant information outside the region where the facial pain expression is located in the pain expression image 201, removing unimportant information unrelated to the facial pain expression and thereby inversely enhancing the important, relevant information of the region where the facial pain expression is located in the pain expression image 201 (i.e., the target image).
Exemplarily, the input layer of the VGG16 includes a first convolutional layer, a first batch normalization layer, a first activation layer, the local attention layer and a first max pooling layer connected in sequence. The local attention layer performs the information attenuation operation on regions of the target image outside the region where the facial expression is located as follows: after receiving the output information of the first activation layer, the local attention layer determines a two-dimensional image mask according to that output information and multiplies the two-dimensional image mask by the output information of the first activation layer to obtain the output information of the local attention layer, which is input to the network layers connected after the local attention layer for local feature extraction. The kernel size of the first convolutional layer is 3×3, the kernel size of the first batch normalization layer is 64, and the kernel size of the first max pooling layer is 2×2.
For example, a pain expression image 201 of size 200×200 with 3 channels (i.e., 200×200×3) is input into the first convolutional layer (pain expression images of other sizes can also be input; the user can choose according to the actual situation, and the present application does not limit this). The first convolutional layer outputs a first convolution result of size 200×200 with 64 channels (i.e., 200×200×64); the first batch normalization layer normalizes the 200×200×64 first convolution result and outputs a 200×200×64 first batch normalization result; the first activation layer activates the 200×200×64 first batch normalization result and outputs a 200×200×64 first activation result. After receiving the 200×200×64 first activation result (i.e., the output information of the first activation layer), the local attention layer generates a two-dimensional image mask from it and multiplies the mask by the 200×200×64 first activation result to obtain the output information of the local attention layer; that is, the local attention layer uses the two-dimensional image mask to attenuate the regions of the 200×200×64 first activation result outside the region where the pain expression is located. The output information of the local attention layer is input to the network layers connected after the local attention layer for local feature extraction. The first max pooling layer performs max pooling on the output information of the local attention layer and outputs a feature map of dimensions 100×100×64.
Exemplarily, determining the two-dimensional image mask according to the output information of the first activation layer includes: computing the average activation value of the feature map of each channel in the output information of the first activation layer to obtain N average activation values; determining a first channel according to the N average activation values, the first channel being the channel corresponding to the largest of the N average activation values; and setting a mask value for each pixel of the first channel, where the mask value at the position corresponding to a first pixel is set to 1 when the first pixel of the first channel is greater than or equal to the largest average activation value, and set to 0 when the first pixel is smaller than the largest average activation value; the first pixel is any pixel of the first channel, and N is a positive integer.
For example, the local attention layer computes the average activation value of the feature map of each channel in the 200×200×64 first activation result (i.e., the output information of the first activation layer), obtaining N = 64 average activation values. The local attention layer selects the largest of the 64 average activation values and the channel corresponding to it (i.e., the first channel); each channel has 200×200 pixel values. Taking a largest average activation value of 0.6 and a first pixel of A, B, C or D as an example to illustrate how the mask is set for each pixel of the first channel: pixel A of the first channel is 0.71, pixel B is 0.52, pixel C is 0.64 and pixel D is 0.42. Since pixel A is 0.71 (greater than 0.6), the mask value at the position corresponding to pixel A is set to 1; pixel B is 0.52 (less than 0.6), so its mask value is set to 0; pixel C is 0.64 (greater than 0.6), so its mask value is set to 1; pixel D is 0.42 (less than 0.6), so its mask value is set to 0. Every pixel of the first channel is masked in this way, generating a 200×200 two-dimensional image mask containing 200×200 mask values. This 200×200 mask serves as the reference template (i.e., reference basis) used by the local attention layer in the subsequent pixel-value attenuation of the 200×200×64 first activation result.
Exemplarily, the local attention layer attenuates the pixel values of the 200×200×64 first activation result according to the 200×200 two-dimensional image mask to obtain a 200×200×64 first feature map. A random pixel-value attenuation factor r is set to a specific value in (0, 1) through a random number mechanism. If the current mask value at a position of the reference mask is 0, the pixel value at the corresponding position of the 200×200×64 first activation result is multiplied by r (i.e., the pixel value at that position is attenuated), yielding the attenuated 200×200×64 first activation result (i.e., a 200×200×64 second feature map); if the current mask value at a position is 1, the pixel value at the corresponding position of the 200×200×64 first activation result remains unchanged. Specifically, the local attention layer traverses each pixel value of the 200×200×64 first activation result according to the two-dimensional image mask, i.e., it attenuates the unimportant feature information of each pixel in each of the 64 channels (each channel having 200×200 pixel values), thereby extracting the 64×200×200 important feature information of the region where the facial pain expression is located. It can thus be seen that attenuating the pixel values of the first activation result according to the two-dimensional image mask not only weakens feature information with low relevance to the pain features, but also inversely enhances feature information with high relevance to the pain features (i.e., extracts the important feature information of the region where the facial pain expression is located).
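The mask construction and attenuation described above can be sketched in PyTorch as follows. This is an illustrative reading of the text rather than the patent's reference implementation, and the module name LocalAttention is an assumption.

```python
import torch
import torch.nn as nn

class LocalAttention(nn.Module):
    """Builds a 2D mask from the most-activated channel and attenuates masked-out pixels."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W), e.g. (B, 64, 200, 200), the output of the first activation layer
        avg = x.mean(dim=(2, 3))                               # (B, C) average activation per channel
        best = avg.argmax(dim=1)                               # index of the "first channel", per sample
        idx = best.view(-1, 1, 1, 1).expand(-1, 1, x.size(2), x.size(3))
        first_chan = x.gather(1, idx)                          # (B, 1, H, W)
        thresh = avg.max(dim=1, keepdim=True).values.view(-1, 1, 1, 1)
        mask = (first_chan >= thresh).float()                  # (B, 1, H, W), values in {0, 1}
        r = torch.rand(1, device=x.device)                     # random attenuation factor, roughly in (0, 1)
        return x * (mask + (1.0 - mask) * r)                   # keep where mask = 1, attenuate where mask = 0
```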
The above VGG16 further includes a convolution processing module, which includes two convolution activation layers and a plurality of double convolution activation layers, where the first of the two convolution activation layers is connected to the first double convolution activation layer, and the last double convolution activation layer is connected to the second of the two convolution activation layers. Each of the two convolution activation layers includes a second convolutional layer, a second batch normalization layer, a second ReLU activation layer and a second max pooling layer connected in sequence, where the kernel size of the second convolutional layer is 3×3 and the kernel size of the second max pooling layer is 2×2. Each double convolution activation layer includes a third convolutional layer, a third batch normalization layer, a third ReLU activation layer, a fourth convolutional layer, a fourth batch normalization layer, a fourth ReLU activation layer and a third max pooling layer connected in sequence, where the kernel sizes of the third and fourth convolutional layers are both 3×3 and the kernel size of the third max pooling layer is 2×2.
For example, the first convolution activation layer processes the 100×100×64 feature map output by the first max pooling layer through its second convolutional layer, its second batch normalization layer (kernel size 128), its second ReLU activation layer and its second max pooling layer in sequence, and finally outputs a feature map of dimensions 50×50×128.
For example, the convolution processing module includes three double convolution activation layers connected in sequence: a first, a second and a third double convolution activation layer. In the first double convolution activation layer, the third convolutional layer convolves the 50×50×128 feature map output by the second max pooling layer to obtain a 50×50×256 feature map; the third batch normalization layer (kernel size 256) batch-normalizes it to obtain a 50×50×256 third batch normalization result; the third ReLU activation layer activates it to obtain a 50×50×256 third activation result; the fourth convolutional layer convolves it to obtain a 50×50×256 fourth convolution output; the fourth batch normalization layer (kernel size 256) batch-normalizes it; the fourth ReLU activation layer activates it to obtain a 50×50×256 fourth activation result; and the third max pooling layer max-pools it and outputs a feature map of dimensions 25×25×256.
The 25×25×256 feature map output by the first double convolution activation layer then passes in sequence through the third convolutional layer, the third batch normalization layer (kernel size 512), the third ReLU activation layer, the fourth convolutional layer, the fourth batch normalization layer (kernel size 512), the fourth ReLU activation layer and the third max pooling layer of the second double convolution activation layer, finally outputting a feature map of dimensions 12×12×512.
The 12×12×512 feature map output by the second double convolution activation layer then passes in sequence through the third convolutional layer, the third batch normalization layer (kernel size 512), the third ReLU activation layer, the fourth convolutional layer, the fourth batch normalization layer (kernel size 512), the fourth ReLU activation layer and the third max pooling layer of the third double convolution activation layer, finally outputting a feature map of dimensions 6×6×512.
The 6×6×512 feature map output by the third double convolution activation layer then passes in sequence through the second convolutional layer, the second batch normalization layer (kernel size 512), the second ReLU activation layer and the second max pooling layer of the second convolution activation layer, finally outputting a feature vector of dimensions 1×2048 (i.e., the first feature of the region where the pain expression is located).
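A compact PyTorch sketch of this local branch is shown below; it reuses the LocalAttention module from the earlier sketch and follows the channel and resolution progression described above. The class name LocalBranch, padding choices, and the final pooling size (picked so the flattened output matches the stated 1×2048 vector) are assumptions.

```python
import torch.nn as nn

def conv_block(c_in, c_out):
    """3x3 convolution + batch normalization + ReLU."""
    return [nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]

class LocalBranch(nn.Module):
    """VGG16-style branch: input stage with local attention, then conv stages down to a 2048-d vector."""
    def __init__(self):
        super().__init__()
        self.input_stage = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            LocalAttention(), nn.MaxPool2d(2))                                 # 200 -> 100, 64 ch
        self.stages = nn.Sequential(
            *conv_block(64, 128), nn.MaxPool2d(2),                             # 100 -> 50, 128 ch
            *conv_block(128, 256), *conv_block(256, 256), nn.MaxPool2d(2),     # 50 -> 25, 256 ch
            *conv_block(256, 512), *conv_block(512, 512), nn.MaxPool2d(2),     # 25 -> 12, 512 ch
            *conv_block(512, 512), *conv_block(512, 512), nn.MaxPool2d(2),     # 12 -> 6, 512 ch
            *conv_block(512, 512), nn.MaxPool2d(3))                            # 6 -> 2, 512 ch -> 2*2*512 = 2048

    def forward(self, x):
        x = self.stages(self.input_stage(x))
        return x.flatten(1)                                                    # (B, 2048) first feature
```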
Exemplarily, as shown in FIG. 2, the second feature extraction network 203 includes ResNet18, ResNet50, etc. Taking ResNet18 as an example, the ResNet18 includes an input module, a residual network and an output module. The input module convolves and activates the input 200×200×3 pain expression image 201 to obtain first output information; the residual network performs global feature extraction (supplementary extraction of global features) on the first output information to obtain second output information; and the output module performs average pooling on the second output information to obtain the second feature.
The input module includes a fifth convolutional layer, a fifth batch normalization layer, a fifth ReLU activation layer and a fourth max pooling layer connected in sequence, where the kernel size of the fifth convolutional layer is 7×7, the kernel size of the fifth batch normalization layer is 64, and the kernel size of the fourth max pooling layer is 3×3. The 200×200×3 pain expression image 201 passes in sequence through the fifth convolutional layer (output dimensions 100×100×64), the fifth batch normalization layer (output dimensions 100×100×64), the fifth ReLU activation layer (output dimensions 100×100×64) and the fourth max pooling layer, finally outputting a feature map of dimensions 50×50×64.
The residual network includes a direct mapping subnetwork and a plurality of residual subnetworks, where the direct mapping subnetwork is connected to the first of the residual subnetworks and the last residual subnetwork is connected to the output module. For example, the direct mapping subnetwork convolves the 50×50×64 feature map output by the fourth max pooling layer to obtain the output information of the direct mapping subnetwork, whose dimensions are 50×50×64.
As shown in FIG. 3(a), the direct mapping subnetwork includes two cascaded residual modules 301 (i.e., weight layer 301) and 302 (i.e., weight layer 302), and a first direct mapping branch 303. The residual module 301 includes a sixth convolutional layer, a sixth batch normalization layer, a sixth ReLU activation layer, a seventh convolutional layer and a seventh batch normalization layer connected in sequence; the kernel sizes of the sixth and seventh convolutional layers are both 3×3, and the kernel sizes of the sixth and seventh batch normalization layers are both 64. The residual module 302 includes an eighth convolutional layer, an eighth batch normalization layer, a seventh ReLU activation layer, a ninth convolutional layer and a ninth batch normalization layer connected in sequence; the kernel sizes of the eighth and ninth convolutional layers are both 3×3, and the kernel sizes of the eighth and ninth batch normalization layers are both 64.
For example, the residual module 301 processes the 50×50×64 feature map output by the fourth max pooling layer through the sixth convolutional layer, the sixth batch normalization layer, the sixth ReLU activation layer, the seventh convolutional layer and the seventh batch normalization layer in sequence (convolution and normalization), obtaining the output information of the residual module 301 with dimensions 50×50×64; the residual module 302 processes the 50×50×64 output information of the residual module 301 through the eighth convolutional layer, the eighth batch normalization layer, the seventh ReLU activation layer, the ninth convolutional layer and the ninth batch normalization layer in sequence, obtaining the output information of the residual module 302 with dimensions 50×50×64. The first direct mapping branch 303 directly maps the 50×50×64 feature map output by the fourth max pooling layer to obtain a first mapping result, which is still the 50×50×64 feature map output by the fourth max pooling layer. The output information of the residual module 302 and the first mapping result are spliced to obtain a first spliced result of dimensions 50×50×64.
As shown in FIG. 3(b), each of the residual subnetworks includes a residual module 304 (i.e., weight layer 304), a residual module 305 (i.e., weight layer 305) and a residual branch 306 connected in sequence. The residual module 304 includes a tenth convolutional layer, a tenth batch normalization layer, a seventh ReLU activation layer, an eleventh convolutional layer and an eleventh batch normalization layer connected in sequence; the kernel sizes of the tenth and eleventh convolutional layers are both 3×3, and the kernel sizes of the tenth and eleventh batch normalization layers are both 64. The residual module 305 includes a twelfth convolutional layer, a twelfth batch normalization layer, an eighth ReLU activation layer, a thirteenth convolutional layer and a thirteenth batch normalization layer connected in sequence; the kernel sizes of the twelfth and thirteenth convolutional layers are both 3×3, and the kernel sizes of the twelfth and thirteenth batch normalization layers are both 64. The residual branch 306 includes a fourteenth convolutional layer (kernel size 1×1) and a fourteenth batch normalization layer connected in sequence.
Taking a residual network with three residual subnetworks as an example, the first residual subnetwork processes the 50×50×64 first spliced result as follows: the 50×50×64 first spliced result passes through the residual module 304 and the residual module 305 of the first residual subnetwork in sequence to obtain 25×25×128 first residual information; at the same time, the 50×50×64 first spliced result passes through the residual branch 306 of the first residual subnetwork (the kernel size of the fourteenth batch normalization layer in the residual branch 306 being 128) for sampling (i.e., a dimension-raising operation) to obtain 25×25×128 first sampled information; the sampling includes at least one of up-sampling and down-sampling, selected according to the actual situation. The 25×25×128 first residual information and the 25×25×128 first sampled information are spliced to obtain a 25×25×128 second spliced result.
In the first residual subnetwork, the residual module 304 processes the 50×50×64 first spliced result through the tenth convolutional layer, the tenth batch normalization layer (kernel size 128), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (kernel size 128) in sequence (convolution and normalization), obtaining output information of dimensions 25×25×128; the residual module 305 processes the output information of the residual module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (kernel size 128), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (kernel size 128) in sequence, obtaining output information of dimensions 25×25×128 (i.e., the 25×25×128 first residual information).
The second residual subnetwork processes the 25×25×128 second spliced result as follows: the 25×25×128 second spliced result passes through the residual module 304 and the residual module 305 of the second residual subnetwork in sequence to obtain 13×13×256 second residual information; at the same time, it passes through the residual branch 306 of the second residual subnetwork (the kernel size of the fourteenth batch normalization layer being 256) for sampling (i.e., a dimension-raising operation) to obtain 13×13×256 second sampled information; the sampling includes at least one of up-sampling and down-sampling, selected according to the actual situation. The 13×13×256 second residual information and the 13×13×256 second sampled information are spliced to obtain a 13×13×256 third spliced result.
In the second residual subnetwork, the residual module 304 processes the 25×25×128 second spliced result through the tenth convolutional layer, the tenth batch normalization layer (kernel size 256), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (kernel size 256) in sequence, obtaining output information of dimensions 13×13×256; the residual module 305 processes the output information of the residual module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (kernel size 256), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (kernel size 256) in sequence, obtaining output information of dimensions 13×13×256 (i.e., the 13×13×256 second residual information).
The third residual subnetwork processes the 13×13×256 third spliced result as follows: the 13×13×256 third spliced result passes through the residual module 304 and the residual module 305 of the third residual subnetwork in sequence to obtain 7×7×512 third residual information; at the same time, it passes through the residual branch 306 of the third residual subnetwork (the kernel size of the fourteenth batch normalization layer being 512) for sampling (i.e., a dimension-raising operation) to obtain 7×7×512 third sampled information; the sampling includes at least one of up-sampling and down-sampling, selected according to the actual situation. The 7×7×512 third residual information and the 7×7×512 third sampled information are spliced to obtain a 7×7×512 fourth spliced result.
In the third residual subnetwork, the residual module 304 processes the 13×13×256 third spliced result through the tenth convolutional layer, the tenth batch normalization layer (kernel size 512), the seventh ReLU activation layer, the eleventh convolutional layer and the eleventh batch normalization layer (kernel size 512) in sequence, obtaining output information of dimensions 7×7×512; the residual module 305 processes the output information of the residual module 304 through the twelfth convolutional layer, the twelfth batch normalization layer (kernel size 512), the eighth ReLU activation layer, the thirteenth convolutional layer and the thirteenth batch normalization layer (kernel size 512) in sequence, obtaining output information of dimensions 7×7×512 (i.e., the 7×7×512 third residual information).
The output module includes an average pooling layer of size 3×3, which performs average pooling on the 7×7×512 fourth spliced result to obtain a feature vector of dimensions 1×1×512 (i.e., the second feature).
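Since this global branch follows the standard ResNet18 layout, a minimal sketch can reuse the torchvision backbone and expose the 512-dimensional average-pooled feature. The class name GlobalBranch is an assumption, and this is an approximation of the structure described above rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GlobalBranch(nn.Module):
    """ResNet18-style global branch returning the 512-d average-pooled feature (second feature)."""
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)     # 7x7 stem + 4 residual stages + average pooling
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.features(x).flatten(1)    # (B, 512) second feature
```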
Exemplarily, the fusion classification module 204 includes an orthogonal module 2041 and a classification module 2042. The orthogonal module 2041 is used to perform an orthogonal operation on the first feature of the region where the facial expression is located and the second feature using a preset orthogonal function to obtain an orthogonal result; the classification module 2042 is used to perform feature aggregation and classification on the orthogonal result using a preset classification function to obtain the classification result. The orthogonal module 2041 includes the Bilinear function provided in the PyTorch deep learning library; the classification module 2042 includes the Linear function and the Softmax classification function provided in the PyTorch deep learning library, connected in sequence. For example, the Bilinear function performs an orthogonal operation on the feature vector of dimensions 1×2048 (i.e., the first feature of the region where the pain expression is located) and the feature vector of dimensions 1×1×512 (i.e., the second feature) to obtain an orthogonal result, thereby fusing the first feature of the region where the pain expression is located with the second feature. The Linear function aggregates the orthogonal result (i.e., a dimension-reducing operation) to obtain an aggregated result with output dimension 6; the Softmax classification function classifies the aggregated result, finally obtaining the classification result (for example, the pain degree corresponding to the 200×200×3 pain expression image 201 is level 3).
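A minimal PyTorch sketch of this fusion classification module, using nn.Bilinear and nn.Linear as described, is shown below. The intermediate bilinear output size (512 here) and the class name FusionClassifier are assumptions; the softmax is left to inference, since nn.CrossEntropyLoss applies it internally during training.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Fuses the local (2048-d) and global (512-d) features with a bilinear layer, then classifies."""
    def __init__(self, dim_local: int = 2048, dim_global: int = 512,
                 num_classes: int = 6, dim_fused: int = 512):
        super().__init__()
        self.bilinear = nn.Bilinear(dim_local, dim_global, dim_fused)  # orthogonal (fusion) module
        self.classify = nn.Linear(dim_fused, num_classes)              # feature aggregation to 6 levels

    def forward(self, f_local: torch.Tensor, f_global: torch.Tensor) -> torch.Tensor:
        fused = self.bilinear(f_local, f_global)
        return self.classify(fused)            # class scores; apply torch.softmax(...) at inference
```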
Exemplarily, when training the above expression classification model, the cross-entropy function provided in the PyTorch deep learning library may be selected as the loss function, and stochastic gradient descent may be selected as the training optimizer. The loss after Softmax classification is computed step by step through forward propagation, and the weights of the expression classification model are updated by back-propagating the computed loss value until the expression classification model tends to converge, at which point training of the expression classification model can be stopped and its weights are saved.
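As an illustration, a minimal training loop under these choices (cross-entropy loss and SGD) might look as follows; the learning rate, momentum, epoch count and data loader are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs: int = 20, lr: float = 1e-3, device: str = "cpu"):
    """Train the dual-branch expression classification model with cross-entropy and SGD."""
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        for images, labels in loader:            # images: (B, 3, 200, 200), labels: (B,) pain levels 0..5
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)              # class scores for the 6 pain levels
            loss = criterion(outputs, labels)
            loss.backward()                      # back-propagate the loss value
            optimizer.step()                     # update the model weights
    torch.save(model.state_dict(), "expression_classifier.pt")   # save the trained weights
```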
Exemplarily, the following evaluation metrics may be used to evaluate the accuracy of the classification results of the expression classification model: accuracy (Acc), root mean square error (RMSE) and the Pearson correlation coefficient (PCC); of course, other evaluation metrics may also be used, and the present application does not limit this. Acc is the prediction accuracy of the expression classification model, i.e., the proportion of predicted labels of the expression images that match the actual true labels. RMSE is computed as the error between the predicted values and the true values, i.e., the error between the predicted labels and the true labels of the expression images. The PCC reflects how well the expression classification model predicts results for expression images over different time dimensions. For example, Acc can indicate the proportion of predicted labels of pain expression images that match the actual true labels; RMSE can indicate the error between the predicted and true values of pain expression images; and the PCC reflects how well the expression classification model predicts results for pain expression images over different time dimensions.
For example, a continuous, temporally adjacent sequence of 80 frames of facial pain expression images is selected and input into the expression classification model frame by frame for a classification test. The test results are shown in FIG. 4, where the solid curve 401 represents the true values, the predicted value distribution 402 is indicated by black plus signs, the horizontal axis represents the image frame and the vertical axis represents the pain level. As can be seen from FIG. 4, after the expression classification model classified the pain degree of the 80 frames of facial pain expression images, only 6 frames were classified incorrectly (the black '+' marks not on the black solid line in FIG. 4). It can thus be seen that the dual parallel expression classification model proposed by the present invention can effectively classify the pain degree of pain expression images.
The formulas for Acc, RMSE and PCC are as follows:

$$Acc = \frac{\sum_{i} M_i}{N}$$

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$

$$PCC = \frac{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}$$

where $M_i$ is the number of correctly classified samples in each classification result of the experiments with the expression classification model, $N$ is the number of all expression images in the experiment, $y_i$ and $\hat{y}_i$ respectively denote the true label and the predicted label of an expression image (for example, a pain expression image), and $\bar{y}$ and $\bar{\hat{y}}$ respectively denote the means of the sequences $\{y_1, y_2, \ldots, y_N\}$ and $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N\}$.
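A small NumPy sketch computing these three metrics is given below for illustration; it assumes integer-level predicted and true labels as described above.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Accuracy, RMSE and Pearson correlation between predicted and true pain levels."""
    acc = float(np.mean(y_pred == y_true))
    rmse = float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
    yt, yp = y_true - y_true.mean(), y_pred - y_pred.mean()
    pcc = float(np.sum(yp * yt) / (np.sqrt(np.sum(yp ** 2)) * np.sqrt(np.sum(yt ** 2))))
    return {"Acc": acc, "RMSE": rmse, "PCC": pcc}

# Example usage:
# metrics = evaluate(np.array([0, 1, 3, 2]), np.array([0, 1, 2, 2]))
```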
For example, the expression classification model provided by the present application was trained and validated on a publicly available dataset, and its performance (i.e., classification accuracy) was quantitatively evaluated according to the three evaluation metrics of accuracy (Acc), root mean square error (RMSE) and Pearson correlation coefficient (PCC): the accuracy (Acc) was 92.11%, the RMSE was 0.48 and the PCC was 0.95. These results are very close to those of a relatively advanced existing method (for example, a new 3D deep network model, SCN, which uses multiple convolutional layers with different temporal depths to capture the broad spatio-temporal variations of facial expressions and finally achieves an RMSE of 0.57 and a PCC of 0.92), which also demonstrates the effectiveness of the expression classification model provided by the present application.
For ease of understanding, the overall flow of the facial expression classification method provided by the present application is exemplarily described below with reference to FIG. 5, taking the pain-degree classification of facial pain expressions as an example:
Preparation stage 501: Prepare a facial pain expression image dataset with pain-level labels; for example, the pain expression images have been divided into 6 pain levels according to pain degree.
Modeling stage 502: Construct a model for pain expression classification based on the dual parallel expression classification model combined with the local attention mechanism; for example, the input layer of VGG16 incorporating the local attention layer is combined with ResNet18 to form the dual parallel expression classification model, which can be used to classify the pain degree of pain expression images.
Training stage 503: Build a training dataset from the prepared pain expression image data and use it to iteratively train the constructed expression classification model; for example, the pain expression image dataset is divided into a training dataset and a test dataset, where the training dataset is used for iterative training of the constructed expression classification model so as to obtain an expression classification model that meets the requirements.
Classification stage 504: Use the trained expression classification model to classify the pain level of the facial pain expressions in the test set; for example, the trained expression classification model classifies the pain degree of the pain expression images in the test dataset to obtain the final classification result (i.e., the specific pain level corresponding to each pain expression image).
Compared with existing convolutional neural networks that only extract local features from the facial expression of the target object, the present application uses an expression classification model formed by a first feature extraction network and a second feature extraction network connected in parallel to perform both local and global feature extraction on the facial expression of the target object. The global feature extraction performed by the second feature extraction network on the emotional features of the facial expression can compensate for important feature information missed by the first feature extraction network when extracting local features, thereby improving the extraction rate of the emotional features of the facial expression and, in turn, the accuracy of classifying the degree of emotional expression based on the extracted features.
The local attention layer of the present application attenuates irrelevant information in regions of the target image outside the region where the facial expression is located while inversely enhancing the important, relevant information of the region where the facial expression is located, thereby helping to improve the accuracy with which the expression classification model classifies expressions based on the extracted emotional features.
FIG. 6 is a schematic structural diagram of a facial expression classification apparatus provided by the present application. The classification apparatus 600 includes an acquisition module 601 and a processing module 602.
The acquisition module 601 is configured to acquire a target image, the target image including a facial expression of a target object;
the processing module 602 is configured to input the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression;
the expression classification model includes a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result.
For the specific manner in which the classification apparatus 600 executes the facial expression classification method and the beneficial effects produced, reference may be made to the relevant descriptions in the method embodiments.
FIG. 7 shows a schematic structural diagram of an electronic device provided by the present application. The dotted lines in FIG. 7 indicate that the unit or module is optional. The electronic device 700 may be used to implement the methods described in the foregoing method embodiments. The electronic device 700 may be a terminal device, a server or a chip.
The electronic device 700 includes one or more processors 701, which can support the electronic device 700 in implementing the method in the method embodiment corresponding to FIG. 1. The processor 701 may be a general-purpose processor or a special-purpose processor. For example, the processor 701 may be a central processing unit (CPU). The CPU may be used to control the electronic device 700, execute software programs and process data of the software programs. The electronic device 700 may further include a communication unit 705 for implementing the input (reception) and output (transmission) of signals.
For example, the electronic device 700 may be a chip, and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, and the chip may serve as a component of a terminal device.
For another example, the electronic device 700 may be a terminal device, and the communication unit 705 may be a transceiver of the terminal device, or the communication unit 705 may be a transceiver circuit of the terminal device.
The electronic device 700 may include one or more memories 702 on which a program 704 is stored. The program 704 may be run by the processor 701 to generate instructions 703, so that the processor 701 executes the methods described in the foregoing method embodiments according to the instructions 703. Optionally, data may also be stored in the memory 702. Optionally, the processor 701 may also read the data stored in the memory 702; the data may be stored at the same storage address as the program 704, or at a different storage address from the program 704.
The processor 701 and the memory 702 may be provided separately or integrated together, for example integrated on a system on chip (SOC) of the terminal device.
For the specific manner in which the processor 701 executes the facial expression classification method, reference may be made to the relevant descriptions in the method embodiments.
It should be understood that the steps of the foregoing method embodiments may be implemented by logic circuits in the form of hardware or by instructions in the form of software in the processor 701. The processor 701 may be a CPU, a digital signal processor (DSP), a field programmable gate array (FPGA) or another programmable logic device, for example a discrete gate, a transistor logic device or a discrete hardware component.
The present application also provides a computer program product which, when executed by the processor 701, implements the method described in any method embodiment of the present application.
The computer program product may be stored in the memory 702, for example as the program 704, which is finally converted, through processes such as preprocessing, compiling, assembling and linking, into an executable object file that can be executed by the processor 701.
The present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a computer, the method described in any method embodiment of the present application is implemented. The computer program may be a high-level language program or an executable object program.
The computer-readable storage medium is, for example, the memory 702. The memory 702 may be a volatile memory or a non-volatile memory, or the memory 702 may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which serves as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct Rambus RAM (DRRAM).
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes and technical effects of the apparatuses and devices described above can be found in the corresponding processes and technical effects in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present application, the disclosed systems, apparatuses and methods may be implemented in other ways. For example, some features of the method embodiments described above may be omitted or not executed. The apparatus embodiments described above are merely illustrative; the division of units is only a logical functional division, and in actual implementation there may be other division methods, and multiple units or components may be combined or integrated into another system. In addition, the coupling between the units or between the components may be direct or indirect coupling, and the above coupling includes electrical, mechanical or other forms of connection.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features therein; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. A facial expression classification method, characterized in that the method comprises:
    acquiring a target image, the target image comprising a facial expression of a target object;
    inputting the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression;
    wherein the expression classification model comprises: a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result.
  2. The classification method according to claim 1, characterized in that the first feature extraction network is VGG16, and an input layer of the VGG16 comprises a local attention layer,
    the local attention layer being used to perform an information attenuation operation on regions of the target image outside the region where the facial expression is located.
  3. The classification method according to claim 2, characterized in that the input layer of the VGG16 comprises a first convolutional layer, a first batch normalization layer, a first activation layer, the local attention layer and a first max pooling layer connected in sequence;
    the local attention layer performs the information attenuation operation on regions of the target image outside the region where the facial expression is located in the following manner: after receiving output information of the first activation layer, the local attention layer determines a two-dimensional image mask according to the output information of the first activation layer, and multiplies the two-dimensional image mask by the output information of the first activation layer to obtain output information of the local attention layer; wherein the output information of the local attention layer is input to a network layer connected after the local attention layer for local feature extraction.
  4. The classification method according to claim 3, characterized in that determining the two-dimensional image mask according to the output information of the first activation layer comprises:
    computing an average activation value of the feature map of each channel in the output information of the first activation layer to obtain N average activation values;
    determining a first channel according to the N average activation values, the first channel being the channel corresponding to the largest average activation value among the N average activation values;
    setting a mask for each pixel of the first channel, wherein when a first pixel of the first channel is greater than or equal to the largest average activation value, the mask value at the position corresponding to the first pixel is set to 1, and when the first pixel of the first channel is smaller than the largest average activation value, the mask value at the position corresponding to the first pixel is set to 0; the first pixel being any pixel of the first channel, and N being a positive integer.
  5. The classification method according to any one of claims 1 to 4, characterized in that the second feature extraction network is ResNet18.
  6. The classification method according to any one of claims 1 to 4, characterized in that the fusion classification module comprises an orthogonal module and a classification module,
    the orthogonal module being used to perform an orthogonal operation on the first feature of the region where the facial expression is located and the second feature using a preset orthogonal function to obtain an orthogonal result;
    the classification module being used to perform feature aggregation and classification on the orthogonal result using a preset classification function to obtain the classification result.
  7. The classification method according to any one of claims 1 to 4, characterized in that the classification result is a pain degree.
  8. A facial expression classification apparatus, characterized by comprising an acquisition module and a processing module,
    the acquisition module being configured to acquire a target image, the target image comprising a facial expression of a target object;
    the processing module being configured to input the target image into an expression classification model to obtain a classification result, the classification result being used to indicate the degree of emotional expression of the facial expression;
    wherein the expression classification model comprises: a first feature extraction network, a second feature extraction network and a fusion classification module; local feature extraction is performed on the target image by the first feature extraction network to obtain a first feature of the region where the facial expression is located; global feature extraction is performed on the target image by the second feature extraction network to obtain a second feature; and feature fusion and classification are performed on the first feature and the second feature by the fusion classification module to obtain the classification result.
  9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being used to store a computer program, and the processor being used to call and run the computer program from the memory, so that the electronic device executes the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor is caused to execute the method according to any one of claims 1 to 7.
PCT/CN2021/138099 2021-10-19 2021-12-14 一种面部表情的分类方法和电子设备 WO2023065503A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111216040.6 2021-10-19
CN202111216040.6A CN114067389A (zh) 2021-10-19 2021-10-19 一种面部表情的分类方法和电子设备

Publications (1)

Publication Number Publication Date
WO2023065503A1 true WO2023065503A1 (zh) 2023-04-27

Family

ID=80234862

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138099 WO2023065503A1 (zh) 2021-10-19 2021-12-14 一种面部表情的分类方法和电子设备

Country Status (2)

Country Link
CN (1) CN114067389A (zh)
WO (1) WO2023065503A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912924A (zh) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 一种目标图像识别方法和装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612987A (zh) * 2022-03-17 2022-06-10 深圳集智数字科技有限公司 一种表情识别方法及装置
CN115187579B (zh) * 2022-08-11 2023-05-02 北京医准智能科技有限公司 一种图像类别判定方法、装置及电子设备
CN116597486A (zh) * 2023-05-16 2023-08-15 暨南大学 一种基于增量技术和掩码剪枝的人脸表情平衡识别方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464865A (zh) * 2020-12-08 2021-03-09 北京理工大学 一种基于像素和几何混合特征的人脸表情识别方法
CN112651301A (zh) * 2020-12-08 2021-04-13 浙江工业大学 一种整合人脸全局和局部特征的表情识别方法
CN113011386A (zh) * 2021-04-13 2021-06-22 重庆大学 一种基于等分特征图的表情识别方法及系统
US20210248718A1 (en) * 2019-08-30 2021-08-12 Shenzhen Sensetime Technology Co., Ltd. Image processing method and apparatus, electronic device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210248718A1 (en) * 2019-08-30 2021-08-12 Shenzhen Sensetime Technology Co., Ltd. Image processing method and apparatus, electronic device and storage medium
CN112464865A (zh) * 2020-12-08 2021-03-09 北京理工大学 一种基于像素和几何混合特征的人脸表情识别方法
CN112651301A (zh) * 2020-12-08 2021-04-13 浙江工业大学 一种整合人脸全局和局部特征的表情识别方法
CN113011386A (zh) * 2021-04-13 2021-06-22 重庆大学 一种基于等分特征图的表情识别方法及系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116912924A (zh) * 2023-09-12 2023-10-20 深圳须弥云图空间科技有限公司 一种目标图像识别方法和装置
CN116912924B (zh) * 2023-09-12 2024-01-05 深圳须弥云图空间科技有限公司 一种目标图像识别方法和装置

Also Published As

Publication number Publication date
CN114067389A (zh) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2023065503A1 (zh) 一种面部表情的分类方法和电子设备
US20210049397A1 (en) Semantic segmentation method and apparatus for three-dimensional image, terminal, and storage medium
CN111524106B (zh) 颅骨骨折检测和模型训练方法、装置、设备和存储介质
EP4099220A1 (en) Processing apparatus, method and storage medium
Borsting et al. Applied deep learning in plastic surgery: classifying rhinoplasty with a mobile app
EP4006776A1 (en) Image classification method and apparatus
CN111932529B (zh) 一种图像分类分割方法、装置及系统
CN110276408B (zh) 3d图像的分类方法、装置、设备及存储介质
CN111899252B (zh) 基于人工智能的病理图像处理方法和装置
EP3933708A2 (en) Model training method, identification method, device, storage medium and program product
EP4322056A1 (en) Model training method and apparatus
CN110598714A (zh) 一种软骨图像分割方法、装置、可读存储介质及终端设备
EP4047509A1 (en) Facial parsing method and related devices
CN112989085B (zh) 图像处理方法、装置、计算机设备及存储介质
WO2021120961A1 (zh) 大脑成瘾结构图谱评估方法及装置
EP4006777A1 (en) Image classification method and device
US20230071661A1 (en) Method for training image editing model and method for editing image
CN111815606B (zh) 图像质量评估方法、存储介质及计算装置
WO2021139351A1 (zh) 图像分割方法、装置、介质及电子设备
CN112529068A (zh) 一种多视图图像分类方法、系统、计算机设备和存储介质
Yanling et al. Segmenting vitiligo on clinical face images using CNN trained on synthetic and internet images
CN113850796A (zh) 基于ct数据的肺部疾病识别方法及装置、介质和电子设备
CN112101456A (zh) 注意力特征图获取方法及装置、目标检测的方法及装置
WO2023173827A1 (zh) 图像生成方法、装置、设备、存储介质及计算机程序产品
Xian et al. Automatic tongue image quality assessment using a multi-task deep learning model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21961241

Country of ref document: EP

Kind code of ref document: A1