CN116563908A - Face analysis and emotion recognition method based on multitasking cooperative network - Google Patents

Face analysis and emotion recognition method based on multitasking cooperative network Download PDF

Info

Publication number
CN116563908A
CN116563908A (application number CN202310204150.3A)
Authority
CN
China
Prior art keywords
face
feature
detail
boundary
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310204150.3A
Other languages
Chinese (zh)
Inventor
宋海裕
王浩宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Finance and Economics
Original Assignee
Zhejiang University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Finance and Economics filed Critical Zhejiang University of Finance and Economics
Priority to CN202310204150.3A priority Critical patent/CN116563908A/en
Publication of CN116563908A publication Critical patent/CN116563908A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face analysis and emotion recognition method based on a multi-task collaborative network. The method comprises the following steps: step 1, preprocessing the experimental data; step 2, constructing the MPNET network model; step 2.1, adopting ResNet18 as the backbone network of the encoder to extract semantic information from the input picture; step 2.2, constructing an edge perception branch and adding a detail perception module DPM and a feature fusion module FFM into it; step 2.3, constructing a segmentation branch for outputting the face analysis result and supervising face analysis; step 2.4, constructing a classification branch for recognizing facial emotion; step 3, training MPNET with the inter-task consistency learning loss function and the intra-task loss function; step 4, performing experiments with the trained MPNET network model and verifying its effect on the CelebAMask_HQ dataset. The invention integrates face analysis and facial emotion recognition into one network, has high real-time performance and accuracy, and can be deployed on mobile terminals and other devices.

Description

Face analysis and emotion recognition method based on multitasking cooperative network
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a face analysis and emotion recognition method based on a multitasking collaborative network.
Background
Face analysis is a fine-grained semantic segmentation task, often applied in photo retouching, photo beautification and the like. Facial emotion recognition is a classification task that can be applied in fields such as human-computer interaction and mental health assessment. The present method develops a new multi-task collaborative network that realizes face analysis and emotion recognition simultaneously. Compared with other methods, the inference speed and accuracy are significantly improved, and the method can be deployed on mobile terminal devices such as mobile phones.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a face analysis and emotion recognition method based on a multi-task collaborative network, which realizes face analysis and emotion recognition simultaneously. The invention provides a deep learning model named MPNET; the specific steps are as follows:
step 1, preprocessing experimental data;
step 2, constructing an MPNET network model;
step 3, training an MPNET network model;
and 4, carrying out experiments on the face analysis data sets by adopting a trained MPNET network model, and evaluating the experimental results.
The step 1 specifically comprises the following steps:
step 1.1, normalizing the image in order to improve the generalization capability of the model;
step 1.2, cropping the normalized image to a size of 512 × 512;
step 1.3, performing data enhancement on the cropped image, specifically by random rotation and random scaling;
step 1.4, dividing the data into a training set, a validation set and a test set (a minimal preprocessing sketch is given below).
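For illustration only, the following is a minimal preprocessing sketch in PyTorch/torchvision covering steps 1.1 to 1.4. The normalization statistics, rotation range, scaling range and the 8:1:1 split ratio are assumptions; the original text does not specify them.

```python
import random
import torchvision.transforms as T

# Image pipeline for steps 1.1-1.3 (ordering and parameter values are assumptions).
train_transform = T.Compose([
    T.RandomRotation(degrees=15),                  # step 1.3: random rotation
    T.RandomResizedCrop(512, scale=(0.8, 1.0)),    # steps 1.2-1.3: random scaling, crop to 512 x 512
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],        # step 1.1: normalization (ImageNet statistics assumed)
                std=[0.229, 0.224, 0.225]),
])

# Step 1.4: split the samples into training / validation / test sets (8:1:1 assumed).
def split_dataset(samples, seed=0):
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    n = len(samples)
    return samples[:int(0.8 * n)], samples[int(0.8 * n):int(0.9 * n)], samples[int(0.9 * n):]
```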
The step 2 comprises the following steps:
step 2.1, adopting ResNet18 as a backbone network of an encoder, and extracting semantic information of an input picture;
step 2.2, constructing an edge perception branch, and adding a Detail Perception Module (DPM) and a Feature Fusion Module (FFM) into the edge perception branch.
The second-layer features of ResNet18 first pass through a detail perception module DPM; the output of the DPM and the third-layer features of ResNet18 after 2× upsampling are fused by a feature fusion module FFM to obtain fusion feature I.
Further, fusion feature I passes through a detail perception module DPM again, and its output is fused with the fourth-layer features of ResNet18 after 4× upsampling by a feature fusion module FFM to obtain fusion feature II. After fusion feature II passes through a DPM again and is upsampled by 4×, it is fed into two Detail Heads to obtain a facial binary-classification boundary map and a facial multi-class boundary map respectively. Each Detail Head consists of a 3×3 convolution layer, a BatchNorm layer, a ReLU activation function, and a 1×1 convolution.
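For illustration, a minimal PyTorch sketch of the Detail Head described above; the channel counts and the number of boundary classes are assumptions not given in the text.

```python
import torch
import torch.nn as nn

class DetailHead(nn.Module):
    """Detail Head: 3x3 conv -> BatchNorm -> ReLU -> 1x1 conv, as described above."""
    def __init__(self, in_channels: int, mid_channels: int, num_classes: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, num_classes, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Hypothetical usage: one head for the binary boundary map, one for the multi-class boundary map.
binary_head = DetailHead(in_channels=128, mid_channels=64, num_classes=2)
multi_head = DetailHead(in_channels=128, mid_channels=64, num_classes=19)  # class count assumed
```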
The main structure of the detail perception module DPM is as follows:
For an input feature X, spatial attention map I is first obtained through a global max pooling layer and two 1×1 convolution layers; spatial attention map II is obtained by passing the input feature X through a global average pooling layer and two 1×1 convolution layers. Spatial attention map I is added to spatial attention map II, and the final spatial attention map is obtained through a softmax function. The spatial attention map is multiplied with the input feature X to obtain the output feature y.
Further, the output feature y is taken as a new input feature: channel attention map I is obtained through a global max pooling layer, and channel attention map II is obtained through a global average pooling layer. Channel attention map I is added to channel attention map II, the final channel attention map is then obtained through a 1×1 convolution layer and a softmax function, and the channel attention map is multiplied with the input feature to obtain the feature that finally passes through the detail perception module.
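The following is a minimal PyTorch sketch of one possible reading of the DPM. The pooling axes (channel-wise pooling for the spatial attention map, spatial pooling for the channel attention map), the softmax normalization axes and the channel counts are assumptions, since the text does not fully specify them.

```python
import torch
import torch.nn as nn

class DPM(nn.Module):
    """Detail Perception Module: spatial attention followed by channel attention (a sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # Two 1x1 conv stacks producing the two spatial attention maps from pooled 1-channel maps.
        self.spatial_max = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1))
        self.spatial_avg = nn.Sequential(nn.Conv2d(1, 1, 1), nn.Conv2d(1, 1, 1))
        # 1x1 conv applied to the summed channel attention vector.
        self.channel_conv = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Spatial attention: pool over the channel dimension (assumption), then two 1x1 convs.
        sa1 = self.spatial_max(x.max(dim=1, keepdim=True).values)
        sa2 = self.spatial_avg(x.mean(dim=1, keepdim=True))
        sa = torch.softmax((sa1 + sa2).flatten(2), dim=-1).view_as(sa1)  # softmax over spatial positions
        y = x * sa
        # Channel attention: global max / average pooling over the spatial dimensions.
        ca1 = torch.amax(y, dim=(2, 3), keepdim=True)
        ca2 = torch.mean(y, dim=(2, 3), keepdim=True)
        ca = torch.softmax(self.channel_conv(ca1 + ca2), dim=1)          # softmax over channels
        return y * ca
```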
The main structure of the feature fusion module is as follows:
input feature Z 1 and Z2 . The two features are spliced, and the branching attention pattern is obtained through a global average pooling layer, a convolution layer of 1 multiplied by 1 and a softmax function. Expanding the branch attention according to the channel dimension, and respectively matching the branch attention with Z according to the set dimension index 1 and Z2 And multiplying to obtain the final fused output characteristic Z. For example: the dimension of the branch attention pattern is 512, and the weight value corresponding to the expanded number 0-255 channel is equal to Z 1 Multiplying the weight value corresponding to the 256-511 channel with Z 2 The multiplication is performed and,
and 2.3, constructing a segmentation branch for outputting a face analysis result and supervising the face analysis.
For the fifth-layer features of ResNet18, a decoder with five identically structured layers is designed; each decoder layer consists of a 3×3 convolution layer and an upsampling operation. The input features are restored to the original resolution by this five-layer decoder, yielding the supervised face analysis result.
For the fifth-layer features of the encoder ResNet18, 8× upsampling is first performed to obtain feature Y2. The feature Y1 output by the last detail perception module DPM in the edge perception branch and the 8×-upsampled feature Y2 are then fed into the dual-graph adaptive learning module DGALM, and the final face analysis result is obtained by passing the output of the DGALM through a Seg Head. The Seg Head consists of a 3×3 convolution layer, a BatchNorm layer, a ReLU activation function, and a 1×1 convolution.
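For illustration, a sketch of the five-layer decoder used for the supervised face analysis output; the channel schedule and the 19-class output are assumptions, and 2× upsampling per layer is assumed so that five layers recover the 32× stride of the ResNet18 fifth-layer features.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    """Five identically structured decoder layers, each a 3x3 conv followed by 2x upsampling (a sketch)."""
    def __init__(self, in_channels: int = 512, num_classes: int = 19):
        super().__init__()
        layers, c = [], in_channels
        for _ in range(5):                                   # five layers, 2x up each -> 32x in total
            out_c = max(c // 2, num_classes)
            layers += [
                nn.Conv2d(c, out_c, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            ]
            c = out_c
        self.decode = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decode(x)                                # supervised face parsing logits
```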
The main structure of the dual-graph adaptive learning module is as follows. First, feature Y1 and feature Y2 are spliced, and the spliced features are passed through two separate 1×1 convolutions to obtain the semantic feature map Z_semantic and the detail feature map Z_detail. The binary face boundary map obtained in the boundary perception branch is scaled to 1/4 of its original size, and the binary face boundary is then used to divide Z_semantic and Z_detail into boundary pixels and non-boundary pixels. The specific formulas are as follows:
[Z_detail_edge, Z_detail_noneedge] = Z_detail ⊙ [Mask, A − Mask]
[Z_semantic_edge, Z_semantic_noneedge] = Z_semantic ⊙ [Mask, A − Mask]
where ⊙ denotes the matrix dot product; Z_detail_noneedge is the detail feature map that does not contain boundary pixels, and Z_detail_edge is the detail feature map that contains boundary pixels; Z_semantic_noneedge is the semantic feature map that does not contain boundary pixels, and Z_semantic_edge is the semantic feature map that contains boundary pixels; A is a matrix whose elements are all 1; argmax_{dim=2} denotes the index of the maximum value taken along the second dimension of the feature.
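A small PyTorch sketch of the boundary/non-boundary split described by the formulas above. Deriving Mask via an argmax over the two boundary channels, and the bilinear rescaling, are assumptions based on the text.

```python
import torch
import torch.nn.functional as F

def split_by_boundary(z: torch.Tensor, boundary_logits: torch.Tensor):
    """Split a feature map z (B, C, H, W) into boundary / non-boundary parts using a binary boundary map."""
    # Scale the binary boundary prediction to the spatial size of z (1/4 of the original in the text).
    boundary_logits = F.interpolate(boundary_logits, size=z.shape[-2:], mode="bilinear", align_corners=False)
    # Mask: 1 at boundary pixels, 0 elsewhere (argmax over the 2-class boundary channels, an assumption).
    mask = boundary_logits.argmax(dim=1, keepdim=True).float()
    ones = torch.ones_like(mask)                       # the all-ones matrix A
    z_edge = z * mask                                  # Z_edge     = Z ⊙ Mask
    z_noneedge = z * (ones - mask)                     # Z_noneedge = Z ⊙ (A − Mask)
    return z_edge, z_noneedge
```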
Further, the supervised face analysis result obtained in the segmentation branch is scaled to 1/4 of its original size, and the Top-k elements are selected from it as face components to represent the vertices of the graph,
where Z_graph_semantic is the face semantic component, Z_graph_detail is the face detail component, Z_semantic_noneedge is the semantic feature map that does not contain boundaries, Z_detail_edge is the detail feature map that contains boundaries, and C is the number of channels of the feature.
Further, graph reasoning is performed through one layer of graph convolution, and long-range interactions among the pixels of different face components are established by graph neural message passing, yielding the updated semantic and detail graph features.
Further, mapping matrices P1 and P2 are constructed to map the features into the original geometric space.
further, the transposed mapping matrix is multiplied by the features after graph reasoning, the features are mapped back to the original geometric space, and the final feature output result is X out
wherein ,representing a semantic feature map mapped back to the original geometric space; />Representing a detail feature map mapped back to the original geometric space; />Representing a feature stitching operation.
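For illustration only, a compact sketch of the project, graph-convolve, re-project pattern described in the paragraphs above. The way the mapping matrix is produced, the number of graph nodes, the single learnable adjacency and the residual addition are all assumptions.

```python
import torch
import torch.nn as nn

class GraphReasoning(nn.Module):
    """Project pixels to k graph nodes, apply one graph-convolution layer, project back (a sketch)."""
    def __init__(self, channels: int, num_nodes: int = 16):
        super().__init__()
        self.proj = nn.Conv2d(channels, num_nodes, 1)   # produces the mapping matrix P (pixels -> nodes)
        self.gcn = nn.Linear(channels, channels)        # node feature transform of a single GCN layer
        self.adj = nn.Parameter(torch.eye(num_nodes))   # learnable adjacency for message passing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        p = torch.softmax(self.proj(x).flatten(2), dim=-1)         # (B, K, HW) mapping matrix
        nodes = torch.bmm(p, x.flatten(2).transpose(1, 2))         # (B, K, C) graph vertices
        nodes = torch.relu(self.gcn(torch.matmul(self.adj, nodes)))  # message passing + transform
        out = torch.bmm(p.transpose(1, 2), nodes)                  # map back with the transposed matrix
        return out.transpose(1, 2).view(b, c, h, w) + x            # residual add (an assumption)
```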
Step 2.4, constructing a classification branch for facial emotion recognition.
For the last-layer (fifth-layer) features S = [s_1, s_2, ..., s_C] output by the encoder ResNet18, each s_i is regarded as an image patch input to a Transformer layer; the patches are fed into the Transformer layer, and the output features finally pass through an MLP layer to obtain the facial emotion recognition result.
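A minimal PyTorch sketch of such a classification branch, treating each of the C channel maps as one token. The token embedding size (16×16 spatial positions for a 512×512 input at stride 32), head count, mean pooling before the MLP, and the number of emotion classes are assumptions.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """Classification branch: channel maps as tokens -> Transformer layer -> MLP (a sketch)."""
    def __init__(self, spatial: int = 16 * 16, num_emotions: int = 7):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model=spatial, nhead=8, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(spatial, 128), nn.ReLU(inplace=True), nn.Linear(128, num_emotions))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape                  # last-layer encoder feature S
        tokens = feat.flatten(2)                 # (B, C, H*W): each s_i is one token
        tokens = self.encoder(tokens)            # one Transformer layer over the C tokens
        return self.mlp(tokens.mean(dim=1))      # pool the tokens, then MLP -> emotion logits
```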
The step 3 comprises the following steps:
and 3.1, constructing an intra-task loss function.
First, the loss function of the segmentation branch mainly consists of the loss of supervised face analysis and the loss of the output face analysis; the cross entropy loss function is used, specifically as follows:
further, constructing a loss function of the boundary-aware branch, we use a cross entropy loss function, specifically as follows:
further, a loss function of facial emotion recognition is constructed, and a cross entropy loss function is used, specifically as follows:
further, the total intra-task loss function is:
and 3.2, constructing a consistency loss function among tasks.
Keep the No. 0 channel of the multi-class boundary map unchanged and merge the remaining channels to obtain Seg_2-joint-3, which represents the face boundary reduced to two classes.
The task consistency loss between the binary-classification boundary task and the multi-classification boundary task is then calculated using the Dice coefficient.
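A small sketch of a Dice-based consistency term between two boundary predictions; the soft-Dice form and the smoothing constant are assumptions.

```python
import torch

def dice_consistency(pred_a: torch.Tensor, pred_b: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """1 - Dice coefficient between two (B, H, W) boundary probability maps (a sketch)."""
    inter = (pred_a * pred_b).sum(dim=(1, 2))
    union = pred_a.sum(dim=(1, 2)) + pred_b.sum(dim=(1, 2))
    dice = (2 * inter + eps) / (union + eps)
    return (1 - dice).mean()          # small when the two tasks agree on the boundary
```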
Further, the consistency loss among the binary-classification boundary task, the multi-classification boundary task and the face analysis task is calculated. First, the index of the maximum value is taken along the second dimension of the face analysis result to obtain the analysis mask; a boundary localization algorithm is then used to assign 1 to the pixels located on the boundary of the analysis mask and 0 to the other non-boundary pixels, and the two resulting maps are multiplied together.
Then, the task consistency loss between the analysis task and the binary-classification boundary task and the consistency loss between the analysis task and the multi-classification boundary task are calculated respectively using the Dice coefficient.
Further, the overall inter-task consistency loss function is formed by combining the above consistency terms.
The step 4 specifically comprises the following steps:
Step 4.1: the F1 coefficient is introduced to evaluate the effects of face analysis and emotion recognition; it is defined as F1 = 2 × Precision × Recall / (Precision + Recall).
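A small sketch of the per-class F1 computation such an evaluation typically uses; the counting and averaging scheme over classes is an assumption.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), computed from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```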
Compared with the prior art, the beneficial effects of the invention are as follows:
The invention realizes face analysis and facial emotion recognition by establishing the MPNET deep learning model. A boundary perception branch is added so that the face analysis result is more refined, and a dual-graph adaptive learning module is added to establish the dependency relationships among different face components. At the same time, the FPS of MPNET on an RTX 3090 reaches 92.9 and the model has only 11.63M parameters, so it has high real-time performance and can be deployed on devices such as mobile terminals.
Drawings
Fig. 1 is a diagram of the network architecture of MPENet.
Fig. 2 is an example of the effect of MPENet compared with other models.
Fig. 3 is an example of the effect of the MPENet ablation experiments.
Detailed Description
The invention will be further described with reference to the drawings and the specific examples.
In order to solve the problems encountered in face analysis and facial expression recognition, the invention designs a novel multi-task collaborative learning network for face analysis and facial emotion recognition. Specifically, MPENet consists of one shared encoder and three downstream branches (a classification branch, a segmentation branch and an edge perception branch). In the classification branch, a Transformer module converts the features extracted by the shared encoder into embedding-level features for facial expression recognition. In the edge perception branch, multi-class face boundaries and binary-class boundaries are used to extract face boundary information, helping the face analysis task to better locate face boundaries. In the segmentation branch, a dual-graph adaptive learning module fuses the edge information and semantic information of the image to infer the relations between different feature regions and capture more context, while an additional decoder is designed as the supervised output of face analysis to obtain a finer analysis map. Finally, a consistency learning loss function is designed between tasks so that the tasks reinforce each other and the overall accuracy of the model is improved.
Example 1: preprocessing of the experimental data.
(1) Normalize the data.
(2) Crop the picture to a size of 512 × 512.
(3) Perform data enhancement on the cropped image by random rotation and random scaling.
(4) The data set is divided into a training set, a validation set and a test set.
Example 2: constructing the MPENet network model.
(1) ResNet18 is adopted as the backbone network of the encoder to extract semantic information.
(2) The boundary perception branch is constructed. The second-layer features of ResNet18 first pass through the DPM, and then pass through the FFM together with the 2×-upsampled third-layer features. Further, the fused features pass through the DPM again and then through the FFM together with the 4×-upsampled fourth-layer features of ResNet18. Finally, after the DPM and 4× upsampling, the fused features are fed into two Detail Heads to obtain the facial binary-classification boundary map and the facial multi-class boundary map respectively.
(3) For the fifth-layer features of the encoder ResNet18, a five-layer decoder structure is designed; each decoder layer consists of a 3×3 convolution layer and upsampling, through which the input features are restored to the original resolution to obtain the supervised face analysis result.
(4) For the fifth-layer features of the encoder ResNet18, 8× upsampling is first performed; the feature X of the last DPM in the edge perception branch and the 8×-upsampled feature Y are then fed together into the DGALM, and the final face analysis result is obtained by passing the DGALM output through a Seg Head.
(5) The last-layer features output by the encoder ResNet18 are passed through a Transformer layer and then an MLP layer to obtain the final facial emotion classification result.
Example 3: training the MPENet network model.
(1) SGD is adopted as the optimization method.
(2) The ResNet18 weights of the MPENet encoder are initialized with weights pre-trained on the ImageNet dataset (a training-setup sketch follows below).
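A minimal sketch of this training setup using torchvision; the learning rate, momentum and weight decay values are assumptions not given in the text.

```python
import torch
import torchvision

# Encoder backbone: ResNet18 pre-trained on ImageNet, with the classification head removed.
backbone = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()

# SGD optimization; hyperparameter values are assumptions.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
```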
Example 4: experiments were performed on the public face dataset CelebAMask_HQ using the trained MPENet network model, and the experimental effect was evaluated.
(1) Table 1 below compares MPENet with current mainstream semantic segmentation frameworks on the CelebAMask_HQ dataset. The mean F1 coefficient of the model reaches 85.9%, and the mean F1 for facial emotion is 80.04%. See the comparison of MPENet with other methods in Table 1.
Table 1. Comparison of MPENet with results from other models
(2) Table 2 below shows an ablation experiment of MPENet on the CelebAMask_HQ dataset; it can be seen that each module of MPENet improves the model accuracy.
Table 2. Ablation experiments of MPENet
(3) Table 3 below shows the performance comparison of MPNET with other models; it can be seen that MPNET is at a leading level in both inference speed and accuracy, with an FPS of 92.9 and only 11.63M model parameters.
Table 3. Comparison of the performance of MPNET with other models

Claims (10)

1. A face analysis and emotion recognition method based on a multitasking cooperative network is characterized by comprising the following steps:
step 1, preprocessing experimental data;
step 2, constructing an MPNET network model;
step 3, training an MPNET network model;
and 4, carrying out experiments on the face analysis data sets by adopting a trained MPNET network model, and evaluating the experimental results.
2. The face parsing and emotion recognition method based on the multitasking collaborative network according to claim 1, wherein the step 2 includes the steps of:
step 2.1, adopting ResNet18 as a backbone network of an encoder, and extracting semantic information of an input picture;
step 2.2, constructing an edge perception branch, and adding a detail perception module DPM and a feature fusion module FFM into the edge perception branch;
step 2.3, constructing a segmentation branch for outputting a face analysis result and supervising the face analysis;
and 2.4, constructing a classification branch for face emotion recognition.
3. The face parsing and emotion recognition method based on the multitasking collaborative network according to claim 2, wherein the step 2.2 is specifically implemented as follows:
the second layer of features of the ResNet18 firstly pass through a detail perception module DPM, and the output of the detail perception module DPM and the features of the third layer of features of the ResNet18 which are subjected to 2 times up-sampling are subjected to feature fusion together through a feature fusion module FFM to obtain fusion features I; the fusion feature I passes through the detail perception module DPM again, and the output of the fusion feature I and the feature of the ResNet18 fourth layer which is subjected to 4 times of upsampling are fused together through the feature fusion module FFM to obtain a fusion feature II; after the final fusion feature II is up-sampled again by the detail perception module DPM and 4 times, the final fusion feature II is respectively sent into two Detai heads to obtain a facial bi-classification boundary mapAnd face multi-class boundary map->
4. A face parsing and emotion recognition method based on a multitasking collaborative network according to claim 2 or 3, characterized in that the detail perception module DPM has the following structure:
for the input feature X, spatial attention map I is first obtained through a global max pooling layer and two 1×1 convolution layers; spatial attention map II is obtained by passing the input feature X through a global average pooling layer and two 1×1 convolution layers; spatial attention map I is added to spatial attention map II, and the final spatial attention map is obtained through a softmax function; the spatial attention map is multiplied with the input feature X to obtain the output feature y;
the output feature y is taken as a new input feature: channel attention map I is obtained through a global max pooling layer, and channel attention map II is obtained through a global average pooling layer; channel attention map I is added to channel attention map II, the final channel attention map is then obtained through a 1×1 convolution layer and a softmax function, and the channel attention map is multiplied with the input feature to obtain the feature that finally passes through the detail perception module.
5. A face parsing and emotion recognition method based on a multitasking collaborative network according to claim 2 or 3, characterized in that the feature fusion module has the following structure:
for input feature Z 1 and Z2 Firstly, splicing two features, and then obtaining a branch attention pattern through a global average pooling layer, a convolution layer of 1 multiplied by 1 and a softmax function; expanding the branch attention according to the channel dimension, and respectively matching the branch attention with Z according to the set dimension index 1 and Z2 And multiplying to obtain the final fused output characteristic Z.
6. The face parsing and emotion recognition method based on the multitasking collaborative network according to claim 2, wherein the step 2.3 is specifically implemented as follows:
for the fifth layer of the ResNet18, a five-layer decoder with the same structure is designed, each decoder is composed of a 3×3 convolution layer and upsampling, and the input features are restored to the original resolution through the five-layer decoder with the same structure, so as to obtain the supervised face analysis result
For the fifth layer of characteristics of the ResNet18 of the encoder, 8 times of up-sampling is performed to obtain a characteristic Y2, then the characteristic Y1 of the last detail sensing module DPM in the edge sensing branch and the characteristic Y2 after 8 times of up-sampling are sent to a double-image self-adaptive learning module DGALM, and a final face analysis result can be obtained after the characteristics of the DGALM and a Seg Head.
7. The face parsing and emotion recognition method based on the multi-task cooperative network as claimed in claim 6, wherein the structure of the dual-graph adaptive learning module is as follows:
(1) feature Y1 and feature Y2 are spliced, and the spliced features are passed through two separate 1×1 convolutions to obtain the semantic feature map Z_semantic and the detail feature map Z_detail; the binary face boundary map obtained in the boundary perception branch is scaled to 1/4 of its original size, and the binary face boundary is used to divide Z_semantic and Z_detail into boundary pixels and non-boundary pixels, with the specific formulas as follows:
[Z_detail_edge, Z_detail_noneedge] = Z_detail ⊙ [Mask, A − Mask]
[Z_semantic_edge, Z_semantic_noneedge] = Z_semantic ⊙ [Mask, A − Mask]
where ⊙ denotes the matrix dot product; Z_detail_noneedge is the detail feature map that does not contain boundary pixels, and Z_detail_edge is the detail feature map that contains boundary pixels; Z_semantic_noneedge is the semantic feature map that does not contain boundary pixels, and Z_semantic_edge is the semantic feature map that contains boundary pixels; A is a matrix whose elements are all 1; argmax_{dim=2} denotes the index of the maximum value taken along the second dimension of the feature;
(2) the supervised face analysis result obtained in the segmentation branch is scaled to 1/4 of its original size, and the Top-k elements are selected from it as face components to represent the vertices of the graph,
where Z_graph_semantic is the face semantic component, Z_graph_detail is the face detail component, Z_semantic_noneedge is the semantic feature map that does not contain boundaries, Z_detail_edge is the detail feature map that contains boundaries, and C is the number of channels of the feature;
(3) graph reasoning is performed through one layer of graph convolution to obtain the updated semantic and detail graph features;
(4) mapping matrices P1 and P2 are constructed to map the features into the original geometric space;
(5) the transposed mapping matrices are multiplied with the features after graph reasoning to map the features back to the original geometric space, and the final feature output is X_out,
where the semantic feature map mapped back to the original geometric space and the detail feature map mapped back to the original geometric space are combined by a feature splicing operation to form X_out.
8. The face parsing and emotion recognition method based on the multitasking collaborative network according to claim 7, wherein the step 2.4 is specifically implemented as follows:
for the last-layer features S = [s_1, s_2, ..., s_C] output by the encoder ResNet18, each s_i is regarded as an image patch input to a Transformer layer; the patches are fed into the Transformer layer, and the output features finally pass through an MLP layer to obtain the facial emotion recognition result.
9. The face analysis and emotion recognition method based on the multi-task cooperative network as set forth in claim 7, wherein the step 3 includes constructing an intra-task loss function, specifically implemented as follows:
3-1-1. The loss function of the segmentation branch mainly consists of the loss Seg_True of supervised face analysis and the loss Seg_Pre of the output face analysis; the cross entropy loss function is used, specifically as follows:
3-1-2. Constructing a loss function of the boundary sensing branch, and using a cross entropy loss function, wherein the loss function is specifically as follows:
3-1-3, constructing a loss function of facial emotion recognition, and using a cross entropy loss function, wherein the loss function is specifically as follows:
3-1-4. The total intra-task loss function is:
where λ_0, λ_1 and λ_2 are scaling factors.
10. The face parsing and emotion recognition method based on a multitasking collaborative network according to claim 8 or 9, wherein step 3 includes constructing a task-to-task consistency loss function, specifically implemented as follows:
3-2-1. First, keep the No. 0 channel of the multi-class boundary map unchanged and merge the remaining channels to obtain Seg_2-joint-3, which represents the face boundary reduced to two classes;
3-2-2, calculating a task consistency loss function between the classified boundary tasks and the multi-classified boundary tasks by using the dice coefficient;
3-2-3, calculating a consistency loss function among the two-classification boundary task, the multi-classification boundary task and the face analysis task;
(1) first, the index of the maximum value is taken along the second dimension of the face analysis result to obtain the analysis mask;
(2) a boundary localization algorithm is then used to assign 1 to the pixels located on the boundary of the analysis mask and 0 to the other non-boundary pixels, and the two resulting maps are multiplied together;
(3) the quantities required for the consistency terms are then calculated;
(4) the task consistency loss between the analysis task and the binary-classification boundary task and the consistency loss between the analysis task and the multi-classification boundary task are calculated respectively using the Dice coefficient, specifically as follows:
(5) The overall inter-task consistency loss function is:
CN202310204150.3A 2023-03-06 2023-03-06 Face analysis and emotion recognition method based on multitasking cooperative network Pending CN116563908A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310204150.3A CN116563908A (en) 2023-03-06 2023-03-06 Face analysis and emotion recognition method based on multitasking cooperative network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310204150.3A CN116563908A (en) 2023-03-06 2023-03-06 Face analysis and emotion recognition method based on multitasking cooperative network

Publications (1)

Publication Number Publication Date
CN116563908A true CN116563908A (en) 2023-08-08

Family

ID=87492218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310204150.3A Pending CN116563908A (en) 2023-03-06 2023-03-06 Face analysis and emotion recognition method based on multitasking cooperative network

Country Status (1)

Country Link
CN (1) CN116563908A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274608A (en) * 2023-11-23 2023-12-22 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance
CN117274608B (en) * 2023-11-23 2024-02-06 太原科技大学 Remote sensing image semantic segmentation method based on space detail perception and attention guidance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination