CN115661911A - Face feature extraction method, device and storage medium

Face feature extraction method, device and storage medium

Info

Publication number
CN115661911A
CN115661911A
Authority
CN
China
Prior art keywords
feature
feature map
layer
feature extraction
activation
Prior art date
Legal status
Granted
Application number
CN202211658800.3A
Other languages
Chinese (zh)
Other versions
CN115661911B (en)
Inventor
朱文忠
肖顺兴
车璇
李韬
杜洪文
谢康康
谢林森
Current Assignee
Sichuan University of Science and Engineering
Original Assignee
Sichuan University of Science and Engineering
Priority date
Filing date
Publication date
Application filed by Sichuan University of Science and Engineering filed Critical Sichuan University of Science and Engineering
Priority to CN202211658800.3A priority Critical patent/CN115661911B/en
Publication of CN115661911A publication Critical patent/CN115661911A/en
Application granted granted Critical
Publication of CN115661911B publication Critical patent/CN115661911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a face feature extraction method, a face feature extraction device and a storage medium, belonging to the technical field of face recognition. The feature extraction method comprises: obtaining a face image and a trained feature extraction network model; extracting basic feature information of the face image with a basic operation layer to generate a basic feature map; inputting the basic feature map into the first deep tone feature extraction monomer; letting each subsequent deep tone feature extraction monomer take the feature map output by its upstream deep tone feature extraction monomer as input and generate and output a corresponding intermediate feature map; repeating this until the last deep tone feature extraction monomer generates and outputs a final-stage feature map; and inputting the final-stage feature map into a feature shaping unit to generate a face feature vector, completing face feature extraction. By arranging a plurality of spatial attention mechanisms, the feature extraction network model of the invention modulates the feature information step by step, so that the network can effectively suppress noise and extract the core feature information.

Description

Face feature extraction method, device and storage medium
Technical Field
The invention belongs to the technical field of face recognition, and particularly relates to a face feature extraction method, face feature extraction equipment and a storage medium.
Background
With the improvement and popularization of hardware performance, face recognition technology has gradually moved out of the laboratory and into people's daily lives. After long-term development, many face recognition algorithms can now handle common real-world scenes well and achieve satisfactory recognition accuracy. However, when the quality of the captured face image is poor (for example under unsatisfactory illumination, large pose changes or varied expressions), existing algorithms still suffer from poor robustness; in particular, for facial appearance changes caused by aging, they struggle to extract the required feature information effectively.
Disclosure of Invention
In view of the above deficiencies in the prior art, the present invention provides a face feature extraction method, device and storage medium, so as to effectively learn and extract feature information from age-spanning face images and improve the accuracy of recognizing age-spanning face images.
In order to achieve the above purpose, the solution adopted by the invention is as follows: a face feature extraction method comprises the following steps:
s100, obtaining a face image, and obtaining a trained feature extraction network model; the characteristic extraction network model is sequentially provided with a basic operation layer, a plurality of deep tone characteristic extraction monomers and a characteristic shaping unit, wherein the plurality of deep tone characteristic extraction monomers are sequentially connected in series;
s200, inputting the face image into the feature extraction network model, extracting basic feature information of the face image by using the basic operation layer, and then generating a basic feature map;
s300, inputting the basic feature map into a first deep tone feature extraction monomer, and outputting a primary feature map by the first deep tone feature extraction monomer after feature extraction operation;
s400, the next deep tone feature extraction monomer takes the feature graph output by the upstream deep tone feature extraction monomer as input, then carries out feature extraction operation, generates and outputs a corresponding intermediate feature graph;
s500, continuously repeating the step S400 until the last deep tone feature extraction monomer generates and outputs a final-stage feature map;
s600, inputting the final-stage feature map into the feature shaping unit, and after shaping operation is carried out on the final-stage feature map, generating a face feature vector to finish face feature extraction;
the calculation operation process inside the deep tone feature extraction monomer is represented as the following mathematical model:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
where X_in denotes the precursor feature map input to the deep tone feature extraction monomer; Conv_1, Conv_2 and Conv_3 denote the first, second and third convolution operations respectively; A_1, A_2, A_3, A_4 and A_5 denote the first, second, third, fourth and fifth attention modules respectively; ⊙ denotes the element-wise product; σ_1, σ_2 and σ_3 denote the first, second and third activation functions respectively; F_1, F_2 and F_3 denote the first, second and third feature maps generated after activation by σ_1, σ_2 and σ_3 respectively; F_4 denotes the fourth feature map obtained by calibrating the third feature map with the third attention module; MSF denotes the multi-scale fusion unit; S denotes the side-branch feature map generated by the multi-scale fusion unit fusing the first, second and third feature maps; F_5 denotes the fifth feature map obtained by adding the side-branch feature map, after calibration by the fourth attention module, to the fourth feature map; P_1 denotes the first process feature map output from the first attention module; P_2 denotes the second process feature map output from the second attention module; P_3 denotes the third process feature map output from the third attention module; P_4 and P_5 denote the fourth and fifth process feature maps, both output from the fourth attention module; V denotes the dimension-changing unit, which increases the number of feature map channels and reduces the feature map width and height; and X_out denotes the successor feature map output by the deep tone feature extraction monomer.
Further, the convolution kernel sizes of the first convolution operation, the second convolution operation and the third convolution operation are all 3×3, and the step sizes are all 1; the first activation function, the second activation function and the third activation function are all ReLU functions.
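The formula for this computation appears in the published text only as an image, so it cannot be quoted verbatim. As a reading aid, the following PyTorch-style sketch shows one data flow that is consistent with the symbol definitions above and with embodiment 1; the class names (DeepToneMonomer, SpatialAttention, ChannelAttention, IntegratedAttention, MultiScaleFusion, DimensionChange) are illustrative placeholders that are sketched after the corresponding paragraphs below, and the exact points at which the first and second attention modules calibrate the convolution chain are an inference, not the patent's verbatim formula.

```python
# Hypothetical sketch of one deep tone feature extraction monomer.
# The exact formula is published only as an image; the ordering below is
# inferred from the symbol definitions and from embodiment 1. The sub-module
# classes are sketched after the corresponding paragraphs later in this text.
import torch.nn as nn

class DeepToneMonomer(nn.Module):
    def __init__(self, channels):
        super().__init__()
        def conv3x3():
            # 3x3 convolution, step size 1; padding assumed so the spatial size is preserved
            return nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1)
        self.conv1, self.conv2, self.conv3 = conv3x3(), conv3x3(), conv3x3()
        self.act1, self.act2, self.act3 = nn.ReLU(), nn.ReLU(), nn.ReLU()
        self.att1, self.att2, self.att3 = SpatialAttention(), SpatialAttention(), SpatialAttention()
        self.att4 = ChannelAttention(channels)      # fourth attention module A_4
        self.att5 = IntegratedAttention(channels)   # fifth attention module A_5
        self.msf = MultiScaleFusion(channels)       # multi-scale fusion unit
        self.var_dim = DimensionChange(channels)    # doubles channels, halves width and height

    def forward(self, x_in):
        f1 = self.act1(self.conv1(x_in))            # first feature map F_1
        a1, p1 = self.att1(f1)                      # spatial attention map and process map P_1
        f2 = self.act2(self.conv2(a1 * f1))         # second feature map F_2 (calibration point inferred)
        a2, p2 = self.att2(f2)
        f3 = self.act3(self.conv3(a2 * f2))         # third feature map F_3
        a3, p3 = self.att3(f3)
        f4 = a3 * f3                                # fourth feature map F_4
        s = self.msf(f1, f2, f3)                    # side-branch feature map S
        w4, p4, p5 = self.att4(s)                   # channel weights and process maps P_4, P_5
        f5 = f4 + w4 * s                            # fifth feature map F_5
        a5 = self.att5(p1, p2, p3, p4, p5)          # integrated attention map
        return self.var_dim(a5 * f5)                # successor feature map X_out
```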
Further, the internal operation process of the multi-scale fusion unit is represented as the following mathematical model:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
where F_1, F_2 and F_3 denote the first, second and third feature maps, which the multi-scale fusion unit takes as input; S denotes the side-branch feature map output by the multi-scale fusion unit; ⊙ denotes the element-wise product; M_1 denotes the first fusion feature map generated by adding the first, second and third feature maps; M_2 denotes the second fusion feature map generated by element-wise multiplication of the first, second and third feature maps; Cat denotes the splicing (concatenation) of feature maps; Conv_4 and Conv_5 denote the fourth and fifth convolution operations respectively, whose convolution kernel sizes are both 1×1 with step size 1; σ_4 and σ_5 denote the fourth and fifth activation functions respectively, both of which are ReLU functions; and M_3 denotes the third fusion feature map generated after activation by the fourth activation function.
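As a reading aid, a minimal PyTorch-style sketch of the multi-scale fusion unit under the definitions above follows; the composition of the second splicing stage (here M_3 together with F_1, F_2 and F_3) is inferred from the embodiment description rather than quoted from the formula image, and the class name is illustrative.

```python
# Minimal sketch of the multi-scale fusion unit under the stated assumptions:
# element-wise sum and product of F1, F2, F3, then two 1x1 convolution + ReLU
# stages over concatenated maps. The inputs to the second concatenation are
# partly inferred.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv4 = nn.Conv2d(2 * channels, channels, kernel_size=1, stride=1)  # fourth convolution
        self.conv5 = nn.Conv2d(4 * channels, channels, kernel_size=1, stride=1)  # fifth convolution
        self.act = nn.ReLU()  # fourth and fifth activation functions (both ReLU)

    def forward(self, f1, f2, f3):
        m1 = f1 + f2 + f3                                              # first fusion feature map M_1
        m2 = f1 * f2 * f3                                              # second fusion feature map M_2
        m3 = self.act(self.conv4(torch.cat([m1, m2], dim=1)))          # third fusion feature map M_3
        s = self.act(self.conv5(torch.cat([m3, f1, f2, f3], dim=1)))   # side-branch feature map S
        return s
```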
Further, a hierarchical pooling layer and a hierarchical activation function are arranged in each of the first attention module, the second attention module and the third attention module, the hierarchical pooling layer is arranged at an upstream end of the hierarchical activation function, the hierarchical pooling layer is used for performing global maximum pooling operation on the feature map in the channel direction, and the hierarchical activation function is sigmoid;
the first, second, and third process profiles are matrices of hierarchical pooling outputs of the first, second, and third attention modules, respectively.
Furthermore, a branch pooling layer, a lead-in full-connection layer, a lead-in activation layer, a lead-out full-connection layer and a lead-out activation layer are sequentially arranged in the fourth attention module; the branch pooling layer is used for performing a global maximum pooling operation on the feature map in the spatial direction, the lead-in activation layer is the nonlinear activation function ReLU, and the lead-out activation layer is the nonlinear activation function sigmoid;
the fourth process feature map is the vector output after the operation of the branch pooling layer, and the fifth process feature map is the vector output after activation by the lead-out activation layer.
Further, the mathematical model of the fifth attention module is:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
where P_1, P_2, P_3, P_4 and P_5 denote the first, second, third, fourth and fifth process feature maps input to the fifth attention module; Cat denotes the splicing (concatenation) operation applied to the feature maps within it; R_1 denotes the first internal reference feature map generated by splicing the first, second and third process feature maps; FC_b1 and FC_b2 denote the first and second bridging full-connection layers respectively; σ_b1, σ_b2 and σ_int denote the first bridging activation function, the second bridging activation function and the integration activation function respectively; R_2 and R_3 denote the second and third internal reference feature maps generated after activation by the first and second bridging activation functions respectively; ⊙ denotes the element-wise product; Pool_int denotes the integrated pooling layer, which is used for performing a global maximum pooling operation on the feature map in the channel direction; and A_int denotes the integrated attention map output by the fifth attention module.
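A minimal sketch of the fifth attention module under the definitions above follows. Because the formula itself is published only as an image, the routing shown here (the bridging full-connection layers acting on the fourth and fifth process feature maps, whose sigmoid outputs re-weight the three layers of the spliced process feature maps before channel-wise max pooling and a final sigmoid) is an inference from the symbol definitions and embodiment 1, not a verbatim reproduction.

```python
# Minimal sketch of the fifth attention module under stated assumptions.
import torch
import torch.nn as nn

class IntegratedAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bridge_fc1 = nn.Linear(channels, 3)   # first bridging full-connection layer (C in, 3 out)
        self.bridge_fc2 = nn.Linear(channels, 3)   # second bridging full-connection layer (C in, 3 out)

    def forward(self, p1, p2, p3, p4, p5):
        r1 = torch.cat([p1, p2, p3], dim=1)                          # first internal reference map, (B, 3, H, W)
        r2 = torch.sigmoid(self.bridge_fc1(p4)).view(-1, 3, 1, 1)    # second internal reference map from P_4
        r3 = torch.sigmoid(self.bridge_fc2(p5)).view(-1, 3, 1, 1)    # third internal reference map from P_5
        weighted = r1 * r2 * r3                                       # per-layer re-weighting of R_1
        pooled, _ = weighted.max(dim=1, keepdim=True)                 # integrated pooling over channels
        return torch.sigmoid(pooled)                                  # integrated attention map, (B, 1, H, W)
```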
Furthermore, the dimension-changing unit comprises a dimension-changing convolution layer, a dimension-changing activation layer and a dimension-changing pooling layer which are sequentially arranged, the convolution kernel size of the dimension-changing convolution layer is 3×3 and the step size is 1, the dimension-changing activation layer is a ReLU function, the dimension-changing pooling layer is used for performing a maximum pooling operation on the feature map, and the pooling window size of the dimension-changing pooling layer is 2×2 with a step size of 2.
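A minimal sketch of the dimension-changing unit as described follows; the class name is illustrative, and the doubling of the channel count by the dimension-changing convolution layer is taken from embodiment 1.

```python
# Minimal sketch of the dimension-changing unit: 3x3 stride-1 convolution that
# doubles the channel count, ReLU, then 2x2 stride-2 max pooling that halves
# the width and height.
import torch.nn as nn

class DimensionChange(nn.Sequential):
    def __init__(self, channels):
        super().__init__(
            nn.Conv2d(channels, 2 * channels, kernel_size=3, stride=1, padding=1),  # dimension-changing convolution layer
            nn.ReLU(),                                                              # dimension-changing activation layer
            nn.MaxPool2d(kernel_size=2, stride=2),                                  # dimension-changing pooling layer
        )
```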
Further, the feature shaping unit comprises a shaping pooling layer, a trunk full-connection layer and a shaping activation layer which are connected in sequence, the shaping pooling layer is used for performing global average pooling operation on the feature map in the spatial direction, and the shaping activation layer is a sigmoid function.
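A minimal sketch of the feature shaping unit as described follows; the class name and the embedding dimension parameter are illustrative, since the output size of the trunk full-connection layer is not specified here.

```python
# Minimal sketch of the feature shaping unit: global average pooling in the
# spatial direction, the trunk full-connection layer, then a sigmoid.
import torch
import torch.nn as nn

class FeatureShaping(nn.Module):
    def __init__(self, channels, embedding_dim):
        super().__init__()
        self.trunk_fc = nn.Linear(channels, embedding_dim)  # trunk full-connection layer

    def forward(self, x):
        pooled = x.mean(dim=(2, 3))                  # shaping pooling: global average over H and W
        return torch.sigmoid(self.trunk_fc(pooled))  # face feature vector
```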
The invention also provides a face feature extraction device, which comprises a processor and a memory, wherein the memory stores a computer program, and the processor is used for executing the face feature extraction method by loading the computer program.
The present invention also provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the face feature extraction method as described above.
The invention has the beneficial effects that:
(1) As the network deepens, the receptive field of the feature map gradually increases. By arranging a plurality of spatial attention mechanisms, the feature extraction network model of the invention modulates the feature information step by step and calibrates different spatial positions more accurately, so that, for image differences caused by different ages, the network can effectively suppress noise and extract the core feature information;
(2) Research shows that, after the multi-scale fusion unit is used to fuse the hierarchical information, a single channel attention mechanism improves network performance about as much as placing a channel attention mechanism at every level, while requiring less computation and keeping the network lighter. This is because, in the process of fusing the hierarchical information, the multi-scale fusion unit removes the interference contained in that information and thereby greatly improves the calibration efficiency of the channel attention mechanism;
(3) Network architectures based on a pure attention mechanism, such as the Transformer, prove that a large amount of effective information still resides inside the attention mechanism and that, when this information is fully used, complex feature mappings can be realized. Existing convolutional neural networks, by contrast, emphasize only the output end of the attention mechanism and lack information exchange between the interior of the attention mechanism and the other parts of the network, which limits the calibration effect of the attention mechanism and reduces the network's nonlinear fitting capability for complex scenes. In the present invention, the fifth attention module integrates and exploits the intermediate information of the other four attention modules, achieving front-to-back cooperative modulation between the fifth attention module and the other four and enhancing modulation consistency and overall performance. Test results show that the accuracy of the network on age-spanning face recognition improves markedly after the fifth attention module provided by the invention is adopted.
Drawings
Fig. 1 is an architecture diagram of the feature extraction network model of example 1;
Fig. 2 is an internal architecture diagram of a deep tone feature extraction monomer in example 1;
Fig. 3 is an internal architecture diagram of the first attention module in example 1;
Fig. 4 is an internal architecture diagram of the multi-scale fusion unit in example 1;
Fig. 5 is an internal architecture diagram of the fourth attention module in example 1;
Fig. 6 is an internal architecture diagram of the fifth attention module in example 1;
Fig. 7 is an internal architecture diagram of the deep tone feature extraction monomer in example 2;
in the drawings:
1-face image, 2-basic operation layer, 3-deep tone feature extraction monomer, 31-first attention module, 311-hierarchical pooling layer, 312-hierarchical activation function, 32-second attention module, 33-third attention module, 34-fourth attention module, 341-branch pooling layer, 342-lead-in full-connection layer, 343-lead-in activation layer, 344-lead-out full-connection layer, 345-lead-out activation layer, 35-fifth attention module, 36-multi-scale fusion unit, 37-dimension-changing unit, 371-dimension-changing convolution layer, 372-dimension-changing activation layer, 373-dimension-changing pooling layer, 4-shaping pooling layer, 5-trunk full-connection layer, 6-shaping activation layer.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Example 1:
Fig. 1 shows the architecture of the feature extraction network model in this embodiment; the entire network model is implemented on a computer in Python with the PyTorch framework. After the face image 1 is input into the network model, a convolution operation is first performed by the basic operation layer 2 to generate the output basic feature map. The convolution kernel size of the basic operation layer 2 is 3×3 with a step size of 1. There are 5 deep tone feature extraction monomers 3, connected end to end in sequence; the feature information passes through each deep tone feature extraction monomer 3 in turn, and each time a feature map passes through one monomer its width and height are halved and its number of channels is doubled. The feature shaping unit comprises a shaping pooling layer 4, a trunk full-connection layer 5 and a shaping activation layer 6 connected in sequence; the shaping pooling layer 4 is used for performing a global average pooling operation on the feature map in the spatial direction, and the shaping activation layer is a sigmoid function. Let the size of the face image 1 be W × H × 3 (width × height × channels, the same below); the feature map output by each module then has the following size:
TABLE 1 feature extraction network model output feature map size for each module
(the table is reproduced in the original publication as an image and is not shown here)
Fig. 2 shows the internal architecture of a deep tone feature extraction monomer 3 in this embodiment. The precursor feature map X_in input to a given deep tone feature extraction monomer 3 has size K × G × C, and the first, second, third, fourth and fifth feature maps and the side-branch feature map all have size K × G × C. The dimension-changing unit 37 comprises a dimension-changing convolution layer 371, a dimension-changing activation layer 372 and a dimension-changing pooling layer 373 arranged in sequence; the feature maps output by the dimension-changing convolution layer 371 and the dimension-changing activation layer 372 have size K × G × 2C, and the successor feature map X_out output by the dimension-changing pooling layer 373 has size K/2 × G/2 × 2C.
The first attention module 31, the second attention module 32 and the third attention module 33 share the same internal operation process. As shown in Fig. 3, each is provided with a hierarchical pooling layer 311 and a hierarchical activation function 312 connected in sequence; the hierarchical pooling layer 311 is used for performing a global maximum pooling operation on the feature map in the channel direction, the hierarchical activation function 312 is a sigmoid function, and the outputs of the hierarchical pooling layer 311 and the hierarchical activation function 312 are matrices of size K × G × 1. The first process feature map P_1, the second process feature map P_2 and the third process feature map P_3, which are the outputs of the hierarchical pooling layer 311 in the first attention module 31, the second attention module 32 and the third attention module 33 respectively, are therefore also all of size K × G × 1.
Fig. 4 shows the internal architecture of the multi-scale fusion unit 36 in this embodiment. The feature maps F_1, F_2 and F_3 are first fused preliminarily by element-wise addition and element-wise multiplication; the resulting first fusion feature map M_1 and second fusion feature map M_2 both have size K × G × C. Then, through splicing, the fourth convolution operation and activation by the fourth activation function σ_4, the third fusion feature map M_3 (of size K × G × C) is obtained as the second stage of fusion. Finally, through splicing, the fifth convolution operation and activation by the fifth activation function σ_5, M_3 is fused with F_1, F_2 and F_3 to generate the side-branch feature map S. By fusing the feature maps F_1, F_2 and F_3 in this progressive, multi-pass and multi-dimensional way, the multi-scale fusion unit 36 achieves high efficiency and a fine denoising capability.
Fig. 5 shows the internal architecture of the fourth attention module 34 in this embodiment, which is provided with a branch pooling layer 341, a lead-in full-connection layer 342, a lead-in activation layer 343, a lead-out full-connection layer 344 and a lead-out activation layer 345 connected in sequence. The branch pooling layer 341 is used for performing a global maximum pooling operation on the feature map in the spatial direction, and the fourth process feature map P_4, i.e. the vector output after the operation of the branch pooling layer 341, has size 1 × C. The lead-in full-connection layer 342 has C input elements and C/8 output elements, the lead-in activation layer 343 is the nonlinear activation function ReLU, the lead-out full-connection layer 344 has C/8 input elements and C output elements, and the lead-out activation layer 345 is the nonlinear activation function sigmoid. The fifth process feature map P_5 is the vector of size 1 × C output after activation by the lead-out activation layer 345.
Fig. 6 shows the internal architecture of the fifth attention module 35 in this embodiment. The first internal reference feature map R_1, obtained by splicing the first process feature map, the second process feature map and the third process feature map, has size K × G × 3. The first bridging full-connection layer and the second bridging full-connection layer each have C input elements and 3 output elements; the first bridging activation function, the second bridging activation function and the integration activation function are all sigmoid functions, and the generated second internal reference feature map R_2 and third internal reference feature map R_3 both have size 1 × 3. Through the element-wise product operation, R_2 and R_3 assign weight parameters of different sizes to each layer of R_1. The integrated pooling layer is used for performing a global maximum pooling operation on the feature map in the channel direction, and the integration activation layer outputs an integrated attention map of size K × G × 1. After the element-wise product operation between the integrated attention map and the fifth feature map F_5, the integrated attention map assigns different weight parameters to different spatial positions of F_5, thereby realizing the modulation.
In practical application, the face image 1 is input into the trained feature extraction network model and the corresponding face feature vector is obtained through feature extraction; the distance (the L1 distance in this embodiment) between this feature vector and the vectors in a preset sample library is then calculated, and the identity corresponding to the sample vector that is closest to the face feature vector, provided that the distance is smaller than a preset threshold, is taken as the identity of the face image 1, completing face recognition. In this embodiment the data set VGGFace2 is used as the training set to train the network model, and the loss function is a triplet loss. The model is then tested with the commonly used age-spanning face recognition test data set CPLFW as the test set; the test results show that the feature extraction network model of this embodiment reaches a recognition accuracy of 94.71% on CPLFW, whereas among existing advanced algorithms VGGFace2 achieves 84.00% and ArcFace achieves 88.36%, both lower than this embodiment.
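The identification step described above can be illustrated with the following sketch; the function and variable names and the threshold value are assumptions, and only the use of the L1 distance and a preset threshold follows the text.

```python
# Illustrative sketch of the identification step: compare the extracted face
# feature vector with a gallery of enrolled vectors by L1 distance and accept
# the closest match only if it is below a preset threshold.
import torch

def identify(face_vector, gallery_vectors, gallery_ids, threshold=0.5):
    # gallery_vectors: (N, D) tensor of enrolled feature vectors
    distances = (gallery_vectors - face_vector).abs().sum(dim=1)  # L1 distance to each sample
    best = torch.argmin(distances)
    if distances[best] < threshold:
        return gallery_ids[best]   # identity of the closest enrolled sample
    return None                    # no match within the threshold
```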
Example 2:
in the embodiment, only the internal structure of the deep tone feature extraction unit 3 is modified on the basis of the embodiment 1, and other parts of the network model are kept unchanged. Fig. 7 shows an internal architecture diagram of the deep tone feature extraction cell 3 in example 2, and the fifth attention module 35 was removed for comparative experiments compared to example 1. After the same training and testing, the result shows that the recognition accuracy of the network model in example 2 on CPLFW is 89.24%, which is lower than that in example 1, and it fully illustrates that the fifth attention module 35 in the present invention has an important promoting role in the network model.
The above embodiments only express specific embodiments of the present invention, and the description is specific and detailed, but not to be understood as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims (10)

1. A face feature extraction method, characterized by comprising the following steps:
s100, obtaining a face image, and obtaining a trained feature extraction network model; the characteristic extraction network model is sequentially provided with a basic operation layer, a plurality of deep tone characteristic extraction monomers and a characteristic shaping unit, wherein the plurality of deep tone characteristic extraction monomers are sequentially connected in series;
s200, inputting the face image into the feature extraction network model, extracting basic feature information of the face image by using the basic operation layer, and then generating a basic feature map;
s300, inputting the basic feature map into a first deep tone feature extraction monomer, and outputting a primary feature map by the first deep tone feature extraction monomer after feature extraction operation;
s400, the next deep tone feature extraction monomer takes the feature graph output by the upstream deep tone feature extraction monomer as input, then carries out feature extraction operation, generates and outputs a corresponding intermediate feature graph;
s500, continuously repeating the step S400 until the last deep tone feature extraction monomer generates and outputs a final-stage feature map;
s600, inputting the final stage feature map into the feature shaping unit, and generating a face feature vector after shaping operation is carried out on the final stage feature map to finish face feature extraction;
the calculation operation process inside the deep tone feature extraction monomer is represented as the following mathematical model:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
wherein X_in denotes the precursor feature map input to said deep tone feature extraction monomer; Conv_1, Conv_2 and Conv_3 denote the first, second and third convolution operations respectively; A_1, A_2, A_3, A_4 and A_5 denote the first, second, third, fourth and fifth attention modules respectively; ⊙ denotes the element-wise product; σ_1, σ_2 and σ_3 denote the first, second and third activation functions respectively; F_1, F_2 and F_3 denote the first, second and third feature maps generated after activation by σ_1, σ_2 and σ_3 respectively; F_4 denotes the fourth feature map obtained by calibrating the third feature map with the third attention module; MSF denotes the multi-scale fusion unit; S denotes the side-branch feature map generated by the multi-scale fusion unit fusing the first, second and third feature maps; F_5 denotes the fifth feature map obtained by adding the side-branch feature map, after calibration by the fourth attention module, to the fourth feature map; P_1 denotes the first process feature map output from the first attention module; P_2 denotes the second process feature map output from the second attention module; P_3 denotes the third process feature map output from the third attention module; P_4 and P_5 denote the fourth and fifth process feature maps, both output from said fourth attention module; V denotes the dimension-changing unit, which increases the number of feature map channels and reduces the feature map width and height; and X_out denotes the successor feature map output by said deep tone feature extraction monomer.
2. The face feature extraction method of claim 1, wherein: the convolution kernel sizes of the first convolution operation, the second convolution operation and the third convolution operation are all 3×3, and the step sizes are all 1; the first activation function, the second activation function, and the third activation function are all ReLU functions.
3. The face feature extraction method of claim 1, wherein: the internal operation process of the multi-scale fusion unit is expressed as the following mathematical model:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
wherein F_1, F_2 and F_3 denote the first, second and third feature maps, which the multi-scale fusion unit takes as input; S denotes the side-branch feature map output by the multi-scale fusion unit; ⊙ denotes the element-wise product; M_1 denotes the first fusion feature map generated by adding the first, second and third feature maps; M_2 denotes the second fusion feature map generated by element-wise multiplication of the first, second and third feature maps; Cat denotes the splicing (concatenation) of feature maps; Conv_4 and Conv_5 denote the fourth and fifth convolution operations respectively, whose convolution kernel sizes are both 1×1 with step size 1; σ_4 and σ_5 denote the fourth and fifth activation functions respectively, both of which are ReLU functions; and M_3 denotes the third fusion feature map generated after activation by the fourth activation function.
4. The face feature extraction method of claim 1, wherein: the first attention module, the second attention module and the third attention module are respectively provided with a hierarchical pooling layer and a hierarchical activation function, the hierarchical pooling layer is arranged at the upstream end of the hierarchical activation function and is used for performing global maximum pooling operation on the feature map in the channel direction, and the hierarchical activation function is sigmoid;
the first, second, and third process profiles are matrices of hierarchical pooling outputs of the first, second, and third attention modules, respectively.
5. The face feature extraction method of claim 4, wherein: a branch pooling layer, a lead-in full-connection layer, a lead-in activation layer, a lead-out full-connection layer and a lead-out activation layer are sequentially arranged in the fourth attention module; the branch pooling layer is used for performing global maximum pooling operation on the feature map in the space direction, the lead-in activation layer is a nonlinear activation function ReLU, and the lead-out activation layer is a nonlinear activation function sigmoid;
the fourth process characteristic diagram is a vector output after the operation of the branch pooling layer, and the fifth process characteristic diagram is a vector output after the activation of the lead-out activation layer.
6. The face feature extraction method of claim 5, wherein: the mathematical model of the fifth attention module is:
(the formula is reproduced in the original publication only as an image and is not shown here; the symbols it uses are defined as follows)
wherein P_1, P_2, P_3, P_4 and P_5 denote the first, second, third, fourth and fifth process feature maps input to the fifth attention module; Cat denotes the splicing (concatenation) operation applied to the feature maps within it; R_1 denotes the first internal reference feature map generated by splicing the first, second and third process feature maps; FC_b1 and FC_b2 denote the first and second bridging full-connection layers respectively; σ_b1, σ_b2 and σ_int denote the first bridging activation function, the second bridging activation function and the integration activation function respectively; R_2 and R_3 denote the second and third internal reference feature maps generated after activation by the first and second bridging activation functions respectively; ⊙ denotes the element-wise product; Pool_int denotes the integrated pooling layer, which is used for performing a global maximum pooling operation on the feature map in the channel direction; and A_int denotes the integrated attention map output by the fifth attention module.
7. The face feature extraction method of claim 1, characterized in that: the dimension-changing unit comprises a dimension-changing convolution layer, a dimension-changing activation layer and a dimension-changing pooling layer which are sequentially arranged, the convolution kernel size of the dimension-changing convolution layer is 3×3 and the step size is 1, the dimension-changing activation layer is a ReLU function, the dimension-changing pooling layer is used for performing a maximum pooling operation on the feature map, and the pooling window size of the dimension-changing pooling layer is 2×2 with a step size of 2.
8. The face feature extraction method of claim 1, wherein: the feature shaping unit comprises a shaping pooling layer, a trunk full-connection layer and a shaping activation layer which are sequentially connected, wherein the shaping pooling layer is used for performing global average pooling operation on the feature map in the space direction, and the shaping activation layer is a sigmoid function.
9. A face feature extraction device comprising a processor and a memory, the memory storing a computer program, characterized in that: the processor is configured to execute the face feature extraction method according to any one of claims 1 to 8 by loading the computer program.
10. A storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the face feature extraction method according to any one of claims 1 to 8.
CN202211658800.3A 2022-12-23 2022-12-23 Face feature extraction method, device and storage medium Active CN115661911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211658800.3A CN115661911B (en) 2022-12-23 2022-12-23 Face feature extraction method, device and storage medium

Publications (2)

Publication Number Publication Date
CN115661911A true CN115661911A (en) 2023-01-31
CN115661911B CN115661911B (en) 2023-03-17

Family

ID=85023076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211658800.3A Active CN115661911B (en) 2022-12-23 2022-12-23 Face feature extraction method, device and storage medium

Country Status (1)

Country Link
CN (1) CN115661911B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3690721A1 (en) * 2019-01-31 2020-08-05 StradVision, Inc. Method for recognizing face using multiple patch combination based on deep neural network
WO2021027555A1 (en) * 2019-08-15 2021-02-18 华为技术有限公司 Face retrieval method and apparatus
CN115496651A (en) * 2021-06-02 2022-12-20 武汉Tcl集团工业研究院有限公司 Feature processing method and device, computer-readable storage medium and electronic equipment
CN114120406A (en) * 2021-11-22 2022-03-01 四川轻化工大学 Face feature extraction and classification method based on convolutional neural network
CN114187261A (en) * 2021-12-07 2022-03-15 天津大学 Non-reference stereo image quality evaluation method based on multi-dimensional attention mechanism
CN114360030A (en) * 2022-01-17 2022-04-15 重庆锐云科技有限公司 Face recognition method based on convolutional neural network
CN114743014A (en) * 2022-03-28 2022-07-12 西安电子科技大学 Laser point cloud feature extraction method and device based on multi-head self-attention
CN114998958A (en) * 2022-05-11 2022-09-02 华南理工大学 Face recognition method based on lightweight convolutional neural network
CN115100720A (en) * 2022-07-04 2022-09-23 威海职业学院(威海市技术学院) Low-resolution face recognition method
CN115223221A (en) * 2022-07-04 2022-10-21 网易(杭州)网络有限公司 Face detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙劲光 (Sun Jinguang); 荣文钊 (Rong Wenzhao): "基于区域的年龄估计模型研究" (Research on a region-based age estimation model) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984949A (en) * 2023-03-21 2023-04-18 威海职业学院(威海市技术学院) Low-quality face image recognition method and device with attention mechanism
CN116311479A (en) * 2023-05-16 2023-06-23 四川轻化工大学 Face recognition method, system and storage medium for unlocking automobile
CN116311479B (en) * 2023-05-16 2023-07-21 四川轻化工大学 Face recognition method, system and storage medium for unlocking automobile

Also Published As

Publication number Publication date
CN115661911B (en) 2023-03-17

Similar Documents

Publication Publication Date Title
CN110245665B (en) Image semantic segmentation method based on attention mechanism
CN108764471B (en) Neural network cross-layer pruning method based on feature redundancy analysis
CN115661911A (en) Face feature extraction method, device and storage medium
CN112257794B (en) YOLO-based lightweight target detection method
CN110188768B (en) Real-time image semantic segmentation method and system
CN113537138B (en) Traffic sign identification method based on lightweight neural network
CN110096968B (en) Ultra-high-speed static gesture recognition method based on depth model optimization
CN109783910B (en) Structure optimization design method for accelerating by using generation countermeasure network
CN111626300A (en) Image semantic segmentation model and modeling method based on context perception
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN112580515B (en) Lightweight face key point detection method based on Gaussian heat map regression
CN110059593B (en) Facial expression recognition method based on feedback convolutional neural network
CN116645716B (en) Expression recognition method based on local features and global features
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN112270300A (en) Method for converting human face sketch image into RGB image based on generating type confrontation network
CN112991493A (en) Gray level image coloring method based on VAE-GAN and mixed density network
CN111079767A (en) Neural network model for segmenting image and image segmentation method thereof
CN114742985A (en) Hyperspectral feature extraction method and device and storage medium
CN110414516B (en) Single Chinese character recognition method based on deep learning
CN115240259A (en) Face detection method and face detection system based on YOLO deep network in classroom environment
CN116051534A (en) Warehouse ceiling solar panel defect detection method based on artificial intelligence
CN111414988A (en) Remote sensing image super-resolution method based on multi-scale feature self-adaptive fusion network
CN112837212B (en) Image arbitrary style migration method based on manifold alignment
CN115587628A (en) Deep convolutional neural network lightweight method
CN116152128A (en) High dynamic range multi-exposure image fusion model and method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant