CN114783019A - Face feature extraction method, device, equipment and storage medium - Google Patents
- Publication number
- CN114783019A (application number CN202210316045.4A)
- Authority
- CN
- China
- Prior art keywords: feature, face, attention, network, factor
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/253: Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06N3/08: Computing arrangements based on biological models; neural networks; learning methods
Abstract
The invention relates to the technical field of artificial intelligence, and discloses a face feature extraction method, device, equipment and storage medium for improving the accuracy of face feature extraction. The method comprises the following steps: acquiring a target image containing a face; inputting the target image into a pre-trained face feature extraction model, where the model comprises a feature network and an attention network; outputting, through the feature network, the initial face features of the face in the target image; outputting, through the attention network, attention features in the target image, where the attention network is a student network of a preset teacher network, is obtained by training against that teacher network, and the attention features indicate the importance degree of the feature factors in the initial face features; and processing the initial face features based on the attention features to obtain the final face features of the face in the target image. In addition, the invention relates to blockchain technology: the extracted face feature data can be stored in blockchain nodes.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method, a device, equipment and a storage medium for extracting human face features.
Background
The attention mechanism was first proposed in the field of NLP (Natural Language Processing), and the Transformer architecture built on the attention mechanism has recently excelled at NLP tasks. In visual tasks, the attention mechanism has also attracted much interest; a well-known method is the Non-Local Network, which can model global relations over a space-time volume and achieves good results. However, the self-attention module in visual tasks usually requires multiplication of large matrices, which consumes substantial video memory and time. Face recognition and face tracking also encounter the following problems: large-angle side faces, illumination changes, background (such as clothes, walls and the like) and occlusion. Even if feature extraction can make the feature distances similar, the similarity threshold varies randomly as the environment changes, so the accuracy of the features is unstable; and if features are simply extracted with a CNN (Convolutional Neural Network), much noise is inevitably introduced, and simple data augmentation cannot cover every situation.
Disclosure of Invention
The invention provides a face feature extraction method, a face feature extraction device, face feature extraction equipment and a storage medium, which are used for improving the accuracy of face feature extraction.
In order to achieve the above object, a first aspect of the present invention provides a face feature extraction method, including: acquiring a target image containing a human face; inputting the target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network; outputting the initial face features of the face in the target image through a feature network; outputting attention features in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature; and processing the initial face features based on the attention features to obtain final face features of the face in the target image.
Optionally, in a first implementation manner of the first aspect of the present invention, the number of the feature dimensions of the attention feature is the same as the number of the feature dimensions of the initial face feature; the step of processing the initial face features based on the attention features to obtain final face features of the face in the target image comprises the following steps: and performing fusion processing on the feature data with the same feature dimension in the initial face features and the attention features to obtain the final face features of the face in the target image.
Optionally, in a second implementation manner of the first aspect of the present invention, the initial face features include initial factor values of multiple feature factors; the attention feature comprises the weight of each feature factor; the step of performing fusion processing on the initial face features and feature data with the same feature dimension in the attention features to obtain final face features of the face in the target image comprises the following steps: aiming at each feature factor in the initial human face features, extracting the weight of the feature factor from the attention features, and multiplying the initial factor value of the feature factor by the weight to obtain a final factor value of the feature factor; and determining the final factor value of each characteristic factor as the final face characteristic of the face in the target image.
Optionally, in a third implementation manner of the first aspect of the present invention, the initial face features include initial factor values of a plurality of feature factors; the attention feature includes an indication value of each feature factor; the step of performing fusion processing on the initial face features and feature data with the same feature dimension in the attention features to obtain final face features of the face in the target image comprises the following steps: extracting an indicating value of the characteristic factor from the attention characteristic aiming at each characteristic factor in the initial human face characteristic, and determining the initial factor value of the characteristic factor as a final factor value of the characteristic factor if the indicating value is a preset first value; if the indicated value is a preset second value, deleting the characteristic factor and the factor value of the characteristic factor; and determining the final factor value of the residual characteristic factors as the final face characteristic of the face in the target image.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the attention network is specifically trained in the following manner: acquiring a sample image group containing a human face; the sample image group comprises a plurality of images, the faces in the images are the same, and the shooting angles of the faces in the images are different; extracting a first image feature of a first image in the sample image group, inputting the first image to a preset teacher network, and outputting a first attention feature; obtaining a first output result based on the first image feature and the first attention feature; extracting a second image feature of a second image in the sample image group, inputting the second image to a preset student network, and outputting a second attention feature; obtaining a second output result based on the second image feature and the second attention feature; calculating a loss value based on the first output result and the second output result, training a student network based on the loss value until the loss value is converged, and determining the student network when the loss value is converged as an attention network.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the step of obtaining the first output result based on the first image feature and the first attention feature includes: performing feature fusion processing on feature values at the same position in the first image feature and the first attention feature to obtain a first output result; the step of obtaining a second output result based on the second image feature and the second attention feature includes: and performing feature fusion processing on the feature values at the same positions in the second image feature and the second attention feature to obtain a second output result.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the step of calculating a loss value based on the first output result and the second output result, and training the student network based on the loss value until the loss value converges includes: calculating a loss value based on the first output result and the second output result; updating network parameters of the student network and the teacher network respectively based on the loss values; and continuing to execute the step of obtaining the sample image group containing the human face until the loss value is converged.
The second aspect of the present invention provides a face feature extraction device, which includes an acquisition module, configured to acquire a target image that includes a face; the input module is used for inputting the target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network; the first output module is used for outputting the initial human face characteristics of the human face in the target image through a characteristic network; the second output module is used for outputting the attention characteristics in the target image through the attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature; and the obtaining module is used for processing the initial face features based on the attention features to obtain the final face features of the face in the target image.
Optionally, in a first implementation manner of the second aspect of the present invention, the number of feature dimensions of the attention feature is the same as the number of feature dimensions of the initial face feature; the obtaining module is specifically configured to: perform fusion processing on the feature data with the same feature dimension in the initial face features and the attention features to obtain the final face features of the face in the target image.
Optionally, in a second implementation manner of the second aspect of the present invention, the initial face features include initial factor values of a plurality of feature factors; the attention feature includes a weight of each feature factor; the obtaining module is specifically further configured to: aiming at each characteristic factor in the initial human face characteristics, extracting the weight of the characteristic factor from the attention characteristics, and multiplying the initial factor value of the characteristic factor by the weight to obtain the final factor value of the characteristic factor; and determining the final factor value of each characteristic factor as the final face characteristic of the face in the target image.
Optionally, in a third implementation manner of the second aspect of the present invention, the initial face features include initial factor values of a plurality of feature factors; the attention feature includes an indication value of each feature factor; the obtaining module is specifically further configured to: extracting an indicating value of the characteristic factor from the attention characteristic aiming at each characteristic factor in the initial human face characteristic, and determining the initial factor value of the characteristic factor as a final factor value of the characteristic factor if the indicating value is a preset first value; if the indicated value is a preset second value, deleting the characteristic factor and the factor value of the characteristic factor; and determining the final factor value of the residual characteristic factors as the final face characteristic of the face in the target image.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the apparatus further includes a training module, configured to: acquiring a sample image group containing a human face; the sample image group comprises a plurality of images, the faces in the images are the same, and the shooting angles of the faces in the images are different; extracting a first image feature of a first image in the sample image group, inputting the first image into a preset teacher network, and outputting a first attention feature; obtaining a first output result based on the first image feature and the first attention feature; extracting a second image feature of a second image in the sample image group, inputting the second image to a preset student network, and outputting a second attention feature; obtaining a second output result based on the second image feature and the second attention feature; calculating a loss value based on the first output result and the second output result, training a student network based on the loss value until the loss value is converged, and determining the student network when the loss value is converged as an attention network.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the training module is further specifically configured to: performing feature fusion processing on feature values at the same position in the first image feature and the first attention feature to obtain a first output result; and performing feature fusion processing on feature values at the same position in the second image feature and the second attention feature to obtain a second output result.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the training module is further specifically configured to: calculating a loss value based on the first output result and the second output result; updating network parameters of the student network and the teacher network respectively based on the loss values; and continuing to execute the step of acquiring the sample image group containing the human face until the loss value is converged.
A third aspect of the present invention provides a face feature extraction device, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the facial feature extraction device to perform the facial feature extraction method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-mentioned face feature extraction method.
In the technical scheme provided by the invention, a target image containing a human face is obtained; inputting the target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network; outputting the initial face features of the face in the target image through a feature network; outputting attention features in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature; and processing the initial face features based on the attention features to obtain final face features of the face in the target image. In the method, the human face feature extraction model not only extracts the initial human face features, but also outputs attention features aiming at the human face features, and the attention features are used for indicating the importance degree of each feature factor in the human face features, so that the important features in the human face features are highlighted, unimportant features are weakened, the accuracy of human face feature extraction is improved, the human face feature extraction model can be suitable for human face feature extraction and human face recognition in various scenes, and the accuracy of human face recognition is higher.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a face feature extraction method in an embodiment of the present invention;
fig. 2 is a schematic diagram of another embodiment of a face feature extraction method in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of a face feature extraction apparatus according to the present invention;
fig. 4 is a schematic diagram of an embodiment of a face feature extraction device in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a face feature extraction method, a face feature extraction device, face feature extraction equipment and a storage medium, which are used for improving the accuracy of face feature extraction.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Moreover, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a specific flow of the embodiment of the present invention is described below. Referring to fig. 1, an embodiment of a face feature extraction method in the embodiment of the present invention includes:
100. acquiring a target image containing a human face;
the application of the face recognition technology in the fields of life, work, learning and the like is increasing.
Face feature extraction is one of the most basic problems in face recognition technology: it not only extracts the features most useful for pattern classification from the original pattern information, but also greatly reduces the dimension of the pattern sample. Feature extraction is the early-stage work of pattern recognition and has a great influence on the later face recognition result.
In the process of extracting the face features, a target image containing a face needs to be acquired first. Due to the wide application field of the face recognition technology, the target image can be an image which is acquired under different scenes and comprises a plurality of faces, and can also be an image of a certain target person under different states.
Specifically, the target image includes at least one face, and in practical implementation, if the target face needs to be determined, in one mode, the target face may be determined according to the proportion of the faces in the image, for example, when a plurality of faces appear in the image, the target face with a larger proportion in the image is determined. Alternatively, the target face may be determined according to the position of the face in the image, for example, when a plurality of faces appear in the image, the face at or near the center of the image is determined as the target face, or the target face may be selected manually or intelligently.
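As a purely illustrative aid to the target-face selection heuristics just described, the following Python sketch picks a target face either by its proportion of the image or by its proximity to the image center; the bounding-box format and function names are assumptions introduced for illustration and are not part of the disclosure.

```python
# Illustrative sketch only: selecting a target face from detected boxes,
# assuming an upstream detector returns (x1, y1, x2, y2) boxes.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def target_face_by_area(boxes: List[Box]) -> Box:
    """Pick the face occupying the largest proportion of the image."""
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))

def target_face_by_center(boxes: List[Box], img_w: int, img_h: int) -> Box:
    """Pick the face whose center is nearest the image center."""
    cx, cy = img_w / 2, img_h / 2
    def dist(b: Box) -> float:
        bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return (bx - cx) ** 2 + (by - cy) ** 2
    return min(boxes, key=dist)
```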
101. Inputting a target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network;
in the embodiment of the present invention, the feature network may be specifically configured to extract an initial face feature of a face in the target image, the initial face feature may include a plurality of feature factors and a value of each feature factor, and the attention network may be specifically configured to indicate an importance degree of the feature factors in the initial face feature. Based on the method, some unimportant features in the initial face features can be screened and excluded, and further the extraction of the important features of the face is improved.
102. Outputting the initial face features of the face in the target image through a feature network;
in actual implementation, after the feature network in the face recognition model extracts the initial face features of the face in the target image, the initial face features are output, and the initial face features can be represented by feature factors. The initial facial features include features of the face itself, such as: specific parts (eyes, nose, mouth, etc.) and overall feature information, and in addition, the above-described initial facial features also include facial expression features at the shooting angle in the target image and in the emotional state of the person.
103. Outputting attention features in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on the teacher network training; the attention characteristic is used for indicating the importance degree of a characteristic factor in the initial human face characteristic;
in practical implementation, after extracting the attention feature of the face in the target image by the attention network in the face recognition model, the attention feature is output, and the attention feature is used for indicating the importance degree of each feature in the initial face feature. In an optional mode, the face feature extraction method can be applied to an intelligent teaching auxiliary system and used for helping teachers to better judge the learning states, knowledge point absorption conditions and the like of students. The attention network is a student network of a preset teacher network, and specifically, the attention network is obtained based on teacher network training.
After the target image is input into the human face feature extraction model which is trained in advance, the initial human face feature of the human face in the target image and the attention feature of the target image can be output through the processing of the human face feature extraction model. Optionally, in the facial feature extraction model, an initial facial feature of a face in the target image is calculated and output through the feature network, and an attention feature in the target image is calculated and output through the attention network, where the initial facial feature may reflect basic feature information of the face, such as geometric features between nose, eyes, and the like, and facial expression features. The basic feature information is represented by a feature factor, and the attention feature indicates the importance degree of the feature factor in the initial human face feature.
In actual implementation, target images are acquired in diverse scenes, and the state of the person at acquisition time is also diverse, so the initial face features differ. When face feature extraction aims at making feature distances similar, introducing the attention features stabilizes the accuracy of the face features and keeps the threshold stable, yielding better robustness, higher-precision face features, and a wider range of usable scenes.
When the embodiment of the invention is applied to the intelligent teaching auxiliary system, the teacher can be better helped to judge the learning state of students, the knowledge point absorption condition and other related information, so that the teacher can know the learning condition of each student in real time in the teaching process, and can timely take corresponding measures to help the students to better learn.
104. And processing the initial face features based on the attention features to obtain final face features of the face in the target image.
Specifically, based on the importance degree of the feature factor in the initial face feature indicated in the attention feature, the initial face feature can be processed more specifically, and then the final face feature of the face in the target image is obtained. The final face features can realize face recognition of target images acquired under various scenes.
The step adds and stores the attention feature on the basis of the initial human face feature, can compress partial feature storage, and simultaneously improves the accuracy of the human face feature.
In the embodiment of the invention, a target image containing a human face is obtained; inputting the target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network; outputting the initial face features of the face in the target image through a feature network; outputting attention features in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature; and processing the initial face features based on the attention features to obtain final face features of the face in the target image. In the method, the human face feature extraction model not only extracts the initial human face features, but also outputs attention features aiming at the human face features, and the attention features are used for indicating the importance degree of each feature factor in the human face features, so that the important features in the human face features are highlighted, unimportant features are weakened, the accuracy of human face feature extraction is improved, the human face feature extraction model can be suitable for human face feature extraction and human face recognition in various scenes, and the accuracy of human face recognition is higher.
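For concreteness, a minimal PyTorch sketch of the two-branch model described above follows: a feature network outputs the initial face feature, an attention network (the distilled student) outputs one importance score per feature factor, and element-wise fusion yields the final face feature. The specific layers, the feature dimension of 512, and the sigmoid gating are illustrative assumptions; the disclosure does not fix a particular architecture or fusion operator.

```python
# A minimal sketch under stated assumptions, not the patented implementation.
import torch
import torch.nn as nn

class FaceFeatureExtractor(nn.Module):
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Feature network: outputs the initial face feature vector.
        self.feature_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Attention network (student of a teacher network): outputs one
        # importance score per feature factor, matching feat_dim.
        self.attention_net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        initial = self.feature_net(image)      # initial face feature
        attention = self.attention_net(image)  # importance per factor
        return initial * attention             # fused final face feature

final_feat = FaceFeatureExtractor()(torch.randn(1, 3, 112, 112))
```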
In an optional manner, the number of feature dimensions of the attention features is the same as that of the initial face features, and feature data with the same feature dimension in the initial face features and the attention features are fused to obtain the final face features of the face in the target image, as can be seen in fig. 2.
In this embodiment, the steps 200-203 are the same as the steps 100-103, and are not described herein again.
204. And performing fusion processing on the feature data with the same feature dimension in the initial face features and the attention features to obtain final face features of the face in the target image.
In practical implementation, face features have multiple dimensions, each of which can be represented by a feature vector, and the attention features have the same number of dimensions as the initial face features. Fusing the feature data with the same feature dimension enhances the feature information in the target image and yields the final face features of the face in the target image; the final face features are denoised features, that is, factors that would interfere with interpreting the source information have been removed. On this basis, the face feature extraction model can analyze the target image better, and by introducing the attention features it can more specifically extract the important feature positions of the face in the target image under different states, thereby compressing the important features and removing noise interference.
Optionally, the initial face features include initial factor values of a plurality of feature factors; the attention feature comprises the weight of each feature factor; aiming at each feature factor in the initial human face features, extracting the weight of the feature factor from the attention features, and multiplying the initial factor value of the feature factor by the weight to obtain a final factor value of the feature factor; and determining the final factor value of each characteristic factor as the final face characteristic of the face in the target image.
It will be appreciated that each face has its own unique features, and that the same face exhibits different features under different lighting and at different angles. Face features include key point information of the face, for example the geometric relationships among facial parts such as the eyes, nose and mouth; this key point information can be used for face similarity comparison, face identification and the like. Because target images are acquired in diverse scenes and the state of the person at acquisition time is also diverse, each dimension of the target image can be expressed by the initial factor value of a feature factor so as to express the face features more accurately. When the factors influencing the face features change (for example, illumination or shooting angle), the initial factor values of the feature factors change correspondingly to accurately reflect the features of the face, so that the feature factor difference between different faces is as large as possible and the difference for the same face is as small as possible.
In actual implementation, for each feature factor in the initial face features, the weight of the feature factor is extracted from the attention features, and the final factor value of the feature factor is obtained by multiplying the initial factor value of the feature factor by the weight. Further, the final factor value of each feature factor is determined as the final face feature of the face in the target image. Based on the above, the attention features identify and mark the key features in the target images through a new layer of weights, so that the network learns the regions which need more attention in each target image, the feature difference of different faces is as large as possible, the feature difference of the same face is as small as possible, and in addition, the emotion change of the face can be captured more sensitively.
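A minimal numeric sketch of this weight-based fusion follows; the factor values and weights are invented solely for illustration.

```python
# Hypothetical initial factor values and their attention weights.
initial_factors = [0.82, -0.10, 0.45, 0.03]   # from the feature network
attention_weights = [0.9, 0.1, 0.7, 0.2]      # from the attention network

# Final factor value = initial factor value x weight, per feature factor.
final_factors = [v * w for v, w in zip(initial_factors, attention_weights)]
# final_factors == [0.738, -0.010, 0.315, 0.006]: heavily weighted factors
# dominate the final face feature, while weak ones are suppressed.
```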
Optionally, the initial face features include initial factor values of a plurality of feature factors; the attention feature includes an indication value of each feature factor; extracting an indicating value of the characteristic factor from the attention characteristic aiming at each characteristic factor in the initial human face characteristic, and determining the initial factor value of the characteristic factor as a final factor value of the characteristic factor if the indicating value is a preset first value; if the indicated value is a preset second value, deleting the characteristic factor and the factor value of the characteristic factor; and determining the final factor value of the residual characteristic factors as the final face characteristic of the face in the target image.
Furthermore, the attention feature further includes an indication value of each feature factor, where the indication value is used to indicate whether an initial factor value of the feature factor has a feature value, a first value and a second value are preset, and for each feature factor in the initial face feature, the indication value of the feature factor is extracted from the attention feature, and if the indication value corresponding to a certain feature factor is the preset first value, it indicates that the feature factor can accurately express the feature and has a feature value, and then the initial factor value of the feature factor is determined as a final factor value of the feature factor; if the indicated value corresponding to a certain characteristic factor is a preset second value, the characteristic factor is not needed, namely, the characteristic factor does not have a characteristic value, and the characteristic factor and the factor value of the characteristic factor are deleted; and determining the final factor value of the residual characteristic factors as the final face characteristic of the face in the target image. Based on the method, the unnecessary features can be screened out, and then partial feature storage is compressed, so that the pressure of the system is reduced, and the picture response is faster and smoother.
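The indicator-value variant can be sketched as follows; the concrete indicator values (1 to keep, 0 to delete) and factor names are assumptions for illustration.

```python
# Assumed preset values: first value keeps a factor, second value deletes it.
FIRST_VALUE, SECOND_VALUE = 1, 0

initial = {"factor_a": 0.82, "factor_b": -0.10, "factor_c": 0.45}
indicators = {"factor_a": 1, "factor_b": 0, "factor_c": 1}

# Keep only factors whose indicator equals the preset first value; factors
# marked with the second value are deleted, compressing feature storage.
final_face_feature = {name: value for name, value in initial.items()
                      if indicators[name] == FIRST_VALUE}
# {'factor_a': 0.82, 'factor_c': 0.45}
```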
By adopting the above steps, training on the same face in different scenes allows the important feature positions of the face in different states to be learned and extracted, thereby solving the problem of face pattern variability caused by factors such as complexion and expression.
Optionally, the attention network is specifically trained in the following manner: acquiring a sample image group containing a human face; the sample image group comprises a plurality of images, the faces in the images are the same, and the shooting angles of the faces in the images are different; extracting a first image feature of a first image in the sample image group, inputting the first image into a preset teacher network, and outputting a first attention feature; obtaining a first output result based on the first image feature and the first attention feature; extracting a second image feature of a second image in the sample image group, inputting the second image into a preset student network, and outputting a second attention feature; obtaining a second output result based on the second image feature and the second attention feature; calculating a loss value based on the first output result and the second output result, training a student network based on the loss value until the loss value is converged, and determining the student network when the loss value is converged as an attention network.
Specifically, the attention network is trained in the following manner:
firstly, a sample image group containing a human face is obtained, wherein the sample image group comprises a plurality of images, optionally, the human faces in the plurality of images are the same, but the shooting angles are different, and it can be understood that the extracted feature data of the same human face are different under different shooting angles.
Secondly, a first image feature of a first image in the sample image group is extracted, the first image is input into a preset teacher network, and a first attention feature is output as a predicted value. The first image feature is the face feature at the shooting angle of the first image, and the first attention feature is used for indicating the importance degree of the feature factors in the first image feature. The first image feature and the first attention feature are fused; specifically, data with the same dimension in the first image feature and the first attention feature are combined, so that the important features in the first image feature become more prominent and the unimportant features are removed, and a first output result is obtained;
then, a second image feature of a second image in the sample image group is extracted, the second image is input into a preset student network, and a second attention feature is output as a predicted value. The second image feature is the face feature at the shooting angle of the second image, and the second attention feature is used for indicating the importance degree of the feature factors in the second image feature. The second image feature and the second attention feature are fused; specifically, data with the same dimension in the second image feature and the second attention feature are combined, so that the important features in the second image feature become more prominent and the unimportant features are removed, and a second output result is obtained;
finally, a loss value is calculated based on the first output result and the second output result; the loss can be implemented in various ways. The above steps are repeated until the loss value converges, at which point the first output result and the second output result are close to the expected threshold, and the student network at the moment of convergence is determined as the attention network.
By adopting the step, the stability of the similar threshold value can be maintained, the problem that the similar threshold value is randomly changed due to different shooting angles, illumination intensity, background (clothes, walls and the like), shielding and other factors in the face recognition process is well solved, and in addition, partial feature storage can be compressed.
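Under stated assumptions, the training procedure above can be sketched in PyTorch as follows: fusion is taken as element-wise multiplication, the loss as the mean squared error between the two fused outputs (the disclosure leaves the concrete loss form open), a shared backbone is assumed for extracting the image features, and both networks are updated from the same loss, matching the parameter-update step described below.

```python
# A hedged sketch, not the patented implementation.
import torch
import torch.nn as nn

def train_attention_network(teacher: nn.Module, student: nn.Module,
                            backbone: nn.Module, loader, epochs: int = 10):
    criterion = nn.MSELoss()  # assumed consistency loss between outputs
    # Both networks are updated from the loss, as described in the text.
    optimizer = torch.optim.Adam(
        list(teacher.parameters()) + list(student.parameters()), lr=1e-4)
    for _ in range(epochs):
        for img1, img2 in loader:  # same face, different shooting angles
            feat1 = backbone(img1)          # first image feature
            out1 = feat1 * teacher(img1)    # first output result (fused)
            feat2 = backbone(img2)          # second image feature
            out2 = feat2 * student(img2)    # second output result (fused)
            loss = criterion(out2, out1)    # consistency across angles
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student  # the student at convergence is the attention network
```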
Optionally, feature fusion processing is performed on feature values at the same position in the first image feature and the first attention feature, so as to obtain a first output result; and performing feature fusion processing on the feature values at the same positions in the second image feature and the second attention feature to obtain a second output result.
In practical implementation, the computer can learn the initial face feature in the first image through the first image feature; combining this with the first attention feature of the first image output by the preset teacher network, the feature values of the first image feature and the first attention feature at the same position are merged to obtain a first output result. Likewise, the computer can learn the initial face feature in the second image through the second image feature; combining this with the second attention feature of the second image output by the preset student network, the feature values of the second image feature and the second attention feature at the same position are merged to obtain a second output result. On this basis, the computer learns the important features within the initial face features, improving the extraction of the important features of the face.
Further, calculating a loss value based on the first output result and the second output result; updating network parameters of the student network and the teacher network respectively based on the loss values; and continuing to execute the step of acquiring the sample image group containing the human face until the loss value is converged.
Specifically, the loss value is calculated based on the first output result and the second output result; that is, the predicted values of the preset teacher network and student network are received, the difference between the predicted values and the real values is calculated, and the network parameters of the student network and the teacher network are updated respectively according to the loss value, so that the difference between the predicted values and the real values is minimized and the predicted values approach the real values. The step of acquiring a sample image group containing a face continues to be executed, covering images of the same face at multiple shooting angles, images of different faces at multiple shooting angles, and so on, until the loss value converges. On this basis, the attention network in the above face feature extraction model can be obtained, and the attention network can output attention features indicating the importance degree of the feature factors in the initial face features.
The above description of the face feature extraction method in the embodiment of the present invention, and the following description of the face feature extraction device in the embodiment of the present invention, please refer to fig. 3, an embodiment of the face feature extraction device in the embodiment of the present invention includes:
an obtaining module 300, configured to obtain a target image including a human face;
an input module 301, configured to input a target image into a human face feature extraction model that is trained in advance; the face feature extraction model comprises a feature network and an attention network;
a first output module 302, configured to output an initial face feature of a face in a target image through a feature network;
a second output module 303, configured to output the attention feature in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature;
an obtaining module 304, configured to process the initial face features based on the attention features to obtain final face features of the face in the target image.
In the embodiment of the invention, a target image containing a human face is obtained; inputting the target image into a human face feature extraction model which is trained in advance; the face feature extraction model comprises a feature network and an attention network; outputting the initial face features of the face in the target image through a feature network; outputting attention features in the target image through an attention network; wherein the attention network is a student network of a preset teacher network; the attention network is obtained based on teacher network training; the attention feature is used for indicating the importance degree of the feature factors in the initial human face feature; and processing the initial face features based on the attention features to obtain final face features of the face in the target image. In the method, the human face feature extraction model not only extracts the initial human face features, but also outputs attention features aiming at the human face features, and the attention features are used for indicating the importance degree of each feature factor in the human face features, so that the important features in the human face features are highlighted, unimportant features are weakened, the accuracy of human face feature extraction is improved, the method can be suitable for human face feature extraction and human face recognition in various scenes, and the accuracy of human face recognition is higher.
The number of feature dimensions of the attention features is the same as that of the initial face features; optionally, the obtaining module 304 may be further specifically configured to:
and performing fusion processing on the feature data with the same feature dimension in the initial face features and the attention features to obtain the final face features of the face in the target image.
The initial face features comprise initial factor values of a plurality of feature factors; the attention feature includes a weight of each feature factor; optionally, the obtaining module 304 may further specifically be configured to:
aiming at each characteristic factor in the initial human face characteristics, extracting the weight of the characteristic factor from the attention characteristics, and multiplying the initial factor value of the characteristic factor by the weight to obtain the final factor value of the characteristic factor; and determining the final factor value of each characteristic factor as the final face characteristic of the face in the target image.
The initial face features comprise initial factor values of a plurality of feature factors; the attention feature includes an indication value of each feature factor; optionally, the obtaining module 304 may further specifically be configured to:
extracting an indicating value of the characteristic factor from the attention characteristic aiming at each characteristic factor in the initial human face characteristic, and determining the initial factor value of the characteristic factor as a final factor value of the characteristic factor if the indicating value is a preset first value; if the indicated value is a preset second value, deleting the characteristic factor and the factor value of the characteristic factor; and determining the final factor value of the residual characteristic factors as the final face characteristic of the face in the target image.
Optionally, the apparatus further includes a training module, configured to: acquiring a sample image group containing a human face; the sample image group comprises a plurality of images, the faces in the images are the same, and the shooting angles of the faces in the images are different; extracting a first image feature of a first image in the sample image group, inputting the first image into a preset teacher network, and outputting a first attention feature; obtaining a first output result based on the first image feature and the first attention feature; extracting a second image feature of a second image in the sample image group, inputting the second image to a preset student network, and outputting a second attention feature; obtaining a second output result based on the second image feature and the second attention feature; calculating a loss value based on the first output result and the second output result, training a student network based on the loss value until the loss value is converged, and determining the student network when the loss value is converged as an attention network.
Optionally, the training module may be further specifically configured to:
performing feature fusion processing on feature values at the same position in the first image feature and the first attention feature to obtain a first output result; the step of obtaining a second output result based on the second image feature and the second attention feature includes: and performing feature fusion processing on the feature values at the same positions in the second image feature and the second attention feature to obtain a second output result.
Optionally, the training module may further specifically be configured to:
calculating a loss value based on the first output result and the second output result; updating network parameters of the student network and the teacher network respectively based on the loss values; and continuing to execute the step of acquiring the sample image group containing the human face until the loss value is converged.
The above figures describe the face feature extraction apparatus in the embodiment of the present invention in detail, and the following describes the face feature extraction apparatus in the embodiment of the present invention in detail from the perspective of hardware processing.
Fig. 4 is a schematic structural diagram of a face feature extraction apparatus 400 according to an embodiment of the present invention. The apparatus 400 may vary considerably with configuration or performance and may include one or more processors (CPUs) 410, a memory 420, and one or more storage media 430 (e.g., one or more mass storage devices) storing applications 433 or data 432. The memory 420 and the storage medium 430 may be transient or persistent storage. The program stored in the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations for the face feature extraction apparatus 400. Further, the processor 410 may be configured to communicate with the storage medium 430 and execute the series of instruction operations in the storage medium 430 on the face feature extraction device 400.
The face feature extraction device 400 may also include one or more power supplies 440, one or more wired or wireless network interfaces 450, one or more input-output interfaces 460, and/or one or more operating systems 431, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the structure shown in fig. 4 does not constitute a limitation of the face feature extraction device, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the face feature extraction method.
The invention also provides a face feature extraction device, which comprises a memory and a processor, wherein the memory stores instructions, and when the instructions are executed by the processor, the processor executes the steps of the face feature extraction method in each embodiment.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each block contains a batch of network transaction information used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a portable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A face feature extraction method, characterized by comprising the following steps:
acquiring a target image containing a human face;
inputting the target image into a face feature extraction model which is trained in advance, wherein the face feature extraction model comprises a feature network and an attention network;
outputting the initial face features of the face in the target image through the feature network;
outputting, by the attention network, an attention feature in the target image, wherein the attention network is a student network of a preset teacher network and is obtained by training based on the teacher network, and the attention feature is used for indicating the importance degree of each feature factor in the initial face features;
and processing the initial face features based on the attention feature to obtain final face features of the face in the target image.
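Read as a data flow, claim 1 describes two sub-networks applied to the same image followed by a fusion step. The following is a minimal PyTorch-style sketch of that flow, for illustration only: the class name, tensor shapes, and the multiplicative fusion (one of the options in the dependent claims) are assumptions, not language from the patent.

```python
import torch
import torch.nn as nn

class FaceFeatureExtractor(nn.Module):
    """Illustrative sketch: a feature network and an attention network
    process the same target image, and their outputs are fused."""

    def __init__(self, feature_net: nn.Module, attention_net: nn.Module):
        super().__init__()
        self.feature_net = feature_net      # yields the initial face features
        self.attention_net = attention_net  # distilled student attention network

    def forward(self, target_image: torch.Tensor) -> torch.Tensor:
        initial = self.feature_net(target_image)      # e.g. shape (B, D)
        attention = self.attention_net(target_image)  # shape (B, D): importance per factor
        return initial * attention                    # fuse: weight each feature factor
```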
2. The method according to claim 1, wherein the attention feature has the same feature dimensions as the initial face features, and the step of processing the initial face features based on the attention feature to obtain final face features of the face in the target image comprises:
performing fusion processing on the feature data of the same feature dimension in the initial face features and the attention feature to obtain the final face features of the face in the target image.
3. The method of claim 2, wherein the initial face features comprise initial factor values of a plurality of feature factors, and the attention feature comprises a weight for each feature factor; the step of performing fusion processing on the feature data of the same feature dimension in the initial face features and the attention feature to obtain the final face features of the face in the target image comprises:
for each feature factor in the initial face features, extracting the weight of the feature factor from the attention feature, and multiplying the initial factor value of the feature factor by the weight to obtain the final factor value of the feature factor;
and determining the final factor values of the feature factors as the final face features of the face in the target image.
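In tensor terms, the per-factor loop of claim 3 collapses to an element-wise product. A hedged sketch, assuming both features are flat vectors of equal length (as claim 2 requires):

```python
import torch

def fuse_by_weight(initial: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Multiply each initial factor value by the weight the attention
    feature assigns to the same factor."""
    assert initial.shape == weights.shape  # same feature dimensions (claim 2)
    return initial * weights

# Example: a 512-dimensional initial face feature with per-factor weights in [0, 1].
final = fuse_by_weight(torch.randn(512), torch.rand(512))
```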
4. The method of claim 2, wherein the initial face features comprise initial factor values of a plurality of feature factors, and the attention feature comprises an indication value for each feature factor; the step of performing fusion processing on the feature data of the same feature dimension in the initial face features and the attention feature to obtain the final face features of the face in the target image comprises:
for each feature factor in the initial face features, extracting the indication value of the feature factor from the attention feature; if the indication value is a preset first value, determining the initial factor value of the feature factor as the final factor value of the feature factor; and if the indication value is a preset second value, deleting the feature factor and its factor value;
and determining the final factor values of the remaining feature factors as the final face features of the face in the target image.
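Claim 4 replaces the continuous weights with a binary keep/drop decision per factor. A sketch under the assumption that the preset first value is 1 (keep) and the preset second value is 0 (drop); the claim itself leaves both values open:

```python
import torch

KEEP, DROP = 1.0, 0.0  # assumed preset first/second indication values

def fuse_by_indicator(initial: torch.Tensor, indicators: torch.Tensor) -> torch.Tensor:
    """Keep a factor's initial value when its indication value is KEEP;
    delete the factor entirely when it is DROP, shortening the feature."""
    return initial[indicators == KEEP]  # remaining factors form the final face feature

# Example: a 6-factor feature in which the third and sixth factors are deleted.
final = fuse_by_indicator(torch.randn(6), torch.tensor([1., 1., 0., 1., 1., 0.]))
```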
5. The method of claim 1, wherein the attention network is trained by:
acquiring a sample image group containing a human face, wherein the sample image group comprises a plurality of images of the same face captured at different shooting angles;
extracting a first image feature of a first image in the sample image group, inputting the first image into a preset teacher network, and outputting a first attention feature; obtaining a first output result based on the first image feature and the first attention feature;
extracting a second image feature of a second image in the sample image group, inputting the second image into a preset student network, and outputting a second attention feature; obtaining a second output result based on the second image feature and the second attention feature;
calculating a loss value based on the first output result and the second output result, training the student network based on the loss value until the loss value converges, and determining the student network at convergence as the attention network.
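One way to picture the training scheme of claims 5-7 is a distillation loop over image pairs of the same face. The sketch below is an assumption-laden reading: the MSE loss, the convergence test, and the element-wise (`*`) fusion of claim 6 are illustrative choices, and `loader` is assumed to yield (first image, second image) pairs.

```python
import torch
import torch.nn.functional as F

def train_attention_student(teacher, student, feature_net, loader, optimizer,
                            tol=1e-4):
    """Distillation sketch: the student is trained until its fused output
    on the second image matches the teacher's fused output on the first."""
    prev = float("inf")
    for first_img, second_img in loader:
        out1 = feature_net(first_img) * teacher(first_img)    # first output result
        out2 = feature_net(second_img) * student(second_img)  # second output result
        loss = F.mse_loss(out2, out1)  # penalize disagreement between the two outputs
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if abs(prev - loss.item()) < tol:
            break                      # treat a flat loss as convergence
        prev = loss.item()
    return student                     # the converged student is the attention network
```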
6. The method of claim 5, wherein the step of obtaining a first output result based on the first image feature and the first attention feature comprises: performing feature fusion processing on feature values at the same positions in the first image feature and the first attention feature to obtain the first output result;
and the step of obtaining a second output result based on the second image feature and the second attention feature comprises: performing feature fusion processing on feature values at the same positions in the second image feature and the second attention feature to obtain the second output result.
7. The method of claim 5, wherein the step of calculating a loss value based on the first output result and the second output result and training the student network based on the loss value until the loss value converges comprises:
calculating a loss value based on the first output result and the second output result;
updating network parameters of the student network and the teacher network, respectively, based on the loss value;
and continuing to execute the step of acquiring the sample image group containing the human face until the loss value converges.
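Claim 7's recitation that both the student and the teacher are updated from the same loss admits, under one reading, a single optimizer spanning both parameter sets; a hedged fragment (the module shapes and learning rate are placeholders, not from the patent):

```python
import itertools
import torch
import torch.nn as nn

teacher = nn.Linear(512, 512)  # stand-ins for the two attention networks
student = nn.Linear(512, 512)

# One reading of claim 7: a single optimizer over both parameter sets, so a
# single loss.backward()/optimizer.step() updates teacher and student alike.
optimizer = torch.optim.Adam(
    itertools.chain(teacher.parameters(), student.parameters()), lr=1e-4)
```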
8. A face feature extraction device, characterized by comprising:
the acquisition module is used for acquiring a target image containing a human face;
the input module is used for inputting the target image into a face feature extraction model which is trained in advance, wherein the face feature extraction model comprises a feature network and an attention network;
the first output module is used for outputting the initial face features of the face in the target image through the feature network;
a second output module, configured to output an attention feature in the target image through the attention network, wherein the attention network is a student network of a preset teacher network and is obtained by training based on the teacher network, and the attention feature is used for indicating the importance degree of a feature factor in the initial face features;
and the obtaining module is used for processing the initial face features based on the attention feature to obtain final face features of the face in the target image.
9. A face feature extraction device, characterized by comprising: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invokes the instructions in the memory to cause the face feature extraction device to perform the face feature extraction method of any one of claims 1-7.
10. A computer-readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the face feature extraction method according to any one of claims 1 to 7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202210316045.4A (granted as CN114783019B) | 2022-03-29 | | Face feature extraction method, device, equipment and storage medium
Publications (2)

Publication Number | Publication Date
---|---
CN114783019A | 2022-07-22
CN114783019B | 2024-10-29
Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN109815924A * | 2019-01-29 | 2019-05-28 | Chengdu Kuangshi Jinzhi Technology Co., Ltd. | Expression recognition method, apparatus and system
CN111783606A * | 2020-06-24 | 2020-10-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method, device, equipment and storage medium of face recognition network
Non-Patent Citations (1)

Title
---
Sun Qigong et al., "Deep Neural Network FPGA Design and Implementation" (深度神经网络FPGA设计与实现), vol. 1, Xidian University Press, 31 October 2020, pages 231-233 *
Similar Documents

Publication | Title
---|---
CN111709409B | Face living body detection method, device, equipment and medium
CN110852256B | Method, device and equipment for generating time sequence action nomination and storage medium
CN111324774B | Video duplicate removal method and device
CN105678250B | Face identification method and device in video
CN110597991A | Text classification method and device, computer equipment and storage medium
CN111079601A | Video content description method, system and device based on multi-mode attention mechanism
CN111666919B | Object identification method and device, computer equipment and storage medium
CN113327279B | Point cloud data processing method and device, computer equipment and storage medium
CN107636691A | Method and apparatus for identifying the text in image
CN115239593A | Image restoration method, image restoration device, electronic device, and storage medium
CN110598019B | Repeated image identification method and device
CN114418030B | Image classification method, training method and device for image classification model
CN112541529A | Expression and posture fusion bimodal teaching evaluation method, device and storage medium
CN115797606B | 3D virtual digital human interaction action generation method and system based on deep learning
CN112949647A | Three-dimensional scene description method and device, electronic equipment and storage medium
CN117033609B | Text visual question-answering method, device, computer equipment and storage medium
CN112990154B | Data processing method, computer equipment and readable storage medium
Nida et al. | Instructor activity recognition through deep spatiotemporal features and feedforward extreme learning machines
CN113362852A | User attribute identification method and device
CN113254927A | Model processing method and device based on network defense and storage medium
CN111402156A | Restoration method and device for smear image, storage medium and terminal equipment
CN111860118A | Human behavior analysis method based on artificial intelligence
US7486815B2 | Method and apparatus for scene learning and three-dimensional tracking using stereo video cameras
CN117437467A | Model training method and device, electronic equipment and storage medium
US20230153085A1 | Systems and methods for source code understanding using spatial representations
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant