CN113435334B

CN113435334B - Small target face recognition method based on deep learning

Info

Publication number: CN113435334B
Application number: CN202110718863.2A
Authority: CN
Inventors: 宋尧哲; 童官军; 李宝清; 袁晓兵; 吴萌萌; 舒子婷
Original assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Current assignee: Shanghai Institute of Microsystem and Information Technology of CAS
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2024-02-27
Anticipated expiration: 2041-06-28
Also published as: CN113435334A

Abstract

The invention relates to a deep-learning-based small-target face recognition method comprising the following steps: constructing a high-to-low generative adversarial network and inputting a first-resolution face image into the trained network to obtain a second-resolution face image that resembles images captured in real scenes; constructing a teacher-student network, training it with the first-resolution and second-resolution face images, and inputting a second-resolution face image to be recognized into the trained teacher-student network to obtain a recognition result. The invention improves the recognition of small-target face images.

Description

Small target face recognition method based on deep learning

Technical Field

The invention relates to the technical field of face recognition, and in particular to a deep-learning-based small-target face recognition method.

Background

With the development of deep learning, face recognition has advanced rapidly in recent years. The best algorithms for frontal, high-resolution face images now exceed 99% accuracy, but most of them are applicable only to constrained scenarios such as ID-photo verification. A current research hotspot in face recognition is the recognition of small-target face images in real environments. Such images arise mainly from: defocus and relative motion between the lens and the subject; a large camera-to-face distance combined with a low-spatial-resolution camera sensor; low-quality compression settings, interlacing, or similar conditions; and increased image noise (e.g., reduced illumination at capture time).

In real environments such as surveillance video, algorithms that perform well on frontal, high-resolution face images often suffer a significant loss in performance. The main reason is that the data a model learns from during training are mostly high-resolution face images, so applying the model directly to faces in surveillance video introduces a domain-shift problem.

At present there are two main approaches to small-target face recognition: super-resolution (see Fig. 1) and common feature subspace (see Fig. 2). The super-resolution approach directly converts the small-target face image into a high-resolution face image and then performs face recognition. The common feature subspace approach projects the high-resolution face image and the small-target face image into the same space and decides, according to their distance, whether the two images belong to the same identity.

Super-resolution methods tend to enhance visual appearance during reconstruction and may introduce artificial features that impair identity recognition. Current common feature subspace methods train on high- and low-resolution face images simultaneously; this can improve recognition of low-resolution face images, but at the same time it degrades recognition of high-resolution face images.

In addition, most current small-target face recognition algorithms create high/low-resolution image pairs by direct downsampling. The model is therefore trained on downsampled small-target face images but tested on small-target face images from real environments, so the domain-shift problem persists. Direct downsampling also turns the downsampled face images into hard samples during training.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a deep-learning-based small-target face recognition method that improves the recognition of small-target face images.

The technical solution adopted to solve this problem is a deep-learning-based small-target face recognition method comprising the following steps:

(1) Constructing a high-to-low generative adversarial network, and inputting a first-resolution face image into the trained generative adversarial network to obtain a second-resolution face image that resembles real-scene imagery, wherein the pixel resolution of the first-resolution face image is higher than that of the second-resolution face image;

(2) Constructing a teacher-student network, and training it with the first-resolution face image and the second-resolution face image to obtain a trained teacher-student network, wherein the teacher-student network comprises a teacher network part and a student network part, and the teacher network part trains the student network part by knowledge distillation so that the student network part achieves accuracy close to that of the teacher network part;

(3) Inputting the second-resolution face image to be recognized into the trained teacher-student network to obtain a recognition result.

The generative adversarial network in step (1) comprises a generator and a discriminator: the generator generates the second-resolution face image from the first-resolution face image and a random vector, and the discriminator uses deep learning to judge whether an input image is a small-target face image captured in a natural surveillance environment.

The loss function of the generative adversarial network in step (1) consists of a pixel-level loss and a discriminator loss.

The teacher network part and the student network part in step (2) use the same model.

The teacher network part in step (2) is pre-trained on first-resolution face images, after which its parameters are fixed. The student network part takes mixed face images as input, i.e., the corresponding first-resolution and second-resolution face images. The output of the teacher network part for the first-resolution face image and the output of the student network part for the mixed face images are input together into a trainable classification layer for training; by minimizing the distance between the outputs of the teacher and student network parts and sharing the parameters of the trainable classification layer, the student network part attains accuracy close to that of the teacher network part.

The loss function of the teacher-student network is the sum of the distance between vectors $q^{pre}$ and $p^{pre}$ and the distance between vectors $q^{pre}$ and $p^{t}$, where $q^{pre}$ is the vector obtained by passing the first-resolution face image through the teacher network part with its fixed parameters, and $p^{pre}$ and $p^{t}$ are the vectors obtained, respectively, when the teacher network part's output for the first-resolution face image and the student network part's output for the mixed face image are input together into the trainable classification layer.

The teacher-student network in step (2) is cascaded with the generative adversarial network, and the two are trained simultaneously.

Advantageous effects

Compared with the prior art, the above technical solution has the following advantages and positive effects. By establishing a high-to-low generative adversarial network that learns to convert high-resolution face images into low-resolution small-target face images, the invention replaces the direct downsampling commonly used at present, eliminates the hard-sample and domain-shift problems in training, and lets the subsequent model learn from small-target face images resembling those of real environments and be tested in the same domain, which improves the recognition rate. In addition, the teacher-student network learns a common feature subspace across resolutions; by minimizing the cosine distance between the outputs of the teacher and student networks and sharing the Softmax layer parameters between the two networks, it avoids the drop in recognition rate on high-resolution face images that affects current small-target face recognition algorithms. The invention thus offers a high recognition rate, strong adaptability to the environment, and reliable performance.

Drawings

Fig. 1 is a flowchart of small-target face recognition based on a conventional super-resolution method;

Fig. 2 is a flowchart of small-target face recognition based on a conventional common feature subspace method;

Fig. 3 is a flowchart of an embodiment of the present invention;

Fig. 4 is a flowchart of the GAN constructed in an embodiment of the invention;

Fig. 5 is a flowchart of the teacher-student network constructed in an embodiment of the invention.

Detailed Description

The invention will be further illustrated with reference to specific examples. It is to be understood that these examples are illustrative of the present invention and are not intended to limit the scope of the present invention. Further, it is understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents are intended to fall within the scope of the claims appended hereto.

As shown in Fig. 3, the embodiment of the invention relates to a deep-learning-based small-target face recognition method comprising the following steps: constructing a high-to-low generative adversarial network (GAN for short) and inputting a first-resolution face image into the trained GAN to obtain a second-resolution face image that resembles real-scene imagery, wherein the pixel resolution of the first-resolution face image is higher than that of the second-resolution face image; constructing a teacher-student network and training it with the first-resolution and second-resolution face images to obtain a trained teacher-student network, wherein the teacher-student network comprises a teacher network part and a student network part, and the teacher network part trains the student network part by knowledge distillation so that the student network part achieves accuracy close to that of the teacher network part; and inputting the second-resolution face image to be recognized into the trained teacher-student network to obtain a recognition result.

In this embodiment, the CasiaWebFace training set and the TinyFace training set are used for training, and the TinyFace test set is used for testing. Images in the CasiaWebFace training set are cropped to 64 × 64 pixels.

Step two: a high-to-low-resolution GAN is established and trained to produce small-target face images that resemble real scenes; the network flow is shown in Fig. 4. The GAN of this embodiment includes a generator and a discriminator: the generator generates the second-resolution face image from the first-resolution face image and a random vector, and the discriminator uses deep learning to judge whether an input image is a small-target face image captured in a natural surveillance environment.

Specifically, the cropped training images and a random vector are input into the generator. The random vector is a 1 × 64 Gaussian random vector; it passes through a fully connected layer, is reshaped to 64 × 64, and is then concatenated with the input image. The generator outputs a corresponding 16 × 16-pixel small-target face image, which is fed into the discriminator together with the TinyFace training set so that the discriminator learns to recognize small-target face images from real scenes.
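
The noise-conditioning step above can be sketched as follows. This is a minimal pure-Python illustration, assuming that "concatenated with the input image" means channel-wise concatenation and that the fully connected layer is an ordinary dense layer; `make_fc_layer`, `noise_to_map`, and `build_generator_input` are hypothetical helper names, and the random weights stand in for learned ones.

```python
import random

Z_DIM = 64      # length of the Gaussian random vector (1 x 64 in the text)
MAP_SIZE = 64   # spatial size of the cropped input face (64 x 64 in the text)

def make_fc_layer(in_dim, out_dim, rng):
    """Hypothetical randomly initialised fully connected layer: (weights, bias)."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def noise_to_map(z, fc, size=MAP_SIZE):
    """Project z through the FC layer, then reshape the flat output to size x size."""
    weights, bias = fc
    flat = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]
    return [flat[r * size:(r + 1) * size] for r in range(size)]

def build_generator_input(face_channels, rng, fc):
    """Append one noise channel to the image channels (channel-wise concatenation)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(Z_DIM)]
    return face_channels + [noise_to_map(z, fc)]

rng = random.Random(0)
fc = make_fc_layer(Z_DIM, MAP_SIZE * MAP_SIZE, rng)
# A dummy 3-channel 64 x 64 face crop (all zeros) stands in for real data.
face = [[[0.0] * MAP_SIZE for _ in range(MAP_SIZE)] for _ in range(3)]
x = build_generator_input(face, rng, fc)
print(len(x), len(x[0]), len(x[0][0]))  # 4 64 64
```

In a real implementation the same idea would be a learned linear layer plus a reshape and a tensor concatenation inside a deep learning framework; the toy version only shows the shapes involved.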

The loss function of the GAN consists of a pixel-level loss and a discriminator loss, combined as in Eq. (1); in this embodiment α is set to 1 and β to 0.05.

$$ l_{GAN} = \alpha\, l_{pixel} + \beta\, l_{g} \qquad (1) $$

The loss of the discriminator is as follows:

wherein $P_r$ denotes small-target face images in a natural environment, i.e., the TinyFace training set; $P_g$ denotes pictures generated by the generator; $D(x)$ is the feature obtained by the discriminator for a small-target face image from the natural environment, and the corresponding term for a generated picture is the feature the discriminator obtains for that picture; $E[\cdot]$ denotes expectation.
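
The discriminator loss equation itself does not survive in this text. As a hedged sketch only, assuming a Wasserstein-style objective (a common form built from the same quantities defined above, expectations of D over $P_r$ and $P_g$, but not confirmed by the source), the expectations can be estimated as batch means. The function names are hypothetical.

```python
def mean(values):
    """Batch mean, used as a Monte Carlo estimate of an expectation E[.]."""
    return sum(values) / len(values)

def wgan_critic_loss(d_real, d_fake):
    """Assumed critic loss: E_{x~P_g}[D(x)] - E_{x~P_r}[D(x)].

    d_real: discriminator outputs on real small-target faces (e.g. TinyFace)
    d_fake: discriminator outputs on generator pictures
    Minimising this pushes real scores up and generated scores down.
    """
    return mean(d_fake) - mean(d_real)

def wgan_generator_loss(d_fake):
    """Assumed generator-side adversarial term: -E_{x~P_g}[D(x)]."""
    return -mean(d_fake)

print(wgan_critic_loss([2.0, 4.0], [1.0, 1.0]))  # -2.0
print(wgan_generator_loss([1.0, 3.0]))  # -2.0
```

Under this assumption the generator-side term would play the role of $l_g$ in Eq. (1); if the original uses a different adversarial loss (e.g. a hinge loss), only these two functions change.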

The pixel level loss is:

wherein W and H are the width and height of the picture produced by the generator; F is a function that converts the high-resolution face image $I^{hr}$ to the same size as the generated picture, i.e., from 64 × 64 to 16 × 16; and $g_0$ serves to make the probability distribution of the high-resolution face image $I^{hr}$ the same as that of small-target face images in the natural environment. In this embodiment, average pooling is used as F.
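
The pixel-level loss equation is likewise missing from this text. The sketch below assumes a mean squared error between $F(I^{hr})$ and the generator output, averaged over the W × H pixels, with F implemented as non-overlapping average pooling (64 × 64 to 16 × 16 corresponds to a pooling factor of 4), and then combines the terms as in Eq. (1) with α = 1 and β = 0.05. The MSE form and the helper names are assumptions, and the $g_0$ term is not modeled.

```python
def avg_pool(image, factor):
    """F: downscale a 2-D image (list of rows) by non-overlapping average pooling."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / (factor * factor)
            for c in range(w // factor)
        ]
        for r in range(h // factor)
    ]

def pixel_loss(hr_image, generated, factor=4):
    """Assumed l_pixel: MSE between F(I^hr) and the generator output over W*H pixels."""
    target = avg_pool(hr_image, factor)  # F(I^hr): 64x64 -> 16x16 when factor=4
    H, W = len(generated), len(generated[0])
    return sum((target[r][c] - generated[r][c]) ** 2
               for r in range(H) for c in range(W)) / (W * H)

def gan_total_loss(l_pixel, l_g, alpha=1.0, beta=0.05):
    """Eq. (1): l_GAN = alpha * l_pixel + beta * l_g (alpha=1, beta=0.05 here)."""
    return alpha * l_pixel + beta * l_g

# Toy 4 x 4 "high-resolution" image pooled by a factor of 2 to 2 x 2.
hr = [[1, 1, 3, 3], [1, 1, 3, 3], [5, 5, 7, 7], [5, 5, 7, 7]]
gen = [[1, 3], [5, 8]]
print(avg_pool(hr, 2))                       # [[1.0, 3.0], [5.0, 7.0]]
print(pixel_loss(hr, gen, factor=2))         # 0.25
print(round(gan_total_loss(0.25, 2.0), 4))   # 0.35
```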

Step three: a teacher-student network is established; the network flow is shown in Fig. 5. It comprises a teacher network part and a student network part. The teacher network part is pre-trained on first-resolution face images, after which its parameters are fixed; the student network part takes mixed face images as input, i.e., the corresponding first-resolution and second-resolution face images. The output of the teacher network part for the first-resolution face image and the output of the student network part for the mixed face images are input together into a trainable classification layer for training; by minimizing the distance between the outputs of the teacher and student network parts and sharing the parameters of the trainable classification layer, the student network part attains accuracy close to that of the teacher network part.

In this embodiment, the teacher network part and the student network part are both ResNet34. The weights of the teacher network part and of the fixed Softmax layer are pre-trained on the CasiaWebFace training set; these weights remain unchanged during training, so the teacher network part serves only as a feature extractor.

During training, 64 × 64 high-resolution face images are input to the teacher network part, and the corresponding 16 × 16 low-resolution and 64 × 64 high-resolution face images are input to the student network part as mixed face images; the 16 × 16 low-resolution face images are generated by the high-to-low GAN.

After passing through the teacher network part, the high-resolution face image goes through the fixed Softmax layer to give vector $q^{pre}$. The teacher network part's output for the high-resolution face image and the student network part's output for the mixed face images are then input together into a trainable Softmax layer, giving vectors $p^{pre}$ and $p^{t}$ respectively.
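
The forward pass that produces $q^{pre}$, $p^{pre}$ and $p^{t}$ can be sketched as below. Assumptions: each Softmax layer is a bias-free linear layer followed by softmax, the feature vectors stand in for the ResNet34 backbone outputs (toy 2-D vectors here), and the function names are hypothetical. The point being illustrated is that `p_pre` and `p_t` are computed with the same `shared_w`, which is how the trainable classification layer is shared between the two branches.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)  # subtract the max before exponentiating
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, features):
    """Bias-free classification layer: one weight row per identity."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def teacher_student_outputs(teacher_feat, student_feat, fixed_w, shared_w):
    """Return (q_pre, p_pre, p_t) for one training sample."""
    q_pre = softmax(linear(fixed_w, teacher_feat))   # frozen, pre-trained Softmax layer
    p_pre = softmax(linear(shared_w, teacher_feat))  # trainable layer, teacher branch
    p_t = softmax(linear(shared_w, student_feat))    # same trainable layer, student branch
    return q_pre, p_pre, p_t

q_pre, p_pre, p_t = teacher_student_outputs(
    teacher_feat=[1.0, 0.0], student_feat=[0.0, 1.0],
    fixed_w=[[1.0, 0.0], [0.0, 1.0]], shared_w=[[2.0, 0.0], [0.0, 2.0]])
print(q_pre, p_pre, p_t)
```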

The loss function of the teacher-student network is the sum of the distance between $q^{pre}$ and $p^{pre}$ and the distance between $q^{pre}$ and $p^{t}$. This embodiment uses the cosine distance as the metric $\mathrm{sim}(\cdot,\cdot)$.

The teacher-student network loss function is:

$$ L_{ts} = \mathrm{sim}(q^{pre}, p^{pre}) + \mathrm{sim}(q^{pre}, p^{t}) $$
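
A sketch of $L_{ts}$, assuming $\mathrm{sim}(\cdot,\cdot)$ is the cosine distance 1 − cos θ, so that minimizing the loss pulls $p^{pre}$ and $p^{t}$ toward the direction of $q^{pre}$. The helper names are hypothetical; per the embodiment, the teacher network part and its fixed Softmax layer stay frozen while this loss is minimized.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 when a and b point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def teacher_student_loss(q_pre, p_pre, p_t):
    """L_ts = sim(q_pre, p_pre) + sim(q_pre, p_t), with sim as cosine distance."""
    return cosine_distance(q_pre, p_pre) + cosine_distance(q_pre, p_t)

print(teacher_student_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Using a distance rather than a similarity matters here: minimizing a cosine similarity would push the outputs apart, the opposite of the intended distillation.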

During training, the teacher-student network is cascaded with the GAN from step two, and both are trained simultaneously.

Once the GAN and the teacher-student network have been trained, the small-target face image alone is input into the student network part, and the original-resolution image to be compared against is input into the teacher network part, yielding the final recognition result.
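
The recognition step can be sketched as nearest-neighbour matching. This assumes that the recognition result is the gallery identity whose teacher feature has the highest cosine similarity to the student feature of the probe, subject to a hypothetical acceptance threshold; neither the matching rule nor the threshold is specified in the text, and `identify` is a hypothetical name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def identify(probe_feature, gallery, threshold=0.5):
    """Return the best-matching identity, or None if no score reaches threshold.

    probe_feature: student-network output for the small-target face
    gallery: {identity: teacher-network output for the original-resolution face}
    threshold: hypothetical acceptance threshold (not given in the text)
    """
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        score = cosine_similarity(probe_feature, feature)
        if score > best_score:
            best_id, best_score = identity, score
    return best_id if best_score >= threshold else None

gallery = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(identify([0.9, 0.1], gallery))   # alice
print(identify([-1.0, 0.0], gallery))  # None
```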

It can thus be seen that, by establishing a high-to-low generative adversarial network that learns to convert high-resolution face images into small-target face images, the invention replaces the direct downsampling commonly used at present, eliminates the hard-sample and domain-shift problems encountered in training, and lets the subsequent model learn from small-target face images resembling those of real environments and be tested in the same domain, thereby improving the recognition rate. In addition, the teacher-student network learns a common feature subspace across resolutions; by minimizing the cosine distance between the outputs of the teacher and student networks and sharing the Softmax layer parameters between them, it avoids the drop in recognition rate on high-resolution face images seen in current small-target face recognition algorithms. The invention offers a high recognition rate, strong environmental adaptability, and reliable performance.

Claims

1. A deep-learning-based small-target face recognition method, characterized by comprising the following steps:

(2) Constructing a teacher-student network and training it with the first-resolution face image and the second-resolution face image to obtain a trained teacher-student network, wherein the teacher-student network comprises a teacher network part and a student network part, and the teacher network part trains the student network part by knowledge distillation so that the student network part achieves accuracy close to that of the teacher network part; the teacher network part is pre-trained on the first-resolution face image, after which its parameters are fixed; the student network part takes a mixed face image as input, the mixed face image being the corresponding first-resolution face image and second-resolution face image; the output of the teacher network part for the first-resolution face image and the output of the student network part for the mixed face image are input together into a trainable classification layer for training, and the student network part achieves accuracy close to that of the teacher network part by minimizing the distance between the outputs of the teacher network part and the student network part and by sharing the parameters of the trainable classification layer; the loss function of the teacher-student network is the sum of the distance between vectors $q^{pre}$ and $p^{pre}$ and the distance between vectors $q^{pre}$ and $p^{t}$, wherein $q^{pre}$ is the vector obtained by passing the first-resolution face image through the teacher network part with its fixed parameters, and $p^{pre}$ and $p^{t}$ are the vectors obtained after the output of the teacher network part for the first-resolution face image and the output of the student network part for the mixed face image are input together into the trainable classification layer;

2. The deep-learning-based small-target face recognition method of claim 1, wherein the generative adversarial network in step (1) comprises a generator and a discriminator, the generator generating the second-resolution face image from the first-resolution face image and a random vector, and the discriminator using deep learning to judge whether an input image is a small-target face image captured in a natural surveillance environment.

3. The deep-learning-based small-target face recognition method of claim 2, wherein the loss function of the generative adversarial network in step (1) consists of a pixel-level loss and a discriminator loss.

4. The deep-learning-based small-target face recognition method of claim 1, wherein the teacher network part and the student network part in step (2) use the same model.

5. The deep-learning-based small-target face recognition method of claim 1, wherein the teacher-student network in step (2) is cascaded with the generative adversarial network and trained simultaneously with it.