CN113378675A - Face recognition method for simultaneous detection and feature extraction - Google Patents

Face recognition method for simultaneous detection and feature extraction

Info

Publication number
CN113378675A
Authority
CN
China
Prior art keywords: face, feature extraction, detection, branch, face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110603538.1A
Other languages
Chinese (zh)
Inventor
茅耀斌
沈庆强
项文波
陈婷
吴敏杰
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202110603538.1A
Publication of CN113378675A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods


Abstract

The invention discloses a face recognition method that performs detection and feature extraction simultaneously. The method comprises: constructing a face detection data set and a face feature extraction data set to form a multi-task face detection and recognition data set; constructing a backbone network, a face detection branch and a face feature extraction branch, and training a face detection and recognition model based on a deep neural network, wherein the backbone network extracts deep features from the image and provides regression and classification information to the subsequent detection and feature extraction branches, the face detection branch estimates a heat map, the target center offset and the bounding box size, and the face feature extraction branch extracts the features of each face to generate a feature vector; and inputting the image to be recognized into the trained face detection and recognition model, completing face detection and feature extraction, and thereby determining the person's identity information. The invention improves the speed of face detection and recognition and reduces the dependence of the feature extraction stage on the performance of the face detection stage.

Description

Face recognition method for simultaneous detection and feature extraction
Technical Field
The invention relates to the fields of image processing and deep learning, and in particular to a face recognition method and system that perform detection and feature extraction simultaneously.
Background
With the rapid development of the internet and information technology, security requirements in many areas of daily life and production are steadily increasing, and identity verification must be performed accurately and quickly. Compared with other verification modalities, face recognition is natural, intuitive and contactless, and better matches human cognitive habits. Because of these advantages, face recognition technology has been widely applied in many areas of production and daily life, such as access control systems and face-scan payment.
Most face recognition pipelines at the present stage follow a "detect first, then extract features" paradigm. For example, document [1] constructs image-partition features in a face space, performs recognition and detection, switches and selects local features, gathers statistics on them, and carries out targeted computation and comparison on the data. Document [2] applies a trained face detection CNN to the current frame, places on the current frame a face box with the same size and position as the face box of the previous frame, and enlarges it by a certain factor to obtain the face region.
[1] Yuan Peijiang, Song Bo, Shi Zhen, Li Jianmin. A model switching algorithm based on face recognition [P]. Beijing: CN111860454A, 2020-10-30.
[2] Zhou Jun, Wang Yang. Surveillance video image face detection and tracking method, device, medium and equipment [P]. Beijing: CN112825116A, 2021-05-21.
Disclosure of Invention
The invention aims to provide a face recognition method and a face recognition system for simultaneous detection and feature extraction.
The technical solution for realizing the purpose of the invention is as follows: a face recognition method for simultaneous detection and feature extraction comprises the following steps:
step 1, data modeling and preparation
Constructing a face detection data set and a face feature extraction data set to form a multitask face detection identification data set;
step 2, deep neural network model training
Constructing a backbone network, a face detection branch and a face feature extraction branch, and training a face detection and recognition model based on a deep neural network, wherein the backbone network extracts deep features from the image and provides regression and classification information to the subsequent detection and feature extraction branches, the face detection branch estimates a heat map, the target center offset and the bounding box size, and the face feature extraction branch extracts the features of each face to generate a feature vector;
step 3, model inference application
Inputting the image to be recognized into the trained face detection and recognition model, completing face detection and feature extraction, and thereby determining the person's identity information.
Further, in step 1, data modeling and preparation are specifically performed by:
step 1.1, constructing a face detection data set
Marking a face area in the image in a rectangular frame form, and recording the position of the central point and the width and height of the rectangular frame;
step 1.2, constructing a human face feature extraction data set
An identity identifier field is included in the label; the same identity always uses the same identifier, and different identities use different identifiers;
step 1.3, constructing a multitask face detection recognition data set
Integrating the constructed face detection data set and the face feature extraction data set, so that the label content comprises an identity identifier field, the coordinates of the rectangular box center, and the width and height of the rectangular box; the label file is given the same name as the original image.
Further, in step 2, deep neural network model training is specifically performed by:
(1) backbone network
The backbone network can adopt ResNet with transposed convolutions, DLA (Deep Layer Aggregation), Hourglass, MobileNetV2 or a high-resolution network (HRNet);
(2) face detection branch
The detection branch adopts an anchor-free design, realized as three parallel heads behind the backbone network: a heat map head, a center point offset head and a bounding box head, wherein the heat map head comprises an input layer, a dynamic convolution layer, a first fully connected layer, a second fully connected layer and an output layer;
(a) Heat map head

The detection branch's heat map head estimates the positions of object centers from a heat-map representation. For each object in the image, its bounding box $b^i = (x_1^i, y_1^i, x_2^i, y_2^i)$ gives the center point $(c_x^i, c_y^i)$, where $c_x^i = \frac{x_1^i + x_2^i}{2}$ and $c_y^i = \frac{y_1^i + y_2^i}{2}$; dividing by the stride gives its position on the feature map, $(\tilde{c}_x^i, \tilde{c}_y^i) = \left(\left\lfloor \frac{c_x^i}{4} \right\rfloor, \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$. The feature-map response $M_{xy}$ at image location $(x, y)$ can be expressed as:

$$M_{xy} = \sum_{i=1}^{N} \exp\left(-\frac{(x - \tilde{c}_x^i)^2 + (y - \tilde{c}_y^i)^2}{2\sigma_c^2}\right) \tag{1}$$

where $N$ represents the number of face boxes in the image and $\sigma_c$ represents the standard deviation. The heat map head loss function $L_{heat}$ adopts the focal loss, as shown in equation (2):

$$L_{heat} = -\frac{1}{N} \sum_{xy} \begin{cases} (1 - \hat{M}_{xy})^{\alpha} \log \hat{M}_{xy}, & M_{xy} = 1 \\ (1 - M_{xy})^{\beta} \hat{M}_{xy}^{\alpha} \log(1 - \hat{M}_{xy}), & \text{otherwise} \end{cases} \tag{2}$$

where $\hat{M}$ represents the heat map estimated by the model, and $\alpha$ and $\beta$ represent the preset focal-loss hyper-parameters;
(b) Center point offset head

The center point offset head aims to localize the face position more accurately; precise alignment between the feature extraction branch and the object centers is crucial to performance. Let the output center point offset be $O \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, the offset of its center point is $o^i = \left(\frac{c_x^i}{4} - \left\lfloor \frac{c_x^i}{4} \right\rfloor,\ \frac{c_y^i}{4} - \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$. The center point offset head loss function $L_{center}$ adopts the $\ell_1$ norm, as shown in equation (3):

$$L_{center} = \sum_{i=1}^{N} \left| o^i - \hat{o}^i \right| \tag{3}$$

where $\hat{o}^i$ represents the center point offset estimated by the model;
(c) Bounding box head

The bounding box head estimates the height and width of the face bounding box at each anchor position. It is not directly related to the feature extraction branch, but its localization accuracy affects the evaluation of face detection performance. The output box size is $S \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, its size is $s^i = (x_2^i - x_1^i,\ y_2^i - y_1^i)$. The bounding box head loss function $L_{box}$ adopts the $\ell_1$ norm, as shown in equation (4):

$$L_{box} = \sum_{i=1}^{N} \left| s^i - \hat{s}^i \right| \tag{4}$$

where $\hat{s}^i$ represents the bounding box size estimated by the model;
(3) Face feature extraction branch

The face feature extraction branch learns the feature extraction task through a classification task: all objects in the training set with the same identity label are treated as one class. For each bounding box $b^i$ in the image, its center $(\tilde{c}_x^i, \tilde{c}_y^i)$ on the heat map is obtained, and a discriminative feature vector $E_{\tilde{c}_x^i, \tilde{c}_y^i}$ is extracted at that position. The loss function $L_{id}$ of the face feature extraction branch is:

$$L_{id} = -\sum_{i=1}^{N} \sum_{k=1}^{K} L^i(k) \log\big(p(k)\big) \tag{5}$$

where $K$ is the number of identity classes, $p(k)$ is the predicted class distribution obtained from the feature vector, and $L^i(k)$ is the one-hot ground-truth identity label;
Combining the heat map head loss, the center point offset head loss and the bounding box head loss of the face detection branch with the face feature extraction branch loss, the detection and feature extraction tasks are balanced by an uncertainty loss. The detection branch loss and the overall network loss are expressed as (6) and (7), respectively:

$$L_{det} = L_{heat} + L_{center} + L_{box} \tag{6}$$

$$L = \frac{1}{2}\left(\frac{1}{e^{\omega_1}} L_{det} + \frac{1}{e^{\omega_2}} L_{id} + \omega_1 + \omega_2\right) \tag{7}$$

where $\omega_1$ and $\omega_2$ are learnable parameters that balance the detection and feature extraction tasks.
Further, in step 2, the deep neural network model training adopts a crop-and-mix augmentation strategy, a multi-scale image strategy, random left-right flipping and random rotation.
Further, the specific method of step 3, model inference application, is as follows:
step 3.1, inputting the image to be recognized into the trained face recognition model for face detection and feature extraction;
step 3.2, post-processing the face detection result of step 3.1: setting a confidence threshold for the face bounding boxes, screening out invalid candidate boxes, and applying non-maximum suppression to filter overlapping bounding boxes;
step 3.3, post-processing the face feature extraction result of step 3.1: comparing it with the feature data stored in a database (e.g., by Euclidean or cosine distance) to obtain the identity information corresponding to the features;
step 3.4, matching the face detection results of step 3.2 with the face recognition results of step 3.3 to obtain the person's identity information.
A face recognition system for simultaneous detection and feature extraction, which performs face recognition, with detection and feature extraction carried out simultaneously, based on the above face recognition method.
A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, face recognition with simultaneous detection and feature extraction is performed based on the above face recognition method.
A computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, face recognition with simultaneous detection and feature extraction is performed based on the above face recognition method.
Compared with the prior art, the invention has the following remarkable advantages:
(1) The invention abandons the two-stage "detect, then extract features" paradigm of existing face detection and recognition methods and instead integrates the face detection task and the feature extraction task into one deep network, so that both tasks are completed in a single forward pass; this improves recognition speed and accuracy while reducing network complexity;
(2) The method removes the fixed input size required by previous face feature extraction stages: multi-scale training is added to the network training process, and the fully convolutional model accepts inputs of any size, reducing the influence of image scaling on the feature extraction stage;
(3) The invention uses a deep network to extract face features while detecting faces, avoiding the face alignment step that previous methods require before feature extraction; this gives the method a degree of generality and improves fault tolerance.
(4) Since data sets labeled for both face detection and feature extraction are currently scarce, the face recognition training set CASIA-WebFace is adapted to produce a data set for the dual tasks of face detection and feature extraction.
Drawings
Fig. 1 is a flow chart of a face intelligent recognition method for simultaneously performing detection and feature extraction.
FIG. 2 is a diagram of a neural network architecture.
Fig. 3 is a schematic diagram of a residual backbone network.
Fig. 4 is a schematic diagram of a labeling format of a multitask face detection recognition data set.
FIG. 5 is a schematic diagram of the model's one-pass detection results, where (a) shows results on the LFW public data set and (b) shows results on the CASIA-FaceV5 public data set.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
With reference to fig. 1, the face recognition method with simultaneous detection and feature extraction uses a single deep neural network to detect and localize multiple faces in a single image and output their feature vectors in one forward pass, and then performs face recognition.
Step 1, data modeling and preparation
The method comprises the following steps of constructing a face detection data set and a face feature extraction data set to form a multitask face detection identification data set, and specifically comprises the following steps:
step 1.1, collecting face images: the collected images include, but are not limited to, face images from various viewing angles and with various expressions;
step 1.2, labeling a face detection data set: marking a face area in the image in a rectangular frame form, and recording the position of the central point and the width and height of the rectangular frame;
step 1.3, labeling a face recognition data set: an identity identifier field is included in the label; the same identity always uses the same identifier, and different identities use different identifiers;
step 1.4, labeling a face detection and recognition data set: the data sets labeled in steps 1.2 and 1.3 are integrated; the labeling format is shown in fig. 4, where one image corresponds to one text file and the annotation of each face in the image is recorded in the text file line by line;
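For illustration, a minimal sketch of reading such an annotation file; the whitespace-separated field order below (identity, center x, center y, width, height) is a hypothetical layout, since fig. 4 itself is not reproduced here:

```python
# Hypothetical label layout (field order assumed, not taken from fig. 4):
#   identity_id  center_x  center_y  width  height     (one line per face)
# e.g. "000045.txt", paired with image "000045.jpg", might contain:
#   1023 0.48 0.52 0.21 0.30
#   877  0.12 0.40 0.18 0.25

def parse_annotation(path):
    """Parse one label file into a list of (identity, cx, cy, w, h) tuples."""
    faces = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 5:
                continue  # skip blank or malformed lines
            faces.append((int(fields[0]), *map(float, fields[1:])))
    return faces
```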
step 1.5, according to a preset ratio $n_1 : n_2 : n_3$, randomly sampling the data set to construct a training set, a validation set and a test set, where $n_1 + n_2 + n_3 = 1$;
Exemplarily, $n_1 : n_2 : n_3 = 8 : 1 : 1$.
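A minimal sketch of the random split under the 8:1:1 example ratio; the function name and seed handling are illustrative:

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Randomly split samples into train/val/test by a preset ratio n1:n2:n3."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    samples = list(samples)
    random.Random(seed).shuffle(samples)  # fixed seed for a reproducible split
    n = len(samples)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```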
Step 2, deep neural network model training
Constructing a main network for extracting features, constructing a face detection branch, constructing a face feature extraction branch, and training a face detection recognition model, which specifically comprises the following steps:
step 2.1, constructing the backbone network, specifically: using the residual module of ResNet as the basic feature extraction module, combined with dynamic convolution modules and transposed convolution modules, to build the feature extraction backbone.
Specifically, taking the residual backbone of fig. 3 as an example, the face detection and recognition backbone comprises: residual network modules whose output feature maps are 1/4, 1/8, 1/16 and 1/32 of the input size, followed by dynamic convolution + transposed convolution modules whose output feature maps return to 1/16, 1/8 and, finally, 1/4 of the input size.
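The patent does not spell out the exact layer configuration, so the following PyTorch sketch only mirrors the described resolution schedule (1/4 → 1/8 → 1/16 → 1/32 down, then back up to 1/4); a plain 3×3 convolution stands in for the dynamic convolution module, whose internals are not given, and all channel widths are assumptions:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style residual block; stride 2 halves the feature map."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.skip = (nn.Identity() if stride == 1 and in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, 1, stride, bias=False))
    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class UpBlock(nn.Module):
    """Stand-in for the 'dynamic convolution + transposed convolution' module:
    a plain 3x3 conv followed by a stride-2 transposed conv (doubles H and W)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, 1, 1, bias=False),  # dynamic-conv stand-in
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(out_ch, out_ch, 4, 2, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.block(x)

class Backbone(nn.Module):
    """Downsample to 1/32 with residual blocks, then upsample back to 1/4."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(                       # -> 1/4
            nn.Conv2d(3, 64, 7, 2, 3, bias=False), nn.BatchNorm2d(64),
            nn.ReLU(inplace=True), nn.MaxPool2d(3, 2, 1))
        self.down = nn.Sequential(
            ResidualBlock(64, 64),                       # 1/4
            ResidualBlock(64, 128, stride=2),            # 1/8
            ResidualBlock(128, 256, stride=2),           # 1/16
            ResidualBlock(256, 512, stride=2))           # 1/32
        self.up = nn.Sequential(
            UpBlock(512, 256),                           # 1/16
            UpBlock(256, 128),                           # 1/8
            UpBlock(128, 64))                            # 1/4
    def forward(self, x):
        return self.up(self.down(self.stem(x)))          # B x 64 x H/4 x W/4
```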
step 2.2, constructing the detection branch: face detection is treated as a center-based bounding box regression task on a high-resolution image. Three parallel regression heads are attached behind the backbone network to estimate the heat map, the face center offset and the bounding box size, respectively. Each regression head is realized by applying a 3 × 3 convolution to the output feature map of the backbone network, followed by a 1 × 1 convolution layer that produces the final target.
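A sketch of the three parallel regression heads as described (a 3 × 3 convolution on the backbone output followed by a 1 × 1 convolution); the 64 input channels and 256 intermediate channels are assumptions:

```python
import torch.nn as nn

def make_head(in_ch, out_ch, mid_ch=256):
    """One regression head: 3x3 conv on the backbone feature map,
    then a 1x1 conv producing the final target."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1))

# Three parallel heads attached behind the backbone:
heatmap_head = make_head(64, 1)   # 1 x H x W heat map
offset_head  = make_head(64, 2)   # center offset, O in R^{W x H x 2}
size_head    = make_head(64, 2)   # box size,      S in R^{W x H x 2}
```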
(1) Heat map head

Here the positions of object centers are estimated with a heat-map representation of size 1 × H × W, in which the response decays exponentially with the distance from the object center. For each object in the image, its bounding box $b^i = (x_1^i, y_1^i, x_2^i, y_2^i)$ gives the center point $(c_x^i, c_y^i)$, with $c_x^i = \frac{x_1^i + x_2^i}{2}$ and $c_y^i = \frac{y_1^i + y_2^i}{2}$; dividing by the stride gives its position on the feature map, $(\tilde{c}_x^i, \tilde{c}_y^i) = \left(\left\lfloor \frac{c_x^i}{4} \right\rfloor, \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$. The feature-map response at image location $(x, y)$ may be expressed as:

$$M_{xy} = \sum_{i=1}^{N} \exp\left(-\frac{(x - \tilde{c}_x^i)^2 + (y - \tilde{c}_y^i)^2}{2\sigma_c^2}\right) \tag{1}$$

where $N$ represents the number of face boxes in the image and $\sigma_c$ represents the standard deviation. The loss adopts the focal loss, as shown in equation (2):

$$L_{heat} = -\frac{1}{N} \sum_{xy} \begin{cases} (1 - \hat{M}_{xy})^{\alpha} \log \hat{M}_{xy}, & M_{xy} = 1 \\ (1 - M_{xy})^{\beta} \hat{M}_{xy}^{\alpha} \log(1 - \hat{M}_{xy}), & \text{otherwise} \end{cases} \tag{2}$$

where $\hat{M}$ is the heat map predicted by the model, and $\alpha$ and $\beta$ are the preset focal-loss hyper-parameters.
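A sketch of equations (1) and (2) for a single image; α = 2 and β = 4 are assumed CenterNet-style defaults, since the patent leaves the hyper-parameters preset but unspecified:

```python
import torch

def heatmap_target(centers, H, W, sigma):
    """Render the ground-truth heat map of eq. (1): a Gaussian bump of std
    sigma at each face center (feature-map coordinates), summed over faces
    and clamped to 1 so the positives of eq. (2) are well-defined."""
    ys = torch.arange(H).view(H, 1).float()
    xs = torch.arange(W).view(1, W).float()
    M = torch.zeros(H, W)
    for cx, cy in centers:
        M += torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    return M.clamp(max=1.0)

def focal_loss(M_hat, M, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced focal loss of eq. (2); alpha/beta values assumed."""
    pos = M.eq(1.0).float()
    neg = 1.0 - pos
    n = pos.sum().clamp(min=1.0)
    loss_pos = pos * (1 - M_hat) ** alpha * torch.log(M_hat + eps)
    loss_neg = neg * (1 - M) ** beta * M_hat ** alpha * torch.log(1 - M_hat + eps)
    return -(loss_pos + loss_neg).sum() / n
```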
(2) Center point offset head

The center point offset head is responsible for localizing objects more accurately. Suppose the output face center displacement is $O \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, its offset can be computed as $o^i = \left(\frac{c_x^i}{4} - \left\lfloor \frac{c_x^i}{4} \right\rfloor,\ \frac{c_y^i}{4} - \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$. The loss adopts the $\ell_1$ norm, as shown in equation (3):

$$L_{center} = \sum_{i=1}^{N} \left| o^i - \hat{o}^i \right| \tag{3}$$
(3) Bounding box head

The bounding box head is responsible for estimating the height and width of the face bounding box at each anchor position. It has no direct relation to the feature extraction branch, but its localization accuracy affects the evaluation of face detection performance. Suppose the output box size is $S \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, its size can be computed as $s^i = (x_2^i - x_1^i,\ y_2^i - y_1^i)$. The loss adopts the $\ell_1$ norm, as shown in equation (4):

$$L_{box} = \sum_{i=1}^{N} \left| s^i - \hat{s}^i \right| \tag{4}$$
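A common sketch of the ℓ1 losses (3) and (4) for a single image: both heads output a 2-channel map that is supervised only at the ground-truth centers; names are illustrative:

```python
import torch
import torch.nn.functional as F

def l1_head_loss(pred_map, targets, centers):
    """l1 loss of eqs. (3)/(4) for one image: pred_map is the 2 x H x W output
    of the offset or size head, sampled only at the ground-truth centers."""
    loss = pred_map.new_zeros(())
    for (cx, cy), t in zip(centers, targets):
        pred = pred_map[:, cy, cx]              # prediction at the object center
        loss = loss + F.l1_loss(pred, t, reduction="sum")
    return loss

# Offset targets per eq. (3): o_i = (c/4 - floor(c/4));
# size targets per eq. (4):   s_i = (x2 - x1, y2 - y1).
```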
step 2.3, constructing the feature extraction branch, whose goal is to generate features that can distinguish different faces. Ideally, the distance between features of different faces should be greater than the distance between features of the same face. To achieve this, the invention applies a convolutional layer with multiple kernels on top of the backbone features to extract an identity embedding for each location.
The feature extraction branch learns the feature extraction task through a classification task. All objects in the training set with the same identity are treated as one class. For each bounding box $b^i$ in the image, its center $(\tilde{c}_x^i, \tilde{c}_y^i)$ on the heat map is obtained, and a discriminative feature vector $E_{\tilde{c}_x^i, \tilde{c}_y^i}$ is extracted at that position. The loss function of the feature extraction branch is:

$$L_{id} = -\sum_{i=1}^{N} \sum_{k=1}^{K} L^i(k) \log\big(p(k)\big) \tag{5}$$

where $K$ is the number of identity classes, $p(k)$ is the predicted class distribution obtained from the feature vector, and $L^i(k)$ is the one-hot ground-truth identity label.
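A sketch of the feature extraction branch under eq. (5): a convolutional layer produces a per-location embedding, and a K-way identity classifier supplies the training signal; the embedding width of 128 and the identity count are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IdentityHead(nn.Module):
    """Identity-embedding branch: a conv layer over the backbone features gives
    a feature vector per location; a classifier over K identities drives eq. (5)."""
    def __init__(self, in_ch=64, emb_dim=128, num_ids=10000):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, emb_dim, 3, padding=1)
        self.classifier = nn.Linear(emb_dim, num_ids)

    def forward(self, feat, centers, id_labels):
        emb_map = self.embed(feat)                   # B x D x H x W
        loss = feat.new_zeros(())                    # summed as in eq. (5)
        for (cx, cy), label in zip(centers, id_labels):
            vec = emb_map[0, :, cy, cx]              # feature at the object center
            logits = self.classifier(vec)
            loss = loss + F.cross_entropy(logits.unsqueeze(0),
                                          torch.tensor([label]))
        return loss
```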
step 2.4, training of face recognition model
Step 2.4.1, preprocessing the training data
Crop-and-mix augmentation strategy:

The original image is randomly cropped, or two images are blended. The cropping technique (commonly known as Cutout) randomly cuts a region out of the image and fills it with zeros, leaving the data label unchanged. The mixing technique (Mixup) blends two random images from the training set in a given proportion and weights their labels by the same mixing ratio. The crop-and-mix technique (CutMix) cuts a region out of the image but, instead of zero-filling it, fills it with the pixel values of the corresponding region of another training image. The processed image is used as the network input.
Multi-scale image training strategy: images are enlarged and reduced to obtain multi-scale inputs, ensuring the multi-scale invariance of the network model.
Finally, random left-right flipping and random rotation are applied to increase sample diversity.
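A sketch of these augmentations under stated assumptions; the scale set and region bounds are illustrative, and the corresponding transformation of box and center labels is omitted:

```python
import random
import torch

def cutmix(img_a, img_b):
    """Crop-and-mix: cut a random region out of img_a and fill it with the
    corresponding pixels of img_b (rather than zeros). Returns the mixed
    image and the area fraction taken from img_b, used to weight the labels."""
    _, H, W = img_a.shape
    w, h = random.randint(W // 8, W // 2), random.randint(H // 8, H // 2)
    x, y = random.randint(0, W - w), random.randint(0, H - h)
    out = img_a.clone()
    out[:, y:y + h, x:x + w] = img_b[:, y:y + h, x:x + w]
    return out, (w * h) / (W * H)

def random_scale(img, scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Multi-scale training: resize the input by a randomly chosen factor."""
    s = random.choice(scales)
    return torch.nn.functional.interpolate(
        img.unsqueeze(0), scale_factor=s, mode="bilinear",
        align_corners=False).squeeze(0)

def random_flip(img, p=0.5):
    """Random left-right flip (box/center labels must be flipped to match)."""
    return torch.flip(img, dims=[-1]) if random.random() < p else img
```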
step 2.4.2, deep neural network model training is carried out by using the processed data set
The training process mainly consists of setting the loss functions: the detection branch heat map head loss, the detection branch bounding box and center offset losses, and the feature extraction branch loss are combined, and the detection and feature extraction tasks are balanced by an uncertainty loss. The detection branch loss and the overall network loss are expressed as (6) and (7), respectively:

$$L_{det} = L_{heat} + L_{center} + L_{box} \tag{6}$$

$$L = \frac{1}{2}\left(\frac{1}{e^{\omega_1}} L_{det} + \frac{1}{e^{\omega_2}} L_{id} + \omega_1 + \omega_2\right) \tag{7}$$

where $\omega_1$ and $\omega_2$ are learnable parameters that balance the detection and feature extraction tasks.
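A sketch of the overall loss in the reconstructed form of equation (7), with ω1 and ω2 as learnable parameters; the zero initialization is an assumption:

```python
import torch
import torch.nn as nn

class UncertaintyLoss(nn.Module):
    """Overall loss of eq. (7): w1, w2 are learnable parameters that balance
    the detection and feature extraction tasks via uncertainty weighting."""
    def __init__(self):
        super().__init__()
        self.w1 = nn.Parameter(torch.zeros(()))  # balances the detection task
        self.w2 = nn.Parameter(torch.zeros(()))  # balances the identity task

    def forward(self, l_det, l_id):
        return 0.5 * (torch.exp(-self.w1) * l_det +
                      torch.exp(-self.w2) * l_id +
                      self.w1 + self.w2)
```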
The data set is divided into a training set, a validation set and a test set. With the overall loss of equation (7) as the objective function, the weight of each task is set, an optimization method such as Adam or SGD is selected, and the number of training epochs, the initial learning rate and the decay rate are set. Training ends when the training error reaches the expected value, yielding the parameters of the convolutional neural network model.
Step 3, model inference application
Inputting the image to be recognized into the trained face detection and recognition model, completing face detection and feature extraction, and thereby determining the person's identity information, specifically as follows:
step 3.1, inputting the image to be recognized into the trained face recognition model for face detection and feature extraction;
step 3.2, post-processing the face detection result of step 3.1: setting a confidence threshold for the face bounding boxes, screening out invalid candidate boxes, and applying non-maximum suppression to filter overlapping bounding boxes; taking LFW and CASIA-FaceV5 as examples, the detection results are shown in FIG. 5.
step 3.3, post-processing the face feature extraction result of step 3.1: comparing it with the feature data stored in a database to obtain the identity information corresponding to the features;
step 3.4, matching the face detection results of step 3.2 with the face recognition results of step 3.3 to obtain the person's identity information.
The invention also provides an intelligent face recognition system that performs detection and feature extraction simultaneously.
A computer device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, intelligent face recognition with simultaneous detection and feature extraction is performed based on the above face recognition method.
A computer-readable storage medium stores a computer program which, when executed by a processor, performs face recognition with simultaneous detection and feature extraction based on the above intelligent face recognition method.
In summary, unlike the previous "detect, then extract features" paradigm, the invention directly uses a single deep neural network to detect multiple faces in an image and extract their features in one forward pass. The working principle is as follows: a deep neural network is constructed as a backbone network followed by a detection branch and a feature extraction branch; the strong feature extraction capability of the backbone is used to extract image features; the detection branch behind the backbone detects the faces in the image; the feature extraction branch behind the backbone synchronously extracts features of the detected faces; and the features are compared against the results stored in a database to compute the face attributes. The invention has the following characteristics: 1. face detection and feature extraction are integrated into one deep neural network, improving the speed of face detection and recognition; 2. the feature extraction stage no longer depends on the output of the face detection stage, reducing its dependence on the detection stage's performance; 3. the feature extraction stage does not require a fixed face image size, avoiding the loss of feature accuracy caused by image scaling; 4. the feature extraction stage no longer requires face alignment, reducing the number of face recognition steps.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A face recognition method for simultaneous detection and feature extraction is characterized by comprising the following steps:
step 1, data modeling and preparation
Constructing a face detection data set and a face feature extraction data set to form a multitask face detection identification data set;
step 2, deep neural network model training
Constructing a backbone network, a face detection branch and a face feature extraction branch, and training a face detection and recognition model based on a deep neural network, wherein the backbone network extracts deep features from the image and provides regression and classification information to the subsequent detection and feature extraction branches, the face detection branch estimates a heat map, the target center offset and the bounding box size, and the face feature extraction branch extracts the features of each face to generate a feature vector;
step 3, model inference application
Inputting the image to be recognized into the trained face detection and recognition model, completing face detection and feature extraction, and thereby determining the person's identity information.
2. The face recognition method for simultaneous detection and feature extraction according to claim 1, wherein in step 1, data modeling and preparation are specifically performed by:
step 1.1, constructing a face detection data set
Marking a face area in the image in a rectangular frame form, and recording the position of the central point and the width and height of the rectangular frame;
step 1.2, constructing a human face feature extraction data set
An identity identifier field is included in the label; the same identity always uses the same identifier, and different identities use different identifiers;
step 1.3, constructing a multitask face detection recognition data set
Integrating the constructed face detection data set and the face feature extraction data set, so that the label content comprises an identity identifier field, the coordinates of the rectangular box center, and the width and height of the rectangular box; the label file is given the same name as the original image.
3. The method for face recognition with simultaneous detection and feature extraction according to claim 1, wherein in step 2, deep neural network model training is specifically performed by:
(1) backbone network
The backbone network adopts ResNet with transposed convolutions, DLA, Hourglass, MobileNetV2 or a high-resolution network;
(2) face detection branch
The detection branch adopts an anchor-free design, realized as three parallel heads behind the backbone network: a heat map head, a center point offset head and a bounding box head, wherein the heat map head comprises an input layer, a dynamic convolution layer, a first fully connected layer, a second fully connected layer and an output layer;
(a) Heat map head

The detection branch's heat map head estimates the positions of object centers from a heat-map representation: for each object in the image, the bounding box $b^i = (x_1^i, y_1^i, x_2^i, y_2^i)$ gives the center point $(c_x^i, c_y^i)$, where $c_x^i = \frac{x_1^i + x_2^i}{2}$ and $c_y^i = \frac{y_1^i + y_2^i}{2}$; dividing by the stride gives its position on the feature map, $(\tilde{c}_x^i, \tilde{c}_y^i) = \left(\left\lfloor \frac{c_x^i}{4} \right\rfloor, \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$; the feature-map response $M_{xy}$ at image location $(x, y)$ is expressed as:

$$M_{xy} = \sum_{i=1}^{N} \exp\left(-\frac{(x - \tilde{c}_x^i)^2 + (y - \tilde{c}_y^i)^2}{2\sigma_c^2}\right) \tag{1}$$

where $N$ represents the number of face boxes in the image and $\sigma_c$ represents the standard deviation; the heat map head loss function $L_{heat}$ adopts the focal loss, as shown in equation (2):

$$L_{heat} = -\frac{1}{N} \sum_{xy} \begin{cases} (1 - \hat{M}_{xy})^{\alpha} \log \hat{M}_{xy}, & M_{xy} = 1 \\ (1 - M_{xy})^{\beta} \hat{M}_{xy}^{\alpha} \log(1 - \hat{M}_{xy}), & \text{otherwise} \end{cases} \tag{2}$$

where $\hat{M}$ represents the heat map estimated by the model, and $\alpha$ and $\beta$ represent the preset focal-loss hyper-parameters;
(b) Center point offset head

The center point offset head aims to localize the face position more accurately; let the output center point offset be $O \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, the offset of its center point is $o^i = \left(\frac{c_x^i}{4} - \left\lfloor \frac{c_x^i}{4} \right\rfloor,\ \frac{c_y^i}{4} - \left\lfloor \frac{c_y^i}{4} \right\rfloor\right)$; the center point offset head loss function $L_{center}$ adopts the $\ell_1$ norm, as shown in equation (3):

$$L_{center} = \sum_{i=1}^{N} \left| o^i - \hat{o}^i \right| \tag{3}$$

where $\hat{o}^i$ represents the center point offset estimated by the model;
(c) Bounding box head

The bounding box head estimates the height and width of the face bounding box at each anchor position; it is not directly related to the feature extraction branch, but its localization accuracy affects the evaluation of face detection performance; the output box size is $S \in \mathbb{R}^{W \times H \times 2}$; for each bounding box $b^i$, its size is $s^i = (x_2^i - x_1^i,\ y_2^i - y_1^i)$; the bounding box head loss function $L_{box}$ adopts the $\ell_1$ norm, as shown in equation (4):

$$L_{box} = \sum_{i=1}^{N} \left| s^i - \hat{s}^i \right| \tag{4}$$

where $\hat{s}^i$ represents the bounding box size estimated by the model;
(3) Face feature extraction branch

The face feature extraction branch learns the feature extraction task through a classification task, all objects in the training set with the same identity label being treated as one class; for each bounding box $b^i$ in the image, its center $(\tilde{c}_x^i, \tilde{c}_y^i)$ on the heat map is obtained, and a discriminative feature vector $E_{\tilde{c}_x^i, \tilde{c}_y^i}$ is extracted at that position; the loss function $L_{id}$ of the face feature extraction branch is:

$$L_{id} = -\sum_{i=1}^{N} \sum_{k=1}^{K} L^i(k) \log\big(p(k)\big) \tag{5}$$

where $K$ is the number of identity classes, $p(k)$ is the predicted class distribution obtained from the feature vector, and $L^i(k)$ is the one-hot ground-truth identity label;
Combining the heat map head loss, the center point offset head loss and the bounding box head loss of the face detection branch with the face feature extraction branch loss, the detection and feature extraction tasks are balanced by an uncertainty loss; the detection branch loss and the overall network loss are expressed as (6) and (7), respectively:

$$L_{det} = L_{heat} + L_{center} + L_{box} \tag{6}$$

$$L = \frac{1}{2}\left(\frac{1}{e^{\omega_1}} L_{det} + \frac{1}{e^{\omega_2}} L_{id} + \omega_1 + \omega_2\right) \tag{7}$$

where $\omega_1$ and $\omega_2$ are learnable parameters that balance the detection and feature extraction tasks.
4. The face recognition method for simultaneous detection and feature extraction according to claim 1, wherein in step 2 the deep neural network model training employs a crop-and-mix strategy, a multi-scale image strategy, random left-right flipping and random rotation.
5. The face recognition method for simultaneous detection and feature extraction according to claim 1, wherein in step 3 the model inference application proceeds as follows:
step 3.1, inputting the image to be recognized into the trained face recognition model for face detection and feature extraction;
step 3.2, post-processing the face detection result of step 3.1: setting a confidence threshold for the face bounding boxes, screening out invalid candidate boxes, and applying non-maximum suppression to filter overlapping bounding boxes;
step 3.3, post-processing the face feature extraction result of step 3.1: comparing it with the feature data stored in a database to obtain the identity information corresponding to the features;
step 3.4, matching the face detection results of step 3.2 with the face recognition results of step 3.3 to obtain the person's identity information.
6. A face recognition system for simultaneous detection and feature extraction, characterized in that it performs face recognition, with detection and feature extraction carried out simultaneously, based on the face recognition method of any one of claims 1-5.
7. A computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, face recognition with simultaneous detection and feature extraction is performed based on the face recognition method of any one of claims 1-5.
8. A computer-readable storage medium on which a computer program is stored, characterized in that when the computer program is executed by a processor, face recognition with simultaneous detection and feature extraction is performed based on the face recognition method of any one of claims 1-5.
CN202110603538.1A (filed 2021-05-31, priority 2021-05-31): Face recognition method for simultaneous detection and feature extraction. Status: Withdrawn. Published as CN113378675A (en).

Priority Applications (1)

CN202110603538.1A (priority date 2021-05-31, filing date 2021-05-31): Face recognition method for simultaneous detection and feature extraction

Publications (1)

CN113378675A, published 2021-09-10

Family ID: 77575092

Family Applications (1): CN202110603538.1A (filed 2021-05-31), published as CN113378675A (en)

Country Status (1): CN 113378675 A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168684A (en) * 2021-12-10 2022-03-11 南威软件股份有限公司 Face modeling warehousing service implementation method and device based on asynchronous mechanism
CN114168684B (en) * 2021-12-10 2023-08-08 清华大学 Face modeling warehouse-in service implementation method and device based on asynchronous mechanism
CN114462355A (en) * 2022-01-17 2022-05-10 网易有道信息技术(北京)有限公司 Question acquisition method and device, electronic equipment and storage medium
CN114387553A (en) * 2022-01-18 2022-04-22 桂林电子科技大学 Video face recognition method based on frame structure perception aggregation
CN114387553B (en) * 2022-01-18 2024-03-22 桂林电子科技大学 Video face recognition method based on frame structure perception aggregation
CN117173461A (en) * 2023-08-29 2023-12-05 湖北盛林生物工程有限公司 Multi-visual task filling container defect detection method, system and medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 20210910)