CN113239885A - Face detection and recognition method and system - Google Patents

Face detection and recognition method and system

Info

Publication number
CN113239885A
Authority
CN
China
Prior art keywords
face
central point
recognition
diagram
network
Prior art date
Legal status
Pending
Application number
CN202110626119.XA
Other languages
Chinese (zh)
Inventor
徐小丹
刘小扬
何学智
Current Assignee
Newland Digital Technology Co ltd
Original Assignee
Newland Digital Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Newland Digital Technology Co ltd filed Critical Newland Digital Technology Co ltd
Priority to CN202110626119.XA
Publication of CN113239885A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification


Abstract

The invention discloses a face detection and recognition method comprising the following steps. S1: preprocess the images marked with face frames to generate training samples. S2: construct a face detection and recognition network, wherein the face detection and recognition network adopts a deep learning network and fuses the network's high-level and low-level features. S3: input the training samples into the constructed face detection and recognition network for training until the training loss value is smaller than a preset threshold value, obtaining a deep learning network capable of outputting face detection and face recognition results. The invention designs a face detection and recognition network that treats face detection as a face central point problem and jointly learns face central point detection and face feature vector extraction: when a face frame is obtained, the face feature vector corresponding to that frame is obtained as well, and comparing face feature vectors then yields the face recognition result, so that the network outputs the face detection and face recognition results together.

Description

Face detection and recognition method and system
Technical Field
The invention relates to the field of artificial intelligence, in particular to a face detection and recognition method and system.
Background
With the development and progress of science and technology, face recognition has found very wide application in fighting crime, preventing fraud, safeguarding public safety and improving the customer experience of various industries, for example identifying criminal suspects, finding lost children, smart stores and face payment. Face recognition is the process of recognizing or verifying the identity of a person using facial information, and generally comprises three steps. Step 1: face detection, an indispensable step that detects and locates the faces in an image or video. Step 2: align the detected faces and convert each face into a feature vector using a face feature extraction technique. Step 3: calculate the face similarity of the obtained feature vectors to judge whether two faces belong to the same person.
The existing face recognition method is time-consuming because the three steps are executed serially and the two functions are implemented by two separately designed networks, a face detection network and a face recognition network. In the face recognition process, the time spent extracting feature vectors is proportional to the number of detected face frames: the more faces there are, the more time feature extraction takes, and the more time-consuming this kind of face recognition method becomes.
Disclosure of Invention
In order to overcome the above defects in the prior art, the invention designs a face recognition method that carries out the two tasks of face detection and face recognition simultaneously, thereby improving the efficiency of face detection and recognition and saving computing resources.
The technical scheme of the invention is as follows:
a face detection and recognition method, characterized by comprising the following steps:
s1: preprocessing the image marked with the face frame to generate a training sample;
s2: constructing a face detection and recognition network, wherein the face detection and recognition network adopts a deep learning network and fuses network high-level features and low-level features;
s3: inputting training samples into the constructed face detection and recognition network for training until the training loss value is smaller than a preset threshold value, and obtaining a deep learning network capable of outputting face detection and face recognition results;
the deep learning network for face detection comprises the following steps:
s31: generating a face central point thermodynamic diagram, a face central point offset diagram and a face width and height diagram;
s32: executing a non-maximum value suppression algorithm on the face central point thermodynamic diagram, extracting peak value points, respectively calculating thermodynamic response values, selecting points with the thermodynamic response values larger than a threshold value as candidate face central points, extracting face central point offset values at corresponding positions of the face central point offset map, adding to obtain face central point positions, and finally extracting face width and height values at corresponding positions of the face width and height map to generate a face frame;
the deep learning network for face recognition comprises the following steps:
s33: when step S31 is executed, image feature vectors of the entire image are extracted at the same time;
s34: selecting a feature vector corresponding to the position of the face frame from image feature vectors as a face feature vector, and matching the face feature vector with each face feature vector stored in a database to obtain a face recognition result;
the training loss value is formed by superposing face central point thermodynamic diagram loss, face central point offset diagram loss, face width and height diagram loss and face recognition loss.
Preferably, let the ith face frame on the image be represented by its top-left and bottom-right points $(x_1^{(i)}, y_1^{(i)})$ and $(x_2^{(i)}, y_2^{(i)})$. The face central point of this face frame is

$$c^{(i)} = \left(\frac{x_1^{(i)} + x_2^{(i)}}{2},\ \frac{y_1^{(i)} + y_2^{(i)}}{2}\right)$$

Let its corresponding position on the face central point thermodynamic diagram be

$$\tilde{c}^{(i)} = \left\lfloor \frac{c^{(i)}}{n} \right\rfloor$$

where n is the down-sampling multiple of the network. The response value of the generated face central point thermodynamic diagram at position $(x, y)$ is then represented as:

$$Y_{xy} = \max_{i=1,\dots,N} \exp\left(-\frac{(x - \tilde{c}^{(i)}_x)^2 + (y - \tilde{c}^{(i)}_y)^2}{2\sigma_c^2}\right)$$

where N represents the number of face frames on the image and $\sigma_c$ is the standard deviation of the Gaussian function;
the loss of the face central point thermodynamic diagram is represented as:

$$L_{hm} = -\frac{1}{N}\sum_{xy}\begin{cases}\left(1 - \hat{Y}_{xy}\right)^{\alpha}\log \hat{Y}_{xy}, & Y_{xy} = 1\\ \left(1 - Y_{xy}\right)^{\beta}\,\hat{Y}_{xy}^{\alpha}\log\left(1 - \hat{Y}_{xy}\right), & \text{otherwise}\end{cases}$$

wherein $\alpha$ and $\beta$ are modulation coefficients and $\hat{Y}_{xy}$ represents the face central point thermal value obtained by network prediction.
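Under the definitions above, generating the Gaussian target thermodynamic diagram can be sketched roughly as follows (a NumPy sketch; the map size, σ value and function name are illustrative assumptions, not the patented implementation):

```python
import numpy as np

def gaussian_heatmap(centers, shape, sigma=2.0):
    """Splat each face central point onto a target heatmap with a Gaussian.

    centers: list of (cx, cy) face central points, already down-sampled
             to heatmap coordinates.
    shape:   (H, W) of the heatmap.
    Overlapping faces keep the element-wise maximum response, so the
    target stays in [0, 1] even when Gaussians overlap.
    """
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    Y = np.zeros(shape)
    for cx, cy in centers:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        Y = np.maximum(Y, g)  # max, not sum
    return Y
```

The response is exactly 1 at each central point and decays with the squared distance, matching the formula above.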
Preferably, let the ith face frame on the image be represented by its top-left and bottom-right points $(x_1^{(i)}, y_1^{(i)})$ and $(x_2^{(i)}, y_2^{(i)})$, and let its width and height be expressed as:

$$s^{(i)} = \left(x_2^{(i)} - x_1^{(i)},\ y_2^{(i)} - y_1^{(i)}\right)$$

The face width and height loss is expressed as:

$$L_{wh} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{s}^{(i)} - s^{(i)}\right|$$

wherein $\hat{s}^{(i)}$ represents the face width and height obtained by network prediction;
let the face central point of the ith face frame on the image be $c^{(i)}$, let its corresponding position on the face central point thermodynamic diagram be $\tilde{c}^{(i)} = \lfloor c^{(i)}/n \rfloor$, and let the face central point offset be expressed as

$$o^{(i)} = \frac{c^{(i)}}{n} - \tilde{c}^{(i)}$$

Then the face central point offset loss is expressed as:

$$L_{off} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{o}^{(i)} - o^{(i)}\right|$$

wherein $\hat{o}^{(i)}$ represents the offset obtained by network prediction and n is the down-sampling multiple of the deep neural network.
Preferably, let the target central point of the ith face frame on the face central point thermodynamic diagram be $\tilde{c}^{(i)}$; extract the corresponding feature vector $E_{\tilde{c}^{(i)}}$ on the image feature vector diagram and map it to a class distribution vector $p_i(k)$, with $L_i(k)$ the corresponding label class label. The face recognition loss is then expressed as:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L_i(k)\log p_i(k)$$

wherein N is the number of face frames and K is the number of categories; $p_i(k)$ is the probability that the ith face frame belongs to the kth id, and $L_i(k)$ is the label of the ith face frame.
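The face recognition loss described here is an ordinary softmax cross-entropy over the K identity classes. A minimal NumPy sketch, with illustrative names and shapes (not the patented implementation), might look like:

```python
import numpy as np

def face_id_loss(logits, labels):
    """Cross-entropy face recognition loss over K identity classes.

    logits: (N, K) raw class scores, one row per detected face central point.
    labels: (N,) integer id label of each face frame.
    """
    # Softmax turns each row into a class distribution vector p_i(k).
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stabilization
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    N = logits.shape[0]
    # The one-hot label L_i(k) picks out -log p_i(label_i) for each face.
    return -np.log(p[np.arange(N), labels]).sum()
```

With all-zero logits and K = 4, each class has probability 0.25, so two faces contribute a loss of 2·log 4.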
Preferably, the face detection and recognition network uses resnet34 or GoogLeNet as the backbone network.
A face detection and recognition system comprising:
the image preprocessing module is used for preprocessing the image marked with the face frame to generate a training sample;
the human face feature extraction module is used for generating a human face central point thermodynamic diagram, a human face central point offset diagram and a human face width and height diagram and extracting an image feature vector of the whole image;
the training loss calculation module is used for calculating the thermodynamic diagram loss of the face center point, the offset diagram loss of the face center point, the width and height diagram loss of the face and the face recognition loss, performing superposition calculation, finishing training when the training loss value is smaller than a preset threshold value, and obtaining a deep learning network capable of outputting the face detection and face recognition results;
the human face detection module is used for executing a non-maximum value suppression algorithm on a human face central point thermodynamic diagram, calculating thermal response values of the human face central point thermodynamic diagram after extracting peak values, selecting points with the thermal response values larger than a threshold value as candidate human face central points, extracting human face central point offset values at corresponding positions of the human face central point offset quantity diagram, adding the human face central point offset values to obtain human face central point positions, and finally extracting human face width and height values at corresponding positions of the human face width and height diagram to generate a human face frame;
and the face recognition module is used for selecting the feature vector corresponding to the face frame position from the image feature vectors as a face feature vector, and matching the face feature vector with each face feature vector stored in the database to obtain a face recognition result.
By adopting the technical scheme, compared with the prior art, the invention has the following beneficial effects:
according to the invention, a face detection and recognition network is designed, the face detection is regarded as the face central point problem, the face central point detection and the face feature vector extraction are combined for learning, a face frame is obtained, a face feature vector corresponding to the face frame can be obtained, then the face feature vector comparison is carried out to obtain a face recognition result, and therefore the network outputs the face detection and face recognition results. The face detection and the face recognition share one network, so that the inference time is reduced, the forward time is irrelevant to the number of faces in the picture to be detected, the face recognition efficiency is improved, and moreover, the multi-task learning can supervise the learning mutually, and the network performance is favorably improved; on the other hand, the problem that the face detection is regarded as the face central point is solved, and the technical difficulty that ambiguity is easily caused when a plurality of faces are in charge of identity information of the same face in the prior art is overcome.
Drawings
FIG. 1 is a flow chart of a face detection and recognition method of the present invention;
FIG. 2 is a flowchart of the overall operation of the face detection and recognition method of the present invention;
fig. 3 is a diagram of a face detection and recognition network according to the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Referring to fig. 1, the face detection and recognition method of the present invention includes the following steps:
s1: preprocessing the image marked with the face frame to generate a training sample;
s2: constructing a face detection and recognition network, wherein the face detection and recognition network adopts a deep learning network and fuses network high-level features and low-level features;
referring to fig. 3, in the embodiment, a resnet34 network is used as a backbone network in the face detection and recognition network, and the network fuses high-level and low-level features of the network through multiple hopping connections, so that the features are more robust;
s3: inputting training samples into the constructed face detection and recognition network for training until the training loss value is smaller than a preset threshold value, and obtaining a deep learning network capable of outputting face detection and face recognition results;
referring to fig. 1 and 2, the performing of the face detection by the deep learning network includes the following steps:
s31: generating a face central point thermodynamic diagram, a face central point offset diagram and a face width and height diagram;
in this embodiment the stride is set to 4; a C × H × W image is input, where C represents the number of channels and H and W represent the image height and width respectively; after the image passes through the resnet34 network, a feature map of shape C × H/4 × W/4 is finally obtained;
referring to fig. 2 and fig. 3, the part of the network implementing the face detection function contains three branches. The first branch predicts the face central point thermodynamic diagram and is composed of a 256 × 3 × 3 convolution and a 1 × 1 × 1 convolution, finally producing a 1 × H/4 × W/4 thermodynamic diagram $\hat{Y}$; the second branch predicts the face central point offset and is composed of a 256 × 3 × 3 convolution and a 2 × 1 × 1 convolution, finally producing a 2 × H/4 × W/4 offset prediction $\hat{O}$; the third branch predicts the face width and height and is composed of a 256 × 3 × 3 convolution and a 2 × 1 × 1 convolution, finally producing a 2 × H/4 × W/4 face width and height prediction $\hat{S}$.
S32: executing a non-maximum value suppression algorithm on the face central point thermodynamic diagram, extracting peak value points, respectively calculating thermodynamic response values, selecting points with the thermodynamic response values larger than a threshold value as candidate face central points, extracting face central point offset values at corresponding positions of the face central point offset map, adding to obtain face central point positions, and finally extracting face width and height values at corresponding positions of the face width and height map to generate a face frame;
referring to fig. 2 and 3, the performing of the face recognition by the deep learning network includes the following steps:
s33: when step S31 is executed, image feature vectors of the entire image are extracted at the same time;
s34: selecting a feature vector corresponding to the position of the face frame from image feature vectors as a face feature vector, and matching the face feature vector with each face feature vector stored in a database to obtain a face recognition result;
in this embodiment, the face recognition part of the network is composed of a 256 × 3 × 3 convolution, a 128 × 1 × 1 convolution (used to obtain the face feature vectors) and a K × 1 × 1 convolution layer, where K represents the number of face IDs, i.e. the number of classification categories; the corresponding face feature vector is used as the identifier of the face ID.
Referring to fig. 2, in an embodiment of the present invention, a face detection and recognition method includes the following steps:
making a feature vector database:
taking the face image of each id in the database as network input, and extracting corresponding characteristic vectors
Figure BDA0003101194710000053
As an identifier of each id, a database face feature vector set E ═ E is obtainedj|j=1,…,K}。
Face detection and feature extraction:
using 3 × 960 × 720 images as input, a face center thermodynamic diagram of 1 × 0240 × 1180, a face center displacement diagram of 2 × 240 × 180, a width and height diagram of 2 × 240 × 180 faces, and an image feature vector of 128 × 240 × 180 are obtained. Executing a non-maximum value suppression algorithm on the 1 x 240 x 180 face central point thermodynamic diagram, extracting a peak face central point, and obtaining a thermal response value larger than T1N candidate face center points
Figure BDA0003101194710000054
Then, the offset of the center point of the corresponding face is taken
Figure BDA0003101194710000055
And width and height of human face
Figure BDA0003101194710000056
Obtaining a face frame after calculation
Figure BDA0003101194710000057
Taking the image feature vector set corresponding to the face frame from the image feature vectors of 128 × 240 × 180
Figure BDA0003101194710000058
I.e. the face feature vector.
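Taking the feature vector at each face frame central point from the image feature vector map can be sketched as follows (a NumPy sketch; the shapes and the `gather_face_features` name are illustrative assumptions):

```python
import numpy as np

def gather_face_features(feature_map, centers):
    """Pick the feature vector at each candidate face central point.

    feature_map: (C, H, W) image feature vector map, e.g. C = 128.
    centers:     list of (x, y) integer positions on the H/4 x W/4 grid.
    Returns a (len(centers), C) array of face feature vectors.
    """
    return np.stack([feature_map[:, y, x] for x, y in centers])
```

Because the feature vectors for every position were computed in the single forward pass, gathering them costs almost nothing regardless of how many faces were detected.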
Face matching:
collecting face characteristic vector
Figure BDA0003101194710000061
And the face feature vector set E ═ { E in the databasejComparing if 1 | j ═ 1.. K |, if
Figure BDA0003101194710000062
And in databases
Figure BDA00031011947100000617
Highest similarity value, and the similarity value is greater than threshold value T2Then it is considered as
Figure BDA0003101194710000063
And
Figure BDA00031011947100000616
corresponding to the same person.
In this example, T is taken1=0.8,T2=0.6。
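The matching step with threshold T2 can be sketched as follows, assuming cosine similarity is used as the similarity value (the patent does not name the similarity measure, so this choice and the `match_face` name are assumptions):

```python
import numpy as np

def match_face(f, database, t2=0.6):
    """Match one face feature vector against the database set E = {e_j}.

    f:        (C,) face feature vector of a detected face.
    database: (K, C) array of per-id feature vectors e_j.
    Returns the best-matching id index, or None when the highest cosine
    similarity does not exceed the threshold t2.
    """
    f = f / np.linalg.norm(f)
    e = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = e @ f                      # cosine similarity with every e_j
    j = int(np.argmax(sims))
    return j if sims[j] > t2 else None
```

A vector close to one database entry matches that id; a vector roughly equidistant from all entries falls below 0.6 and is rejected.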
In the embodiment of the invention, the training loss value is formed by superposing face central point thermodynamic diagram loss, face central point offset diagram loss, face width and height diagram loss and face identification loss.
Further, let the ith face frame on the image be represented by its top-left and bottom-right points $(x_1^{(i)}, y_1^{(i)})$ and $(x_2^{(i)}, y_2^{(i)})$. The face central point of this face frame is

$$c^{(i)} = \left(\frac{x_1^{(i)} + x_2^{(i)}}{2},\ \frac{y_1^{(i)} + y_2^{(i)}}{2}\right)$$

Let its corresponding position on the face central point thermodynamic diagram be $\tilde{c}^{(i)} = \lfloor c^{(i)}/n \rfloor$, where n is the down-sampling multiple of the network. The response value of the generated face central point thermodynamic diagram at position $(x, y)$ is then represented as:

$$Y_{xy} = \max_{i=1,\dots,N} \exp\left(-\frac{(x - \tilde{c}^{(i)}_x)^2 + (y - \tilde{c}^{(i)}_y)^2}{2\sigma_c^2}\right)$$

where N represents the number of face frames on the image and $\sigma_c$ is the standard deviation of the Gaussian function;
the loss of the face central point thermodynamic diagram is represented as:

$$L_{hm} = -\frac{1}{N}\sum_{xy}\begin{cases}\left(1 - \hat{Y}_{xy}\right)^{\alpha}\log \hat{Y}_{xy}, & Y_{xy} = 1\\ \left(1 - Y_{xy}\right)^{\beta}\,\hat{Y}_{xy}^{\alpha}\log\left(1 - \hat{Y}_{xy}\right), & \text{otherwise}\end{cases}$$

wherein $\alpha$ and $\beta$ are modulation coefficients, set to 1 and 2 respectively in this embodiment, and $\hat{Y}_{xy}$ represents the face central point thermal value obtained by network prediction.
Further, let the ith face frame on the image be represented by its top-left and bottom-right points $(x_1^{(i)}, y_1^{(i)})$ and $(x_2^{(i)}, y_2^{(i)})$, and let its width and height be expressed as:

$$s^{(i)} = \left(x_2^{(i)} - x_1^{(i)},\ y_2^{(i)} - y_1^{(i)}\right)$$

The face width and height loss is expressed as:

$$L_{wh} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{s}^{(i)} - s^{(i)}\right|$$

wherein $\hat{s}^{(i)}$ represents the face width and height obtained by network prediction;
in this embodiment, let the face central point of the ith face frame on the image be $c^{(i)}$, let its corresponding position on the face central point thermodynamic diagram be $\tilde{c}^{(i)} = \lfloor c^{(i)}/n \rfloor$, and let the face central point offset be expressed as

$$o^{(i)} = \frac{c^{(i)}}{n} - \tilde{c}^{(i)}$$

Then the face central point offset loss is expressed as:

$$L_{off} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{o}^{(i)} - o^{(i)}\right|$$

wherein $\hat{o}^{(i)}$ represents the offset obtained by network prediction and n is the down-sampling multiple of the deep neural network.
In this embodiment n = 4; for a label box with central point $c^{(i)}$ and width and height $s^{(i)}$, the corresponding central point offset is $o^{(i)} = c^{(i)}/4 - \lfloor c^{(i)}/4 \rfloor$.
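The offset computation for n = 4 can be checked numerically with a small sketch (the concrete center coordinates are illustrative):

```python
import numpy as np

def center_offset(center, n=4):
    """Central-point offset lost by down-sampling: o = c/n - floor(c/n)."""
    c = np.asarray(center, dtype=float)
    return c / n - np.floor(c / n)

# e.g. a face central point at (101, 58) maps to heatmap cell (25, 14)
# with a sub-cell offset of (0.25, 0.5) that the offset branch must recover.
```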
Further, let the target central point of the ith face frame on the face central point thermodynamic diagram be $\tilde{c}^{(i)}$; extract the corresponding feature vector $E_{\tilde{c}^{(i)}}$ on the image feature vector diagram and map it to a class distribution vector $p_i(k)$, with $L_i(k)$ the corresponding label class label. The face recognition loss is then expressed as:

$$L_{id} = -\sum_{i=1}^{N}\sum_{k=1}^{K} L_i(k)\log p_i(k)$$

wherein N is the number of face frames and K is the number of categories; $p_i(k)$ is the probability that the ith face frame belongs to the kth id, and $L_i(k)$ is the label of the ith face frame.
In this embodiment, K = 10000; when the ith face frame belongs to the 1st id, $L_i(k) = (1, 0, 0, \dots, 0)$, in which 9999 entries are 0.
In the embodiments provided by the invention, the face detection and recognition network adopts resnet34 or GoogLeNet as the backbone network.
The invention also provides a face detection and recognition system, comprising:
the image preprocessing module is used for preprocessing the image marked with the face frame to generate a training sample;
the human face feature extraction module is used for generating a human face central point thermodynamic diagram, a human face central point offset diagram and a human face width and height diagram and extracting an image feature vector of the whole image;
the training loss calculation module is used for calculating the thermodynamic diagram loss of the face center point, the offset diagram loss of the face center point, the width and height diagram loss of the face and the face recognition loss, performing superposition calculation, finishing training when the training loss value is smaller than a preset threshold value, and obtaining a deep learning network capable of outputting the face detection and face recognition results;
the human face detection module is used for executing a non-maximum value suppression algorithm on a human face central point thermodynamic diagram, calculating thermal response values of the human face central point thermodynamic diagram after extracting peak values, selecting points with the thermal response values larger than a threshold value as candidate human face central points, extracting human face central point offset values at corresponding positions of the human face central point offset quantity diagram, adding the human face central point offset values to obtain human face central point positions, and finally extracting human face width and height values at corresponding positions of the human face width and height diagram to generate a human face frame;
and the face recognition module is used for selecting the feature vector corresponding to the face frame position from the image feature vectors as a face feature vector, and matching the face feature vector with each face feature vector stored in the database to obtain a face recognition result.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants still fall within the scope of protection of the invention.

Claims (6)

1. A face detection and recognition method, characterized by comprising the following steps:
s1: preprocessing the images marked with face frames to generate training samples;
s2: constructing a face detection and recognition network, wherein the face detection and recognition network adopts a deep learning network and fuses the network's high-level and low-level features;
s3: inputting the training samples into the constructed face detection and recognition network for training until the training loss value is smaller than a preset threshold value, obtaining a deep learning network capable of outputting face detection and face recognition results;
face detection by the deep learning network comprises the following steps:
s31: generating a face central point thermodynamic diagram, a face central point offset diagram and a face width and height diagram;
s32: executing a non-maximum suppression algorithm on the face central point thermodynamic diagram to extract peak points and calculate their thermal response values, selecting the points whose thermal response values exceed a threshold as candidate face central points, extracting the face central point offset values at the corresponding positions of the face central point offset diagram and adding them to obtain the face central point positions, and finally extracting the face width and height values at the corresponding positions of the face width and height diagram to generate face frames;
face recognition by the deep learning network comprises the following steps:
s33: while step S31 is executed, simultaneously extracting the image feature vectors of the entire image;
s34: selecting the feature vectors at the face frame positions from the image feature vectors as face feature vectors, and matching each face feature vector against the face feature vectors stored in a database to obtain the face recognition result;
the training loss value is formed by superposing the face central point thermodynamic diagram loss, the face central point offset diagram loss, the face width and height diagram loss and the face recognition loss.
2. A face detection and recognition method as claimed in claim 1, wherein:
let the ith personal face frame on the image be represented by two points, upper left and lower right, as
Figure FDA0003101194700000011
The face center point of the face frame
Figure FDA0003101194700000012
Is shown as
Figure FDA0003101194700000013
Order to
Figure FDA0003101194700000014
Is shown as
Figure FDA0003101194700000015
And at the corresponding position on the face central point thermodynamic diagram, the response value of the corresponding generated face central point thermodynamic diagram is represented as:
Figure FDA0003101194700000016
where N represents the number of face frames on the image, σcExpressing the standard deviation of the Gaussian function;
the loss of the face center point thermodynamic diagram is represented as:
Figure FDA0003101194700000017
wherein, alpha and beta are modulation coefficients;
Figure FDA0003101194700000021
and representing the heat value of the center point of the face obtained by network prediction.
3. A face detection and recognition method as claimed in claim 2, wherein:
let the i-th face frame on the image be represented by its upper-left and lower-right points as $\left(x_1^{(i)}, y_1^{(i)}, x_2^{(i)}, y_2^{(i)}\right)$; let its width and height be expressed as $s^{(i)}=\left(x_2^{(i)}-x_1^{(i)},\; y_2^{(i)}-y_1^{(i)}\right)$;
the face width-height loss is expressed as:
$$L_{wh}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{s}^{(i)}-s^{(i)}\right|$$
where $\hat{s}^{(i)}$ represents the face width and height predicted by the network;
let the face center point of the i-th face frame on the image be represented as $c^{(i)}$, its corresponding position on the face center-point heatmap as $\tilde{c}^{(i)}=\left\lfloor c^{(i)}/n\right\rfloor$, and the offset of the face center point as $o^{(i)}=c^{(i)}/n-\tilde{c}^{(i)}$; then the face center-point offset loss is expressed as:
$$L_{off}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{o}^{(i)}-o^{(i)}\right|$$
where $\hat{o}^{(i)}$ represents the offset predicted by the network, and n is the down-sampling multiple of the deep neural network.
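The regression targets of this claim (width/height, the integer heatmap cell, and the sub-cell offset lost to the floor operation) and the shared L1 penalty can be sketched as follows; the function names are illustrative assumptions:

```python
import numpy as np

def size_and_offset_targets(boxes, n):
    """From (x1, y1, x2, y2) face frames derive the regression targets:
    width/height, the integer center cell on the n-times down-sampled
    heatmap, and the quantization offset discarded by the floor."""
    boxes = np.asarray(boxes, dtype=float)
    wh = boxes[:, 2:4] - boxes[:, 0:2]               # s = (w, h) per face
    centers = (boxes[:, 0:2] + boxes[:, 2:4]) / 2.0  # c, in image space
    cells = np.floor(centers / n)                    # c~ = floor(c / n)
    offsets = centers / n - cells                    # o = c/n - c~
    return wh, cells, offsets

def l1_loss(pred, target):
    """Mean absolute error, used for both the size and offset heads."""
    return np.abs(np.asarray(pred) - np.asarray(target)).mean()
```

For a box (2, 2, 8, 8) with a down-sampling multiple of 4, the center (5, 5) maps to cell (1, 1) with offset (0.25, 0.25) — the fraction the detector must regress to recover sub-cell accuracy.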
4. A face detection and recognition method as claimed in claim 3, wherein:
let the target center point of the i-th face frame on the image, on the face center-point heatmap, be $\tilde{c}^{(i)}$; extract the corresponding feature vector $F_{\tilde{c}^{(i)}}$ from the image feature-vector map and map it to a class-distribution vector $p_i(k)$; with the corresponding label class $L_i(k)$, the face recognition loss is expressed as:
$$L_{id}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K}L_i(k)\log p_i(k)$$
where N is the number of face frames and K is the number of classes; $p_i(k)$ is the probability that the i-th face frame belongs to the k-th identity, and $L_i(k)$ is the label of the i-th detection frame.
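This recognition loss is a softmax cross-entropy over K identity classes, averaged over the N face frames. A minimal sketch, assuming one-hot labels given as class indices (the function name is illustrative):

```python
import numpy as np

def id_cross_entropy(logits, labels):
    """Face-recognition loss: softmax over K identity classes, then the
    mean negative log-likelihood of each face frame's true identity."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n = len(labels)
    return -np.log(p[np.arange(n), labels]).mean()
```

With uninformative logits over 3 identities the loss is log 3 ≈ 1.0986, and it approaches 0 as the correct identity's logit dominates.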
5. A face detection and recognition method as claimed in any one of claims 1 to 4, wherein: the face detection and recognition network adopts ResNet-34 or GoogLeNet as its backbone network.
6. A face detection and recognition system, comprising:
the image preprocessing module is used for preprocessing the images annotated with face frames to generate training samples;
the face feature extraction module is used for generating the face center-point heatmap, the face center-point offset map, and the face width-height map, and for extracting the image feature vectors of the whole image;
the training loss calculation module is used for calculating the face center-point heatmap loss, the face center-point offset-map loss, the face width-height map loss, and the face recognition loss, superposing them, and ending training when the training loss value falls below a preset threshold, yielding a deep learning network that outputs both face detection and face recognition results;
the face detection module is used for applying a non-maximum-suppression algorithm to the face center-point heatmap to extract its peaks, selecting the points whose heat response values exceed a threshold as candidate face center points, reading the face center-point offsets at the corresponding positions of the face center-point offset map and adding them to obtain the face center-point positions, and finally reading the face width and height values at the corresponding positions of the face width-height map to generate the face frames;
and the face recognition module is used for selecting, from the image feature vectors, the feature vector corresponding to the face frame position as the face feature vector, and matching it against the face feature vectors stored in the database to obtain the face recognition result.
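The detection and recognition modules together describe a decoding pipeline: keep heatmap cells that equal the maximum of their local window (the max-pooling form of NMS), rebuild boxes from the offset and width-height heads, and match each center's feature vector against a database. A sketch under stated assumptions — the head layouts, 3×3 window, threshold, and cosine-similarity matching are illustrative choices, not specified by the patent:

```python
import numpy as np

def decode_faces(heat, offset, wh, feat, gallery, n=4, thresh=0.3, window=3):
    """Decode face frames and identities from the four network outputs.

    heat:    (H, W) center-point heatmap.
    offset:  (2, H, W) sub-cell offset head (x, y).
    wh:      (2, H, W) width/height head.
    feat:    (D, H, W) image feature-vector map.
    gallery: (M, D) stored face feature vectors (the "database").
    """
    H, W = heat.shape
    r = window // 2
    peaks = []
    for y in range(H):                       # NMS: keep local maxima only
        for x in range(W):
            patch = heat[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            if heat[y, x] >= thresh and heat[y, x] == patch.max():
                peaks.append((y, x))
    results = []
    for y, x in peaks:
        cx = (x + offset[0, y, x]) * n       # center back in image space
        cy = (y + offset[1, y, x]) * n
        w, h = wh[0, y, x], wh[1, y, x]
        box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
        v = feat[:, y, x]                    # face feature at the center
        sims = gallery @ v / (np.linalg.norm(gallery, axis=1)
                              * np.linalg.norm(v) + 1e-8)
        results.append((box, int(sims.argmax()), heat[y, x]))
    return results
```

Scanning the windows in Python keeps the sketch dependency-free; a real implementation would use a max-pooling op for the NMS step.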
CN202110626119.XA 2021-06-04 2021-06-04 Face detection and recognition method and system Pending CN113239885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110626119.XA CN113239885A (en) 2021-06-04 2021-06-04 Face detection and recognition method and system


Publications (1)

Publication Number Publication Date
CN113239885A true CN113239885A (en) 2021-08-10

Family

ID=77136745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110626119.XA Pending CN113239885A (en) 2021-06-04 2021-06-04 Face detection and recognition method and system

Country Status (1)

Country Link
CN (1) CN113239885A (en)


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503669A (en) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 A kind of based on the training of multitask deep learning network, recognition methods and system
CN108268822A (en) * 2016-12-30 2018-07-10 深圳光启合众科技有限公司 Face identification method, device and robot
CN109086660A (en) * 2018-06-14 2018-12-25 深圳市博威创盛科技有限公司 Training method, equipment and the storage medium of multi-task learning depth network
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN110705357A (en) * 2019-09-02 2020-01-17 深圳中兴网信科技有限公司 Face recognition method and face recognition device
CN111160108A (en) * 2019-12-06 2020-05-15 华侨大学 Anchor-free face detection method and system
WO2020102988A1 (en) * 2018-11-20 2020-05-28 西安电子科技大学 Feature fusion and dense connection based infrared plane target detection method
CN112270310A (en) * 2020-11-24 2021-01-26 上海工程技术大学 Cross-camera pedestrian multi-target tracking method and device based on deep learning
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI779815B (en) * 2021-09-03 2022-10-01 瑞昱半導體股份有限公司 Face recognition network model with face alignment based on knowledge distillation
US11847821B2 (en) 2021-09-03 2023-12-19 Realtek Semiconductor Corp. Face recognition network model with face alignment based on knowledge distillation
CN114462495A (en) * 2021-12-30 2022-05-10 浙江大华技术股份有限公司 Training method of face shielding detection model and related device
CN115565207A (en) * 2022-11-29 2023-01-03 武汉图科智能科技有限公司 Occlusion scene downlink person detection method with feature simulation fused


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210810