CN113688793A - Training method of face model and face recognition system - Google Patents


Info

Publication number
CN113688793A
Authority
CN
China
Prior art keywords
training
face
recognition
feature vector
positive sample
Prior art date
Legal status
Pending
Application number
CN202111112053.9A
Other languages
Chinese (zh)
Inventor
万章敏 (Wan Zhangmin)
Current Assignee
Individual
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202111112053.9A priority Critical patent/CN113688793A/en
Publication of CN113688793A publication Critical patent/CN113688793A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method for a face recognition model and a face recognition system. The face recognition model comprises a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module. The input of the recognition backbone network is a high-resolution upper-face image; the input of the spatial pyramid pooling network is the output of the backbone network; the input of the full-connection layer is the output of the spatial pyramid pooling network; the input of the recognition module is the output of the full-connection layer; and the output of the recognition module is the identity information of the monitored subject. The resulting face recognition model recognizes faces with high accuracy, can recognize a face from a face image containing only part of the face (such as a face image with a mask worn), and improves the effectiveness and accuracy of face recognition.

Description

Training method of face model and face recognition system
Technical Field
The invention relates to the technical field of computers, in particular to a training method of a face recognition model and a face recognition system.
Background
At present, people need to wear masks when going out to reduce the spread of infectious diseases. However, once a mask is worn, most of the face is blocked by it, which reduces the accuracy of identifying a user's identity from a face image and poses a threat to security prevention and control in public areas.
Disclosure of Invention
The invention aims to provide a training method of a face recognition model and a face recognition system, which are used for solving the problems in the prior art.
In a first aspect, an embodiment of the present invention provides a training method for a face recognition model, where the face recognition model includes a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module; the input of the recognition backbone network is a high-resolution upper-face image, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the recognition module is the output of the full-connection layer, and the output of the recognition module is the identity information of the monitored subject; the method comprises the following steps:
obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image;
preprocessing the recognition training images and applying data enhancement to them to obtain recognition-enhanced images;
inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature;
obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features;
after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector;
respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector;
constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value;
and obtaining the maximum iteration number for training the face model, and stopping training when the face recognition loss value is not greater than a threshold or the number of training iterations reaches the maximum iteration number, to obtain a trained face recognition model.
Optionally, the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector, and the normalized training negative sample feature vector to obtain the recognition loss value includes:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector through the classification full-connection layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the face recognition feature vector loss value and the weighted classification feature vector loss value; the classification feature vector loss value is the loss value obtained by passing the class feature vector and the labeled class feature vector through a cross-entropy loss function; and the face recognition feature vector loss value is the triplet loss function value among the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector;
the triplet loss function is:
L_t = (d_{a,p} - d_{a,n} + α)
where L_t is the triplet loss function value; d_{a,p} is a first distance, the distance value between the face recognition feature vector and the training positive sample; d_{a,n} is a second distance, the distance value between the face recognition feature vector and the training negative sample; and α is a recognition threshold parameter set according to actual needs;
the first distance and the second distance are calculated as follows:
d_{a,p} = dist(x, y1) + β·cos(θ1)
d_{a,n} = dist(x, y2) + β·cos(θ2)
where β is a weight factor, and
dist(x, y1) = √(Σ_i (x_i - y1_i)²)
dist(x, y2) = √(Σ_i (x_i - y2_i)²)
cos(θ1) = Σ_i x_i·y1_i / (√(Σ_i x_i²)·√(Σ_i y1_i²))
cos(θ2) = Σ_i x_i·y2_i / (√(Σ_i x_i²)·√(Σ_i y2_i²))
where x is the face recognition feature vector, y1 is the positive sample feature vector, and y2 is the negative sample feature vector, with components indexed by i.
Optionally, the obtaining, by using the spatial pyramid pooling network, a fixed-size basic facial image feature matrix, a fixed-size training positive sample feature matrix, and a fixed-size training negative sample feature matrix based on the basic facial image feature, the training positive sample feature, and the training negative sample feature includes:
dividing the basic facial image features into a first number of basic facial image features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the first number of features, thereby obtaining a fixed-size basic facial image feature matrix;
dividing the training positive sample features into a second number of training positive sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the second number of features, thereby obtaining a fixed-size training positive sample feature matrix;
and dividing the training negative sample features into a third number of training negative sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the third number of features, thereby obtaining a fixed-size training negative sample feature matrix.
In a second aspect, an embodiment of the present invention provides a face recognition system, where the face recognition system includes:
the acquisition module is used for acquiring a face image of the pedestrian; the face image comprises a face, and the pedestrian wears a mask which covers the mouth of the pedestrian;
the recognition module is used for recognizing the upper region of the face and obtaining an upper-face image based on that region, and for identifying the identity information of the user from the upper-face image through a pre-trained face recognition model;
the face recognition model comprises a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module; the input of the recognition backbone network is a high-resolution upper-face image, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the recognition module is the output of the full-connection layer, and the output of the recognition module is the identity information of the monitored subject.
Optionally, the training method of the face recognition model includes:
obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image;
preprocessing the recognition training images and applying data enhancement to them to obtain recognition-enhanced images;
inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature;
obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features;
after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector;
respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector;
constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value;
and obtaining the maximum iteration number for training the face model, and stopping training when the face recognition loss value is not greater than a threshold or the number of training iterations reaches the maximum iteration number, to obtain a trained face recognition model.
Optionally, the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector, and the normalized training negative sample feature vector to obtain the recognition loss value includes:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector through the classification full-connection layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the face recognition feature vector loss value and the weighted classification feature vector loss value; the classification feature vector loss value is the loss value obtained by passing the class feature vector and the labeled class feature vector through a cross-entropy loss function; and the face recognition feature vector loss value is the triplet loss function value among the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector;
the triplet loss function is:
L_t = (d_{a,p} - d_{a,n} + α)
where L_t is the triplet loss function value; d_{a,p} is a first distance, the distance value between the face recognition feature vector and the training positive sample; d_{a,n} is a second distance, the distance value between the face recognition feature vector and the training negative sample; and α is a recognition threshold parameter set according to actual needs;
the first distance and the second distance are calculated as follows:
d_{a,p} = dist(x, y1) + β·cos(θ1)
d_{a,n} = dist(x, y2) + β·cos(θ2)
where β is a weight factor, and
dist(x, y1) = √(Σ_i (x_i - y1_i)²)
dist(x, y2) = √(Σ_i (x_i - y2_i)²)
cos(θ1) = Σ_i x_i·y1_i / (√(Σ_i x_i²)·√(Σ_i y1_i²))
cos(θ2) = Σ_i x_i·y2_i / (√(Σ_i x_i²)·√(Σ_i y2_i²))
where x is the face recognition feature vector, y1 is the positive sample feature vector, and y2 is the negative sample feature vector, with components indexed by i.
Optionally, the obtaining, by using the spatial pyramid pooling network, a fixed-size basic facial image feature matrix, a fixed-size training positive sample feature matrix, and a fixed-size training negative sample feature matrix based on the basic facial image feature, the training positive sample feature, and the training negative sample feature includes:
dividing the basic facial image features into a first number of basic facial image features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the first number of features, thereby obtaining a fixed-size basic facial image feature matrix;
dividing the training positive sample features into a second number of training positive sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the second number of features, thereby obtaining a fixed-size training positive sample feature matrix;
and dividing the training negative sample features into a third number of training negative sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the third number of features, thereby obtaining a fixed-size training negative sample feature matrix.
Optionally, the recognition module is further configured to: carry out face reconstruction on the upper region of the face to obtain a high-resolution upper-face image; and identify the identity information of the user from the high-resolution upper-face image through the pre-trained face recognition model.
Compared with the prior art, the embodiment of the invention achieves the following beneficial effects:
the embodiment of the invention provides a training method of a face recognition model and a face recognition system, wherein the face recognition model comprises a recognition backbone network, a space pyramid pooling network, a recognition full-link layer, a classification full-link layer and a recognition module; the input of the identification backbone network is a partial image on a high-resolution face, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the identification module is the output of the full-connection layer, and the output of the identification module is the identity information of the monitored object; obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image; preprocessing and data strengthening are carried out on the recognition training image to obtain a recognition strengthened image; inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature; obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features; after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector; respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector; constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value; and obtaining the maximum iteration times of the face model training, and stopping the training if the face recognition loss value is not greater than a threshold value or the iteration times of the face model training are less than or equal to the maximum iteration times to obtain a trained face recognition model. By adopting the scheme, the obtained face recognition model has high accuracy for face recognition, can recognize faces in face images (face images wearing a mask) only containing partial faces, and improves the effectiveness and accuracy of face recognition.
Drawings
Fig. 1 is a flowchart of a training method for a face recognition model according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a feature pyramid network structure according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a context module of a single-point headless face detector network according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a detection module of a single-point headless face detector network according to an embodiment of the present invention.
Fig. 5 is a schematic block structure diagram of an electronic device according to an embodiment of the present invention.
The labels in the figure are: a bus 500; a receiver 501; a processor 502; a transmitter 503; a memory 504; a bus interface 505.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
Because of the current epidemic situation, people entering and leaving public places, especially hospitals, are required to wear masks. However, most existing face detection and recognition models are designed to detect and recognize the whole face, and cannot detect or recognize a face whose region is covered by a mask. A face recognition method for hospital monitoring is therefore designed.
Examples
The embodiment of the invention provides a training method for a face recognition model, where the face recognition model comprises a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module; the input of the recognition backbone network is a high-resolution upper-face image, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the recognition module is the output of the full-connection layer, and the output of the recognition module is the identity information of the monitored subject. As shown in fig. 1, the method includes:
S101: Obtaining a face recognition training set.
The face recognition training set comprises a plurality of recognition training images, wherein the plurality of recognition training images comprise basic face images, training positive sample images and training negative sample images;
S102: Preprocessing and data enhancement are carried out on the recognition training image to obtain a recognition-enhanced image.
S103: and inputting the recognition enhanced image into the recognition backbone network to obtain basic human face image characteristics, training positive sample characteristics and training negative sample characteristics.
S104: and obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features.
S105: and after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector.
The basic facial image feature matrix passes through the recognition full-connection layer to obtain a basic facial image feature vector, the training positive sample feature matrix passes through the recognition full-connection layer to obtain a training positive sample feature vector, and the training negative sample feature matrix passes through the recognition full-connection layer to obtain a training negative sample feature vector.
S106: and respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector.
S107: and constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain an identification loss value.
S108: and obtaining the maximum iteration times of the face model training, and stopping training the face recognition model if the face recognition loss value is not greater than a threshold value or the face model training iteration times is less than or equal to the maximum iteration times to obtain the trained face recognition model. The maximum iteration number is preset, and the value of the maximum iteration number may be 500.
By adopting the scheme, the obtained face recognition model has high accuracy for face recognition, can recognize faces in face images (face images wearing a mask) only containing partial faces, and improves the effectiveness and accuracy of face recognition.
Optionally, the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector, and the normalized training negative sample feature vector to obtain the recognition loss value includes:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector through the classification full-connection layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the face recognition feature vector loss value and the weighted classification feature vector loss value; the classification feature vector loss value is the loss value obtained by passing the class feature vector and the labeled class feature vector through a cross-entropy loss function; and the face recognition feature vector loss value is the triplet loss function value among the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector;
the triplet loss function is:
L_t = (d_{a,p} - d_{a,n} + α)
where L_t is the triplet loss function value; d_{a,p} is a first distance, the distance value between the face recognition feature vector and the training positive sample; d_{a,n} is a second distance, the distance value between the face recognition feature vector and the training negative sample; and α is a recognition threshold parameter set according to actual needs;
the first distance and the second distance are calculated as follows:
d_{a,p} = dist(x, y1) + β·cos(θ1)
d_{a,n} = dist(x, y2) + β·cos(θ2)
where β is a weight factor, and
dist(x, y1) = √(Σ_i (x_i - y1_i)²)
dist(x, y2) = √(Σ_i (x_i - y2_i)²)
cos(θ1) = Σ_i x_i·y1_i / (√(Σ_i x_i²)·√(Σ_i y1_i²))
cos(θ2) = Σ_i x_i·y2_i / (√(Σ_i x_i²)·√(Σ_i y2_i²))
where x is the face recognition feature vector, y1 is the positive sample feature vector, and y2 is the negative sample feature vector, with components indexed by i.
Optionally, the obtaining, by using the spatial pyramid pooling network, a fixed-size basic facial image feature matrix, a fixed-size training positive sample feature matrix, and a fixed-size training negative sample feature matrix based on the basic facial image feature, the training positive sample feature, and the training negative sample feature includes:
dividing the basic facial image features into a first number of basic facial image features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the first number of features, thereby obtaining a fixed-size basic facial image feature matrix;
dividing the training positive sample features into a second number of training positive sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the second number of features, thereby obtaining a fixed-size training positive sample feature matrix;
and dividing the training negative sample features into a third number of training negative sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the third number of features, thereby obtaining a fixed-size training negative sample feature matrix.
In a second aspect, an embodiment of the present invention provides a face recognition system, where the face recognition system includes:
the acquisition module is used for acquiring a face image of the pedestrian; the face image comprises a face, and the pedestrian wears a mask which covers the mouth of the pedestrian;
the recognition module is used for recognizing the upper region of the face and obtaining an upper-face image based on that region, and for identifying the identity information of the user from the upper-face image through a pre-trained face recognition model;
the face recognition model comprises a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module; the input of the recognition backbone network is a high-resolution upper-face image, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the recognition module is the output of the full-connection layer, and the output of the recognition module is the identity information of the monitored subject.
Optionally, the training method of the face recognition model includes:
obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image;
preprocessing the recognition training images and applying data enhancement to them to obtain recognition-enhanced images;
inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature;
obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features;
after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector;
respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector;
constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value;
and obtaining the maximum iteration number for training the face model, and stopping training when the face recognition loss value is not greater than a threshold or the number of training iterations reaches the maximum iteration number, to obtain a trained face recognition model.
Optionally, the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector, and the normalized training negative sample feature vector to obtain the recognition loss value includes:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector through the classification full-connection layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the face recognition feature vector loss value and the weighted classification feature vector loss value; the classification feature vector loss value is the loss value obtained by passing the class feature vector and the labeled class feature vector through a cross-entropy loss function; and the face recognition feature vector loss value is the triplet loss function value among the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector;
the triplet loss function is:
L_t = (d_{a,p} - d_{a,n} + α)
where L_t is the triplet loss function value; d_{a,p} is a first distance, the distance value between the face recognition feature vector and the training positive sample; d_{a,n} is a second distance, the distance value between the face recognition feature vector and the training negative sample; and α is a recognition threshold parameter set according to actual needs;
the first distance and the second distance are calculated as follows:
d_{a,p} = dist(x, y1) + β·cos(θ1)
d_{a,n} = dist(x, y2) + β·cos(θ2)
where β is a weight factor, and
dist(x, y1) = √(Σ_i (x_i - y1_i)²)
dist(x, y2) = √(Σ_i (x_i - y2_i)²)
cos(θ1) = Σ_i x_i·y1_i / (√(Σ_i x_i²)·√(Σ_i y1_i²))
cos(θ2) = Σ_i x_i·y2_i / (√(Σ_i x_i²)·√(Σ_i y2_i²))
where x is the face recognition feature vector, y1 is the positive sample feature vector, and y2 is the negative sample feature vector, with components indexed by i.
Optionally, the obtaining, by using the spatial pyramid pooling network, a fixed-size basic facial image feature matrix, a fixed-size training positive sample feature matrix, and a fixed-size training negative sample feature matrix based on the basic facial image feature, the training positive sample feature, and the training negative sample feature includes:
dividing the basic facial image features into a first number of basic facial image features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the first number of features, thereby obtaining a fixed-size basic facial image feature matrix;
dividing the training positive sample features into a second number of training positive sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the second number of features, thereby obtaining a fixed-size training positive sample feature matrix;
and dividing the training negative sample features into a third number of training negative sample features, combining them together, and pooling them with correspondingly sized pooling kernels to obtain the third number of features, thereby obtaining a fixed-size training negative sample feature matrix.
Optionally, the recognition module is further configured to: carry out face reconstruction on the upper region of the face to obtain a high-resolution upper-face image; and identify the identity information of the user from the high-resolution upper-face image through the pre-trained face recognition model.
Optionally, the face recognition system is further configured to execute a face recognition method, where the face recognition method includes:
acquiring a monitoring image during real-time monitoring, wherein the monitoring image includes a face image of a monitored subject, the monitored subject wears a mask, and the mask covers the lower part of the subject's face, the lower part including at least the mouth.
And detecting a face region in the monitored image based on the detection model to obtain a face image frame and a mask frame.
And obtaining the upper region of the face based on the center point, width and height of the mask frame and the center point, width and height of the face image frame.
And carrying out face reconstruction on the upper region of the face to obtain a high-resolution upper-face image.
And identifying the identity information of the monitored subject from the high-resolution upper-face image through the face recognition model.
And if the identity information matches face information in the special database, starting the alarm system and tracking the target.
The face recognition model comprises a recognition backbone network, a spatial pyramid pooling network, a recognition full-connection layer, a classification full-connection layer and a recognition module; the input of the recognition backbone network is a high-resolution upper-face image, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the recognition module is the output of the full-connection layer, and the output of the recognition module is the identity information of the monitored subject.
According to this scheme, face detection is first performed on the monitoring image, using a face detection module designed to detect the face region and output the face image frame, the mask frame and the facial feature points. Because face images in monitoring images vary in size, the designed face detection model can detect large, medium and small face images in the monitored image.
Existing face recognition models that can recognize masked faces recognize the masked face directly, and the characteristics of the mask itself affect the recognition result; this scheme therefore uses the face image frame and the mask frame detected by the face detection model to obtain the upper region of the face for recognition.
Note that the upper part image includes the user's eyes.
In order to solve the problem that existing face recognition models perform poorly on distant, low-resolution faces, the invention reconstructs the face to obtain a high-resolution face image and thereby a better recognition result.
Meanwhile, a pyramid pooling network is added to the face recognition model, so that a face of any size can yield feature points through the full-connection layer; the loss function is also modified, using cross entropy to strengthen the convergence of the triplet loss function and improve the accuracy of face recognition.
In summary, the technical solution adopted by the hospital monitoring face recognition method and system provided by the embodiments of the present invention can recognize a face in a face image containing only part of the face (such as a face image with a mask worn), and can also effectively recognize distant, low-resolution faces, improving the effectiveness and accuracy of face recognition.
A monitoring image is acquired during real-time monitoring; the face image in the monitoring image is an original face image with a mask worn. The monitoring image is fed into the detection model: a series of convolution operations are performed by the backbone network, the features obtained from the last three convolution layers are fed into the feature pyramid network for feature fusion, the fused features are fed into the context module of the single-point headless face detector for feature reinforcement, and the reinforced features are fed into the single-point headless face detector to detect the face image frame, the mask frame and the facial feature points. The position of the upper edge of the face image frame and the position of the upper edge of the mask frame are calculated from the center point position and width of the mask frame and the center point position and width of the face image frame, yielding the upper region of the face. Face reconstruction is then performed on that region: the region is enlarged by bicubic interpolation to obtain an enlarged image, the feature vector obtained by convolution over each image block of the enlarged image is nonlinearly mapped, and the high-dimensional feature vectors of all image blocks are aggregated to obtain a high-resolution upper-face image. The high-resolution upper-face image is fed into the face recognition model: it is convolved by the backbone network to obtain face features, the split-and-combine operation of the spatial pyramid pooling network fixes the dimension of the face features input to the full-connection layer, the full-connection layer converts the face features into a one-dimensional face feature vector, the distance between this face feature vector and the faces to be recognized in the database is calculated, and if recognition succeeds, the identity information is output and an alarm is given.
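For illustration, the end-to-end monitoring pipeline just described can be sketched as follows. This is a minimal sketch under stated assumptions: `detector`, `sr_model`, `recognizer` and `database` are hypothetical stand-ins for the detection model, the face reconstruction model, the face recognition model and the enrolled feature database, and boxes are assumed to be (center x, center y, width, height) tuples as in the detection output.

```python
import numpy as np

def crop_upper_face(frame, face_box, mask_box):
    # Boxes are (center_x, center_y, width, height); the upper-face region runs
    # from the top edge of the face frame down to the top edge of the mask frame.
    fx, fy, fw, fh = face_box
    mx, my, mw, mh = mask_box
    top = int(fy - fh / 2)
    bottom = int(my - mh / 2)
    left, right = int(fx - fw / 2), int(fx + fw / 2)
    return frame[top:bottom, left:right]

def recognize_frame(frame, detector, sr_model, recognizer, database, dist_threshold=1.0):
    face_box, mask_box, _landmarks = detector(frame)    # detection model output
    upper = crop_upper_face(frame, face_box, mask_box)  # upper region of the face
    upper_hr = sr_model(upper)                          # face reconstruction (super-resolution)
    query = recognizer(upper_hr)                        # one-dimensional face feature vector
    # Distance to every enrolled feature vector; nearest match wins.
    pid, d = min(((pid, np.linalg.norm(query - feat)) for pid, feat in database.items()),
                 key=lambda kv: kv[1])
    return pid if d <= dist_threshold else None         # None: no match, no alarm
```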
Optionally, the detection model includes a detection backbone network, a feature pyramid network and a single-point headless face detector network; the input of the backbone network is the monitoring image, the input of the feature pyramid network is the output of the backbone network, the input of the single-point headless face detector network is the output of the feature pyramid network, and the output of the single-point headless face detector network is the mask region and the upper region of the face.
Optionally, the training method of the detection model includes:
and obtaining a face detection training set, wherein the face detection training set comprises a plurality of training images and training annotation values.
The training annotation values obtained from the face detection training set comprise the center point position, width and height of the face frame; the center point position, width and height of the mask frame; and the positions of the facial feature points, which include the two eye points. The annotations are scaled down according to the final feature sizes of the detection model (for example, with a Resnet50 residual network as the backbone, the feature scale for detecting large faces is 28x28, for medium faces 14x14, and for small faces 7x7); a coordinate system is established with the lower right corner as the origin, positions on the coordinate axes are positions in the scaled-down picture, and the positions in the original picture are obtained by scaling back up proportionally.
The training images are preprocessed and subjected to data enhancement to obtain enhanced images, including CutMix, mosaic augmentation of the image data, and the like; a sketch of one such operation follows.
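As an illustration of the enhancement step, a minimal CutMix-style image mix is sketched below. The box-sampling recipe is the commonly used one and the beta parameter is an assumption; label mixing is omitted for brevity.

```python
import numpy as np

def cutmix(img_a, img_b, alpha=1.0, rng=None):
    """Paste a random rectangle of img_b into img_a (images share one shape)."""
    rng = rng or np.random.default_rng()
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                      # mixing ratio
    cut_w = int(w * np.sqrt(1.0 - lam))               # rectangle size
    cut_h = int(h * np.sqrt(1.0 - lam))
    cx, cy = rng.integers(0, w), rng.integers(0, h)   # random rectangle centre
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    return mixed
```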
The enhanced image is input into the detection backbone network, which comprises a plurality of convolutional layers; the invention adopts a Resnet50 residual network. The structure and output of the last three stages of the detection backbone network are shown in Table 1 below (the source renders the table only as images; the block structures listed here follow the standard Resnet50 configuration, consistent with the output sizes cited in the text):
TABLE 1
Layer  | Block structure                     | Output size
Con3_x | [1x1, 128; 3x3, 128; 1x1, 512] x 4  | 28x28
Con4_x | [1x1, 256; 3x3, 256; 1x1, 1024] x 6 | 14x14
Con5_x | [1x1, 512; 3x3, 512; 1x1, 2048] x 3 | 7x7
The first feature of the enhanced image is output by Con3_x, the second feature by Con4_x, and the third feature by Con5_x.
And performing feature fusion on the first feature, the second feature and the third feature based on a pyramid network structure to obtain a first fusion feature, a second fusion feature and a third fusion feature.
As shown in fig. 2, the feature pyramid network takes the feature output by the last convolution stage (Con5_x) as the first fusion feature; the second fusion feature is obtained by upsampling the first fusion feature and adding it to the second feature from the penultimate stage (Con4_x); and the third fusion feature is obtained by upsampling the second fusion feature and adding it to the first feature from the stage before that (Con3_x). In this way the feature information of the upper layers becomes richer and targets can be detected more effectively. A sketch of this fusion is given below.
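A minimal PyTorch sketch of the three-level fusion, assuming ResNet-50 stage channel counts (512/1024/2048) and the conventional FPN choices of 1x1 lateral convolutions and a 256-channel output, which the patent does not specify:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeaturePyramid(nn.Module):
    def __init__(self, channels=(512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions bring all three stages to a common width.
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in channels)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)                                      # first fusion feature
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2)  # second fusion feature
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2)  # third fusion feature
        return p3, p4, p5
```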
The first, second and third fusion features are input into the single-point headless face detector. Through the context module of the single-point headless face detector shown in fig. 3, each fusion feature passes through several different 3x3 convolutions whose outputs are added, reinforcing the feature to yield the first, second and third enhanced features; a sketch is given below.
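A minimal sketch of such a context module: the branch layout (one single 3x3 convolution and one stacked pair, emulating a larger receptive field) is an assumption, since the patent figure is not reproduced in the text; only the add-the-branch-outputs behaviour is taken from the description.

```python
import torch.nn as nn

class ContextModule(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        self.branch1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # Feature reinforcement: different 3x3 convolution branches, added.
        return self.branch1(x) + self.branch2(x)
```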
Through a detection module of the single-point headless face detector network shown in fig. 4, an output feature value is obtained based on the first enhanced feature, the second enhanced feature and the third enhanced feature.
And obtaining a labeled characteristic value based on the characteristic dimension of the output characteristic value (for example, the characteristic dimension is 7x7 when a Resnet50 residual network structure is used by a backbone network) and the training labeled value.
And obtaining the detection loss values of the output characteristic value and the labeled characteristic value.
The maximum iteration number of the detection model training is obtained, and the maximum iteration number is set to 800 in this embodiment.
And stopping training when the detection loss value is less than or equal to a threshold value or the training iteration number reaches the maximum iteration number, so as to obtain a trained detection model.
Optionally, the obtaining, by the detection module of the single-point headless face detector network, an output feature value based on the first enhanced feature, the second enhanced feature, and the third enhanced feature includes:
the network structure of the single-point headless face detector is as shown in fig. 4, and the output feature value corresponding to the labeled data is obtained by combining the feature after the 3x3 convolution and the feature after the enhancement by the context module and then adjusting the channel through the 1x 1 convolution.
The channels of the first enhanced feature are adjusted to the dimension of the detection-frame adjustment parameter vector in the first feature value (4 x the number of anchors, where 4 represents the center point and the width and height (x, y, w, h)), giving the detection-frame adjustment parameter vector in the first feature value. The channels of the first enhanced feature are adjusted to the dimension of the detection class probability vector in the first feature value (2 x the number of anchors, where 2 represents the class: face image or mask), giving the detection class probability vector in the first feature value. The channels of the first enhanced feature are adjusted to the dimension of the face detection point vector in the first feature value (2 x the number of anchors, where 2 represents the two eye feature points), giving the face detection point vector in the first feature value. The first output feature value characterizes large faces in the monitoring image.
The second and third output feature values are computed in the same way as the first output feature value, and the description is not repeated here.
The second output feature value characterizes medium faces in the monitoring image.
The third output feature value characterizes small faces in the monitoring image.
Anchors are predefined detection frames calibrated on each feature map; in this embodiment, 3 different anchors are obtained by clustering for each output feature scale. A sketch of one detection head follows.
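A minimal sketch of one such head in PyTorch. The 1x1 channel adjustments follow the dimensions stated above, with num_anchors = 3 from the clustering; note that the text gives the landmark channels as 2 x the number of anchors for two eye points, whereas a full (x, y) pair per eye would need 4 x the number of anchors, which is the assumption made here.

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.box = nn.Conv2d(in_channels, 4 * num_anchors, 1)  # (x, y, w, h) per anchor
        self.cls = nn.Conv2d(in_channels, 2 * num_anchors, 1)  # face image / mask per anchor
        self.lmk = nn.Conv2d(in_channels, 4 * num_anchors, 1)  # two eye points, (x, y) each (assumed)

    def forward(self, enhanced_feature):
        return (self.box(enhanced_feature),
                self.cls(enhanced_feature),
                self.lmk(enhanced_feature))
```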
Optionally, the obtaining the detection loss values of the output characteristic value and the labeled characteristic value includes:
the loss function of the detection model is: DLoss (p, q, x, t) ═ BLoss (p, q) + bsloss (x) + lsloss (t).
Wherein DLoss (p, q, x, t) is a detection loss value of the output characteristic value and the labeled characteristic value.
BSLoss(x) is computed by a formula that appears in the source only as an image, where x = x1 - x2, x1 is the adjustment parameter vector of the detection frame in the output feature value, x2 is the labeled detection-frame adjustment parameter vector of the labeled feature value, and BSLoss is the loss value between x1 and x2.
BLoss is computed by a formula that appears in the source only as an image, where BLoss is the class loss value of each detection frame, p is the detection class probability vector in the output feature value, and q is the labeled class probability vector of the labeled feature value.
LSLoss is computed by a formula that appears in the source only as an image, where t = t1 - t2, t1 is the face detection point vector in the output feature value, t2 is the labeled face detection point vector of the labeled feature value, and LSLoss is the loss value between t1 and t2.
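Since the three component formulas survive only as images, the sketch below assumes the conventional choices for this kind of detector: smooth-L1 for the box and landmark residuals and cross-entropy for the classes. It illustrates only the additive composition DLoss = BLoss + BSLoss + LSLoss stated above.

```python
import torch.nn.functional as F

def detection_loss(x1, x2, p_logits, q_labels, t1, t2):
    bs_loss = F.smooth_l1_loss(x1, x2)            # box residual x = x1 - x2 (assumed form)
    b_loss = F.cross_entropy(p_logits, q_labels)  # per-frame class loss (assumed form)
    ls_loss = F.smooth_l1_loss(t1, t2)            # landmark residual t = t1 - t2 (assumed form)
    return b_loss + bs_loss + ls_loss             # DLoss = BLoss + BSLoss + LSLoss
```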
Optionally, the training method of the face recognition model includes:
and obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise basic face images, training positive sample images and training negative sample images.
The basic face image is a standard image to be recognized, the positive sample image is an image of the same face as the basic image, and the negative sample image is an image of a face other than the basic image.
And preprocessing and data enhancement are carried out on the recognition training image to obtain a recognition enhanced image.
And inputting the recognition enhanced image into the recognition backbone network to obtain basic human face image characteristics, training positive sample characteristics and training negative sample characteristics.
And obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features.
And after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector.
And respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector.
And constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain an identification loss value.
And obtaining the maximum iteration number for training the face model, and stopping training when the face recognition loss value is not greater than a threshold or the number of training iterations reaches the maximum iteration number, to obtain a trained face recognition model.
Optionally, the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector, and the normalized training negative sample feature vector to obtain the recognition loss value includes:
and classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector respectively based on the category full-link layer to obtain a category feature vector.
The recognition loss value of the face recognition model is the sum of the face recognition feature vector loss value and the weighted classification feature vector loss value, the classification feature vector loss value is the loss value obtained by the class feature vector and the labeled class feature vector through a cross entropy loss function, and the face recognition feature vector loss value is a triple loss function value among the basic face image feature vector, the training positive sample feature vector and the training negative sample feature vector.
Since the triplet loss function does not converge well, convergence is aided by increasing the class cross entropy.
The triplet loss function is: l ist=(da,p-da,n+α)
Wherein L istIs the value of the triplet loss function, da,pIs a first distance, the first distance being a distance value between the face recognition feature vector and the training positive sample, da,nIs a second distance, the second face recognition feature vector is a distance value from the training negative sample. α is a recognition threshold parameter set according to actual needs.
Generally, α is the minimum required margin between the first distance and the second distance: only when the two distances differ by at least this margin are two faces judged to be the same face, thereby achieving face recognition. The smaller the α value, the more easily the triplet loss function value tends to 0, but similar images of different faces become difficult to distinguish; the larger the α value, the harder it is for the triplet loss function value to tend to 0, which can prevent the network from converging. The value is therefore set according to the faces to be recognized in the face recognition model and database: if the faces in the database differ greatly from one another, α may be set smaller; if they differ only slightly, α should be set larger. For instance, with α = 0.2, an anchor-positive distance of 0.5 and an anchor-negative distance of 0.9 give Lt = 0.5 - 0.9 + 0.2 = -0.2, so that triplet already satisfies the margin.
The first distance (the distance value between the face recognition feature vector and the training positive sample) and the second distance are calculated as follows:
da,p=dist(x,y1)+β*cos(θ1)
da,n=dist(x,y2)+β*cos(θ2)
where β is a weighting factor.
dist(x, y1) = √(Σi (xi - y1i)²)
dist(x, y2) = √(Σi (xi - y2i)²)
cos(θ1) = Σi (xi · y1i) / (√(Σi xi²) · √(Σi y1i²))
cos(θ2) = Σi (xi · y2i) / (√(Σi xi²) · √(Σi y2i²))
wherein xi is the i-th component of the face recognition feature vector, y1i is the i-th component of the positive sample feature vector, and y2i is the i-th component of the negative sample feature vector.
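Read together, the loss computation can be sketched as follows (NumPy); the margin α, the weight factor β, and the cross-entropy weight are illustrative values, not taken from the patent:

```python
import numpy as np

def dist(x, y):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(np.sum((x - y) ** 2))

def cos(x, y):
    """Cosine of the angle between two feature vectors."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

def triplet_loss(anchor, pos, neg, alpha=0.2, beta=0.1):
    """Triplet loss with the cosine-weighted distances defined above."""
    d_ap = dist(anchor, pos) + beta * cos(anchor, pos)
    d_an = dist(anchor, neg) + beta * cos(anchor, neg)
    return d_ap - d_an + alpha

def recognition_loss(anchor, pos, neg, class_logits, label, gamma=0.5):
    """Total recognition loss: triplet term plus weighted classification
    cross-entropy (gamma is an assumed weighting, not from the patent)."""
    probs = np.exp(class_logits - class_logits.max())
    probs /= probs.sum()
    ce = -np.log(probs[label])
    return triplet_loss(anchor, pos, neg) + gamma * ce
```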
Optionally, the obtaining, by using the spatial pyramid pooling network, a fixed-size basic facial image feature matrix, a fixed-size training positive sample feature matrix, and a fixed-size training negative sample feature matrix based on the basic facial image feature, the training positive sample feature, and the training negative sample feature includes:
and dividing the basic facial image features to obtain a first number of basic facial image feature maps, combining all of them together, and pooling them with pooling kernels of the corresponding sizes to obtain the first number of features, thereby obtaining a fixed-size basic facial image feature matrix.
The positive sample feature and the negative sample feature are obtained by the method shown above, and a positive sample feature matrix with a fixed size and a negative sample feature matrix with a fixed size are obtained, which are not described in detail herein.
For example, the 128 feature channels obtained in this embodiment can be divided into two 8 × 8 feature maps, which are then pooled with pooling kernels of the corresponding sizes to obtain 2 × 64 feature values.
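The patent's embodiment divides the channels into two 8 × 8 maps; a generic sketch of spatial pyramid pooling, which likewise maps an arbitrary H × W feature map to a fixed-length output, is given below. The bin sizes (1, 2, 4) are illustrative assumptions, not the patent's exact configuration:

```python
import numpy as np

def spp_max_pool(feature_map: np.ndarray, bins=(1, 2, 4)) -> np.ndarray:
    """Max-pool an H x W feature map into fixed-size grids (1x1, 2x2, 4x4 here),
    so any input resolution yields the same-length feature vector."""
    h, w = feature_map.shape
    out = []
    for b in bins:
        row_groups = np.array_split(np.arange(h), b)
        col_groups = np.array_split(np.arange(w), b)
        for rows in row_groups:
            for cols in col_groups:
                out.append(feature_map[np.ix_(rows, cols)].max())
    return np.array(out)  # length 1 + 4 + 16 = 21 per channel

fm = np.random.randn(8, 8).astype(np.float32)  # e.g. one 8 x 8 map from the embodiment
vec = spp_max_pool(fm)
assert vec.shape == (21,)
```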
Optionally, the performing face reconstruction on the partial region of the face to obtain a partial image of the high-resolution face includes:
and upscaling the partial area of the face by bicubic interpolation to obtain a target image.
A convolutional layer of a neural network extracts a plurality of first high-dimensional feature vectors from the target image through a sliding window at a set step length; the set step length is 2. The dimension of each first high-dimensional feature vector is a first set dimension.
Nonlinear mapping is carried out on each first high-dimensional feature vector to obtain a second high-dimensional feature vector; the dimension of the second high-dimensional feature vector is a second set dimension. The plurality of first high-dimensional feature vectors thus yield a corresponding plurality of second high-dimensional feature vectors.
The plurality of second high-dimensional feature vectors are aggregated to obtain the partial image of the high-resolution face.
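A compact sketch of this reconstruction pipeline in the SRCNN style is given below (PyTorch); the channel counts, kernel sizes, and the transposed-convolution aggregation step are assumptions for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceSR(nn.Module):
    """Sketch of the reconstruction step: bicubic upscaling, patch feature
    extraction with stride 2 (per the description), nonlinear mapping to a
    second dimension, and aggregation into the high-resolution image."""
    def __init__(self, scale: int = 2, d1: int = 64, d2: int = 32):
        super().__init__()
        self.scale = scale
        self.extract = nn.Conv2d(3, d1, kernel_size=9, stride=2, padding=4)
        self.map = nn.Conv2d(d1, d2, kernel_size=1)  # nonlinear mapping
        self.aggregate = nn.ConvTranspose2d(d2, 3, kernel_size=4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = F.interpolate(x, scale_factor=self.scale, mode="bicubic",
                          align_corners=False)        # size amplification
        feats = F.relu(self.extract(x))               # first high-dim vectors
        feats = F.relu(self.map(feats))               # second high-dim vectors
        return self.aggregate(feats)                  # aggregation

sr = FaceSR()
lowres_patch = torch.randn(1, 3, 32, 32)   # partial face region
highres_patch = sr(lowres_patch)           # shape (1, 3, 64, 64)
```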
Based on the above hospital monitoring face recognition method, an embodiment of the invention provides a hospital monitoring face recognition system, which comprises the following modules:
The acquisition module is used for acquiring the face image from the monitoring image. The feature extraction module is used for extracting face features based on the face image. The face detection module is used for carrying out face detection on the image, detecting the face area in the monitored image, and obtaining a face image frame and a mask frame. The face recognition module is used for recognizing the face area; if the recognition succeeds, the identity information of the monitored object stored in the database is obtained. The alarm module sounds a ring to raise an alarm. The specific manner in which the respective modules perform operations has been described in detail in the embodiments related to the method, and will not be elaborated upon here.
An embodiment of the present invention further provides an electronic device, as shown in fig. 3, including a memory 504, a processor 502, and a computer program stored on the memory 504 and executable on the processor 502, where the processor 502 implements the steps of the method executed by any module of the face recognition system when executing the program.
Where in fig. 3 a bus architecture (represented by bus 500) is shown, bus 500 may include any number of interconnected buses and bridges, and bus 500 links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. The bus 500 may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface 505 provides an interface between the bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing the bus 500 and general processing, and the memory 504 may be used for storing data used by the processor 502 in performing operations.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in an apparatus according to an embodiment of the invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.

Claims (8)

1. A training method of a face recognition model is characterized in that the face recognition model comprises a recognition backbone network, a space pyramid pooling network, a recognition full-link layer, a classification full-link layer and a recognition module; the input of the identification backbone network is a partial image on a high-resolution face, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the identification module is the output of the full-connection layer, and the output of the identification module is the identity information of the monitored object; the method comprises the following steps:
obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image;
preprocessing and data enhancement are carried out on the recognition training image to obtain a recognition enhanced image;
inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature;
obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features;
after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector;
respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector;
constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value;
and obtaining the maximum iteration number of the face model training, and stopping the training if the face recognition loss value is not greater than a threshold value or the number of training iterations reaches the maximum iteration number, so as to obtain a trained face recognition model.
2. The method of claim 1, wherein constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector and the normalized training negative sample feature vector to obtain a recognition loss value comprises:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector respectively based on the classification full-link layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the loss value of the face recognition characteristic vector and the loss value of the weighted classification characteristic vector, the loss value of the classification characteristic vector is the loss value of the class characteristic vector and the labeled class characteristic vector obtained through a cross entropy loss function, and the loss value of the face recognition characteristic vector is a triple loss function value among the basic face image characteristic vector, the training positive sample characteristic vector and the training negative sample characteristic vector;
the triplet loss function is: l ist=(da,p-da,n+α)
Wherein L istIs the value of the triplet loss function, da,pIs a first distance, the first distance being a distance value between the face recognition feature vector and the training positive sample, da,nIs a second distance, the second face recognition feature vector is a distance value from the training negative sample; alpha is an identification threshold parameter set according to actual needs;
the first distance (the distance value between the face recognition feature vector and the training positive sample) and the second distance are calculated as follows:
da,p=dist(x,y1)+β*cos(θ1)
da,n=dist(x,y2)+β*cos(θ2)
wherein β is a weight factor;
dist(x, y1) = √(Σi (xi - y1i)²)
dist(x, y2) = √(Σi (xi - y2i)²)
cos(θ1) = Σi (xi · y1i) / (√(Σi xi²) · √(Σi y1i²))
cos(θ2) = Σi (xi · y2i) / (√(Σi xi²) · √(Σi y2i²))
wherein xi is the i-th component of the face recognition feature vector, y1i is the i-th component of the positive sample feature vector, and y2i is the i-th component of the negative sample feature vector.
3. The method of claim 1, wherein obtaining a fixed-size base facial image feature matrix, a training positive sample feature matrix, and a training negative sample feature matrix based on the base facial image features, the training positive sample features, and the training negative sample features through a spatial pyramid pooling network comprises:
dividing the basic facial image features into a first number of basic facial image features, combining all the basic facial image features together, and then pooling the basic facial image features by using pooling cores with corresponding sizes to obtain a first number of features, so as to obtain a basic facial image feature matrix with a fixed size;
dividing the training positive sample features into second number training positive sample features, combining all the training positive sample features together, and then pooling the training positive sample features by using pooling cores with corresponding sizes to obtain second number features, so as to obtain a training positive sample feature matrix with a fixed size;
and dividing the training negative sample features to obtain a third number of training negative sample features, combining all the training negative sample features together, and pooling them with pooling kernels of the corresponding sizes to obtain the third number of features, thereby obtaining a training negative sample feature matrix with a fixed size.
4. A face recognition system, the system comprising:
the acquisition module is used for acquiring a face image of the pedestrian; the face image comprises a face, and the pedestrian wears a mask which covers the mouth of the pedestrian;
the recognition module is used for recognizing partial areas on the face and obtaining partial images on the face based on the partial areas on the face; identifying the identity information of a user based on partial images on the face through a pre-trained face identification model;
the face recognition model comprises a recognition backbone network, a space pyramid pooling network, a recognition full-link layer, a classification full-link layer and a recognition module; the input of the identification backbone network is a partial image on a high-resolution face, the input of the spatial pyramid pooling network is the output of the backbone network, the input of the full-connection layer is the output of the spatial pyramid pooling network, the input of the identification module is the output of the full-connection layer, and the output of the identification module is the identity information of the monitored object.
5. The system of claim 4, wherein the training method of the face recognition model comprises:
obtaining a face recognition training set, wherein the face recognition training set comprises a plurality of recognition training images, and the plurality of recognition training images comprise a basic face image, a training positive sample image and a training negative sample image;
preprocessing and data enhancement are carried out on the recognition training image to obtain a recognition enhanced image;
inputting the recognition enhanced image into the recognition backbone network to obtain a basic facial image feature, a training positive sample feature and a training negative sample feature;
obtaining a basic facial image feature matrix, a training positive sample feature matrix and a training negative sample feature matrix with fixed sizes through a spatial pyramid pooling network based on the basic facial image features, the training positive sample features and the training negative sample features;
after the basic facial image feature matrix, the training positive sample feature matrix and the training negative sample feature matrix respectively pass through the identification full-connection layer, respectively obtaining a basic facial image feature vector, a training positive sample feature vector and a training negative sample feature vector;
respectively carrying out L2 standardization processing on the basic facial image feature vector, the training positive sample feature vector and the training negative sample feature vector to obtain a standardized basic facial image feature vector, a standardized training positive sample feature vector and a standardized training negative sample feature vector;
constructing a classifier based on the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector to obtain a recognition loss value;
and obtaining the maximum iteration number of the face model training, and stopping the training if the face recognition loss value is not greater than a threshold value or the number of training iterations reaches the maximum iteration number, so as to obtain a trained face recognition model.
6. The system of claim 5, wherein the constructing a classifier based on the normalized basic facial image feature vector, the normalized training positive sample feature vector and the normalized training negative sample feature vector to obtain the recognition loss value comprises:
classifying the standardized basic facial image feature vector, the standardized training positive sample feature vector and the standardized training negative sample feature vector respectively based on the classification full-link layer to obtain class feature vectors;
the recognition loss value of the face recognition model is the sum of the loss value of the face recognition characteristic vector and the loss value of the weighted classification characteristic vector, the loss value of the classification characteristic vector is the loss value of the class characteristic vector and the labeled class characteristic vector obtained through a cross entropy loss function, and the loss value of the face recognition characteristic vector is a triple loss function value among the basic face image characteristic vector, the training positive sample characteristic vector and the training negative sample characteristic vector;
the triplet loss function is: l ist=(da,p-da,n+α)
Wherein L istIs the value of the triplet loss function, da,pIs a first distance, the first distance being a distance value between the face recognition feature vector and the training positive sample, da,nIs a second distance, the second face recognition feature vector is a distance value from the training negative sample; alpha is an identification threshold parameter set according to actual needs;
the first distance (the distance value between the face recognition feature vector and the training positive sample) and the second distance are calculated as follows:
da,p=dist(x,y1)+β*cos(θ1)
da,n=dist(x,y2)+β*cos(θ2)
wherein β is a weight factor;
dist(x, y1) = √(Σi (xi - y1i)²)
dist(x, y2) = √(Σi (xi - y2i)²)
cos(θ1) = Σi (xi · y1i) / (√(Σi xi²) · √(Σi y1i²))
cos(θ2) = Σi (xi · y2i) / (√(Σi xi²) · √(Σi y2i²))
wherein xi is the i-th component of the face recognition feature vector, y1i is the i-th component of the positive sample feature vector, and y2i is the i-th component of the negative sample feature vector.
7. The system of claim 5, wherein the obtaining of the fixed-size base facial image feature matrix, the training positive sample feature matrix, and the training negative sample feature matrix based on the base facial image features, the training positive sample features, and the training negative sample features through the spatial pyramid pooling network comprises:
dividing the basic facial image features into a first number of basic facial image features, combining all the basic facial image features together, and then pooling the basic facial image features by using pooling cores with corresponding sizes to obtain a first number of features, so as to obtain a basic facial image feature matrix with a fixed size;
dividing the training positive sample features into second number training positive sample features, combining all the training positive sample features together, and then pooling the training positive sample features by using pooling cores with corresponding sizes to obtain second number features, so as to obtain a training positive sample feature matrix with a fixed size;
and dividing the training negative sample features to obtain a third number of training negative sample features, combining all the training negative sample features together, and pooling them with pooling kernels of the corresponding sizes to obtain the third number of features, thereby obtaining a training negative sample feature matrix with a fixed size.
8. The system of claim 4, wherein the identification module is further configured to: carrying out face reconstruction on a partial region on the face to obtain a partial image on the face with high resolution; and identifying the identity information of the user based on partial images on the high-resolution face through a pre-trained face identification model.
CN202111112053.9A 2021-09-22 2021-09-22 Training method of face model and face recognition system Pending CN113688793A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111112053.9A CN113688793A (en) 2021-09-22 2021-09-22 Training method of face model and face recognition system


Publications (1)

Publication Number Publication Date
CN113688793A true CN113688793A (en) 2021-11-23

Family

ID=78587024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112053.9A Pending CN113688793A (en) 2021-09-22 2021-09-22 Training method of face model and face recognition system

Country Status (1)

Country Link
CN (1) CN113688793A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701507A (en) * 2016-01-13 2016-06-22 吉林大学 Image classification method based on dynamic random pooling convolution neural network
CN108009528A (en) * 2017-12-26 2018-05-08 广州广电运通金融电子股份有限公司 Face authentication method, device, computer equipment and storage medium based on Triplet Loss
CN112200154A (en) * 2020-11-17 2021-01-08 苏州方正璞华信息技术有限公司 Face recognition method and device for mask, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐光柱 et al.: "实用性目标检测与跟踪算法原理及应用" (Practical Object Detection and Tracking: Algorithm Principles and Applications), 30 June 2015 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963237A (en) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 Model training method, mask wearing state detection method, electronic device and storage medium
CN113963237B (en) * 2021-12-22 2022-03-25 北京的卢深视科技有限公司 Model training method, mask wearing state detection method, electronic device and storage medium
CN114463812A (en) * 2022-01-18 2022-05-10 赣南师范大学 Low-resolution face recognition method based on dual-channel multi-branch fusion feature distillation
CN114463812B (en) * 2022-01-18 2024-03-26 赣南师范大学 Low-resolution face recognition method based on double-channel multi-branch fusion feature distillation
CN114937293A (en) * 2022-04-06 2022-08-23 江苏商贸职业学院 Agricultural service management method and system based on GIS


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20211123)