CN111414817B - Face recognition system and face recognition method - Google Patents


Info

Publication number
CN111414817B
CN111414817B (application CN202010146210.7A)
Authority
CN
China
Prior art keywords
network
layer
living body
face recognition
body detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010146210.7A
Other languages
Chinese (zh)
Other versions
CN111414817A (en)
Inventor
杨晶
张一�
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glenfly Tech Co Ltd
Original Assignee
Glenfly Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glenfly Tech Co Ltd filed Critical Glenfly Tech Co Ltd
Priority to CN202010146210.7A priority Critical patent/CN111414817B/en
Publication of CN111414817A publication Critical patent/CN111414817A/en
Application granted granted Critical
Publication of CN111414817B publication Critical patent/CN111414817B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face recognition system and a face recognition method are disclosed. The face recognition system includes a processor having an artificial neural network configured to perform face recognition and living body detection on a face image. The artificial neural network comprises a backbone network, a face recognition sub-network, and a living body detection sub-network, with the backbone network followed by the face recognition sub-network and the living body detection sub-network coupled in parallel. The processor performs the face recognition through the backbone network in combination with the face recognition sub-network, and performs the living body detection through the backbone network in combination with the living body detection sub-network.

Description

Face recognition system and face recognition method
Technical Field
The present disclosure relates to a face recognition system and a face recognition method, and more particularly to a face recognition system and a face recognition method based on an artificial neural network.
Background
Face recognition has many applications, such as customs security, transaction payment, and identity verification, and may be called a popular technique in the computer arts. Artificial intelligence (AI) technology, having matured over the years, is also used to implement face recognition.
How to construct an artificial-intelligence face recognition system with a low computation load is an important subject in this technical field.
Disclosure of Invention
The technical problem to be solved by the present disclosure is to realize a face recognition system and a face recognition method based on an artificial neural network with a low computation load.
A face recognition system implemented according to one embodiment of the present invention includes a processor. The processor has an artificial neural network configured to perform face recognition and living body detection on a face image. The artificial neural network comprises a backbone network, a face recognition sub-network, and a living body detection sub-network, with the backbone network followed by the face recognition sub-network and the living body detection sub-network coupled in parallel. The processor performs the face recognition through the backbone network in combination with the face recognition sub-network, and performs the living body detection through the backbone network in combination with the living body detection sub-network. By sharing the same backbone network, the computation load is significantly reduced, which in turn saves power.
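The shared-backbone topology can be sketched as follows in PyTorch. This is a minimal illustration only: the layer shapes, channel counts, and module names are assumptions, since the patent does not fix a concrete architecture.

```python
import torch
import torch.nn as nn

class SharedBackboneFaceNet(nn.Module):
    """Backbone shared by face recognition and living body detection (sketch)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        # Backbone: extracts fine-grained features once per face image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Face recognition sub-network: maps shared features to a feature vector.
        self.face_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        # Living body detection sub-network: maps shared features to a liveness probability.
        self.live_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, face_image):
        shared = self.backbone(face_image)  # the backbone runs only once
        return self.face_head(shared), self.live_head(shared)
```

Because both heads consume the same `shared` tensor, the backbone's feed-forward cost is paid once per face image rather than once per task.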
In one embodiment, the processor extracts fine-grained features of the face image through the backbone network, where the fine-grained features refer to features of the distinguishing regional blocks in the face image. The processor performs the face recognition according to the fine-grained features through the face recognition sub-network, and performs the living body detection according to the fine-grained features through the living body detection sub-network.
In one embodiment, after training of the first network composed of the backbone network and the face recognition sub-network is completed, the backbone network with determined weights is further combined with the living body detection sub-network whose weights are not yet determined, so as to train the living body detection sub-network.
In one embodiment, the living body detection sub-network includes a living body detection convolution layer and a living body detection classification layer coupled in sequence. After training of the first network composed of the backbone network and the face recognition sub-network is completed, the backbone network with determined weights and the living body detection convolution layer with undetermined weights are combined into a second network, and the living body detection convolution layer is trained so that its weights reach a primary state. The backbone network with determined weights, the living body detection convolution layer with weights in the primary state, and the living body detection classification layer with undetermined weights are then combined into a third network, and weight fine-tuning of the living body detection convolution layer and training of the living body detection classification layer are performed.
In one embodiment, the backbone network comprises an M-layer network, the face recognition sub-network comprises an N1-layer network, the living body detection convolution layer comprises an N2-layer network, and the living body detection classification layer comprises an N3-layer network. The first network includes the first through (M+N1)-th layers, and training based on the first network includes modifying the weights of the (M+N1)-th layer down to the first layer in back propagation. The second network includes the first through (M+N2)-th layers; training based on the second network includes modifying the weights of the (M+N2)-th layer down to the (M+1)-th layer in back propagation, while the weights of the M-th layer down to the first layer remain unchanged. The third network includes the first through (M+N2+N3)-th layers; training based on the third network includes modifying the weights of the (M+N2+N3)-th layer down to the (M+1)-th layer in back propagation, while the weights of the M-th layer down to the first layer remain unchanged.
In one embodiment, the living body detection convolution layer training based on the second network includes supervised training with depth information, while the weight fine-tuning of the living body detection convolution layer and the living body detection classification layer training based on the third network include supervised training with category information.
In one embodiment, the depth information of the prosthesis training data is 0 at all feature points, and the category information of the prosthesis training data is 0. The depth information of the living body training data reflects the depth of each feature point, and the category information of the living body training data is 1.
In one embodiment, the processor receives an image and performs face detection and face alignment processing on the image to generate the face image, wherein the image is captured by a monocular camera.
In one embodiment, the face recognition system further comprises a memory storing a target face image library. The processor performs the living body detection to estimate a living body probability, and performs the face recognition to estimate a feature vector. When the living body probability is greater than a threshold, the processor determines that the face image is a living body, then queries the target face image library according to the feature vector to identify whether a matching target face image exists.
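Under those assumptions, the liveness-gated lookup could look like the sketch below. The threshold values, the cosine-similarity metric, and the gallery format are illustrative choices, not prescribed by the embodiment.

```python
import torch
import torch.nn.functional as F

def recognize(model, face_image, gallery, live_threshold=0.5, match_threshold=0.6):
    """gallery: dict mapping identity name -> stored feature vector (1 x feat_dim)."""
    with torch.no_grad():
        feature, live_prob = model(face_image)   # one shared-backbone pass
    if live_prob.item() <= live_threshold:
        return "prosthesis attack"               # liveness gate failed
    # Living body confirmed: search the target face image library.
    best_name, best_sim = None, -1.0
    for name, stored in gallery.items():
        sim = F.cosine_similarity(feature, stored, dim=-1).item()
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim > match_threshold else "unknown"
```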
Based on the same concept of establishing an artificial neural network for face recognition, one embodiment also discloses a face recognition method. The face recognition method comprises: performing face recognition and living body detection on a face image through an artificial neural network, wherein the artificial neural network comprises a backbone network, a face recognition sub-network, and a living body detection sub-network, with the backbone network followed by the face recognition sub-network and the living body detection sub-network coupled in parallel; performing the face recognition through the backbone network in combination with the face recognition sub-network; and performing the living body detection through the backbone network in combination with the living body detection sub-network.
In the present invention, the same backbone network is shared by the face recognition and the living body detection; no independent backbone networks are needed, so the computation load is significantly reduced, which in turn saves power.
In addition, because the backbone network only needs to be trained once, the training time is shortened and the training efficiency is improved.
The present invention will be described in detail with reference to the accompanying drawings.
Drawings
FIG. 1 illustrates a facial recognition system 100 implemented in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates a training architecture of an artificial neural network of a processor 104, according to one embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a training process of the training architecture of FIG. 2; and
FIG. 4 is a flow chart illustrating the weight update of the ith layer of the artificial neural network.
Detailed Description
The following description exemplifies various embodiments of the invention. It presents basic concepts of the invention and is not intended to limit the scope of the invention. The actual scope of the invention is defined by the claims that follow.
Fig. 1 illustrates a face recognition system 100 implemented in accordance with an embodiment of the present disclosure, including a processor 104. The processor 104 has an artificial neural network configured to perform face recognition and living body detection on a face image 124. The artificial neural network includes a backbone network 108, a face recognition sub-network 110, and a living body detection sub-network 112, with the backbone network 108 followed by the face recognition sub-network 110 and the living body detection sub-network 112 coupled in parallel. The processor 104 performs face recognition through the backbone network 108 in combination with the face recognition sub-network 110, and performs living body detection through the backbone network 108 in combination with the living body detection sub-network 112.
In one embodiment, the face recognition system 100 further includes a memory storing a target face image library 120. The processor 104 performs the living body detection to estimate a living body probability 128, and performs the face recognition to estimate a feature vector 126. When the living body probability 128 is greater than a threshold, the processor 104 determines that the face image 124 is a living body, and then queries the target face image library 120 based on the feature vector 126 to identify whether there is a matching target face image.
Specifically, the hardware design of the processor 104 implements various functional modules, including: the face detection and face alignment module 106, the backbone network 108, the face recognition sub-network 110, the living body (real person) detection sub-network 112, the living body judgment module 114, the portrait library judgment module 116, and the prosthesis attack issuing module 118. The face recognition system 100 also has an in-system memory or cloud memory for storing the target face image library 120.
In one embodiment, the processor 104 receives the image 122 and performs face detection and face alignment processing on it to generate the face image 124, wherein the image 122 is captured by a monocular camera. Specifically, the face recognition system 100 further includes a camera 102, which is a monocular camera. The camera 102 captures an image 122 (e.g., an RGB image) and sends it to the face detection and face alignment module 106. The module 106 performs face detection on the image 122 to generate a face frame (e.g., a rectangular frame containing a face, represented by four coordinates) and facial keypoint coordinates (e.g., eye coordinates, nose coordinates, mouth-corner coordinates), then performs face alignment on the face frame according to the keypoint coordinates to generate a frontal face image 124 (e.g., a face image of 112 x 96 pixels). The face detection and face alignment module 106 then sends the face image 124 to the backbone network 108.
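A hedged sketch of the alignment step follows. The five canonical landmark positions are an assumption modeled on common 112 x 96 templates, and the landmark coordinates are presumed to come from whatever face detector the system uses; the patent specifies neither.

```python
import cv2
import numpy as np

# Canonical five-point landmark template for a 112 x 96 aligned face
# (an assumption; the patent does not list template coordinates).
REFERENCE_5PTS = np.array([
    [30.3, 51.7], [65.5, 51.5],   # left eye, right eye
    [48.0, 71.7],                 # nose tip
    [33.5, 92.4], [62.7, 92.2],   # mouth corners
], dtype=np.float32)

def align_face(image, landmarks_5pts):
    """Warp the image so the detected landmarks match the canonical template."""
    matrix, _ = cv2.estimateAffinePartial2D(
        np.asarray(landmarks_5pts, dtype=np.float32), REFERENCE_5PTS)
    return cv2.warpAffine(image, matrix, (96, 112))  # (width, height) = (96, 112)
```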
The backbone network 108, the face recognition sub-network 110, and the living body detection sub-network 112 are artificial neural network architectures. The backbone network 108, besides forming a face recognition function with the face recognition sub-network 110 to generate the feature vector 126 of the face, also forms a living body detection function (liveness detection) with the living body detection sub-network 112 to estimate a living body probability 128 (e.g., 1% indicates a low living body probability, 99% a high one).
The processor 104 extracts fine-grained features of the face image 124 through the backbone network 108, where the fine-grained features refer to the features of the distinguishing regional blocks (discriminative parts) in the face image 124, i.e., features that can distinguish subtle differences between different people. Upon receiving the face image 124, the processor 104 extracts its fine-grained features through the backbone network 108 and sends them to both the face recognition sub-network 110 and the living body detection sub-network 112. Through the face recognition sub-network 110, the processor 104 generates the feature vector 126 of the face from the fine-grained features; through the living body detection sub-network 112, it estimates the living body probability 128 from the same fine-grained features. The feature vector 126 of the face is the vector representation of the features of the face image 124 output by the face recognition sub-network 110.
The living body judgment module 114 determines, based on the living body probability 128, whether the camera 102 is capturing a real person or a prosthetic attack. In one embodiment, when the living body probability 128 is greater than a threshold (e.g., 0.5, 0.6, or another value), the living body judgment module 114 determines that the camera 102 is capturing a real person; otherwise, it determines a prosthetic attack (e.g., the camera 102 is capturing a photograph, or a head portrait in a video).
If the living body judgment module 114 determines that the face captured by the camera 102 is a real person, the portrait library judgment module 116 compares the feature vector 126 with the feature vectors of the face images listed in the target face image library 120 to judge whether the captured face is the face of a target person. If the living body judgment module 114 determines that the camera 102 captured a prosthesis, the processor 104 issues prosthesis attack information through the prosthesis attack issuing module 118.
From the foregoing, the backbone network 108 used to extract the fine-grained features of the face image 124 is shared by the face recognition function and the living body detection function. Since separate backbone networks are not required for the two functions, the computation load and power consumption of the processor 104 are significantly reduced.
In addition, when the artificial neural network of the processor 104 is trained, the weights of the backbone network 108 only need to be generated by training once, which shortens the time for training the artificial neural network and improves the training efficiency.
In one embodiment, after training of the first network composed of the backbone network 108 and the face recognition sub-network 110 is completed, the backbone network 108 with determined weights and the living body detection sub-network 112 with undetermined weights are combined into another network, and the living body detection sub-network 112 is trained. Because the weights of the backbone network 108 are already determined at this point, this training only needs to determine the weights of the living body detection sub-network 112, improving training efficiency and shortening training time.
In one embodiment, the training of the living body detection sub-network 112 is further subdivided into multiple stages, none of which changes the weights of the backbone network 108, as discussed in the examples below.
For the artificial neural network training of the backbone network 108, the face recognition sub-network 110, and the living body detection sub-network 112, one embodiment prepares multiple training sets: a face recognition training set and a living body detection training set.
This paragraph discusses how the face recognition training set is built. The developer may take face pictures through the camera 102 as a dataset, and may also collect published face datasets as a supplement. The prepared dataset may be converted by the face detection and face alignment module 106 into a plurality of frontal, aligned face images (e.g., each 112 x 96 pixels). After face alignment, the face images are split according to a preset ratio (e.g., 1:9, 2:8, or another ratio): one part forms the face recognition training set required for weight correction, and the other part forms the face recognition verification set used for verification.
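The grouping step could be as simple as the following sketch; the 1:9 ratio and the fixed seed are illustrative.

```python
import random

def split_dataset(face_images, val_ratio=0.1, seed=0):
    """Split aligned face images into a training set and a verification set."""
    items = list(face_images)
    random.Random(seed).shuffle(items)      # deterministic shuffle
    cut = int(len(items) * val_ratio)
    return items[cut:], items[:cut]         # (training set, verification set)
```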
The manner in which the living body detection training set is established is discussed below. The living body detection training set comprises a dataset formed of a plurality of face images and the supervision information corresponding to those face images.
The dataset for living body detection may include live images (live video or photographs of a real person, i.e., living bodies) and various attack images (videos replayed on mobile phone screens, computer/tablet screens, television screens, etc., all belonging to prostheses). The prepared dataset may be converted by the face detection and face alignment module 106 into a plurality of frontal, aligned face images (e.g., each 112 x 96 pixels). After face alignment, the face images are split according to a preset ratio: one part forms the living body detection training set required for weight correction, and the other part forms the living body detection verification set used for verification.
The supervision information may include depth information (e.g., depth data of the facial feature points). The depth information of the prosthesis training data is 0 at all feature points. The living body training data is converted into depth information through three-dimensional modeling (a 3D morphable model). Because the three-dimensional modeling only assists training, the camera 102 in the finished product need merely be a monocular camera, which significantly reduces cost; the trained backbone network 108 and living body detection sub-network 112 can distinguish a prosthesis from a living body using images captured by the monocular camera.
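Label assembly might then look like the sketch below, where `fit_3dmm_depth()` is a hypothetical placeholder for any 3D-morphable-model fitting routine; the depth-map resolution is likewise an assumption.

```python
import numpy as np

def make_depth_label(face_image, is_live, depth_size=(32, 32)):
    """Depth supervision: all-zero map for a prosthesis, 3DMM depth for a living body."""
    if not is_live:
        # Prosthesis (photo or screen replay) is flat: depth is 0 at every point.
        return np.zeros(depth_size, dtype=np.float32)
    # Living body: per-point depth rendered from a fitted 3D morphable model.
    # fit_3dmm_depth() is a hypothetical placeholder, not a real library call.
    return fit_3dmm_depth(face_image, depth_size)
```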
The supervision information may also be category information. The category information of the prosthesis training data is 0, and the category information of the living body training data is 1. In another embodiment, the category information of the prosthesis training data is 1 and that of the living body training data is 0; the present invention is not limited in this respect.
In one embodiment, the depth information of the prosthesis training data is 0 at all feature points, and the category information of the prosthesis training data is 0. The depth information of the living body training data reflects the depth of each feature point, and the category information of the living body training data is 1.
In one embodiment, the living body detection sub-network 112 includes a living body detection convolution layer and a living body detection classification layer coupled in sequence. After training of the first network composed of the backbone network 108 and the face recognition sub-network 110 is completed, the backbone network 108 with determined weights and the living body detection convolution layer with undetermined weights are combined into a second network, and the living body detection convolution layer is trained so that its weights reach a primary state. The backbone network 108 with determined weights, the living body detection convolution layer with weights in the primary state, and the living body detection classification layer with undetermined weights are combined into a third network, and weight fine-tuning of the living body detection convolution layer and training of the living body detection classification layer are performed. The living body detection convolution layer training based on the second network includes supervised training with depth information. The weight fine-tuning of the living body detection convolution layer and the living body detection classification layer training based on the third network include supervised training with category information. Details are provided below in connection with figures 2 and 3.
Fig. 2 illustrates a training architecture of the artificial neural network of the processor 104, according to one embodiment of the present disclosure. The face recognition sub-network 110 comprises a (face recognition) convolution layer 110_1 and a (face recognition) classification layer 110_2, for which a loss function layer Loss_FR is correspondingly designed during training. The living body detection sub-network 112 comprises a (living body detection) convolution layer 112_1 and a (living body detection) classification layer 112_2, for which two loss function layers, Loss_Depth and Loss_Spoof, are correspondingly designed during training. A loss function layer compares the feed-forward result of the training data with the label of the training data (which carries the supervision information) to obtain an error z, which is used in back propagation to modify the weights of each network.
Fig. 3 is a flow chart illustrating a training process of the training architecture of Fig. 2. The process is divided into three stages.
Stage S302 trains the backbone network 108 and the face recognition sub-network 110 (including the convolution layer 110_1 and the classification layer 110_2) using the face recognition training set until the network converges. Specifically, stage S302 runs multiple rounds of training on the backbone network 108 and the face recognition sub-network 110 with the face recognition training set. In each round, the error z computed by the loss function layer Loss_FR is back-propagated to modify the weights of the backbone network 108 and of the face recognition sub-network 110 (including the convolution layer 110_1 and the classification layer 110_2). As the number of training rounds increases, the error z computed by Loss_FR decreases. When additional rounds no longer reduce the error z, the network is considered to have converged; training then stops, yielding the weights of the backbone network 108 and of the face recognition sub-network 110 (including the convolution layer 110_1 and the classification layer 110_2).
Stage S304 uses the living body detection training set, specifically selecting the depth information as supervision, and trains the convolution layer 112_1 of the living body detection sub-network 112 via the backbone network 108 and the loss function layer Loss_Depth until the network converges. Specifically, stage S304 runs multiple rounds of depth-supervised training on the convolution layer 112_1. Throughout this process, the weights of the backbone network 108 are fixed (i.e., the weights obtained at stage S302 are used). In each round, the error z computed by Loss_Depth is back-propagated to modify the weights of the convolution layer 112_1 of the living body detection sub-network 112. As the number of training rounds increases, the error z computed by Loss_Depth decreases. When additional rounds no longer reduce the error z, the network is considered to have converged; training then stops, yielding the weights of the convolution layer 112_1 of the living body detection sub-network 112.
Stage S306 uses the living body detection training set, specifically selecting the category information as supervision, and trains the convolution layer 112_1 and the classification layer 112_2 of the living body detection sub-network 112 via the backbone network 108 and the loss function layer Loss_Spoof until the network converges. Specifically, stage S306 runs multiple rounds of category-supervised training on the convolution layer 112_1 and the classification layer 112_2. Throughout this process, the weights of the backbone network 108 are fixed (i.e., the weights obtained at stage S302 are used), and the weights of the convolution layer 112_1 obtained at stage S304 serve as its initial weights. In each round, the error z computed by Loss_Spoof is back-propagated to modify the weights of the convolution layer 112_1 and the classification layer 112_2. A lower learning rate than that of stage S304 (e.g., much less than the learning rate set at stage S304) may be employed during stage S306, so as to fine-tune the weights of the convolution layer 112_1 obtained at stage S304. As the number of training rounds increases, the error z computed by Loss_Spoof decreases. When additional rounds no longer reduce the error z, the network is considered to have converged; training then stops, yielding the (fine-tuned) weights of the convolution layer 112_1 and the weights of the classification layer 112_2 of the living body detection sub-network 112.
The training of the living body detection sub-network 112 in stages S304 and S306 builds on the backbone network 108 trained at stage S302 (i.e., the weights of the backbone network 108 are fixed). At that point, the backbone network 108 can already extract fine-grained features accurately. Because the weights of the backbone network 108 need not be regenerated, the training of the living body detection sub-network 112 in stages S304 and S306 converges easily and quickly.
In summary, the artificial neural network training of the processor 104 proceeds in three stages. In the first stage, the backbone network 108 and the face recognition sub-network 110 (including the convolution layer 110_1 and the classification layer 110_2) form a first network for training. In the second stage, the backbone network 108 with determined weights and the (living body detection) convolution layer 112_1 with undetermined weights are combined into a second network, and the convolution layer 112_1 is supervised-trained with depth information so that its weights reach a primary state. In the third stage, the backbone network 108 with determined weights, the convolution layer 112_1 with weights in the primary state, and the (living body detection) classification layer 112_2 with undetermined weights are combined into a third network, and the convolution layer 112_1 and the classification layer 112_2 are supervised-trained with category information, to fine-tune the weights of the convolution layer 112_1 and obtain the weights of the classification layer 112_2. This training design converges rapidly, and the trained artificial neural network operates accurately.
In one embodiment, the backbone network 108 comprises an M-layer network, the face recognition sub-network 110 comprises an N1-layer network, the (living body detection) convolution layer 112_1 comprises an N2-layer network, and the (living body detection) classification layer 112_2 comprises an N3-layer network. The first network (108 and 110) includes the first through (M+N1)-th layers, and training based on the first network includes modifying the weights of the (M+N1)-th layer down to the first layer in back propagation. The second network (108 and 112_1) includes the first through (M+N2)-th layers; training based on the second network includes modifying the weights of the (M+N2)-th layer down to the (M+1)-th layer in back propagation, while the weights of the M-th layer down to the first layer remain unchanged. The third network (108, 112_1, and 112_2) includes the first through (M+N2+N3)-th layers; training based on the third network includes modifying the weights of the (M+N2+N3)-th layer down to the (M+1)-th layer in back propagation, while the weights of the M-th layer down to the first layer remain unchanged.
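The staged schedule with a frozen backbone can be sketched as follows (PyTorch-style; the optimizer choice, learning rates, and the `live_conv`/`live_cls` module names are illustrative assumptions).

```python
import torch

def train_stage(params, loss_fn, loader, lr, epochs=1):
    """One training stage: only the parameters in `params` are updated."""
    optimizer = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(inputs, labels)  # feed-forward result vs. label -> error z
            loss.backward()                 # back propagation
            optimizer.step()                # modifies only `params`

# Stage 1: layers 1..(M+N1) -- backbone plus face recognition sub-network.
# train_stage(list(model.backbone.parameters()) + list(model.face_head.parameters()),
#             loss_fr, fr_loader, lr=1e-2)
#
# Freeze layers 1..M so the backbone weights stay fixed afterwards:
# for p in model.backbone.parameters():
#     p.requires_grad = False
#
# Stage 2: layers (M+1)..(M+N2), depth-supervised (Loss_Depth):
# train_stage(model.live_conv.parameters(), loss_depth, live_loader, lr=1e-2)
#
# Stage 3: layers (M+1)..(M+N2+N3), category-supervised (Loss_Spoof),
# at a much lower learning rate to fine-tune the convolution layer:
# train_stage(list(model.live_conv.parameters()) + list(model.live_cls.parameters()),
#             loss_spoof, live_loader, lr=1e-4)
```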
The feed-forward operation and back propagation involved in training are discussed below. For one item of training data, the feed-forward operation passes an input x_i forward through each layer of the network, where x_i denotes the input of the i-th layer and equally the output of the (i-1)-th layer; the input of layer 1 is the data in the training set. The loss function then compares the feed-forward result with the label y (i.e., the supervision information, such as the aforementioned depth information and category information) to obtain the final error z. Next, the error z is back-propagated, updating the weight w_i of each layer from back to front, where w_i denotes the weight of the i-th layer.
FIG. 4 is a flow chart illustrating the weight update of the i-th layer of the artificial neural network. Step S402 calculates the first derivative of the i-th layer, i.e., the derivative of the final error z with respect to the weight w_i, which is computed from the second derivative of the (i+1)-th layer (the derivative of z with respect to the input x_{i+1}). Step S404 calculates the second derivative of the i-th layer, i.e., the derivative of z with respect to the input x_i; this calculation also uses the second derivative of the (i+1)-th layer. Step S406 updates the weight w_i of the i-th layer according to the first derivative of the i-th layer.
In one embodiment, the operations involved in FIG. 4 are as follows.

First derivative:

$$\frac{\partial z}{\partial \mathrm{vec}(w_i)^T} = \frac{\partial z}{\partial \mathrm{vec}(x_{i+1})^T} \cdot \frac{\partial\, \mathrm{vec}(x_{i+1})}{\partial\, \mathrm{vec}(w_i)^T}$$

Second derivative:

$$\frac{\partial z}{\partial \mathrm{vec}(x_i)^T} = \frac{\partial z}{\partial \mathrm{vec}(x_{i+1})^T} \cdot \frac{\partial\, \mathrm{vec}(x_{i+1})}{\partial\, \mathrm{vec}(x_i)^T}$$

Weight update:

$$\mathrm{vec}(w_i) \leftarrow \mathrm{vec}(w_i) - \eta \cdot \frac{\partial z}{\partial \mathrm{vec}(w_i)}$$

where x_i is the input of the i-th layer; x_{i+1} is the input of the (i+1)-th layer; w_i is the weight of the i-th layer; z is the final error; vec() converts the tensor in brackets into a vector; T denotes the transpose operation; η is the step size of each stochastic gradient descent and generally decreases as the number of training rounds increases; and ← denotes assignment.
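For a plain fully connected layer x_{i+1} = w_i x_i, the three operations above reduce to the NumPy sketch below; the fully connected layer form is an illustrative assumption, since the text states the rule generically.

```python
import numpy as np

def update_layer(w_i, x_i, grad_z_x_next, eta):
    """Given dz/dx_{i+1}, update w_i in place and return dz/dx_i for layer i-1."""
    grad_w_i = np.outer(grad_z_x_next, x_i)  # first derivative: dz/dw_i (step S402)
    grad_x_i = w_i.T @ grad_z_x_next         # second derivative: dz/dx_i (step S404)
    w_i -= eta * grad_w_i                    # weight update (step S406)
    return grad_x_i
```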
Taking the artificial neural network training performed for face recognition at stage S302 as an example, every layer (i = (M+N1) … 1) of the first network (108 and 110) needs to perform the weight update of FIG. 4.

Taking the depth-information-supervised training performed for living body detection at stage S304 as an example, only the back-end layers (i = (M+N2) … (M+1)) of the second network (108 and 112_1) need to perform the weight update of FIG. 4.

Taking the category-information-supervised training performed for living body detection at stage S306 as an example, only the back-end layers (i = (M+N2+N3) … (M+1)) of the third network (108, 112_1, and 112_2) need to perform the weight update of FIG. 4.
In one embodiment, training runs not just a single round over the training set but multiple rounds, until the network parameters converge (e.g., the final error z is less than a threshold).
It should be noted that, according to actual needs, those skilled in the art may set the specific numbers of convolution layers and/or classification layers in the backbone network 108, the face recognition sub-network 110, and the living body detection sub-network 112, as well as the number of convolution kernels in each convolution layer, the kernel sizes, and the convolution strides. The invention is not limited in this regard.
Although the invention has been described with respect to the preferred embodiments, it will be understood by those skilled in the art that various changes, modifications and alterations may be made without departing from the spirit and scope of the invention, and it is intended that the invention be limited only by the terms of the appended claims.
[Description of symbols]
100: face recognition system;
102: camera;
104: processor;
106: face detection and face alignment module;
108: backbone network;
110: face recognition sub-network;
112: living body detection sub-network;
114: living body judgment module;
116: portrait library judgment module;
118: prosthesis attack issuing module;
120: target face image library;
122: image;
124: face image;
126: feature vector;
128: living body probability;
110_1: (face recognition) convolution layer;
110_2: (face recognition) classification layer;
112_1: (living body detection) convolution layer;
112_2: (living body detection) classification layer;
Loss_FR, Loss_Depth, Loss_Spoof: loss function layers;
S302 … S306: training process; and
S402 … S406: weight update of the i-th layer of the artificial neural network.

Claims (16)

1. A facial recognition system, comprising:
a processor having an artificial neural network configured to perform face recognition and living body detection on a face image,
wherein:
the artificial neural network comprises a backbone network, a face recognition sub-network, and a living body detection sub-network, wherein the backbone network is followed by the face recognition sub-network and the living body detection sub-network coupled in parallel;
the processor performs the face recognition by combining the main network with the face recognition sub-network; and is also provided with
the processor performs the living body detection through the backbone network in combination with the living body detection sub-network,
characterized in that:
the living body detection sub-network comprises a living body detection convolution layer and a living body detection classification layer which are sequentially coupled;
the face recognition sub-network and the living body detection sub-network each further comprise a loss function layer;
after the weights of the backbone network are determined by training the first network formed by the backbone network and the face recognition sub-network, through back propagation according to the error calculated by the loss function layer, the backbone network with determined weights and the living body detection convolution layer with undetermined weights are combined into a second network, and the living body detection convolution layer is trained through back propagation according to the error calculated by the loss function layer, so that the weights of the living body detection convolution layer reach a primary state;
the backbone network with determined weights, the living body detection convolution layer with weights in the primary state, and the living body detection classification layer with undetermined weights are combined into a third network, and the weight fine-tuning of the living body detection convolution layer and the training of the living body detection classification layer are performed through back propagation according to the error calculated by the loss function layer,
wherein the loss function layers comprise a loss function layer Loss_FR for the face recognition of the first network, a loss function layer Loss_Depth for the living body detection of the second network, and a loss function layer Loss_Spoof for the living body detection of the third network.
2. The facial recognition system of claim 1, wherein:
the processor extracts fine-grained features of the face image through the backbone network, wherein the fine-grained features refer to features of distinguishing regional blocks in the face image.
3. The facial recognition system of claim 2, wherein:
the processor performs the face recognition according to the fine-grained features through the face recognition sub-network, and performs the living body detection according to the fine-grained features through the living body detection sub-network.
4. The facial recognition system of claim 1, wherein:
the backbone network comprises an M-layer network;
the face recognition sub-network comprises an N1 layer network;
the living body detection convolution layer comprises an N2 layer network;
the living body detection classification layer comprises an N3 layer network;
the first network includes the first through (M+N1)-th layers, and training based on the first network includes modifying the weights of the (M+N1)-th layer down to the first layer in back propagation;
the second network includes the first through (M+N2)-th layers; training based on the second network includes modifying the weights of the (M+N2)-th layer down to the (M+1)-th layer in back propagation, and the weights of the M-th layer down to the first layer remain unchanged in the back propagation; and
the third network includes the first through (M+N2+N3)-th layers; training based on the third network includes modifying the weights of the (M+N2+N3)-th layer down to the (M+1)-th layer in back propagation, and the weights of the M-th layer down to the first layer remain unchanged in the back propagation.
5. The facial recognition system of claim 1, wherein:
the living body detection convolution layer training based on the second network comprises supervised training with depth information; and
the weight fine-tuning of the living body detection convolution layer and the living body detection classification layer training based on the third network comprise supervised training with category information.
6. The facial recognition system of claim 5, wherein:
the depth information of the prosthesis training data is 0 at all feature points, and the category information of the prosthesis training data is 0; and
The depth information of the living body training data reflects the depth of each feature point, and the category information of the living body training data is 1.
7. The facial recognition system of claim 1, wherein:
the processor receives an image and performs face detection and face alignment processing on the image to generate the face image, wherein the image is captured by a monocular camera.
8. The facial recognition system of claim 1, further comprising:
a memory for storing a target face image library,
wherein:
the processor performs the living body detection to estimate a living body probability, and performs the face recognition to estimate a feature vector; and
when the living body probability is greater than a threshold, the processor determines that the face image is a living body, then queries the target face image library according to the feature vector, and identifies whether a matching target face image exists.
9. A face recognition method, comprising:
performing face recognition and living body detection on a face image through an artificial neural network, wherein the artificial neural network comprises a backbone network, a face recognition sub-network, and a living body detection sub-network, and the backbone network is followed by the face recognition sub-network and the living body detection sub-network coupled in parallel;
the face recognition is carried out by combining the main network with the face recognition sub-network; and is also provided with
The above-mentioned in-vivo detection is performed in combination with the in-vivo detection sub-network through the backbone network,
the method is characterized in that:
the living body detection sub-network comprises a living body detection convolution layer and a living body detection classification layer which are sequentially coupled;
the face recognition sub-network and the living body detection sub-network each further comprise a loss function layer;
after the weights of the backbone network are determined by training the first network formed by the backbone network and the face recognition sub-network, through back propagation according to the error calculated by the loss function layer, the backbone network with determined weights and the living body detection convolution layer with undetermined weights are combined into a second network, and the living body detection convolution layer is trained through back propagation according to the error calculated by the loss function layer, so that the weights of the living body detection convolution layer reach a primary state;
the backbone network with determined weights, the living body detection convolution layer with weights in the primary state, and the living body detection classification layer with undetermined weights are combined into a third network, and the weight fine-tuning of the living body detection convolution layer and the training of the living body detection classification layer are performed through back propagation according to the error calculated by the loss function layer,
wherein the loss function layers comprise a loss function layer Loss_FR for the face recognition of the first network, a loss function layer Loss_Depth for the living body detection of the second network, and a loss function layer Loss_Spoof for the living body detection of the third network.
10. The face recognition method of claim 9, further comprising:
extracting fine-grained features of the face image through the backbone network, wherein the fine-grained features refer to features of distinguishing regional blocks in the face image.
11. The face recognition method of claim 10, further comprising:
performing the face recognition according to the fine-grained features through the face recognition sub-network; and
performing the living body detection according to the fine-grained features through the living body detection sub-network.
12. The face recognition method of claim 9, wherein:
the backbone network comprises an M-layer network;
the face recognition sub-network comprises an N1 layer network;
the living body detection convolution layer comprises an N2 layer network;
the living body detection classification layer comprises an N3 layer network;
the first network includes the first through (M+N1)-th layers, and training based on the first network includes modifying the weights of the (M+N1)-th layer down to the first layer in back propagation;
the second network includes the first through (M+N2)-th layers; training based on the second network includes modifying the weights of the (M+N2)-th layer down to the (M+1)-th layer in back propagation, and the weights of the M-th layer down to the first layer remain unchanged in the back propagation; and
the third network includes the first through (M+N2+N3)-th layers; training based on the third network includes modifying the weights of the (M+N2+N3)-th layer down to the (M+1)-th layer in back propagation, and the weights of the M-th layer down to the first layer remain unchanged in the back propagation.
13. The face recognition method of claim 9, wherein:
the living body detection convolution layer training based on the second network comprises supervised training with depth information; and
the weight fine-tuning of the living body detection convolution layer and the living body detection classification layer training based on the third network comprise supervised training with category information.
14. The face recognition method of claim 13, wherein:
the depth information of the prosthesis training data is 0 at all feature points, and the category information of the prosthesis training data is 0; and
The depth information of the living body training data reflects the depth of each feature point, and the category information of the living body training data is 1.
15. The face recognition method of claim 9, further comprising:
receiving an image; and
performing face detection and face alignment processing on the image to generate the face image;
wherein the image is captured by a monocular camera.
16. The face recognition method of claim 9, further comprising:
storing a target face image library in a memory;
performing the living body detection to estimate a living body probability, and performing the face recognition to estimate a feature vector; and
when the living body probability is greater than a threshold, determining that the face image is a living body, querying the target face image library according to the feature vector, and identifying whether a matching target face image exists.
CN202010146210.7A 2020-03-05 2020-03-05 Face recognition system and face recognition method Active CN111414817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010146210.7A CN111414817B (en) 2020-03-05 2020-03-05 Face recognition system and face recognition method


Publications (2)

Publication Number Publication Date
CN111414817A CN111414817A (en) 2020-07-14
CN111414817B (en) 2024-03-29

Family

ID=71492881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010146210.7A Active CN111414817B (en) 2020-03-05 2020-03-05 Face recognition system and face recognition method

Country Status (1)

Country Link
CN (1) CN111414817B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739300B (en) * 2020-07-21 2020-12-11 成都恒创新星科技有限公司 Training method of intelligent parking deep learning network based on FPGA
CN113239907B (en) * 2021-07-12 2021-12-14 北京远鉴信息技术有限公司 Face recognition detection method and device, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770613A (en) * 2010-01-19 2010-07-07 北京智慧眼科技发展有限公司 Social insurance identity authentication method based on face recognition and living body detection
CN107423690A (en) * 2017-06-26 2017-12-01 广东工业大学 A kind of face identification method and device
CN107590463A (en) * 2017-09-12 2018-01-16 广东欧珀移动通信有限公司 Face identification method and Related product
CN109670452A (en) * 2018-12-20 2019-04-23 北京旷视科技有限公司 Method for detecting human face, device, electronic equipment and Face datection model
CN110516576A (en) * 2019-08-20 2019-11-29 西安电子科技大学 Near-infrared living body faces recognition methods based on deep neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xiaowen Ying et al. LiveFace: Multi-Task CNN for Fast Face-Authentication. IEEE, 2019, Section 3. *

Also Published As

Publication number Publication date
CN111414817A (en) 2020-07-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210318

Address after: 201203 3rd floor, building 2, No. 200, zhangheng Road, Pudong New Area pilot Free Trade Zone, Shanghai

Applicant after: Gryfield Intelligent Technology Co.,Ltd.

Address before: Room 301, 2537 Jinke Road, Zhangjiang High Tech Park, Pudong New Area, Shanghai 201203

Applicant before: VIA ALLIANCE SEMICONDUCTOR Co.,Ltd.

GR01 Patent grant