CN116206355A - Face recognition model training, image registration and face recognition method and device - Google Patents


Info

Publication number
CN116206355A
Authority
CN
China
Prior art keywords
face
training
feature
features
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310450980.4A
Other languages
Chinese (zh)
Inventor
刘子鑫
蒋冬梅
王耀威
池虹雨
田永鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peng Cheng Laboratory filed Critical Peng Cheng Laboratory
Priority to CN202310450980.4A
Publication of CN116206355A
Legal status: Pending

Classifications

    • G06V40/168: Feature extraction; Face representation (under G06V40/16 Human faces, e.g. facial parts, sketches or expressions)
    • G06N3/08: Learning methods (under G06N3/02 Neural networks)
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • G06V40/161: Detection; Localisation; Normalisation (under G06V40/16 Human faces)
    • G06V40/172: Classification, e.g. identification (under G06V40/16 Human faces)
    • Y02T10/40: Engine management systems (Y-section cross-sectional tagging)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a face recognition model training, image registration and face recognition method and device. The face recognition model comprises a first feature extraction network and a second feature extraction network. Model training is performed on first training samples to obtain the first feature extraction network, where the number of sample face images contained in each first training sample is larger than a first preset threshold. The first feature extraction network is used to obtain first initial face features of the first training samples and second initial face features of second training samples, where the number of sample face images contained in each second training sample is smaller than the first preset threshold. Feature fusion is performed on the second initial face features based on the first initial face features to obtain fused face features of the second training samples, and feature triplets are constructed from the first initial face features and the fused face features. Model training is then performed on the feature triplets to obtain the second feature extraction network, realizing high-precision face recognition even when training data is scarce or unevenly distributed.

Description

Face recognition model training, image registration and face recognition method and device
Technical Field
The invention relates to the technical field of computers, in particular to a face recognition model training, image registration and face recognition method and device.
Background
Face recognition has been an active research topic in recent years in fields such as pattern recognition, image processing, machine vision, neural networks, and cognitive science, and is widely used in public security, security verification systems, information card verification, medicine, archive management, video conferencing, human-machine interaction systems, and other fields.
With the development of artificial intelligence in recent years, face recognition based on deep learning has become the mainstream approach. However, existing deep-learning-based face recognition depends heavily on the state of its training samples, and recognition accuracy drops sharply when training samples are scarce or their class distribution is unbalanced.
Accordingly, how to achieve high-precision face recognition when training samples are scarce or their class distribution is unbalanced is a technical problem to be solved.
Disclosure of Invention
The invention mainly aims to provide a face recognition model training, image registration and face recognition method and device, and aims to solve the problem of low face recognition accuracy under the condition of lack of training samples or unbalanced distribution of training sample categories.
In order to achieve the above object, the present invention provides a face recognition model training method, where the face recognition model includes a first feature extraction network and a second feature extraction network, and the face recognition model training method includes:
training a first preset neural network through a plurality of first training samples to obtain a first feature extraction network; the number of the images of the sample face images contained in each first training sample is larger than a first preset threshold value;
extracting features by using the first feature extraction network to obtain first initial face features of each first training sample and second initial face features of each second training sample; the number of the images of the sample face images contained in each second training sample is smaller than the first preset threshold value;
based on the first initial face features, carrying out feature fusion on the second initial face features to obtain fused face features of each second training sample;
constructing a plurality of feature triples according to the first initial face features of the first training samples and the fused face features of the second training samples;
Training a second preset neural network through the feature triplets to obtain the second feature extraction network.
Optionally, before training the first preset neural network through the plurality of first training samples, the method further comprises:
acquiring a training sample set comprising a plurality of training samples; each training sample comprises at least one sample face image, and different training samples correspond to different individuals;
taking the training samples with the image quantity larger than the first preset threshold value as the first training samples;
and taking the training samples with the image quantity smaller than the first preset threshold value as the second training samples.
Optionally, the acquiring a training sample set including a plurality of training samples specifically includes:
acquiring a plurality of face images containing faces;
horizontally flipping each face image to obtain a horizontally flipped face image;
screening the face images before horizontal flipping based on the face images after horizontal flipping, to determine sample face images;
and determining the individual corresponding to each sample face image, and taking the sample face images of the same individual as one training sample.
Optionally, the screening of the face images before horizontal flipping based on the face images after horizontal flipping to determine sample face images specifically includes:
respectively performing key point detection on the face image before horizontal flipping and the face image after horizontal flipping to obtain corresponding key point coordinates;
calculating the distance between each key point coordinate in the face image before horizontal flipping and the corresponding key point coordinate in the face image after horizontal flipping;
taking the sum of the distances corresponding to each face image as an image difference value;
deleting the face images whose image difference value is larger than a second preset threshold;
and taking the face images whose image difference value is smaller than or equal to the second preset threshold as sample face images.
Optionally, the feature fusion is performed on the second initial face features based on the first initial face features to obtain fused face features of each second training sample, which specifically includes:
calculating the average face characteristic of each first training sample according to the first initial face characteristic;
constructing a corresponding feature matrix based on the average face features of each first training sample;
And carrying out feature fusion on the second initial face features according to the feature matrix and the first initial face features to obtain fused face features of each second training sample.
Optionally, the feature fusion is performed on the second initial face features according to the feature matrix and the first initial face features to obtain fused face features of each second training sample, which specifically includes:
selecting a first training sample as a training sample to be fused of the second training samples aiming at each second training sample;
selecting one first initial face feature from the training samples to be fused corresponding to the second training samples as the face feature to be fused of each second initial face feature in the second training samples;
determining the fusion face characteristics of each second initial face characteristic in the second training sample according to the average face characteristics of the second training sample, the average face characteristics of the training samples to be fused corresponding to the second training sample and the face characteristics to be fused;
The average face characteristic of the second training sample is an average value of the second initial face characteristic of the second training sample; the average face characteristics of the training samples to be fused are the average value of the face characteristics to be fused of the training samples to be fused.
Optionally, the constructing a corresponding feature matrix based on the average face feature of each first training sample specifically includes:
constructing a corresponding covariance matrix according to the average face characteristics of each first training sample;
and performing dimension reduction on the covariance matrix through a preset dimension reduction algorithm to obtain the feature matrix.
Optionally, the constructing a plurality of feature triples according to the first initial face feature of each first training sample and the fused face feature of each second training sample specifically includes:
taking all the first initial face features of the first training samples and all the fused face features of the second training samples as target training features;
sequentially and arbitrarily selecting one target training feature as an anchor point, selecting one other target training feature of the same individual as the anchor point as a positive example of the anchor point, and selecting one other target training feature of a different individual from the anchor point as a negative example of the anchor point;
The absolute value of the difference between the anchor-to-positive distance and the anchor-to-negative distance is smaller than a third preset threshold;
and taking the anchor point, the positive instance of the anchor point and the negative instance of the anchor point as the characteristic triples.
Optionally, training a second preset neural network through the feature triplets to obtain the second feature extraction network, which specifically includes:
and taking the triplet loss function as the target loss function of the second preset neural network, and inputting the feature triplets into the second preset neural network for training, so as to obtain the trained second feature extraction network.
In order to achieve the above object, the present invention further provides a face image registration method, including:
acquiring a face image to be registered of a target user;
extracting features of the face image to be registered through a first feature extraction network of a face recognition model to obtain initial face features of the face image to be registered;
extracting features of the initial face features of the face image to be registered through a second feature extraction network of the face recognition model to obtain target face features of the target user;
Taking the target face characteristics of the target user as a face characteristic template of the target user;
the face recognition model is obtained through the face recognition model training method according to any one of the above.
In order to achieve the above object, the present invention further provides a face recognition method, including:
acquiring a face image to be recognized of a target user;
extracting features of the face image to be recognized through a first feature extraction network of a face recognition model to obtain initial face features of the face image to be recognized;
extracting features of the initial face features of the face image to be recognized through a second feature extraction network of the face recognition model to obtain target face features of the face image to be recognized;
comparing the target face features of the face image to be identified with a pre-stored face feature template to determine whether the matching is successful or not;
the face recognition model is obtained through the face recognition model training method according to any one of the above.
In order to achieve the above object, the present invention also provides a computer-readable storage medium storing one or more programs executable by one or more processors to implement steps in the face recognition model training method as set forth in any one of the above, or steps in the face image registration method as set forth above, or steps in the face recognition method as set forth above.
In order to achieve the above object, the present invention also provides a terminal, including: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, implements the steps in the face recognition model training method as described in any one of the above, or the steps in the face image registration method as described above, or the steps in the face recognition method as described above.
According to the invention, the first feature extraction network obtained by training on the first training samples can extract richer face feature information from face images. Transfer learning for the second training samples is then realized through the first training samples, which enriches the semantic information contained in the small-sample features (namely, the face features of the second training samples) and reduces the bias of the face recognition model caused by unbalanced training data. The second feature extraction network is then obtained from the second training samples after transfer learning, so that the resulting face recognition model containing the first feature extraction network and the second feature extraction network can realize high-precision and high-efficiency face recognition even when training data is scarce or unevenly distributed.
Drawings
FIG. 1 is a schematic diagram of an implementation scenario provided in an embodiment of the present invention;
fig. 2 is a flowchart of a face recognition model training method according to an embodiment of the present invention;
fig. 3 is a flowchart of step S201 provided in an embodiment of the present invention;
fig. 4 is a schematic flow chart of a face recognition model training method according to an embodiment of the present invention;
fig. 5 is a flowchart of step S205 provided in an embodiment of the present invention;
fig. 6 is a flowchart of step S503 provided in the embodiment of the present invention;
fig. 7 is a flowchart of step S206 provided in an embodiment of the present invention;
fig. 8 is a flowchart of a face image registration method according to an embodiment of the present invention;
fig. 9 is a flowchart of a face recognition method according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
To ensure the accuracy of deep-learning-based face recognition, a large amount of training sample data is required, and the class distribution of that data needs to be balanced.
Ensuring both the quantity and the class distribution of training sample data consumes substantial manpower and material resources. Most of the presently disclosed face datasets (such as WiderFace, MegaFace, LFW, and MS1MV2) have a serious class-imbalance problem, i.e., a serious long-tail effect. For example, when the above datasets are divided according to the angle of the face captured in each image, about 52% of the data is frontal views, about 16% is half-profile views, and about 32% is oblique-profile views. This serious class imbalance means that face recognition models trained on current face datasets are biased estimators, which in turn leads to low face recognition accuracy.
Based on the above, the invention provides a face recognition model training, image registration and face recognition method and device, and the face recognition with high accuracy can be still realized under the condition of lack of training samples or unbalanced distribution of the training samples.
Fig. 1 is a schematic diagram of an implementation scenario provided by the present invention, which involves face recognition. As shown in fig. 1, when face recognition is performed, a face image to be recognized is input into a face recognition model to obtain target face features; the target face features are matched against a preset face feature template, the probability that the two belong to the same individual is determined from their similarity score, and the user identity corresponding to the face image to be recognized is thereby judged. It will be appreciated that before face recognition can take place, the face recognition model must first be trained and face images must be registered. Face image registration converts a face image with a known user identity into a compact, discriminable feature vector, which serves as the face feature template, through the trained face recognition model.
Generally, the face recognition model training, the face image registration and the face recognition process all involve interaction between the terminal equipment and the server. Taking face recognition model training as an example, the terminal equipment collects face images through image collection equipment such as a camera and the like, then the collected face images are transmitted to a server, and the server carries out model training through the face images so as to obtain a face recognition model.
In the embodiment of the invention, the face recognition model is deployed in a server and comprises a first feature extraction network and a second feature extraction network. The embodiment of the invention respectively carries out detailed explanation from three different stages of face recognition model training, face image registration and face recognition, and can realize high-precision face recognition under the condition of lack of training samples or unbalanced distribution of the types of the training samples in each stage.
Fig. 2 is a flowchart of a face recognition model training method according to an embodiment of the present invention, and as shown in fig. 2, the face recognition model training method according to the present invention at least includes the following steps:
s201, a training sample set containing a plurality of training samples is obtained.
Wherein each training sample includes at least one sample face image, and different training samples correspond to different individuals; all sample face images in a training sample correspond to one individual. In the embodiment of the invention, one individual can be taken as one class, so the training sample set contains as many classes as it contains training samples.
For example, the individual corresponding to training sample a is "Zhang san" and the individual corresponding to training sample B is "Lisi"; the training samples A are face images of individuals 'Zhang San' and the training samples B are face images of individuals 'Liqu'.
It will be appreciated that the number of images of the sample face images contained in different training samples may be different. For example, training sample a contains 10 sample face images and training sample B contains 50 sample face images.
Further, as shown in fig. 3, the obtaining of the training sample set including the plurality of training samples in step S201 may be at least implemented by:
s301, a plurality of face images containing faces are acquired.
Face images containing faces may be obtained from presently disclosed face datasets, for example, the WiderFace, MegaFace, LFW, and MS1MV2 face datasets.
S302, horizontally flipping each face image to obtain a horizontally flipped face image.
S303, respectively performing key point detection on the face image before horizontal flipping and the face image after horizontal flipping to obtain corresponding key point coordinates.
In some embodiments of the present invention, key point detection may be performed on the face images before and after horizontal flipping by a preset face key point detection model, so as to obtain the corresponding key point coordinates.
The key points of a face image at least include: left eye, right eye, left mouth corner, right mouth corner and nose.
S304, calculating the distance between each key point coordinate of the face image before horizontal flipping and the corresponding key point coordinate of the face image after horizontal flipping.
Taking the left eye as an example: the left-eye coordinate of face image A1 before horizontal flipping is L1, the left-eye coordinate of the horizontally flipped face image A2 is L2, and the distance between L1 and L2 is calculated, where face image A2 is obtained by horizontally flipping face image A1.
S305, calculating the sum of the distances corresponding to each face image as an image difference value.
In the case that the key points of the face image include the five key points of left eye, right eye, left mouth corner, right mouth corner and nose, the distances corresponding to these key points are added together, and the resulting sum is taken as the image difference value.
S306, deleting the face images whose image difference value is larger than the second preset threshold.
An image difference value larger than the second preset threshold indicates that the difference between the face image before horizontal flipping and the face image after horizontal flipping is large; such an image is not suitable as a sample face image and would impair face recognition accuracy. Therefore, the face images before horizontal flipping are screened through the image difference value and the second preset threshold, so as to ensure the recognition accuracy of the face recognition model.
It can be appreciated that the second preset threshold may be adjusted according to practical situations, which is not specifically limited in the embodiments of the present invention.
S307, taking the face images whose image difference value is smaller than or equal to the second preset threshold as sample face images.
Through steps S303-S307, the face images before horizontal flipping are screened based on the face images after horizontal flipping, and the sample face images are determined.
S308, determining individuals corresponding to the sample face images, and taking the sample face images of the same individuals as a training sample.
In the embodiment of the invention, each acquired face image is horizontally flipped, and the face images are screened according to the image difference value between the image before flipping and the image after flipping, so as to remove face images with large differences, i.e., to remove images of low acquisition quality, thereby further improving the accuracy of the face recognition model.
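As a minimal illustration of steps S302-S307, the following Python sketch filters face images by the flip-consistency criterion described above. The keypoint detector passed in as `detect_keypoints` is a hypothetical stand-in for whichever preset face key point detection model is used, and the threshold value of 20.0 is an arbitrary placeholder, not a value given by the patent.

```python
import numpy as np

def image_difference(image: np.ndarray, detect_keypoints) -> float:
    """Sum of keypoint distances between an image and its horizontal flip (S303-S305).

    detect_keypoints is a hypothetical callable standing in for the preset face
    key point detection model; it is assumed to return a (5, 2) array of (x, y)
    coordinates for left eye, right eye, left mouth corner, right mouth corner, nose.
    """
    flipped = image[:, ::-1]                     # S302: horizontal flip along the width axis
    pts = detect_keypoints(image)                # keypoints of the image before flipping
    pts_flipped = detect_keypoints(flipped)      # keypoints of the image after flipping
    # S304/S305: per-keypoint Euclidean distances, summed into one difference value
    diffs = np.asarray(pts) - np.asarray(pts_flipped)
    return float(np.linalg.norm(diffs, axis=1).sum())

def filter_sample_images(images, detect_keypoints, second_threshold: float = 20.0):
    """S306/S307: keep only images whose difference value <= the second preset threshold."""
    return [img for img in images
            if image_difference(img, detect_keypoints) <= second_threshold]
```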
S202, taking training samples with the number of images larger than a first preset threshold value as first training samples; and taking the training samples with the image quantity smaller than the first preset threshold value as second training samples.
In the embodiment of the invention, training samples with different numbers of images represent different degrees of feature richness for their individuals. When a training sample contains many images (i.e., more than the first preset threshold), it characterizes the individual's features more finely; conversely, when it contains few images (i.e., fewer than the first preset threshold), it characterizes the individual's features less finely.
The first preset threshold may be adjusted according to an actual scene, and is not specifically limited in the embodiment of the present invention.
It can be understood that the training samples with the number of images equal to the first preset threshold may be used as the first training samples or the second training samples, which is not specifically limited in the embodiment of the present invention.
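Purely for illustration, partitioning the training sample set of step S202 by the first preset threshold might look like the following sketch; the threshold value of 20 is an invented assumption, and samples exactly at the threshold are sent to the second set here, which is one of the two choices the text allows.

```python
def split_by_image_count(samples: dict, first_threshold: int = 20):
    """samples maps an individual id to its list of sample face images (S201/S202).

    Returns (first_samples, second_samples): samples containing more images than
    the first preset threshold become first training samples, the rest become
    second training samples.
    """
    first = {k: v for k, v in samples.items() if len(v) > first_threshold}
    second = {k: v for k, v in samples.items() if len(v) <= first_threshold}
    return first, second
```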
And S203, training the first preset neural network through a plurality of first training samples to obtain a first feature extraction network.
In one embodiment of the present invention, as shown in fig. 4, the first preset neural network includes two parts, an encoder and a decoder, and the dimension of the input vector of the encoder is the same as the dimension of the output vector of the decoder. In an embodiment of the present invention, the first feature extraction network is an encoder in a trained first preset neural network.
In one embodiment of the invention, the encoder may be composed of four bottleneck modules, one flattening layer and one fully connected layer, where each bottleneck module is composed of an unequal number of convolution operations, batch normalizations and rectified linear units (ReLU). The decoder is the inverse of the encoder and consists of a fully connected layer, a reconstruction layer and four upsampling modules, where each upsampling module consists of a different number of transposed convolution operations, batch normalizations and rectification units, as shown in fig. 4.
Additionally, in one embodiment of the invention, the first preset neural network may be an autoencoder or a Transformer model.
It should be noted that, in the embodiment of the present invention, the specific network structure of the first feature extraction network is not specifically limited, and the above embodiment is only for easy understanding.
Further, training the first preset neural network through a plurality of first training samples, and taking the L2 regression function as a target loss function of the first preset neural network to continuously adjust and correct network parameters of the first preset neural network through the target loss function until a trained first feature extraction network is obtained.
The target loss function of the first preset neural network is:

$L_1 = \frac{1}{N}\sum_{i=1}^{N}\left\| \hat{x}_i - x_i \right\|_2^2$

where $x_i$ denotes the $i$-th input first training sample, $\hat{x}_i$ denotes the corresponding output of the first preset neural network, and $N$ is the number of first training samples.
Further, the target loss function value of the first preset neural network may be minimized by stochastic gradient descent until convergence, thereby obtaining the trained first feature extraction network. Stochastic gradient descent speeds up convergence and thus the training of the first feature extraction network.
It can be understood that the embodiment of the present invention does not specifically limit the target loss function of the first preset neural network, and optimization algorithms other than stochastic gradient descent may be used to accelerate convergence; the above is only for ease of understanding.
In the embodiment of the invention, the first preset neural network is trained with the first training samples, so that the trained first feature extraction network can extract finer-grained features from face images, which further improves the accuracy of the face recognition model and enables high-precision face recognition even when samples are scarce or unevenly distributed.
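The following PyTorch sketch shows the shape of the reconstruction training in step S203 under simplifying assumptions: the encoder and decoder are reduced to small stand-ins for the bottleneck and upsampling stacks of fig. 4, the layer widths are invented for illustration, and the input is assumed to be a 3x64x64 image so that the output shape matches for the L2 loss.

```python
import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    """Simplified stand-in for the first preset neural network (encoder + decoder)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        # Stand-in for: four bottleneck modules -> flattening layer -> fully connected layer
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(), nn.Linear(64 * 16, feat_dim),
        )
        # Stand-in for: fully connected layer -> reconstruction layer -> upsampling modules
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64 * 16), nn.Unflatten(1, (64, 4, 4)),
            nn.Upsample(scale_factor=4),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, batch):
    """One optimizer step minimizing the L2 reconstruction loss, mean ||x_hat - x||^2."""
    optimizer.zero_grad()
    loss = ((model(batch) - batch) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Here `torch.optim.SGD(model.parameters(), lr=0.01)` would play the role of the stochastic gradient descent optimizer mentioned above; after training, `model.encoder` is kept as the first feature extraction network.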
S204, performing feature extraction by using the first feature extraction network to obtain first initial face features of each first training sample and second initial face features of each second training sample.
Both the first training samples and the second training samples need to use the first feature extraction network for feature extraction. Taking feature extraction of a first training sample by the first feature extraction network as an example:
the face feature of each sample face image is extracted through the first feature extraction network, namely a first initial face feature.
That is, the number of first initial face features of the first training sample corresponds to the number of sample face images that it contains. For example, the first training sample a includes 25 sample face images, and each sample face image can obtain one first initial face feature, that is, the first training sample a has 25 first initial face features.
It can be appreciated that feature extraction of the second training samples by the first feature extraction network is performed in the same manner as for the first training samples, and is not repeated here.
In the embodiment of the invention, the feature extraction is performed through the first feature extraction network, so that the fine-granularity face features can be obtained, and the accuracy of face recognition is further improved.
For ease of illustration, the set of all first training samples may be regarded as the large-sample dataset, and the set of all second training samples as the small-sample dataset. As shown in fig. 4, the large-sample dataset and the small-sample dataset are obtained by the preprocessing of steps S301-S308, and the large-sample dataset is input into the reconstruction network (i.e., the first preset neural network) for training, so as to obtain the trained first feature extraction network.
Through step S204, the first initial face feature set $F$ of the large-sample dataset and the second initial face feature set $G$ of the small-sample dataset can be obtained:

$F = \{ f_i^k \mid i = 1,\dots,M;\ k = 1,\dots,n_i \}$

where $f_i^k$ denotes the first initial face feature of the $k$-th sample face image of the $i$-th first training sample in the large-sample dataset, $M$ is the number of first training samples in the large-sample dataset, and $n_i$ is the number of sample face images of the $i$-th first training sample.

$G = \{ g_j^l \mid j = 1,\dots,S;\ l = 1,\dots,m_j \}$

where $g_j^l$ denotes the second initial face feature of the $l$-th sample face image of the $j$-th second training sample in the small-sample dataset, $S$ is the number of second training samples in the small-sample dataset, and $m_j$ is the number of sample face images of the $j$-th second training sample.

As shown in fig. 4, the large-sample dataset and the small-sample dataset are input into the trained first feature extraction network to obtain the corresponding first initial face feature set $F$ and second initial face feature set $G$.
It can be understood that the face features in the embodiments of the present invention may be represented in the form of feature vectors.
S205, based on the first initial face features, feature fusion is carried out on the second initial face features of the second training samples, and fused face features of the second training samples are obtained.
Specifically, as shown in fig. 5, step S205 may include at least the following steps:
s501, calculating the average face characteristic of each first training sample according to the first initial face characteristics.
The average face feature of the first training sample refers to an average value of all first initial face features of the first training sample.
S502, constructing a corresponding feature matrix based on the average face features of each first training sample.
Further, a corresponding covariance matrix $C$ may be constructed according to the average face feature of each first training sample; then, dimension reduction is performed on the covariance matrix through a preset dimension reduction algorithm to obtain the feature matrix $\Phi$.

Specifically, the covariance matrix $C$ is:

$C = \sum_{i=1}^{M}\sum_{k=1}^{n_i}\left( f_i^k - \bar{f}_i \right)\left( f_i^k - \bar{f}_i \right)^{\mathsf{T}}$

where $f_i^k$ is the first initial face feature of the $k$-th sample face image of the $i$-th first training sample, and $\bar{f}_i$ is the average face feature of the $i$-th first training sample.

In the embodiment of the present invention, the covariance matrix may be used directly as the feature matrix. However, taking the dimension-reduced form of the covariance matrix $C$ as the feature matrix $\Phi$ can further enhance the robustness of the face recognition model and save computing resources.
The preset dimension reduction algorithm may be, for example, principal component analysis (PCA), which is not specifically limited in the embodiments of the present invention.
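A sketch of steps S501-S502 follows, assuming each training sample's first initial face features are stacked row-wise in a NumPy array; the scatter-style covariance and the PCA projection match the reconstruction above, and the retained dimension of 128 is an arbitrary choice.

```python
import numpy as np

def average_features(features_per_sample):
    """S501: per-training-sample mean of the first initial face features.

    features_per_sample: list of arrays, one per first training sample,
    each of shape (n_i, d).
    """
    return [f.mean(axis=0) for f in features_per_sample]

def feature_matrix(features_per_sample, out_dim: int = 128):
    """S502: covariance matrix over centered features, reduced by PCA.

    Returns a (d, out_dim) projection matrix Phi whose columns are the leading
    eigenvectors of C = sum_i sum_k (f_i^k - f_bar_i)(f_i^k - f_bar_i)^T.
    """
    centered = np.vstack([f - f.mean(axis=0) for f in features_per_sample])
    C = centered.T @ centered                # (d, d) scatter/covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :out_dim]     # top out_dim principal directions
```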
And S503, carrying out feature fusion on the second initial face features according to the feature matrix and the first initial face features to obtain fused face features of each second training sample.
Specifically, as shown in fig. 6, step S503 may include at least the following steps:
s601, selecting a first training sample as a training sample to be fused of each second training sample.
In the embodiment of the invention, a first training sample is allocated to each second training sample and is used as a training sample to be fused of the second training samples. It should be noted that the training samples to be fused are different from each other.
For example, given second training samples B1, B2, B3 and first training samples A1, A2, A3, A4: the second training sample B1 is assigned A1 as its training sample to be fused, the second training sample B2 is assigned A3 as its training sample to be fused, and the second training sample B3 is assigned A4 as its training sample to be fused.
S602, selecting a first initial face feature from the training samples to be fused corresponding to the second training samples as the face feature to be fused of the second initial face feature aiming at each second initial face feature in the second training samples.
Because the second training sample comprises a plurality of sample face images of the same individual, each sample face image corresponds to one second initial face feature, the second training sample has a plurality of second initial face features.
In the embodiment of the invention, a first initial face feature in a training sample to be fused is allocated to each second initial face feature in a second training sample and is used as the face feature to be fused of the second initial face feature. The face features to be fused of different second initial face features in the second training sample are different.
For example, the second training samples B1 include second initial face features K1, K2, K3, and the training samples to be fused of B1 are A1, and A1 includes first initial face features F1, F2, and F3, and assigns F1 to K1 as its face feature to be fused, assigns F2 to K2 as its face feature to be fused, and assigns F3 to K3 as its face feature to be fused.
It may be appreciated that, since one distinct first initial face feature needs to be allocated to each second initial face feature in a second training sample, the number of first initial face features of each first training sample needs to be determined first, so that the training sample to be fused selected for a second training sample has at least as many first initial face features as the second training sample has second initial face features.
S603, generating the fused face feature of each second initial face feature in the second training sample according to the average face feature of the second training sample, the average face feature of the training sample to be fused corresponding to the second training sample, and the face feature to be fused.
Specifically, the fused face features may be generated by the following formula:

$\tilde{g}_j^l = \bar{g}_j + \Phi\left( f^k - \bar{f} \right)$

where $\tilde{g}_j^l$ is the $l$-th fused face feature in the $j$-th second training sample, $\bar{g}_j$ is the average face feature of the $j$-th second training sample, $f^k$ is the $k$-th first initial face feature of the training sample to be fused (the face feature to be fused), $\bar{f}$ is the average face feature of the training sample to be fused, and $\Phi$ is the feature matrix obtained in step S502.
The average face feature of the second training sample refers to an average value of all second initial face features of the second training sample. The average face characteristics of the training sample to be fused refer to the average value of all the first initial face characteristics of the training sample to be fused.
In the embodiment of the invention, the first initial face features are utilized to perform feature fusion on the second initial face features, so that transfer learning (shown in fig. 4) is realized, semantic information contained in the second training sample is enriched, the problem of face recognition model deviation caused by unbalance of training data and lack of training data is relieved to a great extent, and the face recognition precision is further improved.
For convenience of explanation, the fused face features of the second initial face features may be collected into a fused small-sample face feature set $\tilde{G} = \{ \tilde{g}_j^l \mid j = 1,\dots,S;\ l = 1,\dots,m_j \}$.
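The fusion of steps S601-S603 can then be sketched as below. Whether the feature matrix $\Phi$ enters the formula as the plain projection used here is an assumption consistent with the wording of step S503, not a statement of the patent's exact operator; the original figure placeholders do not pin it down.

```python
import numpy as np

def fuse_features(second_feats, donor_feats, Phi=None):
    """S601-S603: fuse each second initial face feature with a distinct
    first initial face feature of the assigned to-be-fused training sample.

    second_feats: (m_j, d) second initial face features of one second training sample.
    donor_feats:  (n_i, d) first initial face features of its to-be-fused sample,
                  with n_i >= m_j so each row can be assigned a distinct donor.
    Phi: optional (d, d') feature matrix; if given, the donor offset is projected
         onto the retained directions and mapped back (an illustrative assumption).
    """
    g_bar = second_feats.mean(axis=0)      # average face feature of the second sample
    f_bar = donor_feats.mean(axis=0)       # average face feature of the donor sample
    offsets = donor_feats[: len(second_feats)] - f_bar
    if Phi is not None:
        offsets = offsets @ Phi @ Phi.T    # project through the feature matrix
    return g_bar + offsets                 # fused face features, shape (m_j, d)
```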
S206, constructing a plurality of feature triples according to the first initial face features of each first training sample and the fused face features of each second training sample.
Specifically, as shown in fig. 7, step S206 may be implemented at least by:
s701, taking each first initial face feature of the first training sample and each fused face feature of the second training sample as target training features.
In the embodiment of the invention, the set of all target training features may be taken as the total feature set $T = F \cup \tilde{G}$.
S702, selecting one target training feature as an anchor point, selecting one other target training feature which is the same individual as the anchor point as a positive example of the anchor point, and selecting one other target training feature which is different individual from the anchor point as a negative example of the anchor point.
The difference between the anchor-to-positive distance and the anchor-to-negative distance is smaller than a third preset threshold, as follows:

$\left\| \phi(a) - \phi(p) \right\|_2^2 - \left\| \phi(a) - \phi(n) \right\|_2^2 < \alpha$

where $a$ is the anchor, $p$ is the positive example of the anchor, $n$ is the negative example of the anchor, $\alpha$ is the third preset threshold, and $\phi(\cdot)$ is a normalization function.
S703, using the anchor point, the positive instance of the anchor point and the negative instance of the anchor point as feature triples.
The feature triplet is $\left( a, p, n \right)$.
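Steps S701-S703 can be sketched as the following mining loop; the feature-to-identity bookkeeping, the threshold value and the triplet count are illustrative assumptions, and the features are assumed to be already normalized.

```python
import random
import numpy as np

def build_triplets(features, labels, alpha: float = 0.2, num_triplets: int = 10000):
    """Construct (anchor, positive, negative) index triplets from the total
    feature set, keeping only those satisfying ||a-p||^2 - ||a-n||^2 < alpha (S702).

    features: (N, d) normalized target training features.
    labels:   length-N sequence giving the individual id of each feature.
    """
    triplets, n = [], len(features)
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    attempts = 0
    while len(triplets) < num_triplets and attempts < 100 * num_triplets:
        attempts += 1
        a = random.randrange(n)
        same = [i for i in by_label[labels[a]] if i != a]
        if not same:
            continue                          # anchor's individual has no other feature
        p = random.choice(same)               # positive: same individual
        neg = random.randrange(n)
        if labels[neg] == labels[a]:
            continue                          # negative must be a different individual
        d_ap = np.sum((features[a] - features[p]) ** 2)
        d_an = np.sum((features[a] - features[neg]) ** 2)
        if d_ap - d_an < alpha:               # selection condition from S702
            triplets.append((a, p, neg))
    return triplets
```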
S207, training the second preset neural network through the feature triplets to obtain a second feature extraction network.
As shown in fig. 4, the second preset neural network may include two fully connected layers, two batch normalization layers, and two rectified linear units (ReLU).
It can be understood that, in the embodiment of the present invention, the network structure of the second preset neural network may be adapted and is not limited to the structure shown in fig. 4; it is not specifically limited here.
Specifically, the triplet loss function is taken as the target loss function of the second preset neural network, and the second preset neural network is trained on the feature triplets, so as to obtain the trained second feature extraction network.
The target loss function $L_2$ of the second preset neural network is:

$L_2 = \frac{1}{T}\sum_{t=1}^{T} \max\left\{ \left\| \psi(a_t) - \psi(p_t) \right\|_2^2 - \left\| \psi(a_t) - \psi(n_t) \right\|_2^2 + \alpha,\ 0 \right\}$

where $\psi(\cdot)$ denotes the second preset neural network, $a_t$ is the anchor of the $t$-th feature triplet, $p_t$ is its positive example, $n_t$ is its negative example, $T$ is the number of feature triplets, and $\alpha$ is the third preset threshold.
It can be appreciated that other loss functions may be used as the target loss function of the second preset neural network according to the embodiment of the present invention, which is not specifically limited in the present invention.
In the embodiment of the present invention, a stochastic gradient descent algorithm may likewise be used to minimize the target loss function value of the second preset neural network until convergence, to obtain the trained second feature extraction network.
It can be understood that optimization algorithms other than stochastic gradient descent may also be used to accelerate convergence, which is not specifically limited in the present invention.
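Training in step S207 might then look like the following PyTorch sketch. The two-layer head mirrors the fig. 4 description (two fully connected layers, two batch normalizations, two ReLUs) with invented layer widths, and uses the built-in triplet margin loss as a stand-in for the target loss function above; note the built-in uses unsquared Euclidean distances, whereas the reconstructed $L_2$ uses squared ones.

```python
import torch
import torch.nn as nn

class EmbeddingHead(nn.Module):
    """Stand-in for the second preset neural network of fig. 4."""
    def __init__(self, in_dim: int = 512, out_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, out_dim), nn.BatchNorm1d(out_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

def train_second_network(head, triplets, features, alpha=0.2, epochs=10, lr=0.01):
    """Minimize a triplet loss over the mined feature triplets with SGD."""
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    # max(d(a,p) - d(a,n) + alpha, 0), with Euclidean distance d
    criterion = nn.TripletMarginLoss(margin=alpha)
    feats = torch.as_tensor(features, dtype=torch.float32)
    for _ in range(epochs):
        for a, p, n in triplets:
            emb = head(feats[torch.tensor([a, p, n])])   # embed anchor, positive, negative
            loss = criterion(emb[0:1], emb[1:2], emb[2:3])
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```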
According to the face recognition model training method provided by the invention, the first feature extraction network obtained by training on the first training samples can extract richer face feature information from face images. Transfer learning for the second training samples is then realized through the first training samples, which enriches the semantic information contained in the small-sample features (namely, the face features of the second training samples) and reduces the bias of the face recognition model caused by unbalanced training data. The second feature extraction network is then obtained from the second training samples after transfer learning, so that the resulting face recognition model comprising the first feature extraction network and the second feature extraction network can realize high-precision and high-efficiency face recognition even when training data is scarce or unevenly distributed.
Based on the above face recognition model training method, the invention also provides a face image registration method, as shown in fig. 8, which at least comprises the following steps:
s801, a face image to be registered of a target user is acquired.
In the embodiment of the invention, the face image to be registered of the target user can be acquired through the terminal equipment.
S802, extracting features of the face image to be registered through a first feature extraction network of the face recognition model to obtain initial face features of the face image to be registered.
The face recognition model is obtained through the face recognition model training method provided by the embodiment.
S803, extracting features of the initial face features of the face image to be registered through a second feature extraction network of the face recognition model to obtain target face features of the target user.
S804, taking the target face features of the target user as a face feature template of the target user.
In some embodiments of the present invention, the face feature template of the target user may be stored, so as to perform face recognition.
Based on the above-mentioned face recognition model training method, the invention also provides a face recognition method, as shown in fig. 9, which at least comprises the following steps:
S901, acquiring a face image to be recognized of a target user.
In the embodiment of the invention, the face image to be recognized of the target user can be acquired through terminal equipment, for example, an access control device of a residential community.
S902, extracting features of the face image to be identified through a first feature extraction network of the face identification model to obtain initial face features of the face image to be identified.
The face recognition model is obtained through the face recognition model training method provided by the embodiment.
S903, extracting features of the initial face features of the face image to be identified through a second feature extraction network of the face identification model to obtain target face features of the face image to be identified.
S904, comparing the target face characteristics of the face image to be recognized with a pre-stored face characteristic template to determine whether the matching is successful.
In the embodiment of the invention, the identity of the target user can be confirmed by comparing the target face characteristics of the face image to be recognized, which is obtained based on the face recognition model, with the pre-stored face characteristic template.
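For steps S901-S904, the comparison against stored templates reduces to a nearest-template match. The cosine similarity and the acceptance threshold below are illustrative assumptions; the patent does not fix a particular similarity score.

```python
import numpy as np

def match_face(target_feature, templates, accept_threshold: float = 0.6):
    """Compare a target face feature against stored face feature templates (S904).

    templates: dict mapping user id -> registered face feature template.
    Returns (best_user, best_score) if the best cosine similarity clears the
    threshold, otherwise (None, best_score).
    """
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    best_user, best_score = None, -1.0
    for user, tmpl in templates.items():
        score = cosine(target_feature, tmpl)
        if score > best_score:
            best_user, best_score = user, score
    if best_score >= accept_threshold:
        return best_user, best_score        # matching succeeded
    return None, best_score                 # matching failed
```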
Based on the face recognition model training method, the face image registration method and the face recognition method, the invention also provides a computer readable storage medium, wherein one or more programs are stored in the computer readable storage medium, and can be executed by one or more processors to realize the steps in the face recognition model training method, the steps in the face image registration method or the steps in the face recognition method.
Based on the above face recognition model training method, face image registration method and face recognition method, the invention also provides a terminal, as shown in fig. 10, which includes at least one processor 100, a display screen 110 and a memory 120, and may also include a communication interface (Communications Interface) 130 and a bus 140. The processor 100, the display screen 110, the memory 120 and the communication interface 130 may communicate with each other via the bus 140. The display screen 110 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 130 may transmit information. The processor 100 may invoke logic instructions in the memory 120 to perform the steps in the face recognition model training method, the face image registration method, or the face recognition method described in the above embodiments.
Further, the logic instructions in the memory 120 described above may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand alone product.
The memory 120, as a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 100 performs functional applications and data processing, i.e., implements the methods of the embodiments described above, by running software programs, instructions or modules stored in the memory 120.
The memory 120 may include a program storage area, which may store an operating system and at least one application program required for functions, and a data storage area, which may store data created according to use of the terminal, etc. In addition, the memory 120 may include high-speed random access memory and may also include nonvolatile memory. For example, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing program code, or a transitory storage medium, may be used.
All embodiments in the application are described in a progressive manner, and identical and similar parts of all embodiments are mutually referred, so that each embodiment mainly describes differences from other embodiments. In particular, for terminal and media embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the partial description of method embodiments being relevant.
The terminal, the medium and the method provided in the embodiment of the present application are in one-to-one correspondence, so that the terminal and the medium also have similar beneficial technical effects to the corresponding methods, and since the beneficial technical effects of the method have been described in detail above, the beneficial technical effects of the terminal and the medium are not described in detail here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Of course, those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program instructing relevant hardware (e.g., a processor or controller). The program may be stored on a computer-readable storage medium and, when executed, may perform the steps of the above-described methods. The computer-readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the application of the invention is not limited to the examples described above; those skilled in the art may make modifications and variations in light of the above teachings, and all such modifications and variations are intended to fall within the scope of the appended claims.

Claims (13)

1. A face recognition model training method, characterized in that the face recognition model comprises a first feature extraction network and a second feature extraction network, and the method comprises the following steps:
training a first preset neural network through a plurality of first training samples to obtain the first feature extraction network; the number of sample face images contained in each first training sample is greater than a first preset threshold;
performing feature extraction with the first feature extraction network to obtain first initial face features of each first training sample and second initial face features of each second training sample; the number of sample face images contained in each second training sample is smaller than the first preset threshold;
performing feature fusion on the second initial face features based on the first initial face features to obtain fused face features of each second training sample;
constructing a plurality of feature triplets according to the first initial face features of the first training samples and the fused face features of the second training samples;
and training a second preset neural network through the feature triplets to obtain the second feature extraction network, so as to obtain the face recognition model for face recognition.
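For orientation only (this note is not part of the claims), the minimal Python sketch below shows how the two-stage flow of claim 1 could be wired together. The toy networks, tensor shapes and the 128-dimensional embedding are all assumptions; the stage-1 training loop and the stage-2 fusion and triplet steps are elided here and sketched after the later claims.

    import torch
    import torch.nn as nn

    EMB = 128  # assumed embedding dimensionality

    # Stand-ins for the first and second preset neural networks.
    first_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, EMB))
    second_net = nn.Linear(EMB, EMB)

    # First training samples (many images per individual) and second training
    # samples (few images per individual), faked here as random tensors.
    head_images = torch.randn(10, 3, 32, 32)
    tail_images = torch.randn(4, 3, 32, 32)

    # Stage 1 (training loop elided): first_net is trained on the first samples.
    with torch.no_grad():
        first_feats = first_net(head_images)    # first initial face features
        second_feats = first_net(tail_images)   # second initial face features

    # Stage 2 (see the sketches after claims 6-9): fuse second_feats using
    # statistics of first_feats, build feature triplets, and train second_net
    # with a triplet loss to obtain the second feature extraction network.
    print(first_feats.shape, second_feats.shape)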
2. The face recognition model training method of claim 1, wherein, before training the first preset neural network through the plurality of first training samples, the method further comprises:
acquiring a training sample set comprising a plurality of training samples; each training sample comprises at least one sample face image, and different training samples correspond to different individuals;
taking the training samples whose number of images is greater than the first preset threshold as the first training samples;
and taking the training samples whose number of images is smaller than the first preset threshold as the second training samples.
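An illustrative sketch of claim 2's head/tail split (not part of the claims), assuming a dictionary from individual ID to that individual's face images and an assumed threshold of 20; the claim leaves the equal-to case unspecified:

    FIRST_THRESHOLD = 20  # assumed value of the first preset threshold

    samples = {"id_a": ["img"] * 50, "id_b": ["img"] * 3, "id_c": ["img"] * 25}

    first_samples = {k: v for k, v in samples.items() if len(v) > FIRST_THRESHOLD}
    second_samples = {k: v for k, v in samples.items() if len(v) < FIRST_THRESHOLD}
    print(sorted(first_samples), sorted(second_samples))  # ['id_a', 'id_c'] ['id_b']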
3. The face recognition model training method according to claim 2, wherein the acquiring a training sample set comprising a plurality of training samples specifically comprises:
acquiring a plurality of face images containing faces;
horizontally flipping each face image to obtain a horizontally flipped face image;
screening the face images before flipping based on the horizontally flipped face images to determine sample face images;
and determining the individual corresponding to each sample face image, and taking the sample face images corresponding to the same individual as one training sample.
4. The face recognition model training method according to claim 3, wherein the screening the face images before flipping based on the horizontally flipped face images to determine sample face images specifically comprises:
performing key point detection on each face image before flipping and on the corresponding horizontally flipped face image to obtain the respective key point coordinates;
calculating the distance between each key point coordinate in the face image before flipping and the corresponding key point coordinate in the flipped face image;
taking the sum of the distances corresponding to each face image as its image difference value;
deleting the face images whose image difference value is greater than a second preset threshold;
and taking the face images whose image difference value is smaller than or equal to the second preset threshold as sample face images.
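The screening of claims 3-4 can be pictured with the sketch below (not part of the claims). The landmark detector is faked so the snippet runs; a real system would substitute an actual detector, and matching of mirrored keypoint indices (left eye vs right eye after the flip) is assumed to be handled inside it. The threshold value is an assumption.

    import numpy as np

    SECOND_THRESHOLD = 10.0  # assumed value of the second preset threshold

    def detect_keypoints(image):
        # Placeholder for a real facial landmark detector.
        h, w = image.shape[:2]
        return np.array([[0.3 * w, 0.4 * h], [0.7 * w, 0.4 * h], [0.5 * w, 0.7 * h]])

    def image_difference(image):
        flipped = image[:, ::-1]  # horizontal flip along the width axis
        kps = detect_keypoints(image)
        kps_flipped = detect_keypoints(flipped)
        # Sum of per-keypoint Euclidean distances = the image difference value.
        return float(np.linalg.norm(kps - kps_flipped, axis=1).sum())

    images = [np.random.rand(64, 64, 3) for _ in range(5)]
    sample_faces = [im for im in images if image_difference(im) <= SECOND_THRESHOLD]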
5. The face recognition model training method according to claim 1, wherein the performing feature fusion on the second initial face features based on the first initial face features to obtain fused face features of each second training sample specifically comprises:
calculating the average face feature of each first training sample according to the first initial face features;
constructing a corresponding feature matrix based on the average face features of the first training samples;
and performing feature fusion on the second initial face features according to the feature matrix and the first initial face features to obtain the fused face features of each second training sample.
6. The face recognition model training method of claim 5, wherein the performing feature fusion on the second initial face features according to the feature matrix and the first initial face features to obtain the fused face features of each second training sample specifically comprises:
for each second training sample, selecting one first training sample as the training sample to be fused of that second training sample;
for each second initial face feature in the second training sample, selecting one first initial face feature from the corresponding training sample to be fused as the face feature to be fused;
determining the fused face feature of each second initial face feature in the second training sample according to the average face feature of the second training sample, the average face feature of the corresponding training sample to be fused, and the face feature to be fused;
wherein the average face feature of the second training sample is the average of the second initial face features of the second training sample, and the average face feature of the training sample to be fused is the average of the face features to be fused of the training sample to be fused.
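Claims 5-6 do not spell out the exact combination function, so the sketch below (not part of the claims) uses one plausible reading in the spirit of the feature-transfer literature this application cites: transplant a head sample's intra-class variation onto the tail sample's centre. Every shape and the fusion rule itself are assumptions; the feature matrix of claim 5 (see the sketch after claim 7) could additionally be used to project the transferred offset into a reduced subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    head_feats = rng.normal(size=(50, 128))  # first initial features of the training sample to be fused
    tail_feats = rng.normal(size=(3, 128))   # second initial features of one second training sample

    head_mean = head_feats.mean(axis=0)      # average face feature of the sample to be fused
    tail_mean = tail_feats.mean(axis=0)      # average face feature of the second training sample

    fused_feats = []
    for _ in tail_feats:  # one fused face feature per second initial face feature
        to_fuse = head_feats[rng.integers(len(head_feats))]  # the face feature to be fused
        fused_feats.append(tail_mean + (to_fuse - head_mean))
    fused_feats = np.stack(fused_feats)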
7. The face recognition model training method of claim 6, wherein the constructing a corresponding feature matrix based on the average face features of the first training samples specifically comprises:
constructing a corresponding covariance matrix according to the average face feature of each first training sample;
and performing dimension reduction on the covariance matrix through a preset dimension reduction algorithm to obtain the feature matrix.
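A minimal sketch of claim 7 (not part of the claims), assuming PCA via eigendecomposition as the preset dimension reduction algorithm, which the claim does not name, and an assumed reduced dimensionality:

    import numpy as np

    rng = np.random.default_rng(0)
    avg_feats = rng.normal(size=(200, 128))  # average face feature of each first training sample

    centered = avg_feats - avg_feats.mean(axis=0)
    cov = centered.T @ centered / (len(avg_feats) - 1)  # 128 x 128 covariance matrix

    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    k = 32                                  # assumed reduced dimensionality
    feature_matrix = eigvecs[:, -k:]        # top-k principal directions as the feature matrix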
8. The face recognition model training method of claim 1, wherein the constructing a plurality of feature triplets according to the first initial face features of the first training samples and the fused face features of the second training samples specifically comprises:
taking all the first initial face features of the first training samples and all the fused face features of the second training samples as target training features;
sequentially selecting one target training feature as an anchor, selecting another target training feature belonging to the same individual as a positive example of the anchor, and selecting another target training feature belonging to a different individual as a negative example of the anchor;
wherein the absolute value of the distance between the positive example and the negative example of the anchor is smaller than a third preset threshold;
and taking the anchor, the positive example of the anchor and the negative example of the anchor as one feature triplet.
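A sketch of claim 8's triplet construction over the pooled target training features (not part of the claims). The translated wording of the distance constraint is ambiguous; the sketch assumes it bounds the gap |d(anchor, positive) - d(anchor, negative)|, the usual hard-triplet reading, and the threshold value is an assumption.

    import numpy as np

    rng = np.random.default_rng(0)
    feats = rng.normal(size=(40, 128))    # first initial + fused face features
    labels = rng.integers(0, 8, size=40)  # individual ID of each feature
    THIRD_THRESHOLD = 5.0                 # assumed third preset threshold

    triplets = []
    for a in range(len(feats)):
        pos = [i for i in range(len(feats)) if labels[i] == labels[a] and i != a]
        neg = [i for i in range(len(feats)) if labels[i] != labels[a]]
        if not pos or not neg:
            continue
        p = pos[int(rng.integers(len(pos)))]  # one positive example for the anchor
        for n in neg:
            d_ap = float(np.linalg.norm(feats[a] - feats[p]))
            d_an = float(np.linalg.norm(feats[a] - feats[n]))
            if abs(d_ap - d_an) < THIRD_THRESHOLD:  # keep only hard-ish negatives
                triplets.append((a, p, n))
                break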
9. The face recognition model training method according to claim 1, wherein the training a second preset neural network through the feature triplets to obtain the second feature extraction network specifically comprises:
taking the triplet loss function as the target loss function of the second preset neural network, and training the second preset neural network according to the feature triplets to obtain the trained second feature extraction network.
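A minimal sketch of claim 9's objective (not part of the claims), using PyTorch's built-in triplet margin loss; the margin, learning rate and network shape are assumptions, and a single gradient step stands in for the full training loop.

    import torch
    import torch.nn as nn

    second_net = nn.Linear(128, 128)  # stand-in for the second preset neural network
    criterion = nn.TripletMarginLoss(margin=0.3)
    optimizer = torch.optim.SGD(second_net.parameters(), lr=0.01)

    anchor, positive, negative = (torch.randn(16, 128) for _ in range(3))
    loss = criterion(second_net(anchor), second_net(positive), second_net(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()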
10. A face image registration method, characterized in that the face image registration method comprises:
acquiring a face image to be registered of a target user;
performing feature extraction on the face image to be registered through the first feature extraction network of a face recognition model to obtain initial face features of the face image to be registered;
performing feature extraction on the initial face features of the face image to be registered through the second feature extraction network of the face recognition model to obtain target face features of the target user;
taking the target face features of the target user as the face feature template of the target user;
wherein the face recognition model is obtained by the face recognition model training method according to any one of claims 1-9.
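A sketch of the registration flow of claim 10 (not part of the claims), with stand-in networks and an in-memory template store; all names and shapes are assumptions.

    import torch
    import torch.nn as nn

    first_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 128))
    second_net = nn.Linear(128, 128)
    templates = {}  # user ID -> face feature template

    def register(user_id, image):
        with torch.no_grad():
            initial = first_net(image.unsqueeze(0))  # initial face features
            target = second_net(initial).squeeze(0)  # target face features
        templates[user_id] = target                  # stored as the template

    register("alice", torch.rand(3, 112, 112))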
11. A face recognition method, characterized in that the face recognition method comprises:
acquiring a face image to be recognized of a target user;
performing feature extraction on the face image to be recognized through the first feature extraction network of a face recognition model to obtain initial face features of the face image to be recognized;
performing feature extraction on the initial face features of the face image to be recognized through the second feature extraction network of the face recognition model to obtain target face features of the face image to be recognized;
comparing the target face features of the face image to be recognized with a pre-stored face feature template to determine whether matching succeeds;
wherein the face recognition model is obtained by the face recognition model training method according to any one of claims 1-9.
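A sketch of the matching step in claim 11 (not part of the claims). The claims do not fix a similarity measure; cosine similarity with an assumed decision threshold is used here purely for illustration.

    import torch
    import torch.nn.functional as F

    MATCH_THRESHOLD = 0.6  # assumed decision threshold

    def matches(query_feat, template):
        # Cosine similarity between the query features and the stored template.
        sim = F.cosine_similarity(query_feat, template, dim=0).item()
        return sim >= MATCH_THRESHOLD

    print(matches(torch.rand(128), torch.rand(128)))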
12. A computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of the face recognition model training method as claimed in any one of claims 1-9, or the steps of the face image registration method as claimed in claim 10, or the steps of the face recognition method as claimed in claim 11.
13. A terminal, comprising: a processor and a memory; the memory has stored thereon a computer readable program executable by the processor; the processor, when executing the computer readable program, implements the steps of the face recognition model training method according to any one of claims 1-9, or the steps of the face image registration method according to claim 10, or the steps of the face recognition method according to claim 11.
CN202310450980.4A 2023-04-25 2023-04-25 Face recognition model training, image registration and face recognition method and device Pending CN116206355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310450980.4A CN116206355A (en) 2023-04-25 2023-04-25 Face recognition model training, image registration and face recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310450980.4A CN116206355A (en) 2023-04-25 2023-04-25 Face recognition model training, image registration and face recognition method and device

Publications (1)

Publication Number Publication Date
CN116206355A true CN116206355A (en) 2023-06-02

Family

ID=86514964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310450980.4A Pending CN116206355A (en) 2023-04-25 2023-04-25 Face recognition model training, image registration and face recognition method and device

Country Status (1)

Country Link
CN (1) CN116206355A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019128367A1 (en) * 2017-12-26 2019-07-04 广州广电运通金融电子股份有限公司 Face verification method and apparatus based on triplet loss, and computer device and storage medium
WO2020248841A1 (en) * 2019-06-13 2020-12-17 平安科技(深圳)有限公司 Au detection method and apparatus for image, and electronic device and storage medium
CN111680676A (en) * 2020-08-14 2020-09-18 支付宝(杭州)信息技术有限公司 Training face recognition model, image registration and face recognition method and device
CN115862103A (en) * 2022-11-30 2023-03-28 杭州半云科技有限公司 Method and system for identifying face of thumbnail

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XI YIN et al.: "Feature Transfer Learning for Face Recognition with Under-Represented Data", arXiv, pages 1-10 *
XIANG Yang; WU Xiaofu; ZHANG Suofei: "Research on deep transfer training methods for disguised face recognition", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), vol. 40, no. 03, pages 31-38 *

Similar Documents

Publication Publication Date Title
Yuan et al. Fingerprint liveness detection using an improved CNN with image scale equalization
CN104239858B (en) A kind of method and apparatus of face characteristic checking
KR102554724B1 (en) Method for identifying an object in an image and mobile device for practicing the method
JP2022532177A (en) Forged face recognition methods, devices, and non-temporary computer-readable storage media
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN111626371A (en) Image classification method, device and equipment and readable storage medium
CN109190579B (en) Generation type countermeasure network SIGAN signature handwriting identification method based on dual learning
Maglietta et al. Convolutional neural networks for Risso’s dolphins identification
CN110232318A (en) Acupuncture point recognition methods, device, electronic equipment and storage medium
CN115050064A (en) Face living body detection method, device, equipment and medium
Lv et al. Chinese character CAPTCHA recognition based on convolution neural network
CN110414431B (en) Face recognition method and system based on elastic context relation loss function
CN109145704A (en) A kind of human face portrait recognition methods based on face character
CN113033305B (en) Living body detection method, living body detection device, terminal equipment and storage medium
CN107480628B (en) Face recognition method and device
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
CN113011307A (en) Face recognition identity authentication method based on deep residual error network
CN113705310A (en) Feature learning method, target object identification method and corresponding device
CN116206355A (en) Face recognition model training, image registration and face recognition method and device
CN116311467A (en) Identity verification processing method and device based on face recognition
CN116977260A (en) Target defect detection method and device, electronic equipment and storage medium
CN110084110B (en) Near-infrared face image recognition method and device, electronic equipment and storage medium
CN114639132A (en) Feature extraction model processing method, device and equipment in face recognition scene
CN113762019A (en) Training method of feature extraction network, face recognition method and device
CN111079704A (en) Face recognition method and device based on quantum computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20230602)