CN113657248A - Training method and device for face recognition model and computer program product


Info

Publication number
CN113657248A
Authority
CN
China
Prior art keywords
identity
training
recognition model
face
image
Prior art date
Legal status
Pending
Application number
CN202110932551.1A
Other languages
Chinese (zh)
Inventor
彭楠
李弼
希滕
张刚
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110932551.1A
Publication of CN113657248A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present disclosure provides a training method for a face recognition model, a related apparatus and a computer program product, relating in particular to computer vision and deep learning technology and applicable to face recognition scenarios. The specific implementation scheme is as follows: acquiring a training sample set, wherein each training sample in the training sample set comprises a sample face image together with an identity label and a non-identity label corresponding to the face object in the sample face image; using a machine learning method, taking the sample face image as input, extracting identity features and non-identity features of the input sample face image through an initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, updating the initial face recognition model, and training to obtain the face recognition model. The method improves the recognition accuracy of the face recognition model.

Description

Training method and device for face recognition model and computer program product
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to computer vision and deep learning technologies, and more particularly, to a method and an apparatus for training a face recognition model, an electronic device, a storage medium, and a computer program product, which can be used in a face recognition scenario.
Background
In recent years, with the development of deep learning techniques, the accuracy of face recognition based on deep learning has improved greatly. Face recognition has important applications in many scenarios, such as comparing a person against an identity document at transit hubs like airports and railway stations, face-scan access control in private area management, and real-person verification in financial services. In these real-world deployment scenarios, complex conditions such as difficult lighting, large-angle faces, face occlusion and image blur are common, and the face features extracted by a deep learning model couple identity information with non-identity information, which has become a bottleneck to further improving face recognition accuracy.
Disclosure of Invention
The present disclosure provides a training method, an apparatus, an electronic device, a storage medium, and a computer program product for a face recognition model.
According to a first aspect, there is provided a training method for a face recognition model, comprising: acquiring a training sample set, wherein each training sample in the training sample set comprises a sample face image, an identity label corresponding to identity information of the face object in the sample face image, and a non-identity label corresponding to non-identity information, the non-identity label of a training sample being obtained by clustering shallow features of the sample face images in the training sample set; and, using a machine learning method, taking the sample face image as input, extracting identity features and non-identity features of the input sample face image through an initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, updating the initial face recognition model, and training to obtain the face recognition model.
According to a second aspect, there is provided a face recognition method comprising: acquiring an image to be identified; and identifying the image to be identified through the pre-trained face identification model to obtain a face identification result, wherein the face identification model is obtained through training in any one implementation mode of the first aspect.
According to a third aspect, there is provided a training apparatus for a face recognition model, comprising: a first acquisition unit configured to acquire a training sample set, wherein each training sample in the training sample set comprises a sample face image, an identity label corresponding to identity information of the face object in the sample face image, and a non-identity label corresponding to non-identity information, the non-identity label of a training sample being obtained by clustering shallow features of the sample face images in the training sample set; and a training unit configured to, using a machine learning method and taking the sample face image as input, extract identity features and non-identity features of the input sample face image through an initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, update the initial face recognition model, and train to obtain the face recognition model.
According to a fourth aspect, there is provided a face recognition apparatus comprising: a second acquisition unit configured to acquire an image to be recognized; and the recognition unit is configured to recognize the image to be recognized through a pre-trained face recognition model to obtain a face recognition result, wherein the face recognition model is obtained through training in any one of the implementation manners of the first aspect.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect and the second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first and second aspects.
According to a seventh aspect, there is provided a computer program product comprising: a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect and the second aspect.
According to the technology of the present disclosure, the non-identity labels of the training samples in the training sample set are obtained in an unsupervised manner based on feature clustering, which improves the efficiency of determining the non-identity labels and reduces the resources a user must spend on labeling; in the training process of the face recognition model, the identity information and the non-identity information of the face object in the sample face image are decoupled, which improves the recognition accuracy of the trained face recognition model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment according to the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a face recognition model according to the present disclosure;
fig. 3 is a schematic diagram of an application scenario of the training method of the face recognition model according to the embodiment;
FIG. 4 is a flow diagram of yet another embodiment of a training method of a face recognition model according to the present disclosure;
FIG. 5 is a schematic structural diagram of one embodiment of a face recognition method according to the present disclosure;
FIG. 6 is a block diagram of one embodiment of a training apparatus for a face recognition model according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of a face recognition device according to the present disclosure;
FIG. 8 is a schematic block diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
Fig. 1 illustrates an exemplary architecture 100 to which the face recognition model training method and apparatus, the face recognition method and apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections between the terminal devices 101, 102, 103 form a topological network, and the network 104 serves to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 may be hardware devices or software that support network connections for data interaction and data processing. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices supporting network connection, information acquisition, interaction, display, processing, and the like, including but not limited to a monitoring device, a smart phone, a tablet computer, an e-book reader, a laptop portable computer, a desktop computer, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server that receives training requests sent by users through the terminal devices 101, 102, and 103 and trains a face recognition model by using a machine learning method. Before training, obtaining non-identity labels of training samples in a training sample set based on feature clustering in an unsupervised mode; in the training process, the identity information and the non-identity information of the face object in the sample face image are decoupled. After the pre-trained face recognition model is obtained, the server can also receive an image to be recognized sent by a user through the terminal equipment, and face recognition is carried out to obtain a face recognition result. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the training method and the face recognition method of the face recognition model provided by the embodiment of the present disclosure may be executed by a server, or may be executed by a terminal device, or may be executed by the server and the terminal device in cooperation with each other. Accordingly, the training device of the face recognition model and each part (for example, each unit) included in the face recognition device may be all disposed in the server, or all disposed in the terminal device, or disposed in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the training method of the face recognition model and the face recognition method run does not need to exchange data with other electronic devices, the system architecture may include only that electronic device (e.g., a server or a terminal device).
Referring to fig. 2, fig. 2 is a flowchart of a training method of a face recognition model according to an embodiment of the present disclosure, where the process 200 includes the following steps:
step 201, a training sample set is obtained.
In this embodiment, an execution subject (for example, a terminal device or a server in fig. 1) of the training method for the face recognition model may obtain the training sample set from a remote location or from a local location through a wired network connection manner or a wireless network connection manner.
The training samples in the training sample set comprise sample face images, identity labels corresponding to the identity information of the face objects in the sample face images and non-identity labels corresponding to the non-identity information. And the non-identity label of the training sample is obtained based on the shallow feature clustering of the sample face image in the training sample set.
The identity information may be attribute information indicating the identity of the person, such as age, sex, face information, and limb information, and the non-identity information may be any information other than the identity information. The identity label corresponding to the identity information of each sample face image is preset, and the non-identity label corresponding to the non-identity information is obtained in a characteristic clustering mode.
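To make the structure of a training sample concrete, the sketch below wraps the training set in a minimal PyTorch Dataset; the field names, tensor layout and label types are assumptions chosen for illustration and are not prescribed by the disclosure.

```python
# Minimal sketch of a training-sample container: each sample carries the face
# image, a preset identity label, and a non-identity label that is assigned
# later by clustering shallow features. Names and types are illustrative only.
from dataclasses import dataclass
from typing import List

import torch
from torch.utils.data import Dataset


@dataclass
class FaceSample:
    image: torch.Tensor        # (3, H, W) sample face image
    identity_label: int        # preset label for the identity information
    non_identity_label: int    # label obtained from shallow-feature clustering


class FaceTrainingSet(Dataset):
    def __init__(self, samples: List[FaceSample]):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        s = self.samples[idx]
        return s.image, s.identity_label, s.non_identity_label
```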
As an example, the execution subject may perform feature extraction on each sample face image in the training sample set to obtain a shallow feature of each sample face image; then, based on the obtained multiple shallow layer features, the similarity between every two shallow layer features is determined, and then clustering is carried out according to the similarity to obtain multiple categories. And for each sample face image in the training sample set, determining the class corresponding to the sample face image as a non-identity label. The shallow feature of the sample face image mainly comprises non-identity information of the face object.
In the field of deep learning, it is usually the deep features of a face image that are ultimately used. Deep features are obtained by applying feature extraction to the face image repeatedly, over multiple successive stages. The features obtained early in this process can be taken as the shallow features of the face image. Taking a residual network (ResNet) as an example, an image is generally processed through five stages of feature extraction and related operations, and the features produced by the first and second stages can be taken as shallow features.
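As a concrete illustration, the following is a minimal sketch of extracting and pooling shallow features from the early stages of a torchvision ResNet-50. The choice of backbone, the cut point after the first residual stage, and the input size are assumptions made for the example; the disclosure only requires features taken from an early stage of a multi-stage extractor.

```python
# Minimal sketch: pooled shallow features from the early stages of a ResNet.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet50(weights=None)
backbone.eval()

# The stem (conv1/bn1/relu/maxpool) plus the first residual stage (layer1)
# play the role of the "first and second stage" feature extraction here.
shallow_extractor = torch.nn.Sequential(
    backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
    backbone.layer1,
)

with torch.no_grad():
    face_batch = torch.randn(4, 3, 112, 112)                          # dummy face images
    shallow_maps = shallow_extractor(face_batch)                      # (4, 256, 28, 28)
    shallow_vecs = F.adaptive_avg_pool2d(shallow_maps, 1).flatten(1)  # (4, 256)
print(shallow_vecs.shape)
```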
In clustering, a clustering method based on partitioning, a clustering method based on hierarchy, a clustering method based on density, or the like may be employed.
It should be noted that the categories obtained by clustering need not carry any explicit semantic meaning; they may simply be the clusters obtained by clustering the shallow features. The different categories can be distinguished by text, numbers, pictures and the like, and in this way a non-identity label is set for each sample face image in the training sample set.
In some optional implementations of this embodiment, the execution subject may determine the non-identity label of the training sample by:
firstly, for sample face images in a training sample set, extracting shallow features through a pre-training feature extraction network.
It can be understood that the pre-trained feature extraction network may be a network capable of extracting the deep features of the sample face image, or a model that extracts only shallow features. When the feature extraction network can extract deep features of the sample face image, the shallow features can be taken from an intermediate point of the deep feature extraction process.
Secondly, clustering the shallow features of the sample face images in the training sample set to obtain a preset number of categories.
The preset number may be specifically set according to an actual situation (for example, a parameter scale of the face recognition model), and is not limited herein.
Thirdly, for each sample face image in the training sample set, determining the category corresponding to the sample face image as the non-identity label corresponding to the sample face image.
In the implementation mode, a process of obtaining the non-identity labels of the preset number of categories based on a clustering mode is provided, and the flexibility of obtaining the non-identity labels is improved.
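As an illustration of this implementation, the sketch below assigns a non-identity label to each sample by k-means clustering of its pooled shallow features; the use of scikit-learn's KMeans and the particular preset number of categories are assumptions, and the disclosure equally allows hierarchy- or density-based clustering.

```python
# Minimal sketch: deriving non-identity labels by clustering shallow features.
import numpy as np
from sklearn.cluster import KMeans


def assign_non_identity_labels(shallow_vecs: np.ndarray, num_categories: int = 16) -> np.ndarray:
    """shallow_vecs: (N, D) pooled shallow features, one row per sample image.

    Returns an array of N non-identity labels in [0, num_categories).
    """
    kmeans = KMeans(n_clusters=num_categories, n_init=10, random_state=0)
    return kmeans.fit_predict(shallow_vecs)


# Usage: suppose `shallow_vecs` was produced by a shallow extractor as above.
shallow_vecs = np.random.randn(1000, 256).astype(np.float32)
non_identity_labels = assign_non_identity_labels(shallow_vecs, num_categories=16)
print(non_identity_labels[:10])
```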
Identity labels in training samples in the training sample set may be obtained based on data acquisition. As an example, in a transfer scene such as an airport and a train station, a passenger image and a passenger certificate image collected on site are generally required to be compared to verify whether the passenger and a person represented by the certificate image are the same person. In this scenario, the execution subject may use a passenger image collected on site as a sample face image, and use identity information represented by a certificate image representing the same person as the passenger as an identity tag.
Step 202, using a machine learning method, taking the sample face image as input, extracting the identity features and non-identity features of the input sample face image through an initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, updating the initial face recognition model, and training to obtain the face recognition model.
In this embodiment, the execution subject may use a machine learning method to take a sample face image as input, extract the identity features and non-identity features of the input sample face image through the initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result obtained from the extracted identity features and the second output result obtained from the extracted non-identity features, update the initial face recognition model, and train to obtain the face recognition model.
The initial face recognition model can be any deep learning model with a face recognition function, including but not limited to recurrent neural networks, convolutional neural networks, residual networks and other network models. The execution subject can train the initial face recognition model in a batch, stochastic (randomly sampled) or single-sample manner. Specifically, untrained training samples are first selected from the training sample set, the sample face images in the selected training samples are input into the initial face recognition model, and feature extraction is performed on the input sample face images through the feature extraction network in the initial face recognition model, obtaining the identity features and non-identity features of the sample face images. Then, for the identity features, the identity label corresponding to the input sample face image is taken as the expected output, yielding a first output result regarding the identity information; for the non-identity features, the non-identity label corresponding to the input sample face image is taken as the expected output, yielding a second output result regarding the non-identity information. Next, the classification loss between the first output result and the identity label and the classification loss between the second output result and the non-identity label are calculated. Finally, gradients are computed from the classification losses, and the parameters of the initial face recognition model are updated based on gradient descent or stochastic gradient descent.
The execution subject repeats this training process in a loop, and determines the trained initial face recognition model to be the face recognition model when a preset end condition is reached. The preset end condition may be, for example, that the training time exceeds a preset time threshold, or that the classification losses have converged.
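To make this training step concrete, here is a minimal PyTorch sketch with two classification heads, one supervised by the identity label and one by the non-identity label. The backbone, feature dimensions, class counts and optimizer settings are illustrative assumptions, not values from the disclosure.

```python
# Minimal sketch of one supervised training step with two classification heads.
import torch
import torch.nn as nn


class InitialFaceRecModel(nn.Module):
    def __init__(self, backbone, backbone_dim=2048, id_dim=512, non_id_dim=128,
                 num_identities=10000, num_non_identity_categories=16):
        super().__init__()
        self.backbone = backbone                                   # images -> (B, backbone_dim)
        self.id_branch = nn.Linear(backbone_dim, id_dim)           # identity features
        self.non_id_branch = nn.Linear(backbone_dim, non_id_dim)   # non-identity features
        self.id_classifier = nn.Linear(id_dim, num_identities)                       # first output result
        self.non_id_classifier = nn.Linear(non_id_dim, num_non_identity_categories)  # second output result

    def forward(self, images):
        feat = self.backbone(images)
        id_feat = self.id_branch(feat)
        non_id_feat = self.non_id_branch(feat)
        return self.id_classifier(id_feat), self.non_id_classifier(non_id_feat), id_feat, non_id_feat


def train_step(model, optimizer, images, identity_labels, non_identity_labels):
    ce = nn.CrossEntropyLoss()
    id_logits, non_id_logits, _, _ = model(images)
    # Expected outputs: the identity label supervises the first result,
    # the (clustered) non-identity label supervises the second result.
    loss = ce(id_logits, identity_labels) + ce(non_id_logits, non_identity_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with a dummy backbone:
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 2048))
model = InitialFaceRecModel(backbone)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
images = torch.randn(8, 3, 112, 112)
id_labels = torch.randint(0, 10000, (8,))
non_id_labels = torch.randint(0, 16, (8,))
print(train_step(model, opt, images, id_labels, non_id_labels))
```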
In some optional implementations of this embodiment, the executing main body may execute step 202 as follows:
using a machine learning method, taking the sample face image as input, extracting the identity features and non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result and the second output result, updating the initial face recognition model with the goal of minimizing the correlation between the extracted identity features and non-identity features, and training to obtain the face recognition model.
The identity information and the non-identity information of the face object in a sample face image are two relatively independent types of information, and the correlation between them should be low. Accordingly, the correlation between the identity features characterizing the identity information and the non-identity features characterizing the non-identity information should also be low. On top of taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs for the identity features and non-identity features extracted from it, minimizing the correlation between the identity features and the non-identity features decouples the identity information from the non-identity information more thoroughly, further improving the recognition accuracy of the trained face recognition model.
As an example, a first objective function for minimizing a classification loss between an actual output regarding the identity information and the identity tag, a second objective function for minimizing a classification loss between an actual output regarding the non-identity information and the non-identity tag, and a third objective function for minimizing a correlation between the identity feature and the non-identity feature may be set to train the initial face recognition model through the first objective function, the second objective function, and the third objective function.
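For illustration, the sketch below combines the three objectives into a single training loss. The specific correlation penalty (a batch cross-correlation matrix between standardized features) and the loss weights are assumptions chosen for the example rather than details given in the disclosure.

```python
# Illustrative sketch: combining identity classification, non-identity
# classification, and a feature-decorrelation term into one loss.
import torch
import torch.nn as nn


def cross_correlation_penalty(id_feat: torch.Tensor, non_id_feat: torch.Tensor) -> torch.Tensor:
    # Standardize each feature dimension over the batch, compute the
    # cross-correlation matrix, and penalize its squared entries.
    a = (id_feat - id_feat.mean(0)) / (id_feat.std(0) + 1e-6)            # (B, Da)
    b = (non_id_feat - non_id_feat.mean(0)) / (non_id_feat.std(0) + 1e-6)  # (B, Db)
    corr = a.t() @ b / a.shape[0]                                        # (Da, Db)
    return (corr ** 2).mean()


def total_loss(id_logits, non_id_logits, id_feat, non_id_feat,
               identity_labels, non_identity_labels, corr_weight=1.0):
    ce = nn.CrossEntropyLoss()
    loss_id = ce(id_logits, identity_labels)                   # first objective
    loss_non_id = ce(non_id_logits, non_identity_labels)       # second objective
    loss_corr = cross_correlation_penalty(id_feat, non_id_feat)  # third objective
    return loss_id + loss_non_id + corr_weight * loss_corr
```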
In some optional implementations of this embodiment, the executing entity may train the initial face recognition model in two training phases by:
Firstly, fixing the backbone network used for feature extraction in the initial face recognition model, and, using a machine learning method and taking the sample face image as input, training a first regularization network and a second regularization network in the face recognition model with the goal of maximizing the correlation between the regularized identity features and the regularized non-identity features.
The first regularization network and the second regularization network are sequentially used for obtaining regularized identity characteristics and regularized non-identity characteristics according to identity characteristics and non-identity characteristics of the input sample face image. Specifically, the first regularization network is used for implementing regularization of identity features, and the second regularization network is used for implementing regularization of non-identity features.
In this stage, only the first regularization network and the second regularization network are updated, and the backbone network is not updated, with the goal of maximizing the correlation between the regularized identity features and the regularized non-identity features.
Secondly, fixing the first regularization network and the second regularization network, and, using a machine learning algorithm and taking the sample face image as input, extracting the identity features and non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result and the second output result, and updating the backbone network with the goal of minimizing the correlation between the extracted identity features and non-identity features.
In this stage, for the identity features and non-identity features of the input sample face image, the corresponding identity label and non-identity label are taken as the respective expected outputs, the correlation between the regularized identity features corresponding to the identity features and the regularized non-identity features corresponding to the non-identity features is minimized, and only the backbone network is updated; the first regularization network and the second regularization network are not updated. The update procedure of both stages follows the training process described above of computing gradients and updating the model parameters according to the gradients.
In the training process, the executing body can calculate the correlation between the regularized identity feature and the regularized non-identity feature by the following formula:
\rho = \frac{\operatorname{Cov}(v_a, v_b)}{\sqrt{\operatorname{Var}(v_a)\,\operatorname{Var}(v_b)}}
where ρ denotes the correlation, v_a and v_b denote the regularized identity features and the regularized non-identity features respectively, Cov(·) denotes covariance, and Var(·) denotes variance.
In this implementation, the first stage is the training process for the first regularization network and the second regularization network, with the goal of maximizing the correlation between the regularized identity features and the regularized non-identity features; the second stage is the training process for the backbone network, with the goal of minimizing the correlation between the regularized identity features and the regularized non-identity features, which further improves the recognition accuracy of the face recognition model obtained through training.
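The following is a minimal PyTorch sketch of this two-stage alternation and of the correlation ρ above. Several assumptions are made for illustration: the regularization networks are taken to be small projections mapping each feature to one scalar per sample, so that ρ can be computed over a batch; the identity and non-identity features are obtained by slicing a single backbone feature (one optional implementation described below); and the heads, dimensions and optimizers are placeholders.

```python
# Minimal sketch of the two-stage alternating scheme.
import torch
import torch.nn as nn


def pearson_correlation(v_a: torch.Tensor, v_b: torch.Tensor) -> torch.Tensor:
    """rho = Cov(v_a, v_b) / sqrt(Var(v_a) * Var(v_b)), computed over a batch of scalars."""
    v_a, v_b = v_a.flatten(), v_b.flatten()
    cov = ((v_a - v_a.mean()) * (v_b - v_b.mean())).mean()
    return cov / (v_a.var(unbiased=False).sqrt() * v_b.var(unbiased=False).sqrt() + 1e-8)


# Stage 1: fix the backbone, update only the regularization networks, with the
# goal of MAXIMIZING the correlation between the regularized features.
def stage1_step(backbone, reg_id, reg_non_id, opt_reg, images, split_dim):
    with torch.no_grad():                      # backbone is frozen in this stage
        feat = backbone(images)
        id_feat, non_id_feat = feat[:, :split_dim], feat[:, split_dim:]
    rho = pearson_correlation(reg_id(id_feat), reg_non_id(non_id_feat))
    loss = -rho                                # maximize rho by minimizing -rho
    opt_reg.zero_grad(); loss.backward(); opt_reg.step()
    return rho.item()


# Stage 2: fix the regularization networks, update the backbone and the two
# classification heads, with the goal of MINIMIZING the correlation while the
# identity / non-identity labels supervise the two output results.
def stage2_step(backbone, id_head, non_id_head, reg_id, reg_non_id, opt_main,
                images, id_labels, non_id_labels, split_dim):
    ce = nn.CrossEntropyLoss()
    feat = backbone(images)
    id_feat, non_id_feat = feat[:, :split_dim], feat[:, split_dim:]
    rho = pearson_correlation(reg_id(id_feat), reg_non_id(non_id_feat))
    loss = ce(id_head(id_feat), id_labels) + ce(non_id_head(non_id_feat), non_id_labels) + rho
    # opt_main holds only the backbone's and heads' parameters, so the
    # regularization networks stay fixed even though they receive gradients.
    opt_main.zero_grad(); loss.backward(); opt_main.step()
    return loss.item()
```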
In the training process, the execution subject can extract the identity features through the feature extraction network corresponding to the identity information in the initial face recognition model, and extract the non-identity features through the feature extraction network corresponding to the non-identity information in the initial face recognition model. In some optional implementations of this embodiment, the executing entity may obtain the identity feature and the non-identity feature of the sample face image by:
firstly, carrying out feature extraction on an input sample face image through a backbone network in an initial face recognition model to obtain face features.
The human face features comprise identity features and non-identity features.
Secondly, segmenting the face features to obtain identity features and non-identity features.
Taking a face feature that is a vector of a preset dimension as an example, the execution subject may take the first part of the vector, of a first preset dimension, as the identity feature, and the remaining part, of a second preset dimension, as the non-identity feature.
In the implementation mode, a mode of obtaining the identity characteristic and the non-identity characteristic by segmenting the face characteristic is provided, and the network structure of the face recognition model is simplified.
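As an illustration of this splitting, a minimal sketch follows; the total dimension of 512 and the identity split of 384 are arbitrary assumptions.

```python
# Minimal sketch of the feature-splitting variant: the backbone produces one
# face feature vector, and the identity / non-identity features are slices of it.
import torch


def split_face_feature(face_feat: torch.Tensor, id_dim: int = 384):
    """face_feat: (B, D) face features from the backbone network."""
    identity_feat = face_feat[:, :id_dim]       # first preset number of dimensions
    non_identity_feat = face_feat[:, id_dim:]   # remaining dimensions
    return identity_feat, non_identity_feat


face_feat = torch.randn(8, 512)
id_feat, non_id_feat = split_face_feature(face_feat)
print(id_feat.shape, non_id_feat.shape)  # torch.Size([8, 384]) torch.Size([8, 128])
```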
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the training method for a face recognition model according to the present embodiment. In the application scenario of fig. 3, the server first obtains a training sample set 301. The training samples in the training sample set comprise sample face images, identity labels corresponding to the identity information of face objects in the sample face images and non-identity labels corresponding to the non-identity information, and the non-identity labels of the training samples are obtained based on shallow feature clustering of the sample face images in the training sample set. After the training sample set 301 is obtained, the server trains a face recognition model by using a machine learning method and taking a sample face image 302 as input, and regarding identity features 303 and non-identity features 304 extracted from the input sample face image 302, taking an identity label 305 and a non-identity label 306 corresponding to the input sample face image as expected outputs of a first output result obtained according to the extracted identity features and a second output result obtained according to the extracted non-identity features in sequence.
In the embodiment, the non-identity labels of the training samples in the training sample set are obtained based on the feature clustering in an unsupervised mode, so that the determining efficiency of the non-identity labels is improved, and the resource consumption of a user for obtaining the labels is saved; in the training process of the face recognition model, the identity information and the non-identity information of the face object in the sample face image are decoupled, and the recognition accuracy of the face recognition model is improved.
With continued reference to FIG. 4, an exemplary flow 400 of one embodiment of a method for training a face recognition model in accordance with the methods of the present disclosure is shown. The process 400 includes the following steps:
step 401, a training sample set is obtained.
The training samples in the training sample set comprise sample face images and identity labels corresponding to the identity information of the face objects in the sample face images.
And step 402, extracting shallow features of the sample face images in the training sample set through a pre-training feature extraction network.
And 403, clustering shallow features of the sample face images in the training sample set to obtain a preset number of categories.
Step 404, for each sample face image in the training sample set, determining the category corresponding to the sample face image as the non-identity label corresponding to the sample face image.
Step 405, fixing the backbone network used for feature extraction in the initial face recognition model, and, using a machine learning method and taking the sample face image as input, training a first regularization network and a second regularization network in the face recognition model with the goal of maximizing the correlation between the regularized identity features and the regularized non-identity features.
The first regularization network and the second regularization network are sequentially used for obtaining regularized identity characteristics and regularized non-identity characteristics according to identity characteristics and non-identity characteristics of the input sample face image.
Step 406, fixing the first regularization network and the second regularization network, and, using a machine learning algorithm and taking the sample face image as input, extracting the identity features and non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result and the second output result, and updating the backbone network with the goal of minimizing the correlation between the extracted identity features and non-identity features.
As can be seen from this embodiment, compared with the embodiment corresponding to fig. 2, the process 400 of the training method for the face recognition model spells out the process of obtaining the non-identity labels by clustering and the two-stage training process of the face recognition model, which improves the efficiency of determining the non-identity labels, reduces the resources a user must spend on labeling, and improves the recognition accuracy of the face recognition model.
Referring to fig. 5, fig. 5 is a flowchart of a face recognition method according to an embodiment of the present disclosure, where the process 500 includes the following steps:
step 501, acquiring an image to be identified.
In this embodiment, an execution subject (for example, a terminal device or a server in fig. 1) of the face recognition method may obtain the image to be recognized from a remote location or a local location through a wired network connection manner or a wireless network connection manner.
The image to be recognized may be any image. As an example, the image to be recognized is each frame image in a video captured by the monitoring device.
Step 502, identifying the image to be identified through the pre-trained face identification model to obtain a face identification result.
In this embodiment, the execution subject may recognize the image to be recognized through the pre-trained face recognition model to obtain a face recognition result. The face recognition result characterizes the identity information of the face object in the image to be recognized.
The face recognition model is obtained based on the training methods shown in the embodiments 200 and 400.
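As an illustration of this step, here is a minimal inference sketch that compares the identity feature of the image to be recognized against a gallery of known identities using cosine similarity; the forward signature follows the model sketch given earlier, and the gallery format and threshold are assumptions.

```python
# Minimal inference sketch: recognize an image by matching its identity feature.
import torch
import torch.nn.functional as F


def recognize(model, image: torch.Tensor, gallery_feats: torch.Tensor,
              gallery_ids: list, threshold: float = 0.35):
    """image: (3, H, W); gallery_feats: (N, D) identity features, L2-normalized."""
    model.eval()
    with torch.no_grad():
        _, _, id_feat, _ = model(image.unsqueeze(0))     # use the identity feature only
        id_feat = F.normalize(id_feat, dim=1)
        sims = id_feat @ gallery_feats.t()               # cosine similarities, (1, N)
        score, idx = sims.max(dim=1)
    return gallery_ids[idx.item()] if score.item() >= threshold else "unknown"
```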
In the embodiment, the face recognition result of the image to be recognized is obtained through the face recognition model, and the recognition precision of the face recognition result is improved.
With continuing reference to fig. 6, as an implementation of the method shown in the above-mentioned figures, the present disclosure provides an embodiment of a training apparatus for a face recognition model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 6, the training apparatus for a face recognition model includes: a first obtaining unit 601 configured to obtain a training sample set, wherein each training sample in the training sample set comprises a sample face image, an identity label corresponding to identity information of the face object in the sample face image and a non-identity label corresponding to non-identity information, the non-identity label of a training sample being obtained by clustering shallow features of the sample face images in the training sample set; and a training unit 602 configured to, using a machine learning method and taking the sample face image as input, extract identity features and non-identity features of the input sample face image through an initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, update the initial face recognition model, and train to obtain the face recognition model.
In some optional implementations of this embodiment, the training unit 602 is further configured to: using a machine learning method and taking the sample face image as input, extract the identity features and non-identity features of the input sample face image through the initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result and the second output result, update the initial face recognition model with the goal of minimizing the correlation between the extracted identity features and non-identity features, and train to obtain the face recognition model.
In some optional implementations of this embodiment, the training unit 602 is further configured to: fix the backbone network used for feature extraction in the initial face recognition model, and, using a machine learning method and taking the sample face image as input, train a first regularization network and a second regularization network in the face recognition model with the goal of maximizing the correlation between the regularized identity features and the regularized non-identity features, wherein the first regularization network and the second regularization network are respectively used for obtaining the regularized identity features and the regularized non-identity features from the identity features and non-identity features of the input sample face image; and fix the first regularization network and the second regularization network, and, using a machine learning algorithm and taking the sample face image as input, extract the identity features and non-identity features of the input sample face image through the initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, the first output result and the second output result, and update the backbone network with the goal of minimizing the correlation between the extracted identity features and non-identity features.
In some optional implementations of this embodiment, the apparatus further includes: a determining unit (not shown in the figure) configured to extract shallow features from the pre-trained feature extraction network for the sample face images in the training sample set; clustering shallow features of sample face images in a training sample set to obtain a preset number of classes; and for each sample face image in the training sample set, determining the class corresponding to the sample face image as a non-identity label corresponding to the sample face image.
In some optional implementations of this embodiment, the training unit 602 is further configured to: carrying out feature extraction on the input sample face image through a backbone network in the initial face recognition model to obtain face features; and segmenting the face features to obtain identity features and non-identity features.
In the embodiment, the non-identity labels of the training samples in the training sample set are obtained based on the feature clustering in an unsupervised mode, so that the determining efficiency of the non-identity labels is improved, and the resource consumption of a user for obtaining the labels is saved; in the training process of the face recognition model, the identity information and the non-identity information of the face object in the sample face image are decoupled, and the recognition accuracy of the face recognition model is improved.
With continuing reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a face recognition apparatus, which corresponds to the embodiment of the method shown in fig. 5, and which may be applied to various electronic devices.
As shown in fig. 7, the face recognition apparatus includes: a second acquisition unit 701 configured to acquire an image to be recognized; and a recognition unit 702 configured to recognize the image to be recognized through a pre-trained face recognition model to obtain a face recognition result. The face recognition model is obtained through the training described for the processes 200 and 400 above.
In the embodiment, the face recognition result of the image to be recognized is obtained through the face recognition model, and the recognition precision of the face recognition result is improved.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can implement the training method of the face recognition model and the face recognition method described in any of the above embodiments when executed.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium, which stores computer instructions for enabling a computer to implement the training method of the face recognition model and the face recognition method described in any of the above embodiments when executed.
The embodiments of the present disclosure provide a computer program product, which when executed by a processor can implement the training method of the face recognition model and the face recognition method described in any of the above embodiments.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM)802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The calculation unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 executes the respective methods and processes described above, such as the training method of the face recognition model, the face recognition method. For example, in some embodiments, the training method of the face recognition model, the face recognition method, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above described training method of the face recognition model, the face recognition method may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the face recognition model, the face recognition method, in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of large management difficulty and weak service expansibility existing in the traditional physical host and Virtual Private Server (VPS) service; it may also be a server of a distributed system, or a server incorporating a blockchain.
According to the technical scheme of the embodiment of the disclosure, the non-identity labels of the training samples in the training sample set are obtained based on feature clustering in an unsupervised mode, so that the determination efficiency of the non-identity labels is improved, and the resource consumption of a user for obtaining the labels is saved; in the training process of the face recognition model, the identity information and the non-identity information of the face object in the sample face image are decoupled, and the recognition accuracy of the face recognition model is improved.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A training method of a face recognition model comprises the following steps:
acquiring a training sample set, wherein training samples in the training sample set comprise sample face images, identity labels corresponding to identity information of face objects in the sample face images and non-identity labels corresponding to non-identity information, and the non-identity labels of the training samples are obtained based on shallow feature clustering of the sample face images in the training sample set;
using a machine learning method, taking the sample face image as input, extracting identity features and non-identity features of the input sample face image through an initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, updating the initial face recognition model, and training to obtain the face recognition model.
2. The method according to claim 1, wherein the using a machine learning method, taking a sample face image as input, extracting identity features and non-identity features of the input sample face image through an initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, updating the initial face recognition model, and training to obtain the face recognition model comprises:
using a machine learning method, taking a sample face image as input, extracting the identity features and the non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of the first output result and the second output result respectively, updating the initial face recognition model with the goal of minimizing the correlation between the extracted identity features and the non-identity features, and training to obtain the face recognition model.
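Claim 2 adds a decoupling target to this training: the correlation between the extracted identity features and non-identity features is to be minimized. The claim does not fix a particular correlation measure, so the sketch below uses batch-wise absolute cosine similarity purely as an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def correlation_penalty(id_feat: torch.Tensor, non_id_feat: torch.Tensor) -> torch.Tensor:
    # Mean absolute cosine similarity between the two feature groups; a value
    # near zero indicates the identity and non-identity features are decoupled.
    return F.cosine_similarity(id_feat, non_id_feat, dim=1).abs().mean()
```

Adding, for example, `loss = loss + 0.1 * correlation_penalty(id_feat, non_id_feat)` to the training step sketched above drives the model to minimize this correlation alongside the two classification objectives; the weight 0.1 is an arbitrary illustrative choice.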
3. The method according to claim 2, wherein the using a machine learning method, taking a sample face image as input, extracting the identity features and the non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of the first output result and the second output result respectively, updating the initial face recognition model with the goal of minimizing the correlation between the extracted identity features and the non-identity features, and training to obtain the face recognition model comprises:
fixing a backbone network used for feature extraction in the initial face recognition model, and, using a machine learning method, taking a sample face image as input and maximizing the correlation between a regularized identity feature and a regularized non-identity feature as the training goal, training to obtain a first regularization network and a second regularization network in the face recognition model, wherein the first regularization network and the second regularization network are used to obtain the regularized identity feature and the regularized non-identity feature from the identity feature and the non-identity feature of the input sample face image, respectively;
fixing the first regularization network and the second regularization network, and, using a machine learning method, taking a sample face image as input, extracting the identity features and the non-identity features of the input sample face image through the initial face recognition model, taking the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of the first output result and the second output result respectively, and updating the backbone network with the goal of minimizing the correlation between the extracted identity features and the non-identity features.
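Claim 3 refines this into an alternating, two-stage schedule: with the backbone fixed, the two regularization networks are trained to maximize the correlation between the regularized identity and non-identity features (making them a strong correlation estimator); with the regularization networks then fixed, the backbone is updated to minimize that correlation together with the two classification losses. The sketch below assumes small MLP regularization networks, separate optimizers, and the same cosine-based correlation measure as above; all of these are illustrative assumptions rather than limitations of the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_regularization_net(dim: int) -> nn.Module:
    # Illustrative regularization network mapping a feature group into a
    # common space in which the correlation is measured.
    return nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

def feature_correlation(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return F.cosine_similarity(a, b, dim=1).abs().mean()

def alternating_step(model, reg_id, reg_non_id, backbone_opt, reg_opt,
                     images, id_labels, non_id_labels):
    # Stage 1: backbone fixed, regularization networks maximize the correlation.
    with torch.no_grad():
        _, _, id_feat, non_id_feat = model(images)
    corr = feature_correlation(reg_id(id_feat), reg_non_id(non_id_feat))
    reg_opt.zero_grad()
    (-corr).backward()  # gradient ascent on the correlation
    reg_opt.step()

    # Stage 2: regularization networks fixed, backbone minimizes the
    # classification losses plus the adversarially estimated correlation.
    id_logits, non_id_logits, id_feat, non_id_feat = model(images)
    corr = feature_correlation(reg_id(id_feat), reg_non_id(non_id_feat))
    loss = (F.cross_entropy(id_logits, id_labels)
            + F.cross_entropy(non_id_logits, non_id_labels)
            + corr)
    backbone_opt.zero_grad()
    loss.backward()
    backbone_opt.step()  # backbone_opt holds only the backbone/head parameters
    return loss.item()
```

Here `model` is the two-head model from the claim-1 sketch, `reg_opt` optimizes only the two regularization networks, and `backbone_opt` optimizes only the backbone and its heads, so each stage updates only the parameters it is meant to.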
4. The method according to any one of claims 1-3, further comprising:
extracting shallow features of the sample face images in the training sample set through a pre-trained feature extraction network;
clustering the shallow features of the sample face images in the training sample set into a preset number of categories;
and for each sample face image in the training sample set, determining the category into which the sample face image is clustered as the non-identity label corresponding to that sample face image.
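Claim 4 spells out how the non-identity labels are produced: cluster shallow features of the sample face images into a preset number of categories and use each image's category index as its non-identity label. A minimal sketch using scikit-learn's KMeans is shown below; the choice of KMeans, the number of clusters, and the assumption that the shallow features (e.g., globally pooled outputs of an early layer of a pre-trained network) are already stacked into a matrix are all illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def assign_non_identity_labels(shallow_feats: np.ndarray, num_clusters: int = 32) -> np.ndarray:
    """Cluster shallow features (one row per sample face image) into a preset
    number of categories; the cluster index becomes the non-identity label."""
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(shallow_feats)
```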
5. The method according to any one of claims 1 to 3, wherein the identity features and the non-identity features of the sample face image are obtained by:
performing feature extraction on the input sample face image through a backbone network in the initial face recognition model to obtain face features;
and splitting the face features to obtain the identity features and the non-identity features.
6. A face recognition method, comprising:
acquiring an image to be recognized;
and recognizing the image to be recognized through a pre-trained face recognition model to obtain a face recognition result, wherein the face recognition model is trained by the method according to any one of claims 1 to 5.
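For the recognition method of claim 6, the sketch below shows one way the trained model might be applied at inference time: only the identity features are used, and the query is matched against a gallery of identity features by cosine similarity. The gallery, the threshold, and the matching rule are illustrative assumptions, not specified by the claim.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(model, image: torch.Tensor, gallery_feats: torch.Tensor, threshold: float = 0.5) -> int:
    """Return the index of the best-matching gallery identity, or -1 if no match.

    `model` is assumed to return (id_logits, non_id_logits, id_feat, non_id_feat)
    as in the earlier sketch; only the identity features are used here, since the
    non-identity branch exists to decouple the representation during training.
    """
    model.eval()
    _, _, id_feat, _ = model(image.unsqueeze(0))        # shape (1, d)
    sims = F.cosine_similarity(id_feat, gallery_feats)  # broadcasts to (num_gallery,)
    best = int(sims.argmax())
    return best if float(sims[best]) >= threshold else -1
```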
7. An apparatus for training a face recognition model, comprising:
a first acquisition unit configured to acquire a training sample set, wherein training samples in the training sample set comprise sample face images, identity labels corresponding to identity information of face objects in the sample face images, and non-identity labels corresponding to non-identity information of the face objects, the non-identity labels of the training samples being obtained based on clustering of shallow features of the sample face images in the training sample set;
and a training unit configured to, using a machine learning method and taking a sample face image as input, extract identity features and non-identity features of the input sample face image through an initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of, respectively, a first output result obtained from the extracted identity features and a second output result obtained from the extracted non-identity features, update the initial face recognition model, and train to obtain the face recognition model.
8. The apparatus of claim 7, wherein the training unit is further configured to:
using a machine learning method, take a sample face image as input, extract the identity features and the non-identity features of the input sample face image through the initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of the first output result and the second output result respectively, update the initial face recognition model with the goal of minimizing the correlation between the extracted identity features and the non-identity features, and train to obtain the face recognition model.
9. The apparatus of claim 8, wherein the training unit is further configured to:
fix a backbone network used for feature extraction in the initial face recognition model, and, using a machine learning method, taking a sample face image as input and maximizing the correlation between a regularized identity feature and a regularized non-identity feature as the training goal, train to obtain a first regularization network and a second regularization network in the face recognition model, wherein the first regularization network and the second regularization network are used to obtain the regularized identity feature and the regularized non-identity feature from the identity feature and the non-identity feature of the input sample face image, respectively; and fix the first regularization network and the second regularization network, and, using a machine learning method, taking a sample face image as input, extract the identity features and the non-identity features of the input sample face image through the initial face recognition model, take the identity label and the non-identity label corresponding to the input sample face image as the expected outputs of the first output result and the second output result respectively, and update the backbone network with the goal of minimizing the correlation between the extracted identity features and the non-identity features.
10. The apparatus of any of claims 7-9, further comprising:
a determining unit configured to extract shallow features of the sample face images in the training sample set through a pre-trained feature extraction network; cluster the shallow features of the sample face images in the training sample set into a preset number of categories; and, for each sample face image in the training sample set, determine the category into which the sample face image is clustered as the non-identity label corresponding to that sample face image.
11. The apparatus of any of claims 7-9, wherein the training unit is further configured to:
perform feature extraction on the input sample face image through a backbone network in the initial face recognition model to obtain face features; and split the face features to obtain the identity features and the non-identity features.
12. A face recognition apparatus comprising:
a second acquisition unit configured to acquire an image to be recognized;
a recognition unit configured to recognize the image to be recognized through a pre-trained face recognition model to obtain a face recognition result, wherein the face recognition model is trained by the apparatus according to any one of claims 7 to 11.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method according to any one of claims 1-6.
15. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110932551.1A 2021-08-13 2021-08-13 Training method and device for face recognition model and computer program product Pending CN113657248A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110932551.1A CN113657248A (en) 2021-08-13 2021-08-13 Training method and device for face recognition model and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110932551.1A CN113657248A (en) 2021-08-13 2021-08-13 Training method and device for face recognition model and computer program product

Publications (1)

Publication Number Publication Date
CN113657248A true CN113657248A (en) 2021-11-16

Family

ID=78491622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110932551.1A Pending CN113657248A (en) 2021-08-13 2021-08-13 Training method and device for face recognition model and computer program product

Country Status (1)

Country Link
CN (1) CN113657248A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129869A1 (en) * 2016-11-08 2018-05-10 Nec Laboratories America, Inc. Siamese Reconstruction Convolutional Neural Network for Pose-invariant Face Recognition
CN109993125A (en) * 2019-04-03 2019-07-09 腾讯科技(深圳)有限公司 Model training method, face identification method, device, equipment and storage medium
CN111476216A (en) * 2020-05-26 2020-07-31 上海眼控科技股份有限公司 Face recognition method and device, computer equipment and readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129869A1 (en) * 2016-11-08 2018-05-10 Nec Laboratories America, Inc. Siamese Reconstruction Convolutional Neural Network for Pose-invariant Face Recognition
CN109993125A (en) * 2019-04-03 2019-07-09 腾讯科技(深圳)有限公司 Model training method, face identification method, device, equipment and storage medium
WO2020199932A1 (en) * 2019-04-03 2020-10-08 腾讯科技(深圳)有限公司 Model training method, face recognition method, device and apparatus, and storage medium
CN111476216A (en) * 2020-05-26 2020-07-31 上海眼控科技股份有限公司 Face recognition method and device, computer equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张汶汶; 周先春: "Convolutional neural network face recognition algorithm based on multi-sample augmentation" (基于多样本扩充的卷积神经网络人脸识别算法), Computer & Digital Engineering (计算机与数字工程), no. 04 *

Similar Documents

Publication Publication Date Title
CN113326764B (en) Method and device for training image recognition model and image recognition
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
CN112560874B (en) Training method, device, equipment and medium for image recognition model
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN113657269A (en) Training method and device for face recognition model and computer program product
CN113065614B (en) Training method of classification model and method for classifying target object
CN113344089B (en) Model training method and device and electronic equipment
CN113627536B (en) Model training, video classification method, device, equipment and storage medium
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN112800919A (en) Method, device and equipment for detecting target type video and storage medium
CN112862005A (en) Video classification method and device, electronic equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN114186681A (en) Method, apparatus and computer program product for generating model clusters
CN113947701A (en) Training method, object recognition method, device, electronic device and storage medium
CN113011309A (en) Image recognition method, apparatus, device, medium, and program product
CN115457329A (en) Training method of image classification model, image classification method and device
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN113989899A (en) Method, device and storage medium for determining feature extraction layer in face recognition model
CN114724144A (en) Text recognition method, model training method, device, equipment and medium
CN114882334A (en) Method for generating pre-training model, model training method and device
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114330576A (en) Model processing method and device, and image recognition method and device
CN114120410A (en) Method, apparatus, device, medium and product for generating label information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination