CN113033373A

CN113033373A - Method and related device for training face recognition model and recognizing face

Info

Publication number: CN113033373A
Application number: CN202110298548.9A
Authority: CN
Inventors: 杨馥魁
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Qiruru Earprint Technology Shenzhen Co ltd
Priority date: 2021-03-19
Filing date: 2021-03-19
Publication date: 2021-06-25
Anticipated expiration: 2041-03-19
Also published as: CN113033373B

Abstract

The embodiment of the application discloses a method, a device, electronic equipment, a computer-readable storage medium and a computer program product for training a face recognition model and recognizing a face, relates to computer vision and deep learning technologies, and can be used for face recognition scenes. One embodiment includes: acquiring an original face feature set after normalization processing; the training steps performed at the time of training include: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set; training a face recognition model based on the target face feature set determined according to the similarity; decreasing the value of K; and if the current K meets the preset condition, executing a training step, and otherwise, determining the face recognition model as a target face recognition model. The embodiment avoids the influence of the dirty data on the robustness and the precision of the model by aggregating the dirty data to a certain new face feature set as much as possible.

Description

Method and related device for training face recognition model and recognizing face

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to computer vision and deep learning techniques, which may be used in a face recognition scenario, and in particular, to a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for training a face recognition model, and a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for recognizing a face.

Background

The existing face recognition model needs to label a sample in detail in the training process, but because of huge data volume, when data is labeled, labeling errors are inevitable, so that the problem of noise in the training data, namely, dirty data, is caused, and the dirty data can greatly mislead the training of the model, so that the trained face recognition model has the problems of poor robustness, low face recognition precision and the like.

Disclosure of Invention

The embodiment of the application provides a method, a device, an electronic device, a computer-readable storage medium and a computer program product for training a face recognition model, and also provides a method, a device, an electronic device, a computer-readable storage medium and a computer program product for recognizing a face.

In a first aspect, an embodiment of the present application provides a method for training a face recognition model, including: acquiring an original face feature set after normalization processing; training a face recognition model based on an original face feature set, wherein the training step comprises: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K; responding to the fact that the current K meets a preset condition, and executing a training step; and determining the face recognition model as a target face recognition model in response to the fact that the current K does not meet the preset condition.

In a second aspect, an embodiment of the present application provides an apparatus for training a face recognition model, including: the normalization processing unit is configured to acquire an original face feature set after normalization processing; a training unit configured to train a face recognition model based on the original face feature set, the training step comprising: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K; an iterative loop unit configured to perform a training step in response to a current K satisfying a preset condition; and the output unit is configured to respond to the fact that the current K does not meet the preset condition, and determine the face recognition model as the target face recognition model.

In a third aspect, an embodiment of the present application provides a method for recognizing a face, including: receiving a face image to be recognized; calling a target face recognition model to recognize a face image to be recognized; wherein the target face recognition model is obtained according to the method for training a face recognition model as described in any one of the implementations of the first aspect.

In a fourth aspect, an embodiment of the present application provides an apparatus for recognizing a human face, including: a face image to be recognized receiving unit configured to receive a face image to be recognized; the face recognition unit is configured to call a target face recognition model to recognize a face image to be recognized; wherein the target face recognition model is obtained according to the method for training a face recognition model as described in any one of the implementations of the second aspect.

In a fifth aspect, an embodiment of the present application provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a face recognition model as described in any implementation of the first aspect and/or the method for recognizing a face as described in any implementation of the third aspect when executed.

In a sixth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions for enabling a computer to implement the method for training a face recognition model as described in any implementation manner of the first aspect and/or the method for recognizing a face as described in any implementation manner of the third aspect when executed.

In a seventh aspect, the present application provides a computer program product comprising a computer program, which when executed by a processor is capable of implementing the method for training a face recognition model as described in any one of the implementations of the first aspect and/or the method for recognizing a face as described in any one of the implementations of the third aspect.

According to the method, the device, the electronic equipment, the computer readable storage medium and the computer program product for training the face recognition model, firstly, an original face feature set after normalization processing is obtained; then, training a face recognition model based on the original face feature set, wherein the training step comprises: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K; next, in response to the current K satisfying a preset condition, performing a training step; and finally, determining the face recognition model as a target face recognition model when the current K does not meet the preset condition.

On the basis of the model training scheme provided by the method, the device, the electronic device, the computer-readable storage medium and the computer program product for recognizing the human face provided by the embodiment of the application can call the target human face recognition model trained by the scheme to recognize when the human face image to be recognized is received.

In order to avoid adverse effects of dirty data on model training as much as possible, the method randomly transforms an original face feature set into a plurality of new face feature sets through a subclass center transformation matrix, gathers the dirty data in a new face feature set of a certain class as much as possible through a class random transformation mode provided by the subclass center transformation matrix, and the more the dirty data are gathered, the smaller the similarity between the new face feature set of the class and the original face feature set is, so that the face recognition model can be controlled in a targeted manner to learn parameters only from the new face feature set containing as little dirty data as possible, and the robustness and the face recognition accuracy of the finally trained target face recognition model are improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture to which the present application may be applied;

fig. 2 is a flowchart of a method for training a face recognition model according to an embodiment of the present application;

FIG. 3 is a flow chart of another method for training a face recognition model according to an embodiment of the present application;

FIG. 4 is a flowchart of a method for verifying usability of a target face recognition model according to an embodiment of the present disclosure;

fig. 5 is a block diagram illustrating a structure of an apparatus for training a face recognition model according to an embodiment of the present application;

fig. 6 is a block diagram illustrating a structure of an apparatus for recognizing a human face according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of an electronic device suitable for executing a method for training a face recognition model and/or a method for recognizing a face according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information (such as the subsequent face images) of the related user all conform to the regulations of related laws and regulations, and do not violate the customs of the public order.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present methods, apparatuses, electronic devices and computer-readable storage media for training a face recognition model and recognizing a face may be applied.

As shown in fig. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications for realizing information communication between the

terminal devices

101, 102, 103 and the server 105, such as data encryption application, face recognition application, instant messaging application, etc., may be installed on the

terminal devices

101, 102, 103 and the server 105.

The

terminal apparatuses

101, 102, 103 and the server 105 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the

terminal devices

101, 102, and 103 are software, they may be installed in the electronic devices listed above, and they may be implemented as multiple software or software modules, or may be implemented as a single software or software module, and are not limited in this respect. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server; when the server is software, the server may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited herein.

The server 105 may provide various services through various built-in applications, for example, a face recognition application that may provide a service for recognizing a user corresponding to a face image, and the server 105 may implement the following effects when running the electronic album application: firstly, receiving incoming face images to be recognized from

terminal equipment

101, 102 and 103 through a network 104; then, calling a previously trained target face recognition model from a preset storage position, and inputting a face image to be recognized into the target face recognition model as input data; finally, the recognition result output by the target face recognition model is returned to the

terminal equipment

101, 102, 103.

The target face recognition model can be obtained by applying a model training class built in the server 105 to a training sample in advance according to the following steps: firstly, acquiring an original face feature set after normalization processing; then, training a face recognition model based on the original face feature set, wherein the training step comprises: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K; next, in response to the current K satisfying a preset condition, performing the training step; and finally, determining the face recognition model as a target face recognition model in response to the fact that the current K does not meet the preset condition.

Since the target face recognition model obtained by training needs to occupy more computation resources and stronger computation capability, the method for training the face recognition model provided in the following embodiments of the present application is generally executed by the server 105 having stronger computation capability and more computation resources, and accordingly, the apparatus for training the face recognition model is generally also disposed in the server 105. However, it should be noted that when the

terminal devices

101, 102, and 103 also have computing capabilities and computing resources meeting the requirements, the

terminal devices

101, 102, and 103 may also complete the above-mentioned operations that are originally delivered to the server 105 through the model training application installed thereon, and then output the same result as the server 105. Correspondingly, the means for training the face recognition model may also be provided in the

terminal device

101, 102, 103. In such a case, the exemplary system architecture 100 may also not include the server 105 and the network 104.

Specifically, the target face recognition model trained by the server 105 may also obtain a lightweight face recognition model suitable for being placed in the

terminal devices

101, 102, and 103 in a model distillation manner, that is, the lightweight face recognition model in the

terminal devices

101, 102, and 103 may be flexibly selected for use according to the recognition accuracy required in practice, or the target face recognition model in the server 105 may be selected for use.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring to fig. 2, fig. 2 is a flowchart of a method for training a face recognition model according to an embodiment of the present application, wherein the process 200 includes the following steps:

step 201: acquiring an original face feature set after normalization processing;

this step is intended to obtain the normalized raw face feature set by the executing subject (e.g., the server 105 shown in fig. 1) of the method for training the face recognition model.

The original face feature set is a set of face features obtained after normalization processing, the face features are extracted from each picture containing a face image, the extraction modes of the face features are many, such as face key point extraction, face contour extraction and the like, the normalization processing aims at uniformly describing the face features extracted from different pictures, and the process can be visually described as uniform measurement.

It should be noted that the face image training set used for extracting and normalizing the original face feature set may be obtained directly from a local storage device by the execution subject, or may be obtained from a non-local storage device (for example,

terminal devices

101, 102, 103 shown in fig. 1). The local storage device may be a data storage module arranged in the execution main body, for example, a server hard disk, in which case, the face image set can be quickly read locally; the non-local storage device may also be any other electronic device configured to store data, such as some user terminals, in which case the executing subject may obtain the required training set of facial pictures by sending a obtaining command to the electronic device. Of course, the original face feature set may also be directly stored in a local storage device or a non-local storage device as a finished product.

In particular, the original set of face features can be generally represented in a variety of forms, such as matrices, vectors, or other equivalently transformable forms.

Step 202: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers;

on the basis of step 201, this step is intended to convert the original face features into K new face feature sets by the execution subject through a subclass center conversion matrix with a subclass center number of K, where the value range of K should be an integer greater than 1 and the initial value should be greater than 1 based on the reason that K is the subclass center number.

It should be noted that the K new face feature sets converted by the subclass center conversion matrix may be included in a large matrix with K dimensions converted, or may be represented as K small matrices respectively. Of course, in order to meet the conversion requirement, the original face feature set needs to be converted into a matrix representation form when a subclass center conversion matrix is input.

The subclass center conversion matrix is used for converting an original class center of input data into a new class center, so that the class of the feature set is changed in the mode, and dirty data contained in the face picture training set is gathered and found. The reason why the dirty data can be gathered and found in this way is that the dirty data is hidden deeper in the center of the original category and is not easy to find, and the randomly generated center of the new category may be the category center of the dirty data, and once a new face feature set gathers more dirty data, the similarity between the new face feature set and the original face feature set is necessarily reduced, so that the dirty data can be gathered and distinguished.

Since the class center can only be randomly generated on the premise that the dirty data cannot be determined in advance, in order to generate the aggregated dirty data and the class center as much as possible, the initial value of K may be set to a larger value, and the size of K is continuously changed in a decreasing manner at each turn, so that the class center capable of aggregating the dirty data is continuously tried to be generated without omission.

Step 203: respectively calculating the similarity between each new face feature set and the original face feature set;

on the basis of step 202, this step is intended to calculate the similarity between each new face feature set and the original face feature set separately by the executing subject. It should be understood that if more dirty data is gathered in a new face feature set, the similarity between the new face feature set and the original face feature set is low; if a new face feature set does not contain much dirty data, the similarity between the new face feature set and the original face feature set is high.

Specifically, arccos may be calculated by multiplying each new face feature set by the matrix expression form of the original face feature set, so as to obtain the similarity expressed in the angle matrix manner.

Step 204: and determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set.

On the basis of step 203, this step is intended to determine a target face feature set based on the similarity by the executing subject, and train a face recognition model based on the target face feature set.

According to the description, in order to realize the purpose of gathering and distinguishing dirty data, the target face feature set is a part of face feature set with high similarity to the original face feature set, so that the determined target face feature set contains as little dirty data as possible, the face recognition model is trained by using as little dirty data as possible, and the recognition accuracy of the model is improved.

Specifically, the new face feature set with higher similarity is determined as the target face feature set in various ways, for example, the degree of similarity is distinguished by a preset higher similarity threshold, the new face feature set with higher similarity may be sorted according to the degree of similarity and selected as the top ranking, or as the top ranking percentage, and the like, and may be flexibly selected according to actual requirements, and the method is not specifically limited herein.

Specifically, the face recognition model may be trained by using an arcace loss function suitable for face recognition, and in the loss function, the angle of arccos may be input for supervised training.

Step 205: decreasing the value of K;

on the basis of step 204, this step is intended to acquire a new K value by decrementing the K value, so as to guide the generation of a new set of face features based on the new K value, and to try again to gather and differentiate dirty data.

The decreasing step length of K in each iteration training cycle may be the same or different, and the purpose of the decreasing K value is to generate a new face feature set as comprehensive as possible by changing the size, so as to attempt to better aggregate and distinguish dirty data.

Step 206: judging whether the current K meets a preset condition, if so, skipping to execute the step 202, otherwise, executing the step 207;

it is to be understood that the preset condition is actually a jump-out condition of iterative training, for example, whether the current K value is decreased to a certain smaller value (i.e., whether the K value is changed for a sufficient number of times by the smaller value), and if the requirement of repeatedly performing the training step is met, it should be determined whether the current K value is greater than the smaller value, for example, 1, 2, etc.; the skip-out condition may also be a decreasing amplitude, a decreasing percentage, etc. of the current K value compared to the initial value, and the corresponding preset condition is also adjusted accordingly.

Step 207: and determining the face recognition model as a target face recognition model.

This step is based on the determination result in step 206 that the current K does not satisfy the preset condition, and aims to determine, by the executing agent, the face recognition model trained to the current as the target face recognition model, that is, this step is in a state where the training degree of the current face recognition model is considered to satisfy the end condition, and may be output. It should be understood, however, that skipping or ending training does not necessarily mean that the target face recognition model has the desired recognition accuracy, and in the improper case of preset condition setting, the transformation of K may not always be effective in aggregating and distinguishing dirty data, and although training is also skipped and ended, the recognition accuracy of the target face recognition model may not be as expected.

Based on the embodiment shown in fig. 2, the present application provides a more specific implementation manner by taking a fixed decreasing step size-1 as an example through another method for training a face recognition model provided in fig. 3, so as to further understand the present solution, where the process 300 shown in fig. 3 includes the following steps:

step 301: extracting face features from each face picture contained in the face picture training set;

step 302: normalizing the extracted face features to obtain an original face feature set;

step 301-.

Step 303: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers;

step 304: respectively calculating the similarity between each new face feature set and the original face feature set;

step 305: determining a new face feature set with the similarity larger than a preset threshold as a target face feature, and training a face recognition model based on the target face feature set;

the above steps 303-305 are substantially the same as the

steps

202 and 204 shown in fig. 2, and for the same parts, reference is made to the corresponding parts of the previous embodiment, which is not repeated herein. The differences are as follows: in the step, a comparison mode based on a preset threshold is specifically selected to select a new face feature set with high similarity as a target face feature.

Similarly, this step can be replaced by: and determining the new face feature set with the similarity arranged in the front preset percentage according to the size as a target face feature set.

Step 306: each time the value of K is decremented by 1.

In this embodiment, the decreasing step size of the K value is fixed, that is, the decreasing amplitude is unit 1, that is, the previous K value is 9, the K value after decreasing is 8, the next round is 7, and so on.

Step 307: judging whether the current K is larger than a preset value;

wherein, the preset value is less than the initial value of K, thereby ensuring the normal operation of iterative training. Assuming that the initial value of K is 10, the preset value may be set to 1, and steps 303-305 will be repeated 9 times based on the decreasing K value according to the fixed decreasing step size of unit 1 provided in step 306.

Step 308: determining the face recognition model as a target face recognition model;

the embodiment shown in fig. 3 provides a scheme of a fixed decreasing step size, but in some scenarios, especially in the case where the initial value of K is large, it is possible that the target face feature set meeting the requirement is found by performing the decreasing for multiple times, and at this time, too many rounds and too slow efficiency may be caused if the decreasing step size is fixed. It is therefore also conceivable to change the decrement step size of K in this case. One implementation, including and not limited to, is:

and increasing the decreasing step length of the K in response to that the target face features are not determined in the process of continuously decreasing the K value by the preset times. The adjustment of the decreasing step length is carried out according to the fact that the K value after adjustment is larger than a preset value, and therefore the training is repeatedly executed after the decreasing step length is increased, and the training is not directly skipped out and finished.

In order to ensure the usability of the target face recognition model trained according to the above embodiment as much as possible, and also to test the usability of the target face recognition model by designing a contrast verification, an included and not limited contrast verification method and processing method may refer to the flowchart shown in fig. 4:

step 401: identifying the face pictures contained in the face picture test set by using a conventional face identification model to obtain a first identification accuracy rate;

the conventional face recognition model is obtained through the provided training steps when the initial value of K is set to be 1, that is, dirty data and valid data are not distinguished as in the prior art.

Step 402: identifying the face pictures contained in the face picture test set by using the target face identification model to obtain a second identification accuracy rate;

the face image test set is used for verifying the usability of the trained target face model in the embodiment, the number of the specific test sets is not specifically limited, generally speaking, the test set has the capability of verifying whether the usability exists, but in an actual situation, a conclusion may be wrong due to inappropriate and imprecise test data selection, in order to avoid the situation as much as possible, a plurality of test sets can be independently arranged to perform respective tests, and further misleading of a single wrong test set to the conclusion is eliminated.

Step 403: in response to the first recognition accuracy being greater than the second recognition accuracy, adjusting the initial value of the K value until the second recognition accuracy is greater than the first recognition accuracy.

This step is based on the first recognition accuracy being greater than the second recognition accuracy, and aims to adjust the initial value of the K value by the execution subject until the second recognition accuracy is greater than the first recognition accuracy.

The reason why the initial value of K needs to be adjusted is that according to the above-mentioned idea, the first recognition accuracy should be smaller than the second recognition accuracy, so that when the first recognition accuracy is greater than the second recognition accuracy, it is an abnormal situation, and the reason for this situation may be that the initial value of K is set inappropriately, so that a category center capable of gathering dirty data cannot be generated well, and therefore, it can be seen whether to eliminate the problem by adjusting the initial value of K.

Furthermore, if the conclusion that the first recognition accuracy rate is greater than the second recognition accuracy rate is obtained after the last operation is tried for a plurality of times, the model frame and the parameters of the target face recognition model can be tried to be adjusted.

On the basis of any of the above embodiments, in order to make the target face recognition model trained in the above training mode function sufficiently, the following steps may be further performed to make the target face recognition model function in the actual face recognition operation:

receiving a face image to be recognized;

and calling a target face recognition model to recognize the face image to be recognized.

The target face recognition model is a face recognition model obtained according to the method for training the face recognition model provided in fig. 2-4.

Specifically, the recognition result may include whether the user belongs to a registered user, a prompt message indicating that the recognition cannot be completed due to insufficient definition, and the like. The execution subject of the above method for recognizing the human face may be the server 105 shown in fig. 1, or may be changed to the

terminals

101, 102, 103 shown in fig. 1, especially in the case of calling the trained target human face recognition model. Of course, the lightweight target face recognition model can be placed locally on the

terminals

101, 102, and 103 by model distillation, so as to realize local calling and recognition.

In order to deepen understanding, the application also provides a specific implementation scheme by combining a specific application scene:

in order to meet the requirement of realizing the high-accuracy face recognition effect proposed by a customer, a service provider firstly trains and obtains a target face recognition model according to the following steps:

1) acquiring a face picture training set containing a face image;

2) extracting face features from a face image of each face picture in a face picture training set, and obtaining an original face feature matrix expressed in a matrix form through normalization processing;

3) generating a corresponding number of new face feature matrixes in each round of supervised training through a class center conversion matrix with an initial value of K of 100 and a decreasing step length of 1;

4) in each round of supervision training, a similarity angle matrix of each new face feature matrix and an original face feature matrix in the current round is obtained through calculation in a matrix multiplication mode, the similarity is determined based on an across angle extracted from the similarity angle matrix, and a face recognition model is trained only by using the new face feature matrix with the similarity exceeding 70%;

5) after 100 rounds of repeated operation, outputting a trained target face recognition model;

after training the target face recognition model, the service provider also tests the target face recognition model with 100 test sets (10 test pictures containing dirty data), and the target face recognition model is determined to be available due to the fact that the dirty data detection number (8) exceeds the preset number (6).

The service provider obtains a lightweight target face recognition model (called a full-scale recognition model for embodying distinction) through a model distillation mechanism, and then provides all data files of the lightweight recognition model and a calling interface of the full-scale recognition model for the client so as to be flexibly selected by the client according to actual requirements.

With further reference to fig. 5 and fig. 6, as implementations of the methods shown in the above-mentioned figures, the present application further provides embodiments of an apparatus for training a face recognition model and an apparatus for recognizing a face, respectively, where the apparatus embodiments correspond to the above-mentioned method embodiments, and the apparatus may be applied to various electronic devices in particular.

As shown in fig. 5, the apparatus 500 for training a face recognition model of the present embodiment may include: a normalization processing unit 501, a training unit 502, an iterative loop unit 503, and an output unit 504. The normalization processing unit 501 is configured to obtain an original face feature set after normalization processing; a training unit 502 configured to train a face recognition model based on the original face feature set, the training step including: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K; an iterative loop unit 503 configured to perform a training step in response to the current K satisfying a preset condition; an output unit 504 configured to determine the face recognition model as the target face recognition model.

In the present embodiment, in the apparatus 500 for training a face recognition model: the detailed processing of the normalization processing unit 501, the training unit 502, the iterative loop unit 503, and the output unit 504 and the technical effects thereof can be referred to the related descriptions of step 201 and step 207 in the corresponding embodiment of fig. 2, and are not described herein again.

In some optional implementations of this embodiment, the training unit 502 may comprise a target face feature set determining subunit configured to determine a target face feature set based on the similarity, the target face feature set determining subunit being further configured to:

and determining the new face feature set with the similarity larger than a preset threshold value as the target face feature.

and determining the new face feature set with the similarity arranged in the front preset percentage according to the size as a target face feature set.

In some optional implementations of this embodiment, the iterative loop unit 503 may be further configured to:

and executing the training step in response to the current K value being larger than a preset value, wherein the preset value is smaller than the initial value.

In some optional implementations of the present embodiment, the apparatus 500 for training a face recognition model may further include:

a first identification accuracy rate obtaining unit configured to identify a face picture contained in the face picture test set by using a conventional face identification model to obtain a first identification accuracy rate; wherein, the conventional face recognition model is a face recognition model trained by using a training step with an initial value of 1;

the second identification accuracy rate acquisition unit is configured to identify the face pictures contained in the face picture test set by using the target face identification model to obtain a second identification accuracy rate;

and the K initial value adjusting unit is configured to respond to the first identification accuracy rate being greater than the second identification accuracy rate, and adjust the initial value of the K value until the second identification accuracy rate is greater than the first identification accuracy rate.

In some optional implementations of this embodiment, the normalization processing unit 501 may be further configured to:

extracting face features from each face picture contained in the face picture training set;

and carrying out normalization processing on the extracted face features to obtain an original face feature set.

and the decreasing step length adjusting unit is configured to respond to the situation that the target human face characteristic is not determined in the process of continuously decreasing the K value by the preset times, and increase the decreasing step length of the K.

As shown in fig. 6, the apparatus 600 for recognizing a human face of the present embodiment may include: a face image receiving unit 601 to be recognized and a face recognition unit 602. The face image receiving unit 601 to be recognized is configured to receive a face image to be recognized; a face recognition unit 602 configured to call a target face recognition model to recognize a face image to be recognized; the target face recognition model is obtained according to the apparatus for training the face recognition model shown in fig. 5.

The above embodiment exists as an apparatus embodiment corresponding to the above method embodiment, and in order to avoid adverse effects of dirty data on model training as much as possible, the present embodiment randomly transforms an original face feature set into a plurality of new face feature sets through a subclass center transformation matrix, and aggregates the dirty data in a new face feature set of a certain class as much as possible in a class random transformation manner provided by the subclass center transformation matrix, and the more the dirty data are aggregated, the smaller the similarity between the new face feature set of the class and the original face feature set is, so that the face recognition model can be controlled in a targeted manner to learn parameters only from the new face feature set containing as little dirty data as possible, and further the robustness and the face recognition accuracy of the finally trained target face recognition model are improved.

There is also provided, in accordance with an embodiment of the present application, an electronic device, a readable storage medium, and a computer program product.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as the method for training a face recognition model. For example, in some embodiments, the method for training a face recognition model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into the RAM703 and executed by the computing unit 701, one or more steps of the method for training a face recognition model described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g. by means of firmware) to perform the method for training the face recognition model.

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server may be a cloud Server, which is also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service extensibility in the conventional physical host and Virtual Private Server (VPS) service.

In order to avoid adverse effects of dirty data on model training as much as possible, in the embodiment, an original face feature set is randomly converted into a plurality of new face feature sets through a subclass center conversion matrix, dirty data are gathered in a new face feature set of a certain class as much as possible through a class random conversion mode provided by the subclass center conversion matrix, and the more dirty data are gathered, the smaller similarity between the new face feature set of the class and the original face feature set is caused, so that a face recognition model can be controlled in a targeted manner to learn parameters only from the new face feature set containing as little dirty data as possible, and the robustness and the face recognition accuracy of a finally trained target face recognition model are improved.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for training a face recognition model, comprising:

acquiring an original face feature set after normalization processing;

training a face recognition model based on the original face feature set, wherein the training step comprises: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K;

responding to the current K meeting a preset condition, and executing the training step;

and determining the face recognition model as a target face recognition model in response to the current K not meeting the preset condition.

2. The method of claim 1, wherein the determining a target set of face features based on the similarity comprises:

and determining the new face feature set with the similarity larger than a preset threshold as the target face feature.

3. The method of claim 1, wherein the determining a target set of face features based on the similarity comprises:

and determining the new face feature set with the similarity arranged in the front preset percentage according to the size as the target face feature set.

4. The method of claim 1, wherein the performing the training step in response to the current K satisfying a preset condition comprises:

5. The method of claim 1, further comprising:

identifying the face pictures contained in the face picture test set by using a conventional face identification model to obtain a first identification accuracy rate; the conventional face recognition model is a face recognition model obtained through the training step when the initial value of the K is set to be 1;

recognizing the face pictures contained in the face picture test set by using the target face recognition model to obtain a second recognition accuracy rate;

in response to the first recognition accuracy being greater than the second recognition accuracy, adjusting an initial value of a K value until the second recognition accuracy is greater than the first recognition accuracy.

6. The method of claim 1, wherein the obtaining of the normalized raw face feature set comprises:

and carrying out normalization processing on the extracted face features to obtain the original face feature set.

7. The method of any of claims 1-6, further comprising:

and increasing the decreasing step length of the K in response to that the target face features are not determined in the process of continuously decreasing the K value by preset times.

8. A method for recognizing a human face, comprising:

receiving a face image to be recognized;

calling a target face recognition model to recognize the face image to be recognized; wherein the target face recognition model is derived from the method for training a face recognition model according to any one of claims 1-7.

9. An apparatus for training a face recognition model, comprising:

the normalization processing unit is configured to acquire an original face feature set after normalization processing;

a training unit configured to train a face recognition model based on the set of primitive face features, the training step comprising: converting the original face feature set into K new face feature sets through a subclass center conversion matrix with K subclass center numbers; respectively calculating the similarity between each new face feature set and the original face feature set, wherein the initial value of K is greater than 1; determining a target face feature set based on the similarity, and training a face recognition model based on the target face feature set; decreasing the value of K;

an iterative loop unit configured to perform the training step in response to a current K satisfying a preset condition;

an output unit configured to determine the face recognition model as a target face recognition model in response to the current K not satisfying the preset condition.

10. The apparatus of claim 9, wherein the training unit comprises a target face feature set determination subunit configured to determine a target face feature set based on the similarity, the target face feature set determination subunit further configured to:

11. The apparatus of claim 9, wherein the training unit comprises a target face feature set determination subunit configured to determine a target face feature set based on the similarity, the target face feature set determination subunit further configured to:

12. The apparatus of claim 9, wherein the iterative loop unit is further configured to:

13. The apparatus of claim 9, further comprising:

a first identification accuracy rate obtaining unit configured to identify a face picture contained in the face picture test set by using a conventional face identification model to obtain a first identification accuracy rate; the conventional face recognition model is a face recognition model obtained through the training step when the initial value of the K is set to be 1;

a second identification accuracy rate obtaining unit configured to identify the face pictures included in the face picture test set by using the target face identification model to obtain a second identification accuracy rate;

a K initial value adjustment unit configured to adjust an initial value of a K value until the second recognition accuracy is greater than the first recognition accuracy in response to the first recognition accuracy being greater than the second recognition accuracy.

14. The apparatus of claim 9, wherein the normalization processing unit is further configured to:

15. The apparatus of any of claims 9-14, further comprising:

and the decreasing step length adjusting unit is configured to respond to the situation that the target face characteristic is not determined in the process of continuously decreasing the K value by the preset times, and increase the decreasing step length of the K.

16. An apparatus for recognizing a human face, comprising:

a face image to be recognized receiving unit configured to receive a face image to be recognized;

the face recognition unit is configured to call a target face recognition model to recognize the face image to be recognized; wherein the target face recognition model is derived by the apparatus for training a face recognition model according to any one of claims 9-15.

17. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for training a face recognition model of any one of claims 1-7 and/or the method for recognizing a face of claim 8.

18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for training a face recognition model of any one of claims 1-7 and/or the method for recognizing a face of claim 8.

19. A computer program product comprising a computer program which, when executed by a processor, implements the method for training a face recognition model according to any one of claims 1-7 and/or the method for recognizing a face of claim 8.