CN112257689A - Training and recognition method of face recognition model, storage medium and related equipment - Google Patents


Info

Publication number
CN112257689A
Authority
CN
China
Prior art keywords: iteration, under, model, face, nth
Prior art date
Legal status
Pending
Application number
CN202011500234.4A
Other languages
Chinese (zh)
Inventor
石海林
杜航
王军
梅涛
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202011500234.4A
Publication of CN112257689A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation

Abstract

The embodiment of the application discloses a training and recognition method of a face recognition model, a storage medium and related equipment, wherein the training method comprises the following steps: obtaining at least two groups of images under the Nth iteration; obtaining two groups of training data sets under the Nth iteration according to the at least two groups of images; processing the first group of training data sets with the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration; processing the second group of training data sets with the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration; calculating a loss function value under the Nth iteration according to the first feature group and the second feature group; and determining, at least according to the loss function value under the Nth iteration, whether to perform the corresponding update of the main model under the Nth iteration.

Description

Training and recognition method of face recognition model, storage medium and related equipment
Technical Field
The present application relates to the field of recognition technologies, and in particular, to a training method for a face recognition model, a face recognition method, a related device, and a computer-readable storage medium.
Background
The current face recognition training method based on deep learning has a convolutional neural network model learn a feature expression of the face on a training data set, and identifies the identity of a face to be recognized by calculating similarity scores between face features. If the convolutional neural network model is regarded as a face recognition model, the training method of the face recognition model in the related art treats face recognition as a multi-classification task and uses a fully connected layer and a loss function in the convolutional neural network model to calculate the probability that the face to be recognized belongs to each identity class. This method achieves good performance when the number of samples of each identity class is large (sufficient). However, the widespread use of face recognition technology tends to produce large amounts of shallow data. Shallow data is defined relative to deep data: deep data means that the number of samples (face images) of each identity class is sufficient, i.e. the number of images of the same face is large, whereas shallow data means that the number of images of the same face is limited or small. For shallow data, if the aforementioned (conventional) training method is still used, the training of the face recognition model is inaccurate.
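The background above describes the conventional classification-style pipeline: a convolutional backbone extracts face features and a fully connected layer maps them to per-identity logits. A minimal PyTorch sketch of that conventional setup follows, for illustration only; the class and function names are assumptions, not from the patent.

```python
# Minimal sketch of conventional classification-based face recognition training:
# a backbone extracts face features, a fully connected layer produces one logit
# per identity class, and cross-entropy is the loss. This works well only when
# each identity class has many samples, which is exactly what shallow data lacks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConventionalFaceClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_identities: int):
        super().__init__()
        self.backbone = backbone                        # convolutional feature extractor
        self.fc = nn.Linear(feat_dim, num_identities)   # one logit per identity class

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.backbone(images)                # (B, feat_dim) face features
        return self.fc(features)                        # (B, num_identities) logits

def train_step(model, images, labels, optimizer):
    logits = model(images)
    loss = F.cross_entropy(logits, labels)              # multi-classification loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```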
Disclosure of Invention
In order to solve the existing technical problems, embodiments of the present application provide a training method for a face recognition model, a face recognition method, a related device, and a computer-readable storage medium.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a training method of a face recognition model, wherein the face recognition model comprises a secondary model and a primary model, and the training method comprises the following steps:
obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images are two images aiming at the same face; n is a positive integer greater than or equal to 1;
obtaining two groups of training data sets under the Nth iteration according to at least two groups of images;
processing the first group of training data sets by the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
processing the second group of training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
calculating a loss function value under the Nth iteration according to the first characteristic group and the second characteristic group;
determining, at least according to the loss function value under the Nth iteration, whether to perform the corresponding (Nth) update of the main model under the Nth iteration;
if it is determined not to perform the Nth update of the main model under the Nth iteration, the iteration process ends; if it is determined to perform the Nth update of the main model under the Nth iteration, setting N = N + 1 and continuing with the (N+1)th iteration process.
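For illustration, a minimal sketch of one pass of the above training loop is given below, assuming PyTorch models for the main and sub branches; the data helpers (sample_groups, split) and the single shared threshold are assumptions made for the sketch, not details fixed by the claim.

```python
# Sketch of the claimed iteration scheme: extract the first feature group with
# the sub-model, the second feature group with the main model, compute the loss,
# and either stop (loss small enough) or update and move to iteration N+1.
import torch

def run_training(sub_model, main_model, loss_fn, optimizer, data_source,
                 loss_threshold: float, max_iters: int = 10000):
    for n in range(1, max_iters + 1):
        groups = data_source.sample_groups()               # at least two groups, two images per face
        first_set, second_set = data_source.split(groups)  # two training data sets

        first_features = sub_model(first_set)              # first feature group (sub-model)
        second_features = main_model(second_set)           # second feature group (main model)

        loss = loss_fn(first_features, second_features)
        if loss.item() <= loss_threshold:
            break                                          # do not perform the Nth update; end iterating
        optimizer.zero_grad()
        loss.backward()                                    # perform the corresponding update
        optimizer.step()                                   # then continue with iteration N+1
```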
In the foregoing solution, the determining, at least based on the loss function value at the nth iteration, whether to update the main model at the nth iteration correspondingly includes:
and determining whether to update the main model and the auxiliary model in the Nth iteration correspondingly according to the loss function value in the Nth iteration.
In the foregoing solution, the determining, at least based on the loss function value at the nth iteration, whether to update the main model at the nth iteration correspondingly includes:
if the loss function value under the Nth iteration is less than or equal to a preset loss threshold, determining at least not to perform the Nth update of the main model under the Nth iteration;
if the loss function value under the Nth iteration is greater than the preset loss threshold, determining at least to perform the Nth update of the main model under the Nth iteration.
In the foregoing scheme, in the (N +1) th iteration scheme,
determining the model obtained by updating the sub-model under the Nth iteration as the sub-model under the (N+1)th iteration;
processing a first group of training data sets under the (N +1) th iteration by using the sub-model under the (N +1) th iteration to obtain a first feature group of the face under the (N +1) th iteration;
in the foregoing solution, the processing, by the sub-model under the (N +1) th iteration, the first group of training data sets under the (N +1) th iteration to obtain the first feature group of the face under the (N +1) th iteration includes:
obtaining a first partial feature, wherein the first partial feature is obtained by using the sub-model under the Nth iteration to extract the face features in the first group of the two groups of training data sets under the Nth iteration;
extracting the face features in the first group of training data sets under the (N+1)th iteration by using the sub-model under the (N+1)th iteration to obtain a second partial feature under the (N+1)th iteration;
and obtaining a first feature group of the face under the (N+1)th iteration according to the first partial feature and the second partial feature.
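As one way to realize the two-part feature group described above, the sketch below keeps the sub-model features extracted at iteration N and concatenates them with the features freshly extracted at iteration N+1. The feature-bank object is an assumption about one possible implementation, not the patent's prescribed data structure.

```python
# Sketch: the first feature group at iteration N+1 aggregates stored first
# partial features (sub-model output from iteration N) with second partial
# features extracted by the current sub-model at iteration N+1.
import torch

class FeatureBank:
    def __init__(self):
        self.stored = None                                  # sub-model features kept from iteration N

    def first_feature_group(self, sub_model, first_set_curr: torch.Tensor) -> torch.Tensor:
        second_part = sub_model(first_set_curr)             # second partial features (iteration N+1)
        if self.stored is None:
            group = second_part                             # very first iteration: nothing stored yet
        else:
            group = torch.cat([self.stored, second_part], dim=0)  # aggregate both parts
        self.stored = second_part.detach()                  # save for the next iteration
        return group
```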
In the foregoing aspect,
in the (N +1) th iteration scheme,
determining the model obtained by updating the main model under the Nth iteration as the main model under the (N+1)th iteration;
and processing a second group of training data sets under the (N +1) th iteration by using the main model under the (N +1) th iteration to obtain a second feature group of the face under the (N +1) th iteration.
In the foregoing aspect,
and at least using the main model in the face recognition model after the iteration is finished to recognize the face image to be recognized.
The embodiment of the application provides a face recognition method, which comprises the following steps:
obtaining a face image to be recognized;
inputting the face image into at least a main model in a face recognition model trained according to the training method;
identifying the face image to be identified at least by using a main model in the face identification model;
wherein the primary model is trained based at least on a secondary model of the face recognition model.
The embodiment of the application provides a training device of a face recognition model, which comprises:
the first obtaining unit is used for obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images is at least two images aiming at the same human face; n is a positive integer greater than or equal to 1;
a second obtaining unit, configured to obtain two sets of training data sets under the nth iteration according to the at least two sets of images;
the first processing unit is used for processing the first group of training data sets by the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
the second processing unit is used for processing a second group of training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
the calculation unit is used for calculating a loss function value under the Nth iteration according to the first characteristic group and the second characteristic group;
a determining unit, configured to at least determine whether to update the main model in the nth iteration for the corresponding time according to the loss function value in the nth iteration;
if the determining unit determines not to perform the Nth update of the main model under the Nth iteration, the iteration process ends; and if the determining unit determines to perform the Nth update of the main model under the Nth iteration, N = N + 1 and the (N+1)th iteration process continues.
An embodiment of the present application further provides a face recognition device, including:
the first obtaining unit is used for obtaining a face image to be recognized;
an input unit, configured to input the face image into at least a main model of a face recognition model trained according to the aforementioned training method;
the recognition unit is used for recognizing the face image to be recognized at least by utilizing the main model of the face recognition model;
wherein the primary model is trained based at least on a secondary model of the face recognition model.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the aforementioned steps of the face recognition model training method and/or the face recognition method.
The embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the training method for the face recognition model and/or the face recognition method when executing the program.
The embodiment of the application provides a training method of a face recognition model, a face recognition method, related equipment and a computer-readable storage medium, wherein the face recognition model comprises a secondary model and a primary model, and the training method comprises the following steps: obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images is two images of the same face, and N is a positive integer greater than or equal to 1; obtaining two groups of training data sets under the Nth iteration according to the at least two groups of images; processing the first group of training data sets with the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration; processing the second group of training data sets with the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration; calculating a loss function value under the Nth iteration according to the first feature group and the second feature group; determining, at least according to the loss function value under the Nth iteration, whether to perform the corresponding update of the main model under the Nth iteration; if it is determined not to perform the Nth update of the main model under the Nth iteration, the iteration process ends; and if it is determined to perform the Nth update of the main model under the Nth iteration, setting N = N + 1 and continuing with the (N+1)th iteration process.
In the embodiment of the application, the training of the face recognition model by utilizing the shallow data is realized. In addition, the training accuracy of the model can be ensured by combining the characteristics of the human face and two images of the same human face for training, so that the accurate recognition of the human face can be realized.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a first schematic flow chart of an implementation of a training method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a second implementation flow of the training method according to the embodiment of the present application;
fig. 3 is a schematic flow chart illustrating an implementation of a training method according to an embodiment of the present application;
fig. 4 is a schematic flow chart of an implementation of the training method according to the embodiment of the present application;
fig. 5 is a schematic view of an implementation flow of a face recognition method according to an embodiment of the present application;
fig. 6 is a schematic flow chart of an implementation of the training method according to the embodiment of the present application;
FIGS. 7(a) and 7(b) are schematic diagrams illustrating the preprocessing according to the embodiment of the present application;
FIG. 8 is a schematic diagram of an implementation of a semi-twin neural network according to an embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a comparison between a training method according to an embodiment of the present application and a conventional training method;
FIG. 10 is a schematic diagram of a training apparatus according to an embodiment of the present application;
fig. 11 is a schematic diagram of a composition structure of a face recognition device according to an embodiment of the present application;
fig. 12 is a schematic diagram of a hardware composition structure of a training device and/or a face recognition device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In the present application, the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict. The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Also, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
According to the above description, the embodiments of the present application aim to achieve accurate training of the face recognition model even when the training data is shallow data, that is, when the number of images of the same face in the training data set is limited or small. It can be understood that the more accurately the face recognition model is trained, the more accurate the face recognition performed with it at the application level, so recognition accuracy can be greatly ensured.
Before describing the technical solutions of the embodiments of the present application, technical terms that may be used in the embodiments of the present application are described.
1) A training set, which may also be referred to as a training data set, is a set of training data. In the field of face recognition technology, training data generally refers to images acquired of faces of different persons, and naturally, a training set refers to a collection of images (face images) acquired of faces of those persons.
It should be noted that the shallow data mentioned in the embodiment of the present application refers to a case where the number of face images for the same face in the training data set is small or limited, and is not a case where the number of face images in the training data set is small or limited.
2) The number of iterations may be regarded as a number of calculations, and each iteration is regarded as one calculation process. In the embodiment of the application, the face recognition model is calculated a plurality of times through a plurality of iterations to calculate or determine the expected (or final) face recognition model. In the embodiment of the application, "the Nth iteration" and "the Nth iteration count" have the same meaning.
The embodiment of the training method of the face recognition model is applied to the training equipment of the face recognition model. The training device may be any device capable of training the face recognition model, such as a server, a terminal, a base station, and the like. In the embodiment of the application, for accurately training the face recognition model aiming at the shallow data, the face recognition model is divided into two parts which are better trained by utilizing the shallow data: a primary model and a secondary model. Wherein the secondary model can be considered as part of the training aid for the primary model.
Fig. 1 is a first schematic flow chart illustrating an implementation process of a training method of a face recognition model in an embodiment of the present application. As shown in fig. 1, the method includes:
s101: obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images are two images aiming at the same face; n is a positive integer greater than or equal to 1;
in this step, each of the at least two sets of images is a face image. And collecting the images of different faces which can be collected to obtain an initial data set. At each iteration (at each iteration time), a plurality of face images are read from the initial data set, and the number of the images corresponding to the same face in the read plurality of face images is at least two. It will be appreciated that the at least two groups of images are images of at least two different faces, and that the number of images of the same face is small or limited, e.g. two. It is understood that if the number of image groups is M, M is a positive integer equal to or greater than 2. The M groups of images may be an image group of M faces (each group of images corresponds to one face), or may be an image group of other positive integer numbers of faces smaller than M (two or more groups of images correspond to one face).
The above describes reading face images from the initial data set at the Nth iteration to obtain the at least two groups of images under the Nth iteration. Alternatively, images of different faces can be directly shot or collected at each iteration, with at least two images shot or collected for each face; two images representing the same face are regarded as one group. It can be appreciated that, to form image groups, the number of images taken of the same face is typically an even number.
In some alternative embodiments, to achieve accurate training of the face recognition model on shallow data, the number of face images in the initial data set may be large, such as thousands or tens of thousands, but the number of face images of the same face is limited. "Limited" here means small compared to the total number of face images in the initial data set: if the initial data set contains face images numbering in the tens of thousands, the number of face images of the same face may be on the order of ones, tens, or hundreds, e.g. 8, 20, or 100.
S102: obtaining two groups of training data sets under the Nth iteration according to at least two groups of images;
here, for each iteration, two sets of training data sets are obtained based on at least two sets of images obtained at the number of iterations. In some alternative embodiments, the at least two sets of images may be images for at least two different faces, taking into account that each of the at least two sets of images is two images for the same face. And grouping the two images in each group of images to obtain two groups of training data sets. The two sets of training sets are divided into a first set of training sets and a second set of training sets. The first set of training sets may include one of the facial images in the same set of images. The second set of training sets may include another facial image in the same set of images. The first and second sets of training sets are used differently in the embodiments of the present application. See the description below.
S103: processing the first group of training data sets by the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
S104: processing the second group of training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
in S103 and S104, in each iteration, the first group of training data sets in the current iteration number is processed by the sub-model in the current iteration number, so as to obtain a feature group (first feature group) of the face in the current iteration. And processing the second group of training data set by the main model under the current iteration times to obtain a feature group (a second feature group) of the face under the Nth iteration.
S105: calculating a loss function value under the Nth iteration according to the first characteristic group and the second characteristic group;
s106: at least determining whether to update the main model under the Nth iteration for the corresponding time according to the loss function value under the Nth iteration; if the main model under the Nth iteration is not determined to be updated for the Nth time, the iteration process is ended; and if the main model under the nth iteration is determined to be updated for the nth time, continuing to execute the (N +1) th iteration process.
In the foregoing S101 to S106, in order to implement accurate training of the face recognition model for the shallow data, the face recognition model is divided into two parts, namely a main model and a sub model. If at least two groups of images or two groups of training data sets obtained according to at least two groups of images are regarded as shallow data, and each iteration requires a sub-model under the current iteration times to process the first group of training data sets to obtain a first feature group, the above scheme can be regarded as a scheme for training a main model in a face recognition model by using the sub-model and the shallow data, so that the aim of training the face recognition model can be fulfilled, and the training of the face recognition model by using the shallow data is realized. In addition, the training of the main model is carried out by combining the characteristics of the human face and two images of the same human face, so that the training accuracy of the main model can be ensured, and the human face can be accurately identified by the main model.
In an optional scheme, the step S106 of determining, according to the loss function value at the nth iteration, at least whether to update the main model at the nth iteration correspondingly may be further performed as step S106': and determining whether to update the main model and the auxiliary model in the Nth iteration correspondingly according to the loss function value in the Nth iteration. That is, in each iteration scheme, for example, in the nth iteration scheme, based on the loss function value calculated in the nth iteration, it may be determined whether to perform corresponding update on the main model in the nth iteration, and it may also be determined whether to perform corresponding update on the sub model in the nth iteration. The main model and the auxiliary model are updated in the same iteration times, which is equivalent to training the main model and the auxiliary model by combining the characteristics of the human face, and the training accuracy of the main model and the auxiliary model can be ensured, namely the training accuracy of the human face recognition model is ensured, so that the accurate recognition of the human face by the human face recognition model can be realized.
The subject executing S101 to S106 is the training device of the face recognition model. S101 to S106 are executed in the Nth iteration; if it is determined in S106 to perform the Nth update of the main model under the Nth iteration, N is updated to N = N + 1 and the (N+1)th iteration process is executed. The (N+1)th iteration process may include S201 to S206, as shown in Fig. 2:
s201: obtaining at least two groups of images under the (N +1) th iteration, wherein each group of images in the at least two groups of images are two images aiming at the same face;
in this step, each group of images under two adjacent iterations may be the same group of images, or may be different groups of images, and preferably is different groups of images. Two images in each group of images under two adjacent iteration times can be different in one or both.
S202: obtaining two groups of training data sets under the (N +1) th iteration according to at least two groups of images;
s203: processing the first group of training data sets by the sub-model under the (N +1) th iteration to obtain a first feature group of the face under the (N +1) th iteration;
in this step, the sub-model under the (N +1) th iteration is the same model as the model after updating the sub-model under the nth iteration in the nth iteration scheme.
S204: processing the second group of training data set by the main model under the (N +1) th iteration to obtain a second feature group of the face under the (N +1) th iteration;
in this step, the main model in the (N +1) th iteration and the model updated in the nth iteration scheme from the nth iteration are the same model.
S205: calculating a loss function value under the (N +1) th iteration according to the first characteristic group and the second characteristic group;
s206: at least determining whether to update the main model under the (N +1) th iteration for the corresponding time according to the loss function value under the (N +1) th iteration; if the main model under the (N +1) th iteration is determined not to be updated for the (N +1) th time, ending the iteration process; and if the main model under the (N +1) th iteration is determined to be updated for the (N +1) th time, continuing to execute the (N + 2) th iteration process.
It can be seen from S203 and S204 that, for two adjacent iteration schemes, the primary and secondary models used in the next iteration scheme may be the primary and secondary models updated in the previous iteration scheme. In each iteration scheme, the face recognition model is trained using the secondary model and the shallow data, so training of the face recognition model with shallow data is realized.
In some optional embodiments, in S203, the first feature group of the face under the (N+1)th iteration, obtained by processing the first set of training data sets under the (N+1)th iteration with the sub-model under the (N+1)th iteration, may be regarded as a set of face features. For convenience of description, the first feature group may be regarded as the union of two parts of face features, i.e. a set of first partial features and second partial features. The origins of the first partial features and the second partial features are as shown in Fig. 3, and S203 includes:
s2031: obtaining a first part of characteristics, wherein the first part of characteristics are obtained by extracting human face characteristics in a first group of training data sets of two groups of training data sets under the N iteration by using a secondary model under the N iteration;
s2032: extracting the face features in the first group of training data sets under the (N +1) th iteration by using the sub-model under the (N +1) th iteration to obtain a second part of features under the (N +1) th iteration;
s2033: and obtaining a first feature group of the face under the (N +1) th iteration according to the first partial features and the second partial features.
It can be seen from the schemes of S2031 to S2033 that the first partial features in the first feature group under the (N+1)th iteration are obtained by using the sub-model under the Nth iteration to extract features from the face images in the first of the two training data sets under the Nth iteration, while the second partial features are obtained by using the sub-model under the (N+1)th iteration to extract features from the face images in the first training data set under the (N+1)th iteration. The set of these two partial features may be regarded as the first feature group under the (N+1)th iteration.
In short, the first partial features in the first feature group at a given iteration are related to the face features from the iteration preceding the current one (the Nth iteration preceding the (N+1)th iteration). Specifically, the stored face features extracted from the first group of training data sets under the previous (Nth) iteration are read, and the read face features are regarded as the first partial features under the current iteration. The first partial features and the second partial features are aggregated into the first feature group. This way of obtaining the (first) feature group considers not only the face features in the images under the current iteration but also the face features in the images under the previous iteration. Therefore, although the number of images of the same face in the at least two groups of images is limited or small, each iteration takes face features from both the current and the previous iteration into account, which effectively increases the diversity of features for faces with a limited number of images. This can be regarded as increasing the intra-class diversity of the shallow data to a certain extent; training the face recognition model with shallow data that has such intra-class diversity can greatly improve the accuracy of the trained model.
Unlike the first feature group, the second feature group under the current iteration in the embodiment of the present application is obtained from the second set of training data sets under the current (Nth) iteration; the second feature group also consists of face features. If the second feature group and the first feature group under a given iteration are regarded as face features obtained from different sets of training data sets under that iteration, the scheme of the embodiment of the application combines feature groups obtained from different training data sets, calculates the loss function value under the current iteration, and determines from the result whether to continue training the main model (or the main and sub models). Training the face recognition model with the two combined feature groups greatly ensures training accuracy; in addition, combining the features increases the diversity of the face features, which also improves training accuracy.
In the field of training a model, training a main model is equivalent to updating the main model. The training of the main model is completed, i.e. the main model is updated to the desired state. On the technical level, the training of the main model is realized through an iterative process. And updating the main model once every iteration until the main model is updated to the expected state, and finishing or ending the training process. In this case, it can be considered that the face recognition model reaches the desired state as long as the main model reaches the desired state, and the training is completed or finished without continuing the training.
In the embodiment of the application, the face recognition model can reach the expected state only by training the main model. In addition, the face recognition model includes not only the main model but also the sub model. In order for the face recognition model to reach the expected state, both the main model and the auxiliary model are required to reach the expected state. In this case, not only the primary model but also the secondary model need to be trained. Namely, the first feature set and the second feature set are used for training the secondary model and the primary model until the training is completed. In the embodiment of the present application, it is preferable that both the primary and secondary models are trained, and when both the primary and secondary models reach respective expected states, it is considered that the face recognition model reaches the expected state, and the training process is completed or ended.
In the embodiment of the present application, in order to know when the training of the main model (or of the main and sub models) is completed, the concept of a loss function needs to be introduced. Literally, a loss function is a function representing loss or risk; those skilled in the art will appreciate that a smaller loss or risk value is better. In practical applications, the minimum loss value may not be reachable due to limitations of the structure of the trained model; in this case, the value of the loss function is required to be as small as possible, or to remain within a certain error bound after multiple iterations. In the embodiment of the present application, a plurality of face image groups can be obtained in advance, each group being two images of the same face, and the loss function for the face recognition model is constructed from these groups. When the face recognition model enters the Nth iteration, the loss function value under the Nth iteration is calculated from the first feature group and the second feature group under the Nth iteration; if the loss function value under the Nth iteration is less than or equal to a preset loss threshold, it is determined at least not to perform the Nth update of the main model under the Nth iteration; if the loss function value under the Nth iteration is greater than the loss threshold, it is determined at least to perform the Nth update of the main model under the Nth iteration. The foregoing determines only whether to update the main model. Furthermore, whether the sub-model is updated may also be determined: if the loss function value under the Nth iteration is less than or equal to a preset loss threshold, it is determined at least not to perform the Nth update of the sub-model under the Nth iteration; if it is greater than the preset loss threshold, it is determined at least to perform the Nth update of the sub-model under the Nth iteration.
It can be appreciated that the loss threshold may be an empirically derived value or range of values and may be predetermined. In a specific implementation, different loss thresholds may be set for the main and sub models: when comparing the loss function value with a threshold, the threshold set for the main model is used to decide whether to update the main model, and the threshold set for the sub-model is used to decide whether to update the sub-model. The loss thresholds preset for the main model and the sub-model may be the same or different and can be set flexibly according to the specific use case. This update scheme decides whether to update the main and sub models according to the relationship between the loss function value calculated under the Nth iteration and the loss threshold set for each model, which guarantees the update accuracy of the main and sub models and hence the training accuracy of the face recognition model. If the loss function value calculated at some iteration is less than or equal to the loss threshold set for each model, training stops, so the training process can be ended in time and the waste of computing resources caused by excessive training (iteration) is avoided.
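A minimal sketch of this per-model decision, assuming a single scalar loss value is compared against separately preset thresholds; the threshold values themselves are not specified by the application:

```python
# Sketch: decide independently whether to update the main model and the
# sub-model, each against its own preset loss threshold.
def decide_updates(loss_value: float, main_threshold: float, sub_threshold: float):
    update_main = loss_value > main_threshold    # update main model only while loss is too high
    update_sub = loss_value > sub_threshold      # the sub-model may use a different threshold
    return update_main, update_sub
```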
In some optional embodiments, as shown in fig. 4, to facilitate better or more accurate training of the model, after obtaining two sets of training data sets at each iteration number, illustratively, as after S102, the method further comprises:
s102 a: respectively preprocessing the two groups of training data sets under the Nth iteration to obtain two groups of target training data sets;
accordingly, S103 and S104 become:
s103 a: processing the first group of target training data sets by the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
s104 a: processing a second group of target training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
S102a to S104a are equivalent to training the model to be trained, e.g. the main model, with the two sets of target training data. In the embodiment of the present application, although the two images in each group are both images of the same face, the two images differ. For example, one is a photograph taken of the face in a more formal setting, such as a certificate (ID) photo, and the other is an image captured of the face in a less formal scene, e.g. while shopping at a mall. For convenience of description, the face image obtained in an informal scene is called a live photo. The two sets of training data sets are actually obtained by separating the certificate photo and the live photo of each face in the at least two groups of images to form the first and second sets of training data: e.g. the first set is the collection of certificate photos of all faces and the second set is the collection of live photos of all faces, or vice versa. In S102a, preprocessing the two sets of training data is equivalent to preprocessing the certificate photo and the live photo of each face. Preprocessing includes, but is not limited to: aligning the faces in the certificate photo and the live photo that represent the same face, and performing data enhancement on the aligned certificate photo and live photo. Data enhancement comprises performing at least one of cropping, scaling, graying, rotating, etc., on the certificate photo and the live photo. Training the main model (or the main and sub models) on the preprocessed target training data sets improves training precision, so the trained model is more accurate.
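A sketch of the data-enhancement part of this preprocessing using torchvision transforms is given below; the image size and the probability/degree parameters are assumed values for illustration, not parameters from the application.

```python
# Sketch of data enhancement on aligned face images: scaling, cutting (random
# crop), graying, and rotating, composed into a single preprocessing pipeline.
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((128, 128)),          # scaling
    transforms.RandomCrop(112),             # cutting
    transforms.RandomGrayscale(p=0.2),      # graying
    transforms.RandomRotation(degrees=10),  # rotating
    transforms.ToTensor(),
])
```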
In some optional embodiments, after S106 or S106', the training method further comprises: acquiring a face image to be recognized, and at least inputting the face image to the trained face recognition model; and identifying the face image to be identified at least by using a main model in the face identification model. Because the trained main model or the main and auxiliary models are trained with high accuracy, the trained model with high accuracy is used for identifying the face in the face image to be identified, the face identification accuracy can be improved, and the face identification error is avoided.
It is understood that the foregoing scheme is a scheme of training a face recognition model. In practical applications, when the trained model is trained, the trained model is also used for face recognition. Therefore, the embodiment of the present application further provides a face recognition method, in which the face recognition model includes the aforementioned main model and sub-model. As shown in fig. 5, the method includes:
s501: obtaining a face image to be recognized;
in this step, the face image to be recognized is read from a database or a server that stores the face image to be recognized. Or, the image shooting or the image acquisition is directly carried out on the face to be recognized.
S502: inputting the face image into at least a main model in a trained face recognition model;
in this step, the training of the face recognition model is performed according to the training method described in any one of fig. 1 to 4 until the training is completed.
S503: identifying the face image to be identified at least by using a main model in the face identification model; wherein the primary model is trained based at least on a secondary model of the face recognition model.
In S502 and S503, the face image to be recognized is input to the main model of the trained face recognition model, the face image to be recognized is recognized by at least using the main model, and the recognition result of the main model is used as the final recognition result of the face recognition model. Or inputting the face image to be recognized into a main model and an auxiliary model of the trained face recognition model, and recognizing the face image to be recognized by using both the main model and the auxiliary model. And under the condition that the identification results of the main model and the auxiliary model are consistent, taking the consistent result as the final identification result of the face identification model. The main model is trained at least based on the auxiliary model of the face recognition model, namely the auxiliary model of the face recognition model assists the main model to complete the final training, and the accuracy of the main model can be guaranteed. Therefore, the face recognition result can be more accurate.
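For illustration, a sketch of recognition with the trained main model follows: the feature of the image to be recognized is compared against stored gallery features by cosine similarity, and the best match above a threshold is returned. The gallery structure and the threshold value are assumptions for the sketch.

```python
# Sketch: identify a face by comparing the main model's feature for the query
# image with normalized gallery features; return None if no match is confident.
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(main_model, image: torch.Tensor, gallery_feats: torch.Tensor,
              gallery_ids: list, threshold: float = 0.5):
    feat = F.normalize(main_model(image.unsqueeze(0)), dim=1)   # (1, D) query feature
    sims = feat @ F.normalize(gallery_feats, dim=1).t()         # cosine similarity scores
    score, idx = sims.max(dim=1)
    if score.item() < threshold:
        return None                                             # unknown face
    return gallery_ids[idx.item()]
```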
In an optional embodiment of the face recognition method, the training of the primary model based on at least the secondary model of the face recognition model comprises: obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images are two images aiming at the same face; n is a positive integer greater than or equal to 1;
obtaining two groups of training data sets under the Nth iteration according to the at least two groups of images; processing the first group of training data sets with the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration; processing the second group of training data sets with the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration; calculating a loss function value under the Nth iteration according to the first feature group and the second feature group; determining, at least according to the loss function value under the Nth iteration, whether to perform the corresponding update of the main model under the Nth iteration; if it is determined not to perform the Nth update of the main model under the Nth iteration, the iteration process ends; and if it is determined to perform the Nth update of the main model under the Nth iteration, setting N = N + 1 and executing the (N+1)th iteration process.
For the same or similar contents related to the training method in the embodiment of the face recognition method, please refer to the related description, and the description is not repeated.
The training method and the face recognition method of the present application will be described in further detail with reference to fig. 6 to 9 and specific embodiments.
In this application scenario, a certificate photo and a scene photo with an image group as a face are taken as examples for explanation. It will be appreciated that a certificate photograph is a photograph taken in a more formal situation, for example, when applying for various types of certificates, such as identification cards, passports, which may be stored in a database. The live photo is a photo obtained in an informal occasion, such as a photo taken by a camera in a shopping mall or a supermarket when a user shops in the shopping mall or the supermarket, and the photo can be stored in the server. The collection of photos stored in the database and server may be considered the initial data set.
In the embodiments of the present application, the face recognition model can be a neural network model, and any reasonable neural network model can be used, for example a semi-twin convolutional neural network model. The semi-twin convolutional neural network model comprises a main model φ_p and a sub-model φ_g. The main model φ_p corresponds to the main branch of the semi-twin convolutional neural network; the sub-model φ_g corresponds to the secondary branch network. Under the Nth iteration, the main model φ_p is φ_p(N) and the sub-model φ_g is φ_g(N). Multiple iterative calculations are performed on the main model φ_p until the loss function value of the main model is less than or equal to the set loss threshold, at which point training is finished.
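A minimal sketch of building such a semi-twin pair of branches, assuming the two branches share the same architecture but keep separate weights (one common reading of "semi-twin"; the application does not fix the backbone):

```python
# Sketch: construct the main branch (phi_p) and secondary branch (phi_g) of a
# semi-twin network as two structurally identical models with separate weights.
import copy
import torch.nn as nn

def build_semi_twin(backbone: nn.Module):
    phi_p = backbone                  # main branch / main model
    phi_g = copy.deepcopy(backbone)   # secondary branch: same structure, own weights
    return phi_p, phi_g
```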
Assuming the current iteration count is N+1, when the (N+1)th iteration is completed, N = N+2 and the next iteration is performed, and so on until training is completed. N may be iterated from an initial value of 1, as shown in Fig. 6.
S601: obtaining the certificate photos and the live photos of at least two faces from the initial data set, and combining the certificate photo and the live photo of the same face to obtain at least two groups of images under the (N+1)th iteration;
it will be appreciated that the identity photos of a plurality of faces are read from the initial data set. Given the limited number of credentials per user, the number of credentials for the same user in the initial data set is small, such as 1, 2, or other positive integer numbers less than 10. According to the read certificate photo, a scene photo which is represented by the same face (or user) as the read certificate photo is read from the initial data set. For example, if the read photos are for user 1, user 2, and user 3 …, then the photos are read for user 1, user 2, and user 3 …. For each face (per user), the number of live photos will typically be greater than the number of certified photos. The number of photographs expressed as the same face in the read live photograph needs to be kept the same as the number of photographs expressed as the same face in the live photograph. For example, the read certificate photo and the field photo aiming at the same user are both one. And combining the identification photo and the scene photo which are expressed as the same user or the same face to obtain an image group aiming at the same face. The set of image groups of a plurality of different human faces forms at least two groups of images. For example, the 1 st group of images of the at least two groups of images are the identification photograph and the scene photograph of the user 1, the 2 nd group of images are the identification photograph and the scene photograph of the user 2, the 3 rd group of images are the identification photograph and the scene photograph of the user 3, and so on, the identification photographs and the scene photographs of a plurality of users can be read.
On the technical level, because the number of faces in the initial data set is large, reading the certificate photos and live photos of all faces at every iteration would make the iteration process slow and inefficient. To solve this problem, it may be preset that only part of the face images are read at each iteration, such as M = 50 or M = 100 face images; of course, other numbers may be set. It can be understood that a training process includes many iterations, e.g. tens of thousands; the number of faces read should be preset to be the same in all iterations. The faces in the image groups obtained in two adjacent iterations may be partially the same or completely different. Illustratively, taking reading 50 faces per iteration as an example, if the face images of user P to user (P+50) are read at the (N+1)th iteration, then the face images of user (P+51) to user (P+100), or of user (P+10) to user (P+60), may be read at the (N+2)th iteration; this can be set flexibly as the case may be. P is a positive integer greater than or equal to 1.
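A sketch of such per-iteration sampling is given below, assuming the initial data set is indexed by user and that one certificate photo and one live photo are drawn per sampled face; the dictionary layout is an assumption.

```python
# Sketch: read only M faces per iteration instead of the whole initial data
# set, pairing one certificate photo with one live photo for each sampled face.
import random

def sample_batch(certificate_photos: dict, live_photos: dict, m: int = 50):
    """Both dicts map user id -> list of image paths for that user."""
    users = random.sample(list(certificate_photos.keys()), m)   # M faces this iteration
    return [(random.choice(certificate_photos[u]), random.choice(live_photos[u]))
            for u in users]                                     # one (certificate, live) pair per face
```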
It should be noted that, although the number of faces in the initial data set is large, for example, ten thousand is taken as a unit. However, due to the limited nature of the identification photo, only a few or dozens of images of the same face exist in ten thousand images. If each human face is regarded as data in one category, only a few or dozens of images with the same human face in ten thousand images mean that the data in the category is single, limited and not diverse. Such data is called shallow data, and the related art cannot train an accurate face recognition model by using the shallow data. The embodiment of the application provides a method for accurately training a face recognition model by utilizing shallow data. See also, in particular, the following:
s602: classifying at least two groups of images obtained under the (N +1) th iteration according to the types of the images in the initial data set to obtain two groups of training data sets under the (N +1) th iteration;
in this step, the types of images in the initial dataset include identification photographs and spot photographs. And classifying at least two groups of images obtained under the (N +1) th iteration number according to the identification photo and the field photo. E.g., all the credentials in at least two sets of images obtained at the (N +1) th iteration are collected as a first set of training data sets (credentials data sets). All live shots in at least two groups of images obtained under the N +1 th iteration number are collected into a second group of training data sets (live shot data sets).
After the execution of S602 is completed, S604 may be executed. Alternatively, after the execution of S602 is completed, S603 is executed, and after the execution of S603 is completed, S604 is executed. As a preferred step, S603 may obtain a training data set for better training the face recognition model.
S603: respectively preprocessing the certificate photos in the certificate photo data set and the live photos in the live photo data set obtained under the (N+1)th iteration to obtain a first group of target training data sets and a second group of target training data sets under the (N+1)th iteration;
in the application scene, the identification photo and the scene photo are both face images.
As shown in fig. 7(a), the aforementioned preprocessing may detect the faces and key points in the face images so as to achieve face alignment. Specifically, face detection is performed on each face image obtained under the current iteration to obtain the region where the face is located in each image. Within the face region of each image, key points of the face such as eyes, nose, and mouth are detected. Using the face key points, the faces in the certificate photo and in the live photo of the same face are aligned. For each face, a certificate photo and a live photo after face alignment are obtained. A face image after face alignment can be regarded as a standardized face image, and using standardized face images for model training makes the training more accurate.
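A sketch of this landmark-based alignment follows, estimating a similarity transform from the five detected key points to a canonical template and warping the image; the template coordinates are assumed values commonly used for 112x112 face crops, not values from the application.

```python
# Sketch: align a face by mapping detected key points (eyes, nose tip, mouth
# corners) onto a fixed template with a partial affine (similarity) transform.
import cv2
import numpy as np

TEMPLATE = np.float32([[38.3, 51.7], [73.5, 51.5],   # left eye, right eye
                       [56.0, 71.7],                 # nose tip
                       [41.5, 92.4], [70.7, 92.2]])  # mouth corners

def align_face(image: np.ndarray, landmarks: np.ndarray, size=(112, 112)):
    """landmarks: (5, 2) array of detected key points in the source image."""
    matrix, _ = cv2.estimateAffinePartial2D(landmarks.astype(np.float32), TEMPLATE)
    return cv2.warpAffine(image, matrix, size)
```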
As shown in fig. 7(b), besides detecting the faces and key points in the face images and aligning the faces, the preprocessing can also apply data enhancement to the face images after the faces are aligned. Specifically, at least one of the following operations is applied to the face-aligned identification photos and live photos: cropping, scaling, graying, rotating and the like, so as to enhance the face data of the face images. Inputting the enhanced face data into the main model and the secondary model as target training data allows the models to be trained more accurately. The set of data-enhanced identification photos can be regarded as the first group of target training data sets, and the set of data-enhanced live photos as the second group of target training data sets; S604 is then performed using the first and second groups of target training data sets.
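A minimal augmentation sketch using torchvision (face alignment itself would be performed beforehand by a landmark-based warp, which is not shown; the crop size, probability and angle below are illustrative values, not values from the embodiment):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(112, scale=(0.8, 1.0)),  # cropping + scaling
    transforms.RandomGrayscale(p=0.2),                    # graying
    transforms.RandomRotation(degrees=10),                # rotating
    transforms.ToTensor(),
])

# target_id_photo = augment(aligned_id_photo)      -> first group of target training data
# target_live_photo = augment(aligned_live_photo)  -> second group of target training data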
If the process jumps from S602 to S604, then
S604 is S6041: obtaining the first partial features and the second partial features under the (N+1)th iteration; obtaining the first feature group under the (N+1)th iteration according to the first partial features and the second partial features; and inputting the second group of training data sets into the main branch network to obtain the second feature group under the (N+1)th iteration.
In the scenario of going from S602 to S604, the first partial features and the second partial features are obtained using the training data sets. The first partial features are obtained by extracting the face features of the first group of the two groups of training data sets under the Nth iteration using the secondary model under the Nth iteration; the second partial features are obtained by extracting the face features of the first group of training data sets under the (N+1)th iteration using the secondary model under the (N+1)th iteration.
If the process jumps from S603 to S604, then
S604 is S6042: obtaining the first partial features and the second partial features under the (N+1)th iteration; obtaining the first feature group under the (N+1)th iteration according to the first partial features and the second partial features; and inputting the second group of target training data sets into the main branch network to obtain the second feature group under the (N+1)th iteration.
In the scenario of going from S603 to S604, the first partial features and the second partial features are obtained using the target training data sets. Specifically, the first partial features are obtained by extracting the face features of the first group of the two groups of target training data sets under the Nth iteration using the secondary model under the Nth iteration; the second partial features are obtained by extracting the face features of the first group of target training data sets under the (N+1)th iteration using the secondary model under the (N+1)th iteration.
It will be appreciated that the first training data set and the first target training set are sets of identification photos of different faces (for example, the identification photos of 50 users, users 1-50), and the second training data set and the second target training set are sets of live photos of those faces (for example, the live photos of users 1-50). The order of the 50 face images input to the two branch networks should therefore be consistent at each iteration. Illustratively, if at some iteration the identification photos are input to the secondary branch network in the order user (face) 1, user 2, user 3 … user 50, then the live photos input to the main branch network at that iteration should also be input in the order user 1, user 2, user 3 … user 50. If the identification photo and the live photo of the same face are regarded as a pair of images, the order in which the identification photos of the pairs are input to the secondary branch network should coincide with the order in which the live photos are input to the main branch network.
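A sketch of this ordering constraint (id_imgs and live_imgs, mapping a user id to that user's image, are illustrative names):

def build_ordered_batches(batch_ids, id_imgs, live_imgs):
    # The i-th identification photo fed to the secondary branch network pairs
    # with the i-th live photo fed to the main branch network (same user order).
    id_batch = [id_imgs[u] for u in batch_ids]
    live_batch = [live_imgs[u] for u in batch_ids]
    return id_batch, live_batch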
The specific implementation of S6042 is described below; the implementation of S6041 is similar, and repeated description is omitted here.
Both the main branch network and the secondary branch network include at least two convolutional layers. In a neural network, a convolutional layer extracts the face features of a face image to obtain the feature information of the face. Each convolutional layer can extract at least one kind of face feature, such as contour, detail, texture or color; which information a given convolutional layer focuses on depends on the type and function of the filters arranged in that layer, and is not elaborated in the embodiment of the present application.
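For concreteness, a minimal PyTorch sketch of such a branch network, with at least two convolutional layers and no fully connected layer (the channel widths and feature dimension are illustrative; the embodiment does not fix a particular architecture):

import torch.nn as nn

class BranchNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool each feature map to a single value
        )

    def forward(self, x):  # x: (B, 3, H, W) -> (B, feat_dim) feature vectors
        return self.features(x).flatten(1)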
First, the specific implementation in S6042 of inputting the second group of target training data sets into the main branch network to obtain the second feature group under the (N+1)th iteration: under the (N+1)th iteration, each face-aligned and data-enhanced live photo (face image) is input into the main branch network, and the at least two convolutional layers of the main branch network extract the features of the live photos to obtain the face features of each live photo; the face features of all the live photos are regarded as the second feature group. The convolutional layers output in vector form, and the feature vector x_p output by the convolutional layers can be regarded as the collection of the face features of the live photos. The feature vector x_p(N+1) output by the convolutional layers under the (N+1)th iteration can be regarded as the second feature group under the (N+1)th iteration.
Next, the scheme in S6042 for obtaining the first feature group under the (N+1)th iteration: under the (N+1)th iteration, each face-aligned and data-enhanced identification photo (face image) is input into the secondary branch network, and the at least two convolutional layers of the secondary branch network extract the features of the identification photos to obtain the face features of each identification photo; these face features are regarded as the second partial features of the first feature group under the (N+1)th iteration. It can be understood that, to facilitate the subsequent iterations, the feature information computed at each iteration for the face images input to the secondary branch network is stored, so that it can be read out directly in the next iteration. Illustratively, during the (N+1)th iteration, the face features of the identification photos that were input into the convolutional layers of the secondary branch network under the Nth iteration are read out; the read features are the first partial features of the first feature group under the (N+1)th iteration, and the first feature group is obtained by collecting the first partial features and the second partial features. Since the face features are extracted by convolutional layers, which usually output vectors, the first and second partial features under each iteration are feature vectors. The first partial features read out under the (N+1)th iteration form the feature vector x_g(N+1), whose corresponding features are the same as the second partial features extracted by the secondary branch network under the Nth iteration.
To facilitate the iteration of the main and secondary models, in this application scenario the first feature group under the (N+1)th iteration is regarded as a queue x_j(N+1). The length of x_j(N+1) is related to the number of identification photos input to the secondary branch network at each iteration, that is, the number M of face images read from the initial data set at each iteration; it may be P times that number, P being a positive integer greater than or equal to 1, for example P = 2. With M = 50 identification photos input to the secondary branch network per iteration, the length of the feature queue x_j(N+1) is P × M = 2 × 50 = 100. The queue x_j(N+1) thus contains P × M = 100 elements, each occupying one position in the queue. The first M = 50 of the 100 elements are the face features of the M = 50 identification photos input to the convolutional layers of the secondary branch network under the Nth iteration; the last M of the 2M elements are the face features of the M identification photos input to the convolutional layers of the secondary branch network under the (N+1)th iteration.
It can be understood that the queue x_j(N+1) has a first-in first-out property: the face features from earlier iterations are written into the queue first, and the face features from later iterations are written afterwards. Every P iterations the queue becomes full. To ensure normal training of the model, at each iteration the M elements that entered the queue first are shifted out, so that the M elements obtained at the current iteration can be written normally. Illustratively, after the (N+1)th iteration is performed, the first M positions of x_j(N+1) hold the M elements obtained under the Nth iteration and the last M positions hold the M elements obtained under the (N+1)th iteration. The elements at earlier positions are shifted out of the queue first: when the M face features output by the convolutional layers of the secondary branch network under the (N+1)th iteration are obtained, the M elements obtained under the (N-1)th iteration are shifted out of the queue, the M elements of the Nth iteration move from the last 50 positions to the first 50 positions, and the M face features output under the (N+1)th iteration are written into the last 50 positions of the queue. As the number of iterations increases, x_j(N+1) is a dynamically changing queue.
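A minimal sketch of such a first-in first-out feature queue (the class name and the zero-initialised storage for not-yet-filled positions are illustrative assumptions):

import torch

class FeatureQueue:
    def __init__(self, feat_dim, M=50, P=2):
        self.M = M
        # Early iterations: positions not yet written are simply zeros.
        self.buf = torch.zeros(P * M, feat_dim)

    def push(self, feats):
        # feats: (M, feat_dim) secondary-branch features of the current iteration.
        # The M oldest elements are shifted out, the remaining elements move to
        # the front, and the new features occupy the last M positions.
        assert feats.shape[0] == self.M
        self.buf = torch.cat([self.buf[self.M:], feats.detach()], dim=0)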
For convenience of description, x_p(N+1) denotes the feature vector obtained by the main branch network under the (N+1)th iteration, x_g(N+1) denotes the feature vector obtained by the secondary branch network under the Nth iteration, and x_j(N+1) denotes the dynamic queue obtained under the (N+1)th iteration. It can be understood that the values of the features in x_p(N+1), x_g(N+1) and x_j(N+1) may differ greatly; to avoid the iteration difficulty caused by such large differences, the values can be normalized so that the absolute values of the feature values lie between 0 and 1. It can also be understood that at the first few iterations of the whole process, such as the 1st and 2nd iterations, the queue may not yet be full, and the calculation is performed with the empty positions taken into account. As the number of iterations increases, the queue gradually fills up, and once the queue is full it is updated dynamically.
Here x_j(N+1) takes into account not only the face features in the images under the current iteration (the (N+1)th iteration) but also the face features in the images under the previous iteration (the Nth iteration). Although the number of images of the same face input to the main and secondary branch networks at each iteration is relatively limited, considering the face features of two adjacent iterations increases the diversity of the features of a face with a limited number of images, and thereby increases the intra-class diversity of the shallow data to a certain extent. Training the face recognition model with shallow data of such intra-class diversity can greatly improve the accuracy of the trained model.
S605: substituting the x_p(N+1), x_g(N+1) and x_j(N+1) obtained under the (N+1)th iteration into the loss function L, and calculating the value of the loss function L under the (N+1)th iteration (the loss function value).
It is to be understood that the loss function L is constructed from the face features in the face images, such as the live photos and/or identification photos, before S101 is performed. Taking the loss function L as a softmax (normalized exponential function) cross-entropy loss as an example, a loss of this kind over the normalized feature similarities can be written as

L(N+1) = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{ e^{ s\, x_p^{i}(N+1) \cdot x_g^{i}(N+1) } }{ \sum_{k=1}^{P \times M} e^{ s\, x_p^{i}(N+1) \cdot x_j^{k}(N+1) } } \qquad (1)

where L(N+1) is the loss function under the (N+1)th iteration; x_p^i(N+1) and x_g^i(N+1) are the feature vectors of the i-th pair from the main and secondary branches, and x_j^k(N+1) is the k-th element of the dynamic queue; s is a scaling factor that can be set empirically; e, the base of the exponential function, is an infinite non-repeating decimal with a value close to 2.71828, and e^{(·)} denotes raising e to a power; log(·) denotes the logarithm. The value of the loss function L(N+1) under the (N+1)th iteration is calculated according to equation (1).
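A sketch of this loss under the stated assumptions (L2-normalised features, (x_p, x_g) as the positive pair, and the queue x_j, which contains the positive features, as the denominator; the scale s = 30.0 is an illustrative value):

import torch
import torch.nn.functional as F

def softmax_cross_entropy_loss(x_p, x_g, x_j, s=30.0):
    x_p = F.normalize(x_p, dim=1)   # (M, d) main-branch features
    x_g = F.normalize(x_g, dim=1)   # (M, d) secondary-branch features
    x_j = F.normalize(x_j, dim=1)   # (P*M, d) dynamic queue
    pos = (x_p * x_g).sum(dim=1)    # positive-pair similarities, (M,)
    logits = s * (x_p @ x_j.t())    # similarities against the whole queue, (M, P*M)
    # -log( e^{s*pos_i} / sum_k e^{logits_ik} ), averaged over the M pairs
    return (torch.logsumexp(logits, dim=1) - s * pos).mean()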
S606: judging whether the value of the loss function L(N+1) under the (N+1)th iteration is less than or equal to a preset loss threshold;
if so, stopping the iteration, or continuing the iteration until the value of the loss function is stable;
if not, updating the main model φ_p(N+1) and the secondary model φ_g(N+1) under the (N+1)th iteration according to the value of the loss function L(N+1) under the (N+1)th iteration.
Here it is assumed that the loss thresholds preset for the main and secondary models are the same.
The main model φ_p(N+1) under the (N+1)th iteration may be updated or iterated with a stochastic gradient descent algorithm. In this application scenario, after the value of the loss function L(N+1) under the (N+1)th iteration is calculated, it is substituted into equation (2) to obtain the main model φ_p(N+1) under the (N+1)th iteration; a stochastic-gradient-descent update of this kind can be written as

\phi_p(N+1) = \phi_p(N) - \eta\, \nabla_{\phi_p} L(N+1) \qquad (2)

where η denotes the learning rate of the stochastic gradient descent.

The secondary model φ_g(N+1) under the (N+1)th iteration may be updated or iterated in a moving-average manner. In this application scenario, after the value of the loss function L(N+1) under the (N+1)th iteration and the main model φ_p(N+1) under the (N+1)th iteration are calculated, they are substituted into equation (3) to obtain the secondary model φ_g(N+1) under the (N+1)th iteration:

\phi_g(N+1) = m\, \phi_g(N) + (1 - m)\, \phi_p(N+1) \qquad (3)

where m is a weighting factor with a value between 0 and 1; the larger the value, the smaller the degree to which the secondary model is updated, and its size can be set according to experience.
As can be seen from equations (2) and (3), the update of the main model φ_p(N+1) is related to the loss function, while the update of the secondary model φ_g(N+1) is related to the update result of the secondary model at the previous iteration and to the current main model. That is, in the embodiment of the present application, not only the main model but also the secondary model is updated or iterated at each iteration.
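One update step matching equations (2) and (3) may be sketched as follows (the learning rate lr and weighting factor m are illustrative values):

import torch

def update_models(main_model, secondary_model, loss, lr=0.01, m=0.99):
    main_model.zero_grad()
    loss.backward()  # gradients of L(N+1) with respect to the main model
    with torch.no_grad():
        for p in main_model.parameters():  # eq. (2): stochastic gradient descent
            p -= lr * p.grad
        for g, p in zip(secondary_model.parameters(), main_model.parameters()):
            g.mul_(m).add_((1.0 - m) * p)  # eq. (3): moving-average update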
When the loss function L(N+1) under the (N+1)th iteration has been calculated, it is judged whether it is less than or equal to the preset loss threshold. If not, the iteration count is increased by 1 (that is, N = N+2), the process returns to S601 with N+2 in place of N+1, and the next iteration is executed, and so on until the loss function L is less than or equal to the loss threshold. If L(N+1) is less than or equal to the preset loss threshold, the updating of the main and secondary models can stop: when the loss function is less than or equal to the loss threshold, the main and secondary models are as expected, and the training, or iteration, ends. Alternatively, the main and secondary models can continue to be iterated after the loss function first falls to or below the loss threshold, until the value of the loss function is stable. The loss threshold may be any reasonable value, such as 0.1, 0.15 or smaller; its specific value can be set flexibly according to experience and is not limited to the above.
In general, the above scheme obtains a trained face recognition model using a semi-twin convolutional neural network. The secondary branch network of the semi-twin network trains the secondary model on the identification photos at each iteration, and the main branch network trains the main model on the live photos at each iteration. As equations (2) and (3) show, the update equations of the main model differ from those of the secondary model, so the two models differ to a certain extent; this difference can enlarge the intra-class diversity of the shallow data, making the training of the models more accurate. Updating the main model by stochastic gradient descent and the secondary model by moving average ensures the stability and continuity of the model updates, and thereby the accuracy of training. Furthermore, a semi-twin convolutional neural network, as one of the deep learning techniques, generally has a fully connected layer in addition to the convolutional layers. In this application scenario the fully connected layer is deleted, and the training of the main and secondary models is achieved using only the convolutional layers to obtain the features of the face images; this is equivalent to optimizing the models directly with the features of the face samples, which greatly accelerates iteration and improves iteration efficiency.
In a specific implementation, the main model may be updated by any reasonable stochastic gradient descent algorithm, such as LMS (least mean squares) or RLS (recursive least squares). When the training of the main and secondary models is finished, both models are in the expected state, and so is the face recognition model. A subsequent image requiring face recognition (the image of the face to be recognized) can be input into the main model alone, and the face recognized using only the main model; it can also be input into the main model and the secondary model simultaneously, and the face recognized using both. When the main and secondary models are both used for recognition and the result of the main model is consistent with that of the secondary model, the consistent result is taken as the final recognition result of the face. If the two results are inconsistent, recognition can be performed again, or images of the same face can be re-acquired for re-recognition.
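A recognition sketch along these lines (gallery, mapping a user id to a stored reference feature, is an illustrative assumption):

import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(face_img, main_model, gallery, secondary_model=None):
    feat = F.normalize(main_model(face_img), dim=1)
    best = max(gallery, key=lambda uid: (feat @ gallery[uid].t()).item())
    if secondary_model is not None:  # optional cross-check with the secondary model
        feat2 = F.normalize(secondary_model(face_img), dim=1)
        best2 = max(gallery, key=lambda uid: (feat2 @ gallery[uid].t()).item())
        if best2 != best:
            return None  # inconsistent results: re-identify or re-acquire the image
    return best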
The above is a scheme for training the main and secondary models using shallow data. In the loss calculation, the loss function is computed between the face features extracted from the live photos by the main branch network and the face features extracted from the identification photos, and the update of the main model is related to the calculated loss function value. Because the feature part used in the loss function formula is derived from the secondary branch network, the embodiment of the present application can, to a certain extent, be understood as training the main model using the secondary model and the shallow data. This achieves the goal of training a face recognition model with shallow data, and the training also combines the face features of the live photo and the identification photo (two images) of the same face, which ensures the training accuracy of the model so that faces can be recognized accurately.
In this scheme the semi-twin convolutional neural network is trained, and the trained main and secondary models of the network serve as the face recognition model. In each training pass, that is, in each iteration, at least two convolutional layers in the main and secondary branch networks extract the face features of each pair of images of different faces under that iteration (each pair being a live photo and an identification photo of the same face); the features extracted by the secondary branch network are built into a dynamic queue, and the loss value of the model is calculated from the features extracted by the convolutional layers of the main branch network together with the dynamic queue. When the calculated loss value is greater than the preset loss threshold, the main model is updated by stochastic gradient descent and the secondary model by moving average. Compared with the traditional training mode, the embodiment of the present application can train the face recognition model accurately with the shallow data of faces, so that shallow face data plays a very effective role in the training process and the recognition performance of the trained model is greatly improved.
In addition, in the embodiment of the present application, besides the softmax cross-entropy loss, the loss function may be A-softmax, AM-softmax, a center loss function (Center loss), a triplet loss function (Triplet loss), or the like. Fig. 9 is a schematic diagram of the accuracy obtained with the conventional training method and with the training method provided in the embodiment of the present application for each of these loss functions. A-softmax and AM-softmax are functions obtained by deforming softmax to a certain extent; for details, refer to the related descriptions.
In fig. 9, the ordinate represents the training accuracy. On the abscissa, each pair of adjacent histogram bars represents the conventional (original) training method and the training method provided in the embodiment of the present application using the same loss function. Regarding two adjacent bars as one group, there are as many groups of bars as there are usable loss functions; in fig. 9 there are 5 groups. In each group, the bar drawn with a grid pattern is the training accuracy obtained with the training method of the embodiment of the present application, and the bar drawn with diagonal lines is the training accuracy obtained with the conventional training method. As can be seen from fig. 9, whichever of the above functions is used as the loss function, the training accuracy obtained with the training method of the embodiment of the present application is higher than that obtained with the conventional training method. On the one hand, the training method has high accuracy and can train a more accurate face recognition model; on the other hand, it has strong robustness and stability, and is not affected by the choice of loss function.
An embodiment of the present application provides a training device for a face recognition model, the face recognition model including a sub-model and a main model. As shown in fig. 10, the training device includes: a first obtaining unit 1001, a second obtaining unit 1002, a first processing unit 1003, a second processing unit 1004, a calculating unit 1005 and a determining unit 1006; wherein:
the first obtaining unit 1001 is configured to obtain at least two groups of images under the Nth iteration, where each group of images in the at least two groups of images is at least two images of the same face, and N is a positive integer greater than or equal to 1;
the second obtaining unit 1002 is configured to obtain two groups of training data sets under the Nth iteration according to the at least two groups of images;
the first processing unit 1003 is configured to process the first group of training data sets with the sub-model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
the second processing unit 1004 is configured to process the second group of training data sets with the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
the calculating unit 1005 is configured to calculate a loss function value under the Nth iteration according to the first feature group and the second feature group;
the determining unit 1006 is configured to determine, at least according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration;
if the determining unit 1006 determines not to update the main model under the Nth iteration, the iteration process is ended; if the determining unit 1006 determines to update the main model under the Nth iteration, N = N + 1 and the (N+1)th iteration process continues to be executed.
In some optional embodiments, the determining unit 1006 is configured to:
determine, according to the loss function value under the Nth iteration, whether to update the main model and the sub-model under the Nth iteration.
In some optional embodiments, the determining unit 1006 is configured to: if the loss function value under the Nth iteration is less than or equal to a preset loss threshold, determine at least not to update the main model under the Nth iteration;
and if the loss function value under the Nth iteration is greater than the preset loss threshold, determine at least to update the main model under the Nth iteration.
In some optional embodiments, in the (N+1)th iteration scheme, the first processing unit 1003 is configured to:
determine the model obtained after updating the sub-model under the Nth iteration as the sub-model under the (N+1)th iteration; and
process the first group of training data sets under the (N+1)th iteration with the sub-model under the (N+1)th iteration to obtain the first feature group of the face under the (N+1)th iteration.
In some optional embodiments, the first processing unit 1003 is configured to:
obtain first partial features, the first partial features being obtained by extracting the face features in the first group of the two groups of training data sets under the Nth iteration using the sub-model under the Nth iteration;
extract the face features in the first group of training data sets under the (N+1)th iteration using the sub-model under the (N+1)th iteration to obtain second partial features under the (N+1)th iteration; and
obtain the first feature group of the face under the (N+1)th iteration according to the first partial features and the second partial features.
In some optional embodiments, in the (N+1)th iteration scheme, the second processing unit 1004 is configured to:
determine the model obtained after updating the main model under the Nth iteration as the main model under the (N+1)th iteration; and
process the second group of training data sets under the (N+1)th iteration with the main model under the (N+1)th iteration to obtain the second feature group of the face under the (N+1)th iteration.
In some optional embodiments, the device further includes a recognition unit configured to recognize the face to be recognized using at least the main model after the iteration ends.
In some optional embodiments, the device further includes a preprocessing unit configured to respectively preprocess the two groups of training data sets under the Nth iteration to obtain two groups of target training data sets;
correspondingly, the first processing unit 1003 is configured to process the first group of target training data sets with the sub-model under the Nth iteration to obtain the first feature group of the face under the Nth iteration,
and the second processing unit 1004 is configured to process the second group of target training data sets with the main model under the Nth iteration to obtain the second feature group of the face under the Nth iteration.
In the embodiment of the present application, the first obtaining unit 1001, the second obtaining unit 1002, the first processing unit 1003, the second processing unit 1004, the calculating unit 1005 and the determining unit 1006 may, in practical applications, be implemented by a Central Processing Unit (CPU), a Digital Signal Processor (DSP), a Micro Control Unit (MCU) or a Field Programmable Gate Array (FPGA).
An embodiment of the present application further provides a face recognition device, as shown in fig. 11, including: a first obtaining unit 1101, an input unit 1102 and a recognition unit 1103; wherein:
the first obtaining unit 1101 is configured to obtain a face image to be recognized;
the input unit 1102 is configured to input the face image at least into the main model of a trained face recognition model;
the recognition unit 1103 is configured to recognize the face image to be recognized using at least the main model of the face recognition model, where the main model is trained based at least on the secondary model of the face recognition model.
In some optional embodiments, the device includes a training unit for training the main model. Further, the training unit is configured to:
obtain at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images is two images of the same face, and N is a positive integer greater than or equal to 1;
obtain two groups of training data sets under the Nth iteration according to the at least two groups of images;
process the first group of training data sets with the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
process the second group of training data sets with the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
calculate a loss function value under the Nth iteration according to the first feature group and the second feature group;
determine, at least according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration;
and if it is determined not to update the main model under the Nth iteration, end the iteration process; if it is determined to update the main model under the Nth iteration, set N = N + 1 and continue to execute the (N+1)th iteration process.
In the embodiment of the present application, the first obtaining unit 1101, the input unit 1102 and the recognition unit 1103 may, in practical applications, be implemented by a CPU, a DSP, an MCU or an FPGA.
It should be noted that, since the training device of the face recognition model and the face recognition device of the embodiments of the present application solve their problems on principles similar to those of the aforementioned training method of the face recognition model and face recognition method, their implementation processes and principles can be understood by referring to the implementation processes and principles of the aforementioned methods, and repeated details are omitted.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs at least the steps of the method shown in any one of fig. 1 to 8. The computer-readable storage medium may specifically be a memory, such as the memory 62 shown in fig. 12.
Fig. 12 is a schematic diagram of the hardware structure of a training device of a face recognition model and/or a face recognition device according to an embodiment of the present application. As shown in fig. 12, the hardware structure includes: a communication component 63 for data transmission, at least one processor 61, and a memory 62 for storing computer programs capable of running on the processor 61. The various components in the terminal are coupled together by a bus system 64, which is used to enable communication among them. In addition to the data bus, the bus system 64 includes a power bus, a control bus and a status signal bus; for clarity of illustration, however, the various buses are all labeled as the bus system 64 in fig. 12.
Wherein the processor 61 executes the computer program to perform at least the steps of the method of any of fig. 1 to 8.
It will be appreciated that the memory 62 can be volatile memory, nonvolatile memory, or both. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage. The volatile memory can be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM) and Direct Rambus Random Access Memory (DRRAM). The memory 62 described in the embodiments herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiments of the present application may be applied to the processor 61, or implemented by the processor 61. The processor 61 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 61. The processor 61 described above may be a general purpose processor, a DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 61 may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 62, and the processor 61 reads the information in the memory 62 and performs the steps of the aforementioned method in conjunction with its hardware.
In an exemplary embodiment, the training Device of the face recognition model and/or the face recognition Device may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), FPGAs, general purpose processors, controllers, MCUs, microprocessors (microprocessors), or other electronic components, for performing the aforementioned training method of the face recognition model and/or the face recognition method.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A training method of a face recognition model, wherein the face recognition model comprises a sub-model and a main model, the training method comprising:
obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images is two images of the same face; N is a positive integer greater than or equal to 1;
obtaining two groups of training data sets under the Nth iteration according to the at least two groups of images;
processing the first group of training data sets by the sub-model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
processing the second group of training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
calculating a loss function value under the Nth iteration according to the first feature group and the second feature group;
at least determining, according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration;
if it is determined not to update the main model under the Nth iteration, ending the iteration process; and if it is determined to update the main model under the Nth iteration, setting N = N + 1 and continuing to execute the (N+1)th iteration process.
2. The method of claim 1, wherein the at least determining, according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration comprises:
determining, according to the loss function value under the Nth iteration, whether to update the main model and the sub-model under the Nth iteration.
3. The method according to claim 1 or 2, wherein the at least determining, according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration comprises:
if the loss function value under the Nth iteration is less than or equal to a preset loss threshold, at least determining not to update the main model under the Nth iteration;
and if the loss function value under the Nth iteration is greater than the preset loss threshold, at least determining to update the main model under the Nth iteration.
4. The method of claim 2,
in the (N+1)th iteration scheme,
determining a model obtained after updating the sub-model under the Nth iteration as the sub-model under the (N+1)th iteration;
and processing the first group of training data sets under the (N+1)th iteration by the sub-model under the (N+1)th iteration to obtain a first feature group of the face under the (N+1)th iteration.
5. The method of claim 4, wherein the processing the first group of training data sets under the (N+1)th iteration by the sub-model under the (N+1)th iteration to obtain the first feature group of the face under the (N+1)th iteration comprises:
obtaining first partial features, wherein the first partial features are obtained by extracting the face features in the first group of the two groups of training data sets under the Nth iteration by using the sub-model under the Nth iteration;
extracting the face features in the first group of training data sets under the (N+1)th iteration by using the sub-model under the (N+1)th iteration to obtain second partial features under the (N+1)th iteration;
and obtaining the first feature group of the face under the (N+1)th iteration according to the first partial features and the second partial features.
6. The method according to claim 2 or 4,
in the (N+1)th iteration scheme,
determining a model obtained after updating the main model under the Nth iteration as the main model under the (N+1)th iteration;
and processing the second group of training data sets under the (N+1)th iteration by using the main model under the (N+1)th iteration to obtain a second feature group of the face under the (N+1)th iteration.
7. The method of claim 1,
and at least using the main model in the face recognition model after the iteration is finished to recognize the face image to be recognized.
8. A face recognition method, comprising:
obtaining a face image to be recognized;
inputting the face image into at least a main model in a face recognition model trained according to any one of the methods of claims 1-7;
and identifying the face image to be identified at least by using a main model in the face identification model.
9. An apparatus for training a face recognition model, comprising:
the first obtaining unit is used for obtaining at least two groups of images under the Nth iteration, wherein each group of images in the at least two groups of images is at least two images of the same face; N is a positive integer greater than or equal to 1;
a second obtaining unit, configured to obtain two groups of training data sets under the Nth iteration according to the at least two groups of images;
the first processing unit is used for processing the first group of training data sets by the secondary model under the Nth iteration to obtain a first feature group of the face under the Nth iteration;
the second processing unit is used for processing a second group of training data sets by the main model under the Nth iteration to obtain a second feature group of the face under the Nth iteration;
the calculation unit is used for calculating a loss function value under the Nth iteration according to the first characteristic group and the second characteristic group;
a determining unit, configured to determine, at least according to the loss function value under the Nth iteration, whether to update the main model under the Nth iteration;
if the determining unit determines not to update the main model under the Nth iteration, the iteration process is ended; and if the determining unit determines to update the main model under the Nth iteration, N = N + 1 and the (N+1)th iteration process continues to be executed.
10. A face recognition device, comprising:
the first obtaining unit is used for obtaining a face image to be recognized;
an input unit, configured to input the face image into at least a main model in a face recognition model trained according to any one of claims 1 to 7;
and the recognition unit is used for recognizing the face image to be recognized at least by utilizing the main model of the face recognition model.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any one of claims 1 to 8 are performed when the program is executed by the processor.
CN202011500234.4A 2020-12-18 2020-12-18 Training and recognition method of face recognition model, storage medium and related equipment Pending CN112257689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011500234.4A CN112257689A (en) 2020-12-18 2020-12-18 Training and recognition method of face recognition model, storage medium and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011500234.4A CN112257689A (en) 2020-12-18 2020-12-18 Training and recognition method of face recognition model, storage medium and related equipment

Publications (1)

Publication Number Publication Date
CN112257689A true CN112257689A (en) 2021-01-22

Family

ID=74225282

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011500234.4A Pending CN112257689A (en) 2020-12-18 2020-12-18 Training and recognition method of face recognition model, storage medium and related equipment

Country Status (1)

Country Link
CN (1) CN112257689A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN111178249A (en) * 2019-12-27 2020-05-19 杭州艾芯智能科技有限公司 Face comparison method and device, computer equipment and storage medium
CN111652121A (en) * 2020-06-01 2020-09-11 腾讯科技(深圳)有限公司 Training method of expression migration model, and expression migration method and device
CN111898547A (en) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 Training method, device and equipment of face recognition model and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033415A (en) * 2021-03-26 2021-06-25 北京百度网讯科技有限公司 Data queue dynamic updating method and device, electronic equipment and storage medium
CN113033415B (en) * 2021-03-26 2023-11-28 北京百度网讯科技有限公司 Data queue dynamic updating method and device, electronic equipment and storage medium
CN113011370A (en) * 2021-03-31 2021-06-22 重庆理工大学 Multi-state face recognition method based on deep learning

Similar Documents

Publication Publication Date Title
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
CN110188829B (en) Neural network training method, target recognition method and related products
EP3617946A1 (en) Context acquisition method and device based on voice interaction
CN110598019B (en) Repeated image identification method and device
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN112257689A (en) Training and recognition method of face recognition model, storage medium and related equipment
CN112966643A (en) Face and iris fusion recognition method and device based on self-adaptive weighting
CN113158777A (en) Quality scoring method, quality scoring model training method and related device
CN110020638B (en) Facial expression recognition method, device, equipment and medium
JP2019153092A (en) Position identifying device, position identifying method, and computer program
Xu et al. Multi‐pyramid image spatial structure based on coarse‐to‐fine pyramid and scale space
CN113361567B (en) Image processing method, device, electronic equipment and storage medium
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN110516513B (en) Face recognition method and device
CN113743533B (en) Picture clustering method and device and storage medium
CN115063858A (en) Video facial expression recognition model training method, device, equipment and storage medium
CN114842541A (en) Model training and face recognition method, device, equipment and storage medium
CN114494809A (en) Feature extraction model optimization method and device and electronic equipment
CN110287943B (en) Image object recognition method and device, electronic equipment and storage medium
CN114359811A (en) Data authentication method and device, electronic equipment and storage medium
CN113420699A (en) Face matching method and device and electronic equipment
KR102027786B1 (en) Method and system for recognizing face of user based on multiple images
CN112036446A (en) Method, system, medium, and apparatus for target recognition feature fusion
CN113935387A (en) Text similarity determination method and device and computer readable storage medium
CN114241243B (en) Training method and device for image classification model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination