CN112287966A - Face recognition method and device and electronic equipment


Info

Publication number
CN112287966A
Authority
CN
China
Prior art keywords
loss function
loss
data set
face recognition
training
Prior art date
Legal status
Pending
Application number
CN202010996916.2A
Other languages
Chinese (zh)
Inventor
申啸尘
周有喜
乔国坤
Current Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Original Assignee
Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Aishen Yingtong Information Technology Co Ltd
Priority to CN202010996916.2A
Publication of CN112287966A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions

Abstract

The invention relates to the technical field of face recognition, and discloses a face recognition method, a face recognition device and electronic equipment. The face recognition method is applied to the electronic equipment and comprises the following steps: acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to race; determining a loss function and a corresponding loss weight for each data set; determining a final loss function according to the loss function of each data set and its corresponding loss weight; and training an initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model. By training a loss function for each race and combining each loss function with its corresponding loss weight to obtain the final loss function, the face recognition model is trained to perform well across races.

Description

Face recognition method and device and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of face recognition, in particular to a face recognition method, a face recognition device and electronic equipment.
Background
The current face recognition training process is roughly as follows. First, training data is prepared; the training data needs to contain a large number of people (i.e., IDs), and each ID should contain, as far as possible, many photos of that person taken in different scenes or at different times. Next, a network structure and a loss function are designed, and the training data is input into the network. Batch gradient descent is generally adopted: a gradient is obtained from the loss function and back-propagated to optimize the weights in the network, and this process is repeated, with hyper-parameter adjustments, until the loss no longer decreases. The best-performing face recognition models at present come from one open-source project, or from projects built upon it, which use the training process described above; that project's main contribution is a highly effective loss function that markedly improves face recognition performance.
However, current face recognition schemes have difficulty handling unbalanced data distributions. For example, when a training set contains multiple races, the races are usually not evenly represented (e.g., Asian samples may be three times as numerous as European samples). As a result, the model recognizes the well-represented races better, while its recognition performance on under-represented races drops significantly.
In the process of implementing the invention, the inventors found that the prior art has at least the following problem: the uneven distribution of races causes recognition performance on certain races to degrade.
In view of the foregoing, there is a need for improvement in the art.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention provide a face recognition method, a face recognition device, and an electronic device, so as to solve the technical problem of decreased recognition performance for a certain type of race caused by uneven distribution of races, and improve the comprehensive recognition performance of a face recognition model.
In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:
in a first aspect, an embodiment of the present invention provides a face recognition method, which is applied to an electronic device, and the method includes:
acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to the races;
determining a loss function and a corresponding loss weight for each data set;
determining a final loss function according to the loss function of each data set and the corresponding loss weight;
and training the initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model.
In some embodiments, the determining the loss function and its corresponding loss weight for each data set comprises:
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
In some embodiments, the determining a final loss function according to the loss function of each data set and its corresponding loss weight includes:
if the initial model is a pre-trained model, the final loss function is a weighted sum of the loss function of each data set and the corresponding loss weight.
In some embodiments, the determining a final loss function according to the loss function of each data set and its corresponding loss weight includes:
and if the initial model is an untrained model, determining an interval loss function and a loss weight corresponding to the interval loss function, wherein the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight.
In some embodiments, the determining the interval loss function and its corresponding loss weight comprises:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
In some embodiments, the training set is divided into a first data set and a second data set, and the interval loss function is:
$$L_{inter}=\frac{1}{\beta}\sum_{i=1}^{n_i}\sum_{j=1}^{n_j}\mathbb{1}\big[\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j>\alpha\big]\,\big(\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j\big)$$
where norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the respective batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples used for one training iteration.
In a second aspect, an embodiment of the present invention provides a face recognition apparatus, which is applied to an electronic device, and the apparatus includes:
the training set unit is used for acquiring a training set containing images of at least two races and dividing the training set into at least two data sets according to the races;
a loss weight unit for determining a loss function and a corresponding loss weight for each data set;
the loss function unit is used for determining a final loss function according to the loss function of each data set and the corresponding loss weight;
and the face recognition unit is used for training the initial model according to the final loss function to obtain a face recognition model and carrying out face recognition based on the face recognition model.
In some embodiments, the loss weight unit is specifically configured to:
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
In some embodiments, the loss function unit is specifically configured to:
if the initial model is a pre-trained model, the final loss function is a weighted sum of the loss function of each data set and the corresponding loss weight.
In some embodiments, the loss function unit is specifically configured to:
and if the initial model is an untrained model, determining an interval loss function and a loss weight corresponding to the interval loss function, wherein the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight.
In some embodiments, the determining the interval loss function and its corresponding loss weight comprises:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
In some embodiments, the training set is divided into a first data set and a second data set, and the interval loss function is:
$$L_{inter}=\frac{1}{\beta}\sum_{i=1}^{n_i}\sum_{j=1}^{n_j}\mathbb{1}\big[\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j>\alpha\big]\,\big(\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j\big)$$
where norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the respective batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples used for one training iteration.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the face recognition method as described above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the face recognition method as described above.
The beneficial effects of the embodiment of the invention are as follows. Different from the prior art, an embodiment of the present invention provides a face recognition method applied to an electronic device, the method comprising: acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to race; determining a loss function and a corresponding loss weight for each data set; determining a final loss function according to the loss function of each data set and its corresponding loss weight; and training an initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model. By training a loss function for each race and combining each loss function with its corresponding loss weight to obtain the final loss function, the face recognition model is trained to perform well across races.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which elements having the same reference numeral designations represent similar elements. Unless otherwise noted, the figures are not drawn to scale.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present invention;
FIG. 3a is a schematic diagram of a data distribution of training data according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of data distribution of another training data provided by an embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S30 in FIG. 2;
fig. 5 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and detailed description. It will be understood that when an element is referred to as being "secured to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may be present. The terms "vertical," "horizontal," "left," "right," and the like as used herein are for descriptive purposes only.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;
as shown in fig. 1, the application scenario 300 includes an electronic device 100 and a user 200, where the electronic device 100 includes a camera, and the camera is configured to acquire a face image of the user 200 and perform face recognition on the face image. In the embodiment of the present invention, the electronic device 100 includes, but is not limited to, a mobile terminal, a computer device, and other devices having a camera.
Referring to fig. 2, fig. 2 is a schematic flow chart of a face recognition method according to an embodiment of the present invention; the face recognition method is applied to an electronic device, and particularly to a processor of the electronic device, that is, an execution subject of the face recognition method is the processor of the electronic device.
As shown in fig. 2, the face recognition method includes:
step S10: acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to the races;
specifically, the images in the training set contain different races, such as: when a training set containing images of at least two races is obtained, the training set is split into different multiple data sets, for example: and splitting a data set containing the yellow and white people into two data sets. It will be appreciated that when the training set comprises at least two data sets, for example: when a yellow-type human face recognition data set and a white-type human face recognition data set are used, if the scale difference of the two data sets is large, the two data sets are combined, the situation that the sample distribution is not uniform can be caused, and the deviation between the data distribution mode learned by the model and the expectation is caused.
Step S20: determining a loss function and a corresponding loss weight for each data set;
at present, the best training strategy in the face recognition engineering is to train a face recognition model by using an arcfacce function as a final loss function. However, the training process generally requires a two-stage training strategy, that is, the model is preheated and trained by using the softmax cross entropy loss function in the first stage, and then the face recognition model can be effectively converged by using the arcface loss function. Determining the loss function of each data set refers to determining the proportion of the loss function corresponding to each data set and setting parameters in the loss function suitable for each data set.
Specifically, the determining the loss function of each data set includes: determining parameters of the loss function corresponding to each data set according to characteristics of the data set, where the characteristics of the data set include data quality and the number of data categories. The loss function is, for example, an arcface loss function or a softmax loss function. The arcface loss function is:
$$L=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}$$

where N is the number of samples in the training set, s is the scaling factor, and m is the angle interval.
It can be understood that the process of performing face recognition training by using the arcface loss function is the prior art, and is not described herein again.
The arcface loss function includes two parameters, namely the scaling coefficient s and the angle interval m. The angle interval m is used to compress the intra-class interval (i.e., expand the inter-class interval), and the scaling coefficient s is used to optimize the gradient-descent behavior. It will be appreciated that both the scaling coefficient s and the angle interval m are empirical parameters, and different data sets will favor different values. The two parameters may differ according to the characteristics of different data sets; for example, for a data set with better data quality and more pictures per category, m may be set slightly larger, so that data of the same class is sufficiently compressed within a fixed region of the feature space. In the embodiment of the present invention, preferably, the angle interval m is set to 0.5 and the scaling coefficient s is set to 64.
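A minimal, pure-Python sketch of the margin mechanism described above (this illustrates the published ArcFace formulation rather than the patent's exact implementation; the list-of-cosines input format is an assumption made for simplicity):

```python
import math

def arcface_loss(cosines, labels, s=64.0, m=0.5):
    """ArcFace-style loss: add angular margin m to the target-class angle,
    scale all logits by s, then apply softmax cross-entropy.
    `cosines[i][j]` is the cosine similarity of sample i to class center j."""
    total = 0.0
    for cos_row, y in zip(cosines, labels):
        logits = []
        for j, c in enumerate(cos_row):
            if j == y:
                theta = math.acos(max(-1.0, min(1.0, c)))
                logits.append(s * math.cos(theta + m))  # penalized target logit
            else:
                logits.append(s * c)
        log_sum_exp = math.log(sum(math.exp(v) for v in logits))
        total += log_sum_exp - logits[y]  # -log softmax probability of target
    return total / len(labels)
```

With the patent's preferred settings (m = 0.5, s = 64), a sample whose target cosine is already high incurs a far smaller loss than one lying close to a wrong class center.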
In this way, the single-task training mode of face recognition is split, according to the purpose and characteristics of the training, into a multi-stream data-parallel training mode, and each training subset sharing the same characteristics is given independent parameter control.
Specifically, the determining the loss function and the corresponding loss weight of each data set includes: and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
Equivalent to the determining a loss function and its corresponding loss weight for each data set, comprising:
adjusting parameters of a loss function corresponding to each data set according to the characteristics of the data set, wherein the characteristics of the data set comprise data characteristics, the number of samples of the data set, data quality and the number of data categories;
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
The loss weight corresponding to the loss function of each data set is positively correlated with the proportion of that data set; that is, the loss weight is determined according to the number of images of each data set in the training set. For example, if the training set comprises a yellow-race data set and a white-race data set, and the ratio of the number of images in the yellow-race data set to that in the white-race data set is 2:1, the loss weight corresponding to the loss function of the yellow-race data set is determined to be k + a, and the loss weight corresponding to the loss function of the white-race data set is determined to be k, where k is an adjustment coefficient that can be set manually and a is an adjustment value set according to actual needs.
It can be understood that when a data set contains a large amount of data, its loss weight can be appropriately increased to facilitate optimization. For example, if the ratio of the first data set to the second data set is 2:1 while each batch draws data from the two data sets in a 1:1 ratio, the loss weight corresponding to the larger data set should be only slightly larger than 1, for example 1.2. The purpose of this setting is to prevent overfitting on the data set with less data while ensuring that the data set with more data is well optimized, and at the same time to balance the losses of the two data sets so as to obtain a more even effect.
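The weighting rule in this example can be sketched as follows (k = 1.0 and a = 0.2 are illustrative values chosen to reproduce the 1.2 / 1.0 example above, not values fixed by the patent):

```python
def loss_weights(set_sizes, k=1.0, a=0.2):
    """Give the largest data set a slightly larger loss weight (k + a)
    and every other data set the base weight k, so the majority set is
    optimized a bit harder without overfitting the minority sets."""
    largest = max(set_sizes)
    return [k + a if size == largest else k for size in set_sizes]

# a 2:1 split, e.g. 20000 yellow-race images vs 10000 white-race images
weights = loss_weights([20000, 10000])  # majority set 1.2, minority set 1.0
```

In practice k and a would be tuned so that the weighted losses of the data sets stay on a similar scale.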
Step S30: determining a final loss function according to the loss function of each data set and the corresponding loss weight;
the method and the device for determining the loss function of the human face recognition model comprise the steps of determining a final loss function according to an initial model, wherein the initial model is a model before a human face recognition model is trained, and determining whether an interval loss function needs to be added or not by judging whether the initial model is a pre-training model or not.
Referring to fig. 3a and fig. 3b together, fig. 3a is a schematic diagram of data distribution of training data according to an embodiment of the present invention; FIG. 3b is a schematic diagram of data distribution of another training data provided by an embodiment of the present invention;
the similarity of face recognition is mainly based on the cosine distance after feature normalization. If an interval loss function, i.e., inter loss, is not introduced, information between two sets of data is not intercommunicated, and two sets of training data categories may generate data distribution as shown in fig. 3a in a final class space, wherein the set of data in the first set of training data has a larger interval, and the set of data in the second set of training data also has a larger interval, but there may be a close distance between the first set of training data and the second set of training data, resulting in a small cosine distance and a poor discrimination;
As shown in fig. 3b, when an interval loss function (inter loss) is introduced, the data distribution discriminates well both within each set and between the two sets of training data. Because the cosine distance between the two sets of training data is included in the loss function, the final classification result of the model avoids distributions with small inter-set intervals, and the interval between the two sets of data is kept at a certain size.
Specifically, referring back to fig. 4, fig. 4 is a detailed flowchart of step S30 in fig. 2;
as shown in fig. 4, the step S30: determining a final loss function according to the loss function of each data set and the corresponding loss weight thereof, wherein the final loss function comprises:
step S31: judging whether the initial model is a pre-training model or not;
the pre-trained model refers to a face recognition model that has been trained on a public data set, such as some networks published on the internet, or networks trained with other loss functions. And determining whether an interval loss function, namely inter loss, needs to be added or not by judging whether the initial model is a pre-training model or not.
It can be understood that the interval loss function (inter loss) is unnecessary here for the following reason: after a face recognition model has been sufficiently trained on a good data set, it already has a strong ability to extract effective information, so racial differences are reflected in the task features and do not yield high cross-race similarity. Therefore, even if the interval loss were added, its value would be 0 or very small and would have no significant or valuable effect on training. Consequently, if the initial model is a pre-trained model, the interval loss function is not added.
Step S32: determining the final loss function as a weighted sum of the loss function of each data set and its corresponding loss weight;
Specifically, if the initial model is a pre-trained model, the final loss function is determined to be the weighted sum of the loss function of each data set and its corresponding loss weight. For example, the training set comprises images of yellow-race and white-race people and is divided into two data sets, a first data set and a second data set (i.e., a yellow-race face recognition data set and a white-race face recognition data set). Let the final loss function be L_total, the loss function corresponding to the first data set be L_a, the loss function corresponding to the second data set be L_b, alpha1 be the loss weight corresponding to the loss function of the first data set, and alpha2 be the loss weight corresponding to the second data set; then L_total = alpha1 * L_a + alpha2 * L_b.
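For the pre-trained case, the combination is a plain weighted sum; a minimal sketch (the loss values below are placeholders, not results from the patent):

```python
def final_loss_pretrained(losses, weights):
    """L_total = alpha1 * L_a + alpha2 * L_b (+ ... for more data sets)."""
    return sum(w * l for l, w in zip(losses, weights))

# e.g. first-data-set loss 0.8 weighted 1.2, second-data-set loss 1.1 weighted 1.0
l_total = final_loss_pretrained([0.8, 1.1], [1.2, 1.0])
```

The same helper extends to any number of data sets by passing longer lists.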
Step S33: determining an interval loss function and a loss weight corresponding to the interval loss function;
specifically, if the initial model is an untrained model, the interval loss function and the corresponding loss weight are determined. It will be appreciated that the reason why the untrained model requires the use of the interval loss function, i.e. inter loss, is that when the training set is divided into at least two sets of data, the information between the two sets of data is independent, so that the two IDs may occupy the same direction in the class space during the initial training, but the respective loss does not contain this information. The function of the interval loss function is to supplement the information, and provide a hint for the vectors occupying the same kind of direction. The reason why this part of information is not needed in the pre-trained model is that after the model is trained on a better data set to a sufficient degree, the model already has a strong effective information extraction capability, so that the difference of the human race difference is reflected on the task features and does not have a high similarity.
Specifically, the determining the interval loss function and the corresponding loss weight thereof includes:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
Specifically, if the training set is divided into two data sets, a first data set and a second data set, the difference between the first loss weight corresponding to the loss function of the first data set and the second loss weight corresponding to the loss function of the second data set is calculated, and the loss weight of the interval loss function is determined according to the absolute value of this difference, with which it is positively correlated. For example, the loss weight of the interval loss function is m times the absolute value of the difference, where m > 0.
Alternatively, the loss weight of the interval loss function is positively correlated with the average of the first loss weight corresponding to the first loss function and the second loss weight corresponding to the second loss function. For example, the loss weight of the interval loss function is n * (first loss weight + second loss weight) / 2, where n > 0. Making the loss weight of the interval loss function positively correlated with this average helps keep the inter loss on the same order of magnitude as the other two losses.
In an embodiment of the present invention, the training set is divided into a first data set and a second data set, and the interval loss function is:
$$L_{inter}=\frac{1}{\beta}\sum_{i=1}^{n_i}\sum_{j=1}^{n_j}\mathbb{1}\big[\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j>\alpha\big]\,\big(\mathrm{norm\_feature1}_i\cdot \mathrm{norm\_feature2}_j\big)$$
where norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the respective batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples used for one training iteration.
In the embodiment of the present invention, if the training set is divided into three or more data sets, the variance of the loss weights corresponding to the loss functions of the data sets is calculated, and the loss weight of the interval loss function is determined according to that variance. For example, the loss weight of the interval loss function is p times the variance of the loss weights corresponding to the loss functions of the data sets, where p > 0.
In the embodiment of the present invention, the loss weight of the interval loss function is generally set to be in the range of [0,5], and by associating the loss weight of the interval loss function with the loss weights of the loss functions of the at least two data sets, the interval loss function can be better determined, so as to improve the comprehensive recognition performance of the face recognition model.
Step S34: determining a final loss function as a weighted sum of the loss function of each data set and the corresponding loss weight thereof and the interval loss function and the corresponding loss weight thereof;
specifically, if the initial model is an untrained model, that is, there is no pre-trained model, and it is assumed that the training set includes images of yellow and white people, the training set is divided into two data sets, which are a first data set and a second data set, that is, a yellow human face recognition data set and a white human face recognition data set, assuming that the final loss function is L _ total, the loss function corresponding to the first data set is L _ a, the loss function corresponding to the second data set is L _ b, the interval loss function is L _ inter, alpha1 is the loss weight corresponding to the loss function of the first data set, alpha2 is the loss weight corresponding to the second data set, and alpha3 is the loss weight corresponding to the interval loss function L _ inter, then L _ total is alpha 1L _ a + 2L _ b + alpha L _ 3.
In this embodiment of the present invention, the interval loss function L _ inter is:
L_inter = (1/β) · Σ_{i=1}^{n_i} Σ_{j=1}^{n_j} [norm_feature1_i · norm_feature2_j > α] · (norm_feature1_i · norm_feature2_j)

wherein [·] is an indicator selecting the feature pairs whose dot product exceeds the threshold, norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the two batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples required for one training iteration.
In the embodiment of the present invention, the preset threshold may be set manually, for example, to 0.4. The interval loss function is then computed as follows: take one batch of features from each of the two sets of training data, normalize them to obtain normalized feature vectors, and multiply them pairwise; select the dot products that are larger than the preset threshold, add them up, and divide by the number of feature pairs that meet the threshold. For example, suppose the two batches contain 128 features each, i.e. each batch has 128 image feature vectors, and the cosine similarity of 64 pairs of features exceeds the preset threshold. Then the cosine similarities (normalized feature-vector dot products) of these 64 pairs are added and divided by 64, and this value is weighted as part of the loss. The implication is that if the features in each batch drawn from the two sets of training data are too close together, their similarity is counted as part of the loss, and the inter loss decreases only if the data in the two data sets keep a greater distance from each other.
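The computation just described can be sketched in NumPy as follows (a minimal illustration; the function name and array shapes are assumptions, not from the patent):

```python
import numpy as np

def inter_loss(feat1, feat2, alpha=0.4):
    """Interval (inter) loss between one batch of features from each data set.

    feat1: (n_i, d) raw features of the batch from the first data set.
    feat2: (n_j, d) raw features of the batch from the second data set.
    alpha: preset cosine-similarity threshold (0.4 in the text's example).
    """
    # normalize so that pairwise dot products are cosine similarities
    f1 = feat1 / np.linalg.norm(feat1, axis=1, keepdims=True)
    f2 = feat2 / np.linalg.norm(feat2, axis=1, keepdims=True)
    sims = f1 @ f2.T                 # (n_i, n_j) pairwise cosine similarities
    over = sims[sims > alpha]        # cross-data-set pairs that are "too close"
    beta = over.size                 # number of feature pairs exceeding the threshold
    if beta == 0:
        return 0.0                   # batches already well separated: no penalty
    return float(over.sum() / beta)  # mean similarity of the offending pairs
```

The loss is zero once no cross-data-set pair exceeds the threshold, matching the observation that the inter loss is reduced only when the two data sets keep their distance in feature space.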
In the embodiment of the invention, it is judged whether the initial model is a pre-trained model. If so, the final loss function is determined to be the weighted sum of the loss function of each data set and its corresponding loss weight; if not, an interval loss function and its corresponding loss weight are also determined, and the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight. In this way the final loss function can be determined more quickly and reliably, so that the initial model is trained based on the final loss function to obtain the face recognition model, thereby improving the speed of face recognition.
Step S40: and training the initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model.
Specifically, the initial model is trained through the final loss function to obtain a face recognition model: the gradient of the loss function is computed and back-propagated to update the weights in the model, so that feature vectors of the same category eventually converge toward the same direction.
After the face recognition model is obtained, a face image is acquired and input into the face recognition model to obtain a feature vector of the face image; the cosine of the included angle between the feature vector of the face image and the feature vector of each target face stored in a face database is calculated, and the face recognition result is determined according to this cosine value. For example: if the included angle is smaller than or equal to a preset threshold, the corresponding target face is determined to be matched; if the included angle is larger than the preset threshold, no target face in the face database is matched.
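The matching step can be sketched as follows (function and variable names are hypothetical; note that a larger cosine value means a smaller included angle and hence a closer match):

```python
import numpy as np

def match_face(query_feat, db_feats, db_names, cos_threshold=0.5):
    """Match a query face feature vector against a face database.

    query_feat: (d,) feature vector produced by the face recognition model.
    db_feats:   (k, d) stored target-face feature vectors.
    db_names:   k identifiers for the stored target faces.
    cos_threshold: hypothetical cosine threshold (small included angle).
    """
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = db @ q                      # cosines of the included angles
    best = int(np.argmax(sims))
    if sims[best] >= cos_threshold:    # angle small enough: matched
        return db_names[best], float(sims[best])
    return None, float(sims[best])     # no target face in the database matched
```

Comparing against the whole database at once with one matrix-vector product keeps the lookup fast even for many stored faces.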
In an embodiment of the present invention, a face recognition method is provided and applied to an electronic device, where the method includes: acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to the races; determining a loss function and a corresponding loss weight for each data set; determining a final loss function according to the loss function of each data set and the corresponding loss weight; and training the initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model. According to the loss function corresponding to the race training, the final loss function is obtained by combining the loss weight combination corresponding to the loss function so as to train the face recognition model.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face recognition apparatus according to an embodiment of the present invention; the face recognition device is applied to electronic equipment, and particularly to a processor of the electronic equipment.
As shown in fig. 5, the face recognition apparatus 50 includes:
a training set unit 501, which obtains a training set containing images of at least two races and divides the training set into at least two data sets according to the races;
a loss weight unit 502 for determining a loss function and a corresponding loss weight for each data set;
a loss function unit 503, configured to determine a final loss function according to the loss function of each data set and the corresponding loss weight thereof;
and a face recognition unit 504, configured to train the initial model according to the final loss function to obtain a face recognition model, and perform face recognition based on the face recognition model.
In an embodiment of the invention, the images in the training set contain different races. For example, when a training set containing images of at least two races is obtained, it is split into multiple data sets, e.g. a set containing yellow and white people is split into two data sets. It will be appreciated that when the training set comprises at least two data sets, such as a yellow-type human face recognition data set and a white-type human face recognition data set, merging them despite a large difference in scale can cause an uneven sample distribution, so that the data distribution learned by the model deviates from the expected one.
In an embodiment of the present invention, the loss weight unit is specifically configured to:
determining parameters of the loss function corresponding to each data set according to the characteristics of the data set, wherein the characteristics of the data set include data quality and the number of data categories. The loss function is, for example, an arcface loss function or a softmax loss function; the arcface loss function is:
L = -(1/N) Σ_{i=1}^{N} log( e^{s·cos(θ_{y_i}+m)} / ( e^{s·cos(θ_{y_i}+m)} + Σ_{j≠y_i} e^{s·cos θ_j} ) )

wherein N is the number of training samples, s is the scaling factor, m is the angle interval (angular margin), θ_j is the angle between the sample's feature vector and the j-th class weight vector, and y_i is the ground-truth class of the i-th sample.
The process of carrying out face recognition training by using the arcface loss function is the prior art and is not repeated herein.
The arcface loss function includes two parameters, namely a scaling coefficient s and an angle interval m, wherein the angle interval m is used for compressing the intra-class distance (i.e. expanding the inter-class interval), and the scaling coefficient s is used for optimizing the gradient-descent behavior. It will be appreciated that the scaling factor s and the angle interval m are both empirical parameters and have different preferred values for different data sets. The two parameters may differ according to the characteristics of the data sets; for example, for a data set with better data quality and more pictures per category, m may be set slightly larger, so that data of the same class is sufficiently compressed into a fixed spatial range. In the embodiment of the present invention, preferably, the angle interval m is set to 0.5 and the scaling factor s is set to 64.
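A minimal NumPy sketch of the arcface loss described above, operating on precomputed cosine logits and using the embodiment's s = 64, m = 0.5 (the function name and input layout are assumptions for illustration):

```python
import numpy as np

def arcface_loss(cos_theta, labels, s=64.0, m=0.5):
    """ArcFace loss on cosine logits.

    cos_theta: (N, C) cosines between normalized features and class weights.
    labels:    (N,) ground-truth class indices.
    s: scaling coefficient; m: angle interval (additive angular margin).
    """
    n = cos_theta.shape[0]
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    # add the angular margin m only on the ground-truth class
    logits = s * cos_theta.copy()
    logits[np.arange(n), labels] = s * np.cos(theta[np.arange(n), labels] + m)
    # softmax cross-entropy over the margin-adjusted, scaled logits
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(n), labels].mean())
```

With m = 0 this reduces to ordinary scaled softmax cross-entropy; a positive margin makes the target class harder to satisfy, which is what compresses each class into a fixed spatial range.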
According to the method, the single-task training mode of face recognition is divided, according to the training purpose and training characteristics, into a multi-stream data-parallel training mode, and the training sets sharing the same characteristics are controlled with independent parameters.
Specifically, the determining the loss function and the corresponding loss weight of each data set includes: and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
That is, the determining of a loss function and its corresponding loss weight for each data set includes:
adjusting parameters of a loss function corresponding to each data set according to the characteristics of the data set, wherein the characteristics of the data set comprise data characteristics, the number of samples of the data set, data quality and the number of data categories;
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
Wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set; that is, the loss weight is determined according to the number of images of each data set in the training set. For example: the training set comprises a yellow person data set and a white person data set, and the ratio of the number of images of the yellow person data set to that of the white person data set is 2:1; then the loss weight corresponding to the loss function of the yellow person data set is determined to be 2k and that of the white person data set to be k, where k is an adjustment coefficient that may be set manually.
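The 2:1 example above can be sketched as follows (the helper name and the normalization by the smallest data set are assumptions chosen to reproduce the 2k/k result):

```python
def loss_weights(image_counts, k=1.0):
    """Loss weight per data set, proportional to its image count.

    image_counts: images per data set, e.g. [200, 100] for a 2:1 split.
    k: manually set adjustment coefficient.
    """
    smallest = min(image_counts)
    return [k * c / smallest for c in image_counts]
```

For example, loss_weights([200, 100], k=1.0) yields [2.0, 1.0], i.e. weights 2k and k for the yellow and white data sets.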
In an embodiment of the present invention, the loss function unit is specifically configured to:
if the initial model is a pre-trained model, the final loss function is a weighted sum of the loss function of each data set and the corresponding loss weight.
In an embodiment of the present invention, the loss function unit is specifically configured to:
and if the initial model is an untrained model, determining an interval loss function and a loss weight corresponding to the interval loss function, wherein the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight.
The pre-trained model refers to a face recognition model that has been trained on a public data set, such as some networks published on the internet, or networks trained with other loss functions. And determining whether an interval loss function, namely inter loss, needs to be added or not by judging whether the initial model is a pre-training model or not.
It can be understood that the interval loss function inter loss is not needed for a pre-trained model because, after the face recognition model has been sufficiently trained on a good data set, it already has a strong ability to extract effective information, so faces with large inter-race differences do not show high similarity in the task features. Therefore, even if inter loss were added, its value would be 0 or very small and would have no significant or valuable influence on the training.
Specifically, if the initial model is a pre-trained model, the final loss function is determined to be a weighted sum of the loss function of each data set and the corresponding loss weight, for example: the training set comprises images of yellow people and white people, the training set is divided into two data sets, namely a first data set and a second data set, namely a yellow person face recognition data set and a white person face recognition data set, the final loss function is L _ total, the loss function corresponding to the first data set is L _ a, the loss function corresponding to the second data set is L _ b, alpha1 is the loss weight corresponding to the loss function of the first data set, and alpha2 is the loss weight corresponding to the second data set, so that L _ total is alpha 1L _ a + alpha 2L _ b.
Specifically, if the initial model is an untrained model, the interval loss function and its corresponding loss weight are determined. It will be appreciated that an untrained model requires the interval loss function, i.e. inter loss, because when the training set is divided into at least two data sets, the information between the data sets is independent; two identities may therefore occupy the same direction in the class space during early training, yet neither data set's own loss contains this information. The function of the interval loss is to supplement that information and penalize vectors that occupy the same class direction. A pre-trained model does not need this information because, after being sufficiently trained on a good data set, it already has a strong ability to extract effective information, so inter-race differences are reflected in the task features and do not show high similarity.
Specifically, the determining the interval loss function and the corresponding loss weight thereof includes:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
Specifically, if the training set is divided into two data sets, a first data set and a second data set, the difference between the first loss weight corresponding to the loss function of the first data set and the second loss weight corresponding to the loss function of the second data set is calculated, and the loss weight of the interval loss function is determined according to the absolute value of that difference, the loss weight of the interval loss function being positively correlated with the absolute value of the difference. For example: the loss weight of the interval loss function is m times the absolute value of the difference, where m > 0.
In an embodiment of the present invention, the training set is divided into a first data set and a second data set, and the interval loss function is:
L_inter = (1/β) · Σ_{i=1}^{n_i} Σ_{j=1}^{n_j} [norm_feature1_i · norm_feature2_j > α] · (norm_feature1_i · norm_feature2_j)

wherein [·] is an indicator selecting the feature pairs whose dot product exceeds the threshold, norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the two batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples required for one training iteration.
In the embodiment of the present invention, if the training set is divided into three or more data sets, the variance of the loss weights corresponding to the loss functions of the plurality of data sets is calculated, and the loss weight of the interval loss function is determined according to that variance, for example: the loss weight of the interval loss function is n times the variance of the loss weights corresponding to the loss functions of the plurality of data sets, where n > 0.
In the embodiment of the present invention, the loss weight of the interval loss function is generally set to be in the range of [0,5], and by associating the loss weight of the interval loss function with the loss weights of the loss functions of the at least two data sets, the interval loss function can be better determined, so as to improve the comprehensive recognition performance of the face recognition model.
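The two rules for setting the interval-loss weight — m times the absolute difference for two data sets, n times the variance for three or more — can be sketched as follows (the function names and the clamp to the text's [0, 5] range are illustrative assumptions):

```python
import statistics

def inter_weight_two(w1, w2, m=1.0):
    """Two data sets: weight is m * |w1 - w2|, with m > 0."""
    return min(max(m * abs(w1 - w2), 0.0), 5.0)   # keep within [0, 5]

def inter_weight_many(weights, n=1.0):
    """Three or more data sets: weight is n * variance of the loss weights, n > 0."""
    return min(max(n * statistics.pvariance(weights), 0.0), 5.0)
```

Both rules make the interval-loss weight grow with the imbalance between the per-data-set loss weights, which is the stated positive correlation.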
In an embodiment of the present invention, the determining the interval loss function and the corresponding loss weight includes:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
In an embodiment of the present invention, the training set is divided into a first data set and a second data set, and the interval loss function is:
L_inter = (1/β) · Σ_{i=1}^{n_i} Σ_{j=1}^{n_j} [norm_feature1_i · norm_feature2_j > α] · (norm_feature1_i · norm_feature2_j)

wherein [·] is an indicator selecting the feature pairs whose dot product exceeds the threshold, norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the two batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples required for one training iteration.
Specifically, if the initial model is an untrained model, that is, there is no pre-trained model, assume that the training set includes images of yellow and white people and is divided into two data sets, a first data set and a second data set, namely a yellow human face recognition data set and a white human face recognition data set. Let the final loss function be L_total, the loss function corresponding to the first data set be L_a, the loss function corresponding to the second data set be L_b, and the interval loss function be L_inter; let alpha1 be the loss weight corresponding to the loss function of the first data set, alpha2 the loss weight corresponding to the loss function of the second data set, and alpha3 the loss weight corresponding to the interval loss function L_inter. Then L_total = alpha1 · L_a + alpha2 · L_b + alpha3 · L_inter.
In this embodiment of the present invention, the interval loss function L _ inter is:
L_inter = (1/β) · Σ_{i=1}^{n_i} Σ_{j=1}^{n_j} [norm_feature1_i · norm_feature2_j > α] · (norm_feature1_i · norm_feature2_j)

wherein [·] is an indicator selecting the feature pairs whose dot product exceeds the threshold, norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i and n_j are the numbers of samples in the two batches, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j > α, and a batch is the set of samples required for one training iteration.
In the embodiment of the present invention, the preset threshold may be set manually, for example, to 0.4. The interval loss function is then computed as follows: take one batch of features from each of the two sets of training data, normalize them to obtain normalized feature vectors, and multiply them pairwise; select the dot products that are larger than the preset threshold, add them up, and divide by the number of feature pairs that meet the threshold. For example, suppose the two batches contain 128 features each, i.e. each batch has 128 image feature vectors, and the cosine similarity of 64 pairs of features exceeds the preset threshold. Then the cosine similarities (normalized feature-vector dot products) of these 64 pairs are added and divided by 64, and this value is weighted as part of the loss. The implication is that if the features in each batch drawn from the two sets of training data are too close together, their similarity is counted as part of the loss, and the inter loss decreases only if the data in the two data sets keep a greater distance from each other.
In the embodiment of the invention, it is judged whether the initial model is a pre-trained model. If so, the final loss function is determined to be the weighted sum of the loss function of each data set and its corresponding loss weight; if not, an interval loss function and its corresponding loss weight are also determined, and the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight. In this way the final loss function can be determined more quickly and reliably, so that the initial model is trained based on the final loss function to obtain the face recognition model, thereby improving the speed of face recognition.
Specifically, the face recognition unit is specifically configured to:
training the initial model through the final loss function to obtain a face recognition model: the gradient of the loss function is computed and back-propagated to update the weights in the model, so that feature vectors of the same category eventually converge toward the same direction.
After the face recognition model is obtained, a face image is acquired and input into the face recognition model to obtain a feature vector of the face image; the cosine of the included angle between the feature vector of the face image and the feature vector of each target face stored in a face database is calculated, and the face recognition result is determined according to this cosine value. For example: if the included angle is smaller than or equal to a preset threshold, the corresponding target face is determined to be matched; if the included angle is larger than the preset threshold, no target face in the face database is matched.
In an embodiment of the present invention, a face recognition apparatus is provided and applied to an electronic device, where the apparatus includes: the training set unit is used for acquiring a training set containing images of at least two races and dividing the training set into at least two data sets according to the races; a loss weight unit for determining a loss function and a corresponding loss weight for each data set; the loss function unit is used for determining a final loss function according to the loss function of each data set and the corresponding loss weight; and the face recognition unit is used for training the initial model according to the final loss function to obtain a face recognition model and carrying out face recognition based on the face recognition model. According to the loss function corresponding to the race training, the final loss function is obtained by combining the loss weight combination corresponding to the loss function so as to train the face recognition model.
Referring to fig. 6, fig. 6 is a schematic diagram of a hardware structure of an electronic device according to various embodiments of the present invention;
As shown in fig. 6, the electronic device 100 includes, but is not limited to: a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, a power supply 1011, and the like; the electronic device 100 further includes a camera. Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 6 does not constitute a limitation of the electronic device, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a television, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
A processor 110, configured to obtain a training set including images of at least two races, and divide the training set into at least two data sets according to the races; determining a loss function and a corresponding loss weight for each data set; determining a final loss function according to the loss function of each data set and the corresponding loss weight; and training the initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model.
In the embodiment of the invention, the final loss function is obtained by training the corresponding loss function according to the race and combining the loss weight combination corresponding to the loss function so as to train the face recognition model.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 101 may be used for receiving and sending signals during a message transmission or call process, and specifically, after receiving downlink data from a base station, the downlink data is processed by the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through a wireless communication system.
The electronic device 100 provides wireless broadband internet access to the user via the network module 102, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the network module 102 or stored in the memory 109 into an audio signal and output as sound. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the electronic apparatus 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 includes a speaker, a buzzer, a receiver, and the like.
The input unit 104 is used to receive an audio or video signal. The input Unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, and the graphics processor 1041 processes a target image of a still picture or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the network module 102. The microphone 1042 may receive sound and may be capable of processing such sound into audio data. In the phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101.
The electronic device 100 also includes at least one sensor 105, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or the backlight when the electronic device 100 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 105 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 107 includes a touch panel 1071 and other input devices 1072. Touch panel 1071, also referred to as a touch screen, may collect touch operations by a user on or near the touch panel 1071 (e.g., operations by a user on or near touch panel 1071 using a finger, stylus, or any suitable object or attachment). The touch panel 1071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and receives and executes commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Specifically, other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
Further, the touch panel 1071 may be overlaid on the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although in fig. 6, the touch panel 1071 and the display panel 1061 are two independent components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the electronic device, and is not limited herein.
The interface unit 108 is an interface for connecting an external device to the electronic apparatus 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 100 or may be used to transmit data between the electronic apparatus 100 and the external device.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, wherein the program storage area may store an application program 1091 (such as a sound playing function, an image playing function, etc.) and an operating system 1092, etc. required by at least one function; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, performs various functions of the electronic device and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the electronic device. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The electronic device 100 may further include a power source 1011 (such as a battery) for supplying power to various components, and preferably, the power source 1011 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption via the power management system.
In addition, the electronic device 100 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor 110, a memory 109, and a computer program stored in the memory 109 and capable of running on the processor 110, where the computer program, when executed by the processor 110, implements each process of the above-mentioned embodiment of the face recognition method, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
In the embodiment of the present invention, the electronic device includes but is not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capability and are primarily targeted at providing voice and data communication. Such electronic devices include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones.
(2) A mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile internet access. Such electronic devices include PDA, MID, and UMPC devices, such as iPads.
(3) A portable entertainment device: such devices can display and play video content, and generally also have mobile internet access. This type of device includes video players, handheld game consoles, intelligent toys, and portable car navigation devices.
(4) Other electronic devices with a video playing function and an internet access function.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by one or more processors, the computer program implements each process of the above-mentioned face recognition method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-described embodiments of the apparatus or device are merely illustrative. The unit modules described as separate parts may or may not be physically separate, and the parts displayed as module units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (which may be a mobile terminal, a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments of the present invention.
Finally, it should be noted that: the embodiments described above with reference to the drawings are only for illustrating the technical solutions of the present invention, and the present invention is not limited to the above-mentioned specific embodiments, which are only illustrative and not restrictive; within the idea of the invention, also technical features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A face recognition method, applied to an electronic device, characterized by comprising the following steps:
acquiring a training set containing images of at least two races, and dividing the training set into at least two data sets according to the races;
determining a loss function and a corresponding loss weight for each data set;
determining a final loss function according to the loss function of each data set and the corresponding loss weight;
and training the initial model according to the final loss function to obtain a face recognition model, and performing face recognition based on the face recognition model.
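The steps of claim 1 can be sketched in plain Python. This is an illustrative outline only, not the patented implementation: the race tags, the per-data-set loss values, and the helper names `split_by_race` and `final_loss` are all hypothetical, and the per-data-set losses are assumed to have been computed elsewhere (e.g., by a classification loss over each subset).

```python
def split_by_race(samples):
    # Group (features, label, race) triples into per-race data sets (claim 1).
    datasets = {}
    for feat, label, race in samples:
        datasets.setdefault(race, []).append((feat, label))
    return datasets

def final_loss(per_dataset_losses, loss_weights):
    # Final loss: weighted sum of each data set's loss with its loss weight.
    return sum(loss_weights[k] * per_dataset_losses[k] for k in per_dataset_losses)

# Toy example: two races "A" and "B" with losses already computed per data set.
datasets = split_by_race([
    ([0.1, 0.2], 0, "A"),
    ([0.3, 0.4], 1, "A"),
    ([0.5, 0.6], 0, "B"),
])
loss = final_loss({"A": 2.0, "B": 4.0}, {"A": 0.75, "B": 0.25})
print(loss)  # 0.75*2.0 + 0.25*4.0 = 2.5
```

The training loop would then backpropagate this combined scalar through the shared model, so that every race's subset contributes to the gradient in proportion to its weight.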
2. The method of claim 1, wherein determining the loss function and its corresponding loss weight for each data set comprises:
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
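One simple weighting consistent with claim 2 — a sketch, not necessarily the exact scheme intended — is to set each data set's loss weight equal to its share of the training set, which is trivially positively correlated with the proportion:

```python
def loss_weights_from_proportions(dataset_sizes):
    # Each data set's loss weight equals its proportion of the training set,
    # so larger subsets receive larger loss weights (claim 2).
    total = sum(dataset_sizes.values())
    return {name: size / total for name, size in dataset_sizes.items()}

print(loss_weights_from_proportions({"A": 3000, "B": 1000}))
# {'A': 0.75, 'B': 0.25}
```

Any other monotonically increasing function of the proportion would also satisfy the positive-correlation requirement.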
3. The method of claim 1, wherein determining a final loss function from the loss function and its corresponding loss weight for each data set comprises:
if the initial model is a pre-trained model, the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight.
4. The method of claim 1, wherein determining a final loss function from the loss function and its corresponding loss weight for each data set comprises:
and if the initial model is an untrained model, determining an interval loss function and a loss weight corresponding to the interval loss function, wherein the final loss function is the weighted sum of the loss function of each data set with its corresponding loss weight and the interval loss function with its corresponding loss weight.
5. The method of claim 4, wherein determining the interval loss function and its corresponding loss weight comprises:
and determining an interval loss function and a corresponding loss weight according to the loss function and the corresponding loss weight of each data set.
6. The method of claim 4, wherein the training set is divided into a first data set and a second data set, and wherein the interval loss function is:
Loss_interval = (1/β) · Σ (norm_feature1_i · norm_feature2_j), summed over all pairs (i, j) with norm_feature1_i · norm_feature2_j ≥ α
wherein norm_feature1_i is the normalized i-th feature of the batch from the first data set, norm_feature2_j is the normalized j-th feature of the batch from the second data set, n_i is the number of samples of the batch from the first data set, n_j is the number of samples of the batch from the second data set, α is a preset threshold, β is the number of feature pairs satisfying norm_feature1_i · norm_feature2_j ≥ α, and a batch is the set of samples required for one training iteration.
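The original formula is embedded as an image in the patent and is not fully recoverable here, so the sketch below implements one reading consistent with the stated variable definitions: L2-normalize the features of both batches, take every cross-data-set pair whose inner product (cosine similarity) meets the threshold α, and average those β qualifying similarities. The function name and this exact form are assumptions, not the verbatim claimed formula.

```python
import math

def interval_loss(batch1, batch2, alpha):
    # L2-normalize each feature vector so the dot product is a cosine similarity.
    def norm(v):
        s = math.sqrt(sum(x * x for x in v))
        return [x / s for x in v]
    f1 = [norm(v) for v in batch1]
    f2 = [norm(v) for v in batch2]
    # Collect similarities of all cross-data-set pairs at or above the threshold.
    qualifying = []
    for a in f1:
        for b in f2:
            sim = sum(x * y for x, y in zip(a, b))
            if sim >= alpha:
                qualifying.append(sim)
    beta = len(qualifying)  # number of pairs with similarity >= alpha
    return sum(qualifying) / beta if beta else 0.0

# Orthogonal pair falls below alpha = 0.5; the aligned pair has similarity 1.
print(interval_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]], alpha=0.5))  # 1.0
```

Intuitively, penalizing high cross-race similarities pushes features from different data sets apart, which matches the role of an inter-set margin ("interval") term.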
7. A face recognition device, applied to an electronic device, characterized by comprising:
the training set unit is used for acquiring a training set containing images of at least two races and dividing the training set into at least two data sets according to the races;
a loss weight unit for determining a loss function and a corresponding loss weight for each data set;
the loss function unit is used for determining a final loss function according to the loss function of each data set and the corresponding loss weight;
and the face recognition unit is used for training the initial model according to the final loss function to obtain a face recognition model and carrying out face recognition based on the face recognition model.
8. The apparatus according to claim 7, wherein the loss weight unit is specifically configured to:
and determining a loss weight corresponding to the loss function of each data set according to the proportion of each data set in the training set, wherein the loss weight corresponding to the loss function of each data set is positively correlated with the proportion of the data set.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of face recognition according to any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the face recognition method according to any one of claims 1 to 6.
CN202010996916.2A 2020-09-21 2020-09-21 Face recognition method and device and electronic equipment Pending CN112287966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010996916.2A CN112287966A (en) 2020-09-21 2020-09-21 Face recognition method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN112287966A true CN112287966A (en) 2021-01-29

Family

ID=74421195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010996916.2A Pending CN112287966A (en) 2020-09-21 2020-09-21 Face recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112287966A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105894050A (en) * 2016-06-01 2016-08-24 北京联合大学 Multi-task learning based method for recognizing race and gender through human face image
CN109299701A (en) * 2018-10-15 2019-02-01 南京信息工程大学 Expand the face age estimation method that more ethnic group features cooperate with selection based on GAN
CN109711252A (en) * 2018-11-16 2019-05-03 天津大学 A kind of face identification method of more ethnic groups
CN110032662A (en) * 2019-05-16 2019-07-19 福州大学 Video image distortion effect model building method based on improved dice loss function
CN110188641A (en) * 2019-05-20 2019-08-30 北京迈格威科技有限公司 Image recognition and the training method of neural network model, device and system
KR20190140824A (en) * 2018-05-31 2019-12-20 한국과학기술원 Training method of deep learning models for ordinal classification using triplet-based loss and training apparatus thereof
CN110674756A (en) * 2019-09-25 2020-01-10 普联技术有限公司 Human body attribute recognition model training method, human body attribute recognition method and device
WO2020114118A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Facial attribute identification method and device, storage medium and processor
CN111507263A (en) * 2020-04-17 2020-08-07 电子科技大学 Face multi-attribute recognition method based on multi-source data


Similar Documents

Publication Publication Date Title
CN110096580B (en) FAQ conversation method and device and electronic equipment
CN109558512B (en) Audio-based personalized recommendation method and device and mobile terminal
CN107730261B (en) Resource transfer method and related equipment
CN108427873B (en) Biological feature identification method and mobile terminal
CN111260665A (en) Image segmentation model training method and device
CN109409244B (en) Output method of object placement scheme and mobile terminal
CN112820299B (en) Voiceprint recognition model training method and device and related equipment
CN108668024B (en) Voice processing method and terminal
CN109885162B (en) Vibration method and mobile terminal
CN109343693B (en) Brightness adjusting method and terminal equipment
CN111416904B (en) Data processing method, electronic device and medium
CN111080747B (en) Face image processing method and electronic equipment
CN110636225B (en) Photographing method and electronic equipment
CN111158624A (en) Application sharing method, electronic equipment and computer readable storage medium
CN113192537B (en) Awakening degree recognition model training method and voice awakening degree acquisition method
CN113888447A (en) Image processing method, terminal and storage medium
CN111405361B (en) Video acquisition method, electronic equipment and computer readable storage medium
CN113343117B (en) Training method of confrontation network model, information recommendation method and related equipment
CN109189517B (en) Display switching method and mobile terminal
CN108108608B (en) Control method of mobile terminal and mobile terminal
CN110674294A (en) Similarity determination method and electronic equipment
CN107844203B (en) Input method candidate word recommendation method and mobile terminal
CN112287966A (en) Face recognition method and device and electronic equipment
CN112464831B (en) Video classification method, training method of video classification model and related equipment
CN109242767B (en) Method for obtaining beauty parameters and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination