CN114764942A - Difficult positive and negative sample online mining method and face recognition method - Google Patents

Difficult positive and negative sample online mining method and face recognition method

Info

Publication number
CN114764942A
CN114764942A
Authority
CN
China
Prior art keywords
sample
sampling
pair
feature vector
pairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210555142.9A
Other languages
Chinese (zh)
Other versions
CN114764942B (en)
Inventor
郑文先
陶映帆
杨文明
廖庆敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202210555142.9A priority Critical patent/CN114764942B/en
Publication of CN114764942A publication Critical patent/CN114764942A/en
Application granted granted Critical
Publication of CN114764942B publication Critical patent/CN114764942B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an online hard positive and negative sample mining method, which is used in the training process of a face recognition model and comprises the following steps: S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor; S2, calculating a loss function of the sample face pair according to the feature vector pair; S3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to the second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.

Description

Difficult positive and negative sample online mining method and face recognition method
Technical Field
The invention relates to the intersection of image processing and machine learning, and in particular to a hard positive and negative sample online mining method and a face recognition method that applies the mining method for model training.
Background
It is a common practice in the field of face recognition to improve recognition accuracy by training a model on a large number of face pictures. Different training processes and methods yield models with different recognition performance, and improving the training process and method to raise the face recognition accuracy of the model is one of the research subjects of scholars in the field.
A face recognition model training framework based on the metric learning principle and capable of utilizing face data sets of millions of images generally employs a metric learning algorithm combined with online mining. When a model is trained with a metric learning algorithm, a large number of same-face pairs (positive sample pairs) and different-face pairs (negative sample pairs) must be provided to the model. In the middle and later stages of training, the harder the sample pairs are, the more noticeably the recognition performance of the model improves and the faster the training proceeds. As the number of training iterations increases, the standard of difficulty keeps evolving, so hard sample pairs must be mined online during training, and the difficulty of the mined positive and negative sample pairs is positively correlated with the amount of data participating in each training iteration.
In summary, in the training stage of a face recognition model, hard positive and negative samples are obtained with an online mining method and used for training, and the larger the data batch participating in online mining, the greater the improvement in the accuracy of the face recognition algorithm. However, in the existing training process, feature extraction, loss function calculation, gradient calculation and back propagation are mainly performed by a GPU (Graphics Processing Unit). A GPU card has limited memory capacity and can process only a limited number of images; the data batch on a single GPU card is usually from dozens to hundreds of face images, which greatly limits the difficulty of the mined hard positive and negative samples and thereby limits the improvement of face recognition model training efficiency.
Disclosure of Invention
The invention mainly aims to provide a method for mining hard positive and negative samples on line, which is used for solving the problems in the prior art and improving the efficiency of mining the hard positive and negative samples on line.
To achieve the above purpose, in one aspect the invention provides the following technical solution:
a hard positive and negative sample online mining method is used for a training process of a face recognition model and comprises the following steps: S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor; S2, calculating a loss function of the sample face pair according to the feature vector pair; S3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts model parameters, and shares the adjusted model parameters to the first graphics processor.
In another aspect, the invention provides the following technical solution:
a difficult positive and negative sample online excavating device is used for the training process of a face recognition model and comprises a central processing unit, a first graphic processor and a second graphic processor, wherein the first graphic processor and the second graphic processor are connected with the central processing unit; the first graphics processor is configured to: extracting feature vector pairs from the sample face pairs; the central processor unit is configured to: calculating a loss function of the sample face pair according to the feature vector pair, mining hard positive and negative samples of the sample face pair according to the loss function to obtain a target sample set containing the hard positive and negative sample face pair, and calculating the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receiving the gradient from the central processing unit, performing back propagation through the gradient, adjusting model parameters, and sharing the adjusted model parameters to the first graphics processor.
The invention further provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above hard positive and negative sample online mining method can be implemented.
The invention also provides a face recognition method, which comprises a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method.
The beneficial effects of the invention include the following. By coordinating a first GPU and a second GPU (graphics processing units) through a CPU, image feature computation is performed only on the first GPU; after the feature vector pairs of the sample pairs are obtained, the loss function and the gradient are computed on the CPU, and gradient back propagation is performed on the second GPU, which shares model parameters with the first GPU. The first GPU is therefore not constrained by video memory size and can continuously compute feature vector pairs of sample pairs in a pipelined manner, which improves the efficiency of online hard sample mining and thus the efficiency of model training.
Drawings
FIG. 1 is a schematic diagram of an online hard positive and negative sample mining method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a three-dimensional loss function matrix A × B × C according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
Explanation of terms:
① Positive sample pair: two images of the same class form a positive sample pair, for example a pair of images of the same face.
② Negative sample pair: two images of different classes form a negative sample pair, for example a pair of images of different faces.
③ Hard positive sample: a same-face pair that is recognized by the model as a pair of different faces during recognition.
④ Hard negative sample: a different-face pair that is recognized by the model as a pair of the same face during recognition.
⑤ Metric learning: during model training, the distance between positive sample pairs is made as small as possible and the distance between negative sample pairs as large as possible. Hard positive sample pairs have a larger distance under metric learning and therefore contribute a larger gradient descent; hard negative sample pairs have a smaller distance and likewise contribute a larger gradient descent; simple positive and negative sample pairs contribute little or no gradient descent.
⑥ Online hard positive and negative sample mining: in metric-learning-based model training, after each iteration the batch data input to the next iteration includes hard positive and negative samples, so as to improve the recognition ability of the model and the gradient descent speed of the model.
The embodiment of the invention provides an online hard positive and negative sample mining method which can be applied to a central processing unit (CPU) and is used in the training stage of a face recognition model. In this embodiment, the method may adopt the system architecture shown in FIG. 1, which includes one CPU and at least one group of GPUs, where a group of GPUs contains two GPUs, namely a first GPU and a second GPU. The first GPU and the second GPU are connected with the CPU over a bus. The first GPU is mainly responsible for feature extraction: it extracts features from the images and transmits them to the CPU. The CPU calculates the loss functions and gradients and mines hard samples according to the loss functions, selecting samples with large loss functions as hard samples. The CPU then transmits the calculated gradients to the second GPU, so that the second GPU performs back propagation through the gradients, adjusts the model parameters, and shares the model parameters with the first GPU. The loss function of each sample pair lies between 0 and 1, and the larger the loss function, the harder it is for the model to correctly recognize the sample pair.
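As an illustration of this division of labor, the following is a minimal PyTorch sketch of one training step. All names (model_1, model_2, pair_batch, and so on) are illustrative rather than taken from the patent; for simplicity the sketch recomputes the loss of the mined pairs on the second GPU instead of transmitting a CPU-computed gradient, and a plain top-k selection stands in for the sampling-kernel mining described later.
```python
# Illustrative sketch only; names are not from the patent. Assumes model_1 lives on
# cuda:0, model_2 (same architecture) on cuda:1, and labels is a float tensor of 0/1.
import torch

def train_step(model_1, model_2, optimizer_2, pair_batch, labels, beta=0.4):
    # 1. First GPU: feature extraction only, no autograd graph kept.
    with torch.no_grad():
        feats_a = model_1(pair_batch[:, 0].to("cuda:0")).cpu()
        feats_b = model_1(pair_batch[:, 1].to("cuda:0")).cpu()

    # 2. CPU: per-pair loss and hard-pair selection (top-k stands in for the
    #    sampling-kernel mining; 32 is an arbitrary per-step budget).
    d = torch.norm(feats_a - feats_b, dim=1)
    losses = labels * d + (1 - labels) * torch.clamp(beta - d, min=0)
    hard_idx = torch.topk(losses, k=min(32, len(losses))).indices

    # 3. Second GPU: re-forward only the mined pairs with gradients enabled,
    #    back-propagate, and update the model parameters.
    hard, y = pair_batch[hard_idx].to("cuda:1"), labels[hard_idx].to("cuda:1")
    d_hard = torch.norm(model_2(hard[:, 0]) - model_2(hard[:, 1]), dim=1)
    loss = (y * d_hard + (1 - y) * torch.clamp(beta - d_hard, min=0)).mean()
    optimizer_2.zero_grad()
    loss.backward()
    optimizer_2.step()

    # 4. Share the adjusted parameters back to the first GPU.
    model_1.load_state_dict(model_2.state_dict())
```
In this arrangement the first GPU never stores an autograd graph, so its memory usage depends only on the batch of images currently being encoded.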
Specifically, the sample face pairs are input to the first GPU for feature extraction, and each face image yields a corresponding feature vector, so that one sample face pair corresponds to two feature vectors, referred to herein as a feature vector pair. The CPU receives the feature vector pairs and places them in a feature vector pool while computing the loss function of each feature vector pair. In a specific embodiment, the loss function is computed as follows:
When the sample face pair is a binary group (i.e. contains two face images), the loss function is:
L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+
wherein y is the label of the sample face pair: when y is 0 the sample face pair is a negative sample pair, and when y is 1 it is a positive sample pair; d_{a,b} is the metric distance of the binary group formed by face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough under a positive sample pair or large enough under a negative sample pair, and takes a value between 0.35 and 0.4; (β - d_{a,b})_+ denotes the function max(0, β - d_{a,b}): when d_{a,b} is greater than or equal to β, max(0, β - d_{a,b}) takes the value 0; when d_{a,b} is less than β, max(0, β - d_{a,b}) takes the value β - d_{a,b}.
It can be seen that when the sample face pair is a positive sample pair (y = 1), the loss function reduces to
L1 = d_{a,b}
The larger the loss function in this case, the larger the metric distance of the binary group and the harder it is to recognize the pair as a correct positive sample pair, so it can be regarded as a hard positive sample pair. When the sample face pair is a negative sample pair (y = 0), the loss function reduces to
L1 = (β - d_{a,b})_+
For the preset value of β, the larger the loss function, the smaller the metric distance d_{a,b} of the binary group and the harder it is for the pair to be correctly identified as a negative sample pair, so it can be regarded as a hard negative sample pair.
When the sample face pairs are triplets, the loss function can be expressed as:
L2 = d_{a,b} + (d_{a,b} - d_{a,c} + β)_+
wherein the triplet comprises face images a, b and c, in which a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} - d_{a,c} + β)_+ denotes the function max(0, d_{a,b} - d_{a,c} + β): when (d_{a,b} - d_{a,c} + β) is greater than 0 it takes the value (d_{a,b} - d_{a,c} + β), otherwise it takes the value 0.
It can be seen that when d_{a,b} is small and d_{a,c} is large, the hinge term is 0 and the loss function is d_{a,b}. In this case, when the loss is small, the metric distance of the positive pair a, b is small and the metric distance of the negative pair a, c is large, so the triplet is a simple sample pair; when the loss is larger, the metric distance of the positive pair a, b has become larger, so it is a hard sample pair. Conversely, when d_{a,b} is large and d_{a,c} is small, the loss function is d_{a,b} + (d_{a,b} - d_{a,c} + β)_+; in this case, when the loss is large, the metric distance of the positive pair a, b is large and the metric distance of the negative pair a, c is small, so the triplet is a hard sample pair.
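As a concrete illustration, the following minimal Python sketch evaluates the two loss functions above, assuming the two-image pair loss takes the reconstructed form L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+; the function names are illustrative only.
```python
# Illustrative only; assumes the losses take the forms given above.
def pair_loss(d_ab: float, y: int, beta: float = 0.4) -> float:
    """L1 for a two-image sample pair: y*d_ab + (1-y)*max(0, beta - d_ab)."""
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)

def triplet_loss(d_ab: float, d_ac: float, beta: float = 0.4) -> float:
    """L2 for a triplet (a,b positive; a,c negative): d_ab + max(0, d_ab - d_ac + beta)."""
    return d_ab + max(0.0, d_ab - d_ac + beta)

print(pair_loss(0.9, y=1))      # 0.9 -> hard positive pair (large distance)
print(pair_loss(0.1, y=0))      # 0.3 -> hard negative pair (distance well below beta)
print(triplet_loss(0.6, 0.2))   # 1.4 -> hard triplet (positive far, negative close)
```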
After the loss functions are obtained by calculation, the CPU performs hard positive and negative sample mining according to the loss functions to obtain a target sample set containing hard positive and negative sample face pairs. Specifically, when the number of feature vector pairs in the feature vector pool reaches a preset number (for example, 128 × 128 = 16384 feature vector pairs), the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled with a preset sampling strategy to extract hard positive and negative sample face pairs, thereby obtaining the target sample set. Then, the gradient of each hard positive and negative sample face pair in the target sample set is calculated and transmitted to the second GPU, so that the second GPU performs back propagation through the gradients and adjusts the model parameters; the adjusted model parameters are shared to the first GPU, and the first GPU extracts face features according to the adjusted model parameters.
In some embodiments, the feature vector pool stores only feature vector pairs that meet a condition (for example, whose loss function is greater than a predetermined value), and when the number of feature vector pairs that meet the condition reaches a certain number (which may be predetermined according to the actual situation), for example 128 × 128 = 16384 feature vector pairs, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled with the preset sampling strategy.
In other embodiments, the loss functions may also be stored in the feature vector pool, and when the number of the loss functions stored in the feature vector pool (actually equivalent to the number of feature vector pairs) reaches a preset number, the loss functions in the feature vector pool are sampled by a preset sampling strategy, and a sample face pair corresponding to the extracted loss function is used as a hard-positive sample face pair, so as to form the target sample set.
Before sampling begins, the loss functions may be sorted to obtain a loss function matrix, for example, a 128 × 128 loss function matrix is obtained, and then sampling is performed in the loss function matrix through a preset sampling kernel. When sampling is started, a target sample set is initialized, the number of sample face pairs in the initialized target sample set is 0, and hard positive and negative samples obtained by sampling are added into the target sample set every time sampling is completed. It should be understood that if the feature vector pairs are stored in the feature vector pool, the loss functions corresponding to the stored feature vector pairs are arranged into a loss function matrix; if the loss functions are stored in the feature vector pool, the stored loss functions are arranged into a loss function matrix.
In some embodiments, when the loss functions are arranged, they may first be preprocessed. The preprocessing may be to calculate the average value and the standard deviation of the loss functions, and to control the distribution of the loss functions according to the average value and the standard deviation, so that the distribution of the loss functions conforms to a Gaussian distribution and covers all loss functions higher than a first threshold. The first threshold is a preset threshold; in this embodiment it takes the value 0.8, though it should be understood that the user may take other values between 0 and 1 according to actual requirements, for example values around 0.8, and the invention is not limited thereto. The loss functions conforming to the Gaussian distribution are used to construct the loss function matrix, so that most of the lower loss functions are removed while the balance between the loss functions higher than 0.8 and those lower than 0.2 is kept, further ensuring the balance of the samples.
The loss function matrix is then sampled, and the sample face pairs corresponding to the sampled loss functions are added to the target sample set as hard positive and negative sample face pairs. In the early stage of model training there exist hard sample pairs that contribute greatly to the gradient, with loss functions that are large and close to 1; as the number of iterations increases, the recognition ability of the model also increases, and the gradient contribution of a hard sample decreases compared with the early stage of training. Therefore, a first sampling method can be used in the early stage of training and a second sampling method in the later stage. When training starts, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled by the first sampling method to obtain the target sample set containing the hard positive and negative sample face pairs in the current iteration. At each iteration during training it is judged whether to switch to the second sampling method, as follows: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switch to the second sampling method, where e% is a preset percentage threshold and e can take a value between 0 and 60 (preferably 50) depending on experience or practice; or, when the largest value among the loss functions corresponding to the feature vector pairs in the feature vector pool (or the largest value in the loss function matrix) is smaller than f, switch to the second sampling method, where 0 < f < 0.5.
The first sampling method can be summarized as follows: selecting a sampling area in the loss function matrix by using a preset sampling kernel, and calculating the sum of loss functions in the sampling area; and selecting sample face pairs corresponding to the Top N loss functions with the largest values from the loss functions in the sampling area according to the integer number N of the sum of the loss functions, and adding the sample face pairs into the target sample set.
For example, in the first sampling method, the sampling kernel may be 3 × 3 with a sampling step size of 3. The 3 × 3 sampling kernel covers a 3 × 3 sampling region in the loss function matrix, and the sum of the loss functions in that region is calculated by the kernel; since each loss lies between 0 and 1, this sum is a value between 0 and 9. The sampling number N in the 3 × 3 sampling region is determined as the integer part of the loss sum, and the sample face pairs corresponding to the Top N largest loss functions in the region are selected and added to the target sample set. For example, when the sum of the loss functions is 8.3, the sample face pairs corresponding to the Top 8 largest loss functions in the region are selected and added to the target sample set.
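The following is a minimal NumPy sketch of the first sampling method under the stated assumptions (a 3 × 3 kernel with stride 3 over a loss matrix whose entries lie in [0, 1]); all names are illustrative, not the patent's implementation.
```python
# Illustrative sketch of sampling method one.
import numpy as np

def sample_method_one(loss_matrix: np.ndarray, kernel: int = 3) -> list:
    """Return (row, col) positions of the mined hard sample pairs."""
    selected = []
    rows, cols = loss_matrix.shape
    for r0 in range(0, rows - kernel + 1, kernel):
        for c0 in range(0, cols - kernel + 1, kernel):
            region = loss_matrix[r0:r0 + kernel, c0:c0 + kernel]
            n = min(int(region.sum()), region.size)        # integer part of the loss sum
            if n <= 0:
                continue
            top = np.argsort(region, axis=None)[::-1][:n]  # Top-N losses in the region
            for f in top:
                rr, cc = divmod(int(f), kernel)
                selected.append((r0 + rr, c0 + cc))
    return selected

hard_positions = sample_method_one(np.random.rand(128, 128))
```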
The second sampling method can be summarized as follows: selecting a sampling area in the loss function matrix by using a preset sampling kernel, and calculating the weighted sum of the loss functions in the sampling area; and selecting sample face pairs corresponding to the Top M loss functions with the largest values from the loss functions in the sampling area according to the integer number M of the weighted sum, and adding the sample face pairs into the target sample set.
For example, in the second sampling method, the sampling kernel may also be 3 × 3 with a sampling step size of 3. The 3 × 3 sampling kernel covers a 3 × 3 sampling region in the loss function matrix, and the weighted sum of the loss functions in that region is calculated by the kernel. According to M, the integer part of the weighted sum, the sample face pairs corresponding to the Top M largest loss functions in the sampling region are selected and added to the target sample set. The weighted sum of the loss functions is related to the current number of iterations; the weighted sum w of the loss functions within the sampling region can be calculated by the following formula:
Figure BDA0003654640020000071
wherein j is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
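The same region scan can be reused for the second sampling method, with the per-region budget taken from the weighted sum instead of the plain sum. Since the exact weighted-sum formula is given by the expression referenced above, the sketch below leaves it as a caller-supplied function region_weight; everything else is illustrative.
```python
# Illustrative sketch of sampling method two; `region_weight` stands in for the
# patent's weighted-sum formula and must be supplied by the caller.
import numpy as np

def sample_method_two(loss_matrix, region_weight, kernel=3):
    selected = []
    rows, cols = loss_matrix.shape
    for r0 in range(0, rows - kernel + 1, kernel):
        for c0 in range(0, cols - kernel + 1, kernel):
            region = loss_matrix[r0:r0 + kernel, c0:c0 + kernel]
            m = min(int(region_weight(region)), region.size)  # integer part of w
            if m <= 0:
                continue
            top = np.argsort(region, axis=None)[::-1][:m]     # Top-M losses
            for f in top:
                rr, cc = divmod(int(f), kernel)
                selected.append((r0 + rr, c0 + cc))
    return selected
```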
By adopting different sampling methods at different training stages and sampling the corresponding sample face pairs from the loss function matrix into the target sample set, hard sample pairs at different stages can be mined more accurately. Further, the dynamic weighting used in the second sampling method allows hard samples in the later training period to be mined more accurately. Compared with the prior-art scheme that directly takes the sample face pairs corresponding to the Top N loss functions as hard samples and cannot guarantee the gradient descent direction, the dynamic selection of hard samples by the first and second sampling methods makes the number of hard samples more balanced, guarantees the gradient descent direction, and improves training efficiency. The gradient direction affects the training effect of the model: when the samples are unbalanced, for example when the model is trained using only hard positive and negative samples, the resulting model mainly learns to recognize the hard faces, so it is sensitive to faces that are difficult to recognize (high recognition accuracy) but insensitive to some simple face pairs (poor recognition accuracy).
In both the first and second sampling methods, the sample face pairs corresponding to the Top N and Top M loss functions are added to the target sample set, so that after the target sample set is obtained it contains the hard sample face pairs with large loss functions. During training, the gradients corresponding to all sample face pairs in the target sample set are sent to the second GPU for back propagation, and the model parameters in the second GPU are adjusted. The first GPU and the second GPU are simply hardware devices equipped with the recognition model, and may also be referred to as training engines. It should be noted that, when the CPU has sufficient computing resources, multiple metric learning models may be trained simultaneously with multiple groups of GPUs, each metric learning model being provided with one group of GPUs (a first GPU and a second GPU). FIG. 1 is a schematic flow chart of the simultaneous training of multiple metric learning models.
In other examples, the loss function matrix may be further sliced to obtain slice matrices of a preset size, and the slice matrices are then stacked to obtain a three-dimensional loss function matrix A × B × C as shown in FIG. 2, where A × B is the size of each slice matrix (one A × B slice matrix contains A × B loss functions) and C is the number of slice matrices. For example, the 128 × 128 loss function matrix may be sliced and stacked into a 64 × 64 × 4 three-dimensional loss function matrix, that is, the 128 × 128 matrix is cut into four 64 × 64 slice matrices which are then stacked; of course, the 128 × 128 matrix may also be sliced into slices of other sizes and corresponding numbers. Sampling with the first and second sampling methods is then performed on the three-dimensional loss function matrix A × B × C using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set. For example, sampling combining the first and second sampling methods is performed on the three-dimensional loss function matrix A × B × C by a 3 × 3 × C three-dimensional sampling kernel; since the size of the three-dimensional loss function matrix is reduced, the area the kernel must cover is reduced, the number of sliding steps of the kernel is reduced, and the sampling speed is increased.
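The slicing and stacking step can be sketched as follows, assuming the 128 × 128 matrix and 64 × 64 slices mentioned above; the names are illustrative.
```python
# Illustrative sketch: cut a 128x128 loss matrix into four 64x64 slices and stack
# them into a 64x64x4 tensor that a 3x3xC kernel can scan with fewer sliding steps.
import numpy as np

def slice_and_stack(loss_matrix: np.ndarray, slice_size: int = 64) -> np.ndarray:
    rows, cols = loss_matrix.shape
    slices = [loss_matrix[r:r + slice_size, c:c + slice_size]
              for r in range(0, rows, slice_size)
              for c in range(0, cols, slice_size)]
    return np.stack(slices, axis=-1)   # shape (A, B, C) = (64, 64, 4)

print(slice_and_stack(np.random.rand(128, 128)).shape)  # (64, 64, 4)
```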
In addition, in the second sampling method, random masking may be performed on the sampling kernel. Specifically, the largest loss function in the sampling region of the current iteration is first taken as the mask value, and then a position in the sampling kernel is randomly selected and masked with this value, so that the loss function at the masked position is output as the mask value during sampling. Through random masking, smaller loss functions have a chance of being sampled, and the distribution of the loss functions sampled by the kernel becomes more balanced.
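A minimal sketch of the random masking step follows; the choice of random generator is an assumption and the names are illustrative.
```python
# Illustrative sketch: overwrite one random position of the sampling region with the
# region's largest loss, so that position is reported as the maximum during sampling.
import numpy as np

def mask_region(region: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    masked = region.copy()
    masked.flat[rng.integers(masked.size)] = region.max()  # mask value = largest loss
    return masked

rng = np.random.default_rng(0)
print(mask_region(np.array([[0.1, 0.9, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]), rng))
```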
In some embodiments, when the second GPU shares the model parameters with the first GPU, the data transfer of parameter sharing can be reduced as follows: only the difference (DIFF) between the model parameters adjusted by the second GPU in the current iteration and the current model parameters of the first GPU is calculated, and only this difference is transmitted to the first GPU and added to its current model parameters, so that the first GPU obtains the same model parameters as the second GPU; that is, the model parameters adjusted by the second GPU are shared to the first GPU. For example, if the current model parameters of the first GPU are (1, 2, 3, 4, 5, 6, 7, 8, 9) and the model parameters of the second GPU after back propagation are (1, 3, 3, 4, 5, 6, 7, 8, 9), the difference is (1-1, 3-2, 3-3, 4-4, 5-5, 6-6, 7-7, 8-8, 9-9) = (0, 1, 0, 0, 0, 0, 0, 0, 0); only this difference is transmitted to the first GPU and added to its current model parameters to obtain the same model parameters as the second GPU, thereby implementing sharing. Transmitting only the difference reduces the data transfer of parameter sharing.
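A minimal PyTorch sketch of this difference-only sharing is given below, assuming the two models have identical architectures; the names are illustrative.
```python
# Illustrative sketch: only the parameter difference crosses the bus.
import torch

def share_parameters(model_1: torch.nn.Module, model_2: torch.nn.Module) -> None:
    """Add (second-GPU params - first-GPU params) onto the first GPU's parameters."""
    with torch.no_grad():
        for p1, p2 in zip(model_1.parameters(), model_2.parameters()):
            diff = p2.detach().cpu() - p1.detach().cpu()   # only this is transmitted
            p1.add_(diff.to(p1.device))
```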
In another embodiment of the present invention, an online hard positive and negative sample mining device is provided, which is used for a training process of a face recognition model and includes a central processing unit (CPU), and a first graphics processing unit (first GPU) and a second graphics processing unit (second GPU) connected to the CPU; the first graphics processor is configured to: extract feature vector pairs from the sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pair according to the feature vector pair, mine hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradient from the central processing unit, perform back propagation through the gradient, adjust the model parameters, and share the adjusted model parameters to the first graphics processor.
It should be understood that, in the above apparatus, the first GPU, the second GPU and the CPU may be configured according to the corresponding steps in the hard positive and negative sample online mining method of the foregoing embodiment. For example, the CPU is configured to perform hard positive and negative sample mining, loss function calculation, and gradient calculation, and the specific mining steps and calculation steps can be configured according to the corresponding steps in the hard positive and negative sample online mining method of the foregoing embodiment. The detailed description is omitted, and it should be understood by those skilled in the art that the apparatus is an apparatus corresponding to the hard positive and negative sample online mining method of the foregoing embodiment.
Furthermore, an embodiment of the present invention may further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the hard positive and negative sample online mining method of the foregoing embodiment can be implemented. A computer readable storage medium may include, among other things, a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied in a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Other embodiments of the present invention further provide a face recognition method, which includes a training process of a face recognition model, where the training process includes the steps of the above hard positive and negative sample online mining method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments and it is not intended to limit the invention to the specific embodiments described. It will be apparent to those skilled in the art that various equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and all changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (14)

1. A hard positive and negative sample online mining method is used for a training process of a face recognition model, and is characterized by comprising the following steps:
S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor;
S2, calculating a loss function of the sample face pair according to the feature vector pair;
s3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs;
and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts model parameters, and shares the adjusted model parameters to the first graphics processor.
2. The online hard positive and negative sample mining method according to claim 1, wherein in step S2, when the sample face pair is a binary group, the loss function is:
L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+
wherein y is the label of the sample face pair: when y is 0 the sample face pair is a negative sample pair, and when y is 1 it is a positive sample pair; d_{a,b} is the metric distance of the binary group formed by face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough under a positive sample pair or large enough under a negative sample pair; (β - d_{a,b})_+ denotes the function max(0, β - d_{a,b}): when d_{a,b} is greater than or equal to β, max(0, β - d_{a,b}) takes the value 0; when d_{a,b} is less than β, max(0, β - d_{a,b}) takes the value β - d_{a,b};
When the sample face pairs are triplets, the loss function is:
L2 = d_{a,b} + (d_{a,b} - d_{a,c} + β)_+
wherein the triplet comprises face images a, b and c, in which a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} - d_{a,c} + β)_+ denotes the function max(0, d_{a,b} - d_{a,c} + β): when (d_{a,b} - d_{a,c} + β) is greater than 0 it takes the value (d_{a,b} - d_{a,c} + β), otherwise it takes the value 0.
3. The online hard positive-negative sample mining method according to claim 1, wherein the step S3 specifically comprises: adding the feature vector pairs of the sample face pairs into a feature vector pool; and when the number of the feature vector pairs in the feature vector pool reaches a preset number, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by using a preset sampling strategy to obtain the target sample set.
4. The online hard positive-negative sample mining method according to claim 3, wherein in the step S3, when the sampling is performed, the loss functions corresponding to the feature vector pairs in the feature vector pool are arranged into a loss function matrix, and then a preset sampling kernel is used to perform sampling in the loss function matrix, so as to obtain the target sample set.
5. The method for mining hard positive and negative samples on line according to claim 4, wherein the step of arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix comprises the following steps:
calculating the average value and the standard deviation of the loss function corresponding to the feature vector pairs in the feature vector pool;
controlling the distribution of the loss functions corresponding to the feature vector pairs in the feature vector pool according to the average value and the standard deviation, so that the loss functions conform to a Gaussian distribution and cover all loss functions higher than a first threshold;
a loss function conforming to a gaussian distribution is used to construct a matrix of loss functions.
6. The online hard positive-negative sample mining method of claim 4, wherein in step S3, the step of sampling the sample face pair corresponding to the feature vector pair in the feature vector pool by a preset sampling strategy comprises:
when training starts, sampling sample face pairs corresponding to the feature vector pairs in the feature vector pool by a preset first sampling method to obtain a target sample set containing hard positive and negative sample face pairs in current iteration;
and judging whether the sampling method needs to be switched or not during each iteration in the training process, wherein the judging method comprises the following steps:
when the number of the face pairs of the samples in the target sample set obtained by the first sampling method is less than e% of the number of the feature vector pairs in the feature vector pool, switching to a second sampling method for sampling; wherein e% is a preset percentage threshold;
or, when the largest value among the loss functions corresponding to the feature vector pairs in the feature vector pool, or the largest value in the loss function matrix, is smaller than f, switching to the second sampling method for sampling; wherein f is greater than 0 and less than 0.5.
7. The online hard positive and negative sample mining method according to claim 6, wherein the first sampling method comprises the following steps of A1-A2:
a1, selecting a sampling area in the loss function matrix by using the preset sampling kernel, and calculating the sum of the loss functions in the sampling area;
a2, selecting sample face pairs corresponding to Top N loss functions with the largest value from the loss functions of the sampling area according to the integer number N of the sum of the loss functions, and adding the sample face pairs into the target sample set;
the second sampling method comprises the following steps of B1-B2:
b1, selecting a sampling area in the loss function matrix by using the preset sampling kernel, and calculating the weighted sum of the loss functions in the sampling area;
and B2, selecting sample face pairs corresponding to the Top M loss functions with the largest values from the loss functions of the sampling area according to the integer number M of the weighted sum, and adding the sample face pairs into the target sample set.
8. The method for mining hard positive-negative samples on line as claimed in claim 7, wherein the weighted sum w of the loss functions in the sample area is determined in B1 by the following formula:
Figure FDA0003654640010000031
wherein j is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
9. The hard positive and negative sample online mining method according to claim 7, further comprising:
slicing the loss function matrix to obtain a slice matrix with a preset size;
superposing the slice matrices to obtain a three-dimensional loss function matrix A × B × C, wherein A × B is the size of each slice matrix, one A × B slice matrix containing A × B loss functions, and C is the number of slice matrices;
and performing sampling on the three-dimensional loss function matrix A × B × C with the first sampling method and the second sampling method, using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set.
10. The online hard positive and negative sample mining method according to claim 7, wherein in the second sampling method, the preset sampling kernel is randomly masked, and the random masking comprises:
first obtaining the largest loss function in the sampling region of the current iteration as a mask value, and then randomly selecting a position in the sampling kernel and masking it with the mask value, so that the loss function at the masked position is output as the mask value during sampling.
11. The hard positive-negative sample online mining method according to claim 1, wherein the step S4 of sharing the adjusted model parameters to the first graphics processor comprises:
calculating the difference value between the model parameter adjusted by the second graphic processor in the current iteration and the model parameter of the first graphic processor;
and transmitting the difference value to the first graphics processor to be added with the model parameter of the first graphics processor, namely sharing the model parameter adjusted by the second graphics processor to the first graphics processor.
12. A hard positive and negative sample online mining device, used for a training process of a face recognition model, characterized by comprising a central processing unit, and a first graphics processor and a second graphics processor connected with the central processing unit; the first graphics processor is configured to: extract feature vector pairs from the sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pair according to the feature vector pair, mine hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradient from the central processing unit, perform back propagation through the gradient, adjust the model parameters, and share the adjusted model parameters to the first graphics processor.
13. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, is capable of implementing the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 11.
14. A face recognition method comprising a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method of any one of claims 1 to 11.
CN202210555142.9A 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method Active CN114764942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Publications (2)

Publication Number Publication Date
CN114764942A true CN114764942A (en) 2022-07-19
CN114764942B CN114764942B (en) 2022-12-09

Family

ID=82364980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555142.9A Active CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Country Status (1)

Country Link
CN (1) CN114764942B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321777A1 (en) * 2014-06-20 2016-11-03 Tencent Technology (Shenzhen) Company Limited Data parallel processing method and apparatus based on multiple graphic processing units
US20170024849A1 (en) * 2015-07-23 2017-01-26 Sony Corporation Learning convolution neural networks on heterogeneous cpu-gpu platform
CN107330355A (en) * 2017-05-11 2017-11-07 中山大学 A kind of depth pedestrian based on positive sample Constraints of Equilibrium identification method again
US20190188560A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Multi-gpu deep learning using cpus
CN108647577A (en) * 2018-04-10 2018-10-12 华中科技大学 A kind of pedestrian's weight identification model that adaptive difficult example is excavated, method and system
CN110163265A (en) * 2019-04-30 2019-08-23 腾讯科技(深圳)有限公司 Data processing method, device and computer equipment
US20210318878A1 (en) * 2019-10-12 2021-10-14 Baidu Usa Llc Method and system for accelerating ai training with advanced interconnect technologies
CN111667050A (en) * 2020-04-21 2020-09-15 佳都新太科技股份有限公司 Metric learning method, device, equipment and storage medium
CN113569657A (en) * 2021-07-05 2021-10-29 浙江大华技术股份有限公司 Pedestrian re-identification method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558057A (en) * 2024-01-12 2024-02-13 清华大学深圳国际研究生院 Face recognition method
CN117558057B (en) * 2024-01-12 2024-04-16 清华大学深圳国际研究生院 Face recognition method

Also Published As

Publication number Publication date
CN114764942B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN109902546A (en) Face identification method, device and computer-readable medium
CN110969250A (en) Neural network training method and device
CN107292352B (en) Image classification method and device based on convolutional neural network
CN109299716A (en) Training method, image partition method, device, equipment and the medium of neural network
CN108764466A (en) Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN111401516A (en) Neural network channel parameter searching method and related equipment
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN113111979B (en) Model training method, image detection method and detection device
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN109784372A (en) A kind of objective classification method based on convolutional neural networks
CN108764176A (en) A kind of action sequence recognition methods, system and equipment and storage medium
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN111931867B (en) New coronary pneumonia X-ray image classification method and system based on lightweight model
CN114764942B (en) Difficult positive and negative sample online mining method and face recognition method
CN111914908A (en) Image recognition model training method, image recognition method and related equipment
CN110175506A (en) Pedestrian based on parallel dimensionality reduction convolutional neural networks recognition methods and device again
CN109886317B (en) General image aesthetic evaluation method, system and equipment based on attention mechanism
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
EP3771999A1 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN110796716A (en) Image coloring method based on multiple residual error networks and regularized transfer learning
CN116205883A (en) PCB surface defect detection method, system, electronic equipment and medium
CN111860054A (en) Convolutional network training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant