CN114764942A - Difficult positive and negative sample online mining method and face recognition method - Google Patents

Difficult positive and negative sample online mining method and face recognition method

Info

Publication number
CN114764942A
CN114764942A
Authority
CN
China
Prior art keywords
sample
sampling
pair
feature vector
pairs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210555142.9A
Other languages
Chinese (zh)
Other versions
CN114764942B (en)
Inventor
郑文先
陶映帆
杨文明
廖庆敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University filed Critical Shenzhen International Graduate School of Tsinghua University
Priority to CN202210555142.9A priority Critical patent/CN114764942B/en
Publication of CN114764942A publication Critical patent/CN114764942A/en
Application granted granted Critical
Publication of CN114764942B publication Critical patent/CN114764942B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an online hard positive and negative sample mining method, which is used in the training process of a face recognition model and comprises the following steps: S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor; S2, calculating a loss function of the sample face pair according to the feature vector pair; S3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to the second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts the model parameters, and shares the adjusted model parameters with the first graphics processor.

Description

Difficult positive and negative sample online mining method and face recognition method
Technical Field
The invention relates to the intersection of image processing and machine learning, and in particular to a hard positive and negative sample online mining method and a face recognition method that applies the mining method for model training.
Background
It is a common practice in the field of face recognition to improve recognition accuracy by training a model on a large number of face pictures. Different training processes and methods yield models with different recognition performance, and improving the training process and method to raise the face recognition accuracy of the model is one of the research subjects of scholars in the field.
A face recognition model training framework based on the metric learning principle and capable of utilizing face data sets of millions of images generally employs a metric learning algorithm combined with online mining. When a model is trained with a metric learning algorithm, a large number of same-face pairs (positive sample pairs) and different-face pairs (negative sample pairs) must be provided to the model. In the middle and later stages of training, the harder the sample pairs are, the more noticeably the recognition performance of the model improves and the faster the training proceeds. As the number of training iterations increases, the standard of difficulty keeps evolving, so hard sample pairs must be mined online during training, and the difficulty of the mined positive and negative sample pairs is positively correlated with the amount of data participating in each training iteration.
In summary, in the training stage of a face recognition model, hard positive and negative samples are obtained with an online mining method and used for training, and the larger the data batch participating in online mining, the greater the improvement in the accuracy of the face recognition algorithm. However, in the existing training process, feature extraction, loss function calculation, gradient calculation and back propagation are mainly performed by a GPU (Graphics Processing Unit). A GPU card has limited memory capacity and can process only a limited number of images; the data batch on a single GPU card is usually from dozens to hundreds of face images, which greatly limits the difficulty of the mined hard positive and negative samples and thereby limits the improvement of face recognition model training efficiency.
Disclosure of Invention
The invention mainly aims to provide a method for mining hard positive and negative samples on line, which is used for solving the problems in the prior art and improving the efficiency of mining the hard positive and negative samples on line.
To achieve the above purpose, in one aspect the invention provides the following technical solution:
a hard positive and negative sample online mining method is used for a training process of a face recognition model and comprises the following steps: S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor; S2, calculating a loss function of the sample face pair according to the feature vector pair; S3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs; and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts model parameters, and shares the adjusted model parameters to the first graphics processor.
In another aspect, the invention provides the following technical solution:
a difficult positive and negative sample online excavating device is used for the training process of a face recognition model and comprises a central processing unit, a first graphic processor and a second graphic processor, wherein the first graphic processor and the second graphic processor are connected with the central processing unit; the first graphics processor is configured to: extracting feature vector pairs from the sample face pairs; the central processor unit is configured to: calculating a loss function of the sample face pair according to the feature vector pair, mining hard positive and negative samples of the sample face pair according to the loss function to obtain a target sample set containing the hard positive and negative sample face pair, and calculating the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receiving the gradient from the central processing unit, performing back propagation through the gradient, adjusting model parameters, and sharing the adjusted model parameters to the first graphics processor.
The invention further provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the above hard positive and negative sample online mining method can be implemented.
The invention also provides a face recognition method, which comprises a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method.
The beneficial effects of the invention include the following. By coordinating a first GPU and a second GPU (graphics processing units) through a CPU, image feature computation is performed only on the first GPU; after the feature vector pairs of the sample pairs are obtained, the loss function and the gradient are computed on the CPU, and gradient back propagation is performed on the second GPU, which shares model parameters with the first GPU. The first GPU is therefore not constrained by video memory size and can continuously compute feature vector pairs of sample pairs in a pipelined manner, which improves the efficiency of online hard sample mining and thus the efficiency of model training.
Drawings
FIG. 1 is a schematic diagram of an online hard positive and negative sample mining method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a three-dimensional loss function matrix A × B × C according to an embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description of embodiments.
Explanation of terms:
① Positive sample pair: two images of the same class form a positive sample pair, for example a pair of images of the same face.
② Negative sample pair: two images of different classes form a negative sample pair, for example a pair of images of different faces.
③ Hard positive sample: a same-face pair that is recognized by the model as a pair of different faces during recognition.
④ Hard negative sample: a different-face pair that is recognized by the model as a pair of the same face during recognition.
⑤ Metric learning: during model training, the distance between positive sample pairs is made as small as possible and the distance between negative sample pairs as large as possible. Hard positive sample pairs have a larger distance under metric learning and therefore contribute a larger gradient descent; hard negative sample pairs have a smaller distance and likewise contribute a larger gradient descent; simple positive and negative sample pairs contribute little or no gradient descent.
⑥ Online hard positive and negative sample mining: in metric-learning-based model training, after each iteration the batch data input to the next iteration includes hard positive and negative samples, so as to improve the recognition ability of the model and the gradient descent speed of the model.
The embodiment of the invention provides an online hard positive and negative sample mining method which can be applied to a central processing unit (CPU) and is used in the training stage of a face recognition model. In this embodiment, the method may adopt the system architecture shown in FIG. 1, which includes one CPU and at least one group of GPUs, where a group of GPUs contains two GPUs, namely a first GPU and a second GPU. The first GPU and the second GPU are connected with the CPU over a bus. The first GPU is mainly responsible for feature extraction: it extracts features from the images and transmits them to the CPU. The CPU calculates the loss functions and gradients and mines hard samples according to the loss functions, selecting samples with large loss functions as hard samples. The CPU then transmits the calculated gradients to the second GPU, so that the second GPU performs back propagation through the gradients, adjusts the model parameters, and shares the model parameters with the first GPU. The loss function of each sample pair lies between 0 and 1, and the larger the loss function, the harder it is for the model to correctly recognize the sample pair.
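As an illustration of this division of labor, the following is a minimal PyTorch sketch of one training step. All names (model_1, model_2, pair_batch, and so on) are illustrative rather than taken from the patent; for simplicity the sketch recomputes the loss of the mined pairs on the second GPU instead of transmitting a CPU-computed gradient, and a plain top-k selection stands in for the sampling-kernel mining described later.
```python
# Illustrative sketch only; names are not from the patent. Assumes model_1 lives on
# cuda:0, model_2 (same architecture) on cuda:1, and labels is a float tensor of 0/1.
import torch

def train_step(model_1, model_2, optimizer_2, pair_batch, labels, beta=0.4):
    # 1. First GPU: feature extraction only, no autograd graph kept.
    with torch.no_grad():
        feats_a = model_1(pair_batch[:, 0].to("cuda:0")).cpu()
        feats_b = model_1(pair_batch[:, 1].to("cuda:0")).cpu()

    # 2. CPU: per-pair loss and hard-pair selection (top-k stands in for the
    #    sampling-kernel mining; 32 is an arbitrary per-step budget).
    d = torch.norm(feats_a - feats_b, dim=1)
    losses = labels * d + (1 - labels) * torch.clamp(beta - d, min=0)
    hard_idx = torch.topk(losses, k=min(32, len(losses))).indices

    # 3. Second GPU: re-forward only the mined pairs with gradients enabled,
    #    back-propagate, and update the model parameters.
    hard, y = pair_batch[hard_idx].to("cuda:1"), labels[hard_idx].to("cuda:1")
    d_hard = torch.norm(model_2(hard[:, 0]) - model_2(hard[:, 1]), dim=1)
    loss = (y * d_hard + (1 - y) * torch.clamp(beta - d_hard, min=0)).mean()
    optimizer_2.zero_grad()
    loss.backward()
    optimizer_2.step()

    # 4. Share the adjusted parameters back to the first GPU.
    model_1.load_state_dict(model_2.state_dict())
```
In this arrangement the first GPU never stores an autograd graph, so its memory usage depends only on the batch of images currently being encoded.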
Specifically, the sample face pairs are input to the first GPU for feature extraction, and each face image yields a corresponding feature vector, so that one sample face pair corresponds to two feature vectors, referred to herein as a feature vector pair. The CPU receives the feature vector pairs and places them in a feature vector pool while computing the loss function of each feature vector pair. In a specific embodiment, the loss function is computed as follows:
When the sample face pair is a binary group (i.e. contains two face images), the loss function is:
L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+
wherein y is the label of the sample face pair: when y is 0 the sample face pair is a negative sample pair, and when y is 1 it is a positive sample pair; d_{a,b} is the metric distance of the binary group formed by face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough under a positive sample pair or large enough under a negative sample pair, and takes a value between 0.35 and 0.4; (β - d_{a,b})_+ denotes the function max(0, β - d_{a,b}): when d_{a,b} is greater than or equal to β, max(0, β - d_{a,b}) takes the value 0; when d_{a,b} is less than β, max(0, β - d_{a,b}) takes the value β - d_{a,b}.
It can be seen that when the sample face pair is a positive sample pair (y = 1), the loss function reduces to
L1 = d_{a,b}
The larger the loss function in this case, the larger the metric distance of the binary group and the harder it is to recognize the pair as a correct positive sample pair, so it can be regarded as a hard positive sample pair. When the sample face pair is a negative sample pair (y = 0), the loss function reduces to
L1 = (β - d_{a,b})_+
For the preset value of β, the larger the loss function, the smaller the metric distance d_{a,b} of the binary group and the harder it is for the pair to be correctly identified as a negative sample pair, so it can be regarded as a hard negative sample pair.
When the sample face pairs are triplets, the loss function can be expressed as:
L2 = d_{a,b} + (d_{a,b} - d_{a,c} + β)_+
wherein the triplet comprises face images a, b and c, in which a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} - d_{a,c} + β)_+ denotes the function max(0, d_{a,b} - d_{a,c} + β): when (d_{a,b} - d_{a,c} + β) is greater than 0 it takes the value (d_{a,b} - d_{a,c} + β), otherwise it takes the value 0.
It can be seen that when d_{a,b} is small and d_{a,c} is large, the hinge term is 0 and the loss function is d_{a,b}. In this case, when the loss is small, the metric distance of the positive pair a, b is small and the metric distance of the negative pair a, c is large, so the triplet is a simple sample pair; when the loss is larger, the metric distance of the positive pair a, b has become larger, so it is a hard sample pair. Conversely, when d_{a,b} is large and d_{a,c} is small, the loss function is d_{a,b} + (d_{a,b} - d_{a,c} + β)_+; in this case, when the loss is large, the metric distance of the positive pair a, b is large and the metric distance of the negative pair a, c is small, so the triplet is a hard sample pair.
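As a concrete illustration, the following minimal Python sketch evaluates the two loss functions above, assuming the two-image pair loss takes the reconstructed form L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+; the function names are illustrative only.
```python
# Illustrative only; assumes the losses take the forms given above.
def pair_loss(d_ab: float, y: int, beta: float = 0.4) -> float:
    """L1 for a two-image sample pair: y*d_ab + (1-y)*max(0, beta - d_ab)."""
    return y * d_ab + (1 - y) * max(0.0, beta - d_ab)

def triplet_loss(d_ab: float, d_ac: float, beta: float = 0.4) -> float:
    """L2 for a triplet (a,b positive; a,c negative): d_ab + max(0, d_ab - d_ac + beta)."""
    return d_ab + max(0.0, d_ab - d_ac + beta)

print(pair_loss(0.9, y=1))      # 0.9 -> hard positive pair (large distance)
print(pair_loss(0.1, y=0))      # 0.3 -> hard negative pair (distance well below beta)
print(triplet_loss(0.6, 0.2))   # 1.4 -> hard triplet (positive far, negative close)
```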
After the loss functions are obtained by calculation, the CPU performs hard positive and negative sample mining according to the loss functions to obtain a target sample set containing hard positive and negative sample face pairs. Specifically, when the number of feature vector pairs in the feature vector pool reaches a preset number (for example, 128 × 128 = 16384 feature vector pairs), the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled with a preset sampling strategy to extract hard positive and negative sample face pairs, thereby obtaining the target sample set. Then, the gradient of each hard positive and negative sample face pair in the target sample set is calculated and transmitted to the second GPU, so that the second GPU performs back propagation through the gradients and adjusts the model parameters; the adjusted model parameters are shared to the first GPU, and the first GPU extracts face features according to the adjusted model parameters.
In some embodiments, the feature vector pool stores only feature vector pairs that meet a condition (for example, whose loss function is greater than a predetermined value), and when the number of feature vector pairs that meet the condition reaches a certain number (which may be predetermined according to the actual situation), for example 128 × 128 = 16384 feature vector pairs, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled with the preset sampling strategy.
In other embodiments, the loss functions may also be stored in the feature vector pool, and when the number of the loss functions stored in the feature vector pool (actually equivalent to the number of feature vector pairs) reaches a preset number, the loss functions in the feature vector pool are sampled by a preset sampling strategy, and a sample face pair corresponding to the extracted loss function is used as a hard-positive sample face pair, so as to form the target sample set.
Before sampling begins, the loss functions may be sorted to obtain a loss function matrix, for example, a 128 × 128 loss function matrix is obtained, and then sampling is performed in the loss function matrix through a preset sampling kernel. When sampling is started, a target sample set is initialized, the number of sample face pairs in the initialized target sample set is 0, and hard positive and negative samples obtained by sampling are added into the target sample set every time sampling is completed. It should be understood that if the feature vector pairs are stored in the feature vector pool, the loss functions corresponding to the stored feature vector pairs are arranged into a loss function matrix; if the loss functions are stored in the feature vector pool, the stored loss functions are arranged into a loss function matrix.
In some embodiments, when the loss functions are arranged, they may first be preprocessed. The preprocessing may be to calculate the average value and the standard deviation of the loss functions, and to control the distribution of the loss functions according to the average value and the standard deviation, so that the distribution of the loss functions conforms to a Gaussian distribution and covers all loss functions higher than a first threshold. The first threshold is a preset threshold; in this embodiment it takes the value 0.8, though it should be understood that the user may take other values between 0 and 1 according to actual requirements, for example values around 0.8, and the invention is not limited thereto. The loss functions conforming to the Gaussian distribution are used to construct the loss function matrix, so that most of the lower loss functions are removed while the balance between the loss functions higher than 0.8 and those lower than 0.2 is kept, further ensuring the balance of the samples.
The loss function matrix is then sampled, and the sample face pairs corresponding to the sampled loss functions are added to the target sample set as hard positive and negative sample face pairs. In the early stage of model training there exist hard sample pairs that contribute greatly to the gradient, with loss functions that are large and close to 1; as the number of iterations increases, the recognition ability of the model also increases, and the gradient contribution of a hard sample decreases compared with the early stage of training. Therefore, a first sampling method can be used in the early stage of training and a second sampling method in the later stage. When training starts, the sample face pairs corresponding to the feature vector pairs in the feature vector pool are sampled by the first sampling method to obtain the target sample set containing the hard positive and negative sample face pairs in the current iteration. At each iteration during training it is judged whether to switch to the second sampling method, as follows: when the number of sample face pairs in the target sample set obtained by the first sampling method is less than e% of the number of feature vector pairs in the feature vector pool, switch to the second sampling method, where e% is a preset percentage threshold and e can take a value between 0 and 60 (preferably 50) depending on experience or practice; or, when the largest value among the loss functions corresponding to the feature vector pairs in the feature vector pool (or the largest value in the loss function matrix) is smaller than f, switch to the second sampling method, where 0 < f < 0.5.
The first sampling method can be summarized as follows: selecting a sampling area in the loss function matrix by using a preset sampling kernel, and calculating the sum of loss functions in the sampling area; and selecting sample face pairs corresponding to the Top N loss functions with the largest values from the loss functions in the sampling area according to the integer number N of the sum of the loss functions, and adding the sample face pairs into the target sample set.
For example, in the first sampling method, the sampling kernel may be 3 × 3 with a sampling step size of 3. The 3 × 3 sampling kernel covers a 3 × 3 sampling region in the loss function matrix, and the sum of the loss functions in that region is calculated by the kernel; since each loss lies between 0 and 1, this sum is a value between 0 and 9. The sampling number N in the 3 × 3 sampling region is determined as the integer part of the loss sum, and the sample face pairs corresponding to the Top N largest loss functions in the region are selected and added to the target sample set. For example, when the sum of the loss functions is 8.3, the sample face pairs corresponding to the Top 8 largest loss functions in the region are selected and added to the target sample set.
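The following is a minimal NumPy sketch of the first sampling method under the stated assumptions (a 3 × 3 kernel with stride 3 over a loss matrix whose entries lie in [0, 1]); all names are illustrative, not the patent's implementation.
```python
# Illustrative sketch of sampling method one.
import numpy as np

def sample_method_one(loss_matrix: np.ndarray, kernel: int = 3) -> list:
    """Return (row, col) positions of the mined hard sample pairs."""
    selected = []
    rows, cols = loss_matrix.shape
    for r0 in range(0, rows - kernel + 1, kernel):
        for c0 in range(0, cols - kernel + 1, kernel):
            region = loss_matrix[r0:r0 + kernel, c0:c0 + kernel]
            n = min(int(region.sum()), region.size)        # integer part of the loss sum
            if n <= 0:
                continue
            top = np.argsort(region, axis=None)[::-1][:n]  # Top-N losses in the region
            for f in top:
                rr, cc = divmod(int(f), kernel)
                selected.append((r0 + rr, c0 + cc))
    return selected

hard_positions = sample_method_one(np.random.rand(128, 128))
```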
The second sampling method can be summarized as follows: selecting a sampling area in the loss function matrix by using a preset sampling kernel, and calculating the weighted sum of the loss functions in the sampling area; and selecting sample face pairs corresponding to the Top M loss functions with the largest values from the loss functions in the sampling area according to the integer number M of the weighted sum, and adding the sample face pairs into the target sample set.
For example, in the second sampling method, the sampling kernel may also be 3 × 3 with a sampling step size of 3. The 3 × 3 sampling kernel covers a 3 × 3 sampling region in the loss function matrix, and the weighted sum of the loss functions in that region is calculated by the kernel. According to M, the integer part of the weighted sum, the sample face pairs corresponding to the Top M largest loss functions in the sampling region are selected and added to the target sample set. The weighted sum of the loss functions is related to the current number of iterations; the weighted sum w of the loss functions within the sampling region can be calculated by the following formula:
Figure BDA0003654640020000071
wherein j is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
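The same region scan can be reused for the second sampling method, with the per-region budget taken from the weighted sum instead of the plain sum. Since the exact weighted-sum formula is given by the expression referenced above, the sketch below leaves it as a caller-supplied function region_weight; everything else is illustrative.
```python
# Illustrative sketch of sampling method two; `region_weight` stands in for the
# patent's weighted-sum formula and must be supplied by the caller.
import numpy as np

def sample_method_two(loss_matrix, region_weight, kernel=3):
    selected = []
    rows, cols = loss_matrix.shape
    for r0 in range(0, rows - kernel + 1, kernel):
        for c0 in range(0, cols - kernel + 1, kernel):
            region = loss_matrix[r0:r0 + kernel, c0:c0 + kernel]
            m = min(int(region_weight(region)), region.size)  # integer part of w
            if m <= 0:
                continue
            top = np.argsort(region, axis=None)[::-1][:m]     # Top-M losses
            for f in top:
                rr, cc = divmod(int(f), kernel)
                selected.append((r0 + rr, c0 + cc))
    return selected
```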
By adopting different sampling methods at different training stages and sampling the corresponding sample face pairs from the loss function matrix into the target sample set, hard sample pairs at different stages can be mined more accurately. Further, the dynamic weighting used in the second sampling method allows hard samples in the later training period to be mined more accurately. Compared with the prior-art scheme that directly takes the sample face pairs corresponding to the Top N loss functions as hard samples and cannot guarantee the gradient descent direction, the dynamic selection of hard samples by the first and second sampling methods makes the number of hard samples more balanced, guarantees the gradient descent direction, and improves training efficiency. The gradient direction affects the training effect of the model: when the samples are unbalanced, for example when the model is trained using only hard positive and negative samples, the resulting model mainly learns to recognize the hard faces, so it is sensitive to faces that are difficult to recognize (high recognition accuracy) but insensitive to some simple face pairs (poor recognition accuracy).
In both the first and second sampling methods, the sample face pairs corresponding to the Top N and Top M loss functions are added to the target sample set, so that after the target sample set is obtained it contains the hard sample face pairs with large loss functions. During training, the gradients corresponding to all sample face pairs in the target sample set are sent to the second GPU for back propagation, and the model parameters in the second GPU are adjusted. The first GPU and the second GPU are simply hardware devices equipped with the recognition model, and may also be referred to as training engines. It should be noted that, when the CPU has sufficient computing resources, multiple metric learning models may be trained simultaneously with multiple groups of GPUs, each metric learning model being provided with one group of GPUs (a first GPU and a second GPU). FIG. 1 is a schematic flow chart of the simultaneous training of multiple metric learning models.
In other examples, the loss function matrix may be further sliced to obtain slice matrices of a preset size, and the slice matrices are then stacked to obtain a three-dimensional loss function matrix A × B × C as shown in FIG. 2, where A × B is the size of each slice matrix (one A × B slice matrix contains A × B loss functions) and C is the number of slice matrices. For example, the 128 × 128 loss function matrix may be sliced and stacked into a 64 × 64 × 4 three-dimensional loss function matrix, that is, the 128 × 128 matrix is cut into four 64 × 64 slice matrices which are then stacked; of course, the 128 × 128 matrix may also be sliced into slices of other sizes and corresponding numbers. Sampling with the first and second sampling methods is then performed on the three-dimensional loss function matrix A × B × C using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set. For example, sampling combining the first and second sampling methods is performed on the three-dimensional loss function matrix A × B × C by a 3 × 3 × C three-dimensional sampling kernel; since the size of the three-dimensional loss function matrix is reduced, the area the kernel must cover is reduced, the number of sliding steps of the kernel is reduced, and the sampling speed is increased.
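The slicing and stacking step can be sketched as follows, assuming the 128 × 128 matrix and 64 × 64 slices mentioned above; the names are illustrative.
```python
# Illustrative sketch: cut a 128x128 loss matrix into four 64x64 slices and stack
# them into a 64x64x4 tensor that a 3x3xC kernel can scan with fewer sliding steps.
import numpy as np

def slice_and_stack(loss_matrix: np.ndarray, slice_size: int = 64) -> np.ndarray:
    rows, cols = loss_matrix.shape
    slices = [loss_matrix[r:r + slice_size, c:c + slice_size]
              for r in range(0, rows, slice_size)
              for c in range(0, cols, slice_size)]
    return np.stack(slices, axis=-1)   # shape (A, B, C) = (64, 64, 4)

print(slice_and_stack(np.random.rand(128, 128)).shape)  # (64, 64, 4)
```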
In addition, in the second sampling method, random masking may be performed on the sampling kernel. Specifically, the largest loss function in the sampling region of the current iteration is first taken as the mask value, and then a position in the sampling kernel is randomly selected and masked with this value, so that the loss function at the masked position is output as the mask value during sampling. Through random masking, smaller loss functions have a chance of being sampled, and the distribution of the loss functions sampled by the kernel becomes more balanced.
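A minimal sketch of the random masking step follows; the choice of random generator is an assumption and the names are illustrative.
```python
# Illustrative sketch: overwrite one random position of the sampling region with the
# region's largest loss, so that position is reported as the maximum during sampling.
import numpy as np

def mask_region(region: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    masked = region.copy()
    masked.flat[rng.integers(masked.size)] = region.max()  # mask value = largest loss
    return masked

rng = np.random.default_rng(0)
print(mask_region(np.array([[0.1, 0.9, 0.2], [0.3, 0.4, 0.5], [0.6, 0.7, 0.8]]), rng))
```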
In some embodiments, when the second GPU shares the model parameters with the first GPU, the data transfer of parameter sharing can be reduced as follows: only the difference (DIFF) between the model parameters adjusted by the second GPU in the current iteration and the current model parameters of the first GPU is calculated, and only this difference is transmitted to the first GPU and added to its current model parameters, so that the first GPU obtains the same model parameters as the second GPU; that is, the model parameters adjusted by the second GPU are shared to the first GPU. For example, if the current model parameters of the first GPU are (1, 2, 3, 4, 5, 6, 7, 8, 9) and the model parameters of the second GPU after back propagation are (1, 3, 3, 4, 5, 6, 7, 8, 9), the difference is (1-1, 3-2, 3-3, 4-4, 5-5, 6-6, 7-7, 8-8, 9-9) = (0, 1, 0, 0, 0, 0, 0, 0, 0); only this difference is transmitted to the first GPU and added to its current model parameters to obtain the same model parameters as the second GPU, thereby implementing sharing. Transmitting only the difference reduces the data transfer of parameter sharing.
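A minimal PyTorch sketch of this difference-only sharing is given below, assuming the two models have identical architectures; the names are illustrative.
```python
# Illustrative sketch: only the parameter difference crosses the bus.
import torch

def share_parameters(model_1: torch.nn.Module, model_2: torch.nn.Module) -> None:
    """Add (second-GPU params - first-GPU params) onto the first GPU's parameters."""
    with torch.no_grad():
        for p1, p2 in zip(model_1.parameters(), model_2.parameters()):
            diff = p2.detach().cpu() - p1.detach().cpu()   # only this is transmitted
            p1.add_(diff.to(p1.device))
```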
In another embodiment of the present invention, an online hard positive and negative sample mining device is provided, which is used for a training process of a face recognition model and includes a central processing unit (CPU), and a first graphics processing unit (first GPU) and a second graphics processing unit (second GPU) connected to the CPU; the first graphics processor is configured to: extract feature vector pairs from the sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pair according to the feature vector pair, mine hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradient from the central processing unit, perform back propagation through the gradient, adjust the model parameters, and share the adjusted model parameters to the first graphics processor.
It should be understood that, in the above apparatus, the first GPU, the second GPU and the CPU may be configured according to the corresponding steps in the hard positive and negative sample online mining method of the foregoing embodiment. For example, the CPU is configured to perform hard positive and negative sample mining, loss function calculation, and gradient calculation, and the specific mining steps and calculation steps can be configured according to the corresponding steps in the hard positive and negative sample online mining method of the foregoing embodiment. The detailed description is omitted, and it should be understood by those skilled in the art that the apparatus is an apparatus corresponding to the hard positive and negative sample online mining method of the foregoing embodiment.
Furthermore, an embodiment of the present invention may further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the hard positive and negative sample online mining method of the foregoing embodiment can be implemented. A computer readable storage medium may include, among other things, a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied in a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Other embodiments of the present invention further provide a face recognition method, which includes a training process of a face recognition model, where the training process includes the steps of the above hard positive and negative sample online mining method.
The foregoing is a further detailed description of the invention in connection with specific preferred embodiments and it is not intended to limit the invention to the specific embodiments described. It will be apparent to those skilled in the art that various equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and all changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (14)

1. A hard positive and negative sample online mining method is used for a training process of a face recognition model, and is characterized by comprising the following steps:
S1, acquiring a feature vector pair extracted from the sample face pair by the first graphics processor;
S2, calculating a loss function of the sample face pair according to the feature vector pair;
s3, mining hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs;
and S4, calculating the gradient of each hard positive and negative sample face pair in the target sample set, and transmitting the gradient to a second graphics processor, so that the second graphics processor performs back propagation through the gradient, adjusts model parameters, and shares the adjusted model parameters to the first graphics processor.
2. The online hard positive and negative sample mining method according to claim 1, wherein in step S2, when the sample face pair is a binary group, the loss function is:
L1 = y·d_{a,b} + (1 - y)·(β - d_{a,b})_+
wherein y is the label of the sample face pair: when y is 0 the sample face pair is a negative sample pair, and when y is 1 it is a positive sample pair; d_{a,b} is the metric distance of the binary group formed by face images a and b, representing the spatial distance between the feature vectors of face images a and b; β is a preset threshold used to measure whether the metric distance of the binary group is small enough under a positive sample pair or large enough under a negative sample pair; (β - d_{a,b})_+ denotes the function max(0, β - d_{a,b}): when d_{a,b} is greater than or equal to β, max(0, β - d_{a,b}) takes the value 0; when d_{a,b} is less than β, max(0, β - d_{a,b}) takes the value β - d_{a,b};
When the sample face pairs are triplets, the loss function is:
L2 = d_{a,b} + (d_{a,b} - d_{a,c} + β)_+
wherein the triplet comprises face images a, b and c, in which a and b form a positive sample pair and a and c form a negative sample pair; d_{a,c} is the metric distance between face images a and c; (d_{a,b} - d_{a,c} + β)_+ denotes the function max(0, d_{a,b} - d_{a,c} + β): when (d_{a,b} - d_{a,c} + β) is greater than 0 it takes the value (d_{a,b} - d_{a,c} + β), otherwise it takes the value 0.
3. The online hard positive-negative sample mining method according to claim 1, wherein the step S3 specifically comprises: adding the feature vector pairs of the sample face pairs into a feature vector pool; and when the number of the feature vector pairs in the feature vector pool reaches a preset number, sampling the sample face pairs corresponding to the feature vector pairs in the feature vector pool by using a preset sampling strategy to obtain the target sample set.
4. The online hard positive-negative sample mining method according to claim 3, wherein in the step S3, when the sampling is performed, the loss functions corresponding to the feature vector pairs in the feature vector pool are arranged into a loss function matrix, and then a preset sampling kernel is used to perform sampling in the loss function matrix, so as to obtain the target sample set.
5. The method for mining hard positive and negative samples on line according to claim 4, wherein the step of arranging the loss functions corresponding to the feature vector pairs in the feature vector pool into a loss function matrix comprises the following steps:
calculating the average value and the standard deviation of the loss function corresponding to the feature vector pairs in the feature vector pool;
controlling the distribution of the loss functions corresponding to the feature vector pairs in the feature vector pool according to the average value and the standard deviation, so that the loss functions conform to a Gaussian distribution and cover all loss functions higher than a first threshold;
a loss function conforming to a gaussian distribution is used to construct a matrix of loss functions.
6. The online hard positive-negative sample mining method of claim 4, wherein in step S3, the step of sampling the sample face pair corresponding to the feature vector pair in the feature vector pool by a preset sampling strategy comprises:
when training starts, sampling sample face pairs corresponding to the feature vector pairs in the feature vector pool by a preset first sampling method to obtain a target sample set containing hard positive and negative sample face pairs in current iteration;
and judging whether the sampling method needs to be switched or not during each iteration in the training process, wherein the judging method comprises the following steps:
when the number of the face pairs of the samples in the target sample set obtained by the first sampling method is less than e% of the number of the feature vector pairs in the feature vector pool, switching to a second sampling method for sampling; wherein e% is a preset percentage threshold;
or, when the largest value among the loss functions corresponding to the feature vector pairs in the feature vector pool, or the largest value in the loss function matrix, is smaller than f, switching to the second sampling method for sampling; wherein f is greater than 0 and less than 0.5.
7. The online hard positive and negative sample mining method according to claim 6, wherein the first sampling method comprises the following steps of A1-A2:
a1, selecting a sampling area in the loss function matrix by using the preset sampling kernel, and calculating the sum of the loss functions in the sampling area;
a2, selecting sample face pairs corresponding to Top N loss functions with the largest value from the loss functions of the sampling area according to the integer number N of the sum of the loss functions, and adding the sample face pairs into the target sample set;
the second sampling method comprises the following steps of B1-B2:
b1, selecting a sampling area in the loss function matrix by using the preset sampling kernel, and calculating the weighted sum of the loss functions in the sampling area;
and B2, selecting sample face pairs corresponding to the Top M loss functions with the largest values from the loss functions of the sampling area according to the integer number M of the weighted sum, and adding the sample face pairs into the target sample set.
8. The method for mining hard positive-negative samples on line as claimed in claim 7, wherein the weighted sum w of the loss functions in the sample area is determined in B1 by the following formula:
Figure FDA0003654640010000031
wherein j is the current iteration number; T_j is the number of feature vector pairs in the feature vector pool in the current iteration; L_t is the loss function corresponding to the t-th feature vector pair in the current iteration; T_i is the total number of sample face pairs in the target sample set in the previous iteration; and L_k is the loss function corresponding to the k-th feature vector pair among the sample face pairs in the target sample set in the previous iteration.
9. The hard positive and negative sample online mining method according to claim 7, further comprising:
slicing the loss function matrix to obtain a slice matrix with a preset size;
superposing the slice matrices to obtain a three-dimensional loss function matrix A × B × C, wherein A × B is the size of each slice matrix, one A × B slice matrix containing A × B loss functions, and C is the number of slice matrices;
and performing sampling on the three-dimensional loss function matrix A × B × C with the first sampling method and the second sampling method, using a three-dimensional sampling kernel formed by stacking C preset sampling kernels, to obtain the target sample set.
10. The online hard positive and negative sample mining method according to claim 7, wherein in the second sampling method, the preset sampling kernel is randomly masked, and the random masking comprises:
first obtaining the largest loss function in the sampling region of the current iteration as a mask value, and then randomly selecting a position in the sampling kernel and masking it with the mask value, so that the loss function at the masked position is output as the mask value during sampling.
11. The hard positive-negative sample online mining method according to claim 1, wherein the step S4 of sharing the adjusted model parameters to the first graphics processor comprises:
calculating the difference value between the model parameter adjusted by the second graphic processor in the current iteration and the model parameter of the first graphic processor;
and transmitting the difference value to the first graphics processor to be added with the model parameter of the first graphics processor, namely sharing the model parameter adjusted by the second graphics processor to the first graphics processor.
12. A hard positive and negative sample online mining device, used for a training process of a face recognition model, characterized by comprising a central processing unit, and a first graphics processor and a second graphics processor connected with the central processing unit; the first graphics processor is configured to: extract feature vector pairs from the sample face pairs; the central processing unit is configured to: calculate a loss function of the sample face pair according to the feature vector pair, mine hard positive and negative samples of the sample face pairs according to the loss function to obtain a target sample set containing the hard positive and negative sample face pairs, and calculate the gradient of each hard positive and negative sample face pair in the target sample set; the second graphics processor is configured to: receive the gradient from the central processing unit, perform back propagation through the gradient, adjust the model parameters, and share the adjusted model parameters to the first graphics processor.
13. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, is capable of implementing the steps of the hard positive and negative sample online mining method according to any one of claims 1 to 11.
14. A face recognition method comprising a training process of a face recognition model, wherein the training process comprises the steps of the hard positive and negative sample online mining method of any one of claims 1 to 11.
CN202210555142.9A 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method Active CN114764942B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210555142.9A CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Publications (2)

Publication Number Publication Date
CN114764942A true CN114764942A (en) 2022-07-19
CN114764942B CN114764942B (en) 2022-12-09

Family

ID=82364980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555142.9A Active CN114764942B (en) 2022-05-20 2022-05-20 Difficult positive and negative sample online mining method and face recognition method

Country Status (1)

Country Link
CN (1) CN114764942B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321777A1 (en) * 2014-06-20 2016-11-03 Tencent Technology (Shenzhen) Company Limited Data parallel processing method and apparatus based on multiple graphic processing units
US20170024849A1 (en) * 2015-07-23 2017-01-26 Sony Corporation Learning convolution neural networks on heterogeneous cpu-gpu platform
CN107330355A (en) * 2017-05-11 2017-11-07 中山大学 A kind of depth pedestrian based on positive sample Constraints of Equilibrium identification method again
US20190188560A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Multi-gpu deep learning using cpus
CN108647577A (en) * 2018-04-10 2018-10-12 华中科技大学 A kind of pedestrian's weight identification model that adaptive difficult example is excavated, method and system
CN110163265A (en) * 2019-04-30 2019-08-23 腾讯科技(深圳)有限公司 Data processing method, device and computer equipment
US20210318878A1 (en) * 2019-10-12 2021-10-14 Baidu Usa Llc Method and system for accelerating ai training with advanced interconnect technologies
CN111667050A (en) * 2020-04-21 2020-09-15 佳都新太科技股份有限公司 Metric learning method, device, equipment and storage medium
CN113569657A (en) * 2021-07-05 2021-10-29 浙江大华技术股份有限公司 Pedestrian re-identification method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558057A (en) * 2024-01-12 2024-02-13 清华大学深圳国际研究生院 Face recognition method
CN117558057B (en) * 2024-01-12 2024-04-16 清华大学深圳国际研究生院 Face recognition method

Also Published As

Publication number Publication date
CN114764942B (en) 2022-12-09

Similar Documents

Publication Publication Date Title
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN109902546A (en) Face identification method, device and computer-readable medium
CN110969250A (en) Neural network training method and device
CN107292352B (en) Image classification method and device based on convolutional neural network
CN109299716A (en) Training method, image partition method, device, equipment and the medium of neural network
CN108764466A (en) Convolutional neural networks hardware based on field programmable gate array and its accelerated method
CN111401516A (en) Neural network channel parameter searching method and related equipment
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
CN113111979B (en) Model training method, image detection method and detection device
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN109784372A (en) A kind of objective classification method based on convolutional neural networks
CN108764176A (en) A kind of action sequence recognition methods, system and equipment and storage medium
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN111931867B (en) New coronary pneumonia X-ray image classification method and system based on lightweight model
CN114764942B (en) Difficult positive and negative sample online mining method and face recognition method
CN111914908A (en) Image recognition model training method, image recognition method and related equipment
CN110175506A (en) Pedestrian based on parallel dimensionality reduction convolutional neural networks recognition methods and device again
CN109886317B (en) General image aesthetic evaluation method, system and equipment based on attention mechanism
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
EP3771999A1 (en) Method and apparatus for extracting image data in parallel from multiple convolution windows, device, and computer-readable storage medium
CN110796716A (en) Image coloring method based on multiple residual error networks and regularized transfer learning
CN116205883A (en) PCB surface defect detection method, system, electronic equipment and medium
CN111860054A (en) Convolutional network training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant