CN111860364A - Training method and device of face recognition model, electronic equipment and storage medium - Google Patents

Training method and device of face recognition model, electronic equipment and storage medium

Info

Publication number
CN111860364A
Authority
CN
China
Prior art keywords
face recognition
vector
recognition model
scaling
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010722196.0A
Other languages
Chinese (zh)
Inventor
沈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202010722196.0A
Publication of CN111860364A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face recognition, and provides a training method and device for a face recognition model, an electronic device and a storage medium. The training method of the face recognition model comprises the following steps: obtaining the batch processing data volume of the face recognition model and the category number of a training set; constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number; adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number; rescaling the output of the face recognition model using the scaling matrix; and performing supervised training on the face recognition model based on the rescaled output. According to the invention, the fixed scaling of the face recognition loss function is adjusted to generate an adjustable scaling vector capable of increasing the distance between the class center vectors, so that the recognition accuracy of the face recognition model is improved.

Description

Training method and device of face recognition model, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face recognition, in particular to a training method and device of a face recognition model, electronic equipment and a storage medium.
Background
ArcFace (Additive Angular Margin Loss for Deep Face Recognition) is a state-of-the-art technique in the field of face recognition. Its loss function is the additive angular margin loss, and the recognition accuracy of a face recognition model depends mainly on the design of the loss function.
The ArcFace loss improves inter-class separability and strengthens intra-class compactness by adding an angular margin on top of conventional face recognition techniques. However, the radius used for feature scaling is not set reasonably: only a fixed scale is adopted, so that all feature vectors and class center vectors are forced to the same vector length, i.e. they are all scaled onto a hypersphere whose radius equals the fixed scale.
If the lengths of the feature vector and the class center vector are to be preserved after the angular margin is added, the scale should equal the product of the lengths of the feature vector and the class center vector. The feature vector is determined by the input data and the parameters of the convolutional layers, so its length changes constantly; the class center vector is a learnable parameter, so its length also varies. It follows that requiring the product of the lengths of the feature vector and the class center vector to be a constant equal to a fixed scale is not a reasonable constraint.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present invention provides a training method and apparatus for a face recognition model, an electronic device, and a storage medium, which can generate an adjustable scaling vector capable of increasing a distance between class center vectors by adjusting a fixed scaling of a face recognition loss function, thereby improving a recognition accuracy of the face recognition model.
One aspect of the present invention provides a training method for a face recognition model, comprising the steps of: obtaining the batch processing data volume of the face recognition model and the category number of a training set; constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number; adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number; rescaling the output of the face recognition model using the scaling matrix; and performing supervised training on the face recognition model based on the rescaled output.
In some embodiments, the face recognition model is constructed based on a deep convolutional neural network, and the face recognition loss function is an additive angular margin loss function.
In some embodiments, the step of adjusting the fixed scaling of the face recognition loss function according to the parameter vector comprises: altered_s = S + selected_vector * S * 2 * (1 - cos θ_j), wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and θ_j is the included angle between the class center vectors of two adjacent classes, the class center vectors being obtained according to the output of the face recognition model.
In some embodiments, the step of adjusting the fixed scaling of the face recognition loss function according to the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)), wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the number of categories, and the current feature vector is obtained according to the output of the face recognition model.
In some embodiments, the fixed scaling takes the value S = 64, and the included angle between the class center vectors of the two adjacent classes is θ_j = 71.61 ÷ 360 × 2π radians.
In some embodiments, the step of constructing a 0-1 distribution based on random numbers and generating a parameter vector with the number of columns being the number of categories comprises: taking a random seed and sampling 0 and 1 with equal probability to generate the parameter vector; and registering the parameter vector as a fixed vector.
In some embodiments, rescaling the output of the face recognition model with the scaling matrix comprises: altered_s' * cos(θ_yi + m), wherein altered_s' is the scaling matrix formed by repeating the adjustable scaling vector for the batch processing data amount of rows, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the number of categories, and the current feature vector is obtained according to the output of the face recognition model.
In some embodiments, the step of performing supervised training on the face recognition model based on the rescaled output comprises: obtaining the prediction probability of the face recognition model through logistic regression Softmax according to the rescaled output; and obtaining a difference value between the prediction probability and a target probability based on a cross entropy loss function, and performing supervised training on the face recognition model until the face recognition model converges on the training set.
Another aspect of the present invention provides a training apparatus for a face recognition model, comprising: an initial data acquisition module configured to acquire the batch processing data volume of the face recognition model and the category number of the training set; a parameter vector generation module configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories; a scaling adjustment module configured to adjust a fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain, according to the adjustable scaling vector, a scaling matrix having a row number as the batch processing data volume and a column number as the category number; a feature rescaling module configured to rescale the output of the face recognition model using the scaling matrix; and a supervised training module configured to perform supervised training on the face recognition model based on the rescaled output.
Yet another aspect of the present invention provides an electronic device including: a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the training method of a face recognition model according to any of the above embodiments by executing the executable instructions.
Yet another aspect of the present invention provides a computer-readable storage medium storing a program, wherein the program is configured to implement the steps of the training method of a face recognition model according to any of the above embodiments when executed.
Compared with the prior art, the invention has the beneficial effects that:
by adjusting the fixed scaling of the face recognition loss function, an adjustable scaling vector capable of increasing the distance between class center vectors is generated, and the Euclidean distance between feature vectors is increased, i.e. the feature vectors are farther apart, so that the recognition accuracy of the face recognition model is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating steps of a training method for a face recognition model according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a process of training a face recognition model based on modified ArcFace loss supervision according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the principle of obtaining an adjustable scaling vector in an embodiment of the present invention;
FIG. 4 shows a schematic comparison of feature distributions of a face recognition model according to an embodiment of the present invention and an existing ArcFace;
FIG. 5 is a block diagram of an apparatus for training a face recognition model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram showing a structure of an electronic apparatus according to an embodiment of the present invention; and
fig. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The step numbers in the following method embodiments are only used for representing different execution contents, and do not limit the logical relationship and execution sequence between the steps.
The invention aims to relieve the unreasonable constraint caused by a fixed scaling, which forces all class center vectors onto a fixed hypersphere with an identical radius. By optimizing the design of these radius lengths, the ability of the face recognition model to distinguish faces is enhanced, the separability of the features is improved, and the recognition accuracy of the face recognition model is improved. The technical idea of the invention is to optimize the fixed scaling in the ArcFace loss into an adjustable scaling vector, so that, in the hypersphere space of the class center vectors of different classes, the single hypersphere with a fixed radius becomes hyperspheres with different radii; the scaling radius differs between classes, and the separability of the features is improved.
Fig. 1 shows the main steps of the training method of the face recognition model in the embodiment, and referring to fig. 1, the training method of the face recognition model in the embodiment mainly includes: in step S110, obtaining batch processing data size of the face recognition model and the number of classes of the training set; in step S120, a 0-1 distribution is constructed based on the random number, and a parameter vector having the number of columns as the number of categories is generated; in step S130, a fixed scaling ratio of the face recognition loss function is adjusted according to the parameter vector to obtain an adjustable scaling vector, and a scaling matrix with rows as a batch data amount and columns as a category number is obtained according to the adjustable scaling vector; in step S140, rescaling the output of the face recognition model by using a scaling matrix; and in step S150, supervising training of the face recognition model based on the rescaled output.
In the above embodiment, the face recognition model is constructed based on a deep convolutional neural network (DCNN), and the face recognition loss function is the additive angular margin loss (ArcFace loss). FIG. 2 shows the process flow of supervised training of the face recognition model based on the improved ArcFace loss. Referring to FIG. 2, in the process of supervised training with the improved ArcFace loss of the present invention, the feature vector x_i and the class center vector w_j are first normalized in process P210, so that subsequent predictions depend only on the angle between the feature and the weight. Then, an additive angular margin penalty is applied in process P220, adding an angular margin value m to the included angle θ between the feature and the weight. Specifically, the included angle between the feature vector x_i and the class center vector w_j is θ_j; cos θ_j is computed, and the arccosine arccos(cos θ_yi) is taken to obtain the angle θ_yi between the feature vector x_i and the target class center vector W_yi; the angular margin value m is then added, so that the angle between the feature vector x_i and the target class center vector W_yi is increased to θ_yi + m. Next, in process P230, the cosine cos(θ_yi + m) is computed and multiplied element-wise with the scaling matrix altered_s', thereby rescaling all logits based on the scaling matrix altered_s'. Finally, in process P240, the logits are fed into the logistic regression Softmax function to obtain the probability of each class, the cross entropy loss is computed with the cross entropy loss function, and the face recognition model is trained under the supervision of this loss.
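Processes P210 and P220 are the standard ArcFace operations; as a hedged illustration only, a minimal PyTorch sketch of these two steps might look as follows, where the tensor names embeddings, weight and labels and the margin value m = 0.5 are assumptions rather than values taken from the patent:

import torch
import torch.nn.functional as F

def cos_with_margin(embeddings, weight, labels, m=0.5):
    # P210: L2-normalize the feature vectors x_i and the class center vectors w_j (columns of weight)
    x = F.normalize(embeddings, dim=1)                    # (batch_size, embedding_size)
    w = F.normalize(weight, dim=0)                        # (embedding_size, class_num)
    cos_theta = (x @ w).clamp(-1.0 + 1e-7, 1.0 - 1e-7)    # cosines of the angles, (batch_size, class_num)
    # P220: add the angular margin m only to each sample's target class y_i
    theta_yi = torch.acos(cos_theta.gather(1, labels.view(-1, 1)))
    cos_out = cos_theta.clone()
    cos_out.scatter_(1, labels.view(-1, 1), torch.cos(theta_yi + m))
    return cos_out                                        # the matrix cos(theta_yi + m) in the text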
The process P210, the process P220, and the process P240 all adopt the existing techniques, and therefore, the description thereof will not be repeated. The process P230, namely obtaining the adjustable scaling vector and rescaling the output of the face recognition model based on the scaling matrix, is described below.
First, the batch data amount batch_size of the face recognition model and the class number class_num of the training set are obtained; both can be obtained according to the output of the face recognition model. In this embodiment, the face recognition model is constructed based on the residual neural network ResNet50, the feature vector x_i of the i-th batch has size (batch_size, embedding_size), W is equivalent to the weight of the fully-connected layer, which is learnable, the size of W is (embedding_size, class_num), and the class center vector w_j has size (embedding_size, 1).
In one specific example, the training set is CASIA-WebFace, the optimizer is SGD with lr = 1e-1, weight_decay = 5e-4 and momentum = 0.9, training uses a P100 GPU, batch_size = 128, and epoch = 100.
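For illustration only, such a training configuration could be set up roughly as follows; the torchvision backbone call and the 512-dimensional embedding size are assumptions, while the SGD hyperparameters are those quoted above:

import torch
from torchvision.models import resnet50

# stand-in backbone: the embodiment builds on ResNet50; the 512-dim embedding size is an assumption
backbone = resnet50(num_classes=512)

# optimizer settings quoted above: SGD, lr=1e-1, weight_decay=5e-4, momentum=0.9
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-1,
                            momentum=0.9, weight_decay=5e-4)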
Then, a 0-1 distribution is constructed based on random numbers to generate a parameter vector selected_vector, and selected_vector is registered as a fixed vector. Specifically, a random seed is taken and 0s and 1s are sampled with equal probability to generate a parameter vector selected_vector of length class_num; this can be implemented, for example, with PyTorch (an open-source Python machine learning library). The parameter vector selected_vector is registered as a buffer, because it does not need to be learned and only needs to remain fixed throughout training.
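A minimal sketch of this step, assuming the improved loss head is implemented as a PyTorch nn.Module and using an arbitrary seed value:

import torch
import torch.nn as nn

class AdjustableScaleHead(nn.Module):
    def __init__(self, class_num, seed=0):
        super().__init__()
        gen = torch.Generator().manual_seed(seed)                # take a random seed
        # draw 0 or 1 with equal probability, one entry per class
        selected = torch.randint(0, 2, (class_num,), generator=gen).float()
        # register as a buffer: fixed for the whole training run, never learned
        self.register_buffer("selected_vector", selected)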
Because it cannot be determined in advance whether two class center vectors are adjacent, a random method is adopted; since the number of classes is large, generally more than 10,000, the random method can still achieve a good effect.
The fixed scaling of the ArcFace loss is then adjusted based on the parameter vector. In one implementation, the adjustable scaling vector altered_s can be obtained by the formula altered_s = S + selected_vector * S * 2 * (1 - cos θ_j). The number of columns of the adjustable scaling vector altered_s is the class number class_num, i.e. its size is (class_num). The fixed scaling S follows the experience of ArcFace and takes the value 64. θ_j is the included angle between the class center vectors of two adjacent classes, e.g. between the class center vector w_i and the class center vector w_j; following the experience in ArcFace, θ_j may take the value 71.61 ÷ 360 × 2π radians. In another implementation, the fixed scaling of the face recognition loss function is adjusted by the formula altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)), where cos(θ_yi + m) is the result after the angular margin has been added, with size (batch_size, class_num). That is, cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; it is a matrix with the number of rows being the batch data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model, i.e. the currently output feature vector x_i. mean(dim=0) means averaging over dimension 0.
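Both variants of the formula can be sketched as below, following the names used in the text (selected_vector, S, θ_j, cos(θ_yi + m)); this is an illustrative reading of the formulas, not the patent's verbatim code:

import math
import torch

def altered_s_fixed_angle(selected_vector, S=64.0, theta_j=71.61 / 360 * 2 * math.pi):
    # variant 1: use the fixed angle theta_j assumed between adjacent class centers
    return S + selected_vector * S * 2 * (1 - math.cos(theta_j))        # shape (class_num,)

def altered_s_from_margin(selected_vector, cos_theta_m, S=64.0):
    # variant 2: average the (batch_size, class_num) matrix cos(theta_yi + m) over dim=0
    return S + selected_vector * S * 2 * (1 - cos_theta_m.mean(dim=0))  # shape (class_num,)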
FIG. 3 illustrates the principle of obtaining the adjustable scaling vector in an embodiment. Referring to FIG. 3, take two adjacent class center vectors w_i and w_j as an example. The class center vector w_i corresponds to ob and the class center vector w_j corresponds to oc. The result of training the face recognition model is to let the feature vector x_i move close to one class and away from another, e.g. close to the class center vector w_i and away from the class center vector w_j. After training, the feature vector x_i is relatively close to the class center vector w_i, so the distance between different classes is ultimately the distance between the class center vectors w_i and w_j, i.e. the distance between ob and oc. Suppose a further class center vector w_k corresponds to oa.
In the original ArcFace, the distance between class center vectors is ab or bc; if ab = bc, only the distance between adjacent class center vectors is considered, and since a and c are not adjacent, the distance between class center vectors is ab or bc. Now extend the vector ob to oe; the distance between adjacent class center vectors becomes ae or ce, and note that a and c also become adjacent, so the distances between adjacent class center vectors are ae, ce or ac. If all three edges are to be equal, and ab = bc has been assumed before, then be = ab or be = bc is needed, so that the distance between all adjacent class center vectors is increased, i.e. the distance between the class center vectors w_i and w_j is increased.
Meanwhile, it is noted that during model training the extracted face features are expressed as feature vectors x_i rather than the class center vectors w_i. However, the average feature vector x_i (i.e. the embedding feature center) is close to the class center vector w_i, and during training the cross entropy loss function also drives the feature vector x_i and the class center vector w_i to become close in length. That is to say, the lengths of the feature vectors x_i of different classes will also differ, so that the distance between the feature vectors x_i of two different faces becomes larger, i.e. the distance between the extracted embedding features becomes larger, which enhances the discriminative power of the model and improves the face recognition effect.
To set different lengths for the class center vectors of different classes, the present embodiment changes the fixed scaling S into an adjustable scaling vector consisting of two different values, where each element of the adjustable scaling vector represents a class. The smaller value can be set from training experience, e.g. 64, and the larger value is obtained by additionally adding the term S * 2 * (1 - cos θ_j) or S * 2 * (1 - cos(θ_yi + m).mean(dim=0)). This gains an additional benefit on top of ArcFace, namely an increase of the distance between class center vectors and, ultimately, of the distance between the feature vectors x_i.
It is further noted that the scaling satisfies S_j = |x_i| * |w_j|: although the objective is to set class center vectors w_j of different lengths, what is actually set is |x_i| * |w_j|; this difference does not affect the effect of the scheme. Specifically, because the ArcFace loss makes the feature vector x_i and the class center vector w_j tend to align, i.e. the angle between the feature vector x_i and the class center vector w_j becomes relatively small, the cross entropy loss function automatically makes the Softmax output of the neural network at the position corresponding to the true label larger, so that the cross entropy loss becomes smaller. Besides reducing the angle between the feature vector x_i and the class center vector w_j, another training direction that makes the Softmax output larger is to adjust the lengths of the feature vector x_i and the class center vector w_j; that is, setting the scaling S_j simultaneously sets |x_i| and |w_j|, so the size of |x_i| is also effectively set in the process.
On the other hand, suppose the lengths of the different column vectors are set specifically when the weight W is initialized: this has no influence on ArcFace, but for the present invention, since setting the lengths of different column vectors is equivalent to a multiplication, and multiplications are interchangeable, the feature rescaling is equivalent to restoring the lengths of the different column vectors of the weight W. In other words, the rescaling can logically be treated as acting only on the column vectors of the weight W and not on the feature x.
After the adjustable scaling vector altered_s is obtained, it is repeated batch_size times, i.e. repeated row by row for batch_size rows, to obtain a scaling matrix altered_s' of size (batch_size, class_num).
Then, the scaling matrix altered_s' and the cosine value cos(θ_yi + m) are multiplied element-wise, completing the rescaling. Specifically, the output of the face recognition model is rescaled by the formula altered_s' * cos(θ_yi + m), where cos(θ_yi + m), as mentioned above, is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; it is a matrix with the number of rows being the batch data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
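Continuing the sketch, the scaling matrix altered_s' and the element-wise rescaling can be formed as follows; the use of unsqueeze/repeat is one assumed way to realize the row repetition described above:

import torch

def rescale_logits(cos_theta_m, altered_s):
    # repeat the (class_num,) vector for batch_size rows -> scaling matrix altered_s'
    altered_s_mat = altered_s.unsqueeze(0).repeat(cos_theta_m.size(0), 1)  # (batch_size, class_num)
    # element-wise product altered_s' * cos(theta_yi + m) gives the rescaled logits
    return altered_s_mat * cos_theta_m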
Finally, the rescaled output is fed into the logistic regression Softmax and the cross entropy loss function to finish the forward propagation. Specifically, the prediction probability of the face recognition model is obtained from the rescaled output through the logistic regression Softmax, and the cross entropy loss is then calculated using the Ground Truth (the real data, i.e. the target probability) as a one-hot vector, i.e. the difference between the prediction probability and the target probability is obtained. The face recognition model is trained under this supervision until it converges on the training set.
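Tying the illustrative helpers above together, one forward pass of the improved loss might read as follows; F.cross_entropy fuses the Softmax and cross entropy steps, and all function names refer to the sketches given earlier in this section:

import torch.nn.functional as F

def improved_arcface_loss(embeddings, weight, labels, selected_vector, S=64.0, m=0.5):
    cos_theta_m = cos_with_margin(embeddings, weight, labels, m=m)        # P210 + P220
    altered_s = altered_s_from_margin(selected_vector, cos_theta_m, S=S)  # adjustable scaling vector
    logits = rescale_logits(cos_theta_m, altered_s)                       # P230
    return F.cross_entropy(logits, labels)                                # P240: Softmax + cross entropy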
Fig. 4 compares the feature distribution of the face recognition model trained by the above embodiment with that of the conventional ArcFace. Referring to Fig. 4, the features of the conventional ArcFace are distributed on a hypersphere 410 with a single radius, while the features of the face recognition model of the present invention are distributed on hyperspheres 420 with different radii, so that the Euclidean distances between the feature vectors and the class center vectors are larger.
In summary, by adjusting the fixed scaling of the face recognition loss function, the present invention generates an adjustable scaling vector capable of increasing the distance between class center vectors, thereby overcoming the drawback of rescaling features to a fixed length. Instead, by rescaling the features of different classes to two different lengths, the distance between the class center vectors w_i and w_j is increased and the Euclidean distance between the feature vectors x_i and x_j becomes larger, i.e. the feature vectors are farther apart, so that the recognition accuracy of the face recognition model is improved.
The invention also provides a training apparatus based on the training method described in the above embodiments, and fig. 5 shows the main modules of the training apparatus of the face recognition model in this embodiment. Referring to fig. 5, the training apparatus 500 for a face recognition model in the present embodiment mainly includes: an initial data acquisition module 510 configured to acquire the batch processing data volume of the face recognition model and the category number of the training set; a parameter vector generation module 520 configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories; a scaling adjustment module 530 configured to adjust the fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain, according to the adjustable scaling vector, a scaling matrix with the number of rows being the batch data amount and the number of columns being the category number; a feature rescaling module 540 configured to rescale the output of the face recognition model using the scaling matrix; and a supervised training module 550 configured to perform supervised training on the face recognition model based on the rescaled output.
The training device of the embodiment can generate the adjustable scaling vector capable of increasing the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and the Euclidean distance between the feature vectors is increased, namely, the feature vectors are farther away, so that the recognition accuracy of the face recognition model is improved.
The embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores executable instructions, and the processor is configured to execute the steps of the training method for a face recognition model in the foregoing embodiment by executing the executable instructions.
As described above, the electronic device of the present invention can generate an adjustable scaling vector that can increase the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and increase the euclidean distance between the feature vectors, that is, the feature vectors are farther apart, thereby improving the recognition accuracy of the face recognition model.
Fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention, and it should be understood that fig. 6 only schematically illustrates various modules, and these modules may be virtual software modules or actual hardware modules, and the combination, the splitting, and the addition of the remaining modules of these modules are within the scope of the present invention.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "platform".
The electronic device 600 of the present invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
Wherein the storage unit stores a program code, which can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the training method of the face recognition model described in the above embodiments. For example, processing unit 610 may perform the steps shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include programs/utilities 6204 including one or more program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700, and the external devices 700 may be one or more of a keyboard, a pointing device, a bluetooth device, and the like. The external devices 700 enable a user to interactively communicate with the electronic device 600. The electronic device 600 may also be capable of communicating with one or more other computing devices, including routers, modems. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The embodiment of the present invention further provides a computer-readable storage medium for storing a program, and when the program is executed, the steps of the training method for a face recognition model described in the above embodiment are implemented. In some possible embodiments, the various aspects of the present invention may also be implemented in the form of a program product, which includes program code for causing a terminal device to perform the steps of the training method for a face recognition model described in the above embodiments, when the program product is run on the terminal device.
As described above, the computer-readable storage medium of the present invention can generate an adjustable scaling vector that can increase the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and increase the euclidean distance between the feature vectors, that is, the feature vectors are farther apart, thereby improving the recognition accuracy of the face recognition model.
Fig. 7 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device, such as through the internet using an internet service provider.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (11)

1. A training method of a face recognition model is characterized by comprising the following steps:
obtaining the batch processing data volume of the face recognition model and the category number of a training set;
constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number;
adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number;
rescaling the output of the face recognition model using the scaling matrix; and
performing supervised training on the face recognition model based on the rescaled output.
2. The training method of claim 1, wherein the face recognition model is constructed based on a deep convolutional neural network, and the face recognition loss function is an additive angular margin loss function.
3. The training method of claim 1, wherein the step of adjusting the fixed scaling of the face recognition loss function based on the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos θ_j),
wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and θ_j is the included angle between the class center vectors of two adjacent classes, the class center vectors being obtained according to the output of the face recognition model.
4. The training method of claim 1, wherein the step of adjusting the fixed scaling of the face recognition loss function based on the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)),
wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
5. A training method as claimed in claim 3 or 4, wherein the fixed scaling takes the value S = 64, and the included angle between the class center vectors of the two adjacent classes is θ_j = 71.61 ÷ 360 × 2π radians.
6. The training method of claim 1, wherein the step of constructing a 0-1 distribution based on random numbers, and generating the parameter vector having the number of columns as the number of classes comprises:
taking a random seed and sampling 0 and 1 with equal probability to generate the parameter vector; and
registering the parameter vector as a fixed vector.
7. The training method of claim 1, wherein rescaling the output of the face recognition model with the scaling matrix comprises:
altered_s' * cos(θ_yi + m),
wherein altered_s' is the scaling matrix formed by repeating the adjustable scaling vector for the batch processing data amount of rows, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
8. The training method of claim 1, wherein the step of supervised training of the face recognition model based on the rescaled output comprises:
obtaining the prediction probability of the face recognition model through logistic regression Softmax according to the rescaled output; and
calculating a difference value between the prediction probability and the target probability based on a cross entropy loss function, and performing supervised training on the face recognition model until the face recognition model converges on the training set.
9. A training device for a face recognition model is characterized by comprising:
the initial data acquisition module is configured to acquire the batch processing data volume of the face recognition model and the category number of the training set;
the parameter vector generation module is configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories;
a scaling adjustment module configured to adjust a fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain a scaling matrix having a row number as the batch processing data amount and a column number as the category number according to the adjustable scaling vector;
a feature rescaling module configured to rescale the output of the face recognition model using the scaling matrix; and
and the supervision training module is configured to supervise and train the face recognition model based on the rescaled output.
10. An electronic device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the training method of a face recognition model according to any one of claims 1 to 8 via execution of the executable instructions.
11. A computer-readable storage medium storing a program, characterized in that the program, when executed, implements the steps of a training method of a face recognition model according to any one of claims 1 to 8.
CN202010722196.0A 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium Pending CN111860364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010722196.0A CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010722196.0A CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111860364A true CN111860364A (en) 2020-10-30

Family

ID=72950004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010722196.0A Pending CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860364A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329619A (en) * 2020-11-04 2021-02-05 济南博观智能科技有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN113077048A (en) * 2021-04-09 2021-07-06 上海西井信息科技有限公司 Seal matching method, system, equipment and storage medium based on neural network
CN113077048B (en) * 2021-04-09 2023-04-18 上海西井信息科技有限公司 Seal matching method, system, equipment and storage medium based on neural network
CN113076929A (en) * 2021-04-27 2021-07-06 东南大学 Angle allowance self-adaptive face recognition model training method
CN114495243A (en) * 2022-04-06 2022-05-13 第六镜科技(成都)有限公司 Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment

Similar Documents

Publication Publication Date Title
CN111444340B (en) Text classification method, device, equipment and storage medium
CN111860364A (en) Training method and device of face recognition model, electronic equipment and storage medium
CN109952580B (en) Encoder-decoder model based on quasi-cyclic neural network
US10872273B2 (en) System and method for batch-normalized recurrent highway networks
CN112052948B (en) Network model compression method and device, storage medium and electronic equipment
CN112740200B (en) Systems and methods for end-to-end deep reinforcement learning based on coreference resolution
CN111414749A (en) Social text dependency syntactic analysis system based on deep neural network
CN111708871A (en) Dialog state tracking method and device and dialog state tracking model training method
US20230196067A1 (en) Optimal knowledge distillation scheme
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
Naeem et al. Complexity of deep convolutional neural networks in mobile computing
CN114997287A (en) Model training and data processing method, device, equipment and storage medium
CN117121016A (en) Granular neural network architecture search on low-level primitives
CN108475346A (en) Neural random access machine
CN113935396A (en) Manifold theory-based method and related device for resisting sample attack
CN113869005A (en) Pre-training model method and system based on sentence similarity
Yuan et al. Deep learning from a statistical perspective
CN112840358B (en) Cursor-based adaptive quantization for deep neural networks
CN112364198A (en) Cross-modal Hash retrieval method, terminal device and storage medium
CN113762459A (en) Model training method, text generation method, device, medium and equipment
Gurung et al. Decentralized quantum federated learning for metaverse: Analysis, design and implementation
CN116975347A (en) Image generation model training method and related device
US20230022151A1 (en) Full Attention with Sparse Computation Cost
CN114861671A (en) Model training method and device, computer equipment and storage medium
Chung et al. Simplifying deep neural networks for FPGA-like neuromorphic systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination