CN111860364A - Training method and device of face recognition model, electronic equipment and storage medium - Google Patents

Training method and device of face recognition model, electronic equipment and storage medium

Info

Publication number
CN111860364A
Authority
CN
China
Prior art keywords
face recognition
vector
recognition model
scaling
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010722196.0A
Other languages
Chinese (zh)
Inventor
沈涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Computer Technology Shanghai Co Ltd
Original Assignee
Ctrip Computer Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Computer Technology Shanghai Co Ltd filed Critical Ctrip Computer Technology Shanghai Co Ltd
Priority to CN202010722196.0A
Publication of CN111860364A
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of face recognition, and provides a training method and device for a face recognition model, an electronic device and a storage medium. The training method of the face recognition model comprises the following steps: obtaining the batch processing data volume of the face recognition model and the category number of a training set; constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number; adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number; rescaling the output of the face recognition model using the scaling matrix; and performing supervised training on the face recognition model based on the rescaled output. According to the invention, the fixed scaling of the face recognition loss function is adjusted to generate an adjustable scaling vector capable of increasing the distance between the class center vectors, so that the recognition accuracy of the face recognition model is improved.

Description

Training method and device of face recognition model, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of face recognition, in particular to a training method and device of a face recognition model, electronic equipment and a storage medium.
Background
ArcFace (Additive Angular Margin Loss for Deep Face Recognition) is a state-of-the-art technique in the field of face recognition. Its loss function is the additive angular margin loss, and the recognition accuracy of a face recognition model depends mainly on the design of the loss function.
The ArcFace loss improves inter-class separability and strengthens intra-class compactness by adding an angular margin on top of conventional face recognition techniques. However, the radius used for feature scaling is not set reasonably: only a fixed scale is adopted, so that all feature vectors and class center vectors are forced to the same vector length, i.e. they are all scaled onto a hypersphere whose radius equals the fixed scale.
If the lengths of the feature vector and the class center vector are to be preserved after the angular margin is added, the scale should equal the product of the lengths of the feature vector and the class center vector. The feature vector is determined by the input data and the parameters of the convolutional layers, so its length changes constantly; the class center vector is a learnable parameter, so its length also varies. It follows that requiring the product of the lengths of the feature vector and the class center vector to be a constant equal to a fixed scale is not a reasonable constraint.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention and therefore may include information that does not constitute prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present invention provides a training method and apparatus for a face recognition model, an electronic device, and a storage medium, which can generate an adjustable scaling vector capable of increasing a distance between class center vectors by adjusting a fixed scaling of a face recognition loss function, thereby improving a recognition accuracy of the face recognition model.
One aspect of the present invention provides a training method for a face recognition model, comprising the steps of: obtaining the batch processing data volume of the face recognition model and the category number of a training set; constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number; adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number; rescaling the output of the face recognition model using the scaling matrix; and performing supervised training on the face recognition model based on the rescaled output.
In some embodiments, the face recognition model is constructed based on a deep convolutional neural network, and the face recognition loss function is an additive angular margin loss function.
In some embodiments, the step of adjusting the fixed scaling of the face recognition loss function according to the parameter vector comprises: altered_s = S + selected_vector * S * 2 * (1 - cos θ_j), wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and θ_j is the included angle between the class center vectors of two adjacent classes, the class center vectors being obtained according to the output of the face recognition model.
In some embodiments, the step of adjusting the fixed scaling of the face recognition loss function according to the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)), wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the number of categories, and the current feature vector is obtained according to the output of the face recognition model.
In some embodiments, the fixed scaling takes the value S = 64, and the included angle between the class center vectors of the two adjacent classes is θ_j = 71.61 ÷ 360 × 2π radians.
In some embodiments, the step of constructing a 0-1 distribution based on random numbers and generating a parameter vector with the number of columns being the number of categories comprises: taking a random seed and sampling 0 and 1 with equal probability to generate the parameter vector; and registering the parameter vector as a fixed vector.
In some embodiments, rescaling the output of the face recognition model with the scaling matrix comprises: altered_s' * cos(θ_yi + m), wherein altered_s' is the scaling matrix formed by repeating the adjustable scaling vector for the batch processing data amount of rows, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the number of categories, and the current feature vector is obtained according to the output of the face recognition model.
In some embodiments, the step of performing supervised training on the face recognition model based on the rescaled output comprises: obtaining the prediction probability of the face recognition model through logistic regression Softmax according to the rescaled output; and obtaining a difference value between the prediction probability and a target probability based on a cross entropy loss function, and performing supervised training on the face recognition model until the face recognition model converges on the training set.
Another aspect of the present invention provides a training apparatus for a face recognition model, comprising: an initial data acquisition module configured to acquire the batch processing data volume of the face recognition model and the category number of the training set; a parameter vector generation module configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories; a scaling adjustment module configured to adjust a fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain, according to the adjustable scaling vector, a scaling matrix having a row number as the batch processing data volume and a column number as the category number; a feature rescaling module configured to rescale the output of the face recognition model using the scaling matrix; and a supervised training module configured to perform supervised training on the face recognition model based on the rescaled output.
Yet another aspect of the present invention provides an electronic device including: a processor; a memory having stored therein executable instructions of the processor; wherein the processor is configured to perform the steps of the training method of a face recognition model according to any of the above embodiments by executing the executable instructions.
Yet another aspect of the present invention provides a computer-readable storage medium storing a program, wherein the program is configured to implement the steps of the training method of a face recognition model according to any of the above embodiments when executed.
Compared with the prior art, the invention has the beneficial effects that:
by adjusting the fixed scaling of the face recognition loss function, an adjustable scaling vector capable of increasing the distance between class center vectors is generated, and the Euclidean distance between feature vectors is increased, i.e. the feature vectors are farther apart, so that the recognition accuracy of the face recognition model is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating steps of a training method for a face recognition model according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a process of training a face recognition model based on modified ArcFace loss supervision according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating the principle of obtaining an adjustable scaling vector in an embodiment of the present invention;
FIG. 4 shows a schematic comparison of feature distributions of a face recognition model according to an embodiment of the present invention and an existing ArcFace;
FIG. 5 is a block diagram of an apparatus for training a face recognition model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram showing a structure of an electronic apparatus according to an embodiment of the present invention; and
fig. 7 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The step numbers in the following method embodiments are only used for representing different execution contents, and do not limit the logical relationship and execution sequence between the steps.
The invention aims to relieve the unreasonable constraint caused by a fixed scaling, which forces all class center vectors onto a fixed hypersphere with an identical radius. By optimizing the design of these radius lengths, the ability of the face recognition model to distinguish faces is enhanced, the separability of the features is improved, and the recognition accuracy of the face recognition model is improved. The technical idea of the invention is to optimize the fixed scaling in the ArcFace loss into an adjustable scaling vector, so that, in the hypersphere space of the class center vectors of different classes, the single hypersphere with a fixed radius becomes hyperspheres with different radii; the scaling radius differs between classes, and the separability of the features is improved.
Fig. 1 shows the main steps of the training method of the face recognition model in the embodiment, and referring to fig. 1, the training method of the face recognition model in the embodiment mainly includes: in step S110, obtaining batch processing data size of the face recognition model and the number of classes of the training set; in step S120, a 0-1 distribution is constructed based on the random number, and a parameter vector having the number of columns as the number of categories is generated; in step S130, a fixed scaling ratio of the face recognition loss function is adjusted according to the parameter vector to obtain an adjustable scaling vector, and a scaling matrix with rows as a batch data amount and columns as a category number is obtained according to the adjustable scaling vector; in step S140, rescaling the output of the face recognition model by using a scaling matrix; and in step S150, supervising training of the face recognition model based on the rescaled output.
In the above embodiment, the face recognition model is constructed based on a deep convolutional neural network (DCNN), and the face recognition loss function is the additive angular margin loss (ArcFace loss). FIG. 2 shows the process flow of supervised training of the face recognition model based on the improved ArcFace loss. Referring to FIG. 2, in the process of supervised training with the improved ArcFace loss of the present invention, the feature vector x_i and the class center vector w_j are first normalized in process P210, so that subsequent predictions depend only on the angle between the feature and the weight. Then, an additive angular margin penalty is applied in process P220, adding an angular margin value m to the included angle θ between the feature and the weight. Specifically, the included angle between the feature vector x_i and the class center vector w_j is θ_j; cos θ_j is computed, and the arccosine arccos(cos θ_yi) is taken to obtain the angle θ_yi between the feature vector x_i and the target class center vector W_yi; the angular margin value m is then added, so that the angle between the feature vector x_i and the target class center vector W_yi is increased to θ_yi + m. Next, in process P230, the cosine cos(θ_yi + m) is computed and multiplied element-wise with the scaling matrix altered_s', thereby rescaling all logits based on the scaling matrix altered_s'. Finally, in process P240, the logits are fed into the logistic regression Softmax function to obtain the probability of each class, the cross entropy loss is computed with the cross entropy loss function, and the face recognition model is trained under the supervision of this loss.
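Processes P210 and P220 are the standard ArcFace operations; as a hedged illustration only, a minimal PyTorch sketch of these two steps might look as follows, where the tensor names embeddings, weight and labels and the margin value m = 0.5 are assumptions rather than values taken from the patent:

import torch
import torch.nn.functional as F

def cos_with_margin(embeddings, weight, labels, m=0.5):
    # P210: L2-normalize the feature vectors x_i and the class center vectors w_j (columns of weight)
    x = F.normalize(embeddings, dim=1)                    # (batch_size, embedding_size)
    w = F.normalize(weight, dim=0)                        # (embedding_size, class_num)
    cos_theta = (x @ w).clamp(-1.0 + 1e-7, 1.0 - 1e-7)    # cosines of the angles, (batch_size, class_num)
    # P220: add the angular margin m only to each sample's target class y_i
    theta_yi = torch.acos(cos_theta.gather(1, labels.view(-1, 1)))
    cos_out = cos_theta.clone()
    cos_out.scatter_(1, labels.view(-1, 1), torch.cos(theta_yi + m))
    return cos_out                                        # the matrix cos(theta_yi + m) in the text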
The process P210, the process P220, and the process P240 all adopt the existing techniques, and therefore, the description thereof will not be repeated. The process P230, namely obtaining the adjustable scaling vector and rescaling the output of the face recognition model based on the scaling matrix, is described below.
First, the batch data amount batch_size of the face recognition model and the class number class_num of the training set are obtained; both can be obtained according to the output of the face recognition model. In this embodiment, the face recognition model is constructed based on the residual neural network ResNet50, the feature vector x_i of the i-th batch has size (batch_size, embedding_size), W is equivalent to the weight of the fully-connected layer, which is learnable, the size of W is (embedding_size, class_num), and the class center vector w_j has size (embedding_size, 1).
In one specific example, the training set is CASIA-WebFace, the optimizer is SGD with lr = 1e-1, weight_decay = 5e-4 and momentum = 0.9, training uses a P100 GPU, batch_size = 128, and epoch = 100.
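For illustration only, such a training configuration could be set up roughly as follows; the torchvision backbone call and the 512-dimensional embedding size are assumptions, while the SGD hyperparameters are those quoted above:

import torch
from torchvision.models import resnet50

# stand-in backbone: the embodiment builds on ResNet50; the 512-dim embedding size is an assumption
backbone = resnet50(num_classes=512)

# optimizer settings quoted above: SGD, lr=1e-1, weight_decay=5e-4, momentum=0.9
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-1,
                            momentum=0.9, weight_decay=5e-4)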
Then, a 0-1 distribution is constructed based on random numbers to generate a parameter vector selected_vector, and selected_vector is registered as a fixed vector. Specifically, a random seed is taken and 0s and 1s are sampled with equal probability to generate a parameter vector selected_vector of length class_num; this can be implemented, for example, with PyTorch (an open-source Python machine learning library). The parameter vector selected_vector is registered as a buffer, because it does not need to be learned and only needs to remain fixed throughout training.
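A minimal sketch of this step, assuming the improved loss head is implemented as a PyTorch nn.Module and using an arbitrary seed value:

import torch
import torch.nn as nn

class AdjustableScaleHead(nn.Module):
    def __init__(self, class_num, seed=0):
        super().__init__()
        gen = torch.Generator().manual_seed(seed)                # take a random seed
        # draw 0 or 1 with equal probability, one entry per class
        selected = torch.randint(0, 2, (class_num,), generator=gen).float()
        # register as a buffer: fixed for the whole training run, never learned
        self.register_buffer("selected_vector", selected)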
Because it cannot be determined in advance whether two class center vectors are adjacent, a random method is adopted; since the number of classes is large, generally more than 10,000, the random method can still achieve a good effect.
The fixed scaling of the ArcFace loss is then adjusted based on the parameter vector. In one implementation, the adjustable scaling vector altered_s can be obtained by the formula altered_s = S + selected_vector * S * 2 * (1 - cos θ_j). The number of columns of the adjustable scaling vector altered_s is the class number class_num, i.e. its size is (class_num). The fixed scaling S follows the experience of ArcFace and takes the value 64. θ_j is the included angle between the class center vectors of two adjacent classes, e.g. between the class center vector w_i and the class center vector w_j; following the experience in ArcFace, θ_j may take the value 71.61 ÷ 360 × 2π radians. In another implementation, the fixed scaling of the face recognition loss function is adjusted by the formula altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)), where cos(θ_yi + m) is the result after the angular margin has been added, with size (batch_size, class_num). That is, cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; it is a matrix with the number of rows being the batch data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model, i.e. the currently output feature vector x_i. mean(dim=0) means averaging over dimension 0.
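Both variants of the formula can be sketched as below, following the names used in the text (selected_vector, S, θ_j, cos(θ_yi + m)); this is an illustrative reading of the formulas, not the patent's verbatim code:

import math
import torch

def altered_s_fixed_angle(selected_vector, S=64.0, theta_j=71.61 / 360 * 2 * math.pi):
    # variant 1: use the fixed angle theta_j assumed between adjacent class centers
    return S + selected_vector * S * 2 * (1 - math.cos(theta_j))        # shape (class_num,)

def altered_s_from_margin(selected_vector, cos_theta_m, S=64.0):
    # variant 2: average the (batch_size, class_num) matrix cos(theta_yi + m) over dim=0
    return S + selected_vector * S * 2 * (1 - cos_theta_m.mean(dim=0))  # shape (class_num,)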
FIG. 3 illustrates the principle of obtaining the adjustable scaling vector in an embodiment. Referring to FIG. 3, take two adjacent class center vectors w_i and w_j as an example. The class center vector w_i corresponds to ob and the class center vector w_j corresponds to oc. The result of training the face recognition model is to let the feature vector x_i move close to one class and away from another, e.g. close to the class center vector w_i and away from the class center vector w_j. After training, the feature vector x_i is relatively close to the class center vector w_i, so the distance between different classes is ultimately the distance between the class center vectors w_i and w_j, i.e. the distance between ob and oc. Suppose a further class center vector w_k corresponds to oa.
In the original ArcFace, the distance between class center vectors is ab or bc; if ab = bc, only the distance between adjacent class center vectors is considered, and since a and c are not adjacent, the distance between class center vectors is ab or bc. Now extend the vector ob to oe; the distance between adjacent class center vectors becomes ae or ce, and note that a and c also become adjacent, so the distances between adjacent class center vectors are ae, ce or ac. If all three edges are to be equal, and ab = bc has been assumed before, then be = ab or be = bc is needed, so that the distance between all adjacent class center vectors is increased, i.e. the distance between the class center vectors w_i and w_j is increased.
Meanwhile, it is noted that during model training the extracted face features are expressed as feature vectors x_i rather than the class center vectors w_i. However, the average feature vector x_i (i.e. the embedding feature center) is close to the class center vector w_i, and during training the cross entropy loss function also drives the feature vector x_i and the class center vector w_i to become close in length. That is to say, the lengths of the feature vectors x_i of different classes will also differ, so that the distance between the feature vectors x_i of two different faces becomes larger, i.e. the distance between the extracted embedding features becomes larger, which enhances the discriminative power of the model and improves the face recognition effect.
To set different lengths for the class center vectors of different classes, the present embodiment changes the fixed scaling S into an adjustable scaling vector consisting of two different values, where each element of the adjustable scaling vector represents a class. The smaller value can be set from training experience, e.g. 64, and the larger value is obtained by additionally adding the term S * 2 * (1 - cos θ_j) or S * 2 * (1 - cos(θ_yi + m).mean(dim=0)). This gains an additional benefit on top of ArcFace, namely an increase of the distance between class center vectors and, ultimately, of the distance between the feature vectors x_i.
It is further noted that the scaling satisfies S_j = |x_i| * |w_j|: although the objective is to set class center vectors w_j of different lengths, what is actually set is |x_i| * |w_j|; this difference does not affect the effect of the scheme. Specifically, because the ArcFace loss makes the feature vector x_i and the class center vector w_j tend to align, i.e. the angle between the feature vector x_i and the class center vector w_j becomes relatively small, the cross entropy loss function automatically makes the Softmax output of the neural network at the position corresponding to the true label larger, so that the cross entropy loss becomes smaller. Besides reducing the angle between the feature vector x_i and the class center vector w_j, another training direction that makes the Softmax output larger is to adjust the lengths of the feature vector x_i and the class center vector w_j; that is, setting the scaling S_j simultaneously sets |x_i| and |w_j|, so the size of |x_i| is also effectively set in the process.
On the other hand, suppose the lengths of the different column vectors are set specifically when the weight W is initialized: this has no influence on ArcFace, but for the present invention, since setting the lengths of different column vectors is equivalent to a multiplication, and multiplications are interchangeable, the feature rescaling is equivalent to restoring the lengths of the different column vectors of the weight W. In other words, the rescaling can logically be treated as acting only on the column vectors of the weight W and not on the feature x.
After the adjustable scaling vector altered_s is obtained, it is repeated batch_size times, i.e. repeated row by row for batch_size rows, to obtain a scaling matrix altered_s' of size (batch_size, class_num).
Then, the scaling matrix altered_s' and the cosine value cos(θ_yi + m) are multiplied element-wise, completing the rescaling. Specifically, the output of the face recognition model is rescaled by the formula altered_s' * cos(θ_yi + m), where cos(θ_yi + m), as mentioned above, is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; it is a matrix with the number of rows being the batch data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
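Continuing the sketch, the scaling matrix altered_s' and the element-wise rescaling can be formed as follows; the use of unsqueeze/repeat is one assumed way to realize the row repetition described above:

import torch

def rescale_logits(cos_theta_m, altered_s):
    # repeat the (class_num,) vector for batch_size rows -> scaling matrix altered_s'
    altered_s_mat = altered_s.unsqueeze(0).repeat(cos_theta_m.size(0), 1)  # (batch_size, class_num)
    # element-wise product altered_s' * cos(theta_yi + m) gives the rescaled logits
    return altered_s_mat * cos_theta_m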
Finally, the rescaled output is fed into the logistic regression Softmax and the cross entropy loss function to finish the forward propagation. Specifically, the prediction probability of the face recognition model is obtained from the rescaled output through the logistic regression Softmax, and the cross entropy loss is then calculated using the Ground Truth (the real data, i.e. the target probability) as a one-hot vector, i.e. the difference between the prediction probability and the target probability is obtained. The face recognition model is trained under this supervision until it converges on the training set.
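Tying the illustrative helpers above together, one forward pass of the improved loss might read as follows; F.cross_entropy fuses the Softmax and cross entropy steps, and all function names refer to the sketches given earlier in this section:

import torch.nn.functional as F

def improved_arcface_loss(embeddings, weight, labels, selected_vector, S=64.0, m=0.5):
    cos_theta_m = cos_with_margin(embeddings, weight, labels, m=m)        # P210 + P220
    altered_s = altered_s_from_margin(selected_vector, cos_theta_m, S=S)  # adjustable scaling vector
    logits = rescale_logits(cos_theta_m, altered_s)                       # P230
    return F.cross_entropy(logits, labels)                                # P240: Softmax + cross entropy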
Fig. 4 compares the feature distribution of the face recognition model trained by the above embodiment with that of the conventional ArcFace. Referring to Fig. 4, the features of the conventional ArcFace are distributed on a hypersphere 410 with a single radius, while the features of the face recognition model of the present invention are distributed on hyperspheres 420 with different radii, so that the Euclidean distances between the feature vectors and the class center vectors are larger.
In summary, by adjusting the fixed scaling of the face recognition loss function, the present invention generates an adjustable scaling vector capable of increasing the distance between class center vectors, thereby overcoming the drawback of rescaling features to a fixed length. Instead, by rescaling the features of different classes to two different lengths, the distance between the class center vectors w_i and w_j is increased and the Euclidean distance between the feature vectors x_i and x_j becomes larger, i.e. the feature vectors are farther apart, so that the recognition accuracy of the face recognition model is improved.
The invention also provides a training apparatus based on the training method described in the above embodiments, and fig. 5 shows the main modules of the training apparatus of the face recognition model in this embodiment. Referring to fig. 5, the training apparatus 500 for a face recognition model in the present embodiment mainly includes: an initial data acquisition module 510 configured to acquire the batch processing data volume of the face recognition model and the category number of the training set; a parameter vector generation module 520 configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories; a scaling adjustment module 530 configured to adjust the fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain, according to the adjustable scaling vector, a scaling matrix with the number of rows being the batch data amount and the number of columns being the category number; a feature rescaling module 540 configured to rescale the output of the face recognition model using the scaling matrix; and a supervised training module 550 configured to perform supervised training on the face recognition model based on the rescaled output.
The training device of the embodiment can generate the adjustable scaling vector capable of increasing the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and the Euclidean distance between the feature vectors is increased, namely, the feature vectors are farther away, so that the recognition accuracy of the face recognition model is improved.
The embodiment of the present invention further provides an electronic device, which includes a processor and a memory, where the memory stores executable instructions, and the processor is configured to execute the steps of the training method for a face recognition model in the foregoing embodiment by executing the executable instructions.
As described above, the electronic device of the present invention can generate an adjustable scaling vector that can increase the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and increase the euclidean distance between the feature vectors, that is, the feature vectors are farther apart, thereby improving the recognition accuracy of the face recognition model.
Fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present invention, and it should be understood that fig. 6 only schematically illustrates various modules, and these modules may be virtual software modules or actual hardware modules, and the combination, the splitting, and the addition of the remaining modules of these modules are within the scope of the present invention.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module" or "platform".
The electronic device 600 of the present invention is described below with reference to fig. 6. The electronic device 600 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
Wherein the storage unit stores a program code, which can be executed by the processing unit 610, so that the processing unit 610 performs the steps of the training method of the face recognition model described in the above embodiments. For example, processing unit 610 may perform the steps shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include programs/utilities 6204 including one or more program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700, and the external devices 700 may be one or more of a keyboard, a pointing device, a bluetooth device, and the like. The external devices 700 enable a user to interactively communicate with the electronic device 600. The electronic device 600 may also be capable of communicating with one or more other computing devices, including routers, modems. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The embodiment of the present invention further provides a computer-readable storage medium for storing a program, and when the program is executed, the steps of the training method for a face recognition model described in the above embodiment are implemented. In some possible embodiments, the various aspects of the present invention may also be implemented in the form of a program product, which includes program code for causing a terminal device to perform the steps of the training method for a face recognition model described in the above embodiments, when the program product is run on the terminal device.
As described above, the computer-readable storage medium of the present invention can generate an adjustable scaling vector that can increase the distance between the class center vectors by adjusting the fixed scaling of the face recognition loss function, and increase the euclidean distance between the feature vectors, that is, the feature vectors are farther apart, thereby improving the recognition accuracy of the face recognition model.
Fig. 7 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 7, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device, such as through the internet using an internet service provider.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (11)

1. A training method of a face recognition model is characterized by comprising the following steps:
obtaining the batch processing data volume of the face recognition model and the category number of a training set;
constructing a 0-1 distribution based on random numbers, and generating a parameter vector with the column number as the category number;
adjusting a fixed scaling of a face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtaining, according to the adjustable scaling vector, a scaling matrix with the row number as the batch processing data volume and the column number as the category number;
rescaling the output of the face recognition model using the scaling matrix; and
performing supervised training on the face recognition model based on the rescaled output.
2. The training method of claim 1, wherein the face recognition model is constructed based on a deep convolutional neural network, and the face recognition loss function is an additive angular margin loss function.
3. The training method of claim 1, wherein the step of adjusting the fixed scaling of the face recognition loss function based on the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos θ_j),
wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and θ_j is the included angle between the class center vectors of two adjacent classes, the class center vectors being obtained according to the output of the face recognition model.
4. The training method of claim 1, wherein the step of adjusting the fixed scaling of the face recognition loss function based on the parameter vector comprises:
altered_s = S + selected_vector * S * 2 * (1 - cos(θ_yi + m).mean(dim=0)),
wherein altered_s is the adjustable scaling vector, the number of columns of the adjustable scaling vector is the number of categories, S is the fixed scaling, selected_vector is the parameter vector, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
5. A training method as claimed in claim 3 or 4, wherein the fixed scaling takes the value S = 64, and the included angle between the class center vectors of the two adjacent classes is θ_j = 71.61 ÷ 360 × 2π radians.
6. The training method of claim 1, wherein the step of constructing a 0-1 distribution based on random numbers, and generating the parameter vector having the number of columns as the number of classes comprises:
taking a random seed and sampling 0 and 1 with equal probability to generate the parameter vector; and
registering the parameter vector as a fixed vector.
7. The training method of claim 1, wherein rescaling the output of the face recognition model with the scaling matrix comprises:
altered_s' * cos(θ_yi + m),
wherein altered_s' is the scaling matrix formed by repeating the adjustable scaling vector for the batch processing data amount of rows, and cos(θ_yi + m) is the cosine of the sum of the included angle between the current feature vector and the target class center vector and the angular margin value; the cosine is a matrix with the number of rows being the batch processing data amount and the number of columns being the class number, and the current feature vector is obtained according to the output of the face recognition model.
8. The training method of claim 1, wherein the step of supervised training of the face recognition model based on the rescaled output comprises:
obtaining the prediction probability of the face recognition model through logistic regression Softmax according to the rescaled output; and
calculating a difference value between the prediction probability and the target probability based on a cross entropy loss function, and performing supervised training on the face recognition model until the face recognition model converges on the training set.
9. A training device for a face recognition model is characterized by comprising:
the initial data acquisition module is configured to acquire the batch processing data volume of the face recognition model and the category number of the training set;
the parameter vector generation module is configured to construct a 0-1 distribution based on random numbers and generate a parameter vector with the number of columns being the number of categories;
a scaling adjustment module configured to adjust a fixed scaling of the face recognition loss function according to the parameter vector to obtain an adjustable scaling vector, and obtain a scaling matrix having a row number as the batch processing data amount and a column number as the category number according to the adjustable scaling vector;
a feature rescaling module configured to rescale the output of the face recognition model using the scaling matrix; and
and the supervision training module is configured to supervise and train the face recognition model based on the rescaled output.
10. An electronic device, comprising:
a processor;
a memory having stored therein executable instructions of the processor;
wherein the processor is configured to perform the steps of the training method of a face recognition model according to any one of claims 1 to 8 via execution of the executable instructions.
11. A computer-readable storage medium storing a program, characterized in that the program, when executed, implements the steps of a training method of a face recognition model according to any one of claims 1 to 8.
CN202010722196.0A 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium Pending CN111860364A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010722196.0A CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010722196.0A CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111860364A true CN111860364A (en) 2020-10-30

Family

ID=72950004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010722196.0A Pending CN111860364A (en) 2020-07-24 2020-07-24 Training method and device of face recognition model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860364A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329619A (en) * 2020-11-04 2021-02-05 济南博观智能科技有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN113077048A (en) * 2021-04-09 2021-07-06 上海西井信息科技有限公司 Seal matching method, system, equipment and storage medium based on neural network
CN113077048B (en) * 2021-04-09 2023-04-18 上海西井信息科技有限公司 Seal matching method, system, equipment and storage medium based on neural network
CN113076929A (en) * 2021-04-27 2021-07-06 东南大学 Angle allowance self-adaptive face recognition model training method
CN114495243A (en) * 2022-04-06 2022-05-13 第六镜科技(成都)有限公司 Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment

Similar Documents

Publication Publication Date Title
CN111444340B (en) Text classification method, device, equipment and storage medium
CN111860364A (en) Training method and device of face recognition model, electronic equipment and storage medium
CN109952580B (en) Encoder-decoder model based on quasi-cyclic neural network
US10872273B2 (en) System and method for batch-normalized recurrent highway networks
CN112052948B (en) Network model compression method and device, storage medium and electronic equipment
CN112740200B (en) Systems and methods for end-to-end deep reinforcement learning based on coreference resolution
CN111414749A (en) Social text dependency syntactic analysis system based on deep neural network
CN111708871A (en) Dialog state tracking method and device and dialog state tracking model training method
US20230196067A1 (en) Optimal knowledge distillation scheme
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
Naeem et al. Complexity of deep convolutional neural networks in mobile computing
CN114997287A (en) Model training and data processing method, device, equipment and storage medium
CN117121016A (en) Granular neural network architecture search on low-level primitives
CN108475346A (en) Neural random access machine
CN113935396A (en) Manifold theory-based method and related device for resisting sample attack
CN113869005A (en) Pre-training model method and system based on sentence similarity
Yuan et al. Deep learning from a statistical perspective
CN112840358B (en) Cursor-based adaptive quantization for deep neural networks
CN112364198A (en) Cross-modal Hash retrieval method, terminal device and storage medium
CN113762459A (en) Model training method, text generation method, device, medium and equipment
Gurung et al. Decentralized quantum federated learning for metaverse: Analysis, design and implementation
CN116975347A (en) Image generation model training method and related device
US20230022151A1 (en) Full Attention with Sparse Computation Cost
CN114861671A (en) Model training method and device, computer equipment and storage medium
Chung et al. Simplifying deep neural networks for FPGA-like neuromorphic systems

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination