CN117557888B - Face model training method and face recognition method based on metric learning

Info

Publication number: CN117557888B (earlier publication: CN117557888A)
Application number: CN202410046717.3A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: face, face feature, training, value, loss
Inventors: 郑文先, 杨文明, 廖庆敏
Applicant and current assignee: Shenzhen International Graduate School of Tsinghua University
Legal status: Active (application granted)

Classifications

    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/761 — Proximity, similarity or dissimilarity measures
    • G06V 40/168 — Human faces: feature extraction; face representation
    • G06V 40/171 — Human faces: local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06N 3/0464 — Neural network architectures: convolutional networks [CNN, ConvNet]
    • G06N 3/084 — Learning methods: backpropagation, e.g. using gradient descent
    • G06N 3/048 — Neural network architectures: activation functions


Abstract

A metric-learning-based face model training method and a face recognition method. The training process includes an iterative weight-parameter update procedure with three stages. In the first training stage, training uses full-precision floating-point weight parameters and full-precision floating-point face feature pairs. In the second training stage, training uses fixed-point weight parameters and full-precision floating-point face feature pairs: the full-precision floating-point weight parameters are fixed-pointed while the output face feature pairs remain full-precision floating-point numbers; the sample face pairs are processed with the fixed-point weight parameters to obtain the face feature pairs, the gradients required by the back-propagation algorithm are calculated from the loss values of the second-stage face feature pairs, and back-propagation with those gradients updates the weight parameters of the face model to be trained. In the third training stage, training uses fixed-point weight parameters and fixed-point face feature pairs. The invention avoids the precision loss caused by converting a trained model.

Description

Face model training method and face recognition method based on metric learning
Technical Field
The invention relates to deep learning and face recognition technology, and in particular to a metric-learning-based face model training method and face recognition method.
Background
A new round of technological and industrial revolution is currently emerging: the accumulation of big data, innovations in theory and algorithms, growth in computing power and the evolution of network infrastructure have driven artificial intelligence into a new stage, and intelligence has become an important direction for the development of technology and industry. Face recognition is in high demand across many application fields, such as public security, finance, city management and retail, and has very broad application scenarios. Over several decades of development it has passed in turn through a geometric-feature recognition stage, a principal component analysis (PCA) stage, and a probabilistic PCA and kernel PCA stage, and has finally entered the stage of deep-learning-based face recognition.
After face recognition entered the deep-learning stage, key indicators such as accuracy and recall improved greatly, allowing the technology to be applied in many more scenarios. Problems remain in practice, however, especially in city-scale deployments: algorithm accuracy and algorithm computation cost. Accuracy is the precondition for applying face algorithms at large scale; in public safety scenarios, civilian scenarios such as communities and campuses, financial payment and similar scenarios, the required accuracy keeps rising, with some scenarios demanding more than 99.9%. Computation cost directly determines how much economic and social benefit a face algorithm can generate in industrial application under a fixed investment scale, and continuously reducing it is an important driving force for deploying face algorithms widely across industries and down to county-level regions in most of the country. The traditional way to optimise computation cost is to train the face model on a server, fixed-point the trained face model, and deploy the fixed-point model on embedded devices to match their low compute and storage capacity. However, fixed-pointing an already trained face model loses part of its precision, reduces the accuracy of the face model, and in turn reduces the accuracy of face recognition.
It should be noted that the information disclosed in this background section is provided only to aid understanding of the background of the present application, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The main object of the invention is to overcome the drawbacks described in the background above and to provide a face model training method and a face recognition method based on metric learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
a face model training method based on metric learning comprises the following steps:
s1, constructing a data set: the data set comprises a sample face pair, wherein the sample face pair comprises a first sample face image and a second sample face image;
s2, constructing a face model to be trained: the constructed face model to be trained is a deep convolutional neural network model; the initial face model to be trained is a full-precision floating point number model;
s3, training process: extracting a face feature pair of the sample face pair through weight parameters in a feature extraction layer, wherein the face feature pair comprises a first face feature corresponding to a first sample face image and a second face feature corresponding to a second sample face image; calculating a measurement distance between the first face feature and the second face feature, and calculating a loss value of the face feature pair through a preset loss function; back propagation processing is carried out on the loss value of the face feature pair to update the weight parameter of the face model to be trained;
The training process comprises an updating process of iteration weight parameters, and the updating process of the iteration weight parameters comprises the following steps:
s41, training based on weight parameters of the full-precision floating point number and face feature pairs of the full-precision floating point number in a first training stage;
s42, in a second training stage, training based on the fixed-point weight parameters and full-precision floating-point face feature pairs; the full-precision floating-point weight parameters are fixed-pointed either in a single fixed step or stepwise, where fixed one-step fixed-pointing converts the full-precision weight parameters into fixed-point weight parameters with a preset number of digits after the decimal point, and stepwise fixed-pointing reduces the number of decimal digits of the full-precision weight parameters successively from the maximum down to the minimum number of digits to obtain the final fixed-point weight parameters; the weight parameters are fixed-pointed while the output face feature pairs are kept as full-precision floating-point numbers, the sample face pairs are processed with the fixed-point weight parameters to obtain the second-stage face feature pairs, the gradients required by the back-propagation algorithm are calculated from the loss values of the second-stage face feature pairs, and back-propagation with those gradients updates the weight parameters of the face model to be trained;
S43, training based on the fixed-point weight parameters and the fixed-point face feature pairs in a third training stage.
Further:
the step S41 specifically includes the following steps:
performing feature extraction on the first sample face image and the second sample face image through a feature extraction layer of the full-precision floating point number to obtain a first face feature corresponding to the first sample face image and a second face feature corresponding to the second sample face image under the full-precision floating point number;
calculating a measurement distance between the first face feature and the second face feature;
according to the measurement distance between the first face feature and the second face feature, calculating a loss value of the face feature pair through a preset loss function;
and calculating the gradient of the face model to be trained according to the loss value of the face feature pair, and updating the weight parameter of the face model to be trained based on the gradient of the face model to be trained.
The step S42 specifically includes the following steps:
s421, acquiring gradients of the current iteration times, and determining a reference gradient value in each gradient of the current iteration times; the method comprises the steps of calculating to obtain gradients of all layers in the current iteration times according to loss values of face feature pairs in the current iteration times, searching a maximum value or calculating a global average value in the gradients of all layers, and determining the maximum value or the global average value as a reference gradient;
S422, comparing the reference gradient value with a first gradient threshold value, and determining whether to enter a first training stage or a second training stage according to a comparison result; in the second training stage, the weight parameters of the full-precision floating point number are subjected to fixed-point treatment, and the sample face pairs are processed through the fixed-point weight parameters so as to obtain face feature pairs; calculating a loss value of the face feature pair in the second training stage, and calculating a gradient according to the loss value; carrying out fixed-point treatment on the gradient, and then carrying out back propagation by utilizing the fixed-point gradient to update the weight parameters of the face model to be trained;
s423, comparing the reference gradient value with a second gradient threshold value, if the reference gradient value is larger than the second gradient threshold value, continuing the second training phase, and if the reference gradient value is smaller than or equal to the second gradient threshold value, entering a third training phase; wherein the second gradient threshold is less than the first gradient threshold.
Step S422 specifically includes the following steps:
s4221, carrying out fixed-point treatment on the weight parameters of the full-precision floating point number, keeping the output of the face feature pairs to be the full-precision floating point number, and processing the sample face pairs through the fixed-point weight parameters to obtain the face feature pairs in the second training stage;
S4222, calculating a loss value of the face feature pair in the second training stage through the loss function;
s4223, calculating the precision loss between the face feature pair of the second training stage in the current iteration number and the face feature pair in the previous iteration number;
s4224, calculating a total loss from the loss value of the second-stage face feature pair and the precision loss between the second-stage face feature pair at the current iteration and the face feature pair at the previous iteration, and adjusting the weight parameters of the face recognition model to be trained by a back-propagation algorithm with minimising the total loss as the optimisation target;
s4225, iterating the weight parameter adjustment process in the second training phase until entering a third training phase; under the condition that the face feature pair is kept to be the full-precision floating point number, the fixed weight parameters are adjusted, so that the face feature pair of the full-precision floating point number can be output under the condition that the weight parameters are fixed by the face recognition model.
In step S4223, the face feature pair at the previous iteration is the face feature pair from the first training stage or from the second training stage; the maximum precision value or average precision value of the second-stage face feature pair at the current iteration is determined, and the maximum precision value or average precision value of the face feature pair at the previous iteration is determined, where the maximum precision value is the maximum number of mantissa digits in the full-precision floating-point number and the average precision value is the average number of mantissa digits in the full-precision floating-point number;
The precision loss $P_{loss}$ is calculated by the following formula:

$$P_{loss} = \frac{A_{t-1} - A_t}{A_{max}}$$

where $A_t$ is the maximum precision value or average precision value of the second-stage face feature pair at the current iteration, $A_{t-1}$ is the maximum or average precision value of the face feature pair at the previous iteration, and $A_{max}$ is the maximum precision value of the face feature pairs in the first training stage.
The step S4223 specifically includes the following steps:
s42231, the weight parameter localization mode is gradual localization, after the precision loss between the face feature pair in the second training stage in the current iteration number and the face feature pair in the previous iteration number is calculated, if the precision loss is smaller than or equal to a preset precision loss threshold value, the number of bits after decimal points of the weight parameter subjected to localization is reduced by a preset number of bits; if the precision loss is larger than a preset precision loss threshold, the number of bits after decimal points of the fixed-point weight parameters is maintained;
s42232, stopping the stepwise localization of the localized weight parameter when the number of bits after the decimal point of the localized weight parameter decreases to the minimum number of bits.
The precision loss threshold is set dynamically. The threshold Pth is computed from the following quantities: $B_{min}$, the minimum number of digits after the decimal point in the fixed-point weight parameters; $B_{max}$, the maximum number of such digits; $b$, the current number of decimal digits of the fixed-point weight parameters; $n_b$, the number of iterations at the current decimal position; and $n_i$, the number of iterations at the i-th decimal position in the second training phase.
In step S4224, the total loss is calculated according to the following equation:

$$L_{total} = \lambda \cdot L_d + (1-\lambda) \cdot P_{loss}$$

where $L_{total}$ is the total loss, $L_d$ is the loss value of the face feature pair, $P_{loss}$ is the precision loss and $\lambda$ is the weighting coefficient; $\lambda$ takes values in the range [0.5, 0.9] and is positively correlated with the number of iterations at the current decimal position in the second training phase, becoming larger as the iterations at the current decimal position increase.
The step S43 specifically includes the following steps:
s431, fixing the fixed weight parameters obtained in the second training stage, and adding Lp norm operation in the active layer to obtain an LP active layer;
s432, processing the face feature pair from the previous layer through the LP activation layer to obtain the fixed-point face feature pair; an LpActivation operation is added to the output of every activation-function layer in the face recognition model to be trained, the maximum absolute value of the face feature pair output by the last layer is recorded and taken as the fixed-point face feature pair, thereby fixed-pointing the face feature pair, or the face feature pair is fixed-pointed stepwise through the LP activation layer;
S433, calculating a loss value of the face feature pair after the localization, taking the minimum loss value of the face feature pair after the localization as an optimization target, and updating LP activation layer parameters in the face model to be trained through a back propagation algorithm; iterating the updating process of the LP activation layer parameters until the iteration times reach the preset times, and stopping training to obtain a trained face recognition model; in the process of updating the iterative LP activation layer parameters, if the face feature pairs are stepwise spotted, the face feature pairs and/or gradients are also continuously spotted until training is stopped.
A face recognition method, comprising the steps of:
acquiring a face image to be identified;
inputting the face image to be recognized into the face model trained by the above metric-learning-based face model training method to perform feature extraction processing;
comparing the face features of the extracted face image to be identified with the face features in the base;
and obtaining the recognition result of the face image to be recognized according to the face comparison result.
The invention has the following beneficial effects:
The invention provides a metric-learning-based face model training method that progressively fixed-points the face model during training. Under the constraint of maintaining the face model's accuracy, the amount of parameter data is gradually reduced, and a fixed-point face model is obtained when training finishes, which makes it convenient to deploy the face model on embedded devices. The invention therefore avoids the problem of the traditional fixed-pointing scheme, in which fixed-pointing an already trained face model loses part of its precision, reduces the model's accuracy and in turn lowers the face recognition accuracy.
By directly training a fixed-point neural network suited to running on embedded devices, the limited expressive power of fixed-point numbers is taken into account throughout the training process, and the precision loss caused by model conversion is avoided.
Other advantages of embodiments of the present invention are further described below.
Detailed Description
The following describes embodiments of the present invention in detail. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the invention or its applications.
Metric learning: in the model training process, the distance between the positive sample pairs is made as small as possible, and the distance between the negative sample pairs is made as large as possible.
Positive sample pair: a pair of images of the same type, for example two images of the same face.
Negative sample pair: a pair of images of different types, for example two images of different faces.
Fixed-pointing (localization): converting a floating-point number into a fixed-point number that keeps a fixed number of digits after the decimal point, for example three digits; this reduces the data volume of the model parameters at the cost of part of the model's precision.
Floating-point number: $\pm d.dd\ldots d \times \beta^{e}$, where $d.dd\ldots d$ is the mantissa, $\beta$ is the radix and $e$ is the exponent. For example, 123.45 may be written as $12.345 \times 10^{1}$, $0.12345 \times 10^{3}$ or $1.2345 \times 10^{2}$; in $12.345 \times 10^{1}$ the mantissa is 12.345, the radix is 10 and the exponent is 1, and in $0.12345 \times 10^{3}$ the mantissa is 0.12345, the radix is 10 and the exponent is 3. The number of mantissa digits represents the precision of the floating-point number. In the full-precision floating-point numbers used here, the digits to the right of the decimal point are non-zero and the digit to the left of the decimal point is 1 or 0, with 1 denoting a positive number and 0 a negative number.
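As an illustration of the fixed-pointing defined above, the following sketch (in Python, with a hypothetical helper name; the patent itself prescribes no code) rounds full-precision values to a preset number of digits after the decimal point:

```python
import numpy as np

def fix_point(values: np.ndarray, decimal_bits: int) -> np.ndarray:
    """Keep a preset number of digits after the decimal point
    (hypothetical helper illustrating the fixed-pointing described above)."""
    scale = 10.0 ** decimal_bits
    return np.round(values * scale) / scale

w = np.array([0.1234567, -1.9876543])
print(fix_point(w, 3))  # [ 0.123 -1.988]: three digits kept after the decimal point
```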
The embodiment of the invention provides a face model training method based on metric learning, which comprises the steps of constructing a data set, constructing a face model to be trained and a training process;
s1, constructing a data set: the data set comprises a sample face pair, wherein the sample face pair comprises a first sample face image and a second sample face image;
s2, constructing a face model to be trained: the constructed face model to be trained is a deep convolutional neural network model; the initial face model to be trained is a full-precision floating point number model;
s3, training process: extracting a face feature pair of the sample face pair through weight parameters in a feature extraction layer, wherein the face feature pair comprises a first face feature corresponding to a first sample face image and a second face feature corresponding to a second sample face image; calculating a measurement distance between the first face feature and the second face feature, and calculating a loss value of the face feature pair through a preset loss function; back propagation processing is carried out on the loss value of the face feature pair to update the weight parameter of the face model to be trained;
The training process comprises an updating process of iteration weight parameters, and the updating process of the iteration weight parameters comprises the following steps:
s41, training based on weight parameters of the full-precision floating point number and face feature pairs of the full-precision floating point number in a first training stage;
s42, in a second training stage, training based on the fixed-point weight parameters and full-precision floating-point face feature pairs; the full-precision floating-point weight parameters are fixed-pointed either in a single fixed step or stepwise, where fixed one-step fixed-pointing converts the full-precision weight parameters into fixed-point weight parameters with a preset number of digits after the decimal point, and stepwise fixed-pointing reduces the number of decimal digits of the full-precision weight parameters successively from the maximum down to the minimum number of digits to obtain the final fixed-point weight parameters; the weight parameters are fixed-pointed while the output face feature pairs are kept as full-precision floating-point numbers, the sample face pairs are processed with the fixed-point weight parameters to obtain the face feature pairs, the gradients required by the back-propagation algorithm are calculated from the loss values of the face feature pairs, and back-propagation with those gradients updates the weight parameters of the face model to be trained;
S43, training based on the fixed-point weight parameters and the fixed-point face feature pairs in a third training stage.
The embodiment of the invention also provides a face recognition method, which comprises the following steps:
acquiring a face image to be identified;
inputting the face image to be recognized into the face model trained by the above metric-learning-based face model training method to perform feature extraction processing;
comparing the face features of the extracted face image to be identified with the face features in the base;
and obtaining the recognition result of the face image to be recognized according to the face comparison result.
Specific embodiments of the present invention are described further below.
A face model training method based on metric learning adopts metric learning to train a face model to be trained, and the method specifically comprises the following steps: constructing a data set, constructing a face model to be trained and a training process.
S1, constructing a data set: the data set includes a sample face pair including a first sample face image and a second sample face image, the sample face pair being divisible into a positive sample face pair including the first sample face image and the second sample face image labeled as the same person and a negative sample face pair including the first sample face image and the second sample face image labeled as different persons.
S2, constructing a face model to be trained: the face model to be trained is a deep convolutional neural network model and comprises a feature extraction layer, an activation layer and an output layer, wherein each feature extraction layer is connected with the activation layer. The initial face model to be trained is a full-precision floating point number model.
S3, training process: in the training process, a face feature pair of a sample face pair is extracted through weight parameters in a feature extraction layer, the face feature pair comprises a first face feature corresponding to a first sample face image and a second face feature corresponding to a second sample face image, a measurement distance between the first face feature and the second face feature is calculated, a loss value of the face feature pair is calculated through a preset loss function, and back propagation processing is carried out by utilizing the loss value of the face feature pair so as to update the weight parameters of a face model to be trained.
The preliminary training process specifically includes the following steps S31 to S34:
s31, performing feature extraction on the first sample face image and the second sample face image through a feature extraction layer of the full-precision floating point number to obtain a first face feature corresponding to the first sample face image and a second face feature corresponding to the second sample face image under the full-precision floating point number.
S32, calculating the metric distance between the first face feature and the second face feature; the metric distance may be the Euclidean distance or the cosine distance. The larger the metric distance, the smaller the probability that the first and second face features belong to the same person, and the smaller the metric distance, the larger that probability. Denoting the first face feature by a and the second face feature by b, the metric distance between them is written $d_{a,b}$.
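A minimal sketch of the two metric distances mentioned above (the function names are assumptions; either distance may serve as $d_{a,b}$):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Euclidean metric distance between two face feature vectors
    return float(np.linalg.norm(a - b))

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # cosine distance = 1 - cosine similarity; smaller means more similar
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([0.20, 0.80, 0.10])
b = np.array([0.25, 0.75, 0.12])
print(euclidean_distance(a, b), cosine_distance(a, b))
```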
S33, calculating a loss value of the face feature pair through a preset loss function according to the measurement distance between the first face feature and the second face feature. For the sample face pair, the larger the measurement distance is, the more dissimilar the first sample face image and the second sample face image in the sample face pair are, and the smaller the measurement distance is, the more similar the first sample face image and the second sample face image in the sample face pair are.
Specifically, the loss function may be expressed by the following equation:

$$L = y \cdot d_{a,b} + (1-y)\cdot \max(m - d_{a,b},\ 0) \qquad (1)$$

where $L$ is the loss value of the face feature pair, $d_{a,b}$ is the metric distance between the first face feature a and the second face feature b, and $y$ is the label of the sample face pair: when $y$ is 1 the sample face pair is a positive sample face pair, and when $y$ is 0 it is a negative sample face pair. $m$ is a preset metric threshold used to measure whether the distance of a positive pair is small enough or the distance of a negative pair is large enough; it may be set empirically, for example to 0.35-0.4. For a positive sample face pair $y$ is 1 and equation (1) gives a loss of $d_{a,b}$; for a negative sample face pair $y$ is 0 and equation (1) gives $\max(m - d_{a,b}, 0)$, a clamp whose value is 0 when $d_{a,b}$ is greater than $m$ and equals $m - d_{a,b}$ when $d_{a,b}$ is less than $m$.
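A direct transcription of equation (1) as a small Python function (a sketch; the default threshold 0.4 is one of the empirical settings mentioned above):

```python
def pair_loss(d_ab: float, y: int, m: float = 0.4) -> float:
    """Loss of a face feature pair per equation (1):
    positive pair (y=1) -> d_ab; negative pair (y=0) -> max(m - d_ab, 0)."""
    return y * d_ab + (1 - y) * max(m - d_ab, 0.0)

print(pair_loss(0.10, y=1))  # positive pair: loss equals the metric distance, 0.10
print(pair_loss(0.10, y=0))  # negative pair too close: 0.4 - 0.10 = 0.30
print(pair_loss(0.55, y=0))  # negative pair already far enough: 0.0
```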
S34, calculating the gradient of the face model to be trained according to the loss value of the face feature pair, and updating the weight parameters of the face model to be trained based on that gradient. The weight parameters are those of the feature extraction layers. The gradient is the gradient of the weight parameters of a given layer of the face model to be trained; it takes the form of a matrix in which each entry is the derivative of the loss with respect to the weight on one connection of that layer, and this matrix may also be called the Jacobian matrix. Writing the weight parameters of the layer as W, the gradient has the matrix form

$$\frac{\partial L}{\partial W} = \left[\frac{\partial L}{\partial w_{ij}}\right]_{ij}$$

where $w_{ij}$ is the weight parameter on a particular connection, $L$ is the loss value, and the gradient $\partial L / \partial b$ of a bias parameter $b$ is obtained in the same way.
In the back propagation process, each time a node is encountered, the calculated upstream gradient (the gradient calculated from the back) is multiplied by the local gradient of the node according to the chain derivative rule, and the gradient of the loss function value on the node weight parameter can be obtained. After the gradient of each layer is obtained, the weight parameters of the model to be trained are updated through the gradient of each layer.
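A numeric toy example of the chain rule described above (all values are made up; the node computes z = w·x):

```python
# The upstream gradient dL/dz arrives from the layers behind the node; the local
# gradient of z = w * x with respect to w is x. Their product is dL/dw,
# which is then used for a gradient-descent update of the weight parameter.
x, w = 2.0, 0.5
upstream = 1.5              # dL/dz, computed from downstream layers
local = x                   # dz/dw
grad_w = upstream * local   # dL/dw = dL/dz * dz/dw = 3.0
w = w - 0.1 * grad_w        # updated weight parameter: 0.2
print(grad_w, w)
```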
The training process comprises the following updating process of iteration weight parameters:
in the training process, through the updating process of the iteration weight parameters, the training is stopped until the iteration times reach the preset times, and a trained face recognition model is obtained. And in the updating process of the iteration weight parameters, the weight parameters and the characteristic values are constantly subjected to fixed-point treatment until training is stopped. The updating process of the weight parameters comprises forward calculation of the face feature pairs and reverse calculation of the weight parameters, wherein the forward calculation is used for obtaining the face feature pairs, and the measurement calculation and the loss calculation are carried out to obtain the loss values of the face feature pairs, and the reverse calculation is used for carrying out parameter updating on the face recognition model to be trained by taking the minimized loss values as optimization targets so that the calculated loss values are minimum when the face recognition model is subjected to the forward calculation.
The updating process of the iteration weight parameter includes steps S41, S42 and S43, specifically includes:
s41, in a first training stage, training can be performed based on the weight parameters of the full-precision floating point number and the face feature pairs of the full-precision floating point number. With specific reference to steps S31-S34.
S42, in a second training stage, training can be performed based on the fixed-point weight parameters and the face feature pairs of the full-precision floating point number. In this step, the weighting parameters of the full-precision floating point number need to be fixed-point, or stepwise fixed-point. Fixed localization can be understood as a localization weight parameter that converts the weight parameter of a full precision floating point number into a preset decimal point number of digits. Stepwise localization can be understood as sequentially reducing the weight parameters of the full-precision floating point number from the maximum decimal point number until the minimum decimal point number is reached, thereby obtaining the final localization weight parameters.
Step S42 specifically includes steps S421, S422, and S423:
s421, acquiring gradients of the current iteration times, and determining a reference gradient value in each gradient of the current iteration times. Specifically, the gradient of each layer in the current iteration number can be calculated according to the loss value of the face feature pair in the current iteration number, the maximum value is found out from the gradients of each layer or the global average value is calculated, and the maximum value or the global average value is determined as the reference gradient.
S422, comparing the reference gradient value with a preset first gradient threshold; if the reference gradient value is greater than the first gradient threshold, the first training stage continues, and if it is less than or equal to the first gradient threshold, the second training stage begins. A reference gradient above the first threshold indicates that training has not yet met the expectation of the first stage: early in training the model to be trained has learned little from the sample face pairs and its gradients are large, while later in training it has learned much more, the gradients become much smaller and finally approach 0 as training nears completion. The first gradient threshold may be set empirically, for example to a value in the range 0.3-0.5.
In the second training stage, the full-precision floating-point weight parameters are fixed-pointed while the output face feature pairs are kept as full-precision floating-point numbers. The sample face pairs are processed with the fixed-point weight parameters to obtain the second-stage face feature pairs, the loss value of these face feature pairs is computed with the loss function, the gradients required by the back-propagation algorithm are computed from that loss value, the gradients are fixed-pointed, and the fixed-point gradients are used for back-propagation to update the weight parameters of the face model to be trained. The gradients may be fixed-pointed in the same way as the weight parameters: at each back-propagation the current number of decimal digits of the fixed-point weight parameters is determined, and the gradients are fixed-pointed to that same number of digits. Because the gradients are used only during back-propagation in training, they may also be left unquantised; fixed-pointing them, however, can improve the generalisation ability of the face recognition model.
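One possible shape of a second-stage iteration, written as a PyTorch sketch under the assumptions that `model` maps an image batch to features, `loss_fn` implements the pair loss above, and fixed-pointing means rounding to a number of decimal digits; none of these names come from the patent:

```python
import torch

def quantize(t: torch.Tensor, decimal_bits: int) -> torch.Tensor:
    # keep a fixed number of digits after the decimal point
    scale = 10.0 ** decimal_bits
    return torch.round(t * scale) / scale

def second_stage_step(model, img_a, img_b, labels, loss_fn, lr=1e-3, decimal_bits=15):
    with torch.no_grad():
        for p in model.parameters():              # fixed-point the weight parameters
            p.copy_(quantize(p, decimal_bits))
    feat_a, feat_b = model(img_a), model(img_b)   # face feature pair stays full precision
    loss = loss_fn(feat_a, feat_b, labels)        # loss value of the second-stage feature pair
    model.zero_grad()
    loss.backward()                               # gradients required by back-propagation
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            g = quantize(p.grad, decimal_bits)    # optional fixed-pointing of the gradient
            p.add_(-lr * g)                       # update the weight parameters
    return loss.item()
```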
Step S422 specifically includes steps S4221, S4222, S4223, S4224, and S4225.
S4221, in the second training stage, the weight parameters of the full-precision floating point number are subjected to fixed-point treatment, the output of the face feature pairs is kept to be the full-precision floating point number, and the sample face pairs are processed through the fixed-point weight parameters, so that the face feature pairs in the second training stage are obtained.
S4222, calculating the loss value of the face feature pair in the second training stage through the loss function.
S4223, calculating the precision loss between the face feature pair of the second training stage in the current iteration number and the face feature pair in the last iteration number. Specifically, the face feature pair in the previous iteration number may be the face feature pair in the first training stage or the face feature pair in the second training stage. And determining the maximum precision value or the average precision value of the face feature pair in the second training stage in the current iteration number, wherein the maximum precision value can be the maximum number of digits of the mantissa in the full-precision floating point number, and the average precision value can be the average number of digits of the mantissa in the full-precision floating point number. And determining the maximum precision value or the average precision value of the face feature pair in the last iteration number, wherein the maximum precision value can be the maximum number of digits of the mantissa in the full-precision floating point number, and the average precision value can be the average number of digits of the mantissa in the full-precision floating point number. It should be noted that, because the weight parameters are fixed-point, the number of bits of the weight parameters after the decimal point is a fixed value, and part of the precision from the weight parameters is lost in the calculation process, but because the calculation process includes multiplication and addition, the precision of the face feature pair can be ensured by adjusting the fixed-point weight parameters.
The loss of precision can be calculated by the following equation:

$$P_{loss} = \frac{A_{t-1} - A_t}{A_{max}}$$

where $A_t$ is the maximum or average precision value of the second-stage face feature pair at the current iteration, $A_{t-1}$ is the maximum or average precision value of the face feature pair at the previous iteration, and $A_{max}$ is the maximum precision value of the face feature pairs in the first training stage.
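A hedged sketch of the precision-loss computation: counting decimal digits of floating-point feature values is necessarily approximate, and the formula is the reconstruction given above, so treat both as assumptions:

```python
def decimal_digits(x: float, max_bits: int = 16) -> int:
    # approximate number of significant digits after the decimal point
    s = f"{abs(x):.{max_bits}f}".rstrip("0")
    return len(s.split(".")[1]) if "." in s else 0

def precision_loss(feats_now, feats_prev, max_precision, reduce="max"):
    """P_loss = (A_prev - A_now) / A_max, with A_* the maximum (or average)
    precision value of a face feature pair, as defined above."""
    agg = max if reduce == "max" else (lambda v: sum(v) / len(v))
    a_now = agg([decimal_digits(v) for v in feats_now])
    a_prev = agg([decimal_digits(v) for v in feats_prev])
    return (a_prev - a_now) / max_precision
```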
Step S4223 specifically includes steps S42231 and S42232.
S42231, when the weight-parameter fixed-pointing mode is stepwise, after the precision loss between the second-stage face feature pair at the current iteration and the face feature pair at the previous iteration has been computed: if the precision loss is less than or equal to the preset precision loss threshold, the number of digits after the decimal point of the fixed-point weight parameters is reduced by the preset number of digits; if the precision loss is greater than the threshold, the current number of decimal digits is kept. For example, if the fixed-point weight parameters currently keep 16 digits after the decimal point, the digits remain at 16 while the precision loss exceeds the threshold Pth, and are reduced (in this example to 7) once the precision loss is less than or equal to Pth. The precision loss threshold Pth may take a value in the range 0.2-0.3; in a possible embodiment it is set dynamically as a function of the following quantities: $B_{min}$, the minimum number of digits after the decimal point in the fixed-point weight parameters; $B_{max}$, the maximum number of such digits; $b$, the current number of decimal digits of the fixed-point weight parameters; $n_b$, the number of iterations at the current decimal position; and $n_i$, the number of iterations at the i-th decimal position in the second training phase. Pth becomes smaller as the number of second-stage iterations grows and as the iterations at the current decimal position accumulate, so that the fixed-point weight parameters learn to reduce precision loss during training and the precision loss of the face feature pairs after weight-parameter fixed-pointing is reduced.
S42232, when the number of digits after the decimal point of the fixed-point weight parameters has been reduced to the minimum number of digits, the stepwise fixed-pointing of the weight parameters stops. With stepwise fixed-pointing, the full-precision floating-point weight parameters are fixed-pointed step by step during training, and at each step the output precision of the full-precision floating-point face feature pairs can be improved again by training, so that fixed-pointing the weight parameters does not cause a large drop in the precision of the face feature pairs.
For example, in the second training stage the full-precision floating-point weight parameters are first fixed-pointed to 15 digits after the decimal point, giving fixed-point weight parameters with 15 decimal digits, and the face feature pairs of the sample face pairs are extracted with these parameters. If the number of decimal digits of the face feature pairs drops to 15, the fixed-pointing of the full-precision weight parameters has reduced the precision of the face feature pairs, i.e. there is a precision loss. If the precision loss is greater than the threshold Pth, training keeps iterating with the 15-digit fixed-point weight parameters; if the precision loss is less than Pth, the number of decimal digits is reduced from 15 to 14, and so on, until the minimum of 3 digits after the decimal point is reached, after which the weight parameters remain fixed-pointed at 3 decimal digits.
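The stepwise schedule of S42231/S42232 can be replayed in isolation as follows (threshold, step size and the sequence of precision losses are illustrative):

```python
def stepwise_schedule(precision_losses, p_th=0.25, max_bits=15, min_bits=3):
    """Stay at the current number of decimal digits while the precision loss
    exceeds the threshold; drop one digit when it does not; stop at the minimum."""
    bits, trace = max_bits, []
    for loss in precision_losses:       # one precision-loss value per iteration
        trace.append(bits)
        if loss <= p_th and bits > min_bits:
            bits -= 1                   # reduce the decimal digits by the preset step
    return trace

print(stepwise_schedule([0.4, 0.3, 0.2, 0.1, 0.05]))  # [15, 15, 15, 14, 13]
```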
S4224, calculating a loss value of the face feature pair in the second training stage and a total loss value between the accuracy loss of the face feature pair in the second training stage in the current iteration number and the face feature pair in the previous iteration number, and adjusting weight parameters of the face recognition model to be trained by using the minimum total loss value as an optimization target through a back propagation algorithm.
The total loss is calculated according to the following equation:

$$L_{total} = \lambda \cdot L_d + (1-\lambda) \cdot P_{loss}$$

where $L_{total}$ is the total loss, $L_d$ is the loss value of the face feature pair, $P_{loss}$ is the precision loss and $\lambda$ is the weighting coefficient, with values in the range [0.5, 0.9] and positively correlated with the number of iterations at the current decimal position in the second training phase: the more iterations, the larger $\lambda$. As the iterations at the current decimal position accumulate, fixed-pointing the weight parameters at that position has less and less influence on the precision loss, so the weight-parameter adjustment can pay more attention to the loss value between the face feature pairs, avoiding under-fitting or a slow fitting speed of the face recognition model to be trained.
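Putting the reconstructed total loss into code (the linear ramp used for λ and its length are assumptions; the text only states that λ lies in [0.5, 0.9] and grows with the iterations at the current decimal position):

```python
def total_loss(pair_loss_value, precision_loss_value, iters_at_current_bits,
               lam_min=0.5, lam_max=0.9, ramp_iters=100):
    # lambda grows with the number of iterations spent at the current decimal position
    lam = min(lam_max, lam_min + (lam_max - lam_min) * iters_at_current_bits / ramp_iters)
    return lam * pair_loss_value + (1.0 - lam) * precision_loss_value
```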
S4225, iterating the weight parameter adjustment process in the second training stage until entering a third training stage. In the second training stage, under the condition that the face feature pairs are kept to be full-precision floating points, the fixed-point weight parameters are adjusted, so that the face feature pairs of the full-precision floating points can be output under the condition that the weight parameters are fixed, and the output precision of the face recognition model is ensured.
S423, comparing the reference gradient value with a preset second gradient threshold value, and if the reference gradient value is larger than the preset second gradient threshold value, continuing the second training stage. If the reference gradient value is smaller than or equal to the preset second gradient threshold value, a third training stage is entered. The second gradient threshold is less than the first gradient threshold. It should be noted that, when the reference gradient value is greater than the preset second gradient threshold value, it is indicated that the training of the face recognition model to be trained has not yet reached the expectation of the second training stage. The second gradient threshold value may be set to a value in the range of 0.1-0.3, and may be set empirically. The face recognition model to be trained is trained through the second training stage, so that the face feature pair with higher precision can be output by the face recognition model to be trained under the condition of weight parameter fixed-point.
S43, in a third training stage, training can be performed based on the fixed-point weight parameters and the fixed-point face feature pairs. In the third training stage, the fixed-point weight parameters in the face recognition model to be trained obtained in the second training stage can be fixed, and the output characteristic face pairs are fixed in the training process.
Step S43 specifically includes steps S431, S432, S433:
s431, fixing the fixed weight parameters obtained in the second training stage, and adding Lp norm operation in the active layer to obtain the LP active layer.
S432, processing the face feature pair from the previous layer through the LP activation layer to obtain the fixed-point face feature pair. The LpActivation operation is added to the output of every activation-function layer (Activation) in the face recognition model to be trained in order to record the maximum absolute value of the face feature pair output by the last layer, and that maximum absolute value is taken as the fixed-point face feature pair, which realises the fixed-pointing of the face feature pair. Specifically, the LP activation layer processes the first face feature output by the previous layer to obtain its maximum absolute value, which is taken as the fixed-point first face feature; it likewise processes the second face feature output by the previous layer to obtain its maximum absolute value, which is taken as the fixed-point second face feature; the fixed-point face feature pair is formed by the fixed-point first face feature and the fixed-point second face feature.
The LP activation layer can also be used to fixed-point the face feature pairs stepwise: since the fixed-point weight parameters can still output full-precision floating-point face feature pairs, the LP activation layer can fixed-point these full-precision feature pairs gradually, which preserves the feature-extraction accuracy of the model to be trained.
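A sketch of what an LP activation layer could look like in PyTorch. Reading the recorded "maximum absolute value" as a quantisation scale for the features is an interpretation, not something the text states explicitly:

```python
import torch
import torch.nn as nn

class LpActivation(nn.Module):
    """Tracks the maximum absolute value of the features passing through it and
    uses that value as the scale when fixed-pointing them (interpretative sketch)."""
    def __init__(self, decimal_bits: int = 3):
        super().__init__()
        self.decimal_bits = decimal_bits
        self.register_buffer("max_abs", torch.tensor(1.0))  # running maximum of |feature|

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        self.max_abs = torch.maximum(self.max_abs, feat.abs().max())  # record max |feature|
        scale = 10.0 ** self.decimal_bits
        normed = feat / self.max_abs                    # scale features into [-1, 1]
        return torch.round(normed * scale) / scale * self.max_abs
```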
S433, calculating the loss value of the face feature pair after the localization, taking the minimum loss value of the face feature pair after the localization as an optimization target, and updating the LP activation layer parameters in the face model to be trained through a back propagation algorithm. And iterating the updating process of the LP activation layer parameters until the iteration times reach the preset times, and stopping training to obtain a trained face recognition model. In the process of updating the iterative LP activation layer parameters, if the facial feature pairs are stepwise spotted, the facial feature pairs and/or gradients can be further and continuously spotted until training is stopped.
In the process of stepwise fixed-pointing of the face feature pairs through the LP activation layer, only the LP activation layer parameters are updated, so that the face model to be trained can output fixed-point face feature pairs through the updated LP activation layer parameters while keeping the accuracy of the full-precision floating-point face feature pairs. The basis for adjusting the stepwise fixed-pointing may be the absolute difference $|L_{d,t} - L_{d,t-1}|$ between the loss value $L_{d,t}$ of the face feature pair at the current iteration and the loss value $L_{d,t-1}$ at the previous iteration. If $|L_{d,t} - L_{d,t-1}|$ is greater than or equal to a preset difference (which may be set to a value between 0.05 and 0.1), training must continue at the current fixed-point precision; if $|L_{d,t} - L_{d,t-1}|$ is less than or equal to the preset difference, the current fixed-point precision can be reduced by one digit, i.e. the number of digits after the decimal point of the face feature pair is reduced, for example from 16 to 15 digits. The LP activation layer parameters are then updated step by step so that they output the fixed-point feature face pair with the minimum number of decimal digits, for example 3 digits after the decimal point.
Face recognition part
1. Acquire a face image to be recognized.
2. Input the face image to be recognized into the face recognition model for feature extraction processing to obtain the face features of the face image to be recognized; the face recognition model is the model obtained in the training part above.
3. Compare the face features of the face image to be recognized with the face features in the base library to obtain a face comparison result, wherein the face features in the base library are also obtained by feature extraction through the face recognition model.
4. Obtain the recognition result of the face image to be recognized according to the face comparison result (a brief illustrative sketch of steps 2 to 4 follows this list).
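The sketch below illustrates the comparison in steps 2 to 4. Cosine similarity and the acceptance threshold are assumptions; the patent does not prescribe a specific comparison metric or threshold for the recognition stage.

```python
import numpy as np

def recognize(query_feature: np.ndarray,
              base_features: np.ndarray,
              base_ids: list,
              threshold: float = 0.5):
    """Compare the feature of the face image to be recognized against the
    base-library features (both extracted by the trained face recognition model)
    and return the best-matching identity, or None if no match clears the threshold."""
    q = query_feature / np.linalg.norm(query_feature)
    g = base_features / np.linalg.norm(base_features, axis=1, keepdims=True)
    scores = g @ q                       # cosine similarity against every base entry
    best = int(np.argmax(scores))
    if scores[best] >= threshold:
        return base_ids[best], float(scores[best])
    return None, float(scores[best])
```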
The face recognition method can be applied to the field of identity verification, and can cover the fields of coal mines, buildings, banks, social benefit guarantee, electronic commerce, security and defense and the like.
Compared with the traditional method, the face model training method based on metric learning has the following advantages:
The traditional fixed-point scheme quantizes an already trained face model, which sacrifices part of the model's accuracy, reduces its precision, and thereby reduces the accuracy of face recognition. The invention instead fixed-points the face model continuously during training in a metric learning manner, so that the amount of parameter data of the face model can be reduced gradually while its accuracy is guaranteed; a fixed-point face model is obtained as soon as training finishes, which makes it convenient to deploy the face model on embedded devices. Because the invention directly trains a fixed-point neural network suitable for running on embedded devices, the limited expressive power of fixed-point numbers is taken into account throughout the whole training process, and the precision loss caused by post-training model conversion is avoided.
The embodiments of the present invention also provide a storage medium storing a computer program which, when executed, performs at least the method as described above.
The embodiment of the invention also provides a control device, which comprises a processor and a storage medium for storing a computer program; wherein the processor is adapted to perform at least the method as described above when executing said computer program.
The embodiments of the present invention also provide a processor executing a computer program, at least performing the method as described above.
The storage medium may be implemented by any type of non-volatile storage device or combination thereof. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk memory or a tape memory. The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems and methods may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate units may or may not be physically separate, and units displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions; the foregoing program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments; and the aforementioned storage medium includes: a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Alternatively, the integrated units of the invention may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as stand-alone products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The methods disclosed in the method embodiments provided by the invention can be arbitrarily combined under the condition of no conflict to obtain a new method embodiment.
The features disclosed in the several product embodiments provided by the invention can be combined arbitrarily under the condition of no conflict to obtain new product embodiments.
The features disclosed in the embodiments of the method or the apparatus provided by the invention can be arbitrarily combined without conflict to obtain new embodiments of the method or the apparatus.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and the same should be considered to be within the scope of the invention.

Claims (10)

1. The face model training method based on metric learning is characterized by comprising the following steps:
s1, constructing a data set: the data set comprises a sample face pair, wherein the sample face pair comprises a first sample face image and a second sample face image;
S2, constructing a face model to be trained: the constructed face model to be trained is a deep convolutional neural network model; the initial face model to be trained is a full-precision floating point number model;
s3, training process: extracting a face feature pair of the sample face pair through weight parameters in a feature extraction layer, wherein the face feature pair comprises a first face feature corresponding to a first sample face image and a second face feature corresponding to a second sample face image; calculating a measurement distance between the first face feature and the second face feature, and calculating a loss value of the face feature pair through a preset loss function; back propagation processing is carried out on the loss value of the face feature pair to update the weight parameter of the face model to be trained;
the training process comprises an updating process of iteration weight parameters, and the updating process of the iteration weight parameters comprises the following steps:
s41, training based on weight parameters of the full-precision floating point number and face feature pairs of the full-precision floating point number in a first training stage;
s42, training based on the fixed-point weight parameters and the face feature pairs of the full-precision floating point number in a second training stage; the method comprises the steps of carrying out fixed localization or gradual localization on weight parameters of the full-precision floating point number, wherein the fixed localization is the localization weight parameters for converting the weight parameters of the full-precision floating point number into preset decimal point numbers, and the gradual localization is the localization weight parameters for sequentially reducing the weight parameters of the full-precision floating point number from the maximum decimal point number until the minimum decimal point number is reached, so as to obtain final localization weight parameters; the method comprises the steps of carrying out fixed-point treatment on weight parameters of full-precision floating point numbers, keeping the output of face feature pairs to be the full-precision floating point numbers, processing sample face pairs through the fixed-point weight parameters to obtain the face feature pairs, calculating gradients required by a back propagation algorithm according to loss values of the face feature pairs, and carrying out back propagation by utilizing the gradients to update the weight parameters of a face model to be trained;
S43, training based on the fixed-point weight parameters and the fixed-point face feature pairs in a third training stage.
2. The face model training method based on metric learning as claimed in claim 1, wherein the step S41 specifically comprises the steps of:
performing feature extraction on the first sample face image and the second sample face image through a feature extraction layer of the full-precision floating point number to obtain a first face feature corresponding to the first sample face image and a second face feature corresponding to the second sample face image under the full-precision floating point number;
calculating a measurement distance between the first face feature and the second face feature;
according to the measurement distance between the first face feature and the second face feature, calculating a loss value of the face feature pair through a preset loss function;
and calculating the gradient of the face model to be trained according to the loss value of the face feature pair, and updating the weight parameter of the face model to be trained based on the gradient of the face model to be trained.
3. The face model training method based on metric learning as claimed in claim 1 or 2, wherein the step S42 specifically comprises the steps of:
s421, acquiring gradients of the current iteration times, and determining a reference gradient value in each gradient of the current iteration times; the method comprises the steps of calculating to obtain gradients of all layers in the current iteration times according to loss values of face feature pairs in the current iteration times, searching a maximum value or calculating a global average value in the gradients of all layers, and determining the maximum value or the global average value as a reference gradient;
S422, comparing the reference gradient value with a first gradient threshold value, and determining whether to enter a first training stage or a second training stage according to a comparison result; in the second training stage, the weight parameters of the full-precision floating point number are subjected to fixed-point treatment, and the sample face pairs are processed through the fixed-point weight parameters so as to obtain face feature pairs; calculating a loss value of the face feature pair in the second training stage, and calculating a gradient according to the loss value; carrying out fixed-point treatment on the gradient, and then carrying out back propagation by utilizing the fixed-point gradient to update the weight parameters of the face model to be trained;
s423, comparing the reference gradient value with a second gradient threshold value, if the reference gradient value is larger than the second gradient threshold value, continuing the second training phase, and if the reference gradient value is smaller than or equal to the second gradient threshold value, entering a third training phase; wherein the second gradient threshold is less than the first gradient threshold.
4. A face model training method based on metric learning as claimed in claim 3, wherein step S422 specifically comprises the steps of:
s4221, carrying out fixed-point treatment on the weight parameters of the full-precision floating point number, keeping the output of the face feature pairs to be the full-precision floating point number, and processing the sample face pairs through the fixed-point weight parameters to obtain the face feature pairs in the second training stage;
S4222, calculating a loss value of the face feature pair in the second training stage through the loss function;
s4223, calculating the precision loss between the face feature pair of the second training stage in the current iteration number and the face feature pair in the previous iteration number;
S4224, calculating a total loss value from the loss value of the face feature pair in the second training stage and the precision loss between the face feature pair of the second training stage in the current iteration number and the face feature pair in the previous iteration number, and adjusting the weight parameters of the face recognition model to be trained by using a back propagation algorithm with the minimum total loss value as an optimization target;
s4225, iterating the weight parameter adjustment process in the second training phase until entering a third training phase; under the condition that the face feature pair is kept to be the full-precision floating point number, the fixed weight parameters are adjusted, so that the face feature pair of the full-precision floating point number can be output under the condition that the weight parameters are fixed by the face recognition model.
5. The face model training method based on metric learning of claim 4, wherein in step S4223, the face feature pair in the previous iteration number is the face feature pair of the first training stage or the second training stage; determining the maximum precision value or average precision value of the face feature pair in the second training stage in the current iteration number, and determining the maximum precision value or average precision value of the face feature pair in the previous iteration number, wherein the maximum precision value is the maximum number of mantissas in the full-precision floating point number, and the average precision value is the average number of mantissas in the full-precision floating point number;
The loss of precision is calculated by the following formula:
[formula omitted]
wherein the formula uses the maximum or average precision value of the face feature pairs of the second training stage in the current iteration number, the maximum or average precision value of the face feature pairs in the previous iteration number, and the maximum precision value of the face feature pairs in the first training stage.
6. The face model training method based on metric learning as claimed in claim 5, wherein the step S4223 specifically comprises the steps of:
s42231, the weight parameter localization mode is gradual localization, after the precision loss between the face feature pair in the second training stage in the current iteration number and the face feature pair in the previous iteration number is calculated, if the precision loss is smaller than or equal to a preset precision loss threshold value, the number of bits after decimal points of the weight parameter subjected to localization is reduced by a preset number of bits; if the precision loss is larger than a preset precision loss threshold, the number of bits after decimal points of the fixed-point weight parameters is maintained;
s42232, stopping the stepwise localization of the localized weight parameter when the number of bits after the decimal point of the localized weight parameter decreases to the minimum number of bits.
7. The face model training method based on metric learning of claim 6, wherein the precision loss threshold is dynamically set, and the precision loss threshold Pth is calculated according to the following equation:
[formula omitted]
wherein the formula uses the minimum number of bits after the decimal point in the fixed-point weight parameter, the maximum number of bits after the decimal point in the fixed-point weight parameter, the current number of bits after the decimal point in the fixed-point weight parameter, the number of iterations at the current decimal place, and the number of iterations at the i-th decimal place in the second training phase.
8. The face model training method based on metric learning as claimed in claim 4, wherein in step S4224, the total loss is calculated according to the following equation:
[formula omitted]
wherein the formula uses the total loss, the loss value of the face feature pair, the precision loss, the loss of the weight parameters, and a weighting coefficient whose value range is [0.5, 0.9]; the weighting coefficient is positively correlated with the number of iterations at the current decimal place in the second training phase, i.e., the more iterations have been performed at the current decimal place, the larger the weighting coefficient.
9. The face model training method based on metric learning as claimed in claim 1 or 2, wherein the step S43 specifically comprises the steps of:
s431, fixing the fixed-point weight parameters obtained in the second training stage, and adding Lp norm operation in the active layer to obtain an LP active layer;
S432, processing the face feature pair of the upper layer through the LP activation layer to obtain a face feature pair after localization; adding an Lp activation operation to the output of each activation function layer in the whole face recognition model to be trained, recording the maximum absolute value of the face feature pair output of the last layer, taking the maximum absolute value as a fixed-point face feature pair, and realizing the fixed-pointing of the face feature pair, or gradually fixed-pointing the face feature pair through the LP activation layer;
S433, calculating a loss value of the face feature pair after the localization, taking the minimization of that loss value as an optimization target, and updating the LP activation layer parameters in the face model to be trained through a back propagation algorithm; iterating the updating process of the LP activation layer parameters until the number of iterations reaches a preset number, and stopping training to obtain a trained face recognition model; in the process of iteratively updating the LP activation layer parameters, if the face feature pairs are fixed-pointed stepwise, the face feature pairs and/or gradients are also continuously fixed-pointed until training is stopped.
10. The face recognition method is characterized by comprising the following steps of:
acquiring a face image to be recognized;
inputting the face image to be recognized into a face model trained by the face model training method based on metric learning according to any one of claims 1 to 9 for feature extraction processing;
comparing the face features extracted from the face image to be recognized with the face features in the base library;
and obtaining the recognition result of the face image to be recognized according to the face comparison result.
CN202410046717.3A 2024-01-12 2024-01-12 Face model training method and face recognition method based on metric learning Active CN117557888B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410046717.3A CN117557888B (en) 2024-01-12 2024-01-12 Face model training method and face recognition method based on metric learning

Publications (2)

Publication Number Publication Date
CN117557888A CN117557888A (en) 2024-02-13
CN117557888B (en) 2024-04-12

Family

ID=89818961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410046717.3A Active CN117557888B (en) 2024-01-12 2024-01-12 Face model training method and face recognition method based on metric learning

Country Status (1)

Country Link
CN (1) CN117557888B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN109815801A (en) * 2018-12-18 2019-05-28 北京英索科技发展有限公司 Face identification method and device based on deep learning
WO2021035980A1 (en) * 2019-08-23 2021-03-04 平安科技(深圳)有限公司 Facial recognition model training method and apparatus, and device and readable storage medium
CN116416654A (en) * 2021-12-28 2023-07-11 上海交通大学 Cross-modal sketch face recognition method based on depth measurement learning


Also Published As

Publication number Publication date
CN117557888A (en) 2024-02-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant