CN115171201B - Face information identification method, device and equipment based on binary neural network - Google Patents

Face information identification method, device and equipment based on binary neural network

Info

Publication number
CN115171201B
CN115171201B (application CN202211092937.7A)
Authority
CN
China
Prior art keywords
gradient
neural network
value
binary
binary neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211092937.7A
Other languages
Chinese (zh)
Other versions
CN115171201A (en)
Inventor
陈鹏
陈宇
胡启昶
李腾
李发成
张如高
虞正华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Moshi Intelligent Technology Co ltd
Original Assignee
Suzhou Moshi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Moshi Intelligent Technology Co ltd filed Critical Suzhou Moshi Intelligent Technology Co ltd
Priority to CN202211092937.7A priority Critical patent/CN115171201B/en
Publication of CN115171201A publication Critical patent/CN115171201A/en
Application granted granted Critical
Publication of CN115171201B publication Critical patent/CN115171201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a face information identification method, device and equipment based on a binary neural network, in particular to the technical field of neural networks. The method comprises the following steps: acquiring a target image, extracting an image characteristic value of the target image, and inputting the image characteristic value into a binary neural network; carrying out binary quantization processing on the image characteristic value to obtain a quantization value of the image characteristic value; determining the numerical value of a gradient adjusting coefficient according to the quantization value of the image characteristic value, wherein the gradient adjusting coefficient is used for controlling the probability of binary inversion of the weight in the binary neural network in the training process, and the binary inversion refers to that the weight corresponds to different quantization values before and after single iteration; carrying out gradient updating on the weights by using the gradient adjusting coefficient to obtain the weights after the training is finished, wherein the weights are used for forming a binary neural network after the training is finished; and inputting the image to be recognized into the trained binary neural network, and recognizing the face information in the image to be recognized.

Description

Face information identification method, device and equipment based on binary neural network
Technical Field
The invention relates to the technical field of neural networks, in particular to a face information identification method, a face information identification device and face information identification equipment based on a binary neural network.
Background
Face information recognition is widely applied in systems such as information encryption, system security, and identity authentication. However, face recognition systems based on deep neural networks face the challenge of high computational complexity.
A binary neural network is a neural network that employs binary quantization. The binary quantization refers to discretizing the weight and the feature map of the neural network into two states including-1 and 1 (or 0 and 1). Therefore, in order to reduce the computational complexity of the deep neural network, face information recognition may be performed based on a binary neural network.
When the weights and the characteristic values in a neural network are binary quantized, the precision loss is large for many neural networks. Reducing the precision loss of the binary neural network during training, and thereby improving its performance, is a prerequisite for deploying the binary neural network in practical face information recognition applications.
Disclosure of Invention
The application provides a face information recognition method, a face information recognition device and face information recognition equipment based on a binary neural network, and the training precision of the binary neural network is improved, so that the recognition precision of the trained binary neural network on face information in an image is improved. The technical scheme is as follows.
In one aspect, a face information recognition method based on a binary neural network is provided, and the method includes:
acquiring a target image, extracting an image characteristic value of the target image, and inputting the image characteristic value into a binary neural network;
performing binary quantization processing on the image characteristic value to obtain a quantization value of the image characteristic value;
determining the numerical value of a gradient adjusting coefficient according to the quantization value of the image characteristic value, wherein the gradient adjusting coefficient is used for controlling the probability of binary inversion of the weight in the binary neural network in the training process, and the binary inversion refers to that the weight corresponds to different quantization values before and after single iteration;
performing gradient updating on the weights by using the gradient adjusting coefficients to obtain the weights after training is completed, wherein the weights are used for forming the binary neural network after training is completed;
and inputting the image to be recognized into the trained binary neural network, and recognizing the face information in the image to be recognized.
In still another aspect, there is provided a face information recognition apparatus based on a binary neural network, the apparatus including:
the characteristic value input module is used for acquiring a target image, extracting an image characteristic value of the target image and inputting the image characteristic value into a binary neural network;
the binary quantization module is used for performing binary quantization processing on the image characteristic value to obtain a quantization value of the image characteristic value;
a numerical value determining module, configured to determine a numerical value of a gradient adjustment coefficient according to a quantization value of the image feature value, where the gradient adjustment coefficient is used to control a probability that a weight in the binary neural network undergoes binary inversion in a training process, and the binary inversion refers to that the weight corresponds to different quantization values before and after a single iteration;
the first gradient updating module is used for performing gradient updating on the weights by using the gradient adjusting coefficients to obtain the weights after training is completed, and the weights are used for forming the binary neural network after training is completed;
and the face information recognition module is used for inputting the image to be recognized into the trained binary neural network and recognizing the face information in the image to be recognized.
In a possible implementation manner, the numerical value determining module is further configured to:
determining the numerical value of the gradient adjustment coefficient as a first gradient adjustment coefficient value under the condition that the quantized value of the image characteristic value is a first quantized value;
determining the numerical value of the gradient adjustment coefficient as a second gradient adjustment coefficient value under the condition that the quantized value of the image characteristic value is a second quantized value;
wherein the first gradient adjustment coefficient value is different from the second gradient adjustment coefficient value.
In one possible implementation manner, the first gradient updating module is further configured to:
gradient updating the weights in the binary neural network using the following formula:

$$w_l^{t+1} = w_l^t - \eta \cdot \tau_{a_q} \cdot \frac{\partial L}{\partial w_{q,l}^t}$$

wherein $w$ is the weight, $w_q$ is the quantized value of the weight, $a$ is the image characteristic value, $a_q$ is the quantized value of the image characteristic value, $\tau$ is the gradient adjustment coefficient, $l$ is the layer index of the binary neural network, $t$ is the number of training iterations, $L$ is the loss function of the binary neural network, and $\eta$ is the learning rate.
In one possible implementation, the apparatus further includes: a second gradient update module;
and the second gradient updating module is used for performing gradient updating on the gradient adjusting coefficient so as to obtain the trained gradient adjusting coefficient.
In a possible implementation manner, the second gradient updating module is further configured to:
obtaining a gradient magnitude synchronization coefficient, wherein the gradient magnitude synchronization coefficient is used for ensuring that the magnitude of the gradient adjustment coefficient is synchronous with the magnitude of the gradient of the weight;
and performing gradient updating on the gradient adjusting coefficient by using the gradient magnitude synchronous coefficient to obtain the trained gradient adjusting coefficient.
In a possible implementation manner, the second gradient updating module is further configured to:
determining the gradient magnitude synchronization coefficient using the following equation:

(equation image not recoverable from the source)

wherein $s$ is the gradient magnitude synchronization coefficient and $n$ is the number of weights in the current layer of the binary neural network.
In still another aspect, a computer device is provided, where the computer device includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, at least one program, a code set, or a set of instructions is loaded and executed by the processor to implement the above-mentioned face information recognition method based on a binary neural network.
In still another aspect, a computer-readable storage medium is provided, and at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the above-mentioned face information recognition method based on a binary neural network.
In yet another aspect, a computer program product or a computer program is provided, comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above face information identification method based on a binary neural network.
The technical scheme provided by the application can comprise the following beneficial effects:
in a binary neural network, the difference between binary quantized data and original data is large, so that a large error exists in gradient calculation, and the error causes that the weight frequently undergoes binary inversion in training, namely the weight corresponds to different quantized values before and after single iteration, so that the optimization direction of the neural network is difficult to converge to a correct direction, and the precision of neural network training is reduced. Based on the technical scheme, in the training process of the binary neural network, a gradient adjustment coefficient is introduced, the numerical value of the gradient adjustment coefficient is related to the quantization value of the image characteristic value used in the training process, and when the weight training is carried out in a gradient updating mode, the gradient adjustment coefficient can control the probability of binary inversion of the weight in the training process, so that the training precision of the neural network can be improved, and the recognition precision of the trained binary neural network on the face information in the image is improved.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram illustrating training of a binary neural network, according to an example embodiment.
Fig. 2 is a flowchart illustrating a method of a face information recognition method based on a binary neural network according to an exemplary embodiment.
FIG. 3 is a diagram illustrating gradient adjustment coefficients learned by different layers according to an exemplary embodiment.
FIG. 4 is a diagram illustrating weight flip probabilities in a binary neural network training process, according to an exemplary embodiment.
FIG. 5 is a diagram illustrating a comparison of training accuracy for different training modes according to an example embodiment.
FIG. 6 is a diagram illustrating a comparison of training accuracy for different training modes according to an example embodiment.
Fig. 7 is a block diagram illustrating a configuration of a face information recognition apparatus based on a binary neural network according to an exemplary embodiment.
FIG. 8 is a schematic diagram of a computer device provided in accordance with an exemplary embodiment of the present application.
Detailed Description
The technical solutions of the present application will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be understood that "indication" mentioned in the embodiments of the present application may be a direct indication, an indirect indication, or an indication of an association relationship. For example, a indicates B, which may mean that a directly indicates B, e.g., B may be obtained by a; it may also mean that a indicates B indirectly, for example, a indicates C, and B may be obtained by C; it can also mean that there is an association between a and B.
In the description of the embodiments of the present application, the term "correspond" may indicate that there is a direct correspondence or an indirect correspondence between the two, may also indicate that there is an association between the two, and may also indicate and is indicated, configure and is configured, and the like.
In the embodiment of the present application, "predefining" may be implemented by saving a corresponding code, table, or other manners that may be used to indicate related information in advance in a device (for example, including a terminal device and a network device), and the present application is not limited to a specific implementation manner thereof.
Before describing the various embodiments shown herein, several concepts related to the present application will be described.
1) Neural network
A neural network is a mathematical model that simulates the structure and function of a biological neural network, such as a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), or a Recurrent Neural Network (RNN).
2) Binary quantization
In recent years, neural networks have achieved remarkable performance in fields such as computer vision and natural language processing, and are widely used in practical applications such as automatic driving and smartphones. However, the high computational load of neural networks poses a great challenge to their deployment on mobile devices. Binary quantization is an effective and extreme means of model compression that can reduce the complexity of a neural network, and has therefore attracted wide attention.
Binary quantization discretizes the weights (weight) and characteristic values (feature map) of a neural network into two states, -1 and 1 (or 0 and 1), so as to improve network computation efficiency and reduce energy consumption.
Specifically, after the weights and characteristic values in the neural network are binary quantized, the computations of the neural network (convolution or fully connected operations) can be performed with bit operations (if only the weights are quantized, the network computation replaces multiplication with addition and subtraction). Compared with the original numerical computation, bit operations are very well suited to computer hardware and bring benefits in both computation speed and energy consumption.
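The bit-operation computation mentioned above can be sketched as follows. This is an illustrative example, not taken from the patent: it packs {-1, +1} vectors into integer bit masks and computes their dot product with XNOR and popcount, which is how binarized convolutions are typically accelerated.

```python
# Illustrative sketch: a dot product between two {-1, +1} vectors
# reduces to XNOR + popcount on bit masks.

def to_bits(vec):
    """Pack a {-1, +1} vector into an integer bit mask (+1 -> 1, -1 -> 0)."""
    mask = 0
    for i, v in enumerate(vec):
        if v == 1:
            mask |= 1 << i
    return mask

def binary_dot(a, b):
    """Dot product of two {-1, +1} vectors via XNOR + popcount."""
    n = len(a)
    xnor = ~(to_bits(a) ^ to_bits(b)) & ((1 << n) - 1)  # bit is 1 where signs agree
    matches = bin(xnor).count("1")
    return 2 * matches - n  # each agreement contributes +1, each disagreement -1

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
assert binary_dot(a, b) == sum(x * y for x, y in zip(a, b))
```

On real hardware the popcount is a single instruction, which is the source of the speed and energy advantage described above.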
The neural network for performing binary quantization is called a binary neural network, and the binary neural network faces a huge challenge in the aspect of precision loss.
The amount of information that a binary neural network can carry is very limited. When the weights and characteristic values in a neural network are binary quantized, the accuracy of many neural networks degrades to an unacceptable degree. Improving the performance of the binary neural network and reducing the precision loss on the task are prerequisites for deploying binary neural networks in practical applications.
To address these problems, some known related work on binary neural networks designs dedicated binary quantization algorithms for special scenarios, so that the quantization algorithm better fits those scenarios; other work adds constraints using additional penalty functions.
In the technical solutions provided by the above related works, the reason for the accuracy degradation of the binary neural network is not analyzed from the perspective of the binary neural network optimization.
Based on the above situation, the embodiments of the present application propose a state-aware training strategy for binary neural networks, which theoretically analyzes why the training precision of a binary neural network degrades: one of the reasons for the instability of binary quantization training is the frequent inversion of the binary state of data during training.
Illustratively, referring to FIG. 1, FIG. 1 indicates the problem faced by a binary neural network in training, where w and L represent the weight and the network loss function, respectively, and t represents the number of training iterations.
In a binary neural network, w is binary quantized according to the sign (positive or negative) of the data, which determines whether it is quantized to 1 or -1. Because the binary neural network has only the two states -1 and 1, the information loss is large, and the gradient calculation becomes inaccurate (the gradient is calculated using the quantized value, which differs greatly from the actual optimization direction).
For example, in the t-th iteration shown in FIG. 1, the original gradient direction of the weight w (dotted arrow 101) calls for an update in the negative direction (the value decreases), but after quantization the quantized value is -1, and the gradient direction there (dotted arrow 102) calls for an update in the positive direction in the next iteration. Thus, in the (t+1)-th iteration, the weight w is updated to a positive number (solid arrow 103). Similarly, in the (t+1)-th iteration shown in FIG. 1, the actual gradient of the weight w may not match the quantized gradient direction: the original gradient direction of the weight w (dotted arrow 104) calls for the next update in the positive direction (the value increases), whereas after quantization the quantized value is 1 and the gradient direction (dotted arrow 105) calls for the next update in the negative direction, so the weight is most likely updated back to a negative value (solid arrow 106). This inconsistency between the actual optimization direction and the quantized optimization direction makes binary neural network training unstable. A large amount of computing resources is wasted in wrong update directions, reducing the final training accuracy.
Therefore, it can be seen from the above problems that, because the expression capability of a binary neural network is limited, the difference between the binary quantized data and the original data is large, causing a large error in gradient calculation. Such errors make it difficult for the optimization direction of the network to converge to the correct direction, and in practice this manifests as frequent, repeated inversion of binary data. Some of these repeated inversions are caused by post-quantization calculation errors and are invalid inversions, which reduce the efficiency of network training.
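The flip-flop behavior described above can be reproduced with a toy simulation. This is an illustrative sketch, not the patent's example: the gradient is taken at the quantized value sign(w), and a step larger than |w| pushes the weight across zero at every iteration, so its binary state never settles.

```python
# Toy illustration (assumed setup): gradient estimated at the quantized
# value makes a small weight cross zero and flip its state every iteration.

def sign(x):
    return 1.0 if x >= 0 else -1.0

w, lr = 0.1, 0.5
states = []
for t in range(6):
    g = sign(w)      # gradient direction follows the quantized value
    w = w - lr * g   # the step overshoots zero because lr > |w|
    states.append(sign(w))

# the binary state oscillates between -1 and +1 on every iteration
```

Every one of these oscillations is an invalid inversion in the sense above: compute is spent without moving the network toward a better optimum.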
Correspondingly, the embodiment of the application provides a technical scheme of using different gradient adjustment coefficients for different binary states, so that invalid turnover is reduced, the purposes of stable training and task precision improvement are achieved, and the trained binary neural network can be applied to face information recognition of images.
The technical solutions provided in the present application are further described below with reference to the following examples.
Fig. 2 is a flowchart illustrating a method of a face information recognition method based on a binary neural network according to an exemplary embodiment. The method is performed by a computer device. As shown in fig. 2, the face information recognition method based on the binary neural network may include the following steps:
step 201, acquiring a target image, extracting an image characteristic value of the target image, and inputting the image characteristic value into a binary neural network.
The image feature value refers to a feature value extracted from image data of a target image. Wherein the target image is an image containing face information.
In the embodiment of the application, the image characteristic values are input into the binary neural network, and the binary neural network is exemplarily described to be applied to face information recognition in the field of computer vision.
Step 202, performing binary quantization processing on the image characteristic value to obtain a quantization value of the image characteristic value.
The binary quantization processing is a processing method that quantizes an image feature value into one of a first quantized value and a second quantized value. For example: the sign of the image feature value is taken; if the sign is positive, the quantized value of the image feature value is the first quantized value "1", and if the sign is negative, the quantized value is the second quantized value "-1".
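The sign-based quantization in this step can be sketched in a few lines (an illustrative example; mapping zero to the non-negative state is an implementation choice not fixed by the text):

```python
# Minimal sketch of the binary quantization step described above:
# the sign of each feature value selects one of the two quantized states.

def binarize(x):
    """Quantize a value to +1 (non-negative) or -1 (negative)."""
    return 1 if x >= 0 else -1

features = [0.7, -1.2, 0.0, -0.3]
quantized = [binarize(v) for v in features]
assert quantized == [1, -1, 1, -1]
```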
It is understood that the embodiment of the present application is only exemplified by a binary neural network, and the technical solution thereof can be similarly applied to other multivalued quantization networks.
Step 203, determining a value of a gradient adjustment coefficient according to the quantization value of the image characteristic value, wherein the gradient adjustment coefficient is used for controlling the probability of binary inversion of the weight in the binary neural network in the training process, and the binary inversion refers to that the weight corresponds to different quantization values before and after a single iteration.
In the computer device, the corresponding relation between the quantized value of the image characteristic value and the numerical value of the gradient adjusting coefficient is prestored, and after the quantized value of the image characteristic value is obtained, the numerical value of the gradient adjusting coefficient corresponding to the quantized value of the image characteristic value is determined according to the quantized value of the image characteristic value obtained currently based on the corresponding relation.
And 204, performing gradient updating on the weights by using the gradient adjusting coefficients to obtain the trained weights, wherein the weights are used for forming the trained binary neural network.
From a mathematical point of view, the direction of the gradient is the direction in which the function increases most rapidly, and the opposite direction of the gradient is the direction in which the function decreases most rapidly. Therefore, in order to minimize the loss function of the neural network (including the binary neural network) and obtain the optimal weights of the neural network, the weights need to be updated in a gradient manner, and the weights need to be continuously optimized in the opposite direction of the gradient.
When the weights are updated in a gradient manner, the weights before and after a single iteration may correspond to different quantization values, so that binary inversion may occur on the weights, and the binary inversion is highly likely to be caused by errors in the quantized gradient calculation and belongs to invalid inversion.
In the embodiment of the application, a new parameter, namely a gradient adjustment coefficient is introduced, and the new parameter is applied to the process of updating the gradient of the weight, and the gradient adjustment coefficient is used for adjusting the quantized gradient calculation so as to control the probability of binary inversion of the weight based on the adjusted gradient.
And step 205, inputting the image to be recognized into the trained binary neural network, and recognizing the face information in the image to be recognized.
In the embodiment of the application, the trained binary neural network can be used for recognizing the face information in the image.
To sum up, in the binary neural network, the difference between the binary quantized data and the original data is large, which causes a large error in gradient calculation, and this error causes that the weights will frequently undergo binary inversion during training, i.e. before and after a single iteration, the weights correspond to different quantized values, so that the optimization direction of the neural network is difficult to converge to the correct direction, resulting in a decrease in the precision of neural network training. Based on the technical scheme, in the training process of the binary neural network, a gradient adjustment coefficient is introduced, the numerical value of the gradient adjustment coefficient is related to the quantization value of the image characteristic value used in the training process, and when weight training is performed in a gradient updating mode, the gradient adjustment coefficient can control the probability of binary inversion of the weight in the training process, so that the training precision of the neural network can be improved, and the recognition precision of the trained binary neural network on the face information in the image is improved.
Since the numerical value of the gradient adjustment coefficient is related to the quantized value of the image feature value, and different quantized values can be understood as different states, the technical scheme provided by the application in the binary neural network training can be called as binary neural network training based on state perception.
Next, such state-aware binary neural network training will be described.
Given an image characteristic value to be quantized, when calculating its gradient, the value of the gradient adjustment coefficient is determined to be a first gradient adjustment coefficient value when the quantized value of the image characteristic value is the first quantized value, and a second gradient adjustment coefficient value when the quantized value is the second quantized value; wherein the first gradient adjustment coefficient value is different from the second gradient adjustment coefficient value.
That is, denote the image feature value as $a$ and its quantized value as $Q(a) \in \{-1, 1\}$. Different gradient adjustment coefficients $\tau_{-1}$ and $\tau_{1}$ are set according to the different states ($-1$ or $1$), as shown in Equation 1:

$$\frac{\partial L}{\partial a} = \begin{cases} \tau_{-1} \cdot \dfrac{\partial L}{\partial Q(a)}, & Q(a) = -1 \\[2mm] \tau_{1} \cdot \dfrac{\partial L}{\partial Q(a)}, & Q(a) = 1 \end{cases} \quad \text{(Equation 1)}$$

where $a$ is the image feature value, $Q(a)$ is the quantized value of the image feature value, $\tau_{-1}$ and $\tau_{1}$ are the gradient adjustment coefficients, and $L$ is the loss function of the binary neural network. It should be understood that when $\tau_{-1}$ is equal to $\tau_{1}$, Equation 1 degenerates to the traditional binary neural network optimization algorithm.
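As a minimal illustration of Equation 1, the state-dependent gradient scaling can be sketched in NumPy. The function and parameter names here (`state_aware_grad`, `tau_neg`, `tau_pos`) are our own illustrative choices, not terms from the patent:

```python
import numpy as np

def state_aware_grad(a, grad_q, tau_neg, tau_pos):
    """Scale the gradient w.r.t. the quantized value Q(a) by a coefficient
    chosen per element according to the state (quantized value) of a."""
    q = np.where(a >= 0, 1.0, -1.0)           # binary quantization Q(a) = sign(a)
    tau = np.where(q > 0, tau_pos, tau_neg)   # tau_1 for state +1, tau_-1 for state -1
    return tau * grad_q
```

When `tau_neg` equals `tau_pos`, the result is the plain straight-through gradient, matching the degenerate case noted above.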
Accordingly, the weights in the binary neural network are updated by gradient descent using Equation 2:

$$w^{l}_{t+1} = w^{l}_{t} - \eta \cdot \tau \cdot \frac{\partial L}{\partial Q(w^{l}_{t})} \quad \text{(Equation 2)}$$

where $w^{l}_{t}$ is the weight, $Q(w^{l}_{t})$ is the quantized value of the weight, $\tau$ is the gradient adjustment coefficient chosen according to the state as in Equation 1, $l$ is the layer index of the binary neural network, $t$ is the training iteration number, and $\eta$ is the learning rate.
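A one-line sketch of the update in Equation 2, with illustrative names and scalar inputs for simplicity (this is our reading of the formula, not code from the patent):

```python
def update_weight(w_t, grad_q, tau, lr):
    # Equation 2: w_{t+1} = w_t - lr * tau * dL/dQ(w_t),
    # where tau is the gradient adjustment coefficient for the current state.
    return w_t - lr * tau * grad_q
```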
In the following, it is demonstrated that, compared with the traditional implementation, i.e., binary neural network training based on state consistency, state-aware binary neural network training suppresses the frequent invalid flipping caused by gradient calculation errors in the binary neural network.
Setting Equation 3:

$$b_{t} = \eta \cdot \tau \cdot \frac{\partial L}{\partial Q(w_{t})} \quad \text{(Equation 3)}$$

Equation 2 can be simplified to

$$w_{t+1} = w_{t} - b_{t}$$
As can be seen from the simplified Equation 2, a weight undergoes binary inversion only when Equation 4 holds:

$$\operatorname{sign}(w_{t} - b_{t}) \neq \operatorname{sign}(w_{t}) \quad \text{(Equation 4)}$$
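The flip condition of Equation 4 can be checked directly; a hypothetical scalar sketch (names are ours):

```python
def sign(x):
    # sign convention matching binary quantization: zero maps to +1
    return 1.0 if x >= 0 else -1.0

def flipped(w_t, b_t):
    """Equation 4: the quantized value flips exactly when the update amount
    b_t pushes the weight across zero."""
    return sign(w_t - b_t) != sign(w_t)
```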
Assuming that the initial state of the image feature value is-1, the probability of the weight being subjected to binary inversion is analyzed as follows.
a. When $|\tau_{-1}| = |\tau_{1}|$, the probability of a weight undergoing binary inversion from the $t$-th iteration to the $(t+1)$-th iteration is shown in Equation 5:

$$P_{t \to t+1} = \frac{|A_{t}|}{N} \quad \text{(Equation 5)}$$

where $N$ is the number of all weights and $|A_{t}|$ is the number of weights satisfying Equation 6:

$$\operatorname{sign}(w_{t}) \cdot b_{t} > |w_{t}| \quad \text{(Equation 6)}$$
Equations 4 and 6 mean that the larger the absolute value of the gradient-learning-rate product relative to the absolute value of the current weight, the more likely the quantized value is to flip. Since the current weight $w_{t}$ in Equations 4 and 6 is fixed within an update, whether the quantized value of the weight flips at the next step depends on the current gradient update amount $b_{t}$ (the product of the gradient magnitude and the learning rate). In the technical scheme provided by the present application, in addition to the gradient update amount $b_{t}$, a state-aware factor $\tau$ is introduced, and this factor affects the probability that the quantized value flips during binary quantization.

By adjusting the value of the gradient adjustment coefficient $\tau$ used for the quantized values of the different states ($-1$ and $1$), the difficulty of flipping the quantized value of a weight can be tuned, thereby suppressing continuous flipping of the quantized values of the weights in the binary neural network. It should be understood that the technical scheme of the present application still allows the quantized values of the weights to undergo binary inversion, because flipping is part of the normal training process; what must be suppressed is meaningless continuous flipping.
It can be deduced that the probability of continuous, repeated flipping from the $t$-th iteration to the $(t+2)$-th iteration is shown in Equation 7:

$$P_{t \to t+2} = \frac{|A_{t} \cup A_{t+1}|}{N} \quad \text{(Equation 7)}$$
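Equation 7 can be sketched with Python sets, representing each flip set as a set of weight indices (an illustrative assumption on our part):

```python
def consecutive_flip_rate(flip_set_t, flip_set_t1, n_weights):
    """Equation 7: fraction of the N weights that appear in the flip sets of
    two consecutive iterations (the union A_t ∪ A_{t+1})."""
    return len(flip_set_t | flip_set_t1) / n_weights
```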
b. When $|\tau_{-1}| \neq |\tau_{1}|$, the continuous flipping probability of state-aware binary neural network training is smaller than that of state-consistency binary neural network training, see Equation 8 and Equation 9:
$$P^{\mathrm{SA}}_{t \to t+2} = \frac{|A^{\mathrm{SA}}_{t} \cup A^{\mathrm{SA}}_{t+1}|}{N} \quad \text{(Equation 8)}$$

$$P^{\mathrm{SA}}_{t \to t+2} < P_{t \to t+2} \quad \text{(Equation 9)}$$
where $A$ denotes the set of weights whose quantized values flip in a given iteration ($A_{t}$ for the $t$-th iteration). For example, in the above formulas, the union of the flip sets of the $t$-th and $(t+1)$-th iterations represents continuous flipping over two consecutive iterations.

The meaning of the two formulas is as follows. For data whose quantized value is clearly $1$ or clearly $-1$ during network optimization, the added gradient adjustment factor does not affect the updated values, and these parameters are still optimized to the correct positions under the gradient descent of the network. For weights whose quantized value swings randomly between $1$ and $-1$ because the gradient direction is uncertain, setting the gradient adjustment coefficient $\tau$ changes the flipping difficulty of the different quantized values and actively guides these weights toward a preferred value ($1$ or $-1$). Therefore, during binary neural network optimization, frequent invalid flipping of the quantized values of the weights is suppressed, the training process is stabilized, and training efficiency is improved.
The validity of the state-aware binary neural network training of the present application is demonstrated below through experiments.
First, $\tau_{1}$ is fixed to $1$ and the optimal $\tau_{-1}$ is found through network training. Fig. 3 shows the distribution of the learned $\tau_{-1}$ over different layers of ResNet-18. In Fig. 3, the ordinate represents the different channels and the abscissa the magnitude distribution of $\tau_{-1}$; sub-graphs a, b, c, and d of Fig. 3 are statistics for layers 14, 15, 16, and 17 of ResNet-18, respectively. It can be seen from Fig. 3 that the optimal $|\tau_{-1}|$ is not equal to $|\tau_{1}|$, i.e., state-aware binary neural network training fits the actual data more closely.
Secondly, the weight flipping probability during actual binary neural network training is tracked. Fig. 4 shows statistics of the flip rate of the quantized values of the layer-10 weights of ResNet-18. In Fig. 4, the ordinate represents the flipping probability and the abscissa the number of training epochs. $m$ is the number of consecutive flips; e.g., $m = 2$ represents 2 consecutive flips (only two consecutive flips were analyzed in the theoretical demonstration above) and $m = 3$ represents 3 consecutive flips. The probability of continuous flipping expresses, to some extent, the probability of invalid training. It can be seen from Fig. 4 that state-aware binary neural network training (SA in the legend) has a lower flipping probability than state-consistency training.
Then, the training accuracy of state-aware binary neural network training is compared with that of state-consistency binary neural network training, see Fig. 5. In Fig. 5, Base+SA represents state-aware binary neural network training and Base represents state-consistency binary neural network training; Top-1 means the ground-truth label matches the top prediction, and Top-5 means it is among the top five predictions. It can be seen from Fig. 5 that the training precision of state-aware binary neural network training is improved over that of state-consistency training.
Finally, state-aware binary neural network training is compared with other related work on improving the precision of binary neural networks, see Fig. 6. In Fig. 6, the SA-BNN column shows the effect of state-aware binary neural network training, the FP column shows the performance of a full-precision network, and the remaining columns show the precision achieved by other related work. It can be seen from Fig. 6 that state-aware binary neural network training achieves higher precision than the other related work on improving binary neural network precision.
In an exemplary embodiment, the gradient adjustment coefficients are learnable parameters obtained by training and learning of the network.
That is, before the binary neural network is actually applied to face information recognition, the training method of the binary neural network may further include the steps of:
and carrying out gradient updating on the gradient adjusting coefficient to obtain the trained gradient adjusting coefficient.
In a possible implementation manner, a gradient magnitude synchronization coefficient is obtained, and the gradient magnitude synchronization coefficient is used for ensuring that the magnitude of the gradient adjustment coefficient is synchronous with the magnitude of the gradient of the weight; and performing gradient updating on the gradient adjusting coefficient by using the gradient magnitude synchronous coefficient to obtain the trained gradient adjusting coefficient.
Extensive practice and theoretical analysis in the field of deep learning indicate that a neural network converges to a better local optimum when the gradient magnitudes of its layers are similar. According to this theory, when the gradient adjustment coefficients ($\tau_{-1}$ and $\tau_{1}$) introduced in Equation 1 are learned, controlling the magnitude of their gradient updates allows the whole binary neural network to converge to a better state.
Formally, the magnitude relation between the gradient of an original weight $w$ and the gradient of the newly introduced gradient adjustment coefficient $\tau$ in the binary neural network is represented by Equation 10:

$$R = \frac{\left\| \partial L / \partial \tau \right\|}{\left\| \partial L / \partial w \right\|} \quad \text{(Equation 10)}$$

According to the foregoing theoretical analysis, $\|R\| = 1$ is required so that the gradient magnitude of the coefficient $\tau$ is similar to that of the other parameters in the binary neural network.
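The ratio in Equation 10 can be monitored during training; below is a sketch using Euclidean norms over flat lists of gradient entries (the function name and the choice of norm are our assumptions):

```python
import math

def magnitude_ratio(grad_tau, grad_w):
    """Equation 10: R compares the gradient magnitude of tau with that of the
    weights; training aims to keep ||R|| close to 1."""
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return norm(grad_tau) / norm(grad_w)
```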
Optionally, the gradient magnitude synchronization coefficient is determined using Equation 11:

$$g = \frac{1}{n} \quad \text{(Equation 11)}$$

where $g$ is the gradient magnitude synchronization coefficient and $n$ is the number of weights in the current layer of the binary neural network.
In the implementation of Equation 1, the gradient adjustment coefficient $\tau$ is associated with every quantized value of its layer, and its gradient is a sum over all quantized values of the layer, so its gradient magnitude is higher than that of each individual weight. To correct this, i.e., to keep $\|R\| = 1$, the update of the gradient adjustment coefficient $\tau$ is additionally multiplied by the fixed gradient magnitude synchronization coefficient $g$.
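Applying the fixed synchronization coefficient of Equation 11 when updating $\tau$ can be sketched as follows (names are illustrative, not from the patent):

```python
def update_tau(tau, grad_tau, lr, n_weights):
    # Equation 11: g = 1/n rescales the summed gradient of tau so that its
    # update magnitude matches that of an individual weight.
    g = 1.0 / n_weights
    return tau - lr * g * grad_tau
```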
In summary, during neural network optimization each learnable parameter is optimized by gradient descent, and when the gradient magnitudes of the learnable parameters are similar, the network can converge to a better local optimum.
It should be noted that the method embodiments described above may be implemented alone or in combination, and the present application is not limited thereto.
Fig. 7 is a block diagram illustrating a configuration of a face information recognition apparatus based on a binary neural network according to an exemplary embodiment. The device comprises:
a feature value input module 701, configured to acquire a target image, extract an image feature value of the target image, and input the image feature value into a binary neural network;
a binary quantization module 702, configured to perform binary quantization processing on the image feature value to obtain a quantization value of the image feature value;
a numerical value determining module 703, configured to determine, according to the quantized value of the image feature value, a numerical value of a gradient adjustment coefficient, where the gradient adjustment coefficient is used to control a probability that a weight in the binary neural network undergoes binary inversion in a training process, where the binary inversion refers to that the weight corresponds to different quantized values before and after a single iteration;
a first gradient updating module 704, configured to perform gradient updating on the weights by using the gradient adjustment coefficients to obtain the trained weights, where the weights are used to form the trained binary neural network;
a face information recognition module 705, configured to input the image to be recognized into the trained binary neural network, and recognize face information in the image to be recognized.
In a possible implementation manner, the value determining module 703 is further configured to:
determining the numerical value of the gradient adjustment coefficient as a first gradient adjustment coefficient value under the condition that the quantized value of the image characteristic value is a first quantized value;
determining the numerical value of the gradient adjustment coefficient to be a second gradient adjustment coefficient value under the condition that the quantized value of the image characteristic value is a second quantized value;
wherein the first gradient adjustment coefficient value is different from the second gradient adjustment coefficient value.
In a possible implementation manner, the first gradient updating module 704 is further configured to:
gradient updating weights in the binary neural network using the following formula:
$$w^{l}_{t+1} = w^{l}_{t} - \eta \cdot \tau \cdot \frac{\partial L}{\partial Q(w^{l}_{t})}$$

wherein $w^{l}_{t}$ is said weight, $Q(w^{l}_{t})$ is the quantized value of said weight, $a$ is said image feature value, $Q(a)$ is the quantized value of said image feature value, $\tau$ is said gradient adjustment coefficient, $l$ is the number of layers of said binary neural network, $t$ is the number of training iterations, $L$ is the loss function of said binary neural network, and $\eta$ is the learning rate.
In one possible implementation, the apparatus further includes: a second gradient update module;
and the second gradient updating module is used for performing gradient updating on the gradient adjusting coefficient so as to obtain the trained gradient adjusting coefficient.
In a possible implementation manner, the second gradient updating module is further configured to:
obtaining a gradient magnitude synchronization coefficient, wherein the gradient magnitude synchronization coefficient is used for ensuring that the magnitude of the gradient adjustment coefficient is synchronous with the magnitude of the gradient of the weight;
and performing gradient updating on the gradient adjusting coefficient by using the gradient magnitude synchronous coefficient to obtain the trained gradient adjusting coefficient.
In a possible implementation manner, the second gradient updating module is further configured to:
determining the gradient magnitude synchronization coefficient using the following equation:
$$g = \frac{1}{n}$$

wherein $g$ is said gradient magnitude synchronization coefficient and $n$ is the number of weights in the current layer of said binary neural network.
It should be noted that: the face information recognition device based on the binary neural network provided in the above embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the above described functions. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Please refer to fig. 8, which is a schematic diagram of a computer device according to an exemplary embodiment of the present application, the computer device includes a memory and a processor, the memory is used for storing a computer program, and the computer program is executed by the processor, so as to implement the face information recognition method based on the binary neural network.
The processor may be a Central Processing Unit (CPU). The processor may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or a combination thereof.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the methods of the embodiments of the present invention. The processor executes various functional applications and data processing of the processor by executing non-transitory software programs, instructions and modules stored in the memory, that is, the method in the above method embodiment is realized.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be coupled to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
In an exemplary embodiment, a computer-readable storage medium is also provided for storing at least one computer program, which is loaded and executed by a processor to implement all or part of the steps of the above method. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (8)

1. A face information identification method based on a binary neural network is characterized by comprising the following steps:
acquiring a target image, extracting an image characteristic value of the target image, and inputting the image characteristic value into a binary neural network;
carrying out binary quantization processing on the image characteristic value to obtain a quantization value of the image characteristic value;
determining a numerical value of a gradient adjustment coefficient as a first gradient adjustment coefficient value under the condition that the quantization value of the image characteristic value is a first quantization value, determining the numerical value of the gradient adjustment coefficient as a second gradient adjustment coefficient value under the condition that the quantization value of the image characteristic value is a second quantization value, wherein the gradient adjustment coefficient is used for adjusting the quantized gradient calculation so as to control the probability of binary inversion of the weight in the binary neural network in the training process, the binary inversion refers to the fact that the weight corresponds to different quantization values before and after a single iteration, and the first gradient adjustment coefficient value is different from the second gradient adjustment coefficient value;
performing gradient updating on the weights by using the gradient adjusting coefficients to obtain the weights after training is completed, wherein the weights are used for forming the binary neural network after training is completed;
and inputting the image to be recognized into the trained binary neural network, and recognizing the face information in the image to be recognized.
2. The method of claim 1, wherein the gradient updating the weights in the binary neural network using the gradient adjustment coefficients to complete the training of the weights comprises:
gradient updating weights in the binary neural network using the following formula:
$$w^{l}_{t+1} = w^{l}_{t} - \eta \cdot \tau \cdot \frac{\partial L}{\partial Q(w^{l}_{t})}$$

wherein $w^{l}_{t}$ is said weight, $Q(w^{l}_{t})$ is the quantized value of said weight, $a$ is said image feature value, $Q(a)$ is the quantized value of said image feature value, $\tau$ is said gradient adjustment coefficient, $l$ is the number of layers of said binary neural network, $t$ is the number of training iterations, $L$ is the loss function of said binary neural network, and $\eta$ is the learning rate.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
and carrying out gradient updating on the gradient adjusting coefficient to obtain the trained gradient adjusting coefficient.
4. The method of claim 3, wherein the gradient updating the gradient adjustment coefficients comprises:
acquiring a gradient magnitude synchronization coefficient, wherein the gradient magnitude synchronization coefficient is used for ensuring that the magnitude of the gradient adjustment coefficient is synchronous with the magnitude of the gradient of the weight;
and performing gradient updating on the gradient adjusting coefficient by using the gradient magnitude synchronous coefficient to obtain the trained gradient adjusting coefficient.
5. The method of claim 4, wherein obtaining gradient magnitude synchronization coefficients comprises:
determining the gradient magnitude synchronization coefficient using the following equation:
$$g = \frac{1}{n}$$

wherein $g$ is said gradient magnitude synchronization coefficient and $n$ is the number of weights in the current layer of said binary neural network.
6. A face information recognition device based on a binary neural network is characterized in that the device comprises:
the characteristic value input module is used for acquiring a target image, extracting an image characteristic value of the target image and inputting the image characteristic value into a binary neural network;
the binary quantization module is used for carrying out binary quantization processing on the image characteristic value in the binary neural network to obtain a quantization value of the image characteristic value;
a numerical value determining module, configured to determine, when the quantization value of the image feature value is a first quantization value, a numerical value of a gradient adjustment coefficient as a first gradient adjustment coefficient value, and determine, when the quantization value of the image feature value is a second quantization value, the numerical value of the gradient adjustment coefficient as a second gradient adjustment coefficient value, where the gradient adjustment coefficient is used to adjust quantized gradient calculation so as to control a probability of binary inversion occurring in a weight in the binary neural network during training, where the binary inversion refers to that the weight corresponds to different quantization values before and after a single iteration, and the first gradient adjustment coefficient value is different from the second gradient adjustment coefficient value;
the first gradient updating module is used for performing gradient updating on the weights by using the gradient adjusting coefficients to obtain the weights after training is completed, and the weights are used for forming the binary neural network after training is completed;
and the face information recognition module is used for inputting the image to be recognized into the trained binary neural network and recognizing the face information in the image to be recognized.
7. A computer device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the method for recognizing face information based on binary neural network as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium, having at least one instruction, at least one program, a set of codes, or a set of instructions stored therein, which is loaded and executed by a processor to implement the method for recognizing face information based on binary neural network as claimed in any one of claims 1 to 5.
CN202211092937.7A 2022-09-08 2022-09-08 Face information identification method, device and equipment based on binary neural network Active CN115171201B (en)

Publications (2)

Publication Number Publication Date
CN115171201A CN115171201A (en) 2022-10-11
CN115171201B true CN115171201B (en) 2023-04-07
