CN113762061A

CN113762061A - Quantitative perception training method and device for neural network and electronic equipment

Info

Publication number: CN113762061A
Application number: CN202110580264.9A
Authority: CN
Inventors: 康洋
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2021-12-07

Abstract

The application provides a quantitative perception training method and device for a neural network and electronic equipment, wherein the method comprises the following steps: inputting training data which is marked with a first result in advance into a neural network to obtain a second result output by the neural network; calculating a loss function of the neural network based on an error between the first result and the second result; in the process of obtaining the quantization operation of a pseudo quantization operator in the neural network on the floating point input corresponding to the training data, the quantization parameter adopted by the pseudo quantization operator directly operates the direct output obtained by the floating point input; calculating a gradient of the quantization parameter based on the direct output; and adjusting the quantization parameters in the direction of reducing the loss function based on the gradient of the quantization parameters until the loss function meets a preset condition to obtain the neural network with the completion of the quantization perception training. The embodiment of the application can improve the stability of the neural network.

Description

Quantitative perception training method and device for neural network and electronic equipment

Technical Field

The application relates to the field of artificial intelligence, in particular to a quantitative perception training method and device for a neural network and electronic equipment.

Background

With the development of artificial intelligence technology, neural networks have wide application prospects in many fields. In general, the higher the precision of parameters in the neural network, the better the precision and stability of the neural network processing task. However, the high precision of the parameters may cause the occupation space of the parameters included in the neural network to be too large, and the hardware memory resources required for operating the neural network are more. In order to enable the neural network to be deployed and used on a terminal device with a small hardware memory, in the prior art, quantization operation is performed on the neural network to reduce the precision of parameters. However, the introduction of quantization operation leads to the rapid decrease of the stability of the neural network and the increase of jitter.

Disclosure of Invention

An object of the present application is to provide a method, an apparatus, and an electronic device for quantitative perceptual training of a neural network, which can improve the stability of the neural network.

According to an aspect of an embodiment of the present application, a method for quantitative perceptual training of a neural network is disclosed, the method including:

inputting training data which is marked with a first result in advance into a neural network to obtain a second result output by the neural network;

calculating a loss function of the neural network based on an error between the first result and the second result;

in the process of obtaining the quantization operation of a pseudo quantization operator in the neural network on the floating point input corresponding to the training data, the quantization parameter adopted by the pseudo quantization operator directly operates the direct output obtained by the floating point input;

calculating a gradient of the quantization parameter based on the direct output;

and adjusting the quantization parameters in the direction of reducing the loss function based on the gradient of the quantization parameters until the loss function meets a preset condition to obtain the neural network with the completion of the quantization perception training.

According to an aspect of an embodiment of the present application, an apparatus for training quantization perception of a neural network is disclosed, the apparatus including:

the data input module is configured to input training data which are pre-marked with first results into a neural network to obtain second results output by the neural network;

a loss calculation module configured to calculate a loss function for the neural network based on an error between the first result and the second result;

the output acquisition module is configured to acquire direct output obtained by quantizing the floating point input corresponding to the training data by a pseudo quantization operator in the neural network;

a gradient calculation module configured to calculate a gradient of a quantization parameter employed by the pseudo quantization operator for quantization operation based on the direct output;

and the adjusting module is configured to adjust the quantization parameter in the direction of reducing the loss function based on the gradient of the quantization parameter until the loss function meets a preset condition, so as to obtain the neural network with the quantized sensing training completed.

In an exemplary embodiment of the present application, the apparatus is configured to:

simulating the gradient of a quantization operator adopted by the pseudo quantization operator for quantization operation;

and calculating a partial derivative of the quantization parameter based on the direct output and the gradient of the quantization operator to obtain the gradient of the quantization parameter.

controlling the quantization operator to directly operate preset first data to obtain corresponding second data;

performing regression processing on discrete points formed by the first data and the second data to obtain a simulation function with the first data as an independent variable and the second data as a dependent variable;

and obtaining the gradient of the quantization operator by derivation of the analog function.

acquiring the maximum value of the floating point input and the minimum value of the floating point input;

acquiring the maximum value of an output interval corresponding to the quantization operation and the minimum value of the output interval corresponding to the quantization operation;

and adjusting the quantization parameter based on the maximum value of the floating point input, the minimum value of the floating point input, the maximum value of the output interval corresponding to the quantization operation and the minimum value of the output interval corresponding to the quantization operation.

inputting a face picture which is pre-marked with first face key point distribution into a neural network to obtain second face key point distribution output by the neural network;

calculating a loss function of the neural network based on an error between the first face keypoint distribution and the second face keypoint distribution;

acquiring direct output obtained by quantization operation of a pseudo quantization operator in the neural network on floating point input corresponding to the face picture;

calculating the gradient of a quantization parameter adopted by the pseudo quantization operator for quantization operation based on the direct output;

the image input module is configured to input a human face image which is pre-marked with first human face key point distribution into a neural network to obtain second human face key point distribution output by the neural network;

a loss calculation module configured to calculate a loss function of the neural network based on an error between the first face keypoint distribution and the second face keypoint distribution;

the output acquisition module is configured to acquire direct output obtained by quantizing floating point input corresponding to the face picture by a pseudo quantization operator in the neural network;

using a formula

Calculating to obtain the direct output, wherein v is the floating point input, s and zp are the quantization parameters, and x_qIs the direct output.

using a formula

And formulas

Calculating the gradient of the quantization parameter, wherein,_Qminis the minimum value of the output interval corresponding to the quantization operation,_Qmaxis the maximum value of the output interval corresponding to the quantization operation,

is the gradient of the quantization parameter zp,

is the gradient of the quantization parameter s.

According to an aspect of an embodiment of the present application, an electronic device is disclosed, including: a memory storing computer readable instructions; a processor reading computer readable instructions stored by the memory to perform the method of any of the preceding claims.

According to an aspect of embodiments of the present application, a computer program medium is disclosed, having computer readable instructions stored thereon, which, when executed by a processor of a computer, cause the computer to perform the method of any of the preceding claims.

According to an aspect of embodiments herein, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.

In the embodiment of the application, the quantization parameter is processed into a parameter which can be learned in the neural network by calculating the gradient of the quantization parameter. On the basis of the gradient of the quantization parameter, the adjustment of the quantization parameter is associated with the loss function of the neural network, so that the quantization parameter obtained by adjustment is better than the loss function, and the stability of the neural network is improved.

Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.

Fig. 1 shows a schematic diagram of an exemplary system architecture of the provided solution according to an embodiment of the present application.

FIG. 2 shows a flow diagram of a method of quantitative perceptual training of a neural network according to one embodiment of the present application.

FIG. 3 illustrates a simplified diagram of quantitative perceptual training of a neural network, according to one embodiment of the present application.

Fig. 4 is a schematic diagram illustrating a distribution of first face keypoints of a labeled face picture according to an embodiment of the present application.

FIG. 5 shows a block diagram of a quantitative perceptual training apparatus of a neural network according to one embodiment of the present application.

FIG. 6 illustrates a hardware diagram of an electronic device according to one embodiment of the present application.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these example embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In the following description, numerous specific details are provided to give a thorough understanding of example embodiments of the present application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, steps, and so forth. In other instances, well-known structures, methods, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.

Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The application provides a quantitative perception training method of a neural network, and relates to machine learning in the field of artificial intelligence.

Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.

The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.

In the embodiment of the present application, the quantization in the neural network mainly refers to that a weight value or an activation value represented by a high bit width is approximately represented by a low bit width. The representation of quantization on the value is discretizing the continuous value. For example: the weight of 32float is represented as a weight of int 8.

The quantization operation is carried out on the neural network, so that the size of the space occupied by the parameters contained in the neural network can be reduced, and the hardware memory occupied by the neural network during operation is reduced, so that the neural network can be operated in edge equipment such as a mobile terminal in a low-precision mode. But at the same time, the stability of the neural network is also reduced sharply.

According to the embodiment of the application, the pseudo quantization operator is introduced into the neural network to carry out quantization perception training on the neural network, so that the neural network can perceive and quantize in the training process, and the stability of the neural network is improved compared with that of the neural network which is simply subjected to quantization operation.

In the embodiment of the present application, the function operation described by the pseudo quantization operator introduced in the neural network is equivalent to performing quantization operation and then performing inverse quantization operation on data located in a layer above the pseudo quantization operator.

Fig. 1 shows a schematic diagram of an exemplary system architecture of the provided solution of an embodiment of the present application.

As shown in fig. 1, the system architecture may include terminal devices (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform.

The server 105 is mainly used for training the neural network, and the terminal device is mainly used for deploying the trained neural network. In the server 105, after the neural network is subjected to quantitative perception training, the size of the space occupied by the parameters in the neural network is reduced, and the hardware memory occupied during operation is reduced. After the trained neural network is deployed in the terminal equipment at the edge, the neural network can stably operate in a low-precision mode.

It should be noted that the embodiment is only an exemplary illustration of the system architecture to which the present application can be applied, and should not limit the function and the scope of the application.

Before describing the specific implementation process of the embodiments of the present application in detail, a brief explanation of some concepts related to the embodiments of the present application will be provided.

A floating-point input refers to input data of the floating-point type.

The quantization parameter refers to a parameter adopted by the pseudo quantization operator when performing the quantization operation.

And direct output refers to output obtained by directly operating the floating point input by using a quantization parameter adopted by the pseudo quantization operator in the process of performing quantization operation on the floating point input by the pseudo quantization operator. Specifically, the pseudo quantization operator performs quantization operation on the floating-point input, and generally includes two stages: firstly, directly operating floating point input by adopting quantization parameters, mapping the floating point input of continuous numerical values into direct output of discrete numerical values, and recording the direct output as x_q(ii) a Two, direct output x obtained by the pair_qRange interception is carried out, and x is directly output_qOutput interval [ Qmin, Qmax ] mapped to a predetermined range]Obtaining a quantized output, and recording the quantized output as x_Q。

Fig. 2 illustrates a method for quantitative perceptual training of a neural network according to an embodiment of the present application, the method including:

step S210, inputting training data which is marked with a first result in advance into a neural network to obtain a second result output by the neural network;

step S220, calculating a loss function of the neural network based on an error between the first result and the second result;

step S230, acquiring direct output obtained by directly operating floating point input by quantization parameters adopted by a pseudo quantization operator in the process of performing quantization operation on the floating point input corresponding to the training data by the pseudo quantization operator in the neural network;

step S240, calculating the gradient of the quantization parameter based on the direct output;

and S250, based on the gradient of the quantization parameter, adjusting the quantization parameter in the direction of reducing the loss function until the loss function meets a preset condition, and obtaining the neural network with the completion of the quantization perception training.

In the embodiment of the application, the training data is adopted to carry out quantitative perception training on the neural network. Specifically, a first result corresponding to the training data is pre-labeled, and the first result is a target output of the neural network. And inputting the training data into a neural network to obtain a second result output by the neural network. And when the error between the second result and the pre-marked first result stably meets the preset requirement, the neural network can accurately process data and the training is completed.

Wherein the error between the second result and the first result is represented by a loss function. Therefore, in the process of carrying out quantitative perception training on the neural network, the loss function of the neural network is calculated based on the error between the first result and the second result.

In the process of carrying out quantitative perception training on the neural network, a pseudo quantization operator is introduced to carry out quantization operation on data on the upper layer of the pseudo quantization operator and then carry out inverse quantization operation, so that the neural network can realize perception quantization. Specifically, the pseudo quantization operator performs quantization operation and then performs inverse quantization operation on the floating point input corresponding to the training data located in the upper layer. Wherein the inverse quantization operation is an inverse operation of the quantization operation.

And in the process of obtaining the quantization operation of the pseudo quantization operator on the floating point input, the quantization parameter adopted by the pseudo quantization operator directly operates the direct output obtained by the floating point input. Further, based on the direct output, a gradient of the quantization parameter is calculated. And then based on the gradient of the quantization parameter, adjusting the quantization parameter in the direction of reducing the loss function until the loss function meets the preset condition, and obtaining the neural network finished by the quantization perception training.

Therefore, in the embodiment of the application, the quantization parameter is processed into a learnable parameter in the neural network by calculating the gradient of the quantization parameter. On the basis of the gradient of the quantization parameter, the adjustment of the quantization parameter is associated with the loss function of the neural network, so that the quantization parameter obtained by adjustment is better than the loss function, and the stability of the neural network is improved.

Fig. 3 shows a schematic diagram of quantitative perceptual training of a neural network according to an embodiment of the present application.

In this embodiment, two different pseudo quantization operators, which are denoted as pseudo quantization operator 1 and pseudo quantization operator 2, are introduced into the neural network. The two pseudo quantization operators are used to make the neural network aware of the quantization at different stages of the training of the neural network, respectively.

After the training data are input into the neural network, the result output by the neural network is obtained after the processing of the pseudo quantization operator 1, the processing of other nodes of the neural network and the processing of the pseudo quantization operator 2. And calculating a loss function according to the result output by the neural network, performing back propagation on the pseudo-quantization operator 2 and the pseudo-quantization operator 1 according to the loss function, and respectively adjusting the quantization parameters of the pseudo-quantization operators so as to continuously reduce the loss function until the training is finished.

In an embodiment, a squared difference between the first result and the second result is taken as a loss function of the neural network.

Specifically, a first result pre-labeled for the training data is recorded as gt, a second result output by the neural network after the training data is input into the neural network is recorded as y, and a loss function of the neural network is recorded as L. Then L is calculated using the formula shown below:

L＝||y-gt||₂

it should be noted that this embodiment only shows an exemplary method for calculating the loss function. It will be appreciated that the loss function may also be calculated using variance or logarithm, etc. This embodiment should not limit the scope of the application and its function.

In the embodiment of the application, after the training data is input into the neural network, the training data is processed into floating point input by a layer of nodes before the pseudo quantization operator. After the quantization operation and the inverse quantization operation of the node where the pseudo quantization operator is located are carried out on the floating point input, the floating point input is continuously processed by the nodes layer by layer behind the pseudo quantization operator, and finally a second result is output. And extracting the floating point input and the direct output from the neural network in a mode of positioning the node where the pseudo quantization operator is located in the neural network.

In the embodiment of the application, in the process of quantizing the floating point input by the pseudo quantization operator, the direct output obtained by directly operating the floating point input by the quantization parameter adopted by the pseudo quantization operator is discrete. Therefore, it is difficult to calculate the gradient of the quantization parameter in a strict mathematical sense. In the embodiment of the present application, the calculated gradient of the quantization parameter is generally obtained by approximate simulation.

In one embodiment, the pseudo quantization operator is modeled to perform a quantization operation using a gradient of quantization operators. And solving a partial derivative of the quantization parameter based on the direct output and the gradient of the quantization operator to obtain the gradient of the quantization parameter.

Specifically, under the combined action of the quantization operator and the quantization parameter, a continuous floating-point input is processed into a discrete direct output. Wherein the discrete realization is mainly caused by the action of quantization operators. Therefore, after the gradient of the quantization operator is obtained through simulation, the quantization parameter can be approximately regarded as a parameter on a continuous numerical space, and then the partial derivative of the quantization parameter can be solved to obtain the gradient of the quantization parameter.

In an embodiment, the gradient of the quantization operator is modeled as a constant.

Specifically, although the direct output of the quantization operator is discrete and cannot be derived in a strict mathematical sense, it can be directly modeled as a constant in the process of approximate modeling.

In an embodiment, the quantization operator is controlled to directly operate the preset first data to obtain the corresponding second data. And performing regression processing on discrete points consisting of the first data and the second data to obtain a simulation function with the first data as independent variables and the second data as dependent variables. And (5) deriving the analog function to obtain the gradient of the quantization operator.

Specifically, a plurality of first data are preset, and the quantization operator is controlled to directly operate each first data respectively to obtain a second data corresponding to each operated first data. And establishing discrete points consisting of the first data and the corresponding second data, and further performing regression processing on the discrete points to obtain a simulation function taking the first data as an independent variable and the second data as a dependent variable, wherein the simulation function simulates an approximate expression of mapping the output of the quantization operator to a continuous numerical space. And further, the analog function is derived to obtain the gradient of the quantization operator.

In an embodiment, the quantization parameter is randomly initialized, and then a gradient of the quantization parameter obtained by the random initialization is calculated, and then the quantization parameter obtained by the random initialization is adjusted based on the calculated gradient.

In an embodiment, the quantization parameter is further adjusted based on statistics on the basis of the adjustment of the quantization parameter based on the gradient of the quantization parameter.

In this embodiment, the maximum value of the floating-point input and the minimum value of the floating-point input are obtained. And acquiring the maximum value of the output interval corresponding to the quantization operation and the minimum value of the output interval corresponding to the quantization operation. And adjusting the quantization parameter based on the maximum value of the floating point input, the minimum value of the floating point input, the maximum value of the output interval corresponding to the quantization operation and the minimum value of the output interval corresponding to the quantization operation.

Specifically, a range [ xmin, xmax ] of the floating point input is obtained in a statistical manner, where xmin is a minimum value of the floating point input, and xmax is a maximum value of the floating point input. And acquiring an output interval [ Qmin, Qmax ] corresponding to the quantization operation, wherein Qmin is the minimum value of the output interval, and Qmax is the maximum value of the output interval.

Wherein, the quantization operation corresponds to the output regionThe time is related to the number of bits output after the quantization operation, and specifically, when the output is N bits after the quantization operation, Qmin is-2^N-1，Qmax＝2^N-1-1。

The quantization parameter is further adjusted based on xmin, xmax, Qmin, and Qmax. Specifically, by assigning a preset weight w to xmin and xmax, and assigning weights (1-w) to Qmin and Qmax, the quantization parameter is adjusted based on the weighted sum.

The embodiment has the advantages that the quantization parameter is adjusted based on statistics on the basis of adjusting the quantization parameter based on the gradient of the quantization parameter, so that the convergence speed of quantization parameter adjustment can be increased, and the training effect of the neural network is improved.

An embodiment of the present application further provides a quantitative perception training method for a neural network, where the method includes: and inputting the face picture pre-marked with the first face key point distribution into a neural network to obtain a second face key point distribution output by the neural network. And calculating a loss function of the neural network based on the error between the first face key point distribution and the second face key point distribution. And obtaining direct output obtained by quantizing the floating point input corresponding to the human face picture by the pseudo quantization operator in the neural network. And calculating the gradient of the quantization parameter adopted by the pseudo quantization operator for quantization operation based on the direct output. And based on the gradient of the quantization parameter, adjusting the quantization parameter in the direction of reducing the loss function until the loss function meets the preset condition to obtain the neural network finished by the quantization perception training.

In the embodiment, the neural network is used for identifying key points of a human face, and a pseudo quantization operator is introduced to carry out quantization perception training on the neural network, so that the neural network can be deployed on terminal equipment located at the edge, and support is stably provided for video call service or video special effect service of the terminal equipment.

In this embodiment, a face image is used to perform quantitative perception training on a neural network. Specifically, the first face key point distribution corresponding to the face image is labeled in advance, and the first face key point distribution is output as a target of the neural network. And inputting the face picture into a neural network to obtain second face key point distribution output by the neural network. When the error between the second face key point distribution and the pre-labeled first face key point distribution stably meets the preset requirement, the neural network can accurately identify the face key points, and the training is completed.

And the error between the second face key point distribution and the first face key point distribution is expressed by a loss function. Therefore, in the process of carrying out quantitative perception training on the neural network, the loss function of the neural network is calculated based on the error between the first face key point distribution and the second face key point distribution.

In the process of carrying out quantitative perception training on the neural network, a pseudo quantization operator is introduced to carry out quantization operation on data on the upper layer of the pseudo quantization operator and then carry out inverse quantization operation, so that the neural network can realize perception quantization. Specifically, the pseudo quantization operator performs quantization operation on floating point input corresponding to the face picture on the upper layer and then performs inverse quantization operation. Wherein the inverse quantization operation is an inverse operation of the quantization operation.

It should be noted that the neural network using the face image as an input may be regarded as an application of the neural network using the training data as an input in a face key point recognition scene. The quantization perception training process of the neural network taking the face picture as input is the same as the quantization perception training process of the neural network taking the training data as input, so the similar implementation process is not repeated.

Fig. 4 is a schematic diagram illustrating a distribution of first face key points of a labeled face picture according to an embodiment of the present application.

In this embodiment, the collected face picture is subjected to registration processing. Specifically, by rotating the face in the picture, the face tilted in the picture is centered and kept horizontal. And then, carrying out key point labeling on the face after the face is subjected to the positive processing to obtain a first key point distribution shown on the rightmost side of the figure.

And then inputting the rightmost face picture and information describing coordinates of each key point in the first key point distribution into a neural network, and training the neural network according to the quantitative perception training method of the neural network provided by the application to obtain the neural network which can be deployed on terminal equipment at the edge and can stably identify the face key points.

In an embodiment, in the process of performing quantization perception training on a neural network using a face picture as an input, a formula shown as follows is adopted, and a pseudo quantization operator performs quantization operation on floating point input to obtain direct output:

where v is the floating point input, s and zp are both quantization parameters, x_qIs directly output. The round function is a rounding function, i.e. rounding the operated value. s is the scaling factor and zp is the translation factor.

In this embodiment, the quantization operation performed by the pseudo quantization operator on the floating-point input can be expressed as the following formula:

x_Q＝clamp(x_q,Qmin,Qmax)

wherein x is_QIs the quantized output. Qmin is the minimum value of the output interval corresponding to the quantization operation, and Qmax is the maximum value of the output interval. The clamp function intercepts the function for the interval when x_qWhen less than Qmin, x_QTaking the value as Qmin; when x is_qWhen it is greater than Qmax, x_QThe value is Qmax; when x is_qWhen it is not less than Qmin and not more than Qmax, x_QValue of x_q。

In an embodiment, in the process of performing quantization perception training on a neural network using a face picture as an input, based on direct output, the following formula is adopted to calculate the gradient of a quantization parameter used by a pseudo quantization operator for quantization operation:

wherein Qmin is the minimum value of the output interval corresponding to the quantization operation, Qmax is the maximum value of the output interval corresponding to the quantization operation,

in order to quantify the gradient of the parameter zp,

is the gradient of the quantization parameter s.

Specifically, the process of performing quantization operation and then performing inverse quantization operation by the pseudo quantization operator is represented as:

v＝(clamp(x_q,Q_min,Q_max)-zp)*s

setting the gradient of the round function to 1, and solving a partial derivative of the quantization parameter zp:

determining the gradient of the quantization parameter zp

Partial derivatives are calculated for the quantization parameter s:

determining the gradient of the quantization parameter zp

In this embodiment, the quantization parameter is adjusted in the direction of reducing the loss function by calculating the gradient of the quantization parameter, so that the neural network that has completed the quantization perception training can stably recognize the key points of the face in the terminal device located at the edge.

Fig. 5 illustrates a quantitative perceptual training apparatus of a neural network according to an embodiment of the present application, the apparatus including:

a data input module 310, configured to input training data, which is pre-marked with a first result, into a neural network, so as to obtain a second result output by the neural network;

a loss calculation module 320 configured to calculate a loss function of the neural network based on an error between the first result and the second result;

an output obtaining module 330, configured to obtain direct output obtained by performing quantization operation on floating point input corresponding to the training data by a pseudo quantization operator in the neural network;

a gradient calculation module 340 configured to calculate, based on the direct output, a gradient of a quantization parameter employed by the pseudo quantization operator for quantization operation;

an adjusting module 350, configured to adjust the quantization parameter in a direction of decreasing the loss function based on the gradient of the quantization parameter until the loss function meets a preset condition, so as to obtain a neural network with a quantized perceptual training completed.

The present application further provides a device for training quantization perception of a neural network, the device comprising:

using a formula

using a formula

And formulas

is the gradient of the quantization parameter zp,

for the quantizationThe gradient of the parameter s.

An electronic device 40 according to an embodiment of the present application is described below with reference to fig. 6. The electronic device 40 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 6, electronic device 40 is embodied in the form of a general purpose computing device. The components of electronic device 40 may include, but are not limited to: the at least one processing unit 410, the at least one memory unit 420, and a bus 430 that couples various system components including the memory unit 420 and the processing unit 410.

Wherein the storage unit stores program code executable by the processing unit 410 to cause the processing unit 410 to perform steps according to various exemplary embodiments of the present invention as described in the description part of the above exemplary methods of the present specification. For example, the processing unit 410 may perform the various steps as shown in fig. 2.

The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM)4201 and/or a cache memory unit 4202, and may further include a read only memory unit (ROM) 4203.

The storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 430 may be any bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 40 may also communicate with one or more external devices 500 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 40, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 40 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 450. An input/output (I/O) interface 450 is connected to the display unit 440. Also, the electronic device 40 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 460. As shown, network adapter 460 communicates with other modules of electronic device 40 via bus 430. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 40, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.

In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method described in the above method embodiment section.

According to an embodiment of the present application, there is also provided a program product for implementing the method in the above method embodiment, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as JAVA, C + +, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.

Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims

1. A method of quantitative perceptual training of a neural network, the method comprising:

2. The method of claim 1, wherein calculating a gradient of the quantization parameter based on the direct output comprises:

3. The method of claim 2, wherein simulating a gradient of a quantization operator used by the pseudo quantization operator for the quantization operation comprises:

4. The method of claim 1, further comprising:

5. A method of quantitative perceptual training of a neural network, the method comprising:

and adjusting the quantization parameters in the direction of reducing the loss function based on the gradient of the quantization parameters until the loss function meets a preset condition to obtain a neural network finished by quantization perception training.

6. The method of claim 5, wherein obtaining a direct output obtained by performing a quantization operation on a floating-point input corresponding to the face picture by a pseudo quantization operator in the neural network comprises:

using a formula

7. The method of claim 6, wherein calculating, based on the direct output, a gradient of a quantization parameter employed by the pseudo-quantization operator for the quantization operation comprises:

using a formula

And formulas

Calculating the gradient of the quantization parameter, wherein Qmin is the minimum value of the output interval corresponding to the quantization operation, Qmax is the maximum value of the output interval corresponding to the quantization operation,

is the gradient of the quantization parameter zp,

is the gradient of the quantization parameter s.

8. An apparatus for quantitative perceptual training of a neural network, the apparatus comprising:

9. An electronic device, comprising:

a memory storing computer readable instructions;

a processor reading computer readable instructions stored by the memory to perform the method of any of claims 1-7.

10. A computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor of a computer, cause the computer to perform the method of any one of claims 1-7.