CN112115825B - Quantification method, device, server and storage medium of neural network - Google Patents

Quantification method, device, server and storage medium of neural network

Info

Publication number
CN112115825B
CN112115825B CN202010934398.1A CN202010934398A
Authority
CN
China
Prior art keywords
weight
quantization
original
neural network
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010934398.1A
Other languages
Chinese (zh)
Other versions
CN112115825A (en)
Inventor
李品逸
蔡志文
陈腊梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd filed Critical Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202010934398.1A
Publication of CN112115825A
Application granted
Publication of CN112115825B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a quantization method, device, server and storage medium for a neural network. The quantization method of the neural network comprises the following steps: initializing quantization weights by using the original weights; setting an objective function, wherein the objective function comprises an included angle between the quantization weights and the original weights, a shared weight value and a weight distribution index; solving the objective function to minimize the included angle between the quantization weights and the original weights, and obtaining the shared weight value and the weight distribution index; and obtaining the quantization weights according to the shared weight value and the weight distribution index. The quantization method optimizes the neural-network quantization problem on the basis of vector direction: by minimizing the included angle between the quantization weights and the original weights, the quantization weights preserve as much of the original weight information as possible, thereby reducing the information loss caused by quantization.

Description

Quantification method, device, server and storage medium of neural network
Technical Field
The present invention relates to the field of deep neural networks, and in particular, to a method, an apparatus, a server, and a storage medium for quantifying a neural network.
Background
In recent years, deep neural networks have greatly driven the development of the autonomous-driving field, making hopes held for decades increasingly attainable. However, the huge amount of computation required by deep neural networks limits their application on vehicle-mounted intelligent hardware with limited computing resources. To address this bottleneck, much research has reduced the computational overhead of deep neural networks by quantizing the parameters used in their operations: converting floating-point parameters into fixed-point parameters and shortening their bit width. During quantization, however, how to reduce the resulting information loss remains a technical problem to be solved.
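As context for the float-to-fixed-point conversion mentioned above, the following is a minimal sketch of uniform fixed-point quantization; it illustrates the general technique only, not the method of this patent, and all names are illustrative:

```python
import numpy as np

def to_fixed_point(w, bits=8):
    """Uniform float-to-fixed-point quantization (illustrative only):
    map weights onto 2**bits signed integer levels plus one scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(int)
    return q, scale

w = np.random.randn(64).astype(np.float32)
q, s = to_fixed_point(w)
w_hat = q * s                             # dequantized approximation
print(np.abs(w - w_hat).max() <= s / 2)   # True: error bounded by scale/2
```

Shortening the bit width shrinks both storage and multiplication cost, at the price of the rounding error bounded above.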
Disclosure of Invention
The embodiments of the invention provide a quantization method, apparatus, server and storage medium for a neural network.
The method for quantifying the neural network comprises the following steps:
Initializing quantization weights by using the original weights;
Setting an objective function, wherein the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
Solving the objective function to minimize the included angle between the quantization weight and the original weight, and obtaining the shared weight value and the weight distribution index;
And obtaining the quantization weight according to the shared weight value and the weight distribution index.
In some embodiments, initializing quantization weights with original weights includes:
the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each layer of the network.
In some embodiments, initializing quantization weights with original weights includes:
logarithmic quantization is applied to the original weights of each layer of the network to initialize the quantization weights.
In certain embodiments, the quantization method comprises:
Batch normalization is added to each layer of the network to reduce internal covariate shift.
In some embodiments, the included angle between the quantization weight and the original weight is expressed as the inner product of the i-th kernel vector of the original weights and the corresponding quantization weight vector, divided by the product of their lengths.
In some embodiments, solving the objective function includes:
Fixing the shared weight value and adjusting the weight distribution index;
and fixing the weight distribution index and adjusting the shared weight value.
In some implementations, obtaining the quantization weights from the shared weight values and the weight distribution index includes:
and assigning the corresponding shared weight values to the original weights of each layer in turn according to the weight distribution index, so as to obtain the quantization weights.
The quantization device of the neural network according to the embodiment of the invention comprises:
the initialization module is used for initializing the quantization weights by using the original weights;
the setting module is used for setting an objective function, and the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
The solving module is used for solving the objective function to minimize the included angle between the quantized weight and the original weight and obtain the shared weight value and the weight distribution index;
and the distribution module is used for obtaining the quantization weight according to the shared weight value and the weight distribution index.
The server according to an embodiment of the present invention includes a memory storing a computer program and a processor for executing the program to implement the method for quantifying a neural network according to any of the above embodiments.
The computer-readable storage medium according to an embodiment of the present invention stores thereon a computer program that, when executed by a processor, implements the steps of the neural network quantization method according to any of the above embodiments.
According to the quantization method, device, server and storage medium for a neural network of the embodiments of the invention, the neural-network quantization problem is optimized on the basis of vector direction; by minimizing the included angle between the quantization weights and the original weights, the quantization weights preserve as much of the original weight information as possible, thereby reducing the information loss caused by quantization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIGS. 1-7 are flow diagrams of a method for quantifying a neural network according to an embodiment of the present invention;
fig. 8 is a schematic block diagram of a quantization apparatus of a neural network according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, and are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
Referring to fig. 1, a method for quantifying a neural network according to an embodiment of the present invention includes:
step S12: initializing quantization weights by using the original weights;
Step S14: setting an objective function, wherein the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
step S16: solving an objective function to minimize an included angle between the quantized weight and the original weight and obtain a shared weight value and a weight distribution index;
Step S18: and obtaining quantization weights according to the shared weight values and the weight distribution indexes.
The quantization method of the neural network optimizes the neural-network quantization problem on the basis of vector direction: by minimizing the included angle between the quantization weights and the original weights, the quantization weights preserve as much of the original weight information as possible, thereby reducing the information loss caused by quantization.
In the related art, deep neural networks require a huge amount of computation, and their computational cost can be reduced by quantizing the parameters used in deep-neural-network operations. Meanwhile, to limit the information loss caused by the quantization process, prior approaches mainly minimize the Euclidean distance (L2 distance) between the quantization weights and the original deep-neural-network weights to obtain quantization weights with smaller information loss. In addition, to accelerate the training of the deep neural network, batch normalization (Batch Normalization, BN) is usually added to each layer of the network to reduce internal covariate shift; the specific calculation formula is:

$$y_i = \gamma \cdot \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

where $x_i$ is an input value of the batch $B = \{x_{1...m}\}$ before it enters the activation function, $\mu_B$ is the mean of the current input values, $\sigma_B^2$ is the variance of the input values about that mean, $\gamma$ and $\beta$ are parameters that can be adjusted in training, and $\epsilon$ is a small term introduced to avoid a variance of 0. If the input values $x_i$ are scaled by a factor of N, the mean and the standard deviation are scaled by the same factor N, so the result after batch normalization is unchanged.

That is, in the related art the length information of the input vector does not affect the result; minimizing the Euclidean distance between the quantization weights and the original deep-neural-network weights therefore cannot effectively avoid the information loss caused by the quantization process.
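This scale invariance is easy to verify numerically. The following is a minimal sketch (names are illustrative; because of $\epsilon$ the equality is only approximate):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization of a 1-D batch, as in the formula above."""
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.randn(128)
n = 7.0
# Scaling x by n scales the mean and standard deviation by n as well,
# so the normalized output is (almost exactly) unchanged.
print(np.allclose(batch_norm(x), batch_norm(n * x), atol=1e-4))  # True
```

This is precisely why a purely Euclidean criterion can waste effort matching vector lengths that batch normalization discards anyway.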
In the quantization method for a neural network provided by the embodiment of the invention, the characteristic that the information of an input vector resides mainly in its direction is exploited: the quantization weights are optimized by minimizing the included angle between the quantization weights and the original weights, so that the quantization weights preserve as much of the original weight information as possible and the information loss caused by quantization is reduced.
It is understood that the neural network includes an input layer, an output layer, and a plurality of hidden layers between them. The input layer, the output layer and the hidden layers may each include a plurality of neurons. Adjacent layers are fully connected, i.e., in any two adjacent layers every neuron of one layer is connected to every neuron of the other, and an original weight exists on every connection between two neurons. By setting and solving the objective function, the included angle between the quantization weights and the original weights is minimized and the quantization weights are obtained; computing with the quantization weights in the neural network then yields more accurate results and reduces the information loss caused by quantization.
Specifically, in some embodiments, the included angle between the quantization weight and the original weight is expressed as the inner product of the i-th kernel vector of the original weights and the corresponding quantization weight vector, divided by the product of their lengths. In this way, the size of the included angle between the quantization weight and the original weight can be represented by cosine similarity. Further, minimizing the included angle between the quantization weight and the original weight is equivalent to maximizing the cosine similarity between them. Thus, the objective function may be set to:

$$\max_{c_l, I_l} \sum_i \frac{\left\langle W_i^l, \hat{W}_i^l \right\rangle}{\left\| W_i^l \right\| \cdot \left\| \hat{W}_i^l \right\|}$$

where $c_l$ denotes the quantized shared weight values of the l-th layer, $I_l$ denotes the weight distribution index, $W_i^l$ denotes the i-th kernel vector of the original weights, and $\hat{W}_i^l$ denotes the corresponding quantization weight vector; the included angle between the quantization weight and the original weight is thus expressed as the inner product of $W_i^l$ and $\hat{W}_i^l$ divided by the product of their lengths. When the objective function value is largest, the cosine similarity between the quantization weights and the original weights is largest and the included angle between them is smallest. Quantizing the weights with the shared weight values and the weight distribution index as the criterion completely removes the influence of the length of the quantization weight vector and finds the quantization weight vector closest in direction to the original weight vector; that is, the feature directions extracted by the original weights are preserved and the information loss caused by quantization is reduced.
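As a concrete reading of this objective, the following sketch scores a candidate quantization by the summed cosine similarity between each kernel vector and its quantized counterpart; the array shapes and the symbols W, c, I are illustrative stand-ins for the patent's notation:

```python
import numpy as np

def cosine_objective(W, c, I):
    """Sum of cosine similarities between original kernel vectors W[i]
    and their quantized versions built from shared values c and index I.

    W: (num_kernels, kernel_size) original weights of one layer
    c: (num_shared,)              shared weight values of the layer
    I: (num_kernels, kernel_size) integer indices into c
    """
    W_hat = c[I]                                   # quantized weights
    num = (W * W_hat).sum(axis=1)                  # inner products
    den = (np.linalg.norm(W, axis=1)
           * np.linalg.norm(W_hat, axis=1))        # product of lengths
    return (num / den).sum()                       # to be maximized
```

Because each term divides out both vector lengths, rescaling any quantized kernel leaves the score unchanged; only its direction matters.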
In one example, the large-scale image-classification dataset ImageNet is selected for the experiments. ImageNet is a large color-image dataset covering 1000 object classes, with about 1.2 million training images and 50,000 validation images. Because the image sizes in ImageNet are not uniform, the short sides of all training and validation images are scaled to 256 pixels to standardize the size and ease subsequent processing. During training or fine-tuning of the network, every image is randomly cropped to 224 × 224 and then randomly flipped horizontally before being fed into the network; no additional data-augmentation method is used. When testing performance on the validation set, a single 224 × 224 crop is taken from the center of each validation image and fed into the network, and the top-1 and top-5 classification accuracies are computed.
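For reference, the preprocessing described above matches a standard ImageNet pipeline. The sketch below is one common way to realize it with torchvision; the patent does not state which tooling was actually used:

```python
from torchvision import transforms

# Training: scale short side to 256, random 224x224 crop, random flip.
train_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Validation: scale short side to 256, single center 224x224 crop.
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```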
Referring to Table 1, table 1 shows the classification accuracy of Top-1 and Top-5 when quantified in different numbers on ImageNet verification for Alexnet networks. It can be seen that when the vector included angle is used for 6bit quantization, the Top-1 classification accuracy is 59.4%, the Top-5 classification accuracy is 81.3%, and in the 32bit original network, the Top-1 classification accuracy is 60.1%, and the Top-5 classification accuracy is 81.9%, i.e. the performance similar to that of the original 32bit network can be achieved by using the vector included angle for 6bit quantization.
TABLE 1
Bit number Top-1 Accuracy Top-5 Accuracy
4 54.9% 78.2%
5 57.9% 80.1%
6 59.4% 81.3%
32 60.1% 81.9%
Referring to Table 2, Table 2 compares the quantization effect of the quantization method of the embodiment of the present invention with that of the related art. The original network is AlexNet, with a top-1 classification accuracy of 60.1% and a top-5 classification accuracy of 81.9%. Rows 3-7 give, in turn, the top-1 and top-5 classification accuracies for the Deep Compression (DC) method with 8/5-bit quantization, the Weighted-Entropy-based Quantization (WEBQ) method with 5-bit quantization, the Incremental Network Quantization (INQ) method with 5-bit quantization, and the present method with 4-bit and 5-bit quantization. In the quantization of a neural network, a smaller bit number means less computation but greater information loss and lower classification accuracy. It can be seen that at 5-bit quantization the method of the embodiment of the invention attains higher classification accuracy, and even at 4-bit quantization, which further reduces the amount of computation, its accuracy is still higher than that of the other three methods at 5 bits. Hence the method preserves the original weight information as much as possible and reduces the information loss caused by quantization.
TABLE 2
[1] Song Han, Huizi Mao, William Dally. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.
[2] Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo. Weighted-entropy-based quantization for deep neural networks. In CVPR, 2017.
[3] Aojun Zhou, et al. Incremental network quantization: Towards lossless CNNs with low-precision weights. In ICLR, 2017.
Referring to fig. 2, in some embodiments, step S12 includes:
Step S122: the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each layer of the network.
In this way, initialization of the quantization weights can be achieved. Specifically, each layer of the network has original weights; the quantization weights can be initialized by minimizing the Euclidean distance between the original weights and the quantization weights, obtaining an initial quantized shared weight value, a quantization-weight index allocation threshold and a distribution index.
Further, the calculation formula may be:

$$\min_{\hat{W}^l} \left\| W^l - \hat{W}^l \right\|_2^2$$

where $W^l$ denotes the original weights of the l-th layer and $\hat{W}^l$ the corresponding quantization weights after initialization.
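The patent does not name a specific solver for this initialization. One natural realization, assumed here, is a one-dimensional k-means over each layer's flattened weights, which directly minimizes the Euclidean distance to a small set of shared values:

```python
import numpy as np

def init_by_euclidean(w, num_shared=16, iters=20):
    """Initialize shared values c and index I by 1-D k-means on the
    flattened layer weights w, minimizing ||w - c[I]||_2 (a sketch)."""
    w = w.ravel()
    # Start the shared values evenly spread over the weight range.
    c = np.linspace(w.min(), w.max(), num_shared)
    for _ in range(iters):
        # Assignment step: index of the nearest shared value.
        I = np.argmin(np.abs(w[:, None] - c[None, :]), axis=1)
        # Update step: each shared value becomes the mean of its cluster.
        for k in range(num_shared):
            if np.any(I == k):
                c[k] = w[I == k].mean()
    return c, I
```

The midpoints between consecutive shared values then play the role of the index-allocation thresholds mentioned above.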
Referring to fig. 3, in some embodiments, step S12 includes:
Step S123: logarithmic quantization is used on the original weights of each layer of network to initialize the quantized weights.
In this way, initializing quantization weights can be achieved. Specifically, logarithmic quantization is used to initialize quantization weights, that is, the original weights of each layer network are converted into an exponential multiple of 2, and then the quantization weights after initialization are calculated by a formula.
Further, the calculation formula may be:

$$\hat{x}_i = \mathrm{Quantize}\left(\log_2 x_i\right)$$

where $x_i$ is the original weight and $\hat{x}_i$ is the quantization weight after initialization; the Quantize function converts the logarithmic result to an integer, i.e., the quantization weights after initialization are integers.
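The following sketch implements this logarithmic initialization with rounding playing the role of the Quantize function; the handling of signs and near-zero weights is an assumption, since the recoverable formula only covers the magnitude:

```python
import numpy as np

def init_by_log(w, eps=1e-8):
    """Initialize quantization weights as integer power-of-two exponents:
    exponent = round(log2(|w|)), with the sign kept separately (a sketch)."""
    sign = np.sign(w)
    exponent = np.round(np.log2(np.abs(w) + eps)).astype(int)  # Quantize
    w_init = sign * np.power(2.0, exponent)   # power-of-two weights
    return exponent, w_init
```

Storing only the integer exponents is what makes such an initialization attractive on fixed-point hardware, where multiplication by a power of two reduces to a bit shift.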
Referring to fig. 4 and 5, in some embodiments, the quantization method includes:
step S124: batch normalization is added to each layer of network to reduce internal covariance variation.
In this way, the internal covariate shift of the neural network can be reduced and the training of the deep neural network accelerated. It will be understood that the data of each layer of the network is normalized before being input, and is fed into the layer once normalization is complete. In the embodiment shown in fig. 4, where the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each layer of the network, the batch normalization may include a γ parameter and a β parameter; by adjusting γ and β, the precision of the batch normalization can be tuned, thereby reducing the internal covariate shift of the neural network.
Referring to fig. 6, in some embodiments, step S16 includes:
Step S162: fixing the shared weight value, and adjusting the weight distribution index;
Step S164: and (5) fixing the weight allocation index and adjusting the sharing weight value.
In this way, the objective function is solved cyclically to obtain the shared weight value and weight distribution index corresponding to the minimum included angle between the quantization weights and the original weights. In one example, the objective function is solved with an expectation-maximization (EM) procedure that alternates between two steps: keeping the shared weight values unchanged, changing the thresholds with a greedy algorithm to adjust the weight distribution index, and determining the weight distribution index at which the objective function reaches its maximum; and keeping the weight distribution index unchanged, adjusting the shared weight values by gradient descent, and determining the shared weight values at which the objective function reaches its maximum, where the gradient descent may be batch gradient descent, stochastic gradient descent or mini-batch gradient descent. In this way, the shared weight value and weight distribution index corresponding to the minimum included angle between the quantization weights and the original weights can be obtained. A minimal sketch of this alternating loop is given below.
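The sketch reuses cosine_objective from the earlier sketch. The nearest-value assignment and the numerical-gradient update are deliberate simplifications of the patent's greedy threshold search and gradient descent; step sizes and iteration counts are assumptions:

```python
import numpy as np

def solve(W, c, I, rounds=10, lr=1e-3, grad_steps=20, h=1e-5):
    """Alternating optimization: (1) fix c, reassign I; (2) fix I,
    improve c by gradient ascent on cosine_objective (a sketch)."""
    for _ in range(rounds):
        # Index step: with c fixed, give each weight the nearest shared
        # value. This is a simple stand-in for the patent's greedy
        # threshold search, not an exact maximizer of the objective.
        I = np.argmin(np.abs(W[..., None] - c[None, None, :]), axis=-1)
        # Shared-value step: with I fixed, nudge c along a forward-
        # difference estimate of the objective's gradient.
        for _ in range(grad_steps):
            grad = np.zeros_like(c)
            base = cosine_objective(W, c, I)
            for k in range(len(c)):
                cp = c.copy()
                cp[k] += h
                grad[k] = (cosine_objective(W, cp, I) - base) / h
            c = c + lr * grad                  # ascent: maximize
    return c, I
```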
Referring to fig. 7, in some embodiments, step S18 includes:
Step S182: assigning the corresponding shared weight values to the original weights of each layer in turn according to the weight distribution index, so as to obtain the quantization weights.
In this way, the final quantization weights can be obtained. It will be understood that the weight distribution index records, for each layer of the network, the correspondence between the address of each original weight and a shared weight value; assigning the shared weight values to the addresses of the original weights in turn according to this correspondence yields the final quantization weights. A one-line lookup per layer suffices, as sketched below.
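The final assignment is a per-layer table lookup; the toy values below are illustrative:

```python
import numpy as np

def assign(shared, index):
    """Build the quantized tensor by looking up each position's shared
    value through the weight distribution index (NumPy fancy indexing)."""
    return shared[index]

# Toy example: 4 shared values and a 2x3 index for one layer.
c_l = np.array([-0.5, -0.1, 0.1, 0.5])
I_l = np.array([[0, 3, 2],
                [1, 1, 3]])
print(assign(c_l, I_l))   # [[-0.5  0.5  0.1]
                          #  [-0.1 -0.1  0.5]]
```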
Referring to fig. 8, a quantization apparatus 10 of a neural network according to an embodiment of the present invention includes an initialization module 12, a setting module 14, a solving module 16, and a distribution module 18. The initialization module 12 is configured to initialize the quantization weights with the original weights. The setting module 14 is configured to set an objective function, where the objective function comprises the included angle between the quantization weights and the original weights, the shared weight value, and the weight distribution index. The solving module 16 is configured to solve the objective function to minimize the included angle between the quantization weights and the original weights, and to obtain the shared weight value and the weight distribution index. The distribution module 18 is configured to obtain the quantization weights according to the shared weight value and the weight distribution index.
The quantization apparatus 10 of the neural network optimizes the neural-network quantization problem on the basis of vector direction; by minimizing the included angle between the quantization weights and the original weights, the quantization weights preserve as much of the original weight information as possible, thereby reducing the information loss caused by quantization.
It should be noted that the above explanation of the embodiment and advantageous effects of the quantization method for a neural network is also applicable to the quantization apparatus 10 for a neural network of the present embodiment and the server and computer readable storage medium of the following embodiments, and is not developed in detail herein to avoid redundancy.
Referring to fig. 9, a server 100 according to an embodiment of the present invention includes a memory 102 and a processor 104. The memory 102 stores a computer program, and the processor 104 is configured to execute the program to implement the method for quantifying a neural network according to any of the above embodiments.
The server 100 optimizes the neural-network quantization problem on the basis of vector direction; by minimizing the included angle between the quantization weights and the original weights, the quantization weights preserve as much of the original weight information as possible, thereby reducing the information loss caused by quantization.
The computer-readable storage medium according to an embodiment of the present invention stores a computer program that, when executed by a processor, implements the steps of the neural network quantization method according to any of the above embodiments.
For example, when the program is executed by a processor, the steps of the following quantization method are implemented:
step S12: initializing quantization weights by using the original weights;
Step S14: setting an objective function, wherein the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
step S16: solving an objective function to minimize an included angle between the quantized weight and the original weight and obtain a shared weight value and a weight distribution index;
Step S18: and obtaining quantization weights according to the shared weight values and the weight distribution indexes.
Specifically, the computer-readable storage medium may be provided in a vehicle or in a server, and the vehicle can communicate with the server to acquire the corresponding program. Vehicles include, but are not limited to, battery-electric vehicles, hybrid electric vehicles, range-extended electric vehicles, fuel vehicles, and the like.
It is understood that the computer program comprises computer program code. The computer program code may be in the form of source code, object code, an executable file, or some intermediate form, among others. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random-access memory (RAM), a software distribution medium, and so forth. The processor may refer to the processor 104 contained in the server 100. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a system including a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical-fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of embodiments of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, functional units in various embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (9)

1. A quantization method of a neural network for image processing, the quantization method comprising:
Initializing quantization weights by using the original weights;
Setting an objective function, wherein the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
Solving the objective function to minimize the included angle between the quantization weight and the original weight, and obtaining the shared weight value and the weight distribution index;
Obtaining the quantization weight according to the shared weight value and the weight distribution index;
wherein the included angle between the quantization weight and the original weight is minimum, and the degree of information loss caused by performing bit quantization according to the quantization weight is minimum.
2. The quantization method of a neural network according to claim 1, wherein initializing quantization weights with original weights comprises:
the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each layer of the network.
3. The quantization method of a neural network according to claim 1, wherein initializing quantization weights with original weights comprises:
logarithmic quantization is applied to the original weights of each layer of the network to initialize the quantization weights.
4. A method of quantifying a neural network according to claim 2 or 3, comprising:
Batch normalization is added to each layer of the network to reduce internal covariate shift.
5. The quantization method of a neural network according to claim 1, wherein the included angle between the quantization weight and the original weight is expressed as the inner product of the i-th kernel vector of the original weights and the corresponding quantization weight vector, divided by the product of their lengths.
6. The method of quantization of a neural network of claim 1, wherein solving the objective function comprises:
Fixing the shared weight value and adjusting the weight distribution index;
and fixing the weight distribution index and adjusting the shared weight value.
7. The quantization method of a neural network according to claim 1, wherein obtaining the quantization weight from the shared weight value and the weight distribution index comprises:
and assigning the corresponding shared weight values to the original weights of each layer in turn according to the weight distribution index, so as to obtain the quantization weights.
8. A quantization apparatus of a neural network for image processing, the quantization apparatus comprising:
the initialization module is used for initializing the quantization weights by using the original weights;
the setting module is used for setting an objective function, and the objective function comprises an included angle between the quantized weight and the original weight, a shared weight value and a weight distribution index;
The solving module is used for solving the objective function to minimize the included angle between the quantized weight and the original weight and obtain the shared weight value and the weight distribution index;
The distribution module is used for obtaining the quantization weight according to the shared weight value and the weight distribution index;
wherein the included angle between the quantization weight and the original weight is minimum, and the degree of information loss caused by performing bit quantization according to the quantization weight is minimum.
9. A server comprising a memory storing a computer program and a processor for executing the program to implement the method of quantifying a neural network according to any of claims 1-7.
CN202010934398.1A 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network Active CN112115825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010934398.1A CN112115825B (en) 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010934398.1A CN112115825B (en) 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network

Publications (2)

Publication Number Publication Date
CN112115825A CN112115825A (en) 2020-12-22
CN112115825B true CN112115825B (en) 2024-04-19

Family

ID=73803300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010934398.1A Active CN112115825B (en) 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network

Country Status (1)

Country Link
CN (1) CN112115825B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19734735A1 (en) * 1997-08-11 1999-02-18 Peter Prof Dr Lory Neural network for learning vector quantisation
CN110472725A (en) * 2019-07-04 2019-11-19 北京航空航天大学 A kind of balance binaryzation neural network quantization method and system
CN110909667A (en) * 2019-11-20 2020-03-24 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
CN110969251A (en) * 2019-11-28 2020-04-07 中国科学院自动化研究所 Neural network model quantification method and device based on label-free data
CN111260724A (en) * 2020-01-07 2020-06-09 王伟佳 Example segmentation method based on periodic B spline

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200082269A1 (en) * 2018-09-12 2020-03-12 Nvidia Corporation Memory efficient neural networks

Also Published As

Publication number Publication date
CN112115825A (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant