CN112115825A - Neural network quantization method, device, server and storage medium


Info

Publication number
CN112115825A
Authority
CN
China
Prior art keywords
weight
quantization
original
neural network
weights
Prior art date
Legal status
Granted
Application number
CN202010934398.1A
Other languages
Chinese (zh)
Other versions
CN112115825B (en)
Inventor
李品逸
蔡志文
陈腊梅
Current Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Original Assignee
Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Xiaopeng Autopilot Technology Co Ltd
Priority to CN202010934398.1A
Publication of CN112115825A
Application granted
Publication of CN112115825B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a neural network quantization method, a quantization apparatus, a server and a storage medium. The quantization method of the neural network comprises the following steps: initializing a quantization weight with an original weight; setting an objective function, the objective function comprising an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index; solving the objective function to minimize the included angle between the quantization weight and the original weight and obtain the shared weight value and the weight distribution index; and obtaining the quantization weight according to the shared weight value and the weight distribution index. The quantization method optimizes the neural network quantization problem based on vector direction: by minimizing the included angle between the quantization weight and the original weight, the quantization weight preserves as much of the original weight information as possible, reducing the information loss caused by quantization.

Description

Neural network quantization method, device, server and storage medium
Technical Field
The present invention relates to the field of deep neural network technology, and in particular, to a quantization method, apparatus, server and storage medium for a neural network.
Background
In recent years, deep neural networks have greatly advanced the field of autonomous driving, gradually making practical over the past decade what had long seemed out of reach. However, the enormous amount of computation that deep neural networks require limits their application on vehicle-mounted intelligent hardware with constrained computing resources. To address this bottleneck, much research has focused on reducing the computational overhead of deep neural networks by quantizing the parameters used in their operations, converting floating-point parameters to fixed-point parameters and shortening their bit widths. How to reduce the information loss incurred during quantization, however, remains a technical problem to be solved.
Disclosure of Invention
The embodiment of the invention provides a quantization method, a quantization device, a server and a storage medium of a neural network.
The quantization method of the neural network of the embodiment of the invention comprises the following steps:
initializing a quantization weight by using an original weight;
setting an objective function, wherein the objective function comprises an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index;
solving the objective function to minimize an included angle between the quantization weight and the original weight, and obtaining the shared weight value and the weight distribution index;
and obtaining the quantization weight according to the shared weight value and the weight distribution index.
In some embodiments, initializing the quantization weights with the original weights includes:
the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each network layer.
In some embodiments, initializing the quantization weights with the original weights includes:
the quantization weights are initialized by applying logarithmic quantization to the original weights of each network layer.
In some embodiments, the quantization method comprises:
batch normalization is added to each network layer to reduce internal covariate shift.
In some embodiments, the included angle between the quantization weight and the original weight is expressed as the inner product of the i-th kernel vector in the original weights and the quantization weight vector, divided by the product of their lengths.
In some embodiments, solving the objective function comprises:
fixing the shared weight value and adjusting the weight distribution index;
fixing the weight distribution index and adjusting the shared weight value.
In some embodiments, obtaining the quantization weight according to the shared weight value and the weight distribution index includes:
assigning the corresponding shared weight values to the original weights of each layer in sequence according to the weight distribution index, to obtain the quantization weights.
The quantization device of the neural network according to the embodiment of the present invention includes:
an initialization module configured to initialize the quantization weights with the original weights;
a setting module configured to set an objective function, the objective function comprising an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index;
a solving module configured to solve the objective function so as to minimize the included angle between the quantization weight and the original weight and obtain the shared weight value and the weight distribution index;
an allocation module configured to obtain the quantization weight according to the shared weight value and the weight distribution index.
The server according to an embodiment of the present invention includes a memory storing a computer program and a processor executing the program to implement the quantization method of the neural network according to any one of the above embodiments.
The computer readable storage medium of the embodiments of the present invention stores thereon a computer program that, when executed by a processor, implements the steps of the quantization method of the neural network of any of the above embodiments.
The neural network quantization method, apparatus, server and storage medium optimize the neural network quantization problem based on vector direction: by minimizing the included angle between the quantization weight and the original weight, the quantization weight preserves as much of the original weight information as possible, reducing the information loss caused by quantization.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1-7 are flow diagrams of a method of quantifying neural networks in accordance with an embodiment of the present invention;
FIG. 8 is a block diagram of a quantization apparatus of a neural network according to an embodiment of the present invention;
fig. 9 is a block diagram of a server according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
Referring to fig. 1, a method for quantizing a neural network according to an embodiment of the present invention includes:
step S12: initializing a quantization weight by using an original weight;
step S14: setting an objective function, wherein the objective function comprises an included angle between a quantization weight and an original weight, a shared weight value and a weight distribution index;
step S16: solving the objective function to minimize an included angle between the quantization weight and the original weight, and obtaining a shared weight value and a weight distribution index;
step S18: obtaining the quantization weight according to the shared weight value and the weight distribution index.
The quantization method of the neural network optimizes the neural network quantization problem based on vector direction: by minimizing the included angle between the quantization weight and the original weight, the quantization weight preserves as much of the original weight information as possible, reducing the information loss caused by quantization.
In the related art, deep neural networks require an enormous amount of computation, and quantizing the parameters used in deep neural network operations can reduce this computational overhead. To limit the information loss caused by the quantization process, existing approaches mainly minimize the Euclidean distance (L2 distance) between the quantization weights and the original deep neural network weights. Meanwhile, to accelerate training, Batch Normalization (BN) is usually added to each network layer to reduce internal covariate shift, computed as:

$$\hat{x}_i = \gamma \cdot \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

where $x_i$ is an input value of batch $B = \{x_{1 \dots m}\}$ before it enters the activation function, $\mu_B$ is the mean of the current input values, $\sigma_B^2$ is the variance of the input values about that mean, $\gamma$ and $\beta$ are parameters that can be adjusted during training, and $\epsilon$ is an extremely small term introduced to avoid division by a variance of 0. However, when the input values $x_i$ are scaled by a factor of $N$, the mean and standard deviation are correspondingly scaled by $N$ as well, so the result after batch normalization is unchanged.
That is, in the related art, the length information of the input vector does not affect the result, so minimizing the Euclidean distance between the quantization weights and the original deep neural network weights cannot effectively avoid the information loss caused by the quantization process.
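This scale invariance is easy to check numerically. The following sketch (illustrative, not from the patent) batch-normalizes a toy batch and the same batch scaled by N = 1000, using the formula above:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize a batch with its own mean and variance, as in the BN formula.
    mu = x.mean()
    var = x.var()
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.array([0.2, -1.3, 0.7, 2.1])
N = 1000.0

out = batch_norm(x)
out_scaled = batch_norm(N * x)  # mean scales by N, variance by N^2

# The normalized outputs are (almost) identical: length information is discarded.
print(np.allclose(out, out_scaled, atol=1e-4))  # True
```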
The neural network quantization method of the embodiments of the invention exploits the fact that an input vector's information resides mainly in its direction: it optimizes the quantization weights by minimizing the included angle between the quantization weights and the original weights, so that the quantization weights preserve as much of the original weight information as possible and the information loss caused by quantization is reduced.
It is understood that the neural network includes an input layer, an output layer, and a plurality of hidden layers between them. Each of the input layer, the output layer and the hidden layers may include a plurality of neurons. Adjacent layers are fully connected, that is, in any two adjacent layers, every neuron in one layer is connected to every neuron in the other layer, and an original weight exists between each pair of connected neurons. By setting and solving the objective function, the included angle between the quantization weight and the original weight is minimized to obtain the quantization weight; computing the neural network with the quantization weights then yields a more accurate result, reducing the information loss caused by quantization.
Specifically, in some embodiments, the included angle between the quantization weight and the original weight is expressed as the inner product of the i-th kernel vector in the original weights and the quantization weight vector, divided by the product of their lengths. The size of the included angle can thus be represented by cosine similarity, and minimizing the included angle between the quantization weight and the original weight is equivalent to maximizing the cosine similarity between them. Thus, the objective function may be set to:

$$\max_{\alpha_l,\, I_l} \; \sum_i \frac{\langle W_i, \hat{W}_i \rangle}{\lVert W_i \rVert \, \lVert \hat{W}_i \rVert}$$

where $\alpha_l$ represents the shared weight values after quantization of layer $l$, $I_l$ represents the weight distribution index, $W_i$ represents the $i$-th kernel vector in the original weights, and $\hat{W}_i$ represents the corresponding quantization weight vector, so that the included angle between the quantization weight and the original weight is expressed as the inner product of $W_i$ and $\hat{W}_i$ divided by the product of their lengths. When the objective function value is at its maximum, the cosine similarity between the quantization weights and the original weights is at its maximum and the corresponding included angle is at its minimum. Quantizing the weights with the shared weight values and weight distribution index obtained at that point completely eliminates the influence of the quantization weight vector's length and finds the quantization weight vector closest in direction to the original weight vector, i.e., the feature directions extracted by the original weights are retained and the information loss caused by quantization is reduced.
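To make the objective concrete, here is a minimal NumPy sketch that evaluates the cosine-similarity objective for one layer. It is not from the patent: the array shapes, the codebook values and the nearest-codeword assignment used to build the quantized weights are illustrative assumptions.

```python
import numpy as np

def cosine_objective(W, W_hat):
    """Sum of cosine similarities between each original kernel vector W[i]
    and its quantized counterpart W_hat[i] -- the quantity the method maximizes."""
    num = (W * W_hat).sum(axis=1)                       # inner products <W_i, W_hat_i>
    den = np.linalg.norm(W, axis=1) * np.linalg.norm(W_hat, axis=1)
    return (num / den).sum()

# Quantized weights are built from a small codebook of shared values (alpha)
# and an index map assigning a codebook entry to every original weight.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 27))                  # 8 kernels, 27 weights each (e.g. 3x3x3)
alpha = np.array([-0.5, -0.1, 0.1, 0.5])      # shared weight values
index = np.abs(W[..., None] - alpha).argmin(-1)   # nearest-codeword assignment
W_hat = alpha[index]

print(cosine_objective(W, W_hat))  # larger is better; the maximum possible is 8.0
```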
In one example, the large image classification dataset ImageNet was selected for the experiments. ImageNet is a large color-image dataset covering 1000 object classes, with about 1.2 million training images and 50,000 validation images. Because the image sizes in ImageNet are not uniform, the short edge of every training and validation image is first scaled to 256 pixels to simplify subsequent processing. During network training or fine-tuning, every image is randomly cropped to 224 × 224 and randomly horizontally flipped before being fed into the network; no additional data augmentation is used. When testing on the validation set, only a single 224 × 224 crop taken from the center of each validation image is fed into the network for verification, and the Top-1 and Top-5 classification accuracies are computed.
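As a sketch of this preprocessing pipeline (the use of torchvision is an assumption; the patent only specifies the transforms themselves):

```python
from torchvision import transforms

# Training: scale the short edge to 256, randomly crop 224x224, random horizontal flip.
train_tf = transforms.Compose([
    transforms.Resize(256),          # short edge -> 256
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Validation: scale the short edge to 256, then a single 224x224 center crop.
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
```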
Please refer to Table 1, which shows the Top-1 and Top-5 classification accuracy of an AlexNet network on the ImageNet validation set when quantized to different bit widths. With 6-bit quantization based on the vector angle, the Top-1 classification accuracy is 59.4% and the Top-5 classification accuracy is 81.3%, while the original 32-bit network reaches 60.1% Top-1 and 81.9% Top-5. That is, 6-bit quantization using the vector angle achieves performance close to that of the original 32-bit network.
TABLE 1

Bit width | Top-1 accuracy | Top-5 accuracy
4         | 54.9%          | 78.2%
5         | 57.9%          | 80.1%
6         | 59.4%          | 81.3%
32        | 60.1%          | 81.9%
Please refer to Table 2, which compares the quantization effect of the quantization method of the embodiments of the present invention with related techniques. The original network is AlexNet, with a Top-1 classification accuracy of 60.1% and a Top-5 classification accuracy of 81.9%. Rows 3-7 give, in order, the Top-1 and Top-5 classification accuracies of Deep Compression (DC) with 8/5-bit quantization, Weighted-Entropy-Based Quantization (WEBQ) with 5-bit quantization, Incremental Network Quantization (INQ) with 5-bit quantization, and the present method with 4-bit and 5-bit quantization. In neural network quantization, a smaller bit width means less computation but greater information loss and thus lower classification accuracy. It can be seen that at 5-bit quantization the method of the embodiments of the present invention achieves higher classification accuracy, and even at 4-bit quantization, which further reduces the amount of computation, its accuracy remains higher than that of the other three methods at 5 bits. The method therefore preserves the original weight information as much as possible and reduces the information loss caused by quantization.
TABLE 2
[Table 2 appears as an image in the original publication; it lists the Top-1 and Top-5 accuracies of the methods enumerated above against the 32-bit AlexNet baseline.]
[1] Song Han, Huizi Mao, William Dally. Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In ICLR, 2016.
[2] Eunhyeok Park, Junwhan Ahn, and Sungjoo Yoo. Weighted-entropy-based quantization for deep neural networks. In CVPR, 2017.
[3] Aojun Zhou, et al. Incremental network quantization: Towards lossless CNNs with low-precision weights. In ICLR, 2017.
Referring to fig. 2, in some embodiments, step S12 includes:
step S122: initializing the quantization weights by minimizing the Euclidean distance to the original weights of each network layer.
In this manner, initializing the quantization weights can be achieved. Specifically, each network layer has its own original weights; the quantization weights can be initialized by minimizing the Euclidean distance between the original weights and the quantization weights, yielding the initial shared quantization weight values, the thresholds used for quantization-weight index assignment, and the assignment index.
Further, the calculation formula may be:

$$\hat{x}_i = \gamma \cdot \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta$$

where $x_i$ is an input value of batch $B = \{x_{1 \dots m}\}$ before it enters the activation function, $\mu_B$ is the mean of the current input values, $\sigma_B^2$ is the variance of the input values about that mean, $\gamma$ and $\beta$ are parameters that can be adjusted during training, and $\epsilon$ is an extremely small term introduced to avoid division by a variance of 0.
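The patent does not name the algorithm used for this Euclidean-distance minimization. A common realization is a one-dimensional k-means over each layer's weights, where each cluster center becomes a shared weight value and the cluster membership becomes the assignment index; the sketch below follows that assumption, and all names are illustrative.

```python
import numpy as np

def init_by_euclidean(w, n_clusters=16, n_iter=20):
    """Initialize the shared weight values (codebook) and the assignment index for
    one layer by minimizing the Euclidean distance between original and quantized
    weights -- a plain 1-D k-means, one cluster per shared value."""
    flat = w.ravel()
    # Seed the codebook with evenly spaced quantiles of the weight distribution.
    alpha = np.quantile(flat, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iter):
        index = np.abs(flat[:, None] - alpha[None, :]).argmin(axis=1)
        for k in range(n_clusters):
            if np.any(index == k):
                alpha[k] = flat[index == k].mean()  # move each shared value to its cluster mean
    return alpha, index.reshape(w.shape)

rng = np.random.default_rng(1)
alpha, index = init_by_euclidean(rng.normal(size=(64, 3, 3, 3)))
w_hat = alpha[index]  # initialized quantization weights
```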
Referring to fig. 3, in some embodiments, step S12 includes:
step S123: initializing the quantization weights by applying logarithmic quantization to the original weights of each network layer.
In this manner, initializing the quantization weights can be achieved. Specifically, logarithmic quantization initializes the quantization weights by converting the original weights of each network layer into exponential multiples of 2; the initialized quantization weights are then calculated by the formula below.
Further, the calculation formula may be:
$$\hat{x}_i = \mathrm{Quantize}\left(\log_2 \lvert x_i \rvert\right)$$

where $x_i$ is the original weight and $\hat{x}_i$ is the quantization weight after initialization; the Quantize function converts the logarithmic result into an integer, i.e., the quantization weights after initialization are integers.
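A sketch of this initialization follows. The sign handling via |x_i| is an assumption; the patent text only states that weights are converted to exponential multiples of 2 and that Quantize rounds the logarithm to an integer.

```python
import numpy as np

def log_quantize_init(w):
    """Initialize quantization weights by converting each original weight to the
    nearest power of two: the Quantize step rounds log2|w| to an integer exponent.
    (Exactly-zero weights would need special handling before taking the log.)"""
    exponent = np.round(np.log2(np.abs(w)))   # integer exponent after rounding
    return np.sign(w) * np.exp2(exponent)     # back to an exponential multiple of 2

w = np.array([0.09, -0.4, 1.7, -2.9])
print(log_quantize_init(w))  # -> [ 0.125 -0.5  2.  -4. ]
```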
Referring to fig. 4 and 5, in some embodiments, the quantization method includes:
step S124: adding batch normalization to each network layer to reduce internal covariate shift.
In this way, the covariate shift inside the neural network can be reduced and the training of the deep neural network accelerated. It can be understood that the data of each network layer is normalized before being input, and only after normalization is complete is it fed into the layer. In the embodiment shown in fig. 4, the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each network layer, and the batch normalization includes the γ parameter and the β parameter; adjusting γ and β tunes the batch normalization and thereby reduces the covariate shift inside the neural network.
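In a PyTorch-style network (the framework choice is an assumption, not part of the patent), this amounts to inserting a batch normalization layer after each convolution, before the activation:

```python
import torch.nn as nn

# One convolutional stage with batch normalization inserted before the activation;
# gamma and beta are BatchNorm2d's learnable weight and bias parameters.
stage = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),   # reduces internal covariate shift, speeding up training
    nn.ReLU(inplace=True),
)
```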
Referring to fig. 6, in some embodiments, step S16 includes:
step S162: fixing the shared weight value and adjusting the weight distribution index;
step S164: fixing the weight distribution index and adjusting the shared weight value.
In this way, the objective function is solved iteratively, yielding the shared weight value and weight distribution index that minimize the included angle between the quantization weight and the original weight. In one example, following the Expectation-Maximization (EM) algorithm, solving the objective function is divided into two alternating steps that are repeated in a loop: first, the shared weight values are kept fixed while a greedy algorithm changes the thresholds and adjusts the weight distribution index, and the weight distribution index is determined where the objective function reaches its maximum; then, the weight distribution index is kept fixed while a global gradient-descent method adjusts the shared weight values, and the shared weight values are determined where the objective function reaches its maximum. The global gradient-descent method may be batch gradient descent, stochastic gradient descent, or mini-batch gradient descent. In this way, the shared weight value and weight distribution index corresponding to the minimized included angle between the quantization weight and the original weight are obtained.
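The patent fixes this two-step alternation but leaves the inner updates at a high level (a greedy threshold change, a global gradient descent). The sketch below is therefore only a schematic of the loop, assuming NumPy arrays and substituting a simple numerical gradient for the gradient-descent step; all function and variable names are illustrative.

```python
import numpy as np

def cosine_objective(W, W_hat):
    # Sum over kernels of <W_i, W_hat_i> / (||W_i|| * ||W_hat_i||).
    num = (W * W_hat).sum(axis=1)
    den = np.linalg.norm(W, axis=1) * np.linalg.norm(W_hat, axis=1)
    return (num / den).sum()

def solve_alternating(W, alpha, n_rounds=10, lr=1e-2, eps=1e-6):
    """Alternate the two steps: step 1 fixes the shared values and re-assigns
    indices; step 2 fixes the assignment and adjusts the shared values to
    raise the cosine-similarity objective."""
    for _ in range(n_rounds):
        # Step 1: shared values fixed -> each weight takes its nearest shared value.
        index = np.abs(W[:, :, None] - alpha).argmin(axis=-1)
        # Step 2: assignment fixed -> nudge each shared value up the objective's
        # slope (numerical differences stand in for the analytic gradient).
        base = cosine_objective(W, alpha[index])
        grad = np.zeros_like(alpha)
        for k in range(alpha.size):
            a = alpha.copy()
            a[k] += eps
            grad[k] = (cosine_objective(W, a[index]) - base) / eps
        alpha = alpha + lr * grad
    return alpha, index
```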
Referring to fig. 7, in some embodiments, step S18 includes:
step S182: assigning the corresponding shared weight values to the original weights of each layer in sequence according to the weight distribution index, to obtain the quantization weights.
In this way, the final quantization weights are obtained. It can be understood that the weight distribution index records the correspondence between the address of each original weight in each network layer and a shared weight value; assigning the shared weight values to the addresses of the original weights in sequence according to this correspondence yields the final quantization weights.
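A minimal sketch of this final lookup (the layer names and shapes are hypothetical, not from the patent):

```python
import numpy as np

def assign_quantized(alpha_per_layer, index_per_layer):
    """Build the final quantization weights layer by layer: each original weight's
    address receives the shared value its distribution index points to."""
    return {layer: alpha[index_per_layer[layer]]
            for layer, alpha in alpha_per_layer.items()}

# Illustrative two-layer example:
alpha_per_layer = {"conv1": np.array([-0.5, 0.0, 0.5]),
                   "conv2": np.array([-0.25, 0.25])}
index_per_layer = {"conv1": np.array([[0, 2], [1, 2]]),
                   "conv2": np.array([1, 0, 1])}
print(assign_quantized(alpha_per_layer, index_per_layer)["conv1"])
# [[-0.5  0.5]
#  [ 0.   0.5]]
```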
Referring to fig. 8, the quantization apparatus 10 of the neural network according to the embodiment of the present invention includes an initialization module 12, a setting module 14, a solving module 16 and an allocation module 18. The initialization module 12 is configured to initialize the quantization weights with the original weights. The setting module 14 is configured to set an objective function comprising an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index. The solving module 16 is configured to solve the objective function so as to minimize the included angle between the quantization weight and the original weight and obtain the shared weight value and the weight distribution index. The allocation module 18 is configured to obtain the quantization weight according to the shared weight value and the weight distribution index.
The quantization apparatus 10 of the neural network optimizes the neural network quantization problem based on vector direction: by minimizing the included angle between the quantization weight and the original weight, the quantization weight preserves as much of the original weight information as possible, reducing the information loss caused by quantization.
It should be noted that the above explanation of the embodiment and the advantageous effects of the quantization method of the neural network is also applicable to the quantization apparatus 10 of the neural network of the present embodiment and the server and the computer-readable storage medium of the following embodiments, and is not detailed here to avoid redundancy.
Referring to fig. 9, a server 100 according to an embodiment of the invention includes a memory 102 and a processor 104. The memory 102 stores a computer program, and the processor 104 is used for executing the program to implement the quantization method of the neural network of any one of the above embodiments.
The server 100 optimizes the neural network quantization problem based on vector direction: by minimizing the included angle between the quantization weight and the original weight, the quantization weight preserves as much of the original weight information as possible, reducing the information loss caused by quantization.
The computer readable storage medium of the embodiments of the present invention stores thereon a computer program, which, when executed by a processor, implements the steps of the quantization method of the neural network of any of the above embodiments.
For example, when the program is executed by a processor, the steps of the following quantization method are implemented:
step S12: initializing a quantization weight by using an original weight;
step S14: setting an objective function, wherein the objective function comprises an included angle between a quantization weight and an original weight, a shared weight value and a weight distribution index;
step S16: solving the objective function to minimize an included angle between the quantization weight and the original weight, and obtaining a shared weight value and a weight distribution index;
step S18: obtaining the quantization weight according to the shared weight value and the weight distribution index.
Specifically, the computer-readable storage medium may be provided in a vehicle or a server, and the vehicle may communicate with the server to obtain the corresponding program. Vehicles include, but are not limited to, electric vehicles, hybrid electric vehicles, extended range electric vehicles, fuel vehicles, and the like.
It will be appreciated that the computer program comprises computer program code. The computer program code may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like. The processor may be the processor 104 included in the server 100. The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
In the description herein, references to the description of the terms "one embodiment," "some embodiments," "an illustrative embodiment," "an example," "a specific example" or "some examples" or the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device, such as a computer-based system, a system containing a processing module, or another system that can fetch the instructions from the instruction execution system, apparatus or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of embodiments of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method of quantizing a neural network, comprising:
initializing a quantization weight by using an original weight;
setting an objective function, wherein the objective function comprises an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index;
solving the objective function to minimize an included angle between the quantization weight and the original weight, and obtaining the shared weight value and the weight distribution index;
and obtaining the quantization weight according to the shared weight value and the weight distribution index.
2. The quantization method of the neural network of claim 1, wherein initializing quantization weights with original weights comprises:
the quantization weights are initialized by minimizing the Euclidean distance to the original weights of each network layer.
3. The quantization method of the neural network of claim 1, wherein initializing quantization weights with original weights comprises:
logarithmic quantization is applied to the original weights of each network layer to initialize the quantization weights.
4. The neural network quantization method of claim 2 or 3, wherein the quantization method comprises:
batch normalization is added to each network layer to reduce internal covariate shift.
5. The quantization method of the neural network according to claim 1, wherein the included angle between the quantization weight and the original weight is represented as the inner product of the i-th kernel vector in the original weights and the quantization weight vector, divided by the product of their lengths.
6. The method of claim 1, wherein solving the objective function comprises:
fixing the shared weight value and adjusting the weight distribution index;
fixing the weight distribution index and adjusting the shared weight value.
7. The method of claim 1, wherein obtaining the quantization weight according to the shared weight value and the weight distribution index comprises:
assigning the corresponding shared weight values to the original weights of each layer in sequence according to the weight distribution index, to obtain the quantization weights.
8. An apparatus for quantizing a neural network, comprising:
an initialization module to initialize the quantization weights with the original weights;
the setting module is used for setting an objective function, and the objective function comprises an included angle between the quantization weight and the original weight, a shared weight value and a weight distribution index;
a solving module for solving the objective function to minimize an included angle between the quantization weight and the original weight, and obtain the shared weight value and the weight distribution index;
an allocation module for obtaining the quantization weight according to the shared weight value and the weight distribution index.
9. A server, characterized by comprising a memory storing a computer program and a processor for executing the program to implement the neural network quantization method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of quantifying a neural network according to any one of claims 1 to 7.
CN202010934398.1A 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network Active CN112115825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010934398.1A CN112115825B (en) 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network

Publications (2)

Publication Number Publication Date
CN112115825A 2020-12-22
CN112115825B CN112115825B (en) 2024-04-19

Family

ID=73803300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010934398.1A Active CN112115825B (en) 2020-09-08 2020-09-08 Quantification method, device, server and storage medium of neural network

Country Status (1)

Country Link
CN (1) CN112115825B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19734735A1 (en) * 1997-08-11 1999-02-18 Peter Prof Dr Lory Neural network for learning vector quantisation
CN110472725A (en) * 2019-07-04 2019-11-19 北京航空航天大学 A kind of balance binaryzation neural network quantization method and system
US20200082269A1 (en) * 2018-09-12 2020-03-12 Nvidia Corporation Memory efficient neural networks
CN110909667A (en) * 2019-11-20 2020-03-24 北京化工大学 Lightweight design method for multi-angle SAR target recognition network
CN110969251A (en) * 2019-11-28 2020-04-07 中国科学院自动化研究所 Neural network model quantification method and device based on label-free data
CN111260724A (en) * 2020-01-07 2020-06-09 王伟佳 Example segmentation method based on periodic B spline


Also Published As

Publication number Publication date
CN112115825B (en) 2024-04-19

Similar Documents

Publication Publication Date Title
Wang et al. Training deep neural networks with 8-bit floating point numbers
US20200394523A1 (en) Neural Network Quantization Parameter Determination Method and Related Products
US20210286688A1 (en) Neural Network Quantization Parameter Determination Method and Related Products
CN110413255B (en) Artificial neural network adjusting method and device
CN109002889B (en) Adaptive iterative convolution neural network model compression method
WO2020142223A1 (en) Dithered quantization of parameters during training with a machine learning tool
CN109800865B (en) Neural network generation and image processing method and device, platform and electronic equipment
US10872295B1 (en) Residual quantization of bit-shift weights in an artificial neural network
US11704556B2 (en) Optimization methods for quantization of neural network models
US6594392B2 (en) Pattern recognition based on piecewise linear probability density function
EP4008057B1 (en) Lossless exponent and lossy mantissa weight compression for training deep neural networks
WO2021135715A1 (en) Image compression method and apparatus
CN110874627A (en) Data processing method, data processing apparatus, and computer readable medium
CN112085175B (en) Data processing method and device based on neural network calculation
CN111027684A (en) Deep learning model quantification method and device, electronic equipment and storage medium
CN114444686A (en) Method and device for quantizing model parameters of convolutional neural network and related device
CN112115825B (en) Quantification method, device, server and storage medium of neural network
CN112183726A (en) Neural network full-quantization method and system
CN115081542B (en) Subspace clustering method, terminal equipment and computer readable storage medium
Nicodemo et al. Memory requirement reduction of deep neural networks for field programmable gate arrays using low-bit quantization of parameters
CN115829056A (en) Deployment method and system of machine learning model and readable storage medium
CN114492778A (en) Operation method of neural network model, readable medium and electronic device
Lu et al. A very compact embedded CNN processor design based on logarithmic computing
Madadum et al. A resource-efficient convolutional neural network accelerator using fine-grained logarithmic quantization
CN113902114A (en) Quantization method, device and system of neural network model, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant