CN114781604B - Coding method of neural network weight parameters, coder and neural network processor - Google Patents
- Publication number
- CN114781604B (granted from application CN202210385708.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an encoding method for neural network weight parameters, an encoder and a neural network processor. The encoding method comprises the following steps: acquiring a plurality of original neural network weight parameters, and grouping them to obtain a plurality of parameter groups; dividing the parameters in each parameter group into a plurality of sub-vectors, and clustering the sub-vectors to generate a prototype vector corresponding to each class of sub-vectors, an index value for each prototype vector, and a scaling factor corresponding to each sub-vector in each class; and taking the index value of the prototype vector corresponding to a sub-vector, together with its scaling factor, as the coding value of that sub-vector, thereby encoding the sub-vectors in each parameter group. By implementing the invention, the data volume of the original neural network weight parameters can be reduced.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method for encoding a neural network weight parameter, an encoder, and a neural network processor.
Background
Deep Neural Networks (DNNs) represent an important breakthrough in the field of Machine Learning (ML), and introducing DNNs has improved the state-of-the-art performance of most machine learning tasks. The remarkable recognition performance of DNNs comes at the cost of enormous model computation and memory. Taking CNNs as an example, a typical CNN model for object detection (YOLOv3) requires up to 32 billion Floating-point Operations (FLOPs) and more than 60 MB of model parameters, which makes such models difficult to deploy on embedded devices with limited hardware resources and energy budgets. Many special-purpose processor chips (AI chips) designed for artificial intelligence tasks have also emerged. The module within an AI chip that is responsible for implementing AI operations and AI applications is called a Neural Network Processor (NPU). Since the speed at which the NPU accesses memory cannot keep pace with the speed at which its computing units consume data, the computing units cannot be fully utilized even if more of them are added; this is the so-called "memory wall" problem. One direction for solving this problem is to reduce the amount of data that must be fetched from memory, which in turn requires reducing the data volume of the neural network weight parameters. How to reduce the data volume of the neural network weight parameters is therefore a problem to be solved.
Disclosure of Invention
Embodiments of the present invention provide an encoding method for neural network weight parameters, an encoder and a neural network processor, which can reduce the data volume of the neural network weight parameters.
An embodiment of the present invention provides a method for encoding a neural network weight parameter, including: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the grouping of the plurality of original neural network weight parameters specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the method further comprises the following steps: a prototype vector look-up table is generated from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides an encoder, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, and the method specifically comprises the following steps:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the encoder groups a plurality of original neural network weight parameters, and specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the encoder is further configured to generate a prototype vector lookup table according to each prototype vector and an index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides a neural network processor that includes a decoder, where the decoder is configured to obtain each encoded value produced by the above encoding method for neural network weight parameters;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and the prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a coding method, a coder and a neural network processor of a neural network weight parameter, wherein the coding method firstly acquires all original neural network weight parameters, groups all the neural network weight parameters and acquires a plurality of parameter groups; then, dividing parameters in each parameter group into a plurality of sub-vectors, and clustering the sub-vectors to obtain a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector; and finally, taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter group. The coding method of the neural network weight parameter disclosed by the invention is characterized in that the sub-vectors are clustered to generate prototype vectors of one class of sub-vectors, so that a plurality of sub-vectors of the same class in the parameter group can be characterized by one prototype vector and respective corresponding scaling factors, the data volume is greatly reduced, therefore, after the sub-vectors in each parameter group are coded by adopting the index value of the prototype vector and the scaling factors as the coding values of the sub-vectors, the original neural network weight parameter can be characterized by each coding value, and the coded data volume is obviously compressed relative to the original data volume, so that the data volume of the original neural network weight parameter is greatly reduced.
Drawings
Fig. 1 is a flowchart of a method for encoding a neural network weight parameter according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a prototype vector look-up table according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the effect of encoding a parameter set according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a neural network processor according to an embodiment of the present invention for performing neural network computation.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a method for encoding a neural network weight parameter, which at least includes the following steps:
step S101: and acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups.
Step S102: dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector.
And step 103, taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter group.
For step S101, in a preferred embodiment, the grouping of the several original neural network weight parameters specifically includes: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In the present invention, the obtained weight parameters of each original neural network are grouped, and typically, the parameters of one convolution kernel are divided into a group, or may be grouped in other manners. Each group of parameters after grouping is independently encoded, and the same grouping rule is used in decoding.
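The per-kernel grouping rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the 4-D weight layout (out_channels, in_channels, kH, kW) and the function name are assumptions.

```python
import numpy as np

# Hypothetical sketch of the preferred grouping rule: the weights belonging
# to one convolution kernel (one output channel) form one parameter group.
def group_by_kernel(weights):
    """weights: (out_channels, in_channels, kH, kW) array.
    Returns one flattened parameter group per convolution kernel."""
    return [w.reshape(-1) for w in weights]

# Example: a layer with 4 kernels of shape 3x3x3 yields 4 groups of 27 weights.
groups = group_by_kernel(np.ones((4, 3, 3, 3), dtype=np.int8))
```

Each group produced this way is then encoded independently, and the decoder applies the same grouping rule.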
For step S102, in a preferred embodiment, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Specifically, after the weight parameters of each original neural network are grouped to obtain a plurality of parameter sets, each parameter set is processed one by one according to the following steps:
1. Expand the parameters in the parameter group into a one-dimensional vector, denoted A, of dimension 1*L, and divide A into C sub-vectors, each of length D = L/C. Each sub-vector is denoted Vi, i ∈ [0, C-1]. The value of C lies in the range [1:L]. During training, when C is later adjusted iteratively, C traverses the range [1:L] according to the following rule: starting from L (i.e. the initial number of sub-vectors is L), C decreases in equal steps until it is not less than 1; the decreasing step size is c_step, a preset empirical value.
2. Initialize N prototype vectors Ej (i.e., the initial prototype vectors mentioned above), j ∈ [0, N-1], with N < C. Each Ej has length D, equal to the length of Vi. The set of prototype vectors is denoted g(Ej), j ∈ [0, N-1]. Each Vi is fitted with one of the N prototype vectors, so A can be fitted with a combination of C prototype vectors (since C > N, prototype vectors are reused), addressed by C prototype-vector indices Ki, i ∈ [0, C-1], where Ki ∈ [1:N]. This yields a preliminary fitting model: Vi ≈ f(Ki, g(Ej)), where f(Ki, g(Ej)) denotes the prototype vector with index Ki in the prototype-vector set. Each prototype-vector index requires ⌈log2 N⌉ bits. The data type of each prototype vector Ej is 8-bit integer. N, Ej and Ki are learnable parameters.
3. To make the fitting more flexible, a scaling factor Si, i ∈ [0, C-1], is introduced, with one Si per Vi, and the fitting model becomes Vi ≈ Si * f(Ki, g(Ej)), which is the initial fitting model. The scaling factor is of integer type and is expressed in N2 bits, so it can take 2^N2 distinct values; N2 is a preset empirical value, and Si is a learnable parameter.
4. Iteratively adjust the values of C, N, Ej, Ki and Si, and iteratively train the initial fitting model Vi ≈ Si * f(Ki, g(Ej)) until the value loss_C of the following loss function is minimal:

loss_C = Σ_{i=0}^{C-1} (Vi − Si * f(Ki, g(Ej)))² + β * N

loss_C consists of two parts: the first part is the squared fitting error of the vector A, and the second part constrains the value of N. Here β is an empirical value.
After training is completed, the corresponding output is recorded, including: the number N of prototype vectors; the N prototype vectors Ej, j ∈ [0, N-1]; the index value Ki, i ∈ [0, C-1], of the prototype vector corresponding to each Vi; and the scaling factor Si, i ∈ [0, C-1], of each Vi. In this way, the sub-vector division of each parameter group is obtained, together with the prototype vector, prototype-vector index value and scaling factor corresponding to each sub-vector.
It should be noted that, during iterative adjustment, the value of C is first determined according to the aforementioned traversal rule; the fitting model is then trained with the goal of minimizing loss_C for that value of C, and the corresponding output and loss_C value at the minimum are recorded. A new value of C is then selected according to the traversal rule and training is repeated, recording the output and loss_C value at the minimum for the updated C. This is repeated until C has traversed the range [1:L]; all recorded loss_C values are then compared, and the output with the smallest loss_C is taken as the final output.
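The four steps above can be sketched as a toy numeric example. This is a simplified illustration under stated assumptions, not the patent's learned training: prototypes are picked from the sub-vectors and fitted by nearest-neighbour assignment, and the scaling factors Si are kept as floats rather than N2-bit integers.

```python
import numpy as np

def split_subvectors(a, c):
    """Step 1: split the flattened 1*L vector A into C sub-vectors of length D = L // C."""
    d = a.size // c
    return a[: c * d].reshape(c, d)

def fit_group(a, c, n, beta, seed=0):
    """Steps 2-4 for a fixed C: choose N prototypes E_j, assign each V_i an
    index K_i and a scale S_i, and return the loss value
    loss_C = sum_i ||V_i - S_i * E_{K_i}||^2 + beta * N."""
    rng = np.random.default_rng(seed)
    v = split_subvectors(a, c)
    e = v[rng.choice(c, size=n, replace=False)]            # initial prototypes E_j
    # K_i: nearest prototype; S_i: least-squares scale for each sub-vector.
    k = np.argmin(((v[:, None, :] - e[None]) ** 2).sum(-1), axis=1)
    s = (v * e[k]).sum(-1) / np.maximum((e[k] ** 2).sum(-1), 1e-12)
    loss_c = ((v - s[:, None] * e[k]) ** 2).sum() + beta * n
    return k, s, loss_c

def search_c(a, n, beta, c_step):
    """Traversal rule for C: from L down in equal steps of c_step (not below 1),
    keeping the output with the smallest loss_C."""
    best = None
    for c in range(a.size, 0, -c_step):
        if n >= c:                                         # the patent requires N < C
            continue
        _, _, loss_c = fit_group(a, c, n, beta)
        if best is None or loss_c < best[1]:
            best = (c, loss_c)
    return best
```

For a vector that is exactly four repetitions of one pattern, two prototypes suffice and the fitting error drops to zero, so loss_C reduces to the β·N penalty alone.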
For step S103, after generating the prototype vector corresponding to the sub-vector of each parameter set, the index value of the prototype vector, and the scaling factor by the method in step S102, the index value of the prototype vector corresponding to the sub-vector and the scaling factor are used as the encoding values of the sub-vectors, and the sub-vectors in each parameter set are encoded, so as to obtain the encoded neural network weight parameter. Through coding, parameters in each parameter set are replaced by a prototype vector number index value and a scaling factor obtained through training.
In a preferred embodiment, further comprising: a prototype vector look-up table is generated from each prototype vector and the index value of each prototype vector. After all parameter sets are learned, storing the prototype vector lookup table, the coded index value and the scaling factor in a network model parameter file, wherein the coded index value and the scaling factor are used as the coded neural network weight parameters.
Illustratively, the prototype-vector look-up table corresponding to one parameter set is shown in FIG. 2, where the left column represents the index value of each prototype vector and the right column represents each prototype vector. The effect of encoding the parameters in the corresponding parameter set is shown schematically in FIG. 3: the parameter set is divided into 10 (C=10) sub-vectors, the corresponding prototype-vector set consists of the 8 (N=8) prototype vectors in FIG. 2, and the lower half of the figure shows the corresponding coding values, each comprising an index value Ki and a scaling factor Si. Taking the first sub-vector as an example, it can be represented by the prototype vector whose index code is "010" and a scaling factor whose code is "11".
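The code word for one sub-vector can be illustrated as a bit string: the index Ki in ⌈log2 N⌉ bits followed by the scaling factor Si in N2 bits. The packing below is a hypothetical sketch consistent with the "010" + "11" example of FIG. 3 (N = 8, N2 = 2); the actual on-chip bit layout is not specified here.

```python
import math

# Hypothetical bit-packing of one sub-vector's coding value:
# prototype index K_i in ceil(log2 N) bits, then scale code S_i in N2 bits.
def encode_subvector(k, s, n_prototypes, n2):
    idx_bits = math.ceil(math.log2(n_prototypes))
    return format(k, f"0{idx_bits}b") + format(s, f"0{n2}b")

# First sub-vector of the FIG. 3 example: index 2 -> "010", scale 3 -> "11".
code = encode_subvector(2, 3, n_prototypes=8, n2=2)
```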
The method for encoding the neural network weight parameters according to the above embodiment is applicable to be executed in an encoder, so that the present invention correspondingly provides an encoder based on the above embodiments of the method, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
In a preferred embodiment, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors, and generates a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
In a preferred embodiment, the encoder groups several original neural network weight parameters, including in particular: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In a preferred embodiment, the encoder is further configured to generate a prototype vector look-up table from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, the invention correspondingly provides a neural network processor that includes a decoder, where the decoder is configured to obtain each encoded value produced by the above encoding method for neural network weight parameters, and to determine, from each encoded value, the index value and scaling factor of the prototype vector corresponding to that encoded value; the decoder then extracts the prototype vector corresponding to each coding value according to its index value and the prototype-vector look-up table, and scales the extracted prototype vector by the corresponding scaling factor to obtain the decoded sub-vector.
Specifically, as shown in fig. 4, after the original neural network weight parameters are encoded by the above method, the encoded neural network weight parameters and the prototype-vector look-up table are generated and stored in memory; this process is completed offline by the encoder.
In the online reasoning process of the network, a decoder in a neural network processor reads part or all of the coded neural network weight parameters and corresponding prototype vector lookup tables from a memory, searches each prototype vector according to an index value sequence, scales prototype vector values based on scaling factor parameters to obtain each decoded sub-vector, namely recovered neural network weight parameters, and then the neural network processor performs neural network calculation according to the recovered neural network weight parameters.
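The decoding step just described can be sketched in a few lines. This is an illustrative assumption of the look-up-and-scale operation, with float scale values and a tiny 2-entry table for clarity, not the processor's actual datapath.

```python
import numpy as np

# Sketch of decoding: each coding value (K_i, S_i) selects a prototype from
# the look-up table and scales it to recover the decoded sub-vector.
def decode(codes, lut):
    """codes: iterable of (index K_i, scale S_i); lut: N x D prototype table."""
    return np.stack([s * lut[k] for k, s in codes])

# Toy table with N=2 prototypes of length D=2 (illustrative values).
lut = np.array([[1.0, 2.0], [3.0, 4.0]])
recovered = decode([(0, 2.0), (1, 1.0)], lut)
```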
The weight-parameter data volume that affects bandwidth changes from the original (pre-encoding) neural network weight parameters to the encoded neural network weight parameters plus the prototype look-up table. The efficiency of the corresponding parameter compression is calculated as follows:

Raw neural network weight parameter data volume:

Q1 = L * 8 (bit);

Encoded neural network weight parameter data volume:

Q2 = C * (⌈log2 N⌉ + N2) (bit);

Prototype-vector look-up table data volume:

Q3 = N * D * 8 (bit);

Parameter compression ratio:

R = Q1 / (Q2 + Q3).
The hyperparameters are adjusted according to the specific application scenario, and the compression rate and accuracy can be balanced through training.
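A short numeric sketch makes the data-volume accounting concrete. All values here (L = 1024 weights, C = 128 sub-vectors, N = 8 prototypes of length D = 8, N2 = 2 scale bits) are illustrative assumptions, not figures from the patent.

```python
import math

# Worked example of the data-volume quantities:
# Q1 = raw 8-bit weights, Q2 = encoded indices + scales, Q3 = look-up table.
def data_volumes(l, c, n, n2, d):
    q1 = l * 8                                # raw weights, 8 bits each
    q2 = c * (math.ceil(math.log2(n)) + n2)   # C code words of (idx + scale) bits
    q3 = n * d * 8                            # N prototypes of D 8-bit entries
    return q1, q2, q3, q1 / (q2 + q3)

q1, q2, q3, ratio = data_volumes(l=1024, c=128, n=8, n2=2, d=8)
```

With these assumed values the encoded form plus the table occupies 1152 bits against 8192 bits of raw weights, roughly a 7x reduction.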
By implementing the embodiment of the invention, the data volume of the weight parameters of the original neural network can be reduced, so that the data volume of the memory accessed by the neural network processor can be obviously reduced when the subsequent neural network processor performs the neural network calculation, thereby relieving the bandwidth pressure, improving the overall performance of the system and relieving the problem of 'memory wall'.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.
Claims (1)
1. A neural network processor, comprising a decoder; the decoder is used for acquiring each encoded value encoded by the encoding method of the neural network weight parameter;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and a prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors;
the coding method of the neural network weight parameters comprises the following steps: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; taking an index value of a prototype vector corresponding to the sub-vector and a scaling factor as coding values of the sub-vector, and coding the sub-vector in each parameter group;
dividing parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, wherein the method specifically comprises the following steps: generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of a preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, an index value of each prototype vector and a scaling factor corresponding to each sub-vector; the preset loss function specifically comprises the following steps:
where loss_C is the loss value for a given C, C is the number of sub-vectors, Vi is the i-th sub-vector, Si is its scaling factor, f(Ki, g(Ej)) is the prototype vector with index value Ki, N is the number of prototype vectors, and β is an empirical constant;
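The fitting described above, prototype vectors plus per-sub-vector scaling factors, can be sketched as an alternating minimization in the style of k-means. This is a hypothetical sketch: the variable names (P for prototypes, K for index values, Si for scaling factors) follow the patent, but the update rules are assumptions since f and g are unspecified:

```python
import numpy as np

def fit_prototypes(subvectors, n_prototypes, n_iters=20, seed=0):
    """Alternate between assigning each sub-vector a (prototype, scale) pair and
    re-estimating each prototype from its assigned, scaled sub-vectors."""
    rng = np.random.default_rng(seed)
    V = np.stack(subvectors)                                        # (C, d)
    P = V[rng.choice(len(V), n_prototypes, replace=False)].copy()   # init prototypes
    for _ in range(n_iters):
        # Assignment step: per sub-vector, optimal scale for each prototype,
        # then the index K_i of the prototype with smallest residual.
        S = V @ P.T / (np.sum(P * P, axis=1) + 1e-12)               # (C, N) scales
        # Residual of the optimal-scale fit: ||v||^2 - s^2 ||p||^2
        err = np.sum(V[:, None, :] ** 2, axis=2) - S ** 2 * np.sum(P * P, axis=1)
        K = np.argmin(err, axis=1)
        Si = S[np.arange(len(V)), K]
        # Update step: each prototype is the least-squares fit of its members,
        # p = sum(s * v) / sum(s^2), minimizing sum ||v - s p||^2 over p.
        for j in range(n_prototypes):
            m = K == j
            if m.any():
                P[j] = (Si[m, None] * V[m]).sum(axis=0) / (np.sum(Si[m] ** 2) + 1e-12)
    return P, K, Si
```

Each sub-vector Vi is then approximated by Si * P[Ki], which is exactly the information the coded values must carry.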
Iteratively adjusting the number of sub-vectors, the initial prototype vectors, the index values, and the initial scaling factors to train the initial fitting model until the value of the preset loss function is minimized, thereby generating the prototype vector corresponding to each sub-vector in the parameter group, the index value of each prototype vector, and the scaling factor corresponding to each sub-vector, specifically comprises:
repeating the iterative training until C has traversed its value range [1, L], then comparing all loss_C values and taking the output at the minimum loss_C as the final output;
during each iteration of training: before the preset loss function is evaluated, the value of the sub-vector count C is determined; C traverses the range [1, L] according to the following rule: starting from L (i.e., the initial number of sub-vectors is L), C decreases in equal steps, never falling below 1, with the decrement step size c_step being a preset empirical value; after C is determined, the minimum loss_C for that C is computed, and the initial prototype vectors, index values, and initial scaling factors corresponding to that C and its minimum loss_C are taken as the output;
Grouping the plurality of original neural network weight parameters specifically comprises: according to the convolution kernel to which each original weight parameter belongs, dividing the weight parameters of the same convolution kernel into one group;
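For a standard convolution layer this grouping amounts to one parameter group per output channel. A minimal sketch, assuming the common (out_channels, in_channels, kH, kW) weight layout:

```python
import numpy as np

def group_by_kernel(conv_weight):
    """Group a convolution layer's weights so all parameters of one kernel
    (one output channel) form one flattened parameter group."""
    return [conv_weight[oc].reshape(-1) for oc in range(conv_weight.shape[0])]
```

Each returned group is the one-dimensional vector that is subsequently split into sub-vectors.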
A prototype vector look-up table is generated from the prototype vectors and their index values.
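The look-up table makes decoding trivial: a decoder recovers an approximation of each sub-vector from its (index, scale) code alone. A minimal sketch, with the table as a plain dictionary (an implementation assumption):

```python
import numpy as np

def build_lut(prototypes):
    """Map each index value to its prototype vector."""
    return {idx: p for idx, p in enumerate(prototypes)}

def decode(code, lut):
    """Reconstruct a sub-vector from its (index value, scaling factor) code."""
    idx, scale = code
    return scale * lut[idx]
```

In a hardware decoder the table would typically live in on-chip memory, so reconstruction costs one lookup and one multiply per sub-vector.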
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210385708.8A CN114781604B (en) | 2022-04-13 | 2022-04-13 | Coding method of neural network weight parameters, coder and neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114781604A CN114781604A (en) | 2022-07-22 |
CN114781604B true CN114781604B (en) | 2024-02-20 |
Family
ID=82429865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210385708.8A Active CN114781604B (en) | 2022-04-13 | 2022-04-13 | Coding method of neural network weight parameters, coder and neural network processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114781604B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017031630A1 (en) * | 2015-08-21 | 2017-03-02 | 中国科学院自动化研究所 | Deep convolutional neural network acceleration and compression method based on parameter quantification |
WO2019155064A1 (en) * | 2018-02-09 | 2019-08-15 | Deepmind Technologies Limited | Data compression using jointly trained encoder, decoder, and prior neural networks |
CN111105035A (en) * | 2019-12-24 | 2020-05-05 | 西安电子科技大学 | Neural network pruning method based on combination of sparse learning and genetic algorithm |
EP3716158A2 (en) * | 2019-03-25 | 2020-09-30 | Nokia Technologies Oy | Compressing weight updates for decoder-side neural networks |
CN112381205A (en) * | 2020-09-29 | 2021-02-19 | 北京清微智能科技有限公司 | Neural network low bit quantization method |
KR20210131894A (en) * | 2020-04-24 | 2021-11-03 | (주)인시그널 | Apparatus and method for compressing trained deep neural networks |
CN113610227A (en) * | 2021-07-23 | 2021-11-05 | 人工智能与数字经济广东省实验室(广州) | Efficient deep convolutional neural network pruning method |
CN113657415A (en) * | 2021-10-21 | 2021-11-16 | 西安交通大学城市学院 | Object detection method oriented to schematic diagram |
CN113748605A (en) * | 2019-03-18 | 2021-12-03 | 弗劳恩霍夫应用研究促进协会 | Method and apparatus for compressing parameters of neural network |
CN114118347A (en) * | 2020-08-28 | 2022-03-01 | 辉达公司 | Fine-grained per-vector scaling for neural network quantization |
CN114175056A (en) * | 2019-07-02 | 2022-03-11 | Vid拓展公司 | Cluster-based quantization for neural network compression |
CN114341882A (en) * | 2019-09-03 | 2022-04-12 | 微软技术许可有限责任公司 | Lossless exponent and lossy mantissa weight compression for training deep neural networks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180107926A1 (en) * | 2016-10-19 | 2018-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for neural network quantization |
US20210073643A1 (en) * | 2019-09-05 | 2021-03-11 | Vahid PARTOVI NIA | Neural network pruning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||