CN114781604B - Coding method of neural network weight parameters, coder and neural network processor - Google Patents

Coding method of neural network weight parameters, coder and neural network processor

Info

Publication number
CN114781604B
Authority
CN
China
Prior art keywords
sub
vector
vectors
prototype
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210385708.8A
Other languages
Chinese (zh)
Other versions
CN114781604A (en)
Inventor
王彦飞
濮亚男
胡胜发
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Ankai Microelectronics Co ltd
Original Assignee
Guangzhou Ankai Microelectronics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Ankai Microelectronics Co ltd
Priority to CN202210385708.8A
Publication of CN114781604A
Application granted
Publication of CN114781604B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for encoding neural network weight parameters, an encoder and a neural network processor. The encoding method comprises the following steps: acquiring a plurality of original neural network weight parameters and grouping them to obtain a plurality of parameter groups; dividing the parameters in each parameter group into a plurality of sub-vectors and clustering the sub-vectors to generate a prototype vector for each class of sub-vectors, an index value for each prototype vector, and a scaling factor for each sub-vector within each class; and encoding the sub-vectors in each parameter group by taking the index value of the prototype vector corresponding to each sub-vector, together with its scaling factor, as the encoded value of that sub-vector. By implementing the invention, the data volume of the original neural network weight parameters can be reduced.

Description

Coding method of neural network weight parameters, coder and neural network processor
Technical Field
The present invention relates to the field of computer technology, and in particular to a method for encoding neural network weight parameters, an encoder, and a neural network processor.
Background
Deep Neural Networks (DNNs) represent an important breakthrough in the field of Machine Learning (ML); introducing DNNs has improved the best achievable performance on most machine learning tasks. The strong recognition performance of DNNs comes at the cost of enormous model computation and memory. Taking CNNs as an example, a typical CNN model for object detection (YOLOv3) requires up to 32 billion floating-point operations (FLOPs) and more than 60 MB of model parameters, which makes such models difficult to deploy on embedded devices with limited hardware resources and energy budgets. Many special-purpose processor chips (AI chips) designed for artificial-intelligence application tasks have emerged; the module within an AI chip specifically responsible for AI operations and AI applications is called a Neural Network Processor (NPU). Because the speed at which the NPU accesses memory cannot keep pace with the speed at which its computing units consume data, the computing units cannot be fully utilized even if more are added; this is the so-called "memory wall" problem. One direction for solving this problem is to reduce the amount of data that must be fetched from memory, which in turn requires reducing the data volume of the neural network weight parameters. How to reduce the data volume of the neural network weight parameters is therefore a problem to be solved.
Disclosure of Invention
The embodiments of the invention provide a method for encoding neural network weight parameters, an encoder and a neural network processor, which can reduce the data volume of the neural network weight parameters.
An embodiment of the present invention provides a method for encoding a neural network weight parameter, including: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the grouping of the plurality of original neural network weight parameters specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the method also comprises: generating a prototype vector look-up table from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides an encoder, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, and the method specifically comprises the following steps:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the encoder groups a plurality of original neural network weight parameters, and specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the encoder is further configured to generate a prototype vector lookup table according to each prototype vector and an index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides a neural network processor comprising a decoder, where the decoder is configured to obtain the encoded values produced by the above encoding method for neural network weight parameters;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and the prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a coding method, a coder and a neural network processor of a neural network weight parameter, wherein the coding method firstly acquires all original neural network weight parameters, groups all the neural network weight parameters and acquires a plurality of parameter groups; then, dividing parameters in each parameter group into a plurality of sub-vectors, and clustering the sub-vectors to obtain a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector; and finally, taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter group. The coding method of the neural network weight parameter disclosed by the invention is characterized in that the sub-vectors are clustered to generate prototype vectors of one class of sub-vectors, so that a plurality of sub-vectors of the same class in the parameter group can be characterized by one prototype vector and respective corresponding scaling factors, the data volume is greatly reduced, therefore, after the sub-vectors in each parameter group are coded by adopting the index value of the prototype vector and the scaling factors as the coding values of the sub-vectors, the original neural network weight parameter can be characterized by each coding value, and the coded data volume is obviously compressed relative to the original data volume, so that the data volume of the original neural network weight parameter is greatly reduced.
Drawings
Fig. 1 is a flowchart of a method for encoding a neural network weight parameter according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a prototype vector look-up table according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the effect of encoding a parameter set according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a neural network processor according to an embodiment of the present invention for performing neural network computation.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for encoding a neural network weight parameter, which at least includes the following steps:
step S101: and acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups.
Step S102: dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector.
Step S103: taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the encoded value of the sub-vector, and encoding the sub-vectors in each parameter group.
For step S101, in a preferred embodiment, the grouping of the several original neural network weight parameters specifically includes: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In the present invention, the acquired original neural network weight parameters are grouped; typically, the parameters of one convolution kernel form one group, although other groupings are possible. Each parameter group is encoded independently, and the same grouping rule is used in decoding.
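For illustration only, a minimal sketch of per-kernel grouping follows; the weight-array shape, the function name and the use of NumPy are assumptions, not part of the patent:

```python
import numpy as np

def group_by_kernel(conv_weights: np.ndarray) -> list:
    """Split a conv layer's weights, assumed shaped (out_channels, in_channels, kh, kw),
    into one flat parameter group per convolution kernel."""
    return [kernel.reshape(-1) for kernel in conv_weights]

# Hypothetical example: 16 kernels of shape 3x3x3 -> 16 groups, each of length 27.
weights = np.random.randn(16, 3, 3, 3).astype(np.float32)
groups = group_by_kernel(weights)
assert len(groups) == 16 and groups[0].shape == (27,)
```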
For step S102, in a preferred embodiment, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Specifically, after the weight parameters of each original neural network are grouped to obtain a plurality of parameter sets, each parameter set is processed one by one according to the following steps:
1. Expand the parameters in a parameter group into a one-dimensional vector, denoted A, of dimension 1×L. Divide A into C sub-vectors, each of length D = L/C; each sub-vector is denoted Vi, i ∈ [0, C-1]. The value of C lies in the range [1:L]. In the training process, when C is later iteratively adjusted, it traverses the range [1:L] according to the following rule: starting from L (i.e., the initial number of sub-vectors is L), C decreases in equal steps without falling below 1; the decreasing step size c_step is a preset empirical value.
2. Initialize N prototype vectors Ej (i.e., the initial prototype vectors described above), j ∈ [0, N-1], with N < C. The length of each Ej is D, equal to the length of Vi. The prototype-vector set is denoted g(Ej), j ∈ [0, N-1]. Each Vi is fitted by one of the N prototype vectors, so A can be fitted by a combination of C prototype vectors (C > N, so prototype vectors are reused), expressed by C prototype-vector indices Ki, i ∈ [0, C-1], where Ki ∈ [1:N]. This yields the preliminary fitting form Vi ≈ f(Ki, g(Ej)), where f(Ki, g(Ej)) denotes the prototype vector with index Ki in the prototype-vector set. Each prototype-vector index requires ⌈log2 N⌉ bits. The data type of the prototype vectors Ej is 8-bit integer. N, Ej and Ki are learnable parameters.
3. To make the fitting more flexible, a scaling factor Si, i ∈ [0, C-1], is introduced, with each Vi corresponding to one Si; the fitting form becomes Vi ≈ Si · f(Ki, g(Ej)), which is the initial fitting model. The scaling factor is an integer expressed with N2 bits, so it can take 2^N2 values; N2 is a preset empirical value, and Si is a learnable parameter.
4. Iteratively adjust the values of C, N, Ej, Ki and Si, and iteratively train the initial fitting model Vi ≈ Si · f(Ki, g(Ej)) until the value loss_c of the following loss function is minimized:

loss_c = Σ_{i=0}^{C-1} ‖Vi − Si · f(Ki, g(Ej))‖² + β · N

loss_c consists of two parts: the first term is the squared fitting error of the vector A, and the second term constrains the value of N; β is an empirical value.
After training completes, the corresponding output is recorded, comprising: the number N of prototype vectors; the N prototype vectors Ej, j ∈ [0, N-1]; the index value Ki, i ∈ [0, C-1], of the prototype vector corresponding to each Vi; and the scaling factor Si, i ∈ [0, C-1], of each Vi. In this way, each parameter group is divided into sub-vectors, and the prototype vector, prototype-vector index value and scaling factor corresponding to each sub-vector are generated.
It should be noted that, during the iterative adjustment, the value of C is first determined according to the aforementioned traversal rule; the fitting model is then trained with the goal of minimizing loss_c for that value of C, and the output and the loss_c value at the minimum are recorded. A new value of C is then selected according to the traversal rule and training is repeated, recording the output and the minimum loss_c for the updated C. This is repeated until C has traversed the value range [1:L]; all loss_c values are then compared, and the output with the minimum loss_c is taken as the final output. A code sketch of this training procedure is given below.
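The patent does not specify the optimization procedure in code, so the following is only a rough sketch of fitting Vi ≈ Si · f(Ki, g(Ej)) for fixed C and N, using alternating minimization as a stand-in for the training described above; the function name, the random initialization, the integer scale codebook and the use of NumPy are all assumptions, and quantization of the prototypes to 8-bit integers is omitted:

```python
import numpy as np

def fit_group(A, C, N, n2_bits=2, beta=0.0, iters=20):
    """Alternating-minimization sketch of the fitting model Vi ~= Si * f(Ki, g(Ej)).
    A: 1-D parameter vector of length L, with L assumed divisible by C here.
    Returns prototypes E (N x D), indices K (C,), scales S (C,) and loss_c."""
    V = np.asarray(A, dtype=float).reshape(C, -1)   # C sub-vectors of length D = L // C
    rng = np.random.default_rng(0)
    E = V[rng.choice(C, N, replace=False)].copy()   # init prototypes from sub-vectors
    scales = np.arange(1, 2 ** n2_bits + 1, dtype=float)  # assumed N2-bit scale codebook
    for _ in range(iters):
        # Assignment step: for each Vi pick (Ki, Si) minimizing ||Vi - Si * E[Ki]||^2.
        err = ((V[:, None, None, :]
                - scales[None, None, :, None] * E[None, :, None, :]) ** 2).sum(-1)
        flat = err.reshape(C, -1).argmin(1)
        K, s_idx = np.unravel_index(flat, (N, scales.size))
        S = scales[s_idx]
        # Update step: refit each prototype by least squares over its assigned sub-vectors.
        for j in range(N):
            m = K == j
            if m.any():
                s = S[m][:, None]
                E[j] = (s * V[m]).sum(0) / (s ** 2).sum()
    # loss_c = sum of squared fitting error plus the beta * N term constraining N.
    loss_c = ((V - S[:, None] * E[K]) ** 2).sum() + beta * N
    return E, K, S, loss_c
```

An outer loop over C (starting from L and decreasing in steps of c_step) and over candidate N would call fit_group for each combination and keep the output with the smallest loss_c, matching the traversal rule above.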
For step S103, after the prototype vector corresponding to each sub-vector of each parameter group, its index value and its scaling factor are generated by the method of step S102, the index value of the prototype vector corresponding to each sub-vector and the scaling factor are used as the encoded value of that sub-vector, and the sub-vectors in each parameter group are encoded to obtain the encoded neural network weight parameters. Through encoding, the parameters in each parameter group are replaced by the prototype-vector index values and scaling factors obtained through training.
In a preferred embodiment, the method further comprises: generating a prototype-vector lookup table from each prototype vector and the index value of each prototype vector. After all parameter groups have been learned, the prototype-vector lookup table, the encoded index values and the scaling factors are stored in the network model parameter file, with the encoded index values and scaling factors serving as the encoded neural network weight parameters.
Illustratively, the prototype-vector lookup table corresponding to one parameter group is shown in Fig. 2, where the left-hand column gives the index value of each prototype vector and the right-hand column gives the prototype vectors themselves. Fig. 3 shows the effect of encoding the parameters in the corresponding parameter group: the parameter group is divided into 10 (C = 10) sub-vectors, the corresponding prototype-vector set is the 8 (N = 8) prototype vectors of Fig. 2, and the values in the lower half are the encoded values, each comprising an index value Ki and a scaling factor Si. Taking the first sub-vector as an example, it is represented by the prototype vector whose index code is "010" and a scaling factor whose code is "11".
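The bit packing implied by Figs. 2 and 3 can be sketched as follows; the index-first layout, the function name and the use of scale codes rather than scale values are assumptions:

```python
import math

def encode_group(K, S_codes, N, n2_bits):
    """Pack each sub-vector's encoded value as index bits followed by scale bits,
    mirroring the Fig. 3 example: index '010' + scale '11' -> '01011'."""
    n1_bits = math.ceil(math.log2(N))           # bits per prototype index
    return [format(k, '0%db' % n1_bits) + format(s, '0%db' % n2_bits)
            for k, s in zip(K, S_codes)]

# N = 8 prototypes -> 3-bit index; 2-bit scale code, as in the figures.
print(encode_group(K=[2, 5], S_codes=[3, 0], N=8, n2_bits=2))  # ['01011', '10100']
```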
The method for encoding the neural network weight parameters according to the above embodiment is applicable to be executed in an encoder, so that the present invention correspondingly provides an encoder based on the above embodiments of the method, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
In a preferred embodiment, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors, and generates a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
In a preferred embodiment, the encoder groups several original neural network weight parameters, including in particular: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In a preferred embodiment, the encoder is further configured to generate a prototype vector look-up table from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, the invention correspondingly provides a neural network processor comprising a decoder, where the decoder is configured to obtain the encoded values produced by the above encoding method for neural network weight parameters and to determine, from each encoded value, the index value and scaling factor of the corresponding prototype vector; it then extracts the prototype vector corresponding to each encoded value according to the index value and the prototype-vector lookup table, and scales the extracted prototype vector by the scaling factor corresponding to the encoded value to obtain the decoded sub-vector.
Specifically, as shown in Fig. 4, after the original neural network weight parameters are encoded by the above method, the encoded neural network weight parameters and the prototype-vector lookup table are generated and stored in memory; this process is completed offline by the encoder.
During online inference, a decoder in the neural network processor reads some or all of the encoded neural network weight parameters and the corresponding prototype-vector lookup tables from memory, looks up each prototype vector according to the sequence of index values, and scales the prototype-vector values by the scaling-factor parameters to obtain each decoded sub-vector, i.e., the recovered neural network weight parameters; the neural network processor then performs the neural network computation using the recovered weight parameters.
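A minimal decoding sketch consistent with this description, assuming the same hypothetical bit layout as the encoding sketch above (index bits first, then scale bits) and a scale_map from scale codes to scaling factors:

```python
import numpy as np

def decode_group(codes, lut, n1_bits, scale_map):
    """Recover a parameter group from its encoded values: look up each prototype
    vector by its index bits, then scale it by the factor its scale code maps to."""
    out = []
    for code in codes:
        k = int(code[:n1_bits], 2)              # prototype-vector index Ki
        s = scale_map[int(code[n1_bits:], 2)]   # scaling factor Si
        out.append(s * lut[k])                  # decoded sub-vector Vi ~= Si * E[Ki]
    return np.concatenate(out)                  # recovered parameter group
```

In the NPU, this lookup-and-scale step is performed by the decoder before the recovered weights are fed to the compute units.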
The weight-parameter data volume that affects bandwidth changes from the original neural network weight parameters before encoding to the encoded neural network weight parameters plus the prototype-vector lookup table. The corresponding parameter-compression efficiency is calculated as follows:
Original neural network weight-parameter data volume:
Q1 = L × 8 (bit);
Encoded neural network weight-parameter data volume, with each of the C sub-vectors storing an ⌈log2 N⌉-bit prototype index and an N2-bit scaling factor:
Q2 = C × (⌈log2 N⌉ + N2) (bit);
Prototype-vector lookup-table data volume, for N prototype vectors of length D with 8-bit elements:
Q3 = N × D × 8 (bit);
Parameter compression ratio:
R = (Q2 + Q3) / Q1.
the super parameters are adjusted according to specific application scenes, and the compression rate and the precision can be balanced after training.
By implementing the embodiments of the invention, the data volume of the original neural network weight parameters can be reduced, so that the amount of memory data accessed by the neural network processor during subsequent neural network computation is significantly reduced, relieving bandwidth pressure, improving overall system performance and alleviating the "memory wall" problem.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, and such changes and modifications are also intended to be within the scope of the invention.

Claims (1)

1. A neural network processor, comprising a decoder; the decoder is used for acquiring each encoded value encoded by the encoding method of the neural network weight parameter;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and a prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors;
the coding method of the neural network weight parameters comprises the following steps: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; taking an index value of a prototype vector corresponding to the sub-vector and a scaling factor as coding values of the sub-vector, and coding the sub-vector in each parameter group;
dividing parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, specifically comprises the following steps: generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of a preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, an index value of each prototype vector and a scaling factor corresponding to each sub-vector; the preset loss function is:
loss_c = Σ_{i=0}^{C-1} ‖Vi − Si · f(Ki, g(Ej))‖² + β · N
where loss_c is the value of the loss function for a given C, C is the number of sub-vectors, Vi is the sub-vector value, Si is the scaling-factor value, f(Ki, g(Ej)) is the prototype-vector value with index value Ki, N is the number of prototype vectors, and β is an empirical value;
the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor are iteratively adjusted to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and a prototype vector corresponding to each sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in the parameter set are generated, specifically:
repeating the iterative training until C has traversed the value range [1:L], comparing all loss_c values, and taking the output with the minimum loss_c value as the final output;
during each iterative training process: before the preset loss function is evaluated, the value of the number C of sub-vectors is determined; C traverses the value range [1:L] according to the following rule: starting from L (i.e., the initial number of sub-vectors is L), C decreases in equal steps without falling below 1, the decreasing step size c_step being a preset empirical value; after C is determined, the minimum loss_c corresponding to that C is calculated, and all initial prototype vectors, all index values and all initial scaling factors corresponding to that C and the minimum loss_c are taken as the output;
the grouping of the several original neural network weight parameters specifically includes: dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which the weight parameters of each original neural network belong;
a prototype vector look-up table is generated from each prototype vector and the index value of each prototype vector.
CN202210385708.8A 2022-04-13 2022-04-13 Coding method of neural network weight parameters, coder and neural network processor Active CN114781604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210385708.8A CN114781604B (en) 2022-04-13 2022-04-13 Coding method of neural network weight parameters, coder and neural network processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210385708.8A CN114781604B (en) 2022-04-13 2022-04-13 Coding method of neural network weight parameters, coder and neural network processor

Publications (2)

Publication Number Publication Date
CN114781604A CN114781604A (en) 2022-07-22
CN114781604B (en) 2024-02-20

Family

ID=82429865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210385708.8A Active CN114781604B (en) 2022-04-13 2022-04-13 Coding method of neural network weight parameters, coder and neural network processor

Country Status (1)

Country Link
CN (1) CN114781604B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017031630A1 (en) * 2015-08-21 2017-03-02 中国科学院自动化研究所 Deep convolutional neural network acceleration and compression method based on parameter quantification
WO2019155064A1 (en) * 2018-02-09 2019-08-15 Deepmind Technologies Limited Data compression using jointly trained encoder, decoder, and prior neural networks
CN111105035A (en) * 2019-12-24 2020-05-05 西安电子科技大学 Neural network pruning method based on combination of sparse learning and genetic algorithm
EP3716158A2 (en) * 2019-03-25 2020-09-30 Nokia Technologies Oy Compressing weight updates for decoder-side neural networks
CN112381205A (en) * 2020-09-29 2021-02-19 北京清微智能科技有限公司 Neural network low bit quantization method
KR20210131894A (en) * 2020-04-24 2021-11-03 (주)인시그널 Apparatus and method for compressing trained deep neural networks
CN113610227A (en) * 2021-07-23 2021-11-05 人工智能与数字经济广东省实验室(广州) Efficient deep convolutional neural network pruning method
CN113657415A (en) * 2021-10-21 2021-11-16 西安交通大学城市学院 Object detection method oriented to schematic diagram
CN113748605A (en) * 2019-03-18 2021-12-03 弗劳恩霍夫应用研究促进协会 Method and apparatus for compressing parameters of neural network
CN114118347A (en) * 2020-08-28 2022-03-01 辉达公司 Fine-grained per-vector scaling for neural network quantization
CN114175056A (en) * 2019-07-02 2022-03-11 Vid拓展公司 Cluster-based quantization for neural network compression
CN114341882A (en) * 2019-09-03 2022-04-12 微软技术许可有限责任公司 Lossless exponent and lossy mantissa weight compression for training deep neural networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
US20210073643A1 (en) * 2019-09-05 2021-03-11 Vahid PARTOVI NIA Neural network pruning

Also Published As

Publication number Publication date
CN114781604A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
US11403528B2 (en) Self-tuning incremental model compression solution in deep neural network with guaranteed accuracy performance
CN109859281B (en) Compression coding method of sparse neural network
CN109635935B (en) Model adaptive quantization method of deep convolutional neural network based on modular length clustering
CN111147862B (en) End-to-end image compression method based on target coding
CN111078911A (en) Unsupervised hashing method based on self-encoder
KR20220007853A (en) Method and apparatus for compressing parameters of a neural network
Gu et al. Compression of human motion capture data using motion pattern indexing
Ozan et al. K-subspaces quantization for approximate nearest neighbor search
Vereshchagin et al. Kolmogorov's structure functions with an application to the foundations of model selection
CN114781604B (en) Coding method of neural network weight parameters, coder and neural network processor
CN116743182B (en) Lossless data compression method
CN113467949A (en) Gradient compression method for distributed DNN training in edge computing environment
CN101467459A (en) Restrained vector quantization
Li et al. Online variable coding length product quantization for fast nearest neighbor search in mobile retrieval
Sitaram et al. Efficient codebooks for vector quantization image compression with an adaptive tree search algorithm
CN116318172A (en) Design simulation software data self-adaptive compression method
CN114925658B (en) Open text generation method and storage medium
CN113761834A (en) Method, device and storage medium for acquiring word vector of natural language processing model
Cao et al. A fast search algorithm for vector quantization using a directed graph
KR20110033154A (en) Method for counting vectors in regular point networks
CN112464014A (en) Unsupervised Hash industrial cloth texture picture retrieval method based on graph convolution
Kim et al. Towards Accurate Low Bit DNNs with Filter-wise Quantization
Subia-Waud et al. Weight fixing networks
KR20210113356A (en) Data processing devices, data processing systems and data processing methods
CN113096673B (en) Voice processing method and system based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant