CN114781604B - Coding method of neural network weight parameters, coder and neural network processor - Google Patents
- Publication number
- CN114781604B (granted from application CN202210385708.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an encoding method for neural network weight parameters, an encoder and a neural network processor. The encoding method comprises the following steps: acquiring a plurality of original neural network weight parameters, and grouping them to obtain a plurality of parameter groups; dividing the parameters in each parameter group into a plurality of sub-vectors, and clustering the sub-vectors to generate a prototype vector corresponding to each class of sub-vectors, an index value for each prototype vector, and a scaling factor corresponding to each sub-vector in each class; and taking the index value of the prototype vector corresponding to a sub-vector, together with its scaling factor, as the coding value of that sub-vector, thereby encoding the sub-vectors in each parameter group. By implementing the invention, the data volume of the original neural network weight parameters can be reduced.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method for encoding a neural network weight parameter, an encoder, and a neural network processor.
Background
Deep Neural Networks (DNNs) represent an important breakthrough in the field of Machine Learning (ML), and introducing DNNs has improved the state-of-the-art performance of most machine learning tasks. The remarkable recognition performance of DNNs comes at the cost of enormous model computation and memory. Taking CNNs as an example, a typical CNN model for object detection (YOLOv3) requires up to 32 billion Floating-point Operations (FLOPs) and more than 60 MB of model parameters, which makes such models difficult to deploy on embedded devices with limited hardware resources and energy budgets. Many special-purpose processor chips (AI chips) designed for artificial intelligence tasks have also emerged. The module within an AI chip that is responsible for implementing AI operations and AI applications is called a Neural Network Processor (NPU). Since the speed at which the NPU accesses memory cannot keep pace with the speed at which its computing units consume data, the computing units cannot be fully utilized even if more of them are added; this is the so-called "memory wall" problem. One direction for solving this problem is to reduce the amount of data that must be fetched from memory, which in turn requires reducing the data volume of the neural network weight parameters. How to reduce the data volume of the neural network weight parameters is therefore a problem to be solved.
Disclosure of Invention
Embodiments of the present invention provide an encoding method for neural network weight parameters, an encoder and a neural network processor, which can reduce the data volume of the neural network weight parameters.
An embodiment of the present invention provides a method for encoding a neural network weight parameter, including: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the grouping of the plurality of original neural network weight parameters specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the method further comprises the following steps: a prototype vector look-up table is generated from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides an encoder, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets;
dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors;
and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
Further, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, and the method specifically comprises the following steps:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Further, the encoder groups a plurality of original neural network weight parameters, and specifically includes:
and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
Further, the encoder is further configured to generate a prototype vector lookup table according to each prototype vector and an index value of each prototype vector.
On the basis of the above method embodiments, another embodiment of the present invention provides a neural network processor that includes a decoder, where the decoder is configured to obtain each encoded value produced by the above encoding method for neural network weight parameters;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and the prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a coding method, a coder and a neural network processor of a neural network weight parameter, wherein the coding method firstly acquires all original neural network weight parameters, groups all the neural network weight parameters and acquires a plurality of parameter groups; then, dividing parameters in each parameter group into a plurality of sub-vectors, and clustering the sub-vectors to obtain a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector; and finally, taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter group. The coding method of the neural network weight parameter disclosed by the invention is characterized in that the sub-vectors are clustered to generate prototype vectors of one class of sub-vectors, so that a plurality of sub-vectors of the same class in the parameter group can be characterized by one prototype vector and respective corresponding scaling factors, the data volume is greatly reduced, therefore, after the sub-vectors in each parameter group are coded by adopting the index value of the prototype vector and the scaling factors as the coding values of the sub-vectors, the original neural network weight parameter can be characterized by each coding value, and the coded data volume is obviously compressed relative to the original data volume, so that the data volume of the original neural network weight parameter is greatly reduced.
Drawings
Fig. 1 is a flowchart of a method for encoding a neural network weight parameter according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a prototype vector look-up table according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the effect of encoding a parameter set according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a neural network processor according to an embodiment of the present invention for performing neural network computation.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, an embodiment of the present invention provides a method for encoding a neural network weight parameter, which at least includes the following steps:
step S101: and acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups.
Step S102: dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector.
And step 103, taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter group.
For step S101, in a preferred embodiment, the grouping of the several original neural network weight parameters specifically includes: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In the present invention, the obtained weight parameters of each original neural network are grouped, and typically, the parameters of one convolution kernel are divided into a group, or may be grouped in other manners. Each group of parameters after grouping is independently encoded, and the same grouping rule is used in decoding.
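The per-kernel grouping rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the 4-D weight layout (out_channels, in_channels, kH, kW) and the function name are assumptions.

```python
import numpy as np

# Hypothetical sketch of the preferred grouping rule: the weights belonging
# to one convolution kernel (one output channel) form one parameter group.
def group_by_kernel(weights):
    """weights: (out_channels, in_channels, kH, kW) array.
    Returns one flattened parameter group per convolution kernel."""
    return [w.reshape(-1) for w in weights]

# Example: a layer with 4 kernels of shape 3x3x3 yields 4 groups of 27 weights.
groups = group_by_kernel(np.ones((4, 3, 3, 3), dtype=np.int8))
```

Each group produced this way is then encoded independently, and the decoder applies the same grouping rule.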
For step S102, in a preferred embodiment, the dividing the parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector specifically includes:
generating a one-dimensional vector according to parameters in the parameter set;
initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors;
initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor;
iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
Specifically, after the weight parameters of each original neural network are grouped to obtain a plurality of parameter sets, each parameter set is processed one by one according to the following steps:
1. Expand the parameters in the parameter group into a one-dimensional vector, denoted A, of dimension 1*L, and divide A into C sub-vectors, each of length D = L/C. Each sub-vector is denoted Vi, i ∈ [0, C-1]. The value of C lies in the range [1:L]. During training, when C is later adjusted iteratively, C traverses the range [1:L] according to the following rule: starting from L (i.e. the initial number of sub-vectors is L), C decreases in equal steps until it is not less than 1; the decreasing step size is c_step, a preset empirical value.
2. Initialize N prototype vectors Ej (i.e., the initial prototype vectors mentioned above), j ∈ [0, N-1], with N < C. Each Ej has length D, equal to the length of Vi. The set of prototype vectors is denoted g(Ej), j ∈ [0, N-1]. Each Vi is fitted with one of the N prototype vectors, so A can be fitted with a combination of C prototype vectors (since C > N, prototype vectors are reused), addressed by C prototype-vector indices Ki, i ∈ [0, C-1], where Ki ∈ [1:N]. This yields a preliminary fitting model: Vi ≈ f(Ki, g(Ej)), where f(Ki, g(Ej)) denotes the prototype vector with index Ki in the prototype-vector set. Each prototype-vector index requires ⌈log2 N⌉ bits. The data type of each prototype vector Ej is 8-bit integer. N, Ej and Ki are learnable parameters.
3. To make the fitting more flexible, a scaling factor Si, i ∈ [0, C-1], is introduced, with one Si per Vi, and the fitting model becomes Vi ≈ Si * f(Ki, g(Ej)), which is the initial fitting model. The scaling factor is of integer type and is expressed in N2 bits, so it can take 2^N2 distinct values; N2 is a preset empirical value, and Si is a learnable parameter.
4. Iteratively adjust the values of C, N, Ej, Ki and Si, and iteratively train the initial fitting model Vi ≈ Si * f(Ki, g(Ej)) until the value loss_C of the following loss function is minimal:

loss_C = Σ_{i=0}^{C-1} (Vi − Si * f(Ki, g(Ej)))² + β * N

loss_C consists of two parts: the first part is the squared fitting error of the vector A, and the second part constrains the value of N. Here β is an empirical value.
After training is completed, the corresponding output is recorded, including: the number N of prototype vectors; the N prototype vectors Ej, j ∈ [0, N-1]; the index value Ki, i ∈ [0, C-1], of the prototype vector corresponding to each Vi; and the scaling factor Si, i ∈ [0, C-1], of each Vi. In this way, the sub-vector division of each parameter group is obtained, together with the prototype vector, prototype-vector index value and scaling factor corresponding to each sub-vector.
It should be noted that, during iterative adjustment, the value of C is first determined according to the aforementioned traversal rule; the fitting model is then trained with the goal of minimizing loss_C for that value of C, and the corresponding output and loss_C value at the minimum are recorded. A new value of C is then selected according to the traversal rule and training is repeated, recording the output and loss_C value at the minimum for the updated C. This is repeated until C has traversed the range [1:L]; all recorded loss_C values are then compared, and the output with the smallest loss_C is taken as the final output.
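The four steps above can be sketched as a toy numeric example. This is a simplified illustration under stated assumptions, not the patent's learned training: prototypes are picked from the sub-vectors and fitted by nearest-neighbour assignment, and the scaling factors Si are kept as floats rather than N2-bit integers.

```python
import numpy as np

def split_subvectors(a, c):
    """Step 1: split the flattened 1*L vector A into C sub-vectors of length D = L // C."""
    d = a.size // c
    return a[: c * d].reshape(c, d)

def fit_group(a, c, n, beta, seed=0):
    """Steps 2-4 for a fixed C: choose N prototypes E_j, assign each V_i an
    index K_i and a scale S_i, and return the loss value
    loss_C = sum_i ||V_i - S_i * E_{K_i}||^2 + beta * N."""
    rng = np.random.default_rng(seed)
    v = split_subvectors(a, c)
    e = v[rng.choice(c, size=n, replace=False)]            # initial prototypes E_j
    # K_i: nearest prototype; S_i: least-squares scale for each sub-vector.
    k = np.argmin(((v[:, None, :] - e[None]) ** 2).sum(-1), axis=1)
    s = (v * e[k]).sum(-1) / np.maximum((e[k] ** 2).sum(-1), 1e-12)
    loss_c = ((v - s[:, None] * e[k]) ** 2).sum() + beta * n
    return k, s, loss_c

def search_c(a, n, beta, c_step):
    """Traversal rule for C: from L down in equal steps of c_step (not below 1),
    keeping the output with the smallest loss_C."""
    best = None
    for c in range(a.size, 0, -c_step):
        if n >= c:                                         # the patent requires N < C
            continue
        _, _, loss_c = fit_group(a, c, n, beta)
        if best is None or loss_c < best[1]:
            best = (c, loss_c)
    return best
```

For a vector that is exactly four repetitions of one pattern, two prototypes suffice and the fitting error drops to zero, so loss_C reduces to the β·N penalty alone.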
For step S103, after generating the prototype vector corresponding to the sub-vector of each parameter set, the index value of the prototype vector, and the scaling factor by the method in step S102, the index value of the prototype vector corresponding to the sub-vector and the scaling factor are used as the encoding values of the sub-vectors, and the sub-vectors in each parameter set are encoded, so as to obtain the encoded neural network weight parameter. Through coding, parameters in each parameter set are replaced by a prototype vector number index value and a scaling factor obtained through training.
In a preferred embodiment, further comprising: a prototype vector look-up table is generated from each prototype vector and the index value of each prototype vector. After all parameter sets are learned, storing the prototype vector lookup table, the coded index value and the scaling factor in a network model parameter file, wherein the coded index value and the scaling factor are used as the coded neural network weight parameters.
Illustratively, the prototype-vector look-up table corresponding to one parameter set is shown in FIG. 2, where the left column represents the index value of each prototype vector and the right column represents each prototype vector. The effect of encoding the parameters in the corresponding parameter set is shown schematically in FIG. 3: the parameter set is divided into 10 (C=10) sub-vectors, the corresponding prototype-vector set consists of the 8 (N=8) prototype vectors in FIG. 2, and the lower half of the figure shows the corresponding coding values, each comprising an index value Ki and a scaling factor Si. Taking the first sub-vector as an example, it can be represented by the prototype vector whose index code is "010" and a scaling factor whose code is "11".
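The code word for one sub-vector can be illustrated as a bit string: the index Ki in ⌈log2 N⌉ bits followed by the scaling factor Si in N2 bits. The packing below is a hypothetical sketch consistent with the "010" + "11" example of FIG. 3 (N = 8, N2 = 2); the actual on-chip bit layout is not specified here.

```python
import math

# Hypothetical bit-packing of one sub-vector's coding value:
# prototype index K_i in ceil(log2 N) bits, then scale code S_i in N2 bits.
def encode_subvector(k, s, n_prototypes, n2):
    idx_bits = math.ceil(math.log2(n_prototypes))
    return format(k, f"0{idx_bits}b") + format(s, f"0{n2}b")

# First sub-vector of the FIG. 3 example: index 2 -> "010", scale 3 -> "11".
code = encode_subvector(2, 3, n_prototypes=8, n2=2)
```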
The method for encoding the neural network weight parameters according to the above embodiment is applicable to be executed in an encoder, so that the present invention correspondingly provides an encoder based on the above embodiments of the method, where the encoder is configured to obtain a plurality of original neural network weight parameters, and group the original neural network weight parameters to obtain a plurality of parameter sets; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; and taking the index value of the prototype vector corresponding to the sub-vector and the scaling factor as the coding value of the sub-vector, and coding the sub-vector in each parameter set.
In a preferred embodiment, the encoder divides parameters in the parameter set into a plurality of sub-vectors, learns and clusters the sub-vectors, and generates a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector, and a scaling factor corresponding to each sub-vector in each type of sub-vector, which specifically includes:
generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the initial values of the number of sub-vectors, the initial prototype vectors, the index values and the initial scaling factors to iteratively train the initial fitting model until the function value of the preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, the index value of each prototype vector and the scaling factor corresponding to each sub-vector.
In a preferred embodiment, the encoder groups several original neural network weight parameters, including in particular: and dividing the original neural network weight parameters of the same convolution kernel into a group according to the convolution kernel to which each original neural network weight parameter belongs.
In a preferred embodiment, the encoder is further configured to generate a prototype vector look-up table from each prototype vector and the index value of each prototype vector.
On the basis of the above method embodiments, the invention correspondingly provides a neural network processor that includes a decoder, where the decoder is configured to obtain each encoded value produced by the above encoding method for neural network weight parameters, and to determine, from each encoded value, the index value and scaling factor of the prototype vector corresponding to that encoded value; the decoder then extracts the prototype vector corresponding to each coding value according to its index value and the prototype-vector look-up table, and scales the extracted prototype vector by the corresponding scaling factor to obtain the decoded sub-vector.
Specifically, as shown in fig. 4, after the original neural network weight parameters are encoded by the above method, the encoded neural network weight parameters and the prototype-vector look-up table are generated and stored in memory; this process is completed offline by the encoder.
In the online reasoning process of the network, a decoder in a neural network processor reads part or all of the coded neural network weight parameters and corresponding prototype vector lookup tables from a memory, searches each prototype vector according to an index value sequence, scales prototype vector values based on scaling factor parameters to obtain each decoded sub-vector, namely recovered neural network weight parameters, and then the neural network processor performs neural network calculation according to the recovered neural network weight parameters.
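The decoding step just described can be sketched in a few lines. This is an illustrative assumption of the look-up-and-scale operation, with float scale values and a tiny 2-entry table for clarity, not the processor's actual datapath.

```python
import numpy as np

# Sketch of decoding: each coding value (K_i, S_i) selects a prototype from
# the look-up table and scales it to recover the decoded sub-vector.
def decode(codes, lut):
    """codes: iterable of (index K_i, scale S_i); lut: N x D prototype table."""
    return np.stack([s * lut[k] for k, s in codes])

# Toy table with N=2 prototypes of length D=2 (illustrative values).
lut = np.array([[1.0, 2.0], [3.0, 4.0]])
recovered = decode([(0, 2.0), (1, 1.0)], lut)
```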
The weight-parameter data volume that affects bandwidth changes from the original (pre-encoding) neural network weight parameters to the encoded neural network weight parameters plus the prototype look-up table. The efficiency of the corresponding parameter compression is calculated as follows:

Raw neural network weight parameter data volume:

Q1 = L * 8 (bit);

Encoded neural network weight parameter data volume:

Q2 = C * (⌈log2 N⌉ + N2) (bit);

Prototype-vector look-up table data volume:

Q3 = N * D * 8 (bit);

Parameter compression ratio:

R = Q1 / (Q2 + Q3).
The hyperparameters are adjusted according to the specific application scenario, and the compression rate and accuracy can be balanced through training.
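A short numeric sketch makes the data-volume accounting concrete. All values here (L = 1024 weights, C = 128 sub-vectors, N = 8 prototypes of length D = 8, N2 = 2 scale bits) are illustrative assumptions, not figures from the patent.

```python
import math

# Worked example of the data-volume quantities:
# Q1 = raw 8-bit weights, Q2 = encoded indices + scales, Q3 = look-up table.
def data_volumes(l, c, n, n2, d):
    q1 = l * 8                                # raw weights, 8 bits each
    q2 = c * (math.ceil(math.log2(n)) + n2)   # C code words of (idx + scale) bits
    q3 = n * d * 8                            # N prototypes of D 8-bit entries
    return q1, q2, q3, q1 / (q2 + q3)

q1, q2, q3, ratio = data_volumes(l=1024, c=128, n=8, n2=2, d=8)
```

With these assumed values the encoded form plus the table occupies 1152 bits against 8192 bits of raw weights, roughly a 7x reduction.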
By implementing the embodiment of the invention, the data volume of the weight parameters of the original neural network can be reduced, so that the data volume of the memory accessed by the neural network processor can be obviously reduced when the subsequent neural network processor performs the neural network calculation, thereby relieving the bandwidth pressure, improving the overall performance of the system and relieving the problem of 'memory wall'.
While the foregoing is directed to the preferred embodiments of the present invention, it will be appreciated by those skilled in the art that changes and modifications may be made without departing from the principles of the invention, such changes and modifications are also intended to be within the scope of the invention.
Claims (1)
1. A neural network processor, comprising a decoder; the decoder is used for acquiring each encoded value encoded by the encoding method of the neural network weight parameter;
determining an index value and a scaling factor of a prototype vector corresponding to each code value according to each code value;
extracting prototype vectors corresponding to the coding values according to the index values of the coding values and a prototype vector lookup table, and scaling the extracted prototype vectors according to scaling factors corresponding to the coding values to obtain decoded sub-vectors;
the coding method of the neural network weight parameters comprises the following steps: acquiring a plurality of original neural network weight parameters, and grouping the original neural network weight parameters to acquire a plurality of parameter groups; dividing parameters in the parameter set into a plurality of sub-vectors, and clustering the sub-vectors to generate prototype vectors corresponding to each type of sub-vectors, index values of each prototype vector and scaling factors corresponding to the sub-vectors in each type of sub-vectors; taking an index value of a prototype vector corresponding to the sub-vector and a scaling factor as coding values of the sub-vector, and coding the sub-vector in each parameter group;
dividing parameters in the parameter set into a plurality of sub-vectors, and learning and clustering the sub-vectors to generate a prototype vector corresponding to each type of sub-vector, an index value of each prototype vector and a scaling factor corresponding to each sub-vector in each type of sub-vector, wherein the method specifically comprises the following steps: generating a one-dimensional vector according to parameters in the parameter set; initializing the number of sub-vectors, and dividing the one-dimensional vector into a plurality of sub-vectors according to the number of the sub-vectors; initializing a plurality of initial prototype vectors, an index value of each initial prototype vector and an initial scaling factor of each sub-vector, and constructing an initial fitting model of the sub-vectors according to each initial prototype vector, the index value of each initial prototype vector and each initial scaling factor; iteratively adjusting the number of sub-vectors, each initial prototype vector, each index value and each initial scaling factor to iteratively train the initial fitting model until the function value of a preset loss function is minimum, and generating a prototype vector corresponding to each sub-vector in the parameter set, an index value of each prototype vector and a scaling factor corresponding to each sub-vector; the preset loss function specifically comprises the following steps:
where loss_C is the loss value for a given C, C is the number of sub-vectors, Vi is the i-th sub-vector, Si is its scaling factor, f(Ki, g(Ej)) is the prototype vector with index value Ki, N is the number of prototype vectors, and β is an empirical constant;
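The fitting described above, prototype vectors plus per-sub-vector scaling factors, can be sketched as an alternating minimization in the style of k-means. This is a hypothetical sketch: the variable names (P for prototypes, K for index values, Si for scaling factors) follow the patent, but the update rules are assumptions since f and g are unspecified:

```python
import numpy as np

def fit_prototypes(subvectors, n_prototypes, n_iters=20, seed=0):
    """Alternate between assigning each sub-vector a (prototype, scale) pair and
    re-estimating each prototype from its assigned, scaled sub-vectors."""
    rng = np.random.default_rng(seed)
    V = np.stack(subvectors)                                        # (C, d)
    P = V[rng.choice(len(V), n_prototypes, replace=False)].copy()   # init prototypes
    for _ in range(n_iters):
        # Assignment step: per sub-vector, optimal scale for each prototype,
        # then the index K_i of the prototype with smallest residual.
        S = V @ P.T / (np.sum(P * P, axis=1) + 1e-12)               # (C, N) scales
        # Residual of the optimal-scale fit: ||v||^2 - s^2 ||p||^2
        err = np.sum(V[:, None, :] ** 2, axis=2) - S ** 2 * np.sum(P * P, axis=1)
        K = np.argmin(err, axis=1)
        Si = S[np.arange(len(V)), K]
        # Update step: each prototype is the least-squares fit of its members,
        # p = sum(s * v) / sum(s^2), minimizing sum ||v - s p||^2 over p.
        for j in range(n_prototypes):
            m = K == j
            if m.any():
                P[j] = (Si[m, None] * V[m]).sum(axis=0) / (np.sum(Si[m] ** 2) + 1e-12)
    return P, K, Si
```

Each sub-vector Vi is then approximated by Si * P[Ki], which is exactly the information the coded values must carry.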
Iteratively adjusting the number of sub-vectors, the initial prototype vectors, the index values, and the initial scaling factors to train the initial fitting model until the value of the preset loss function is minimized, thereby generating the prototype vector corresponding to each sub-vector in the parameter group, the index value of each prototype vector, and the scaling factor corresponding to each sub-vector, specifically comprises:
repeating the iterative training until C has traversed its value range [1, L], then comparing all loss_C values and taking the output at the minimum loss_C as the final output;
during each iteration of training: before the preset loss function is evaluated, the value of the sub-vector count C is determined; C traverses the range [1, L] according to the following rule: starting from L (i.e., the initial number of sub-vectors is L), C decreases in equal steps, never falling below 1, with the decrement step size c_step being a preset empirical value; after C is determined, the minimum loss_C for that C is computed, and the initial prototype vectors, index values, and initial scaling factors corresponding to that C and its minimum loss_C are taken as the output;
Grouping the plurality of original neural network weight parameters specifically comprises: according to the convolution kernel to which each original weight parameter belongs, dividing the weight parameters of the same convolution kernel into one group;
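For a standard convolution layer this grouping amounts to one parameter group per output channel. A minimal sketch, assuming the common (out_channels, in_channels, kH, kW) weight layout:

```python
import numpy as np

def group_by_kernel(conv_weight):
    """Group a convolution layer's weights so all parameters of one kernel
    (one output channel) form one flattened parameter group."""
    return [conv_weight[oc].reshape(-1) for oc in range(conv_weight.shape[0])]
```

Each returned group is the one-dimensional vector that is subsequently split into sub-vectors.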
A prototype vector look-up table is generated from the prototype vectors and their index values.
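The look-up table makes decoding trivial: a decoder recovers an approximation of each sub-vector from its (index, scale) code alone. A minimal sketch, with the table as a plain dictionary (an implementation assumption):

```python
import numpy as np

def build_lut(prototypes):
    """Map each index value to its prototype vector."""
    return {idx: p for idx, p in enumerate(prototypes)}

def decode(code, lut):
    """Reconstruct a sub-vector from its (index value, scaling factor) code."""
    idx, scale = code
    return scale * lut[idx]
```

In a hardware decoder the table would typically live in on-chip memory, so reconstruction costs one lookup and one multiply per sub-vector.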
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210385708.8A CN114781604B (en) | 2022-04-13 | 2022-04-13 | Coding method of neural network weight parameters, coder and neural network processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114781604A CN114781604A (en) | 2022-07-22 |
CN114781604B true CN114781604B (en) | 2024-02-20 |
Family
ID=82429865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210385708.8A Active CN114781604B (en) | 2022-04-13 | 2022-04-13 | Coding method of neural network weight parameters, coder and neural network processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114781604B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017031630A1 (en) * | 2015-08-21 | 2017-03-02 | 中国科学院自动化研究所 | Deep convolutional neural network acceleration and compression method based on parameter quantification |
WO2019155064A1 (en) * | 2018-02-09 | 2019-08-15 | Deepmind Technologies Limited | Data compression using jointly trained encoder, decoder, and prior neural networks |
CN111105035A (en) * | 2019-12-24 | 2020-05-05 | 西安电子科技大学 | Neural network pruning method based on combination of sparse learning and genetic algorithm |
EP3716158A2 (en) * | 2019-03-25 | 2020-09-30 | Nokia Technologies Oy | Compressing weight updates for decoder-side neural networks |
CN112381205A (en) * | 2020-09-29 | 2021-02-19 | 北京清微智能科技有限公司 | Neural network low bit quantization method |
KR20210131894A (en) * | 2020-04-24 | 2021-11-03 | (주)인시그널 | Apparatus and method for compressing trained deep neural networks |
CN113610227A (en) * | 2021-07-23 | 2021-11-05 | 人工智能与数字经济广东省实验室(广州) | Efficient deep convolutional neural network pruning method |
CN113657415A (en) * | 2021-10-21 | 2021-11-16 | 西安交通大学城市学院 | Object detection method oriented to schematic diagram |
CN113748605A (en) * | 2019-03-18 | 2021-12-03 | 弗劳恩霍夫应用研究促进协会 | Method and apparatus for compressing parameters of neural network |
CN114118347A (en) * | 2020-08-28 | 2022-03-01 | 辉达公司 | Fine-grained per-vector scaling for neural network quantization |
CN114175056A (en) * | 2019-07-02 | 2022-03-11 | Vid拓展公司 | Cluster-based quantization for neural network compression |
CN114341882A (en) * | 2019-09-03 | 2022-04-12 | 微软技术许可有限责任公司 | Lossless exponent and lossy mantissa weight compression for training deep neural networks |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180107926A1 (en) * | 2016-10-19 | 2018-04-19 | Samsung Electronics Co., Ltd. | Method and apparatus for neural network quantization |
US20210073643A1 (en) * | 2019-09-05 | 2021-03-11 | Vahid PARTOVI NIA | Neural network pruning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||