CN109255429B - Parameter decompression method for sparse neural network model - Google Patents

Parameter decompression method for sparse neural network model

Info

Publication number
CN109255429B
CN109255429B
Authority
CN
China
Prior art keywords
matrix
boundary
index
weight
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810845949.XA
Other languages
Chinese (zh)
Other versions
CN109255429A (en)
Inventor
刘必慰
陈胜刚
彭瑾
刘畅
郭阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201810845949.XA
Publication of CN109255429A
Application granted
Publication of CN109255429B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a parameter decompression method for a sparse neural network model, comprising the following steps. S1: store the required sparse matrix at a specified location; when the non-zero elements of the matrix are stored, a relative index and a quantized weight value are stored for each non-zero element, and if the number of zeros between two non-zero elements exceeds a preset threshold, an explicit zero value is stored. S2: read the stored data for the matrix to be decompressed, extract the relative indexes and quantized weight values, restore the relative indexes to absolute indexes, determine from the restored absolute indexes the positions of the non-zero and zero elements of the dense matrix together with the quantized weight value at each position, reconstruct the weight vector table from the positions of the non-zero elements, and restore the weight values in the weight vector table to complete decompression of the dense matrix. The invention is simple to implement and offers high decompression efficiency, high resource utilization, and a wide and flexible range of application.

Description

Parameter decompression method for sparse neural network model
Technical Field
The invention relates to the technical field of neural networks, in particular to a parameter decompression method for a sparse neural network model.
Background
With the rapid rise of deep learning, models such as deep neural networks (DNNs) and recurrent neural networks (RNNs) have become widely used; they have overcome limitations of traditional techniques and achieved great success in many fields, such as speech recognition and image recognition. Both DNNs and RNNs contain fully-connected layers, whose computation is the multiplication of a weight matrix with a corresponding vector. When training such a network, a suitable compression algorithm is applied to the model, chiefly to the fully-connected layers, i.e. the weight-matrix parameters are compressed without affecting accuracy; this can greatly increase the speed of neural network inference. After training, the trained parameters are stored in memory and are decompressed back into the weight matrix when needed.
A typical model compression algorithm, shown in fig. 1, consists mainly of pruning, quantization training, and variable-length coding. After an initial training stage, the model is pruned by removing connections whose weights fall below a threshold, which converts dense layers into sparse layers; this first stage must learn the topology of the network, retaining the important connections while removing the unimportant ones. Quantization training is a weight-sharing process in which many connections share the same weight value. Pruning and quantization training do not interfere with each other and together yield a high compression ratio, reducing the storage requirement. After training is finished, the trained parameters must be decompressed back into an accurate weight matrix without loss of precision.
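For orientation only, the following toy Python sketch shows the first two stages of fig. 1 (pruning and weight sharing). The names prune and share_weights and the quantile-based codebook are illustrative stand-ins, not the trained clustering of the actual algorithm; the 32-entry codebook matches the 5-bit codes described in the embodiment below.

```python
import numpy as np

def prune(weights, threshold):
    """Zero out connections whose magnitude falls below the threshold."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def share_weights(weights, n_codes=32):
    """Map every surviving weight to the nearest of n_codes shared values
    (quantile-based codebook as a simple stand-in for trained clustering)."""
    nonzero = weights[weights != 0]
    codebook = np.quantile(nonzero, np.linspace(0, 1, n_codes))
    codes = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, codes

rng = np.random.default_rng(0)
w = prune(rng.standard_normal(100), threshold=0.5)
codebook, codes = share_weights(w)
print(f"{(w != 0).sum()} non-zeros, {len(set(codes))} distinct codes")
```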
At present, several hardware architectures can operate directly on model-compressed parameters, but they require complex hardware designs that are costly to implement. How to decompress sparse neural network model parameters correctly without resorting to such complex hardware is therefore an urgent problem: the compressed parameters must be organized and stored in some manner, and the stored parameters must be decompressed into a complete dense matrix while their accuracy is preserved. Moreover, since several matrices can now be compressed in a single pass, decompressing several matrices in a single pass must also be supported.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a parameter decompression method for a sparse neural network model that is simple to operate, achieves high decompression efficiency and resource utilization, and is widely and flexibly applicable.
In order to solve the above technical problems, the technical solution provided by the invention is as follows:
A parameter decompression method for a sparse neural network model, characterized by comprising the following steps:
S1, compression parameter storage: store the required sparse matrix at a designated location; when the non-zero elements of the matrix are stored, a relative index and a quantized weight value are stored for each non-zero element, the relative index recording the number of zeros between two non-zero elements, and if the number of zeros between two non-zero elements exceeds a preset threshold, an explicit zero value is stored;
S2, decompression: obtain the data stored in step S1 for the matrix to be decompressed and extract the relative indexes and quantized weight values; restore the relative indexes to absolute indexes corresponding one-to-one with element positions; from the restored absolute indexes, determine the positions of the non-zero and zero elements of the dense matrix and the quantized weight value at each position; reconstruct the weight vector table from the positions of the non-zero elements; and restore the weight values in the weight vector table to complete decompression of the dense matrix. An illustrative sketch of this scheme follows.
As a further improvement of the present invention, step S2 includes an instruction decoding and data acquisition step S21, whose specific steps are: receive and decode a decompression instruction; determine from the decoded information the length and source address of the matrix to be decompressed and the destination address at which the decompressed weight matrix is to be stored; fetch the data stored for the matrix to be decompressed from the obtained source address; and extract the relative indexes.
As a further improvement of the present invention, step S2 includes an index restoration step S22, specifically: restore the relative indexes to absolute indexes by accumulation.
As a further improvement of the present invention, step S2 includes a weight quantization table reconstruction step S23, whose specific steps are: determine the position of each non-zero element from the restored absolute indexes and reconstruct the weight quantization table from those positions; the positions corresponding to non-zero elements in the weight quantization table hold the effective weight values.
As a further improvement of the present invention, step S2 includes an inverse quantization step S24, specifically: restore the effective weight values in the weight quantization table to obtain the complete dense matrix.
As a further improvement of the present invention, in step S24 the effective weight values in the weight quantization table are restored through a lookup table.
As a further improvement of the present invention, the method further comprises storing multiple required matrices across row boundaries and configuring boundary flags to identify the cross-row storage state, the boundary flags comprising a first flag indicating no matrix boundary in the vector, a second flag indicating a matrix boundary in the vector that is not at the end of the vector, a third flag indicating a matrix boundary at the end of the vector, and a fourth flag indicating a matrix boundary in the vector after which the remaining data is discarded.
As a further improvement of the present invention, when multiple matrices are decompressed, the specific steps for restoring the relative indexes to absolute indexes corresponding one-to-one with element positions in step S2 are as follows:
convert all stored relative indexes into absolute indexes;
examine the boundary flag when it is obtained; if the boundary flag is the second or the fourth flag, determine the matrix boundary from the absolute indexes by XOR-ing the high two bits of adjacent absolute indexes: if the result is 1, the former position is the end of the current matrix and the latter position is the start of the next matrix, giving the matrix boundary;
generate index-valid signals corresponding one-to-one with the relative indexes and a matrix-end signal, and set the index-valid signals and the matrix-end signal according to the boundary flag;
output the index-valid signals, the absolute indexes, and the matrix-end signal.
As a further improvement of the present invention, the specific steps for setting the index-valid signals and the matrix-end signal according to the boundary flag include:
when the boundary flag is the first flag, it is determined that there is no matrix boundary and the matrix has not ended; every index-valid signal is set to 1 and the matrix-end signal is set to 0;
when the boundary flag is the third flag, it is determined that a matrix boundary lies at the end of the vector; every index-valid signal is set to 1 and the matrix-end signal is set to 1;
when the boundary flag is the second flag, it is determined that a matrix boundary lies within the vector, and two beats are taken at the determined boundary: the first beat sets the index-valid signals before the boundary to 1 and the matrix-end signal to 1, and the second beat sets the index-valid signals after the boundary to 1 and the matrix-end signal to 0;
when the boundary flag is the fourth flag, it is determined that the last row of the vector has been reached; the index-valid signals within the boundary of the last matrix are set to 1, and the unnecessary data is discarded.
As a further improvement of the present invention, the specific steps for restoring the weight values in the weight vector table in step S2 are:
distribute each quantized weight value to its corresponding position according to the absolute index and the index-valid signal;
determine whether the matrix boundary has been reached; if not, register the quantized weight value at the corresponding position; if so, output the weight quantization table of the matrix.
Compared with the prior art, the invention has the advantages that:
1. By storing the compression parameters as relative indexes and quantized weight values for the non-zero elements, the invention reduces the storage space required for the parameters; the relative indexes are then restored to absolute indexes, the weight vector table is rebuilt from them, and the correct weight matrix is finally obtained by decompression. After the model has been compressed, the compressed model can thus be decompressed and the weight matrix reconstructed without dedicated hardware design, with high decompression speed, high efficiency, and flexible implementation.
2. The method supports both the sparse-matrix and weight-sharing aspects of model compression; it supports decompression not only for RNN model compression algorithms but also for DNN model compression, offers good flexibility and extensibility, is simple and flexible to implement, and places no requirement on the dimensions of the decompressed weight matrix.
3. In the invention, multiple compressed matrices are stored contiguously with boundary flags placed between them. When the matrices need to be decompressed, only the corresponding decompression instructions need to be issued, and the boundary state between matrices is determined from the boundary flags, so several weight matrices can be decompressed correctly in a single pass, effectively improving decompression efficiency and speed. Each spv row needs only a 2-bit matrix boundary flag, so little auxiliary data must be added, saving storage space and memory access bandwidth; storing the compressed matrices contiguously saves further storage space and bandwidth. The maximum number of matrices decompressed at once is determined by the number of parameters stored in spv and the decoded length.
Drawings
Fig. 1 is a schematic diagram of an implementation principle of a typical model compression algorithm.
Fig. 2 is a schematic diagram of the implementation principle of parameter decompression for the sparse neural network model according to this embodiment.
Fig. 3 is a schematic flow chart of an implementation of the parameter decompression method for the sparse neural network model according to the present embodiment.
Fig. 4 is a schematic diagram illustrating how the non-zero elements of a compressed matrix are stored in spv according to this embodiment.
Fig. 5 is a schematic diagram of an example in which three matrices to be decompressed are stored across spv row boundaries in an embodiment of the present invention.
Fig. 6 is a schematic diagram of a specific implementation flow of absolute index recovery during multi-matrix decompression according to this embodiment.
Fig. 7 is a schematic flow chart of implementing the reconstruction weight quantization table during multi-matrix decompression according to the present embodiment.
Fig. 8 is a schematic flow chart of the implementation of inverse quantization in the present embodiment.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
As shown in fig. 2 and 3, the parameter decompression method for the sparse neural network model of the present embodiment includes the steps of:
s1, compression parameter storage: storing a required sparse matrix to a designated position, wherein when storing non-zero elements in the matrix, a relative index and a weight quantization value corresponding to each non-zero element are stored, the relative index is used for identifying the number of zeros between two non-zero elements, and if the number of zeros between two non-zero elements is greater than a preset threshold value, a zero value is stored;
s2, decompressing: obtaining the matrix to be decompressed according to the data stored in step S1, extracting the relative index and the weight quantization value therein, restoring the relative index into an absolute index corresponding to the element position one by one, determining the positions of the non-zero element and the zero element in the dense matrix and determining the weight quantization value at each position according to the restored absolute index, reconstructing a weight vector table according to the positions of the non-zero elements, restoring the weight values in the weight vector table, and completing decompression of the dense matrix, which is the weight matrix in the neural network.
With this method, after the model has been compressed, the compressed model can be decompressed and the weight matrix reconstructed without dedicated hardware design; storing the compression parameters as relative indexes and quantized weight values for the non-zero elements reduces the storage space of the parameters, and restoring the relative indexes to absolute indexes and rebuilding the weight vector table from them finally yields the correct weight matrix.
In this embodiment, the compressed parameters are stored in the vector memory spv, which is 32 words wide; the decompressed weight matrix is stored in the memory spm, which is 1024 words wide. Storing the compressed matrices in the multiplexed vector memory spv gives high utilization of hardware resources. The decompression method of this embodiment supports two compression techniques: sparse matrices and weight sharing. For the storage of a non-zero element of the sparse matrix, a relative index and a quantized weight value are stored; if the number of zeros between two non-zero elements exceeds a preset threshold (here 15), an explicit zero value is stored, the relative index representing the number of zeros preceding the non-zero element. As for the quantized values, weight sharing in the compression algorithm reduces each quantized weight from 16 bits to 5 bits, so only 9 bits are needed to store one non-zero element. The compressed parameters are stored in the vector memory spv in this manner; one spv address can store 56 non-zero elements.
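As a consistency check (assuming 16-bit words, which is what makes the stated 56 elements fit): one spv row of 32 words holds 32 × 16 = 512 bits, while 56 elements × 9 bits = 504 bits plus the 2-bit boundary flag gives 506 bits. The sketch below models this packing; fig. 4 fixes only the per-element layout (5-bit weight code in the low bits, 4-bit relative index above it) and the flag in the highest two bits, so the ordering of elements within the integer is an assumption, and pack_spv_row is a hypothetical name.

```python
def pack_spv_row(entries, flag):
    """Pack up to 56 (rel_index, weight_code) pairs plus a 2-bit boundary
    flag into one integer modelling a 512-bit spv row. Each element takes
    9 bits: 4-bit relative index above a 5-bit weight code."""
    assert len(entries) <= 56
    row = flag & 0b11                        # boundary flag in the top bits
    for rel, code in entries:
        assert 0 <= rel < 16 and 0 <= code < 32
        row = (row << 9) | ((rel & 0xF) << 5) | (code & 0x1F)
    return row

row = pack_spv_row([(2, 17), (15, 0), (1, 3)], flag=0b01)
print(f"{row:b}")                            # flag, then 9 bits per element
```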
The decompression method of this embodiment supports both the sparse-matrix and weight-sharing aspects of model compression; it supports decompression not only for RNN model compression algorithms but also for DNN model compression, and offers good flexibility and extensibility, simple and flexible implementation, and no requirement on the dimensions of the decompressed weight matrix.
As shown in fig. 3, when this embodiment performs decompression, a decompression command is issued; upon receiving the command, the decompression process, comprising index restoration, determination of the weight quantization table, and the inverse quantization operation, is started on the parameters stored in spv, and the decompressed weight matrix is stored in the memory spm.
In this embodiment, step S2 includes an instruction decoding and data acquisition step S21, whose specific steps are: receive and decode a decompression instruction; determine from the decoded information the length and source address of the matrix to be decompressed and the destination address at which the decompressed weight matrix is to be stored; fetch the data stored for the matrix to be decompressed from the obtained source address; and extract the relative indexes and quantized weight values.
Step S2 of this embodiment includes step S22 of restoring the relative index to the absolute index, and step S22 restores the relative index to the absolute index by accumulation.
In this embodiment, step S2 includes step S23 of reconstructing the weight vector table, whose specific steps are: determine the position of each non-zero element from the restored absolute indexes and reconstruct the weight quantization table from those positions; the positions of the non-zero elements in the weight quantization table hold the effective weight values. Specifically, for each position in the weight quantization table, if a corresponding absolute index exists, i.e. the position holds a non-zero element, the position is assigned the corresponding quantized weight value; otherwise the quantized weight value at that position is set to 0.
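A minimal sketch of S23, assuming NumPy; build_quant_table is a hypothetical name:

```python
import numpy as np

def build_quant_table(abs_idx, weight_codes, length):
    """S23 sketch: scatter each 5-bit weight code to its absolute position;
    every other position of the table stays 0."""
    table = np.zeros(length, dtype=np.int64)
    table[np.asarray(abs_idx)] = weight_codes
    return table

print(build_quant_table([2, 5], [17, 3], 8))   # -> [ 0  0 17  0  0  3  0  0]
```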
In this embodiment, step S2 includes an inverse quantization step S24, whose specific steps are: restore the effective weight values in the weight quantization table, by means of a lookup table, to obtain the complete dense matrix.
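A sketch of S24 under the 5-bit quantization stated above: the 32-entry codebook below holds placeholder values (the real shared weights come from quantization training), and positions that hold no non-zero element are set to exact zero, as described for fig. 8. The assumption that a code of 0 marks an empty position is a simplification for the example.

```python
import numpy as np

codebook = np.linspace(-1.0, 1.0, 32, dtype=np.float32)  # placeholder values

def dequantize(quant_table, valid):
    """S24 sketch: look up each 5-bit code in the shared-weight codebook;
    positions without a non-zero element are assigned 0."""
    return np.where(valid, codebook[np.asarray(quant_table)], 0.0)

quant = np.array([0, 0, 17, 0, 0, 3, 0, 0])
valid = quant != 0              # simplification: code 0 marks empty positions
print(dequantize(quant, valid))
```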
The detailed decompression flow of this embodiment is as follows:
S21, instruction decoding: different decompression commands are issued according to the matrices to be decompressed; a decompression command comprises the spv source address, the spm destination address, and the length len of the spv vector to be decompressed. After the decompression command is received, it is decoded to determine the decompression vector length, the decompression source address, and the destination address for storing the weight matrix. The stored parameters are then fetched from the memory spv at the source address, and the relative indexes and quantized weight values are extracted;
S22, index restoration: restore the relative indexes to absolute indexes by accumulation, determine the positions of the zero and non-zero elements of the dense matrix from the absolute indexes, and output the absolute indexes and quantized weight values;
S23, weight vector table reconstruction: determine the position of each non-zero element from its absolute index in the weight quantization table; if a corresponding absolute index exists for a position, assign that position the corresponding quantized weight value, and if not, set the quantized weight value at that position to 0, thereby reconstructing the weight quantization table of the matrix;
S24, inverse quantization: restore the effective weight values corresponding to the weight quantization table through the lookup table to obtain the complete dense matrix.
After each complete dense matrix has been decompressed in this way, it is stored at the corresponding destination address in spm, as decoded from the instruction.
In this embodiment, multiple required matrices are stored across row boundaries, and boundary flags are configured to identify the cross-row storage state, different boundary flags representing the different cross-row cases: a first flag indicates no matrix boundary in the vector; a second flag indicates a matrix boundary in the vector that is not at the end of the vector; a third flag indicates a matrix boundary at the end of the vector; and a fourth flag indicates a matrix boundary in the vector after which the remaining data is discarded. Since one spv address can store only 56 non-zero elements, the elements of a compressed weight matrix cannot always be stored within one row, so cross-row storage in spv occurs.
Model compression is not limited to compressing one weight matrix at a time. In this embodiment, multiple compressed matrices are stored contiguously with boundary flags placed between them; when several matrices need to be decompressed, only the corresponding decompression instructions need to be issued, each matrix is decompressed with the method above, and the boundary state between matrices is determined from the boundary flags, so several weight matrices can be decompressed correctly at once, effectively improving decompression efficiency and speed. Each spv row needs only a 2-bit matrix boundary flag, so little auxiliary data must be added, saving storage space and memory access bandwidth; storing the compressed matrices contiguously saves further storage space and bandwidth. The maximum number of matrices decompressed at once is determined by the number of parameters stored in spv and the decoded length. The compressed parameters can be stored in spv across matrix boundaries in a fixed manner without distinguishing matrix boundaries at storage time, which saves a large amount of storage space; to fetch the parameters at the corresponding spv addresses, only the corresponding instructions need to be issued, and the weight matrices are obtained after decompression.
In this embodiment, the non-zero elements of a compressed matrix are stored in spv as shown in fig. 4: the low-order bits store the weight code of the compressed non-zero element (5 bits), the high-order bits store the relative index (4 bits), and the highest two bits of the row store the boundary flag. One spv address can store 56 non-zero elements, and since the elements of a compressed weight matrix usually cannot all be stored in one row, cross-row storage occurs. The index of the 29th element in spv is stored as an absolute index; because of this, the absolute indexes of the two parts, elements 0-28 and elements 30-56, can be recovered in parallel.
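A sketch of that parallel split, assuming NumPy and 0-based entries; restore_parallel and mid are hypothetical names, and the entry at mid stores an absolute index instead of a relative one:

```python
import numpy as np

def restore_parallel(rel, abs_mid, mid=29):
    """Recover absolute indexes for one spv row. Entries before `mid`
    accumulate from the row start; entries after `mid` accumulate from the
    absolute index stored at `mid` (rel[mid] is unused: that slot holds the
    absolute index instead), so the two halves are independent running sums
    that can be computed in parallel."""
    rel = np.asarray(rel)
    lo = np.cumsum(rel[:mid] + 1) - 1              # entries 0 .. mid-1
    hi = abs_mid + np.cumsum(rel[mid + 1:] + 1)    # entries mid+1 ..
    return np.concatenate([lo, [abs_mid], hi])

# Tiny example with mid=2: four entries at absolute positions 1, 3, 7, 9.
print(restore_parallel([1, 1, 0, 1], abs_mid=7, mid=2))
```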
As shown in fig. 5, in this embodiment of the invention three matrices to be decompressed are stored across spv row boundaries, and different boundary flags represent the different cross-row cases: the first flag 00 indicates no matrix boundary in the vector; the second flag 01 indicates a matrix boundary in the vector that is not at the end of the vector; the third flag 10 indicates a matrix boundary at the end of the vector; and the fourth flag 11 indicates a matrix boundary in the vector after which the data is discarded.
As shown in fig. 6, when this embodiment performs multi-matrix decompression, the specific steps for restoring the relative indexes to absolute indexes corresponding one-to-one with element positions in step S2 are as follows:
① convert all stored relative indexes (here 56 of them) into absolute indexes by accumulation, i.e. sum the relative indexes one by one to obtain the absolute indexes; this embodiment makes the absolute index one bit wider in order to handle matrix boundaries;
② generate the weight quantization table;
③ configure boundary flags between the matrices in spv; when the spv parameters are fetched, examine the boundary flag: if it is the second flag (01) or the fourth flag (11), execute step ④, otherwise execute step ⑤;
④ determine the matrix boundary from the absolute indexes: XOR the high two bits of adjacent absolute indexes; if the result is 1, the former position is the end of the current matrix and the latter position is the start of the next matrix, giving the matrix boundary (see the sketch after this list);
⑤ generate index-valid signals (here 56) corresponding one-to-one with the relative indexes and a matrix-end signal, and set the index-valid signals and the matrix-end signal according to the boundary flag;
⑥ output the index-valid signals, the absolute indexes, the weight vector table, and the matrix-end signal.
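A sketch of step ④ alone, under the assumption stated in step ① that the absolute index carries one extra high bit, which differs on the two sides of a matrix boundary; find_boundaries and hi_shift are hypothetical:

```python
def find_boundaries(abs_idx, hi_shift):
    """Step ④ sketch: XOR the high bits of adjacent absolute indexes;
    where the result is 1, the earlier entry is the last element of one
    matrix and the later entry is the first element of the next."""
    ends = []
    for i in range(len(abs_idx) - 1):
        if ((abs_idx[i] ^ abs_idx[i + 1]) >> hi_shift) & 1:
            ends.append(i)                   # entry i closes a matrix
    return ends

# With 6 index bits plus the extra boundary bit (hi_shift=6), indexes of
# the next matrix continue into the next 64-aligned block, flipping bit 6:
print(find_boundaries([60, 63, 64 + 2, 64 + 9], hi_shift=6))   # -> [1]
```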
In step ⑤ of this embodiment, the specific steps for setting the index-valid signals and the matrix-end signal according to the boundary flag are as follows (summarized in the sketch after this list):
5.1) when the boundary flag is the first flag (00), it is determined that there is no matrix boundary and the matrix has not ended; all 56 index-valid signals are set to 1 and the matrix-end signal is set to 0;
5.2) when the boundary flag is the third flag (10), it is determined that the matrix boundary lies at the end of the vector; all 56 index-valid signals are set to 1 and the matrix-end signal is set to 1;
5.3) when the boundary flag is the second flag (01), it is determined that the matrix boundary lies within the vector, and two beats are taken at the determined boundary: the first beat sets the index-valid signals before the boundary to 1 and the matrix-end signal to 1, and the second beat sets the index-valid signals after the boundary to 1 and the matrix-end signal to 0;
5.4) when the boundary flag is the fourth flag (11), it is determined that the last row of the vector has been reached; the index-valid signals within the boundary of the last matrix are set to 1, and the unnecessary data is discarded.
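The four cases can be condensed into a small generator. This is a sketch with assumed semantics: boundary is the 0-based position of the last entry of the matrix that ends in the row, and each yielded pair is one beat of index-valid bits plus the matrix-end signal.

```python
def control_signals(flag, boundary=None, n=56):
    """Yield (index_valid, matrix_end) beats for one spv row."""
    if flag == 0b00:                         # 5.1) no boundary in the row
        yield [1] * n, 0
    elif flag == 0b10:                       # 5.2) boundary at row end
        yield [1] * n, 1
    elif flag == 0b01:                       # 5.3) boundary mid-row: 2 beats
        yield [1] * (boundary + 1) + [0] * (n - boundary - 1), 1
        yield [0] * (boundary + 1) + [1] * (n - boundary - 1), 0
    elif flag == 0b11:                       # 5.4) last matrix ends; rest dropped
        yield [1] * (boundary + 1) + [0] * (n - boundary - 1), 1

for valid, end in control_signals(0b01, boundary=2, n=6):
    print(valid, end)
```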
As shown in fig. 7, when this embodiment performs multi-matrix decompression, the specific steps for restoring the weight values in the weight vector table in step S2 are as follows:
distribute each quantized weight value to its corresponding position according to the absolute index and the index-valid signal;
determine whether the matrix boundary has been reached; if not, register the quantized weight value at the corresponding position; if so, output the weight quantization table of the matrix.
As shown in fig. 8, when this embodiment performs inverse quantization, the effective weight corresponding to each quantized weight value is restored by means of a lookup table: it is determined whether the corresponding position of the matrix holds a weight value; if it does, the weight value is restored, and if not, the position is assigned 0, finally yielding the complete dense matrix.
With the above decompression method, the output of a neural network model compression algorithm can be decompressed; storing and organizing the compressed data as described reduces the storage space of the parameters, several matrices can be decompressed at once, weight sharing and sparse matrices are both supported, and decompression is fast, efficient, and flexible.
The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit the invention in any way. Although the invention has been described with reference to preferred embodiments, it is not limited thereto. Any simple modification, equivalent change, or variation made to the above embodiments in accordance with the technical spirit of the invention, without departing from the content of the technical solution of the invention, falls within the protection scope of the technical solution of the invention.

Claims (9)

1. A parameter decompression method for a sparse neural network model, characterized by comprising the following steps:
S1, compression parameter storage: store the required sparse matrix at a designated location; when the non-zero elements of the matrix are stored, a relative index and a quantized weight value are stored for each non-zero element, the relative index recording the number of zeros between two non-zero elements, and if the number of zeros between two non-zero elements exceeds a preset threshold, an explicit zero value is stored;
S2, decompression: obtain the data stored in step S1 for the matrix to be decompressed and extract the relative indexes and quantized weight values; restore the relative indexes to absolute indexes corresponding one-to-one with element positions; from the restored absolute indexes, determine the positions of the non-zero and zero elements of the dense matrix and the quantized weight value at each position; reconstruct the weight vector table from the positions of the non-zero elements; and restore the weight values in the weight vector table to complete decompression of the dense matrix;
the method further comprising storing multiple required matrices across row boundaries and configuring boundary flags to identify the cross-row storage state, the boundary flags comprising a first flag indicating no matrix boundary in the vector, a second flag indicating a matrix boundary in the vector that is not at the end of the vector, a third flag indicating a matrix boundary at the end of the vector, and a fourth flag indicating a matrix boundary in the vector after which the remaining data is discarded.
2. The parameter decompression method for a sparse neural network model according to claim 1, characterized in that step S2 comprises an instruction decoding and data acquisition step S21, whose specific steps are: receive and decode a decompression instruction; determine from the decoded information the length and source address of the matrix to be decompressed and the destination address at which the decompressed weight matrix is to be stored; fetch the data stored for the matrix to be decompressed from the obtained source address; and extract the relative indexes.
3. The parameter decompression method for a sparse neural network model according to claim 1, characterized in that step S2 comprises an index restoration step S22, specifically: restore the relative indexes to absolute indexes by accumulation.
4. The parameter decompression method for a sparse neural network model according to claim 1, 2 or 3, characterized in that step S2 comprises a weight quantization table reconstruction step S23, whose specific steps are: determine the position of each non-zero element from the restored absolute indexes and reconstruct the weight quantization table from those positions, the positions corresponding to non-zero elements in the weight quantization table holding the effective weight values.
5. The parameter decompression method for a sparse neural network model according to claim 4, characterized in that step S2 comprises an inverse quantization step S24, specifically: restore the effective weight values in the weight quantization table to obtain the complete dense matrix.
6. The parameter decompression method for a sparse neural network model according to claim 5, characterized in that in step S24 the effective weight values in the weight quantization table are restored through a lookup table.
7. The parameter decompression method for a sparse neural network model according to claim 1, characterized in that, when multiple matrices are decompressed, the specific steps for restoring the relative indexes to absolute indexes corresponding one-to-one with element positions in step S2 are:
convert all stored relative indexes into absolute indexes;
examine the boundary flag when it is obtained; if the boundary flag is the second or the fourth flag, determine the matrix boundary from the absolute indexes by XOR-ing the high two bits of adjacent absolute indexes: if the result is 1, the former position is the end of the current matrix and the latter position is the start of the next matrix, giving the matrix boundary;
generate index-valid signals corresponding one-to-one with the relative indexes and a matrix-end signal, and set the index-valid signals and the matrix-end signal according to the boundary flag;
output the index-valid signals, the absolute indexes, and the matrix-end signal.
8. The parameter decompression method for a sparse neural network model according to claim 7, characterized in that the specific steps for setting the index-valid signals and the matrix-end signal according to the boundary flag comprise:
when the boundary flag is the first flag, determining that there is no matrix boundary and the matrix has not ended, setting every index-valid signal to 1 and the matrix-end signal to 0;
when the boundary flag is the third flag, determining that a matrix boundary lies at the end of the vector, setting every index-valid signal to 1 and the matrix-end signal to 1;
when the boundary flag is the second flag, determining that a matrix boundary lies within the vector and taking two beats at the determined boundary, the first beat setting the index-valid signals before the boundary to 1 and the matrix-end signal to 1, and the second beat setting the index-valid signals after the boundary to 1 and the matrix-end signal to 0;
when the boundary flag is the fourth flag, determining that the last row of the vector has been reached, setting the index-valid signals within the boundary of the last matrix to 1, and discarding the unnecessary data.
9. The parameter decompression method for a sparse neural network model according to claim 8, characterized in that the specific steps for restoring the weight values in the weight vector table in step S2 are:
distribute each quantized weight value to its corresponding position according to the absolute index and the index-valid signal;
determine whether the matrix boundary has been reached; if not, register the quantized weight value at the corresponding position; if so, output the weight quantization table of the matrix.
CN201810845949.XA 2018-07-27 2018-07-27 Parameter decompression method for sparse neural network model Active CN109255429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810845949.XA CN109255429B (en) 2018-07-27 2018-07-27 Parameter decompression method for sparse neural network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810845949.XA CN109255429B (en) 2018-07-27 2018-07-27 Parameter decompression method for sparse neural network model

Publications (2)

Publication Number Publication Date
CN109255429A CN109255429A (en) 2019-01-22
CN109255429B true CN109255429B (en) 2020-11-20

Family

ID=65049925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810845949.XA Active CN109255429B (en) 2018-07-27 2018-07-27 Parameter decompression method for sparse neural network model

Country Status (1)

Country Link
CN (1) CN109255429B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11562115B2 (en) 2017-01-04 2023-01-24 Stmicroelectronics S.R.L. Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links
CN108269224B (en) 2017-01-04 2022-04-01 意法半导体股份有限公司 Reconfigurable interconnect
CN110766136B (en) * 2019-10-16 2022-09-09 北京航空航天大学 Compression method of sparse matrix and vector
US11295199B2 (en) 2019-12-09 2022-04-05 UMNAI Limited XAI and XNN conversion
US11593609B2 (en) 2020-02-18 2023-02-28 Stmicroelectronics S.R.L. Vector quantization decoding hardware unit for real-time dynamic decompression for parameters of neural networks
US11531873B2 (en) 2020-06-23 2022-12-20 Stmicroelectronics S.R.L. Convolution acceleration with embedded vector decompression

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007323401A (en) * 2006-06-01 2007-12-13 Kagawa Univ Data processor, data restoration device, data processing method and data restoration method
CN105260776A (en) * 2015-09-10 2016-01-20 华为技术有限公司 Neural network processor and convolutional neural network processor
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN108111863A (en) * 2017-12-22 2018-06-01 洛阳中科信息产业研究院(中科院计算技术研究所洛阳分所) A kind of online real-time three-dimensional model video coding-decoding method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING; Song Han et al.; arXiv:1510.00149v5 [cs.CV]; 2016-02-15; pp. 1-14, section 2, fig. 2 *

Also Published As

Publication number Publication date
CN109255429A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
CN109255429B (en) Parameter decompression method for sparse neural network model
US11531889B2 (en) Weight data storage method and neural network processor based on the method
CN105260776A (en) Neural network processor and convolutional neural network processor
CN113222150B (en) Quantum state transformation method and device
CN114764549A (en) Quantum line simulation calculation method and device based on matrix product state
CN112232513A (en) Quantum state preparation method and device
CN110868223B (en) Numerical operation implementation method and circuit for Huffman coding
US20220114454A1 (en) Electronic apparatus for decompressing a compressed artificial intelligence model and control method therefor
CN113986818B (en) Chip address reconstruction method, chip, electronic device and storage medium
JP2019028746A (en) Network coefficient compressing device, network coefficient compressing method and program
CN113258934A (en) Data compression method, system and equipment
CN113222159B (en) Quantum state determination method and device
JPH05335967A (en) Sound information compression method and sound information reproduction device
US10559093B2 (en) Selecting encoding options
CN113222151A (en) Quantum state transformation method and device
JP2018081294A (en) Acoustic model learning device, voice recognition device, acoustic model learning method, voice recognition method, and program
CN104618715A (en) Method and device for obtaining minimal rate-distortion cost
US20220036190A1 (en) Neural network compression device
Liang et al. Exploiting noise correlation for channel decoding with convolutional neural networks
WO2023075630A8 (en) Adaptive deep-learning based probability prediction method for point cloud compression
KR102153786B1 (en) Image processing method and apparatus using selection unit
US10263638B2 (en) Lossless compression method for graph traversal
CN111767204A (en) Overflow risk detection method, device and equipment
WO2022217502A1 (en) Information processing method and apparatus, communication device, and storage medium
US20230087752A1 (en) Information processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant