CN111008698A - Sparse matrix multiplication accelerator for hybrid compressed recurrent neural networks - Google Patents
- Publication number
- CN111008698A (application number CN201911160416.9A)
- Authority
- CN
- China
- Prior art keywords
- weight
- variable length
- recurrent neural
- neural network
- compression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Abstract
The invention belongs to the technical field of integrated circuits and particularly relates to a sparse matrix multiplication accelerator for hybrid-compressed recurrent neural networks. The accelerator includes: 2 groups of multiply-accumulate units, which compute the feature values of 2 different output channels of the network; 4 input memories, 2 column-combination weight memories, 1 variable-length-coding weight memory, and 1 variable-length-coding index memory, which store the weights and indices produced by irregular variable-length-coding compression; 2 second-stage accumulators, which read the intermediate results in the output memory and accumulate them with the multiply-accumulate results to update the output; and 1 decoder, which decodes the variable-length-compressed weights and transmits them to the corresponding multiply-accumulate unit. By exploiting the sparsity of the network weights, the invention compresses the sparse weight matrix, preserving the accuracy of the original recurrent network while reducing weight storage, speeding up computation, and lowering power consumption.
Description
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to a sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network.
Background
With the continuous development of recurrent neural networks, they are widely applied to natural language processing tasks such as text classification, machine translation, and speech synthesis. The weights corresponding to different output channels form the rows of the weight matrix, and its columns correspond to different input channels. Multiplying the weight matrix by the input feature vector yields the output of the recurrent neural network.
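As a minimal illustration of this matrix-vector view (a plain sketch with made-up values, not the patented hardware), each output-channel feature value is the dot product of one weight-matrix row with the input feature vector:

```python
# Rows of W are output channels, columns are input channels (per the text).
W = [
    [0.5, 0.0, -1.0, 2.0],   # weights of output channel 0
    [0.0, 1.5,  0.0, 0.0],   # weights of output channel 1
]
x = [1.0, 2.0, 3.0, 4.0]     # input feature vector

# One dot product per output channel gives the output feature values.
y = [sum(w * xi for w, xi in zip(row, x)) for row in W]
print(y)  # [5.5, 3.0]
```

Note how many of the products are multiplications by zero; the accelerator's compression schemes exist to skip exactly this wasted storage and work.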
The weight matrix of a recurrent neural network contains many weights that are zero or close to zero and have little effect on the final output. Conventional computing platforms (CPU/GPU) can accelerate matrix multiplication but cannot exploit this sparsity: the sparse weights still travel between the storage unit and the computation unit, increasing computation delay and energy consumption, and storing them wastes memory.
To fully exploit the sparsity of the weight matrix in a recurrent neural network, weights close to zero are pruned. To support parallel computation in a hardware accelerator, the weight matrix is usually pruned according to a regular pattern, and weights that do not fit the pattern are discarded. This can degrade the accuracy of the original recurrent neural network.
Disclosure of Invention
The invention aims to provide a sparse matrix multiplication accelerator for hybrid-compressed recurrent neural networks that compresses the weight storage and reduces computation time and power consumption.
The invention provides a sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network, which comprises:
2 groups of multiply-accumulate units for calculating the characteristic values of 2 different output channels in the recurrent neural network;
4 input memories for storing characteristic values of 4 sets of input channels in the recurrent neural network;
2 column combination weight memories for storing the weight matrix compressed by the column combination rules corresponding to the 2 different output channels;
1 variable length coding weight memory and 1 variable length coding index memory, which are used for storing the weight and index of irregular variable length coding compression;
2 output memories for temporarily storing intermediate results in the calculation and final results of the final 2 output channels;
2 second-stage accumulators for reading the intermediate result in the output memory and accumulating the result of the multiply-accumulate unit to update the output result;
and 1 decoder for decoding the variable length compressed weight and transmitting the decoded weight to a corresponding multiply-accumulate unit.
The multiply-accumulate unit comprises a main computing unit and an auxiliary computing unit. The main computing unit performs the multiplications for the weights compressed under the column-combination rule; the auxiliary computing unit performs the multiplications for the irregularly compressed, variable-length-coded weights. The products are accumulated in the main computing unit, completing the multiply-accumulate operation over multiple input channels.
In the invention, the adder in the main computing unit and the second-stage accumulator form a two-stage addition structure. Multiply-accumulate results of different input channels are accumulated with earlier intermediate results in different cycles, meeting the computational requirements of recurrent neural networks with different numbers of input channels.
In the present invention, the weight memory is split into a column-combination weight memory and a variable-length-coding weight memory. The former stores the weights compressed under the column-combination rule; weights that do not satisfy the rule are compressed by variable-length coding and stored in the latter. This guarantees that the weights remaining after regular compression are not discarded, preventing the network accuracy from degrading.
In the invention, the multiply-accumulate units and the input memories are interconnected through 4-to-1 selectors, forming a reconfigurable interconnection network. The select signal of each selector routes the input feature value corresponding to a compressed weight to the multiply-accumulate unit.
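Behaviorally, this interconnect can be modeled as one 4-to-1 multiplexer per multiply-accumulate unit (an illustrative sketch assuming the select signal simply indexes one of the four input memories; `mux4` is a hypothetical name, not from the patent):

```python
def mux4(input_memories, sel):
    """Behavioral 4-to-1 selector: route the feature value held by one of
    the four input memories to a multiply-accumulate unit."""
    assert 0 <= sel < 4, "select signal addresses one of 4 input memories"
    return input_memories[sel]

# The select signal steers the input feature value that matches the
# compressed weight's input channel to the MAC unit.
feature_values = [1.0, 2.0, 3.0, 4.0]   # one value per input memory
print(mux4(feature_values, 2))  # 3.0
```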
In the invention, after a variable-length-coded weight is decompressed by the decoder, if the decoded result is a zero weight, the auxiliary computing unit is switched off, reducing computation power consumption.
The invention combines different input channels of the sparse weight matrix to obtain a regularly compressed weight matrix, and compresses the weights that do not fit the regular pattern with variable-length coding. A dedicated matrix multiplication accelerator then accelerates the computation of the compressed recurrent neural network. The invention exploits the sparsity of the weight matrix while overcoming the accuracy loss caused by weight pruning. Compared with a CPU/GPU, it compresses the weight storage and reduces computation time and power consumption; compared with a recurrent network compressed by a single regular pattern, a network compressed by both the regular pattern and variable-length coding achieves higher accuracy.
Drawings
Fig. 1 is a circuit block diagram of the present invention.
Fig. 2 is a schematic diagram of compression of column combination rules employed in the present invention.
Fig. 3 is a schematic diagram of variable length coding employed in the present invention.
Detailed Description
The present invention will be described more fully hereinafter with reference to the accompanying drawings, which illustrate preferred embodiments of the invention; the invention is not limited to the embodiments set forth herein.
An embodiment is a sparse matrix multiplication accelerator for a hybrid-compressed recurrent neural network; Fig. 1 is its circuit block diagram.
The accelerator comprises 2 groups of multiply-accumulate units, 4 input memories, 2 column-combination weight memories, 1 variable-length-coding weight memory, 1 variable-length-coding index memory, 2 output memories, 2 second-stage accumulators, and 1 variable-length-coding decoder.
Fig. 2 is a schematic diagram of the column-combination rule compression employed in the invention. The weight matrix contains 8 input channels and 4 output channels. The input channels of the sparse weight matrix are divided into groups of 4, and after pruning, each group of 4 input channels within an output channel contains only one nonzero weight. The sparse weights may also be grouped in 2 or 3 input channels.
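A hedged sketch of this column-combination rule for one output channel (the function name, the choice to keep the largest-magnitude weight, and all values are assumptions for illustration; the patent only specifies that one nonzero weight per group survives):

```python
def column_combine(row, group=4):
    """Keep one weight per group of `group` input channels.

    Returns (values, positions): the surviving weight of each group and
    its column offset inside that group. Here the largest-magnitude
    weight is kept (an assumed pruning criterion)."""
    values, positions = [], []
    for g in range(0, len(row), group):
        block = row[g:g + group]
        k = max(range(len(block)), key=lambda i: abs(block[i]))
        values.append(block[k])
        positions.append(k)
    return values, positions

row = [0.1, -2.0, 0.0, 0.3,   0.0, 0.0, 1.7, -0.2]  # 8 input channels
vals, pos = column_combine(row)
print(vals, pos)  # [-2.0, 1.7] [1, 2]
```

The position list is what lets the hardware drive the 4-to-1 input selectors: each surviving weight only needs the feature value of its own group offset.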
Fig. 3 is a schematic diagram of the variable-length coding employed in the invention. The weights left over by the column-combination rule of Fig. 2 (the red part) are compressed using variable-length coding, which produces a data vector and an index vector. The data vector holds all nonzero weights; the first element of the index vector gives the number of nonzero weights in the data vector, and each remaining element gives the number of zero weights preceding the corresponding nonzero weight.
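The encoding just described can be sketched directly from its definition (function names are illustrative; the decode side is added only to show the format round-trips):

```python
def vlc_encode(weights):
    """Variable-length coding per Fig. 3: the data vector holds all nonzero
    weights; the index vector starts with the nonzero count, followed by
    the number of zeros preceding each nonzero weight."""
    data, runs, zeros = [], [], 0
    for w in weights:
        if w == 0:
            zeros += 1
        else:
            data.append(w)
            runs.append(zeros)   # zeros in front of this nonzero weight
            zeros = 0
    return data, [len(data)] + runs

def vlc_decode(data, index, length):
    """Inverse of vlc_encode: rebuild the dense residual weight vector."""
    out, pos = [0.0] * length, 0
    for w, z in zip(data, index[1:]):
        pos += z                 # skip the zeros preceding this weight
        out[pos] = w
        pos += 1
    return out

residual = [0.0, 0.0, 0.4, 0.0, -1.1, 0.0, 0.0, 0.9]
data, index = vlc_encode(residual)
print(data)   # [0.4, -1.1, 0.9]
print(index)  # [3, 2, 1, 2]
assert vlc_decode(data, index, len(residual)) == residual
```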
For a weight matrix with low sparsity, the main and auxiliary computing units in the multiply-accumulate unit each compute the multiply-accumulate operations of 2 different input channels; the product of the auxiliary computing unit flows into the adder of the main computing unit to complete the summation.
When the compressed weight matrix is computed, the main computing unit in the multiply-accumulate unit receives a compressed weight from the column-combination weight memory together with the corresponding input feature value and performs the multiplication. The variable-length-coded weights and indices pass through the decoder into the auxiliary computing unit, which multiplies them with the corresponding input feature values. The two products are summed in the adder of the main computing unit.
When the decoded variable-length-coded weight is zero, the auxiliary computing unit is switched off to reduce computation power consumption.
The result of the multiply-accumulate unit and the intermediate result held in the output memory are accumulated in the second-stage accumulator, updating the intermediate result until the final output feature value is obtained.
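Putting the pieces together, the per-cycle dataflow can be mimicked behaviorally (a sketch under assumed values, not RTL: the main unit multiplies column-combined weights, the auxiliary unit multiplies decoded variable-length weights and is skipped for decoded zeros, and the second-stage accumulator folds both into the intermediate result from the output memory):

```python
x = [1.0, 2.0, 3.0, 4.0]            # input feature values of 4 channels

# (weight, input-channel index) pairs; values are made up for illustration.
main_weights = [(-2.0, 1)]                      # from column combination
aux_weights  = [(0.5, 0), (0.0, 2), (1.5, 3)]   # decoded variable-length weights

partial = sum(w * x[i] for w, i in main_weights)
for w, i in aux_weights:
    if w == 0.0:
        continue                    # decoded zero: auxiliary unit gated off
    partial += w * x[i]

output_memory = 10.0                # intermediate result from earlier cycles
output_memory += partial            # second-stage accumulation
print(output_memory)  # 12.5
```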
While the invention has been described with reference to specific embodiments, those skilled in the art will appreciate that many other embodiments, advantages, and features are possible. The invention may be practiced in various ways, and its details may be modified in various respects, all without departing from the spirit and scope of the present invention.
Claims (9)
1. A sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network, comprising:
2 groups of multiply-accumulate units for calculating the characteristic values of 2 different output channels in the recurrent neural network;
4 input memories for storing characteristic values of 4 sets of input channels in the recurrent neural network;
2 column combination weight memories for storing the weight matrix compressed by the column combination rules corresponding to the 2 different output channels;
1 variable length coding weight memory and 1 variable length coding index memory, which are used for storing the weight and index of irregular variable length coding compression;
2 output memories for temporarily storing intermediate results in the calculation and final results of the final 2 output channels;
2 second-stage accumulators for reading the intermediate result in the output memory and accumulating the result of the multiply-accumulate unit to update the output result;
1 decoder for decoding the variable-length-compressed weights and transmitting them to the corresponding multiply-accumulate unit;
the multiply-accumulate unit comprises a main computing unit and an auxiliary computing unit; the main computing unit performs the multiplications for the weights compressed under the column-combination rule, the auxiliary computing unit performs the multiplications for the irregularly compressed variable-length-coded weights, and the products are accumulated in the main computing unit, completing the multiply-accumulate operation over multiple input channels.
2. The sparse matrix multiplication accelerator of a hybrid compressed recurrent neural network according to claim 1, wherein the adder and the secondary accumulator in the main computing unit form a two-stage addition structure of the accelerator, and multiply-accumulate results of different input channels are accumulated with previous intermediate results in different periods, so as to meet the computational requirements of the recurrent neural network with different numbers of input channels.
3. The sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network of claim 2, wherein the weight memory is divided into a column-combination weight memory and a variable-length-coding weight memory; the former stores the weights compressed under the column-combination rule; weights that do not satisfy the rule are compressed by variable-length coding and stored in the latter; this guarantees that the weights remaining after regular compression are not discarded, preventing the network accuracy from degrading.
4. The sparse matrix multiplication accelerator of a hybrid compressed recurrent neural network of claim 3, wherein the multiply-accumulate unit and the input memory are interconnected through a 1-out-of-4 selector to form a reconfigurable interconnection network; and transmitting the input characteristic value corresponding to the compression weight to the multiply-accumulate unit through a selection signal of the control selector.
5. The sparse matrix multiplication accelerator of a hybrid compressed recurrent neural network of claim 4, wherein after the compression weights of the variable length codes are decompressed by the decoder, if the decoding result is zero weight, the auxiliary computing unit is turned off, and the computing power consumption is reduced.
6. The sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network of claim 3, wherein, in the column-combination weight compression algorithm, the weight matrix has 8 input channels and 4 output channels; the input channels of the sparse weight matrix are divided into groups of 4, and after pruning each group of 4 input channels within an output channel contains only one nonzero weight; alternatively, the sparse weights are compressed in groups of 2 or 3 input channels.
7. The sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network of claim 6, wherein the weights left over by the column-combination rule are compressed using variable-length coding, which produces a data vector and an index vector; the data vector holds all nonzero weights, the first element of the index vector gives the number of nonzero weights in the data vector, and each remaining element gives the number of zero weights preceding the corresponding nonzero weight.
8. The sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network of claim 7, wherein, for a weight matrix with low sparsity, the main and auxiliary computing units in the multiply-accumulate unit each compute the multiply-accumulate operations of 2 different input channels; the product of the auxiliary computing unit flows into the adder of the main computing unit to complete the summation.
9. The sparse matrix multiplication accelerator for a hybrid compressed recurrent neural network of claim 8, wherein, when computing the compressed weight matrix, the main computing unit of the multiply-accumulate unit receives the compressed weights from the column-combination weight memory and multiplies them with the corresponding input feature values; the variable-length-coded weights and indices pass through the decoder into the auxiliary computing unit, which multiplies them with the corresponding input feature values; the two products are summed in the adder of the main computing unit.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911160416.9A (granted as CN111008698B) | 2019-11-23 | 2019-11-23 | Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN111008698A | 2020-04-14 |
| CN111008698B | 2023-05-02 |
Family

- Family ID: 70112690

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201911160416.9A (CN111008698B, active) | Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks | 2019-11-23 | 2019-11-23 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN111008698B (en) |
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20180046916A1 * | 2016-08-11 | 2018-02-15 | Nvidia Corporation | Sparse convolutional neural network accelerator |
| CN107229967A * | 2016-08-22 | 2017-10-03 | 北京深鉴智能科技有限公司 (DeePhi, Beijing) | FPGA-based hardware accelerator and method for sparse GRU neural networks |
| US20180174036A1 * | 2016-12-15 | 2018-06-21 | DeePhi Technology Co., Ltd. | Hardware Accelerator for Compressed LSTM |
| CN109245773A * | 2018-10-30 | 2019-01-18 | Nanjing University | Decoding method based on a block-circulant sparse-matrix neural network |
| CN109472350A * | 2018-10-30 | 2019-03-15 | Nanjing University | Neural network acceleration system based on block-circulant sparse matrices |
| CN109635944A * | 2018-12-24 | 2019-04-16 | Xi'an Jiaotong University | Sparse convolutional neural network accelerator and implementation method |
| CN110110851A * | 2019-04-30 | 2019-08-09 | Nanjing University | FPGA accelerator for LSTM neural networks and acceleration method thereof |
Non-Patent Citations (2)
| Title |
|---|
| 张禾; 陈客松: "Design and research of FPGA-based sparse matrix-vector multiplication" (基于FPGA的稀疏矩阵向量乘的设计研究) |
| 曹亚松; 刘胜: "DMA design and verification for sparse matrix-vector multiplication" (面向稀疏矩阵向量乘的DMA设计与验证) |
Cited By (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112395549A * | 2020-11-12 | 2021-02-23 | Huazhong University of Science and Technology | Reconfigurable matrix multiplication acceleration system for matrix-multiplication-intensive algorithms |
| CN112395549B * | 2020-11-12 | 2024-04-19 | Huazhong University of Science and Technology | Reconfigurable matrix multiplication acceleration system for matrix-multiplication-intensive algorithms |
| CN114527930A * | 2021-05-27 | 2022-05-24 | 北京灵汐科技有限公司 (Lynxi, Beijing) | Weight matrix data storage method, data acquisition method and device, and electronic equipment |
| WO2022247908A1 * | 2021-05-27 | 2022-12-01 | 北京灵汐科技有限公司 (Lynxi, Beijing) | Data storage method and apparatus for weight matrix, data acquisition method and apparatus for weight matrix, and device |
| CN114527930B * | 2021-05-27 | 2024-01-30 | 北京灵汐科技有限公司 (Lynxi, Beijing) | Weight matrix data storage method, data acquisition method and device, and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111008698B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111062472B (en) | Sparse neural network accelerator based on structured pruning and acceleration method thereof | |
US20220012593A1 (en) | Neural network accelerator and neural network acceleration method based on structured pruning and low-bit quantization | |
CN111008698B (en) | Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks | |
CN107423816B (en) | Multi-calculation-precision neural network processing method and system | |
US20180046895A1 (en) | Device and method for implementing a sparse neural network | |
CN111832719A (en) | Fixed point quantization convolution neural network accelerator calculation circuit | |
KR20050008761A (en) | Method and system for multi-rate lattice vector quantization of a signal | |
CN107395211B (en) | Data processing method and device based on convolutional neural network model | |
CN112286864B (en) | Sparse data processing method and system for accelerating operation of reconfigurable processor | |
CN110766155A (en) | Deep neural network accelerator based on mixed precision storage | |
CN114697654B (en) | Neural network quantization compression method and system | |
CN113741858B (en) | Memory multiply-add computing method, memory multiply-add computing device, chip and computing equipment | |
CN114615507B (en) | Image coding method, decoding method and related device | |
CN112861996A (en) | Deep neural network model compression method and device, electronic equipment and storage medium | |
CN110363291B (en) | Operation method and device of neural network, computer equipment and storage medium | |
CN111078189B (en) | Sparse matrix multiplication accelerator for cyclic neural network natural language processing | |
US20230342419A1 (en) | Matrix calculation apparatus, method, system, circuit, and device, and chip | |
CN115828044B (en) | Dual sparsity matrix multiplication circuit, method and device based on neural network | |
CN117273092A (en) | Model quantization method and device, electronic equipment and storage medium | |
CN110766136A (en) | Compression method of sparse matrix and vector | |
CN113495669B (en) | Decompression device, accelerator and method for decompression device | |
CN114065923A (en) | Compression method, system and accelerating device of convolutional neural network | |
CN114077893A (en) | Method and equipment for compressing and decompressing neural network model | |
CN113793601B (en) | Voice recognition method and device | |
CN112200301B (en) | Convolution computing device and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |