CN111008698B - Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks - Google Patents

Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks

Info

Publication number
CN111008698B
CN111008698B (application CN201911160416.9A)
Authority
CN
China
Prior art keywords
weight
compression
variable length
neural network
cyclic neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911160416.9A
Other languages
Chinese (zh)
Other versions
CN111008698A (en)
Inventor
刘诗玮
张怡云
史传进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University
Priority to CN201911160416.9A
Publication of CN111008698A
Application granted
Publication of CN111008698B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the technical field of integrated circuits, and particularly relates to a sparse matrix multiplication accelerator for hybrid-compression recurrent neural networks. The accelerator includes: 2 groups of multiply-accumulate units for computing the feature values of 2 different output channels of the network; 4 input memories; 2 column-combination weight memories; 1 variable-length-coding weight memory and 1 variable-length-coding index memory, which store the weights and indexes produced by irregular variable-length-coding compression; 2 secondary accumulators, which read intermediate results from the output memories and accumulate them with the results of the multiply-accumulate units to update the output results; and 1 decoder, which decodes the variable-length-compressed weights and transmits them to the corresponding multiply-accumulate units. By exploiting the sparsity of the network weights, the invention compresses the sparse weight matrix, preserving the accuracy of the original recurrent network while reducing the weight storage space, increasing the calculation speed, and lowering the calculation power consumption.

Description

Sparse matrix multiplication accelerator for hybrid-compression recurrent neural networks
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to a sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network.
Background
With their continuous development, recurrent neural networks are now widely applied to natural language processing tasks such as text classification, machine translation, and speech synthesis. The weights corresponding to different output channels of a recurrent neural network form the rows of a weight matrix, and the columns of the weight matrix correspond to different input channels. Multiplying the weight matrix by the input feature vector yields the output of the recurrent neural network.
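As a point of reference for the compression schemes described later, the minimal sketch below (illustrative only; the variable names are not taken from the patent) shows the dense matrix-vector product that the accelerator ultimately computes, with one row of W per output channel and one column per input channel.

```python
import numpy as np

def rnn_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """y[i] = sum_j W[i, j] * x[j]: row i gathers the weights of output channel i."""
    return W @ x

# Toy example: 2 output channels, 4 input channels, mostly-zero weights.
W = np.array([[0.0, 1.5, 0.0, -0.2],
              [0.3, 0.0, 0.0,  0.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])
print(rnn_matvec(W, x))  # [2.2  0.3]
```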
The weight matrix of a recurrent neural network contains a large number of weights that are zero or close to zero and have little impact on the final output. Conventional computing platforms such as CPUs and GPUs can accelerate matrix multiplication, but they cannot exploit the sparsity of the weight matrix: the sparse weights still travel between the memory and the computing units, increasing computation latency and energy consumption, and storing them wastes memory capacity.
To exploit the sparsity of the weight matrix of a recurrent neural network, weights close to zero are pruned. To suit the parallel computation of hardware accelerators, the weight matrix is usually pruned and compressed in a regular (structured) pattern, and the weights that do not fit the regular pattern are simply discarded. This can degrade the accuracy of the original recurrent neural network.
Disclosure of Invention
The invention aims to provide a sparse matrix multiplication accelerator for hybrid-compression recurrent neural networks that compresses the weight storage space and reduces calculation time and power consumption.
The invention provides a sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network, comprising the following components:
2 groups of multiply-accumulate units for computing the feature values of 2 different output channels of the recurrent neural network;
4 input memories for storing the feature values of 4 groups of input channels of the recurrent neural network;
2 column-combination weight memories for storing the weight matrices of 2 different output channels compressed by the column-combination rule;
1 variable-length-coding weight memory and 1 variable-length-coding index memory for storing the weights and indexes produced by irregular variable-length-coding compression;
2 output memories for temporarily storing intermediate results during calculation and the final results of the 2 output channels;
2 secondary accumulators for reading intermediate results from the output memories and accumulating them with the results of the multiply-accumulate units to update the output results;
and 1 decoder for decoding the variable-length-compressed weights and transmitting them to the corresponding multiply-accumulate units.
Each multiply-accumulate unit comprises a main computing unit and an auxiliary computing unit. The main computing unit performs the multiplications for the weights compressed by the column-combination rule, and the auxiliary computing unit performs the multiplications for the weights compressed irregularly by variable-length coding. The multiplication results are accumulated in the main computing unit, completing the multiply-accumulate operation over multiple input channels.
In the invention, the adder in the main computing unit and the secondary accumulator form the accelerator's two-level addition structure: in different cycles, the multiply-accumulate results of different input channels are accumulated with the previous intermediate results, so that recurrent neural networks with different numbers of input channels can be computed.
In the present invention, the weight memory is divided into the column-combination weight memories and the variable-length-coding weight memory. The former store the weights compressed by the column-combination rule; the weights that do not fit the column-combination rule are compressed by variable-length coding and stored in the latter. This ensures that the weights remaining after the regular compression of the recurrent neural network are not truncated, preventing a loss of network performance.
In the invention, the multiply-accumulate units and the input memories are interconnected through 4-to-1 selectors, forming a reconfigurable interconnection network. The selection signal controlling each selector routes the input feature value corresponding to the compressed weight to the multiply-accumulate unit.
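A minimal behavioural sketch of that routing is given below; the function and signal names are assumptions made for illustration, not the patent's circuit description. The idea is that each column-combined weight carries the position of the surviving input channel within its group, and a 2-bit select signal steers the matching feature value to the multiply-accumulate unit.

```python
def select_input(input_memories, address, select):
    """4-to-1 selector: pick the input feature value whose channel matches the
    surviving column-combined weight.

    input_memories: the 4 input memory banks (one per input channel in the group)
    address:        read address shared by the banks
    select:         2-bit selector derived from the compressed weight's index
    """
    assert 0 <= select < 4
    return input_memories[select][address]
```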
In the invention, after a variable-length-coded weight is decompressed by the decoder, the auxiliary computing unit is turned off if the decoded weight is zero, reducing the calculation power consumption.
The invention combines different input channels of the sparse weight matrix to obtain a regularly compressed weight matrix, and compresses the weights that do not fit the regular compression with variable-length codes. A dedicated matrix multiplication accelerator then accelerates the computation of the compressed recurrent neural network. By exploiting the sparsity of the weight matrix, the invention addresses the accuracy loss caused by weight pruning and compression. Compared with a CPU/GPU, it reduces the storage space required for the weights while cutting calculation time and power consumption. Compared with a recurrent network compressed only by the regular scheme, the network compressed by both the regular scheme and variable-length coding achieves higher accuracy.
Drawings
Fig. 1 is a circuit block diagram of the present invention.
Fig. 2 is a schematic diagram of column combination rule compression employed by the present invention.
Fig. 3 is a schematic diagram of variable length coding employed in the present invention.
Detailed Description
The present invention will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments are provided; the invention should not, however, be construed as limited to the embodiments set forth herein.
One embodiment is a sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network; Fig. 1 is its circuit block diagram.
The accelerator comprises 2 groups of multiply-accumulate units, 4 input memories, 2 column-combination weight memories, 1 variable-length-coding weight memory, 1 variable-length-coding index memory, 2 output memories, 2 secondary accumulators, and a variable-length-coding decoder.
Fig. 2 is a schematic diagram of the column-combination rule compression employed in the present invention. The weight matrix contains 8 input channels and 4 output channels. The sparse weight matrix is divided into groups of 4 input channels, and after pruning, each group of 4 input channels of one output channel retains only one non-zero weight. The sparse weights may also be grouped and compressed over 2 or 3 input channels.
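The sketch below illustrates this column-combination pruning under assumed conventions (rows as output channels, columns as input channels, and keeping the largest-magnitude weight in each group, a criterion the patent does not specify); it is a plausible reading of Fig. 2, not the patent's algorithm.

```python
import numpy as np

def column_combine(W: np.ndarray, group: int = 4):
    """Keep one weight per group of `group` input channels for every output
    channel, returning the surviving values and their positions in the group."""
    rows, cols = W.shape
    assert cols % group == 0
    n_groups = cols // group
    values = np.zeros((rows, n_groups))
    indices = np.zeros((rows, n_groups), dtype=int)   # e.g. a 2-bit selector when group == 4
    for g in range(n_groups):
        block = W[:, g * group:(g + 1) * group]
        keep = np.abs(block).argmax(axis=1)           # assumed criterion: largest magnitude survives
        values[:, g] = block[np.arange(rows), keep]
        indices[:, g] = keep
    return values, indices
```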
Fig. 3 is a schematic diagram of the variable-length coding employed in the present invention. The weights left over from the column-combination rule compression of Fig. 2 (the red part) are compressed using variable-length coding, which produces a data vector and an index vector. The data vector holds all the non-zero weights; the first element of the index vector gives the number of non-zero weights in the data vector, and each remaining element gives the number of zero weights preceding the corresponding non-zero weight.
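The following small sketch follows the textual description of this code; the function names and example values are illustrative assumptions. The data vector collects the non-zero residual weights, index[0] stores their count, and each later index entry stores the run of zeros before the corresponding non-zero weight.

```python
def vlc_encode(residual):
    """Encode the residual weights into (data, index) as described above."""
    data, index, zeros = [], [0], 0
    for w in residual:
        if w == 0:
            zeros += 1
        else:
            data.append(w)
            index.append(zeros)   # zeros preceding this non-zero weight
            zeros = 0
    index[0] = len(data)          # first entry: number of non-zero weights
    return data, index

def vlc_decode(data, index, length):
    """Expand (data, index) back into a dense list of `length` weights."""
    out, pos = [0] * length, 0
    for w, gap in zip(data, index[1:]):
        pos += gap
        out[pos] = w
        pos += 1
    return out

# Residual weights after column-combination pruning (example values).
residual = [0, 0, 0.7, 0, -1.1, 0, 0, 0.4]
data, index = vlc_encode(residual)
print(data)   # [0.7, -1.1, 0.4]
print(index)  # [3, 2, 1, 2]
assert vlc_decode(data, index, len(residual)) == residual
```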
For a weight matrix with low sparsity, the main and auxiliary computing units in the multiply-accumulate unit compute the multiply-accumulate operations of 2 different input channels, respectively. The multiplication result of the auxiliary computing unit flows into the adder of the main computing unit to complete the summation.
When the compressed weight matrix is computed, the main computing unit in the multiply-accumulate unit receives the compressed weight from the column-combination weight memory together with the corresponding input feature value and performs the multiplication. The variable-length-coded weight and index are passed through the decoder into the auxiliary computing unit, where they are multiplied with the corresponding input feature value. Both products are summed in the adder of the main computing unit.
When a variable-length-coded weight decodes to zero, the auxiliary computing unit is turned off to reduce the calculation power consumption.
The result of the multiply-accumulate unit and the intermediate result temporarily stored in the output memory are accumulated in the secondary accumulator, and the intermediate result is updated until the final output feature value is obtained.
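A purely behavioural sketch of one such cycle is given below; the argument names are illustrative, not the patent's signal names, and the control logic is simplified to the zero-gating described above.

```python
def mac_cycle(cc_weight, cc_input, vlc_weight, vlc_input, intermediate):
    """One multiply-accumulate cycle: main unit handles the column-combined
    weight, auxiliary unit handles the decoded variable-length-coded weight."""
    main_product = cc_weight * cc_input
    aux_product = 0.0
    if vlc_weight != 0:                      # zero decode result: auxiliary unit gated off
        aux_product = vlc_weight * vlc_input
    partial_sum = main_product + aux_product  # summed in the main unit's adder
    return intermediate + partial_sum         # secondary accumulator updates the intermediate result
```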
The foregoing describes embodiments of the present invention with specific examples; other advantages and effects of the invention will be readily apparent to those skilled in the art from the disclosure herein. The invention may be practiced or carried out in other embodiments, and the details of the present description may be modified or varied, without departing from the spirit and scope of the present invention.

Claims (9)

1. A sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network, comprising:
2 groups of multiply-accumulate units for computing the results of 2 different output channels of the recurrent neural network;
each multiply-accumulate unit comprises a main computing unit and an auxiliary computing unit; the main computing unit performs the multiplications for the weights compressed by the column-combination rule, the auxiliary computing unit performs the multiplications for the weights compressed irregularly by variable-length coding, and the multiplication results are accumulated in the main computing unit, completing the multiply-accumulate operation over a plurality of input channels;
4 input memories for storing the feature values of 4 groups of input channels of the recurrent neural network;
2 column-combination weight memories for storing the weight matrices of 2 different output channels compressed by the column-combination rule;
1 variable-length-coding weight memory and 1 variable-length-coding index memory for storing the weights and indexes produced by irregular variable-length-coding compression;
2 output memories for temporarily storing intermediate results during calculation and the final results of the 2 output channels;
2 secondary accumulators for reading the intermediate results from the output memories and the results of the multiply-accumulate units, accumulating them, and updating the output results;
and 1 decoder for decoding the variable-length-compressed weights and transmitting them to the corresponding multiply-accumulate units.
2. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 1, wherein the adder in the main computing unit and the secondary accumulator form a two-level addition structure of the accelerator, and the multiply-accumulate results of different input channels are accumulated with previous intermediate results in different cycles, so that recurrent neural networks with different numbers of input channels can be computed.
3. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 2, wherein the weight memory is divided into a column-combination weight memory and a variable-length-coding weight memory; the former stores the weights compressed by the column-combination rule, and the weights that do not fit the column-combination rule are compressed by variable-length coding and stored in the latter, ensuring that the weights remaining after the regular compression of the recurrent neural network are not truncated and the network performance is not reduced.
4. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 3, wherein the multiply-accumulate units are interconnected with the input memories through 4-to-1 selectors to form a reconfigurable interconnection network; the selection signal controlling each selector routes the input feature value corresponding to the compressed weight to the multiply-accumulate unit.
5. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 4, wherein, after a variable-length-coded weight is decompressed by the decoder, the auxiliary computing unit is turned off to reduce the calculation power consumption if the decoded result is zero.
6. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 3, wherein, in the compression algorithm for the column-combination weight memory, the sparse weight matrix contains 8 input channels and 4 output channels; the sparse weight matrix is divided into groups of 4 input channels, and after pruning, each group of 4 input channels of one output channel retains only one non-zero weight; alternatively, the sparse weight matrix is grouped and compressed over 2 or 3 input channels.
7. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 6, wherein the weights remaining after the column-combination rule compression are compressed using variable-length coding, which produces a data vector and an index vector; the data vector holds all the non-zero weights, the first element of the index vector represents the number of non-zero weights in the data vector, and the remaining elements represent the number of zero weights preceding each non-zero weight in the data vector.
8. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 7, wherein, for a weight matrix with low sparsity, the main computing unit and the auxiliary computing unit in the multiply-accumulate unit compute the multiply-accumulate operations of 2 different input channels, respectively; the multiplication result of the auxiliary computing unit flows into the adder of the main computing unit to complete the summation.
9. The sparse matrix multiplication accelerator for a hybrid-compression recurrent neural network of claim 8, wherein, when the compressed weight matrix is computed, the main computing unit in the multiply-accumulate unit receives the compressed weights from the column-combination weight memory together with the corresponding input feature values and performs the multiplications; the variable-length-coded weights and indexes are passed through the decoder into the auxiliary computing unit, where they are multiplied with the corresponding input feature values; both products are summed in the adder of the main computing unit.
CN201911160416.9A 2019-11-23 2019-11-23 Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks Active CN111008698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911160416.9A CN111008698B (en) 2019-11-23 2019-11-23 Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks


Publications (2)

Publication Number Publication Date
CN111008698A CN111008698A (en) 2020-04-14
CN111008698B true CN111008698B (en) 2023-05-02

Family

ID=70112690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911160416.9A Active CN111008698B (en) 2019-11-23 2019-11-23 Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks

Country Status (1)

Country Link
CN (1) CN111008698B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395549B (en) * 2020-11-12 2024-04-19 华中科技大学 Reconfigurable matrix multiplication acceleration system for matrix multiplication intensive algorithm
US20240046113A1 (en) * 2021-05-27 2024-02-08 Lynxi Technologies Co., Ltd. Data storage method, data acquisition method, data acquisition apparatus for a weight matrix, and device
CN114527930B (en) * 2021-05-27 2024-01-30 北京灵汐科技有限公司 Weight matrix data storage method, data acquisition method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN109245773A (en) * 2018-10-30 2019-01-18 南京大学 A kind of decoding method based on block circulation sparse matrix neural network
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method
CN110110851A (en) * 2019-04-30 2019-08-09 南京大学 A kind of the FPGA accelerator and its accelerated method of LSTM neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10997496B2 (en) * 2016-08-11 2021-05-04 Nvidia Corporation Sparse convolutional neural network accelerator
US10691996B2 (en) * 2016-12-15 2020-06-23 Beijing Deephi Intelligent Technology Co., Ltd. Hardware accelerator for compressed LSTM

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 A kind of hardware accelerator and method that rarefaction GRU neutral nets are realized based on FPGA
CN109245773A (en) * 2018-10-30 2019-01-18 南京大学 A kind of decoding method based on block circulation sparse matrix neural network
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix
CN109635944A (en) * 2018-12-24 2019-04-16 西安交通大学 A kind of sparse convolution neural network accelerator and implementation method
CN110110851A (en) * 2019-04-30 2019-08-09 南京大学 A kind of the FPGA accelerator and its accelerated method of LSTM neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张禾; 陈客松. Design and Research of Sparse Matrix-Vector Multiplication Based on FPGA. Application Research of Computers, 2014, Vol. 31, No. 06, pp. 1756-1759. *
曹亚松; 刘胜. DMA Design and Verification for Sparse Matrix-Vector Multiplication. Computer & Digital Engineering, 2019, Vol. 47, No. 11, pp. 2686-2690. *

Also Published As

Publication number Publication date
CN111008698A (en) 2020-04-14

Similar Documents

Publication Publication Date Title
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111008698B (en) Sparse matrix multiplication accelerator for hybrid compression cyclic neural networks
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
CN107423816B (en) Multi-calculation-precision neural network processing method and system
US20180046895A1 (en) Device and method for implementing a sparse neural network
CN107836083B (en) Method, apparatus and system for semantic value data compression and decompression
US10872295B1 (en) Residual quantization of bit-shift weights in an artificial neural network
CN107395211B (en) Data processing method and device based on convolutional neural network model
CN112286864B (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
EP4008057B1 (en) Lossless exponent and lossy mantissa weight compression for training deep neural networks
CN110766155A (en) Deep neural network accelerator based on mixed precision storage
CN113283587B (en) Winograd convolution operation acceleration method and acceleration module
CN114647399B (en) Low-energy-consumption high-precision approximate parallel fixed-width multiplication accumulation device
CN115276666B (en) Efficient data transmission method for equipment training simulator
CN114615507B (en) Image coding method, decoding method and related device
CN111078189B (en) Sparse matrix multiplication accelerator for cyclic neural network natural language processing
CN116205244B (en) Digital signal processing structure
US11962335B2 (en) Compression and decompression in hardware for data processing
CN111492369B (en) Residual quantization of shift weights in artificial neural networks
CN117273092A (en) Model quantization method and device, electronic equipment and storage medium
KR20200050895A (en) Log-quantized mac for stochastic computing and accelerator comprising the same
CN111897513B (en) Multiplier based on reverse polarity technology and code generation method thereof
CN114065923A (en) Compression method, system and accelerating device of convolutional neural network
Mahmood et al. Efficient compression scheme for large natural text using zipf distribution
CN113495669A (en) Decompression device, accelerator and method for decompression device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant