CN109144469A - Pipeline organization neural network matrix operation framework and method - Google Patents
- Publication number
- CN109144469A (application number CN201810813920.3A)
- Authority
- CN
- China
- Prior art keywords
- input
- matrix
- vector
- column
- multiply
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/491—Computations with decimal numbers radix 12 or 20
- G06F7/498—Computations with decimal numbers radix 12 or 20 using counter-type accumulators
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention proposes a pipeline-structured neural network matrix operation architecture, comprising: an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m (1 row, m columns), the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result. The pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, the products are accumulated, and the result is output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has also been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
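The column-blocked pipelined multiply-accumulate described in the abstract can be sketched as follows. This is an illustrative software model only, not part of the patented circuit; the function and parameter names (e.g. `block_cols`) are assumptions:

```python
def pipelined_matvec(A, B, block_cols):
    """Model of the pipelined A*B=D operation: A is 1*m, B is m*n.
    B is processed in column blocks; each block is multiplied with A,
    accumulated, and its partial result output before the next block."""
    m, n = len(A), len(B[0])
    D = []
    for start in range(0, n, block_cols):           # iterate over column blocks of B
        cols = range(start, min(start + block_cols, n))
        acc = [0] * len(cols)                       # one accumulator per column (one per MAC)
        for row in range(m):                        # m multiply-accumulate beats per block
            for k, col in enumerate(cols):
                acc[k] += A[row] * B[row][col]      # c = c + a*b
        D.extend(acc)                               # "reset pulse": output block result, clear
    return D
```

For example, `pipelined_matvec([1, 2], [[1, 2, 3], [4, 5, 6]], 2)` processes B in a 2-column block and a 1-column block and returns `[9, 12, 15]`, the ordinary product A*B.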
Description
Technical field
The present invention relates to the field of digital integrated circuit design, and in particular to a pipeline-structured neural network matrix operation architecture and method.
Background technique
For a set of input data, such as the feature vector of a speech signal or two-dimensional image data, computation through a neural network model can derive the morpheme information corresponding to the speech signal, or the annotation information corresponding to the image. From data input to the final output of the neural network computation, a large amount of computing or storage resources is generally consumed.
It is also known that the quality of an integrated circuit is evaluated mainly by its data processing speed, stability, material cost, and occupied area, and that the way data is processed affects many aspects of performance, such as operation speed. Chip designers currently on the market therefore apply all manner of optimizations to their processing algorithms in order to achieve efficiency, save cost, and enhance product performance. Existing neural network matrix operation architectures, however, usually have the following disadvantages:
1. The dimensions of the matrix operation are fixed, and the operation scale cannot be changed adaptively;
2. The operation is usually performed in software by a central processing unit (CPU) occupying memory such as RAM; its speed depends on the operating frequency of the CPU, it consumes a large amount of memory space when the scale is large, and its computational efficiency is very low;
3. Matrix-vector multiplication realized by a DSP processor is often executed serially, with low execution efficiency and long execution time; the input vector and the weight matrix must pre-exist in RAM space, and the intermediate variables of the computation also need to be written out, further increasing storage and bandwidth overhead.
Summary of the invention
The purpose of the present invention is to provide a pipeline-structured neural network matrix operation architecture and method. An accelerator comprising an array of multiply-accumulate (MAC) units together with a counter and a shift unit is realized in digital circuitry; the input data are circulated according to a cyclic principle so that operations overlap in pipeline fashion as requested, replaying and accumulating, and the matrix-vector multiplication can therefore be executed in parallel. Compared with CPU and DSP processing, this greatly improves processing speed, and intermediate results can be stored locally without consuming additional storage overhead. With the assistance of a controller, the dimensions of the matrix and vector participating in the multiply-accumulate operation, the pulse count of the counter, and the shift depth of the shift unit can all be configured dynamically.
In order to achieve the above object, the invention is realized by the following technical scheme:
A pipeline-structured neural network matrix operation architecture, characterized in that it comprises:
an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m (1 row, m columns), the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has also been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
In the above pipeline-structured neural network matrix operation architecture, the accelerator comprises:
a fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on the input vector A and the input matrix B; the fixed-point multiply-accumulate module comprises several fixed-point multiply-accumulators running in parallel; the two input terminals of each fixed-point multiply-accumulator sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of the input matrix B, so that the multiply-accumulate of the input vector A against each column of the current column block of the input matrix B is executed synchronously in parallel; after the computation completes, the counter reset pulse, applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator, triggers output of the accumulated result and clearing to zero, after which the multiply-accumulate of the next column block of the input matrix B with the input vector A is executed;
a counter, which outputs a reset pulse each time the multiply-accumulators have finished the multiply-accumulate of one column block of the input matrix B with the input vector A; the pulse passes through the first register chain to generate the pipeline reset signal applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator; the counter clears its own pulse count each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate operation of the input vector A and the input matrix B;
a shift unit, for controlling the shift depth over the columns of the input vector A;
a first register chain, through which the counter applies pulse control to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator;
a second register chain, through which the 1-row, m-column elements of the input vector A are continuously fed to each fixed-point multiply-accumulator;
several third register chains, through which the corresponding column elements of the current column block of the input matrix B are continuously fed to the corresponding fixed-point multiply-accumulators.
The above pipeline-structured neural network matrix operation architecture further comprises:
a controller, connected to the accelerator, for dynamically configuring the number of columns of the input vector A and the number of rows m of the input matrix B, the number of columns n of the input matrix B, and the pulse count of the counter in the accelerator, so as to control the shift depth applied by the shift unit to the columns of the input vector A, the counter's control of the RC reset-pulse enable terminals after the multiply-accumulate of each column block of the input matrix B with the input vector A is complete, and the clearing of the counter's pulse count after the full pipelined multiply-accumulate operation of the input vector A and the input matrix B is complete.
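The controller's dynamic configuration of the run parameters can be illustrated with a small model. This is a hedged sketch only; the class and field names are assumptions, not part of the patent:

```python
from dataclasses import dataclass

@dataclass
class MatrixOpConfig:
    """Parameters the controller would write to the accelerator before a run."""
    m: int           # rows of B, equal to the length of the 1*m input vector A
    n: int           # columns of B
    block_cols: int  # columns per column block (number of parallel MAC units)

    def counter_period(self):
        # the counter emits one reset pulse after m multiply-accumulate beats
        return self.m

    def iterations(self):
        # number of column blocks needed to cover all n columns (ceiling division)
        return -(-self.n // self.block_cols)

cfg = MatrixOpConfig(m=128, n=100, block_cols=32)
```

With these values the counter fires every 128 beats and the accelerator iterates over 4 column blocks (the last block being only 4 columns wide).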
In the above pipeline-structured neural network matrix operation architecture:
the controller is realized by a CPU.
In the above pipeline-structured neural network matrix operation architecture:
the number of fixed-point multiply-accumulators and the number of third register chains are each equal to the number of columns contained in each column block of the input matrix B.
A method of pipeline-structured neural network matrix operation in a digital circuit, characterized in that it comprises:
performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m, the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result;
the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
Compared with the prior art, the present invention has the following advantage: through the hardware acceleration of high-speed matrix-vector multiply-accumulate operations, the above operation architecture achieves fast neural network acceleration, so that results are computed in real time once the data and model are loaded, greatly improving the speed and efficiency of neural network computation and further accelerating image and speech recognition.
Detailed description of the invention
Fig. 1 is a structural block diagram of the invention;
Fig. 2 is a structural block diagram of the accelerator of the invention;
Fig. 3 is a detailed block diagram of the accelerator in an embodiment of the invention.
Specific embodiment
The present invention is further elaborated below by describing a preferred embodiment in detail in conjunction with the accompanying drawings.
As shown in Fig. 1, the invention proposes a pipeline-structured neural network matrix operation architecture, comprising:
an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m, the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has also been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
As shown in Fig. 2, specifically, the accelerator comprises:
a fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on the input vector A and the input matrix B; the module comprises several fixed-point multiply-accumulators running in parallel, the two input terminals of each of which sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of the input matrix B, so that the multiply-accumulate of the input vector A against each column of the current column block of the input matrix B is executed synchronously in parallel (in the figure, i denotes the i-th column of matrix B, B[:][i] denotes all elements of the i-th column of matrix B, and the number of columns given a multiply-accumulate operation in one iteration is x+1); after the computation completes, the counter reset pulse, applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator, triggers output of the accumulated result and clearing to zero, after which the multiply-accumulate of the next column block of the input matrix B with the input vector A is executed;
a counter (which may be a cyclic counter or a timer), which outputs a reset pulse each time the multiply-accumulators have finished the multiply-accumulate of one column block of the input matrix B with the input vector A; the pulse passes through the first register chain to generate the pipeline reset signal applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator; the counter clears its own pulse count each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate operation of the input vector A and the input matrix B;
a shift unit, for controlling the shift depth over the columns of the input vector A;
a first register chain, through which the counter applies pulse control to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator;
a second register chain, through which the 1-row, m-column elements of the input vector A are continuously fed to each fixed-point multiply-accumulator;
several third register chains, through which the corresponding column elements of the current column block of the input matrix B are continuously fed to the corresponding fixed-point multiply-accumulators.
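The behavior of a single fixed-point multiply-accumulator with its RC reset-pulse enable can be modeled as follows. This is an illustrative sketch only (the actual unit in the embodiment is a 16-bit fixed-point digital circuit); the class name is an assumption:

```python
class FixedPointMAC:
    """Model of one multiply-accumulator: c = c + a*b on each beat;
    a reset pulse outputs the accumulated value and clears it to zero."""
    def __init__(self):
        self.c = 0
    def step(self, a, b):
        self.c += a * b          # one multiply and one accumulate per beat
    def reset_pulse(self):
        out, self.c = self.c, 0  # output the accumulated result, then zero it
        return out

mac = FixedPointMAC()
for a, b in [(1, 4), (2, 5), (3, 6)]:  # one column: dot([1,2,3], [4,5,6])
    mac.step(a, b)
result = mac.reset_pulse()  # 32; the MAC is now clear for the next column block
```

Because the accumulated result lives inside the unit until the reset pulse fires, no intermediate value ever needs to be written to external memory, which is the storage-saving property claimed above.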
The pipeline-structured neural network matrix operation architecture also comprises:
a controller, which may be realized by a CPU, connected to the accelerator, for dynamically configuring the number of columns of the input vector A and the number of rows m of the input matrix B, the number of columns n of the input matrix B, and the pulse count of the counter in the accelerator, so as to control the shift depth applied by the shift unit to the columns of the input vector A, the counter's control of the RC reset-pulse enable terminals after the multiply-accumulate of each column block of the input matrix B with the input vector A is complete, and the clearing of the counter's pulse count after the full pipelined multiply-accumulate operation of the input vector A and the input matrix B is complete.
In the present embodiment, the number of fixed-point multiply-accumulators and the number of third register chains are each equal to the number of columns contained in each column block of the input matrix B.
Specifically, greater parallelism can be obtained by further expansion: for example, increasing the number of fixed-point multiply-accumulators allows more columns to be computed at once, thereby reducing the number of iterations.
A most preferred embodiment is described below to further illustrate the operation of the architecture of the invention.
As shown in Fig. 3, the structure can perform the multiply-accumulate operation of 32 columns at once. The two matrices in the example are A (1*m) and B (m*n); m and n are configurable to adapt to neural networks of different scales. The counter (Timer) in this example is a cyclic counter or timer: every m multiply-accumulate operations it outputs a reset pulse, which triggers output of the accumulated results and clearing to zero. Each MAC in the figure is a 16-bit fixed-point multiply-accumulator; there are 32 in total, i.e. x=31 in the present embodiment, and each iteration performs the multiply-accumulate of 32 columns. Each MAC operation completes one multiply and one accumulate, with the formula c = c + a*b, where a is a value from the input vector A, b is a value from the input matrix B, c is the accumulated result, and RC is the reset-pulse enable.
The overall matrix operation proceeds as follows. The input vector A continuously feeds its 1-row, m-column elements into the second register chain (Register Chain); at the same time, the input matrix B is fed in 32 columns at a time out of its m rows and n columns. Once all m row elements of the current 32 columns of B have been input, the operation D[1:32] = A[1:m] * B[1:m][1:32] is complete, and the next iteration begins. Each iteration requires all elements of the input vector A but selects a different column block of B: for example, the first iteration selects columns 1 to 32 of B, and the second iteration selects columns 33 to 64. While the results of one iteration are being output, the same accumulation sequence restarts; only the elements of B differ between iterations, and the accumulation rule stays the same. After the last round of accumulation has been carried out, a new matrix is obtained, arranged as a 1*n array, at which point the counter issues a reset and clears to zero; each clearing marks the completion of one two-matrix operation. A new matrix operation can then be started, repeating the same sequence as before.
The advantages of this matrix multiplication are as follows: it can effectively reduce the energy-consumption and latency cost that a CPU or DSP would otherwise incur performing the same operation. First, it avoids the overhead of CPU-style data processing, which must fetch data, decode, analyze, and execute before finally outputting a result; this matrix-multiplication unit instead feeds data directly into the register chains and performs the operation on each clock beat, with no decoding required. The third advantage is that such a matrix circuit can be designed flexibly: for a larger matrix, the register chains can be made 32 or 64 deep, or, to save space and hardware material cost, the circuit can be designed for 16-bit fixed-point matrix operation and simply perform more loop iterations during computation. Fourth, this matrix operation hardware circuit improves the utilization of the adders in the circuit, saving material cost.
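The embodiment above — 32 parallel MACs, m beats per column block, and the counter firing after every m multiply-accumulates — can be sketched as a software model. This is illustrative only (the assumption that n is a multiple of 32 is mine, made to keep the sketch short); the function name is not from the patent:

```python
def matvec_32wide(A, B):
    """Model of Fig. 3: per iteration, 32 MACs compute
    D[j:j+32] = A[1:m] * B[1:m][j:j+32]; the counter resets every m beats."""
    m, n = len(A), len(B[0])
    assert n % 32 == 0, "illustrative model assumes n is a multiple of 32"
    D = [0] * n
    for block in range(n // 32):                  # one iteration per 32-column block of B
        base = block * 32
        acc = [0] * 32                            # the 32 MAC accumulators
        for beat in range(m):                     # m beats; the counter fires after m
            a = A[beat]                           # A streams in through the register chain
            for k in range(32):
                acc[k] += a * B[beat][base + k]   # c = c + a*b in each MAC
        D[base:base + 32] = acc                   # reset pulse: output block, clear MACs
    return D
```

Each iteration consumes all of A but a fresh 32-column block of B, exactly as in the description: the first iteration covers columns 1 to 32, the second columns 33 to 64, and so on until the 1*n result D is assembled.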
The invention also provides a method of pipeline-structured neural network matrix operation in a digital circuit, comprising:
performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m, the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result;
the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
In conclusion the present invention is arranged using the integration of the fixed point adder and multiplier and counter of big array, circulation theory is utilized
Data are subjected to circulation input, is overlapped just like pipeline organization according to primitive request, playbacks and add up, by a large-scale matrix
Vector operations split into small matrix-vector dimension and are operated, and customized can successively do x+1 parallel vectors and multiply, so that
Multiplying operation and can executing parallel for vector matrix, for traditional CPU and DSP processing mode, substantially increases processing
Speed, and intermediate result can be stored in local, not consume additional storage overhead, such as vector A and any column of matrix B
M times multiply-add result is retained in corresponding adder and multiplier, without carrying out data-moving;For controller, Ke Yishun
It accesses to sequence and outputs and inputs data, successively moved into calculative data by shift register, for controller
Only need in advance to read in the data of input vector A, while reading in every column data of input matrix B in batches, when all data all
It reads in and completes, that is, complete a vector matrix and multiply operation.
Although the contents of the present invention have been described in detail through the above preferred embodiments, it should be appreciated that the above description is not to be considered a limitation of the present invention. After those skilled in the art have read the above, various modifications and substitutions to the present invention will be apparent. Therefore, the protection scope of the present invention shall be limited by the appended claims.
Claims (6)
1. A pipeline-structured neural network matrix operation architecture, characterized by comprising:
an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m, the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has also been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
2. The pipeline-structured neural network matrix operation architecture as claimed in claim 1, characterized in that the accelerator comprises:
a fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on the input vector A and the input matrix B; the fixed-point multiply-accumulate module comprises several fixed-point multiply-accumulators running in parallel; the two input terminals of each fixed-point multiply-accumulator sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of the input matrix B, so that the multiply-accumulate of the input vector A against each column of the current column block of the input matrix B is executed synchronously in parallel; after the computation completes, the counter reset pulse, applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator, triggers output of the accumulated result and clearing to zero, after which the multiply-accumulate of the next column block of the input matrix B with the input vector A is executed;
a counter, which outputs a reset pulse each time the multiply-accumulators have finished the multiply-accumulate of one column block of the input matrix B with the input vector A; the pulse passes through the first register chain to generate the pipeline reset signal applied to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator; the counter clears its own pulse count each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate operation of the input vector A and the input matrix B;
a shift unit, for controlling the shift depth over the columns of the input vector A;
a first register chain, through which the counter applies pulse control to the RC reset-pulse enable terminal of each fixed-point multiply-accumulator;
a second register chain, through which the 1-row, m-column elements of the input vector A are continuously fed to each fixed-point multiply-accumulator;
several third register chains, through which the corresponding column elements of the current column block of the input matrix B are continuously fed to the corresponding fixed-point multiply-accumulators.
3. The pipeline-structured neural network matrix operation architecture as claimed in claim 2, characterized in that it further comprises:
a controller, connected to the accelerator, for dynamically configuring the number of columns of the input vector A and the number of rows m of the input matrix B, the number of columns n of the input matrix B, and the pulse count of the counter in the accelerator, so as to control the shift depth applied by the shift unit to the columns of the input vector A, the counter's control of the RC reset-pulse enable terminals after the multiply-accumulate of each column block of the input matrix B with the input vector A is complete, and the clearing of the counter's pulse count after the full pipelined multiply-accumulate operation of the input vector A and the input matrix B is complete.
4. The pipeline-structured neural network matrix operation architecture as claimed in claim 1, characterized in that:
the controller is realized by a CPU.
5. The pipeline-structured neural network matrix operation architecture as claimed in claim 2, characterized in that:
the number of fixed-point multiply-accumulators and the number of third register chains are each equal to the number of columns contained in each column block of the input matrix B.
6. A method of pipeline-structured neural network matrix operation in a digital circuit, characterized by comprising:
performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result A*B=D, wherein A is a vector of dimension 1*m, the dimension of B is m*n, and D is the 1-row, n-column vector-matrix output result;
the pipelined multiply-accumulate operation means that the input matrix B is divided into multiple distinct column blocks: the input vector A is multiplied with the first column block of the input matrix B, accumulated, and the result output; the multiply, accumulate, and output are then performed for the next column block of the input matrix B; and so on iteratively, until the last column block of the input matrix B has been multiplied, accumulated, and its result output, at which point the product D of the input vector A and the input matrix B is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810813920.3A CN109144469B (en) | 2018-07-23 | 2018-07-23 | Pipeline structure neural network matrix operation architecture and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810813920.3A CN109144469B (en) | 2018-07-23 | 2018-07-23 | Pipeline structure neural network matrix operation architecture and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109144469A true CN109144469A (en) | 2019-01-04 |
CN109144469B CN109144469B (en) | 2023-12-05 |
Family
ID=64801554
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810813920.3A Active CN109144469B (en) | 2018-07-23 | 2018-07-23 | Pipeline structure neural network matrix operation architecture and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109144469B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5008833A (en) * | 1988-11-18 | 1991-04-16 | California Institute Of Technology | Parallel optoelectronic neural network processors |
CN102662623A (en) * | 2012-04-28 | 2012-09-12 | 电子科技大学 | Parallel matrix multiplier based on single field programmable gate array (FPGA) and implementation method for parallel matrix multiplier |
CN104572011A (en) * | 2014-12-22 | 2015-04-29 | 上海交通大学 | FPGA (Field Programmable Gate Array)-based general matrix fixed-point multiplier and calculation method thereof |
CN105589677A (en) * | 2014-11-17 | 2016-05-18 | 沈阳高精数控智能技术股份有限公司 | Systolic structure matrix multiplier based on FPGA (Field Programmable Gate Array) and implementation method thereof |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110276047A (en) * | 2019-05-18 | 2019-09-24 | 南京惟心光电系统有限公司 | A method of matrix-vector multiplication is carried out using photoelectricity computing array |
CN110276047B (en) * | 2019-05-18 | 2023-01-17 | 南京惟心光电系统有限公司 | Method for performing matrix vector multiplication operation by using photoelectric calculation array |
CN110738311A (en) * | 2019-10-14 | 2020-01-31 | 哈尔滨工业大学 | LSTM network acceleration method based on high-level synthesis |
CN110889259A (en) * | 2019-11-06 | 2020-03-17 | 北京中科胜芯科技有限公司 | Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix |
CN110889259B (en) * | 2019-11-06 | 2021-07-09 | 北京中科胜芯科技有限公司 | Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix |
CN112434256A (en) * | 2020-12-03 | 2021-03-02 | 海光信息技术股份有限公司 | Matrix multiplier and processor |
CN112434256B (en) * | 2020-12-03 | 2022-09-13 | 海光信息技术股份有限公司 | Matrix multiplier and processor |
WO2022189872A1 (en) * | 2021-03-09 | 2022-09-15 | International Business Machines Corporation | Resistive memory device for matrix-vector multiplications |
GB2619654A (en) * | 2021-03-09 | 2023-12-13 | Ibm | Resistive memory device for matrix-vector multiplications |
CN113266559A (en) * | 2021-05-21 | 2021-08-17 | 华能秦煤瑞金发电有限责任公司 | Neural network-based wireless detection method for concrete delivery pump blockage |
CN113266559B (en) * | 2021-05-21 | 2022-10-28 | 华能秦煤瑞金发电有限责任公司 | Neural network-based wireless detection method for concrete delivery pump blockage |
Also Published As
Publication number | Publication date |
---|---|
CN109144469B (en) | 2023-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109144469A (en) | Pipeline organization neural network matrix operation framework and method | |
KR102443546B1 (en) | matrix multiplier | |
US10691996B2 (en) | Hardware accelerator for compressed LSTM | |
CN104899182B (en) | A kind of Matrix Multiplication accelerated method for supporting variable partitioned blocks | |
US10698657B2 (en) | Hardware accelerator for compressed RNN on FPGA | |
CN102197369B (en) | Apparatus and method for performing SIMD multiply-accumulate operations | |
CN104915322B (en) | A kind of hardware-accelerated method of convolutional neural networks | |
CN111062472B (en) | Sparse neural network accelerator based on structured pruning and acceleration method thereof | |
CN109472350A (en) | A kind of neural network acceleration system based on block circulation sparse matrix | |
CN107704916A (en) | A kind of hardware accelerator and method that RNN neutral nets are realized based on FPGA | |
CN109284824B (en) | Reconfigurable technology-based device for accelerating convolution and pooling operation | |
CN102945224A (en) | High-speed variable point FFT (Fast Fourier Transform) processor based on FPGA (Field-Programmable Gate Array) and processing method of high-speed variable point FFT processor | |
CN116710912A (en) | Matrix multiplier and control method thereof | |
Que et al. | Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs | |
Cho et al. | FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks | |
CN109284085B (en) | High-speed modular multiplication and modular exponentiation operation method and device based on FPGA | |
CN107368459A (en) | The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions matrix multiplication | |
CN115408061B (en) | Hardware acceleration method, device, chip and storage medium for complex matrix operation | |
Cao et al. | FPGA-based accelerator for convolution operations | |
CN110716751B (en) | High-parallelism computing platform, system and computing implementation method | |
TWI688895B (en) | Fast vector multiplication and accumulation circuit | |
CN102231624B (en) | Vector processor-oriented floating point complex number block finite impulse response (FIR) vectorization realization method | |
Gao et al. | FPGA-based accelerator for independently recurrent neural network | |
CN104598199B (en) | The data processing method and system of a kind of Montgomery modular multipliers for smart card | |
CN1553310A (en) | Symmetric cutting algorithm for high-speed low loss multiplier and circuit strucure thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||