CN109144469A - Pipeline-structured neural network matrix operation architecture and method - Google Patents

Pipeline-structured neural network matrix operation architecture and method

Info

Publication number
CN109144469A
Authority
CN
China
Prior art keywords
input
matrix
vector
column
multiply
Prior art date
Legal status
Granted
Application number
CN201810813920.3A
Other languages
Chinese (zh)
Other versions
CN109144469B (en)
Inventor
王照钢
毛劲松
徐栋麟
Current Assignee
Shanghai Liang Niu Semiconductor Technology Co Ltd
Original Assignee
Shanghai Liang Niu Semiconductor Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Liang Niu Semiconductor Technology Co Ltd
Priority to CN201810813920.3A
Publication of CN109144469A
Application granted
Publication of CN109144469B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/491 Computations with decimal numbers radix 12 or 20
    • G06F 7/498 Computations with decimal numbers radix 12 or 20 using counter-type accumulators
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention proposes a pipeline-structured neural network matrix operation architecture comprising an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result. The pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks; input vector A is multiplied with the first column block of input matrix B, the products are accumulated, and the result is output; the multiply and accumulate is then performed with the next column block of input matrix B and that result output; and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.

Description

Pipeline-structured neural network matrix operation architecture and method
Technical field
The present invention relates to the technical field of digital integrated circuit design, and in particular to a pipeline-structured neural network matrix operation architecture and method.
Background art
For a group of input data, such as the feature vector of a speech signal or two-dimensional image data, computation through a neural network model can yield the morpheme information corresponding to the speech signal or the annotation information corresponding to the image; from the input of the data to the final output produced by the neural network model, a large amount of computing or storage resources is generally consumed.
It is also known that the quality of an integrated circuit is evaluated mainly in terms of its data processing speed, stability, material cost, and occupied area, and the way data is processed bears on performance in many respects, such as operation speed. Chip designers on the market today apply every manner of optimization to their processing algorithms in order to achieve efficiency, save cost, and enhance product performance. Existing neural network matrix operation architectures, for example, typically have the following drawbacks:
1. The dimensions of the matrix operation are fixed, so the scale of the operation cannot be changed adaptively;
2. The computation is usually performed by a central processing unit (CPU) occupying memory such as RAM; this is a software operation whose speed depends on the CPU's operating frequency, consumes a large amount of memory when the scale is large, and is computationally very inefficient;
3. When matrix-vector multiplication is implemented by a DSP processor, the operation is often executed serially, with low execution efficiency and long run times; the input vector and weight matrix must be pre-stored in RAM, and the intermediate variables of the computation must also be written out, further increasing storage and bandwidth overhead.
Summary of the invention
The purpose of the present invention is to provide a pipeline-structured neural network matrix operation architecture and method. A digital circuit implements an accelerator comprising an array of multiply-accumulate (MAC) units together with a counter and a shifter; combined with a cyclic data-input scheme, the data is fed in a loop and, according to the primitive request, overlapped, replayed, and accumulated in pipeline fashion, so that the matrix-vector multiplication executes in parallel. Compared with CPU and DSP processing, this greatly improves processing speed, and intermediate results can be kept locally, consuming no additional storage overhead. With the assistance of a controller, the dimensions of the matrix and vector participating in the multiply-accumulate operation, the number of counter pulses, and the shift depth of the shifter can be dynamically configured.
In order to achieve the above object, the invention is realized by the following technical scheme:
A pipeline-structured neural network matrix operation architecture, characterized by comprising:
An accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
In the above pipeline-structured neural network matrix operation architecture, the accelerator comprises:
A fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on input vector A and input matrix B. The module comprises several fixed-point multiply-accumulators (MACs) running in parallel; the two inputs of each fixed-point MAC sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of input matrix B, so that the multiply and accumulate for every column of the current column block executes synchronously in parallel. After the computation completes, a counter reset pulse applied to the RC reset-pulse enable of each fixed-point MAC controls the output and zeroing of the accumulated result, after which the multiply and accumulate for the next column block of input matrix B is executed. (A behavioral sketch of one such MAC follows this component list.)
A counter, for outputting a reset pulse each time the fixed-point MACs have finished the multiply and accumulate for one column block of input vector A and input matrix B; the pulse propagates through the first register chain as a pipeline reset signal to the RC reset-pulse enable of each fixed-point MAC, and the counter's own pulse is cleared each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate of input vector A and input matrix B;
A shifter, for controlling the shift depth over the columns of input vector A;
A first register chain, through which the counter applies pulse control to the RC reset-pulse enable of each fixed-point MAC;
A second register chain, through which the 1-row, m-column elements of input vector A are continuously fed to every fixed-point MAC;
Several third register chains, through which the corresponding column elements of the current column block of input matrix B are continuously fed through the corresponding third register chain to the corresponding fixed-point MAC.
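As a rough software analogue of a single fixed-point MAC and its RC reset-pulse enable (a sketch under assumptions: the class name is hypothetical, and the accumulator here wraps at 16 bits, whereas a real design might keep a wider accumulator):

    class FixedPointMAC:
        """Behavioral model of one 16-bit fixed-point multiply-accumulator."""
        MASK = (1 << 16) - 1

        def __init__(self):
            self.c = 0  # local accumulator; intermediate results stay in the MAC

        def step(self, a, b):
            # One beat: c = c + a*b, truncated to the assumed 16-bit width.
            self.c = (self.c + a * b) & self.MASK

        def reset_pulse(self):
            # RC reset-pulse enable: output the accumulated result, then zero.
            out, self.c = self.c, 0
            return out

Keeping c local to each MAC is what lets the architecture avoid writing intermediate results to external memory.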
The above pipeline-structured neural network matrix operation architecture further comprises:
A controller, connected to the accelerator, for dynamically configuring the number of columns of input vector A and the number of rows m of input matrix B (which are equal), the number of columns n of input matrix B, and the number of counter pulses in the accelerator, so as to govern the shifter's shift depth over the columns of input vector A, the counter's control of the RC reset-pulse enable after the multiply-accumulate of each column block of input vector A and input matrix B completes, and the clearing of the counter's own pulse after the full pipelined multiply-accumulate of input vector A and input matrix B completes.
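A sketch of the quantities the controller would program, again with hypothetical names (the patent does not specify a register map):

    from dataclasses import dataclass

    @dataclass
    class AcceleratorConfig:
        m: int            # rows of B, and length of the row vector A
        n: int            # columns of B
        block_width: int  # columns per column block = number of MACs

        @property
        def counter_period(self) -> int:
            # The counter fires one reset pulse after every m
            # multiply-accumulates, i.e. once per column block.
            return self.m

        @property
        def iterations(self) -> int:
            # Number of column blocks needed to cover all n columns.
            return -(-self.n // self.block_width)  # ceiling division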
In the above pipeline-structured neural network matrix operation architecture:
The controller is implemented by a CPU.
In the above pipeline-structured neural network matrix operation architecture:
The number of fixed-point MACs and the number of third register chains are each equal to the number of columns contained in each column block of input matrix B.
A method of pipeline-structured neural network matrix operation in a digital circuit, characterized by comprising:
Performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result;
The pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
Compared with the prior art, the present invention has the following advantage: through hardware acceleration of high-speed matrix-vector multiply-accumulate operations, the above operation architecture provides fast neural network acceleration, so that computation results are obtained in real time once the data is input and the model is loaded, greatly improving the speed and efficiency of the neural network computation and further accelerating image or speech recognition.
Brief description of the drawings
Fig. 1 is a structural block diagram of the present invention;
Fig. 2 is a structural block diagram of the accelerator of the present invention;
Fig. 3 is a detailed block diagram of the accelerator in an embodiment of the present invention.
Specific embodiments
The present invention is further elaborated below by describing a preferred embodiment in detail in conjunction with the accompanying drawings.
As shown in Fig. 1, the invention proposes a pipeline-structured neural network matrix operation architecture, comprising:
An accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
As shown in Fig. 2, specifically, the accelerator comprises:
A fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on input vector A and input matrix B. The module comprises several fixed-point multiply-accumulators (MACs) running in parallel; the two inputs of each fixed-point MAC sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of input matrix B, so that the multiply and accumulate for every column of the current column block executes synchronously in parallel (in the figure, i denotes the i-th column of matrix B, B[:][i] denotes all elements of the i-th column of matrix B, and one iteration performs the multiply-accumulate of x+1 columns). After the computation completes, a counter reset pulse applied to the RC reset-pulse enable of each fixed-point MAC controls the output and zeroing of the accumulated result, after which the multiply and accumulate for the next column block of input matrix B is executed;
A counter (which may be a cyclic counter or a timer), for outputting a reset pulse each time the fixed-point MACs have finished the multiply and accumulate for one column block of input vector A and input matrix B; the pulse propagates through the first register chain as a pipeline reset signal to the RC reset-pulse enable of each fixed-point MAC, and the counter's own pulse is cleared each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate of input vector A and input matrix B;
A shifter, for controlling the shift depth over the columns of input vector A;
A first register chain, through which the counter applies pulse control to the RC reset-pulse enable of each fixed-point MAC;
A second register chain, through which the 1-row, m-column elements of input vector A are continuously fed to every fixed-point MAC;
Several third register chains, through which the corresponding column elements of the current column block of input matrix B are continuously fed through the corresponding third register chain to the corresponding fixed-point MAC.
The pipeline-structured neural network matrix operation architecture further comprises:
A controller, which may be implemented by a CPU, connected to the accelerator, for dynamically configuring the number of columns of input vector A and the number of rows m of input matrix B (which are equal), the number of columns n of input matrix B, and the number of counter pulses in the accelerator, so as to govern the shifter's shift depth over the columns of input vector A, the counter's control of the RC reset-pulse enable after the multiply-accumulate of each column block of input vector A and input matrix B completes, and the clearing of the counter's own pulse after the full pipelined multiply-accumulate of input vector A and input matrix B completes.
In the present embodiment, the number of fixed-point MACs and the number of third register chains are each equal to the number of columns contained in each column block of input matrix B.
Specifically, a greater degree of parallelism can be obtained by, for example, increasing the number of fixed-point MACs, so that more columns are computed at once and the number of iterations is reduced.
The implementation process of the operation architecture of the invention is further illustrated below with a preferred embodiment:
As shown in Fig. 3, this structure can compute the multiply-accumulate of 32 columns at once. The two matrices in the example are A: 1*m and B: m*n, where m and n are configurable so as to suit neural networks of different scales. In this example the counter (Timer) is a cyclic counter or timer: after every m multiply-accumulate operations it outputs a reset pulse that triggers the output and zeroing of the accumulated results. Each MAC in the figure is a 16-bit fixed-point multiply-accumulator; there are 32 of them in total, i.e. x = 31 in this embodiment, and one iteration performs the multiply-accumulate of 32 columns. Each operation of a MAC completes one multiply and one accumulate, following the formula c = c + a*b, where a is a value from input vector A, b is a value from input matrix B, c is the accumulated result, and RC is the reset-pulse enable. The overall matrix operation proceeds as follows: input vector A continuously feeds its 1-row, m-column elements into the second register chain (Register Chain); at the same time, input matrix B, of m rows and n columns, is fed in units of 32 columns. Once all m row elements of the current 32 columns have been input, the operation D[1:32] = A[1:m] * B[1:m][1:32] is complete, and the next iteration begins. Every iteration requires all the elements of input vector A but selects a different column block of B; for example, the first iteration selects columns 1 to 32 of B and the second iteration selects columns 33 to 64. As soon as an iteration outputs its result, the same accumulation procedure starts over; only the elements of B differ from one accumulation to the next, while the accumulation rule remains the same. After the final round of accumulation a new array of dimension 1*n has been assembled, and the counter issues a reset and zeroing; each zeroing marks the completion of one operation on the two matrices, after which a new matrix operation can start and the same iterative procedure repeats. The advantages of this matrix multiplier are as follows. First, it effectively reduces the energy and latency cost that a CPU or DSP would otherwise incur for the same operation: whereas CPU data processing must first fetch, decode, and analyze before executing and finally outputting a result, this arithmetic unit feeds data directly into the register chains and computes to the clock beat, with no decoding required. Furthermore, the matrix circuit can be sized flexibly: for larger operations the register chains can be designed 32 or 64 deep, or, to save area and hardware material cost, the circuit can be designed for 16-bit fixed-point matrix operation and simply perform more loop iterations during computation. Finally, this matrix operation hardware circuit improves the utilization of the adders in the circuit, saving material cost.
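The per-beat behavior of the Fig. 3 arrangement can likewise be sketched in software (a model under assumptions, reusing the hypothetical FixedPointMAC above; here B_block is one 32-column block of B stored column by column, each column holding m values):

    def run_block(macs, A, B_block):
        """One iteration: stream all m elements of A against one column
        block of B; the counter's reset after m beats outputs and zeros."""
        m = len(A)
        for row in range(m):                        # one clock beat per element of A
            for mac, column in zip(macs, B_block):  # the 32 MACs work in parallel
                mac.step(A[row], column[row])
        return [mac.reset_pulse() for mac in macs]  # D segment for this block

Calling run_block once per column block, e.g. for columns 1 to 32 and then 33 to 64, yields the segments of D in order, matching the iteration described above.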
The invention also provides a method of pipeline-structured neural network matrix operation in a digital circuit, comprising:
Performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result;
The pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
In conclusion the present invention is arranged using the integration of the fixed point adder and multiplier and counter of big array, circulation theory is utilized Data are subjected to circulation input, is overlapped just like pipeline organization according to primitive request, playbacks and add up, by a large-scale matrix Vector operations split into small matrix-vector dimension and are operated, and customized can successively do x+1 parallel vectors and multiply, so that Multiplying operation and can executing parallel for vector matrix, for traditional CPU and DSP processing mode, substantially increases processing Speed, and intermediate result can be stored in local, not consume additional storage overhead, such as vector A and any column of matrix B M times multiply-add result is retained in corresponding adder and multiplier, without carrying out data-moving;For controller, Ke Yishun It accesses to sequence and outputs and inputs data, successively moved into calculative data by shift register, for controller Only need in advance to read in the data of input vector A, while reading in every column data of input matrix B in batches, when all data all It reads in and completes, that is, complete a vector matrix and multiply operation.
Although the contents of the present invention have been described in detail through the above preferred embodiment, it should be understood that the above description should not be considered a limitation of the present invention. Various modifications and substitutions of the present invention will be apparent to those skilled in the art after reading the above. Therefore, the protection scope of the present invention should be limited by the appended claims.

Claims (6)

1. A pipeline-structured neural network matrix operation architecture, characterized by comprising:
an accelerator, implemented in digital circuitry, for performing a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result; the pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
2. The pipeline-structured neural network matrix operation architecture of claim 1, wherein the accelerator comprises:
a fixed-point multiply-accumulate module, for performing the pipelined multiply-accumulate operation on input vector A and input matrix B, the module comprising several fixed-point multiply-accumulators (MACs) running in parallel, wherein the two inputs of each fixed-point MAC sequentially receive the elements of the 1-row, m-column vector A and the elements of the corresponding column within the current column block of input matrix B, so that the multiply and accumulate for every column of the current column block executes synchronously in parallel, and after the computation completes, a counter reset pulse applied to the RC reset-pulse enable of each fixed-point MAC controls the output and zeroing of the accumulated result, after which the multiply and accumulate for the next column block of input matrix B is executed;
a counter, for outputting a reset pulse each time the fixed-point MACs have finished the multiply and accumulate for one column block of input vector A and input matrix B, the pulse propagating through the first register chain as a pipeline reset signal to the RC reset-pulse enable of each fixed-point MAC, and the counter's own pulse being cleared each time the fixed-point multiply-accumulate module completes one full pipelined multiply-accumulate of input vector A and input matrix B;
a shifter, for controlling the shift depth over the columns of input vector A;
a first register chain, through which the counter applies pulse control to the RC reset-pulse enable of each fixed-point MAC;
a second register chain, through which the 1-row, m-column elements of input vector A are continuously fed to every fixed-point MAC;
several third register chains, through which the corresponding column elements of the current column block of input matrix B are continuously fed through the corresponding third register chain to the corresponding fixed-point MAC.
3. The pipeline-structured neural network matrix operation architecture of claim 2, further comprising:
a controller, connected to the accelerator, for dynamically configuring the number of columns of input vector A and the number of rows m of input matrix B (which are equal), the number of columns n of input matrix B, and the number of counter pulses in the accelerator, so as to govern the shifter's shift depth over the columns of input vector A, the counter's control of the RC reset-pulse enable after the multiply-accumulate of each column block of input vector A and input matrix B completes, and the clearing of the counter's own pulse after the full pipelined multiply-accumulate of input vector A and input matrix B completes.
4. The pipeline-structured neural network matrix operation architecture of claim 3, wherein:
the controller is implemented by a CPU.
5. The pipeline-structured neural network matrix operation architecture of claim 2, wherein:
the number of fixed-point MACs and the number of third register chains are each equal to the number of columns contained in each column block of input matrix B.
6. A method of pipeline-structured neural network matrix operation in a digital circuit, characterized by comprising:
performing, by a digital circuit, a pipelined multiply-accumulate operation on an input vector A and an input matrix B to obtain the result of A*B = D, where A is a row vector of dimension 1*m, B has dimension m*n, and D is the 1-row, n-column vector-matrix output result;
wherein the pipelined multiply-accumulate operation means that input matrix B is divided into multiple distinct column blocks, input vector A is multiplied with the first column block of input matrix B and the products accumulated and the result output, then the multiply and accumulate is performed with the next column block of input matrix B and that result output, and so on iteratively, until the last column block of input matrix B has also been multiplied and accumulated and its result output, at which point the product D of input vector A and input matrix B is obtained.
CN201810813920.3A 2018-07-23 2018-07-23 Pipeline structure neural network matrix operation architecture and method Active CN109144469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810813920.3A CN109144469B (en) 2018-07-23 2018-07-23 Pipeline structure neural network matrix operation architecture and method


Publications (2)

Publication Number Publication Date
CN109144469A (en) 2019-01-04
CN109144469B CN109144469B (en) 2023-12-05

Family

ID=64801554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810813920.3A Active CN109144469B (en) 2018-07-23 2018-07-23 Pipeline structure neural network matrix operation architecture and method

Country Status (1)

Country Link
CN (1) CN109144469B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5008833A (en) * 1988-11-18 1991-04-16 California Institute Of Technology Parallel optoelectronic neural network processors
CN102662623A (en) * 2012-04-28 2012-09-12 电子科技大学 Parallel matrix multiplier based on single field programmable gate array (FPGA) and implementation method for parallel matrix multiplier
CN105589677A (en) * 2014-11-17 2016-05-18 沈阳高精数控智能技术股份有限公司 Systolic structure matrix multiplier based on FPGA (Field Programmable Gate Array) and implementation method thereof
CN104572011A (en) * 2014-12-22 2015-04-29 上海交通大学 FPGA (Field Programmable Gate Array)-based general matrix fixed-point multiplier and calculation method thereof

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276047A (en) * 2019-05-18 2019-09-24 南京惟心光电系统有限公司 A method of matrix-vector multiplication is carried out using photoelectricity computing array
CN110276047B (en) * 2019-05-18 2023-01-17 南京惟心光电系统有限公司 Method for performing matrix vector multiplication operation by using photoelectric calculation array
CN110738311A (en) * 2019-10-14 2020-01-31 哈尔滨工业大学 LSTM network acceleration method based on high-level synthesis
CN110889259A (en) * 2019-11-06 2020-03-17 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix
CN110889259B (en) * 2019-11-06 2021-07-09 北京中科胜芯科技有限公司 Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix
CN112434256A (en) * 2020-12-03 2021-03-02 海光信息技术股份有限公司 Matrix multiplier and processor
CN112434256B (en) * 2020-12-03 2022-09-13 海光信息技术股份有限公司 Matrix multiplier and processor
WO2022189872A1 (en) * 2021-03-09 2022-09-15 International Business Machines Corporation Resistive memory device for matrix-vector multiplications
GB2619654A (en) * 2021-03-09 2023-12-13 Ibm Resistive memory device for matrix-vector multiplications
CN113266559A (en) * 2021-05-21 2021-08-17 华能秦煤瑞金发电有限责任公司 Neural network-based wireless detection method for concrete delivery pump blockage
CN113266559B (en) * 2021-05-21 2022-10-28 华能秦煤瑞金发电有限责任公司 Neural network-based wireless detection method for concrete delivery pump blockage

Also Published As

Publication number Publication date
CN109144469B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN109144469A (en) Pipeline organization neural network matrix operation framework and method
KR102443546B1 (en) matrix multiplier
US10691996B2 (en) Hardware accelerator for compressed LSTM
CN104899182B (en) A kind of Matrix Multiplication accelerated method for supporting variable partitioned blocks
US10698657B2 (en) Hardware accelerator for compressed RNN on FPGA
CN102197369B (en) Apparatus and method for performing SIMD multiply-accumulate operations
CN104915322B (en) A kind of hardware-accelerated method of convolutional neural networks
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN109472350A (en) A kind of neural network acceleration system based on block circulation sparse matrix
CN107704916A (en) A kind of hardware accelerator and method that RNN neutral nets are realized based on FPGA
CN109284824B (en) Reconfigurable technology-based device for accelerating convolution and pooling operation
CN102945224A (en) High-speed variable point FFT (Fast Fourier Transform) processor based on FPGA (Field-Programmable Gate Array) and processing method of high-speed variable point FFT processor
CN116710912A (en) Matrix multiplier and control method thereof
Que et al. Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs
Cho et al. FARNN: FPGA-GPU hybrid acceleration platform for recurrent neural networks
CN109284085B (en) High-speed modular multiplication and modular exponentiation operation method and device based on FPGA
CN107368459A (en) The dispatching method of Reconfigurable Computation structure based on Arbitrary Dimensions matrix multiplication
CN115408061B (en) Hardware acceleration method, device, chip and storage medium for complex matrix operation
Cao et al. FPGA-based accelerator for convolution operations
CN110716751B (en) High-parallelism computing platform, system and computing implementation method
TWI688895B (en) Fast vector multiplication and accumulation circuit
CN102231624B (en) Vector processor-oriented floating point complex number block finite impulse response (FIR) vectorization realization method
Gao et al. FPGA-based accelerator for independently recurrent neural network
CN104598199B (en) The data processing method and system of a kind of Montgomery modular multipliers for smart card
CN1553310A (en) Symmetric cutting algorithm for high-speed low loss multiplier and circuit strucure thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant