CN109284475A - Matrix convolution computing module and matrix convolution calculation method
- Publication number: CN109284475A
- Application number: CN201811101509.XA
- Authority: CN (China)
- Prior art keywords: register, group, memory, matrix, multiplier
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
- G06F17/153—Multidimensional correlation or convolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Algebra (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Biophysics (AREA)
- Neurology (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Complex Calculations (AREA)
Abstract
This application discloses a matrix convolution computing module and a matrix convolution calculation method. The matrix convolution computing module is provided with (m-1) memories, which together form a first-in, first-out (FIFO) storage structure. When the input data stored in an external storage device is read, there is no need to read m rows of input data from the external storage device simultaneously; the input data is simply read sequentially, row by row. Using the matrix convolution computing module provided by the embodiments of this application therefore reduces the requirements on the interface bandwidth of the external storage device and on the number of FPGA chip interfaces, which gives the module broad applicability.
Description
Technical field
This application relates to the field of electronic technology, and in particular to an FPGA-based matrix convolution computing module and a matrix convolution calculation method.
Background art
With the development of science and technology, convolutional neural networks (CNNs) are being applied ever more widely. A CNN is a multilayer neural network, and the convolutional layer is an important component of it; the main operation of this layer is the convolution of the input data with a convolver (that is, the convolution kernel). The input data can be represented as an input matrix, the convolver as a convolver matrix, and the convolution output as an output matrix.
Let the input matrix be X of size M*N and the convolver matrix be K of size m*n; in general m << M and n << N. The convolution output is then

$$ y_{w,t} = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} x_{w+i,\,t+j} \, K_{i,j} $$

where 0 ≤ w < M-m+1 and 0 ≤ t < N-n+1. As the formula shows, computing one output point of the output matrix requires multiplying each point of the convolver by the input data at the corresponding position and accumulating the products. Different output points are computed by changing the position of the convolver relative to the input data. To complete the whole convolution, the convolver moves from left to right, and each one-cell move completes one output point; once a row of outputs is finished, the convolver moves down one row and again sweeps from left to right, until the last output point of the last row has been processed.
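The sliding multiply-accumulate described above can be written down directly. The following is a minimal Python sketch and not part of the patent: the function name `conv2d_valid` and the sample data are illustrative assumptions, and the sum is taken without flipping the convolver, as implied by the worked example for y00 in this section.

```python
def conv2d_valid(x, k):
    """Direct form: y[w][t] = sum over i, j of x[w+i][t+j] * k[i][j]."""
    M, N = len(x), len(x[0])
    m, n = len(k), len(k[0])
    y = []
    for w in range(M - m + 1):        # convolver moves down one row at a time
        row = []
        for t in range(N - n + 1):    # convolver moves left to right
            acc = 0
            for i in range(m):
                for j in range(n):
                    acc += x[w + i][t + j] * k[i][j]
            row.append(acc)
        y.append(row)
    return y


# Illustrative data (assumption): a 5*5 input and a 3*3 convolver.
x = [[r * 5 + c for c in range(5)] for r in range(5)]
k = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = conv2d_valid(x, k)
```

For a 5*5 input and a 3*3 convolver the result is a 3*3 output matrix, which matches the index ranges given for w and t.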
Implementing matrix convolution on an FPGA has certain advantages: the large number of multiplications and accumulations involved can make full use of the FPGA's resources, and the FPGA's parallelism can greatly increase the computation speed.
Referring to Fig. 1, which is a schematic diagram of a prior-art FPGA implementation of matrix convolution. Fig. 1 takes a 5*5 input matrix and a 3*3 convolver matrix as an example and shows how the first element y00 of the output matrix is calculated. As Fig. 1 shows, calculating the output matrix from the input matrix and the convolver involves 9 registers 110, 9 multipliers 120 and one adder tree. y00 is calculated as follows: rows 1, 2 and 3 of the input data are read simultaneously and fed into 3 input ports, that is, the first row enters input port 101, the second row enters input port 102 and the third row enters input port 103. After 3 clock cycles the data in the registers is as shown in Fig. 1 and the multiplications start. The multiplier outputs at this point are (listed in the order of the multipliers in the figure, from top to bottom): x0,2*K02, x0,1*K01, x0,0*K00, x1,2*K12, x1,1*K11, x1,0*K10, x2,2*K22, x2,1*K21, x2,0*K20. The multiplier outputs enter the adder tree, which sums them to give the value of y00.
As data continues to be input, the values in the registers in the next clock cycle are as shown in Fig. 2, and the multiplier outputs are (listed in the order of the multipliers in the figure, from top to bottom): x0,3*K02, x0,2*K01, x0,1*K00, x1,3*K12, x1,2*K11, x1,1*K10, x2,3*K22, x2,2*K21, x2,1*K20. These outputs enter the adder tree and are summed to give the value of y01. Once rows 1, 2 and 3 have been input completely, the values of all elements of the first row of the output matrix have been obtained.
The elements of row 2 of the output matrix are calculated in a similar way: rows 2, 3 and 4 of the input data are read simultaneously and fed into input ports 101, 102 and 103 respectively. The method is otherwise the same as for row 1 of the output matrix and is not repeated here. The calculation of the entire output matrix ends once the last three rows of input data have been read.
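The prior-art data path for one output row can be modelled in a few lines. This is a hedged behavioural sketch in Python, not the circuit of Fig. 1: the function name `prior_art_output_row` and the sample data are assumptions, and one element per input port per clock is assumed.

```python
def prior_art_output_row(rows, k):
    """Model of the prior-art scheme: m input rows are fed in parallel,
    one element per port per clock, into m cascaded register groups."""
    m, n = len(k), len(k[0])
    regs = [[0] * n for _ in range(m)]   # regs[i][0] is the first register of group i
    out = []
    for c in range(len(rows[0])):
        for i in range(m):
            # Clock edge: every register takes its predecessor's value,
            # the first register takes the new element from port i.
            regs[i] = [rows[i][c]] + regs[i][:-1]
        if c >= n - 1:                   # all n registers hold valid data
            # Register p (0-based) of group i is multiplied by k[i][n-1-p];
            # the adder tree sums all m*n products.
            out.append(sum(regs[i][p] * k[i][n - 1 - p]
                           for i in range(m) for p in range(n)))
    return out


# Illustrative data (assumption): rows 1 to 3 of a 5*5 input, 3*3 convolver.
x = [[r * 5 + c for c in range(5)] for r in range(5)]
k = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
first_output_row = prior_art_output_row(x[0:3], k)
second_output_row = prior_art_output_row(x[1:4], k)   # rows 2 to 4 are read again
```

Note that computing the second output row requires rows 2 and 3 to be read from the external storage device a second time, and that all m rows must arrive in parallel.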
In practice, the input/output (I/O) interface resources of an FPGA are limited, while the above FPGA implementation of matrix convolution requires 3 rows of input data to be read from the external storage device simultaneously, which places high demands on the interface bandwidth of the external storage device and on the number of FPGA chip interfaces. In fact, the number of rows that must be read simultaneously depends on the dimensions of the convolver matrix: for an m*n convolver matrix, m rows of input data must be read from the external storage device at the same time. The larger m is, the higher the demands on the external storage device interface bandwidth and on the number of FPGA chip interfaces, so the above scheme has limited applicability in practice.
In view of this, a scheme that solves the above problem is needed.
Summary of the invention
The technical problem to be solved by this application is that prior-art FPGA-based implementations of matrix convolution have limited applicability. To this end, the application provides a matrix convolution computing module and a matrix convolution calculation method.
In a first aspect, an embodiment of this application provides an FPGA-based matrix convolution computing module. The input matrix is an M*N matrix and the convolver matrix is an m*n matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N. The convolution computing module comprises m*n registers, m*n multipliers, one adder tree and (m-1) memories.
The output terminals of the m*n multipliers are connected to the input terminals of the adder tree.
The inputs of any one of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers are organized as m groups of n multipliers each, and the m*n registers as m groups of n registers each; the m*n registers correspond one-to-one to the m*n multipliers.
Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver-matrix element in row (i-1), column (n-1), and the p-th register corresponds to the convolver-matrix element in row (i-1), column (n-p), where p is an integer from 2 to n, i is an integer from 1 to m, and the rows and columns of the convolver matrix are numbered from 0.
The (m-1) memories correspond one-to-one to (m-1) of the m register groups. The output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to that memory. The input terminal of the one remaining register group, which has no corresponding memory, is connected to the output terminal of the external storage device that stores the input data; here the input terminal of a register group means the input terminal of its first register.
The storage capacity of each of the (m-1) memories is N.
The n registers within any one register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the first register takes the value read from the memory, where j is an integer greater than 1 and less than or equal to n.
The (m-1) memories form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
Optionally, the memory is a first-in, first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the FPGA.
In a second aspect, an embodiment of this application provides a method of performing matrix convolution using the matrix convolution computing module of any implementation of the first aspect. The input matrix is an M*N matrix and the convolver matrix is an m*n matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N. The method comprises:
reading input data from the external storage device and, if the data-full signal of the first memory is detected, starting the multipliers after waiting n clock cycles.
Compared with the prior art, the embodiments of this application have the following advantages:
An embodiment of this application provides an FPGA-based matrix convolution computing module comprising m*n registers, m*n multipliers, one adder tree and (m-1) memories. The output terminals of the m*n multipliers are connected to the input terminals of the adder tree. The inputs of any one of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers are organized as m groups of n multipliers each, and the m*n registers as m groups of n registers each; the m*n registers correspond one-to-one to the m*n multipliers. Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver element in row (i-1), column (n-1), and the p-th register corresponds to the convolver element in row (i-1), column (n-p), where p is an integer from 2 to n and i is an integer from 1 to m. The (m-1) memories correspond one-to-one to (m-1) of the m register groups; the output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to that memory. The input terminal of the one remaining register group is connected to the output terminal of the external storage device that stores the input data, where the input terminal of a register group means the input terminal of its first register. The storage capacity of each of the (m-1) memories is N. The n registers within any one register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the first register takes the value read from the memory, where j is an integer greater than 1 and less than or equal to n. The (m-1) memories form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
That is, in the embodiments of this application (m-1) memories are provided, and these (m-1) memories form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows need not be read simultaneously; the input data is read sequentially, row by row. The matrix convolution computing module provided by the embodiments of this application therefore reduces the requirements on the external storage device interface bandwidth and on the number of FPGA chip interfaces, and has broad applicability.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some of the embodiments recorded in this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a prior-art FPGA implementation of matrix convolution;
Fig. 2 is another schematic diagram of a prior-art FPGA implementation of matrix convolution;
Fig. 3 is a structural schematic diagram of a matrix convolution computing module provided by an embodiment of this application;
Fig. 4 is a schematic diagram, provided by an embodiment of this application, of how the data stored in the memories moves when data is read from the external storage device;
Fig. 5 is a schematic diagram, provided by an embodiment of this application, of implementing matrix convolution with an FPGA;
Fig. 6 is another schematic diagram, provided by an embodiment of this application, of implementing matrix convolution with an FPGA.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are evidently only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of this application.
The inventors of this application found through research that prior-art FPGA implementations of matrix convolution require multiple rows of input data to be read from the external storage device simultaneously, which places high demands on the external storage device interface bandwidth and on the number of FPGA chip interfaces. Specifically, the number of rows that must be read simultaneously depends on the dimensions of the convolver matrix: for an m*n convolver matrix, m rows of input data must be read from the external storage device at the same time. The larger m is, the higher these demands become, so the prior-art scheme has limited applicability in practice.
To solve the above problem, an embodiment of this application provides an FPGA-based matrix convolution computing module comprising m*n registers, m*n multipliers, one adder tree and (m-1) memories. The output terminals of the m*n multipliers are connected to the input terminals of the adder tree. The inputs of any one of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers are organized as m groups of n multipliers each, and the m*n registers as m groups of n registers each; the m*n registers correspond one-to-one to the m*n multipliers. Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver element in row (i-1), column (n-1), and the p-th register corresponds to the convolver element in row (i-1), column (n-p), where p is an integer from 2 to n and i is an integer from 1 to m. The (m-1) memories correspond one-to-one to (m-1) of the m register groups; the output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to that memory. The input terminal of the one remaining register group is connected to the output terminal of the external storage device that stores the input data, where the input terminal of a register group means the input terminal of its first register. The storage capacity of each of the (m-1) memories is N. The n registers within any one register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the first register takes the value read from the memory, where j is an integer greater than 1 and less than or equal to n. The (m-1) memories form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
That is, in the embodiments of this application (m-1) memories are provided, and these (m-1) memories form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows need not be read simultaneously; the input data is read sequentially, row by row. The matrix convolution computing module provided by the embodiments of this application therefore reduces the requirements on the external storage device interface bandwidth and on the number of FPGA chip interfaces, and has broad applicability.
Various non-limiting embodiments of this application are described in detail below with reference to the drawings.
Referring to Fig. 3, which is a structural schematic diagram of a matrix convolution computing module provided by an embodiment of this application.
The matrix convolution computing module 300 provided by the embodiment can be used to calculate the convolution result of an M*N input matrix and an m*n convolver matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N.
The convolution computing module 300 comprises m*n registers 301, m*n multipliers 302, one adder tree 303 and (m-1) memories 304. Fig. 3 illustrates the case of a 3*3 convolver matrix: the module then contains 9 registers 301, 9 multipliers 302, one adder tree 303 and 2 memories 304, namely memory 1 and memory 2.
The output terminals of the m*n multipliers are connected to the input terminals of the adder tree 303. The output of the adder tree 303 is one element value of the output matrix obtained from the input matrix and the convolver matrix.
The inputs of any one of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers are organized as m groups of n multipliers each, and the m*n registers as m groups of n registers each; the m*n registers correspond one-to-one to the m*n multipliers. As shown in Fig. 3, the 3*3 registers are organized as 3 groups of 3 registers each, and the 3*3 registers correspond one-to-one to the 3*3 multipliers.
Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver-matrix element in row (i-1), column (n-1), and the p-th register corresponds to the convolver-matrix element in row (i-1), column (n-p), where p is an integer from 2 to n and i is an integer from 1 to m. With reference to Fig. 3: the 1st group of multipliers consists of the first 3 multipliers counting from top to bottom, and the convolver-matrix element corresponding to the first of these multipliers is K02, the element in row 0, column 2. The convolver element corresponding to the 3rd of the 3 registers belonging to the 1st group of multipliers is K00, the element in row 0, column 0.
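The register-to-element correspondence is a simple index mapping. The following Python sketch only restates the rule from the text; the function name `kernel_element_for_register` is a hypothetical helper and not part of the patent.

```python
def kernel_element_for_register(i, p, n):
    """Return the 0-based (row, column) of the convolver element paired with
    the p-th register of the i-th group; i and p are 1-based, as in the text."""
    if i < 1 or not (1 <= p <= n):
        raise ValueError("i must be >= 1 and p must satisfy 1 <= p <= n")
    return (i - 1, n - p)


# Mapping for the 3*3 convolver of Fig. 3.
mapping = {(i, p): kernel_element_for_register(i, p, 3)
           for i in range(1, 4) for p in range(1, 4)}
```

The first register of each group is paired with the last column of the convolver because it holds the most recently input element, while the last register of the group holds the oldest one.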
The (m-1) memories 304 correspond one-to-one to (m-1) of the m register groups. The output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to that memory. With reference to Fig. 3: memory 1 corresponds to one group of registers, and the output terminal of memory 1 is connected to the input terminal of the first register of that group of 3 registers (the 1st register counting from top to bottom in Fig. 3). Memory 2 likewise corresponds to one group of registers, and the output terminal of memory 2 is connected to the input terminal of the first register of that group of 3 registers (the 4th register counting from top to bottom in Fig. 3).
The input terminal of the one register group that is not among the (m-1) groups above is connected to the output terminal of the external storage device that stores the input data, where the input terminal of a register group means the input terminal of its first register. With reference to Fig. 3: the 7th to 9th registers counting from top to bottom do not correspond to any memory. In this embodiment, the input terminal of the first of these three registers (the 7th register from the top in Fig. 3), which is the input terminal of the register group they form, is connected to the output terminal of the external storage device that stores the input data.
In the embodiments of this application, the storage capacity of each of the (m-1) memories is N; that is, one memory can store one row of the input matrix.
In the embodiments of this application, the n registers within any one register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the first register takes the value read from the memory, where j is an integer greater than 1 and less than or equal to n.
Note that the clock mentioned here may be the FPGA system clock, or a clock obtained by multiplying or dividing the system clock frequency.
In other words, the value of the j-th register is the value of the (j-1)-th register delayed by one beat of the clock signal.
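The cascade within one register group behaves like a shift register. Below is a small Python illustration, offered as an assumption-laden sketch rather than a hardware description: `clock_tick` is a hypothetical name, and the strings stand for the input elements of the first row.

```python
def clock_tick(group, new_value):
    """One clock: register j takes the old value of register j-1,
    and the first register takes the newly arriving value."""
    return [new_value] + group[:-1]


group = [None, None, None]                 # n = 3 registers, initially empty
for value in ["x0,0", "x0,1", "x0,2"]:     # three clocks
    group = clock_tick(group, value)
after_three_clocks = list(group)

group = clock_tick(group, "x0,3")          # a fourth clock
after_four_clocks = list(group)
```

After three clocks the first register holds the newest element, which is why it is paired with the last column of the convolver.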
The (m-1) memories form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1). Consequently, as data is read from the external storage device, the data held in the (m-1) memories shifts along. This can be understood with reference to Fig. 4, which shows how the stored data in the (m-1) memories moves as data is read from the external storage device, that is, how data from the external storage is stored into memories 401 and 402. Purely for ease of understanding, Fig. 4 uses input data consisting of 6 data items; this does not limit the embodiments of this application.
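The movement of data through the chained memories can be traced with two queues. This Python sketch is illustrative only: it assumes rows of 6 items, two memories, and that the last memory in the list is the one fed by the external storage device; the function name `push` and the labels `a0`, `b0`, and so on are made up for the example.

```python
from collections import deque


def push(memories, value, depth):
    """Feed one external value into the chain. memories[-1] receives external
    data; memories[q-1] receives whatever memories[q] outputs once it is full."""
    carry = value
    for q in range(len(memories) - 1, -1, -1):
        out = memories[q].popleft() if len(memories[q]) == depth else None
        memories[q].append(carry)
        if out is None:        # this memory is not full yet: nothing moves on
            break
        carry = out            # the oldest element moves to the next memory


depth = 6
memories = [deque(), deque()]
row_a = [f"a{c}" for c in range(depth)]
row_b = [f"b{c}" for c in range(depth)]
row_c = [f"c{c}" for c in range(depth)]

for value in row_a + row_b:
    push(memories, value, depth)
after_two_rows = [list(mem) for mem in memories]

for value in row_c[:3]:        # first half of the third row
    push(memories, value, depth)
mid_third_row = [list(mem) for mem in memories]
```

After two rows have been read, the upstream memory holds the newer row and the downstream memory the older one; while the third row streams in, each memory always contains the most recent 6 values that passed through it.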
Note that in the embodiments of this application the memory may be a first-in, first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the FPGA.
The embodiments of this application do not specifically limit the external storage device; it may be, for example, an external memory.
In the embodiments of this application (m-1) memories are provided, and these (m-1) memories form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows need not be read simultaneously; the input data is read sequentially, row by row. The matrix convolution computing module provided by the embodiments of this application therefore reduces the requirements on the external storage device interface bandwidth and on the number of FPGA chip interfaces, and has broad applicability.
The matrix convolution computing module has been described above; the method of performing matrix convolution with it is described below.
First, input data is read from the external storage device. From the structure of the matrix convolution computing module described above, as more input data is read, the (m-1) memories may become full and assert a data-full signal. Since one memory can store one row of the input matrix, detecting the data-full signal of the first of the (m-1) memories indicates that (m-1) rows of the input matrix have been stored in the first to (m-1)-th memories. In the embodiments of this application, if the data-full signal of the first memory is detected, the multipliers are started after waiting n clock cycles, and calculation of the matrix convolution of the input matrix and the convolver matrix begins.
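The start condition implies a simple count of how much input must arrive before the first valid result. The arithmetic below is a hedged Python sketch that assumes exactly one input element is read per clock; the function name `elements_before_first_output` is hypothetical.

```python
def elements_before_first_output(N, m, n):
    """Input elements read before the first output element is available:
    (m-1) full rows to fill the memories, then n more clock cycles."""
    return (m - 1) * N + n


count = elements_before_first_output(N=5, m=3, n=3)
# 0-based (row, column) of the last element read at that moment.
last_row, last_col = divmod(count - 1, 5)
```

For the 5*5 input and 3*3 convolver used in the example, the last element read before the first output is the one in row 2, column 2 (0-based), which is the final term needed for y00.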
Below in conjunction with attached drawing, with input matrix for (5*5) matrix, for acoustic convolver matrix is (3*3) matrix, illustrate this Shen
Please embodiment provide realization matrix convolution method.
First, in the initialization phase the multipliers are not started. Row 1 of the input matrix is read
from the external memory device and, according to the matrix convolution computing module shown in
Fig. 3, is written into memory 2. Row 2 of the input matrix is then read from the external memory device
and written into memory 2, while the data previously held in memory 2 is written into memory 1. When
memory 2 and memory 1 are both full, row 3 is read from the external memory device and, after waiting
3 clock cycles, the multipliers are started. At this point memory 2 outputs row 2 and memory 1 outputs
row 1. After three clock cycles the value of each register is as shown in Fig. 5, and the first element
y00 of the output matrix can be calculated. After four clock cycles the value of each register is as
shown in Fig. 6, and the second element y01 of the output matrix can be calculated. Continuing in the
same way, all elements of the first row of the output matrix can be calculated.
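The walkthrough above can be sketched as a small software model. This is a minimal simulation under stated assumptions, not the hardware design itself: one element arrives per clock, the row memories are modeled with `collections.deque`, a result is kept only when the window lies inside one row, and the products are taken without flipping the convolver matrix, following the register-to-element mapping in claim 1. The names `stream_convolve` and `direct` and the example data are illustrative.

```python
from collections import deque


def stream_convolve(x, k):
    """Sliding-window sum of products, fed one input element per clock."""
    M, N = len(x), len(x[0])
    m, n = len(k), len(k[0])
    # (m-1) cascaded row memories; buffers[m-2] is fed by the external input
    # and buffers[i] is fed by the output of buffers[i+1].
    buffers = [deque([0] * N, maxlen=N) for _ in range(m - 1)]
    # m register groups of n cascaded registers; regs[i][0] is the first register.
    regs = [[0] * n for _ in range(m)]
    out = [[0] * (N - n + 1) for _ in range(M - m + 1)]
    for r in range(M):
        for c in range(N):
            # Values presented to the register groups on this clock:
            # memory outputs for groups 0..m-2, the external input for group m-1.
            taps = [buf[0] for buf in buffers] + [x[r][c]]
            for i in range(m - 1):
                buffers[i].append(taps[i + 1])      # FIFO chain advances
            for i in range(m):
                regs[i] = [taps[i]] + regs[i][:-1]  # shift within each group
            # Assumption: keep a result only when m rows are available and
            # the n-wide window lies inside a single row.
            if r >= m - 1 and c >= n - 1:
                out[r - m + 1][c - n + 1] = sum(
                    regs[i][p] * k[i][n - 1 - p]
                    for i in range(m) for p in range(n)
                )
    return out


def direct(x, k):
    """Reference: the same sums computed directly from the full matrix."""
    M, N, m, n = len(x), len(x[0]), len(k), len(k[0])
    return [[sum(x[r + a][c + b] * k[a][b] for a in range(m) for b in range(n))
             for c in range(N - n + 1)] for r in range(M - m + 1)]


x = [[5 * r + c for c in range(5)] for r in range(5)]   # example 5*5 input
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]                 # example 3*3 convolver
assert stream_convolve(x, k) == direct(x, k)
```

The streamed model reads each input element exactly once, yet produces the same 3*3 output as the direct computation over the whole matrix.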
It is understood that during the value of the element of the above-mentioned the first row by output matrix is all calculated,
Data have been substituted for the 3rd row data in memory 2, and data have been substituted for the 2nd row data in memory 1, at this time can be from outside
Storage equipment continues to read the 4th row data.To which the data of multiplier input have been updated to the 2nd, 3, the data of 4 rows, from
And the value of each element of the second row in output matrix can be calculated.It is understood that the last data line has been read
At entire matrix convolution, which calculates, to be completed.
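The row replacement described here can be checked with a short sketch that tracks only which input row each memory holds. It is a simplified model: the memories are represented by `collections.deque` objects, one element is assumed to be written per clock, and the function name `buffer_rows_after` is hypothetical.

```python
from collections import deque


def buffer_rows_after(rows_streamed, N=5, m=3):
    """Row labels held by each memory after streaming the given number of rows.

    Index 0 of the result is memory 1, index m-2 is memory (m-1).
    """
    buffers = [deque([None] * N, maxlen=N) for _ in range(m - 1)]
    for row in range(1, rows_streamed + 1):
        for _ in range(N):
            outs = [buf[0] for buf in buffers]   # current memory outputs
            for i in range(m - 2):
                buffers[i].append(outs[i + 1])   # memory k feeds memory k-1
            buffers[m - 2].append(row)           # external input feeds the last memory
    return [list(buf) for buf in buffers]


print(buffer_rows_after(2))  # [[1, 1, 1, 1, 1], [2, 2, 2, 2, 2]]
print(buffer_rows_after(3))  # [[2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
```

After row 3 has been streamed, memory 2 holds row 3 and memory 1 holds row 2, matching the description.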
From the above scheme it can be seen that the method provided by the embodiments of the present
application not only reduces the requirements on external memory device interface bandwidth and on the
number of FPGA chip interfaces, making it widely applicable, but also keeps the input data read order
simple: the input data is stored in the external memory device sequentially by row. In this application,
multiple RAMs or FIFOs are cascaded in the design so that all input data can be reused iteratively, and
a single read pass satisfies the needs of the entire matrix convolution operation. This solves the
problems in the prior art that, when an FPGA is used to calculate a matrix convolution with the input
data stored in an external memory device, any 3 adjacent rows of data must be read simultaneously, which
places high demands on the data storage order and format, makes data management very complex, causes
each row of data to be read 3 times in succession, and lowers input data reading efficiency.
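To make the read-efficiency argument concrete, the sketch below compares the number of elements read from external memory by the streaming scheme with a hypothetical baseline that re-reads m adjacent rows for every output row. The baseline is an assumption made for comparison only, modeled loosely on the prior-art description above, and the function name `external_reads` is illustrative.

```python
def external_reads(M, N, m):
    """Elements read from external memory: (streaming, re-reading baseline)."""
    streaming = M * N                   # every row is read exactly once
    # Assumed baseline: each of the (M-m+1) output rows re-reads m input rows.
    rereading = (M - m + 1) * m * N
    return streaming, rereading


print(external_reads(M=5, N=5, m=3))  # (25, 45)
```

For the 5*5 and 3*3 example, the assumed baseline reads 45 elements while the streaming scheme reads 25.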
After considering the specification and practicing the invention disclosed herein, those skilled in the
art will readily conceive of other embodiments of the application. This application is intended to cover
any variations, uses, or adaptations of the application that follow its general principles and include
common knowledge or conventional techniques in the art that are not disclosed herein. The description
and examples are to be considered illustrative only, and the true scope and spirit of the application
are indicated by the following claims.
It should be understood that the application is not limited to the precise structure described above
and shown in the drawings, and that various modifications and changes may be made without departing
from its scope. The scope of the application is limited only by the appended claims.
The foregoing is merely the preferred embodiments of the application and is not intended to limit it.
Any modification, equivalent replacement, improvement and the like made within the spirit and
principles of the application shall fall within its scope of protection.
Claims (3)
1. A matrix convolution computing module implemented on an FPGA, wherein the input matrix is an M*N
matrix, the convolver matrix is an m*n matrix, M, N, m and n are positive integers, m is less than or
equal to M, and n is less than or equal to N; characterized in that the convolution computing module
comprises: m*n registers, m*n multipliers, one adder tree and (m-1) memories;
The outputs of the m*n multipliers are connected to the inputs of the adder tree;
The inputs of any one of the m*n multipliers are an element of the convolver matrix and the value
stored in the corresponding register; the m*n multipliers are organized as m groups of multipliers,
each group comprising n multipliers; the m*n registers are organized as m groups of registers, each
group comprising n registers; the m*n registers correspond one-to-one with the m*n multipliers;
Among the n registers corresponding to the i-th group of multipliers in the m groups of multipliers,
the element of the convolver matrix corresponding to the first register is the element in row (i-1),
column (n-1) of the convolver, and the element of the convolver matrix corresponding to the p-th
register is the element in row (i-1), column (n-p) of the convolver; wherein p is an integer greater
than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less
than or equal to m;
The (m-1) memories correspond one-to-one with (m-1) groups of registers among the m groups of
registers, and the output of a first memory is connected to the input of the first register in a first
group of registers; wherein the first memory is any one of the (m-1) memories, and the first group of
registers is the group of registers corresponding to the first memory; the input of the one group of
registers among the m groups of registers other than the (m-1) groups of registers is connected to the
output of the external memory device that stores the input data; wherein the input of a group of
registers refers to the input of the first register in that group;
The storage size of each of the (m-1) memories is N;
The n registers in any one group of the m groups of registers are cascaded; when a clock arrives, the
value of the j-th register among the n registers is updated to the value of the (j-1)-th register, and
the value of the first register among the n registers is the value read from the memory, wherein j is
an integer greater than 1 and less than or equal to n;
The (m-1) memories form a first-in, first-out storage structure, in which the data output of the k-th
memory is connected to the data input of the (k-1)-th memory, wherein k is less than or equal to (m-1).
2. The matrix convolution computing module according to claim 1, characterized in that the memory
comprises: a first-in first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the
FPGA.
3. A method for implementing matrix convolution using the matrix convolution computing module according
to claim 1, wherein the input matrix is an M*N matrix, the convolver matrix is an m*n matrix, M, N, m
and n are positive integers, m is less than or equal to M, and n is less than or equal to N;
characterized in that the method comprises:
reading input data from an external memory device and, if the data-full signal of the first memory is
detected, starting the multipliers after waiting n clock cycles.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811101509.XA CN109284475B (en) | 2018-09-20 | 2018-09-20 | Matrix convolution calculating device and matrix convolution calculating method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811101509.XA CN109284475B (en) | 2018-09-20 | 2018-09-20 | Matrix convolution calculating device and matrix convolution calculating method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109284475A true CN109284475A (en) | 2019-01-29 |
CN109284475B CN109284475B (en) | 2021-10-29 |
Family
ID=65181844
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811101509.XA Active CN109284475B (en) | 2018-09-20 | 2018-09-20 | Matrix convolution calculating device and matrix convolution calculating method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109284475B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row |
CN110648313A (en) * | 2019-09-05 | 2020-01-03 | 北京智行者科技有限公司 | Laser stripe center line fitting method based on FPGA |
CN111240746A (en) * | 2020-01-12 | 2020-06-05 | 苏州浪潮智能科技有限公司 | Floating point data inverse quantization and quantization method and equipment |
CN112612447A (en) * | 2020-12-31 | 2021-04-06 | 安徽芯纪元科技有限公司 | Matrix calculator and full-connection-layer calculation method based on matrix calculator |
CN113536221A (en) * | 2020-04-21 | 2021-10-22 | 中科寒武纪科技股份有限公司 | Operation method, processor and related product |
CN113986200A (en) * | 2021-10-29 | 2022-01-28 | 上海阵量智能科技有限公司 | Matrix transposition circuit, artificial intelligence chip and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4937774A (en) * | 1988-11-03 | 1990-06-26 | Harris Corporation | Fast image processing accelerator for real time image processing applications |
US20120051406A1 (en) * | 2010-08-25 | 2012-03-01 | Qualcomm Incorporated | Circuit and method for computing circular convolution in streaming mode |
CN104915322A (en) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | Method for accelerating convolution neutral network hardware and AXI bus IP core thereof |
CN106250103A (en) * | 2016-08-04 | 2016-12-21 | 东南大学 | A kind of convolutional neural networks cyclic convolution calculates the system of data reusing |
CN106844294A (en) * | 2016-12-29 | 2017-06-13 | 华为机器有限公司 | Convolution algorithm chip and communication equipment |
CN107341544A (en) * | 2017-06-30 | 2017-11-10 | 清华大学 | A kind of reconfigurable accelerator and its implementation based on divisible array |
CN107656899A (en) * | 2017-09-27 | 2018-02-02 | 深圳大学 | A kind of mask convolution method and system based on FPGA |
-
2018
- 2018-09-20 CN CN201811101509.XA patent/CN109284475B/en active Active
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097174A (en) * | 2019-04-22 | 2019-08-06 | 西安交通大学 | Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row |
CN110648313A (en) * | 2019-09-05 | 2020-01-03 | 北京智行者科技有限公司 | Laser stripe center line fitting method based on FPGA |
CN110648313B (en) * | 2019-09-05 | 2022-05-24 | 北京智行者科技有限公司 | Laser stripe center line fitting method based on FPGA |
CN111240746A (en) * | 2020-01-12 | 2020-06-05 | 苏州浪潮智能科技有限公司 | Floating point data inverse quantization and quantization method and equipment |
CN111240746B (en) * | 2020-01-12 | 2023-01-10 | 苏州浪潮智能科技有限公司 | Floating point data inverse quantization and quantization method and equipment |
CN113536221A (en) * | 2020-04-21 | 2021-10-22 | 中科寒武纪科技股份有限公司 | Operation method, processor and related product |
CN113536221B (en) * | 2020-04-21 | 2023-12-15 | 中科寒武纪科技股份有限公司 | Operation method, processor and related products |
CN112612447A (en) * | 2020-12-31 | 2021-04-06 | 安徽芯纪元科技有限公司 | Matrix calculator and full-connection-layer calculation method based on matrix calculator |
CN112612447B (en) * | 2020-12-31 | 2023-12-08 | 安徽芯纪元科技有限公司 | Matrix calculator and full-connection layer calculating method based on same |
CN113986200A (en) * | 2021-10-29 | 2022-01-28 | 上海阵量智能科技有限公司 | Matrix transposition circuit, artificial intelligence chip and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109284475B (en) | 2021-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109284475A (en) | A kind of matrix convolution computing module and matrix convolution calculation method | |
CN106445471B (en) | Processor and the method for performing matrix multiplication on a processor | |
CN107341544B (en) | Reconfigurable accelerator based on divisible array and implementation method thereof | |
CN106875013B (en) | System and method for multi-core optimized recurrent neural networks | |
CN108304923A (en) | Convolution algorithm processing method and Related product | |
CN109101272A (en) | Processing with Neural Network device and its method for executing matrix multiple instruction | |
CN107239824A (en) | Apparatus and method for realizing sparse convolution neutral net accelerator | |
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA | |
CN109948774A (en) | Neural network accelerator and its implementation based on network layer binding operation | |
CN103955447B (en) | FFT accelerator based on DSP chip | |
CN109032670A (en) | Processing with Neural Network device and its method for executing vector duplicate instructions | |
CN105589677A (en) | Systolic structure matrix multiplier based on FPGA (Field Programmable Gate Array) and implementation method thereof | |
CN109711533A (en) | Convolutional neural networks module based on FPGA | |
CN109146065B (en) | Convolution operation method and device for two-dimensional data | |
CN110580519B (en) | Convolution operation device and method thereof | |
WO2018027706A1 (en) | Fft processor and algorithm | |
WO2023065983A1 (en) | Computing apparatus, neural network processing device, chip, and data processing method | |
WO2023197526A1 (en) | Data processing method and apparatus, electronic device, and readable storage medium | |
CN115423081A (en) | Neural network accelerator based on CNN _ LSTM algorithm of FPGA | |
CN108073549A (en) | Convolution algorithm device and method | |
CN115310037A (en) | Matrix multiplication computing unit, acceleration unit, computing system and related method | |
CN113869507B (en) | Neural network accelerator convolution calculation device and method based on pulse array | |
CN116720549A (en) | FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache | |
CN113222129B (en) | Convolution operation processing unit and system based on multi-level cache cyclic utilization | |
CN113301221B (en) | Image processing method of depth network camera and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||