CN109284475A - Matrix convolution computing module and matrix convolution calculation method - Google Patents

Matrix convolution computing module and matrix convolution calculation method

Info

Publication number
CN109284475A
Authority
CN
China
Prior art keywords
register
group
memory
matrix
multiplier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811101509.XA
Other languages
Chinese (zh)
Other versions
CN109284475B (en)
Inventor
满宏涛
王振江
李拓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201811101509.XA
Publication of CN109284475A
Application granted
Publication of CN109284475B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/153Multidimensional correlation or convolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Algebra (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Complex Calculations (AREA)

Abstract

This application discloses a matrix convolution computing module and a matrix convolution calculation method. The matrix convolution computing module is provided with (m-1) memories that together form a first-in, first-out storage structure. When the input data stored in an external storage device is read, m rows of input data need not be read from the external storage device simultaneously; the input data is simply read row by row in sequence. Using the matrix convolution computing module provided by the embodiments of this application therefore lowers the requirements on external storage interface bandwidth and on the number of FPGA chip interfaces, which gives the module wide applicability.

Description

Matrix convolution computing module and matrix convolution calculation method
Technical field
This application relates to the field of electronic technology, and in particular to an FPGA-based matrix convolution computing module and a matrix convolution calculation method.
Background
With the development of science and technology, convolutional neural networks (CNNs) are used more and more widely. A CNN is a multilayer neural network, and the convolutional layer is an important component of it; its main operation is the convolution of the input data with a convolver. The input data can be represented as an input matrix, the convolver as a convolver matrix, and the convolution output as an output matrix.
Let the input matrix be X_{M*N} and the convolver matrix be K_{m*n}; in general m << M and n << N. The convolution output is then:

$$y_{w,t} = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} x_{w+i,\,t+j} \, K_{i,j}$$

where 0 ≤ w ≤ M-m and 0 ≤ t ≤ N-n. As the formula shows, computing one output point of the output matrix requires multiplying each point of the convolver by the input data at the corresponding position and accumulating the products. Different output points are computed by changing the position of the convolver relative to the input data. To complete the whole convolution, the convolver moves from left to right, and each one-cell move completes one output point; after one row of outputs is complete, the convolver moves down one row and again moves from left to right, until the last output point of the last row has been processed.
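As a point of reference, the multiply-accumulate described above can be written directly as a short Python function. This is only a software sketch of the arithmetic, not the hardware discussed in this application; the function name conv2d_valid and the sample matrices are illustrative assumptions that do not appear in the application.

```python
def conv2d_valid(x, k):
    """Return y with y[w][t] = sum over i, j of x[w+i][t+j] * k[i][j]."""
    M, N = len(x), len(x[0])   # input matrix is M*N
    m, n = len(k), len(k[0])   # convolver matrix is m*n
    return [
        [
            sum(x[w + i][t + j] * k[i][j] for i in range(m) for j in range(n))
            for t in range(N - n + 1)      # convolver moves left to right
        ]
        for w in range(M - m + 1)          # then down one row at a time
    ]


# Illustrative 5*5 input and 3*3 convolver (values chosen arbitrarily).
x = [[5 * r + c for c in range(5)] for r in range(5)]
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
y = conv2d_valid(x, k)
```

With a 5*5 input and a 3*3 convolver the result y is a 3*3 output matrix, which matches the example sizes used in the figures.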
Implementing matrix convolution on an FPGA has certain advantages: the large number of multiply and accumulate operations involved can make full use of the FPGA's resources, and the FPGA's parallelism can greatly increase computing speed.
Fig. 1 is a schematic diagram of matrix convolution implemented on an FPGA in the prior art. Taking a 5*5 input matrix and a 3*3 convolver matrix as an example, Fig. 1 shows how the first element y00 of the output matrix is computed. As Fig. 1 shows, computing the output matrix from the input matrix and the convolver involves 9 registers 110, 9 multipliers 120 and one adder tree. y00 is computed as follows: the 1st, 2nd and 3rd rows of input data are read simultaneously, and each row is fed element by element into one of 3 input ports, i.e. the first row enters input port 101, the second row enters input port 102 and the third row enters input port 103. After 3 clock cycles the registers hold the data shown in Fig. 1 and the multipliers are started. The outputs of the multipliers are then (in the order of the multipliers in the figure, from top to bottom): x0,2*K02, x0,1*K01, x0,0*K00, x1,2*K12, x1,1*K11, x1,0*K10, x2,2*K22, x2,1*K21, x2,0*K20. The multiplier outputs enter the adder tree and are summed, giving the value of y00.
As data continues to be input, the values in the registers in the next clock cycle are as shown in Fig. 2, and the outputs of the multipliers are then (in the order of the multipliers in the figure, from top to bottom): x0,3*K02, x0,2*K01, x0,1*K00, x1,3*K12, x1,2*K11, x1,1*K10, x2,3*K22, x2,2*K21, x2,1*K20. The multiplier outputs enter the adder tree and are summed, giving the value of y01. Once all the data of the 1st, 2nd and 3rd rows has been input, the values of all elements of the first row of the output matrix have been obtained.
The elements of the 2nd row of the output matrix are computed in a similar way: the 2nd, 3rd and 4th rows of input data are read simultaneously and fed element by element into input ports 101, 102 and 103 respectively. The computation itself is the same as for the 1st row of the output matrix and is not repeated here. The computation of the whole output matrix ends once the last three rows of input data have been read.
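The prior-art data path of Fig. 1 and Fig. 2 can be mimicked in software to make the port requirement concrete. The sketch below is a simplified model under stated assumptions, not the circuit itself: each of the m input ports delivers one word per clock, the registers of a row are modelled as a Python list whose first entry is the register nearest the port, and the name prior_art_output_row and the sample data are hypothetical.

```python
def prior_art_output_row(rows, k):
    """Compute one output row from m input rows that are fed in parallel."""
    m, n = len(k), len(k[0])
    regs = [[0] * n for _ in range(m)]       # one register group per input port
    out = []
    for c in range(len(rows[0])):
        for g in range(m):
            # m words enter per clock, one through each input port
            regs[g] = [rows[g][c]] + regs[g][:-1]
        if c >= n - 1:                       # registers full: multipliers produce a valid sum
            out.append(
                sum(regs[g][p] * k[g][n - 1 - p] for g in range(m) for p in range(n))
            )
    return out


x = [[5 * r + c for c in range(5)] for r in range(5)]
k = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
first_row = prior_art_output_row(x[0:3], k)   # rows 1 to 3 give output row 1
second_row = prior_art_output_row(x[1:4], k)  # rows 2 to 4 give output row 2
```

Note that every call needs all m rows at once, which is exactly the simultaneous-read requirement the description goes on to criticise.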
It is understood that in practical applications, input and output (input output, IO) interface resource of FPGA has Limit, and the above-mentioned scheme using FPGA realization matrix convolution requires to read 3 row input datas from External memory equipment simultaneously, then It is more demanding to external storage device interface bandwidth and fpga chip interface quantity.In fact, being read simultaneously from External memory equipment The line number for evidence of fetching is related to the dimension of acoustic convolver matrix, if acoustic convolver matrix is the matrix of m*n, needs to deposit from outside simultaneously It stores up and reads m row input data in equipment.It is understood that m is bigger, external storage device interface bandwidth and fpga chip are connect Mouth quantitative requirement is higher, and therefore, the applicability of above scheme in practical applications is not strong.
In view of this, a solution to the above problem is needed.
Summary of the invention
The technical problem to be solved by this application is that FPGA-based matrix convolution in the prior art has poor applicability. To address it, this application provides a matrix convolution computing module and a matrix convolution calculation method.
In a first aspect, the embodiments of this application provide an FPGA-based matrix convolution computing module. The input matrix is an M*N matrix and the convolver matrix is an m*n matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N. The convolution computing module comprises m*n registers, m*n multipliers, one adder tree and (m-1) memories;
The output terminals of the m*n multipliers are connected to the input terminals of the adder tree;
The inputs of each of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register; the m*n multipliers form m groups of n multipliers each; the m*n registers form m groups of n registers each; the m*n registers correspond one-to-one with the m*n multipliers;
Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver-matrix element in row (i-1), column (n-1), and the p-th register corresponds to the convolver-matrix element in row (i-1), column (n-p); here p is an integer greater than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less than or equal to m;
The (m-1) memories correspond one-to-one with (m-1) of the m register groups, and the output terminal of a first memory is connected to the input terminal of the first register of a first register group; here the first memory is any one of the (m-1) memories, and the first register group is the register group corresponding to that memory. The input terminal of the one register group outside those (m-1) groups is connected to the output terminal of the external storage device that stores the input data; the input terminal of a register group means the input terminal of the first register of that group;
The storage size of each of the (m-1) memories is N;
The n registers within any one of the m register groups are cascaded: on each clock, the value of the j-th of the n registers is updated to the value of the (j-1)-th register, and the value of the first register is the value read from the memory, where j is an integer greater than 1 and less than or equal to n;
The (m-1) memories together form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
Optionally, the memory comprises a first-in, first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the FPGA.
In a second aspect, the embodiments of this application provide a method of performing matrix convolution using the matrix convolution computing module of any implementation of the first aspect above. The input matrix is an M*N matrix and the convolver matrix is an m*n matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N. The method comprises:
reading input data from the external storage device and, if the data-full signal of the first memory is detected, starting the multipliers after waiting n clock cycles.
Compared with the prior art, the embodiments of this application have the following advantages:
The embodiments of this application provide an FPGA-based matrix convolution computing module comprising m*n registers, m*n multipliers, one adder tree and (m-1) memories. The output terminals of the m*n multipliers are connected to the input terminals of the adder tree. The inputs of each multiplier are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers form m groups of n multipliers each, the m*n registers form m groups of n registers each, and the registers correspond one-to-one with the multipliers. Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver element in row (i-1), column (n-1), and the p-th register corresponds to the convolver element in row (i-1), column (n-p), where p is an integer greater than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less than or equal to m. The (m-1) memories correspond one-to-one with (m-1) of the m register groups, and the output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to it. The input terminal of the one register group outside those (m-1) groups is connected to the output terminal of the external storage device that stores the input data; the input terminal of a register group means the input terminal of the first register of that group. The storage size of each of the (m-1) memories is N. The n registers within any register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the value of the first register is the value read from the memory, where j is an integer greater than 1 and less than or equal to n. The (m-1) memories together form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
That is, the embodiments of this application provide (m-1) memories that together form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows of input data need not be read simultaneously; the input data is simply read row by row in sequence. Using the matrix convolution computing module provided by the embodiments of this application therefore lowers the requirements on external storage interface bandwidth and on the number of FPGA chip interfaces, which gives the module wide applicability.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments recorded in this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of matrix convolution implemented on an FPGA in the prior art;
Fig. 2 is another schematic diagram of matrix convolution implemented on an FPGA in the prior art;
Fig. 3 is a structural schematic diagram of a matrix convolution computing module provided by the embodiments of this application;
Fig. 4 is a schematic diagram, provided by the embodiments of this application, of how the data stored in the memories moves when data is read from the external storage device;
Fig. 5 is a schematic diagram, provided by the embodiments of this application, of matrix convolution implemented on an FPGA;
Fig. 6 is another schematic diagram, provided by the embodiments of this application, of matrix convolution implemented on an FPGA.
Detailed description of the embodiments
To help those skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.
Through research, the inventors of this application found that prior-art schemes for implementing matrix convolution on an FPGA require multiple rows of input data to be read from the external storage device simultaneously, which places high demands on external storage interface bandwidth and on the number of FPGA chip interfaces. Specifically, the number of rows read simultaneously from the external storage device depends on the dimensions of the convolver matrix: if the convolver matrix is an m*n matrix, m rows of input data must be read from the external storage device at the same time. The larger m is, the higher the demands on external storage interface bandwidth and on the number of FPGA chip interfaces, so such schemes have poor applicability in practice.
To solve the above problem, the embodiments of this application provide an FPGA-based matrix convolution computing module comprising m*n registers, m*n multipliers, one adder tree and (m-1) memories. The output terminals of the m*n multipliers are connected to the input terminals of the adder tree. The inputs of each multiplier are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers form m groups of n multipliers each, the m*n registers form m groups of n registers each, and the registers correspond one-to-one with the multipliers. Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver element in row (i-1), column (n-1), and the p-th register corresponds to the convolver element in row (i-1), column (n-p), where p is an integer greater than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less than or equal to m. The (m-1) memories correspond one-to-one with (m-1) of the m register groups, and the output terminal of a first memory is connected to the input terminal of the first register of a first register group, where the first memory is any one of the (m-1) memories and the first register group is the register group corresponding to it. The input terminal of the one register group outside those (m-1) groups is connected to the output terminal of the external storage device that stores the input data; the input terminal of a register group means the input terminal of the first register of that group. The storage size of each of the (m-1) memories is N. The n registers within any register group are cascaded: on each clock, the value of the j-th register is updated to the value of the (j-1)-th register, and the value of the first register is the value read from the memory, where j is an integer greater than 1 and less than or equal to n. The (m-1) memories together form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1).
That is, the embodiments of this application provide (m-1) memories that together form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows of input data need not be read simultaneously; the input data is simply read row by row in sequence. Using the matrix convolution computing module provided by the embodiments of this application therefore lowers the requirements on external storage interface bandwidth and on the number of FPGA chip interfaces, which gives the module wide applicability.
Various non-limiting embodiments of this application are described in detail below with reference to the drawings.
Fig. 3 is a structural schematic diagram of a matrix convolution computing module provided by the embodiments of this application.
The matrix convolution computing module 300 provided by the embodiments of this application can be used to compute the convolution of an M*N input matrix with an m*n convolver matrix, where M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N.
The convolution computing module 300 comprises m*n registers 301, m*n multipliers 302, an adder tree 303 and (m-1) memories 304. Fig. 3 takes a 3*3 convolver matrix as an example, so the module shown there comprises 9 registers 301, 9 multipliers 302, one adder tree 303 and 2 memories 304, namely memory 1 and memory 2.
The output terminals of the m*n multipliers are connected to the input terminals of the adder tree 303. The output of the adder tree 303 is the value of one element of the output matrix obtained from the input matrix and the convolver matrix.
The inputs of each of the m*n multipliers are an element of the convolver matrix and the value stored in the corresponding register. The m*n multipliers form m groups of n multipliers each, the m*n registers form m groups of n registers each, and the m*n registers correspond one-to-one with the m*n multipliers. As shown in Fig. 3, the 3*3 registers form 3 groups of 3 registers each, and the 3*3 registers correspond one-to-one with the 3*3 multipliers.
Among the n registers corresponding to the i-th group of multipliers, the first register corresponds to the convolver-matrix element in row (i-1), column (n-1), and the p-th register corresponds to the convolver-matrix element in row (i-1), column (n-p); here p is an integer greater than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less than or equal to m. With reference to Fig. 3, the 1st group of multipliers is the first 3 multipliers counted from top to bottom. The convolver-matrix element corresponding to the first of these multipliers is K02, the element in row 0, column 2 of the convolver matrix. The convolver-matrix element corresponding to the 3rd of the 3 registers of the 1st group is K00, the element in row 0, column 0.
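The index rule above is easy to get wrong by one, so it may help to tabulate it. The following sketch simply evaluates the stated rule (row i-1, column n-p) for a 3*3 convolver; the function name kernel_element is a hypothetical helper, and the rule for the first register is obtained by setting p = 1, which gives column n-1 as stated.

```python
def kernel_element(i, p, n):
    """Row and column of the convolver element paired with register p of group i.

    i and p are 1-based, as in the text; the returned indices are 0-based.
    """
    return i - 1, n - p


n = 3
mapping = [[kernel_element(i, p, n) for p in range(1, n + 1)] for i in range(1, 4)]
# mapping[0] lists the elements for group 1, from its first to its third register
```

For group 1 this yields K02, K01, K00 in register order, which agrees with the top-to-bottom order of multiplier outputs listed for Fig. 1.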
The (m-1) memories 304 correspond one-to-one with (m-1) of the m register groups, and the output terminal of a first memory is connected to the input terminal of the first register of a first register group; here the first memory is any one of the (m-1) memories, and the first register group is the register group corresponding to that memory. With reference to Fig. 3, memory 1 corresponds to one register group, and the output terminal of memory 1 is connected to the input terminal of the first of that group's 3 registers (the first register counted from top to bottom in Fig. 3). Memory 2 corresponds to another register group, and the output terminal of memory 2 is connected to the input terminal of the first of that group's 3 registers (the 4th register counted from top to bottom in Fig. 3).
The input terminal of the one register group outside those (m-1) groups is connected to the output terminal of the external storage device that stores the input data; the input terminal of a register group means the input terminal of the first register of that group. With reference to Fig. 3, the 7th to 9th registers counted from top to bottom do not correspond to any memory. In the embodiments of this application, the input terminal of the first of these three registers (the 7th register from top to bottom in Fig. 3), i.e. the input terminal of the register group they form, is connected to the output terminal of the external storage device that stores the input data.
In the embodiments of this application, the storage size of each of the (m-1) memories is N. That is, one memory can store one row of the input matrix.
In the embodiments of this application, the n registers within any one of the m register groups are cascaded: on each clock, the value of the j-th of the n registers is updated to the value of the (j-1)-th register, and the value of the first register is the value read from the memory, where j is an integer greater than 1 and less than or equal to n.
It should be noted that the clock mentioned here may be the FPGA's system clock, or a clock obtained by multiplying or dividing the frequency of the system clock.
In other words, the value of the j-th register is the value of the (j-1)-th register delayed by one beat of the clock signal.
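The cascade can be illustrated with a few lines of Python. This is a behavioural sketch only: the list models one register group with its first register at index 0, and the name clock_group and the string labels are assumptions made for illustration.

```python
def clock_group(regs, incoming):
    """Apply one clock to a cascaded register group.

    regs[0] is the first register. After the clock, register j holds what
    register j-1 held before it, and the first register holds `incoming`.
    """
    return [incoming] + regs[:-1]


regs = [None, None, None]                 # n = 3 registers, initially empty
for word in ["x0,0", "x0,1", "x0,2"]:     # first three elements of a row
    regs = clock_group(regs, word)
```

After three clocks the first register holds x0,2 and the third holds x0,0, which is why the first register is paired with the convolver element in column (n-1).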
The (m-1) memories together form a first-in, first-out storage structure: the data output terminal of the k-th memory is connected to the data input terminal of the (k-1)-th memory, where k is less than or equal to (m-1). When data is read from the external storage device, the data in the (m-1) memories therefore moves along this chain. Fig. 4 is a schematic diagram of how the stored data moves in the (m-1) memories when data is read from the external storage device; it shows external data being stored into memories 401 and 402. Purely for ease of understanding, Fig. 4 illustrates the case of 6 input data items; this does not limit the embodiments of this application.
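The movement of data through the chained memories can be sketched as follows. Because Fig. 4 is not reproduced here, the details are assumptions: each memory is modelled as a Python deque with a depth of 3 words, six data items are pushed, and the helper name push_word is hypothetical.

```python
from collections import deque


def push_word(memories, depth, word):
    """Write one word into the chain; return the word leaving memory 1, if any.

    memories[-1] receives the external word. When a memory already holds
    `depth` words, its oldest word moves on to the previous memory in the list.
    """
    for mem in reversed(memories):
        mem.append(word)
        if len(mem) <= depth:
            return None          # this memory absorbed the word
        word = mem.popleft()     # first in, first out
    return word


memories = [deque(), deque()]    # two chained memories, as for m = 3
for word in [1, 2, 3, 4, 5, 6]:
    push_word(memories, depth=3, word=word)
```

In this model the memory fed from outside ends up holding the three newest items and the other memory the three oldest, so consecutive rows sit in consecutive memories.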
It should be noted that, in the embodiments of this application, the memory comprises a first-in, first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the FPGA.
The embodiments of this application do not specifically limit the external storage device; it may be, for example, an external memory.
In the embodiments of this application, (m-1) memories are provided that together form a first-in, first-out storage structure. When the input data stored in the external storage device is read, m rows of input data need not be read simultaneously; the input data is simply read row by row in sequence. Using the matrix convolution computing module provided by the embodiments of this application therefore lowers the requirements on external storage interface bandwidth and on the number of FPGA chip interfaces, which gives the module wide applicability.
The matrix convolution computing module has been described above; the method of performing matrix convolution with this module is described below.
First, input data is read from the external storage device. From the structure of the matrix convolution computing module above, as more input data is read, the (m-1) memories may assert their data-full signals. Since one memory can store one row of the input matrix, detecting the data-full signal of the first of the (m-1) memories indicates that (m-1) rows of the input matrix have been stored in the first through (m-1)-th memories. In the embodiments of this application, if the data-full signal of the first memory is detected, the multipliers are started after waiting n clock cycles, and computation of the matrix convolution of the input matrix and the convolver matrix begins.
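The start-up delay implied by this procedure can be estimated with a one-line calculation. The sketch assumes, beyond what the text states, that exactly one input word is read per clock from the very first clock and that the first memory reports full as soon as (m-1) rows have been stored; the function name first_output_clock is hypothetical.

```python
def first_output_clock(N, m, n):
    """Clock count (starting from 1) at which the first output element is available."""
    fill = (m - 1) * N     # words read before the first memory reports full
    return fill + n        # then n more clocks to load each register group


latency = first_output_clock(N=5, m=3, n=3)
```

For the 5*5 input and 3*3 convolver of the worked example this gives 13 clocks: 10 to fill the two memories and 3 more before the multipliers start.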
The method for implementing matrix convolution provided by the embodiments of the present application is described below with reference to the drawings, taking a (5*5) input matrix and a (3*3) convolver matrix as an example.
First, in the initialization phase the multipliers are not started. The 1st row of the input matrix is read from the external storage device and, according to the matrix convolution calculation module shown in Fig. 3, is written into memory 2. The 2nd row of the input matrix is then read from the external storage device and, according to the module shown in Fig. 3, is written into memory 2, while the data already in memory 2 is written into memory 1. When memory 2 and memory 1 are both full, reading continues with the 3rd row from the external storage device, and the multipliers are started after waiting 3 clock cycles. At this point memory 2 outputs the 2nd row of data and memory 1 outputs the 1st row of data. After three clock cycles, the values of the registers are as shown in Fig. 5, and the value of the first element y00 of the output matrix can be calculated; after four clock cycles, the values of the registers are as shown in Fig. 6, and the value of the second element y01 of the output matrix can be calculated. Proceeding by analogy, the values of all elements of the first row of the output matrix can be calculated.
It can be understood that, while the values of the elements of the first row of the output matrix are being calculated, the data in memory 2 is replaced by the 3rd row of data and the data in memory 1 is replaced by the 2nd row of data, and the 4th row of data can then be read from the external storage device. The data at the multiplier inputs is thus updated to rows 2, 3 and 4, so the value of each element of the second row of the output matrix can be calculated. It can be understood that once the last row of data has been read, the entire matrix convolution calculation is complete.
From the above scheme it can be seen that the method provided by the embodiments of the present application not only lowers the requirements on external storage device interface bandwidth and on the number of FPGA chip interfaces, giving strong applicability, but also keeps the input data reading order simple: the input data is stored in the external storage device sequentially, row by row. In the present application, cascading multiple RAMs or FIFOs in the design allows all input data to be reused iteratively, so that reading each item once satisfies the needs of the entire matrix convolution operation. This solves the prior-art problems that arise when an FPGA is used to calculate matrix convolution: the input data stored in the external storage device must support reading any 3 adjacent rows simultaneously, which places higher requirements on data storage order and format, makes data management quite complex, and causes each row of data to be read repeatedly 3 times in succession, so that input data reading efficiency is low.
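The difference in read traffic can be made concrete with a short count. This Python sketch is illustrative only: the function name `row_read_counts` is hypothetical, and it assumes a prior-art scheme that re-reads the m adjacent rows needed for every output row with no on-chip reuse, which is one reading of the problem described above.

```python
def row_read_counts(M, m):
    """Count how often each input row is read from external storage.

    Returns (naive, streamed): per-row read counts when every output row
    re-reads its m adjacent input rows, and when rows are streamed once
    through cascaded memories as in this application.
    """
    naive = [0] * M
    for top in range(M - m + 1):         # one window of m rows per output row
        for r in range(top, top + m):
            naive[r] += 1
    streamed = [1] * M                   # each row is read exactly once
    return naive, streamed


naive, streamed = row_read_counts(M=5, m=3)
print(naive, sum(naive))        # [1, 2, 3, 2, 1] 9
print(streamed, sum(streamed))  # [1, 1, 1, 1, 1] 5
```

Under that assumption, interior rows are read up to m times in the naive scheme, whereas the cascaded-memory scheme reads every row once regardless of m.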
Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the application being indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.
The foregoing describes only preferred embodiments of the present application and is not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (3)

1. A matrix convolution calculation module implemented on an FPGA, wherein the input matrix is an M*N matrix, the convolver matrix is an m*n matrix, M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N; characterized in that the convolution calculation module comprises: m*n registers, m*n multipliers, one adder tree and (m-1) memories;
the outputs of the m*n multipliers are connected to the inputs of the adder tree;
the inputs of any one of the m*n multipliers are an element of the convolver matrix and the value stored in a register; the m*n multipliers are organized as m groups of multipliers, each group comprising n multipliers; the m*n registers are organized as m groups of registers, each group comprising n registers; the m*n registers correspond one-to-one to the m*n multipliers;
among the n registers corresponding to the i-th group of multipliers of the m groups of multipliers, the convolver-matrix element corresponding to the first register is the element in row (i-1), column (n-1) of the convolver matrix, and the convolver-matrix element corresponding to the p-th register is the element in row (i-1), column (n-p) of the convolver matrix; wherein p is an integer greater than or equal to 2 and less than or equal to n, and i is an integer greater than or equal to 1 and less than or equal to m;
(m-1) groups of registers among the m groups of registers correspond one-to-one to the (m-1) memories, and the output of a first memory is connected to the input of the first register in a first group of registers; wherein the first memory is any one of the (m-1) memories, and the first group of registers is the group of registers corresponding to the first memory; the input of the one group of registers among the m groups other than the (m-1) groups is connected to the output of the external storage device that stores the input data; wherein the input of a group of registers refers to the input of the first register in that group;
the storage size of each of the (m-1) memories is N;
the n registers in any one group of the m groups of registers are connected in cascade; on each clock, the value of the j-th register of the n registers is updated to the value of the (j-1)-th register, and the value of the first register of the n registers is the value read from the memory, wherein j is an integer greater than 1 and less than or equal to n;
the (m-1) memories form a first-in, first-out storage structure, the data output of the k-th memory being connected to the data input of the (k-1)-th memory, wherein k is less than or equal to (m-1).
2. The matrix convolution calculation module according to claim 1, characterized in that the memory comprises:
a first-in, first-out queue (FIFO) inside the FPGA or a random access memory (RAM) inside the FPGA.
3. A method for implementing matrix convolution using the matrix convolution calculation module according to claim 1, wherein the input matrix is an M*N matrix, the convolver matrix is an m*n matrix, M, N, m and n are positive integers, m is less than or equal to M, and n is less than or equal to N; characterized in that the method comprises:
reading input data from an external storage device and, if the data-full signal of the first memory is detected, starting the multipliers after waiting n clock cycles.
CN201811101509.XA 2018-09-20 2018-09-20 Matrix convolution calculating device and matrix convolution calculating method Active CN109284475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811101509.XA CN109284475B (en) 2018-09-20 2018-09-20 Matrix convolution calculating device and matrix convolution calculating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811101509.XA CN109284475B (en) 2018-09-20 2018-09-20 Matrix convolution calculating device and matrix convolution calculating method

Publications (2)

Publication Number Publication Date
CN109284475A (en) 2019-01-29
CN109284475B (en) 2021-10-29

Family

ID=65181844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811101509.XA Active CN109284475B (en) 2018-09-20 2018-09-20 Matrix convolution calculating device and matrix convolution calculating method

Country Status (1)

Country Link
CN (1) CN109284475B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row
CN110648313A (en) * 2019-09-05 2020-01-03 北京智行者科技有限公司 Laser stripe center line fitting method based on FPGA
CN111240746A (en) * 2020-01-12 2020-06-05 苏州浪潮智能科技有限公司 Floating point data inverse quantization and quantization method and equipment
CN112612447A (en) * 2020-12-31 2021-04-06 安徽芯纪元科技有限公司 Matrix calculator and full-connection-layer calculation method based on matrix calculator
CN113536221A (en) * 2020-04-21 2021-10-22 中科寒武纪科技股份有限公司 Operation method, processor and related product
CN113986200A (en) * 2021-10-29 2022-01-28 上海阵量智能科技有限公司 Matrix transposition circuit, artificial intelligence chip and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4937774A (en) * 1988-11-03 1990-06-26 Harris Corporation East image processing accelerator for real time image processing applications
US20120051406A1 (en) * 2010-08-25 2012-03-01 Qualcomm Incorporated Circuit and method for computing circular convolution in streaming mode
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof
CN106250103A (en) * 2016-08-04 2016-12-21 东南大学 A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
CN106844294A (en) * 2016-12-29 2017-06-13 华为机器有限公司 Convolution algorithm chip and communication equipment
CN107341544A (en) * 2017-06-30 2017-11-10 清华大学 A kind of reconfigurable accelerator and its implementation based on divisible array
CN107656899A (en) * 2017-09-27 2018-02-02 深圳大学 A kind of mask convolution method and system based on FPGA

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097174A (en) * 2019-04-22 2019-08-06 西安交通大学 Preferential convolutional neural networks implementation method, system and device are exported based on FPGA and row
CN110648313A (en) * 2019-09-05 2020-01-03 北京智行者科技有限公司 Laser stripe center line fitting method based on FPGA
CN110648313B (en) * 2019-09-05 2022-05-24 北京智行者科技有限公司 Laser stripe center line fitting method based on FPGA
CN111240746A (en) * 2020-01-12 2020-06-05 苏州浪潮智能科技有限公司 Floating point data inverse quantization and quantization method and equipment
CN111240746B (en) * 2020-01-12 2023-01-10 苏州浪潮智能科技有限公司 Floating point data inverse quantization and quantization method and equipment
CN113536221A (en) * 2020-04-21 2021-10-22 中科寒武纪科技股份有限公司 Operation method, processor and related product
CN113536221B (en) * 2020-04-21 2023-12-15 中科寒武纪科技股份有限公司 Operation method, processor and related products
CN112612447A (en) * 2020-12-31 2021-04-06 安徽芯纪元科技有限公司 Matrix calculator and full-connection-layer calculation method based on matrix calculator
CN112612447B (en) * 2020-12-31 2023-12-08 安徽芯纪元科技有限公司 Matrix calculator and full-connection layer calculating method based on same
CN113986200A (en) * 2021-10-29 2022-01-28 上海阵量智能科技有限公司 Matrix transposition circuit, artificial intelligence chip and electronic equipment

Also Published As

Publication number Publication date
CN109284475B (en) 2021-10-29

Similar Documents

Publication Publication Date Title
CN109284475A (en) A kind of matrix convolution computing module and matrix convolution calculation method
CN106445471B (en) Processor and the method for performing matrix multiplication on a processor
CN107341544B (en) Reconfigurable accelerator based on divisible array and implementation method thereof
CN106875013B (en) System and method for multi-core optimized recurrent neural networks
CN108304923A (en) Convolution algorithm processing method and Related product
CN109101272A (en) Processing with Neural Network device and its method for executing matrix multiple instruction
CN107239824A (en) Apparatus and method for realizing sparse convolution neutral net accelerator
CN110543939B (en) Hardware acceleration realization device for convolutional neural network backward training based on FPGA
CN109948774A (en) Neural network accelerator and its implementation based on network layer binding operation
CN103955447B (en) FFT accelerator based on DSP chip
CN109032670A (en) Processing with Neural Network device and its method for executing vector duplicate instructions
CN105589677A (en) Systolic structure matrix multiplier based on FPGA (Field Programmable Gate Array) and implementation method thereof
CN109711533A (en) Convolutional neural networks module based on FPGA
CN109146065B (en) Convolution operation method and device for two-dimensional data
CN110580519B (en) Convolution operation device and method thereof
WO2018027706A1 (en) Fft processor and algorithm
WO2023065983A1 (en) Computing apparatus, neural network processing device, chip, and data processing method
WO2023197526A1 (en) Data processing method and apparatus, electronic device, and readable storage medium
CN115423081A (en) Neural network accelerator based on CNN _ LSTM algorithm of FPGA
CN108073549A (en) Convolution algorithm device and method
CN115310037A (en) Matrix multiplication computing unit, acceleration unit, computing system and related method
CN113869507B (en) Neural network accelerator convolution calculation device and method based on pulse array
CN116720549A (en) FPGA multi-core two-dimensional convolution acceleration optimization method based on CNN input full cache
CN113222129B (en) Convolution operation processing unit and system based on multi-level cache cyclic utilization
CN113301221B (en) Image processing method of depth network camera and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant