CN109740116A - Circuit for implementing sparse matrix multiplication, and FPGA board - Google Patents
- Publication number
- CN109740116A (CN201910016870.0A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- nonzero element
- random access
- access memory
- flag
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Landscapes
- Complex Calculations (AREA)
Abstract
The invention discloses a circuit for implementing sparse matrix multiplication, comprising multiple modules. Through cooperation among the modules, the circuit filters out the nonzero elements of the sparse matrices during multiplication and, during the calculation, performs a bitwise AND operation on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits. According to the target flag bits, the nonzero elements that actually need to participate in the operation are then selected from among the nonzero elements. This avoids computation on the large number of zero elements and on nonzero elements that do not need to participate in the operation, and significantly improves the efficiency of sparse matrix multiplication. In addition, the invention also provides an FPGA board whose effects correspond to those of the above circuit.
Description
Technical field
The present invention relates to the field of hardware design, and in particular to a circuit for implementing sparse matrix multiplication and an FPGA board.
Background art
Sparse matrices arise in almost all fields of large-scale scientific and engineering computing, including currently popular fields such as machine learning, big data, and image processing. In a sparse matrix, the number of nonzero elements is far smaller than the total number of matrix elements, and the nonzero elements are distributed irregularly. If a sparse matrix is processed with conventional matrix calculation methods, a great deal of storage space is wasted and the calculation speed is greatly reduced. In scenarios with high requirements on processing speed, software processing is slow and cannot meet real-time processing requirements.
Summary of the invention
The object of the present invention is to provide a circuit for implementing sparse matrix multiplication and an FPGA board, so as to solve the problem that sparse matrix multiplication performed in software is slow and cannot meet real-time processing requirements.
To solve the above technical problem, the present invention provides a circuit for implementing sparse matrix multiplication, comprising: a detection module, a first random access memory, a second random access memory, a first flag bit generation module, a second flag bit generation module, a third random access memory, a fourth random access memory, an address generation module, a controller, an accumulator, and an output module;
wherein the detection module is used to detect whether each element of the first matrix, which is input by row, is a nonzero element, and to store the nonzero elements of the first matrix in the first random access memory; it is also used to detect whether each element of the second matrix, which is input by column, is a nonzero element, and to store the nonzero elements of the second matrix in the second random access memory;
the first flag bit generation module is used to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory, and to store the column flag bits in the third random access memory; the second flag bit generation module is used to generate the row flag bits of each column of the second matrix according to the row numbers of the nonzero elements in the second random access memory, and to store the row flag bits in the fourth random access memory;
the address generation module is used to perform a bitwise AND operation on the column flag bits and the row flag bits to obtain target flag bits;
the controller is used to read multiple pairs of nonzero elements from the first random access memory and the second random access memory respectively according to the target flag bits, and to multiply each pair of nonzero elements read, obtaining multiple first operation results;
the accumulator is used to accumulate the multiple first operation results to obtain a second operation result, and to store the second operation result in the output module;
the output module is used to organize the second operation results to obtain the matrix multiplication result of the first matrix and the second matrix.
Optionally, the address generation module is specifically used to: in response to an enable instruction sent by the controller, obtain the column flag bits and the row flag bits from the third random access memory and the fourth random access memory, and perform a bitwise AND operation on the column flag bits and the row flag bits to obtain the target flag bits, wherein the enable instruction is generated and sent by the controller when it reads, row by row, the nonzero elements stored in the first random access memory;
the controller is specifically used to: read nonzero elements from the second random access memory according to the target flag bits, combine them with the nonzero elements read from the first random access memory to form multiple pairs of nonzero elements, and multiply each pair of nonzero elements, obtaining multiple first operation results.
Optionally, the controller is specifically used to:
determine, according to the target flag bits, the target address in the second random access memory of each nonzero element participating in the current multiplication, and read the nonzero element from the second random access memory according to the target address.
Optionally, the controller is also used to:
determine, according to the target flag bits, the nonzero elements that are located in the same column as the nonzero elements participating in the current multiplication but do not themselves participate in it, and delete them.
Optionally, the circuit further comprises:
an interrupt signal generation module, used to generate an interrupt signal upon detecting that no unread nonzero element remains in the first random access memory or the second random access memory, so as to notify the front-end module that the matrix multiplication of the first matrix and the second matrix has been completed.
Optionally, the first flag bit generation module is specifically used to:
obtain the nonzero elements in the first random access memory one by one;
judge whether the row number of the nonzero element is the same as the row number of the previous nonzero element;
if they are the same, update the column flag bits of the current row according to the column number of the nonzero element;
if they are not the same, store the value of the column flag bits of the current row in the third random access memory, initialize the flag bits, and update the initialized flag bits according to the nonzero element.
Optionally, the detection module is specifically used to:
store the value, row number, and column number of each nonzero element of the first matrix in the first random access memory, and store the value, row number, and column number of each nonzero element of the second matrix in the second random access memory.
In addition, the present invention also provides an FPGA board, comprising an FPGA board body and further comprising a circuit for implementing sparse matrix multiplication as described above.
The circuit for implementing sparse matrix multiplication provided by the present invention comprises multiple modules. Through cooperation among the modules, the circuit filters out the nonzero elements of the sparse matrices during multiplication and, during the calculation, performs a bitwise AND operation on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits. According to the target flag bits, the nonzero elements that actually need to participate in the operation are then selected from among the nonzero elements. This avoids computation on the large number of zero elements and on nonzero elements that do not need to participate in the operation, and significantly improves the efficiency of sparse matrix multiplication.
In addition, the present invention also provides an FPGA board whose effects correspond to those of the above circuit and are not repeated here.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a structural schematic diagram of embodiment one of a circuit for implementing sparse matrix multiplication provided by the present invention;
Fig. 2 is a calculation flow chart of embodiment one of a circuit for implementing sparse matrix multiplication provided by the present invention;
Fig. 3 is a structural schematic diagram of embodiment two of a circuit for implementing sparse matrix multiplication provided by the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a circuit for implementing sparse matrix multiplication and an FPGA board that avoid computation on the large number of zero elements and on nonzero elements that do not need to participate in the operation, significantly improving the efficiency of sparse matrix multiplication.
To enable those skilled in the art to better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment one of a circuit for implementing sparse matrix multiplication provided by the present invention is introduced below. Referring to Fig. 1, embodiment one comprises: a detection module 101, a first random access memory 102, a second random access memory 103, a first flag bit generation module 104, a second flag bit generation module 105, a third random access memory 106, a fourth random access memory 107, an address generation module 108, a controller 109, an accumulator 110, and an output module 111. The functions of the modules are as follows:
The detection module 101 is used to detect whether each element of the first matrix, which is input by row, is a nonzero element, and to store the nonzero elements of the first matrix in the first random access memory 102; it is also used to detect whether each element of the second matrix, which is input by column, is a nonzero element, and to store the nonzero elements of the second matrix in the second random access memory 103;
The first flag bit generation module 104 is used to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory 102, and to store the column flag bits in the third random access memory 106; the second flag bit generation module 105 is used to generate the row flag bits of each column of the second matrix according to the row numbers of the nonzero elements in the second random access memory 103, and to store the row flag bits in the fourth random access memory 107;
The address generation module 108 is used to perform a bitwise AND operation on the column flag bits and the row flag bits to obtain target flag bits;
The controller 109 is used to read multiple pairs of nonzero elements from the first random access memory 102 and the second random access memory 103 respectively according to the target flag bits, and to multiply each pair of nonzero elements read, obtaining multiple first operation results;
The accumulator 110 is used to accumulate the multiple first operation results to obtain a second operation result, and to store the second operation result in the output module 111;
The output module 111 is used to organize the second operation results to obtain the matrix multiplication result of the first matrix and the second matrix.
It should be noted that the above first matrix and second matrix are sparse matrices, i.e. the number of nonzero elements in each matrix is far smaller than the number of zero elements, and the nonzero elements are distributed irregularly.
The column flag bits of a row of the first matrix are a parameter marking the positions of the nonzero elements in that row. Assuming that the first matrix has 8 columns in total, the column flag bits of each row are an 8-bit binary number. As an optional implementation, a "1" in the column flag bits indicates that the corresponding element of the row is a nonzero element, and a "0" indicates that the corresponding element of the row is a zero element. Then, assuming that the column flag bits of the second row of the first matrix are 10010011, the elements in columns 1, 4, 7, and 8 of the second row of the first matrix are nonzero elements and the remaining elements are zero elements. Similarly, the row flag bits of a column of the second matrix are a parameter marking the positions of the nonzero elements in that column.
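The flag encoding described above can be illustrated with a minimal sketch. The function name `column_flag_bits` and the sample row values are assumptions chosen for illustration; only the flag pattern 10010011 and the left-to-right reading (first character corresponds to column 1) come from the example in the text.

```python
def column_flag_bits(row):
    """Return one character per column: '1' for a nonzero element, '0' for a zero element."""
    return "".join("1" if value != 0 else "0" for value in row)


# Hypothetical second row of an 8-column first matrix (values are illustrative only).
row2 = [5, 0, 0, 2, 0, 0, 7, 1]

flags = column_flag_bits(row2)

# Recover the 1-based column numbers of the nonzero elements from the flag string.
nonzero_columns = [i + 1 for i, bit in enumerate(flags) if bit == "1"]
```

Here `flags` is "10010011" and `nonzero_columns` is [1, 4, 7, 8], matching the example above.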
The above bitwise AND operation means that the two operands are ANDed bit by bit in binary. The operation rules are: 0&0=0, 0&1=0, 1&0=0, 1&1=1; that is, the result bit is "1" only when both bits are "1", and otherwise the result bit is "0".
It is known that matrix multiplication is a basic mathematical operation, defined as follows: let A be an m × p matrix and B be a p × n matrix; then the m × n matrix C is called the product of matrix A and matrix B, denoted C=AB, where the element in the i-th row and j-th column of matrix C can be expressed as:
$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$$
As can be seen, computing an element c_ij of matrix C requires multiplying each element of the i-th row of matrix A by the corresponding element of the j-th column of matrix B and then adding the products. Therefore, by performing a bitwise AND operation on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix, the present embodiment can weed out the nonzero elements that would be multiplied by zero elements. In other words, the target flag bits obtained by the bitwise AND operation determine which nonzero elements of the first matrix and the second matrix actually need to be multiplied, avoiding multiplications between nonzero elements and zero elements.
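As a minimal software sketch of this idea (not the circuit itself), the following computes one element c_ij from a row of A and a column of B, multiplying only at positions where the target flag bit is 1. The helper names `flag_word` and `sparse_dot`, the choice that bit k of the flag word corresponds to index k, and the sample values are all assumptions made for illustration.

```python
def flag_word(values):
    """Build an integer flag word: bit k is 1 when values[k] is nonzero."""
    word = 0
    for k, value in enumerate(values):
        if value != 0:
            word |= 1 << k
    return word


def sparse_dot(a_row, b_col):
    """Dot product that multiplies only where both operands are nonzero."""
    target = flag_word(a_row) & flag_word(b_col)  # bitwise AND gives the target flag bits
    total = 0
    products = 0  # number of multiplications actually performed
    k = 0
    while target:
        if target & 1:
            total += a_row[k] * b_col[k]
            products += 1
        target >>= 1
        k += 1
    return total, products


a_row = [3, 0, 0, 4, 0, 2, 0, 0]  # nonzero at indices 0, 3, 5
b_col = [0, 5, 0, 6, 0, 0, 7, 0]  # nonzero at indices 1, 3, 6
result, n_products = sparse_dot(a_row, b_col)
```

Only index 3 is nonzero in both operands, so a single multiplication (4 × 6) is performed instead of eight.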
Based on the circuit for implementing sparse matrix multiplication provided by the above embodiment, the process by which the circuit performs sparse matrix multiplication is described below. In this process, the first matrix participating in the matrix multiplication is called the A matrix, the second matrix is called the B matrix, and the product of the A matrix and the B matrix is the C matrix. As shown in Fig. 2, the process specifically includes:
Step S101: the A matrix is input by row and the B matrix is input by column. The detection module 101 performs nonzero detection on the elements of the A matrix and the B matrix, and the first random access memory 102 and the second random access memory 103 store the nonzero elements of the A matrix and the B matrix respectively.
Step S102: the first flag bit generation module 104 generates column flag bits according to the positions of the nonzero elements in each row of the A matrix and stores the column flag bits in the third random access memory 106; the second flag bit generation module 105 generates row flag bits according to the positions of the nonzero elements in each column of the B matrix and stores the row flag bits in the fourth random access memory 107.
Step S103: the address generation module 108 fetches the column flag bits from the third random access memory 106 and the row flag bits from the fourth random access memory 107, and performs a bitwise AND operation on the column flag bits and the row flag bits to obtain the target flag bits.
Step S104: the controller 109 selects elements from the first random access memory 102 and the second random access memory 103 according to the target flag bits, multiplies them, and sends the products to the accumulator 110.
Step S105: the accumulator 110 accumulates the products to obtain one element of the C matrix.
Step S106: the output module 111 organizes the elements of the C matrix obtained by the accumulator 110 to obtain the final C matrix.
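Steps S101 to S106 can be modeled in software as the following sketch. It is a behavioral illustration under stated assumptions, not a description of the hardware: the four memories are modeled as Python dictionaries and lists, indices are 0-based, bit k of a flag word corresponds to inner index k, and the function name `sparse_matmul` is hypothetical.

```python
def sparse_matmul(A, B):
    """Behavioral model of steps S101-S106 for an m x p matrix A and a p x n matrix B."""
    m, p, n = len(A), len(B), len(B[0])

    # S101: A is scanned by row, B by column; only nonzero elements are kept.
    ram1 = {(i, k): A[i][k] for i in range(m) for k in range(p) if A[i][k] != 0}
    ram2 = {(k, j): B[k][j] for j in range(n) for k in range(p) if B[k][j] != 0}

    # S102: column flag bits per row of A, row flag bits per column of B.
    col_flags = [0] * m  # models the third random access memory
    for (i, k) in ram1:
        col_flags[i] |= 1 << k
    row_flags = [0] * n  # models the fourth random access memory
    for (k, j) in ram2:
        row_flags[j] |= 1 << k

    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # S103: bitwise AND yields the target flag bits.
            target = col_flags[i] & row_flags[j]
            acc = 0
            for k in range(p):
                if (target >> k) & 1:
                    # S104 and S105: multiply the selected pair and accumulate.
                    acc += ram1[(i, k)] * ram2[(k, j)]
            # S106: place the accumulated element into the result matrix.
            C[i][j] = acc
    return C


A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
B = [[0, 4], [5, 0], [0, 6]]
C = sparse_matmul(A, B)
```

Because a dictionary lookup happens only where the target flag bit is 1, the model never touches a zero element or an unmatched nonzero element.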
As can be seen, the present embodiment provides a circuit for implementing sparse matrix multiplication that comprises multiple modules. Through cooperation among the modules, the circuit filters out the nonzero elements of the sparse matrices during multiplication and, during the calculation, performs a bitwise AND operation on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits. According to the target flag bits, the nonzero elements that actually need to participate in the operation are then selected from among the nonzero elements. This avoids computation on the large number of zero elements and on nonzero elements that do not need to participate in the operation, and significantly improves the efficiency of sparse matrix multiplication.
Embodiment two of a circuit for implementing sparse matrix multiplication provided by the present invention is described in detail below. Embodiment two is implemented on the basis of embodiment one and extends it to a certain extent.
As shown in Fig. 3, embodiment two specifically comprises: a detection module 101, a first random access memory 102, a second random access memory 103, a first flag bit generation module 104, a second flag bit generation module 105, a third random access memory 106, a fourth random access memory 107, an address generation module 108, a controller 109, an accumulator 110, and an output module 111. In addition, the present embodiment further comprises an interrupt signal generation module 112.
The interrupt signal generation module 112 is used to generate an interrupt signal upon detecting that no unread nonzero element remains in the first random access memory 102 or the second random access memory 103, so as to notify the front-end module that the matrix multiplication of the first matrix and the second matrix has been completed. For the basic functions of the remaining modules, refer to the description in embodiment one, which is not repeated here. The parts in which the present embodiment extends embodiment one are described below:
As an optional implementation, in the present embodiment the address generation module 108 does not read the column flag bits and row flag bits and perform the bitwise AND operation on its own initiative; instead, it performs these operations in response to an instruction sent by the controller 109. In other words, during the matrix multiplication, the controller 109 reads nonzero elements from the first random access memory 102 or the second random access memory 103 according to certain rules and sends a corresponding enable instruction to the address generation module 108 to trigger the subsequent operations. Assuming that the controller 109 triggers the subsequent operations by reading the nonzero elements of the first random access memory 102, the modules divide the work as follows:
The address generation module 108 is specifically used to: in response to an enable instruction sent by the controller 109, obtain the column flag bits and the row flag bits from the third random access memory 106 and the fourth random access memory 107, and perform a bitwise AND operation on the column flag bits and the row flag bits to obtain the target flag bits, wherein the enable instruction is generated and sent by the controller 109 when it reads, row by row, the nonzero elements stored in the first random access memory;
The controller 109 is specifically used to: read nonzero elements from the second random access memory 103 according to the target flag bits, combine them with the nonzero elements read from the first random access memory 102 to form multiple pairs of nonzero elements, and multiply each pair of nonzero elements, obtaining multiple first operation results.
As an optional implementation, after obtaining the target flag bits, the controller also needs to determine, according to the target flag bits, the address of each nonzero element to be read, and then read that element from the random access memory. Assuming that the controller 109, after reading an element from the first random access memory 102, needs to determine according to the target flag bits which nonzero elements to read from the second random access memory 103, the process is specifically as follows:
The controller 109 is specifically used to: determine, according to the target flag bits, the target address in the second random access memory of each nonzero element participating in the current multiplication, and read the nonzero element from the second random access memory according to the target address.
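The text does not specify how the target address is derived from the target flag bits. One possible scheme, shown here purely as an assumption, is that the nonzero elements of each column of the second matrix are stored contiguously in ascending row order starting at a known base address; the offset of an element is then the number of 1 bits in the row flag word below its position. The function name `target_addresses` and the parameter `base_addr` are hypothetical.

```python
def target_addresses(row_flag, target_flag, base_addr):
    """Return the addresses of the elements selected by target_flag.

    Assumes (hypothetically) that the nonzero elements of one column are stored
    contiguously from base_addr in ascending row order, and that bit k of the
    flag words corresponds to row index k.
    """
    addresses = []
    k = 0
    remaining = target_flag
    while remaining:
        if remaining & 1:
            # Count the stored nonzero elements that precede row k in this column.
            below = row_flag & ((1 << k) - 1)
            addresses.append(base_addr + bin(below).count("1"))
        remaining >>= 1
        k += 1
    return addresses


row_flag = 0b01001010  # column has nonzero elements at rows 1, 3, 6
one_hit = target_addresses(row_flag, 0b00001000, base_addr=10)
two_hits = target_addresses(row_flag, 0b01000010, base_addr=10)
```

With this layout the element at row 3 is the second stored element of the column, so it sits at address 11, while rows 1 and 6 map to addresses 10 and 12.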
As an optional implementation, the nonzero elements in the first random access memory 102 and the second random access memory 103 that do not need to participate in the operation can be deleted. That is, once the nonzero elements of a certain column of the second matrix that need to participate in the operation have been determined according to the target flag bits, the remaining nonzero elements of that column can be deleted. The specific process is as follows:
The controller 109 is also used to: determine, according to the target flag bits, the nonzero elements that are located in the same column as the nonzero elements participating in the current multiplication but do not themselves participate in it, and delete them.
As an optional implementation, because the first flag bit generation module 104 and the second flag bit generation module 105 need to generate the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix respectively, each module needs to judge during operation whether a set of column flag bits or row flag bits has been completely generated. The implementation of this process is described below for the first flag bit generation module 104; the second flag bit generation module 105 can be implemented analogously. The process includes:
The first flag bit generation module 104 is specifically used to: obtain the nonzero elements in the first random access memory 102 one by one; judge whether the row number of the nonzero element is the same as the row number of the previous nonzero element; if they are the same, update the column flag bits of the current row according to the column number of the nonzero element; if they are not the same, store the value of the column flag bits of the current row in the third random access memory 106, initialize the flag bits, and update the initialized flag bits according to the nonzero element.
As an optional implementation, the detection module 101 can detect not only the nonzero elements but also their row numbers and column numbers, and can save each nonzero element together with its row number and column number in the corresponding random access memory. The process is specifically as follows:
The detection module 101 is specifically used to: store the value, row number, and column number of each nonzero element of the first matrix in the first random access memory 102, and store the value, row number, and column number of each nonzero element of the second matrix in the second random access memory 103.
Based on the circuit for implementing sparse matrix multiplication provided by the above embodiment, the process by which the circuit performs sparse matrix multiplication is described in detail below. In this process, the first matrix participating in the matrix multiplication is called the A matrix and the second matrix is called the B matrix. The process specifically includes:
Step S201: a driver module or preceding-stage module issues a pair of sparse matrices A and B. It should be noted that the elements of the A matrix are issued by row and the elements of the B matrix are issued by column; this transmission order can speed up the calculation of the sparse matrix product.
Step S202: the detection module 101 is responsible for nonzero detection and for row number and column number detection on the elements of the A matrix and the B matrix; it discards elements whose value is 0 and retains nonzero elements.
Specifically, step S202 can be divided into the following steps:
Step S2021: for matrix A, the first m*p data items of the DMA transfer are the elements of the A matrix. Nonzero elements are saved in Ram_0, whose width is RAM_DEPTH, together with the row number and column number of the current nonzero element. Here A_data_num is the input sequence number of the current A matrix element.
Row number A_row_num=(A_data_num/p);
Column number A_column_num=(A_data_num%p);
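The two formulas above amount to an integer division and a remainder. A minimal sketch follows, assuming that A_data_num counts from 0 and that "/" denotes integer division (neither is stated explicitly in the text); the function name `row_and_column` is hypothetical.

```python
def row_and_column(a_data_num, p):
    """Return (A_row_num, A_column_num) for the element with sequence number a_data_num.

    Assumes 0-based sequence numbers and integer division, with p columns per row.
    """
    return divmod(a_data_num, p)


# With p = 8 columns, the element with sequence number 19 lies in row 2, column 3.
a_row_num, a_column_num = row_and_column(19, 8)
```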
The high ROW_WIDTH bits of Ram_0, addr[RAM_DEPTH-1:RAM_DEPTH-ROW_WIDTH], store the row number A_row_num; the middle COLUMN_WIDTH bits of Ram_0, addr[(RAM_DEPTH-ROW_WIDTH-1):(RAM_DEPTH-ROW_WIDTH-COLUMN_WIDTH)], store the column number A_column_num; the low DATA_WIDTH bits of Ram_0, addr[DATA_WIDTH-1:0], store the nonzero element of A.
Here, RAM_WIDTH denotes the width of the RAM that stores the nonzero elements of the A matrix and the B matrix; ROW_WIDTH denotes the bit width used in that RAM to store the row number; COLUMN_WIDTH denotes the bit width used in that RAM to store the column number; and DATA_WIDTH denotes the bit width used in that RAM to store the nonzero value. It should be noted that RAM_WIDTH, ROW_WIDTH, COLUMN_WIDTH, and DATA_WIDTH satisfy the following relationship: RAM_WIDTH=ROW_WIDTH+COLUMN_WIDTH+DATA_WIDTH.
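The word layout described above (row number in the high bits, column number in the middle bits, value in the low bits) can be sketched as follows. The concrete widths of 4, 4, and 8 bits and the function names `pack_word` and `unpack_word` are assumptions chosen for illustration; the text does not give actual widths.

```python
# Hypothetical field widths; the text only requires that they sum to RAM_WIDTH.
ROW_WIDTH = 4
COLUMN_WIDTH = 4
DATA_WIDTH = 8
RAM_WIDTH = ROW_WIDTH + COLUMN_WIDTH + DATA_WIDTH


def pack_word(row_num, column_num, value):
    """Pack row number, column number, and nonzero value into one RAM word."""
    assert 0 <= row_num < (1 << ROW_WIDTH)
    assert 0 <= column_num < (1 << COLUMN_WIDTH)
    assert 0 <= value < (1 << DATA_WIDTH)
    return (row_num << (COLUMN_WIDTH + DATA_WIDTH)) | (column_num << DATA_WIDTH) | value


def unpack_word(word):
    """Split a RAM word back into (row number, column number, value)."""
    value = word & ((1 << DATA_WIDTH) - 1)
    column_num = (word >> DATA_WIDTH) & ((1 << COLUMN_WIDTH) - 1)
    row_num = word >> (COLUMN_WIDTH + DATA_WIDTH)
    return row_num, column_num, value


word = pack_word(3, 4, 0x5A)
fields = unpack_word(word)
```

With these widths, row 3, column 4, and value 0x5A pack into the 16-bit word 0x345A. Signed or wider data values would need a different DATA_WIDTH and encoding.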
At the same time, the total number of nonzero elements over all rows of the A matrix is accumulated, giving A_DATA_NO_0_SUM.
Step S2022: the first flag bit generation module 104 is used to collect the column flag bits of the nonzero elements of each row of the A matrix. The specific method is as follows:
A_No_0_flag has a bit width of p (the number of columns of the A matrix) and is initialized to 0. While the row number of the element A input to the first flag bit generation module 104 remains unchanged (A_row_num unchanged; at this point the A data have already passed through the detection module 101, so the row number and column number are available), the value of A_No_0_flag is changed according to the column number A_column_num. If the row number of element A changes (A_row_num changes), A_No_0_flag is written to Ram_2 and reset to 0, and its value is then changed according to the column number A_column_num of the A data.
Specifically, the bit of A_No_0_flag corresponding to the value of A_column_num is set to 1. For example, let the number of columns of the A matrix be p=8, i.e. A has 8 columns.
A_row_num=1, the A_column_num=1 of current A matrix element, then A_No_0_flag=8 ' b 0000_
0001。
After inputting an element A again, A_row_num=1, A_column_num=3, then A_No_0_flag=8 '
b0000_0101;
After inputting an element A again, A_row_num=1, A_column_num=5, then A_No_0_flag=8 ' b
0001_0101;
After inputting an element A again, A_row_num=3, A_column_num=4, the line number (A_row_ of A data at this time
Num it) changes and Ram_2 then is written into A_No_0_flag=8 ' b0001_0101 first, then set A_No_0_flag=
A_No_0_flag=8 ' b 0000_1000 is arranged further according to A_column_num=4 in 8 ' b 0000_0000;
After inputting an element A again, A_row_num=3, A_column_num=7, then A_No_0_flag=8 '
b0100_1000。
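The flag update of step S2022 can be sketched in software as follows. This is a minimal Python model rather than the circuit itself: the function name row_flags, the list standing in for Ram_2, and the base parameter are hypothetical. The example above numbers columns from 1, so column c is mapped to bit c - 1. The text does not say when the flag of the last row is written out, so the final flush is an assumption.

```python
def row_flags(elements, p, base=1):
    """Return [(row_num, flag), ...] for (row_num, column_num) pairs grouped by row.

    base is the number given to the first column (1 in the example above);
    it is an assumption of this sketch, not a parameter named in the text.
    """
    ram_2 = []          # stands in for Ram_2
    flag = 0            # A_No_0_flag, initialised to 0
    current_row = None
    for row_num, column_num in elements:
        if current_row is not None and row_num != current_row:
            ram_2.append((current_row, flag))   # row changed: write the flag out
            flag = 0                            # then reset it
        current_row = row_num
        bit = column_num - base
        if not 0 <= bit < p:
            raise ValueError(f"column {column_num} does not fit in {p} bits")
        flag |= 1 << bit
    if current_row is not None:
        ram_2.append((current_row, flag))       # assumed flush of the last row
    return ram_2


# The five A elements of the example, as (A_row_num, A_column_num) pairs.
stream = [(1, 1), (1, 3), (1, 5), (3, 4), (3, 7)]
for row_num, flag in row_flags(stream, p=8):
    print(row_num, format(flag, "08b"))
```

For the example stream this yields the flag 0001_0101 for row 1 and 0100_1000 for row 3, the same values as in the text.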
Step S2023: for matrix B, the last p*n elements of the DMA transfer are the elements of the B matrix. Each nonzero element of matrix B is saved in Ram_1, whose width is RAM_DEPTH, together with the row number and column number of that nonzero datum (note that because the B matrix is input by column, the B matrix at this point is the transpose of B).
Here B_data_num is the input count of the current B datum.
Row number: B_row_num = (B_data_num / p);
Column number: B_column_num = (B_data_num % p);
The high ROW_WIDTH address bits of Ram_1, addr[RAM_DEPTH-1 : RAM_DEPTH-ROW_WIDTH], store the row number B_row_num; the middle COLUMN_WIDTH address bits of Ram_1, addr[(RAM_DEPTH-ROW_WIDTH-1) : (RAM_DEPTH-ROW_WIDTH-COLUMN_WIDTH)], store the column number B_column_num; the low DATA_WIDTH address bits of Ram_1, addr[DATA_WIDTH-1 : 0], store the nonzero element of B.
Here, RAM_WIDTH denotes the width of the RAM that stores the nonzero elements of the A and B matrices, ROW_WIDTH denotes the bit width of the row-number field in that RAM, COLUMN_WIDTH denotes the bit width of the column-number field, and DATA_WIDTH denotes the bit width of the field that holds the nonzero value. Note that RAM_WIDTH, ROW_WIDTH, COLUMN_WIDTH and DATA_WIDTH satisfy the following relationship: RAM_WIDTH = ROW_WIDTH + COLUMN_WIDTH + DATA_WIDTH.
Meanwhile, the number of nonzero elements in all columns of the B matrix is accumulated, giving B_DATA_NO_0_SUM.
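As an illustration of step S2023, the following Python sketch models the nonzero detection and the packing of row number, column number and value into one memory word. It makes several assumptions that are not in the text: the field widths are example values, B_data_num counts from 0, the data are unsigned integers, and the helper names pack, unpack and store_nonzero are hypothetical.

```python
ROW_WIDTH, COLUMN_WIDTH, DATA_WIDTH = 4, 4, 8      # example widths, assumed here
RAM_WIDTH = ROW_WIDTH + COLUMN_WIDTH + DATA_WIDTH  # relationship stated in the text


def pack(row_num, column_num, data):
    """Put row, column and value in the high, middle and low fields of one word."""
    if not (0 <= row_num < 1 << ROW_WIDTH and
            0 <= column_num < 1 << COLUMN_WIDTH and
            0 <= data < 1 << DATA_WIDTH):
        raise ValueError("field does not fit in its bit width")
    return (row_num << (COLUMN_WIDTH + DATA_WIDTH)) | (column_num << DATA_WIDTH) | data


def unpack(word):
    """Inverse of pack: return (row_num, column_num, data)."""
    data = word & ((1 << DATA_WIDTH) - 1)
    column_num = (word >> DATA_WIDTH) & ((1 << COLUMN_WIDTH) - 1)
    row_num = word >> (COLUMN_WIDTH + DATA_WIDTH)
    return row_num, column_num, data


def store_nonzero(b_stream, p):
    """Keep only the nonzero elements of the streamed B data, tagged with indices."""
    ram_1 = []                                    # stands in for Ram_1
    for b_data_num, value in enumerate(b_stream):
        if value != 0:                            # nonzero detection
            b_row_num, b_column_num = divmod(b_data_num, p)
            ram_1.append(pack(b_row_num, b_column_num, value))
    return ram_1, len(ram_1)                      # the count plays the role of B_DATA_NO_0_SUM


words, b_data_no_0_sum = store_nonzero([0, 5, 0, 0, 7, 0, 0, 9], p=4)
print([unpack(w) for w in words], b_data_no_0_sum)
```

Storing the indices next to each value is what later allows the zero elements to be skipped without losing their position in the matrix.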
Step S2024: the second flag bit generation module 105 is used to collect the column flag bits of the nonzero elements in each row of the B matrix. The statistics process is as follows:
B_No_0_flag has a bit width of p (the number of columns of the B matrix) and is initialized to 0. While the row number of the B elements entering the second flag bit generation module 105 stays the same (B_row_num unchanged), the value of B_No_0_flag is updated according to the column number B_column_num. (By this point the B elements have passed through the detection module 101, so their row and column numbers are known.) If the row number of the B element changes (B_row_num changes), B_No_0_flag is written to the fourth random access memory 107, B_No_0_flag is reset to 0, and its value is then updated according to the column number B_column_num of the B element.
Specifically, the bit of B_No_0_flag that corresponds to the value of B_column_num is set to 1. For example, suppose the number of columns of the B matrix is p = 8, i.e. B has 8 columns.
For the current B matrix element, B_row_num = 1 and B_column_num = 1, so B_No_0_flag = 8'b0000_0010;
After another B element is input with B_row_num = 1 and B_column_num = 5, B_No_0_flag = 8'b0010_0010;
After another B element is input with B_row_num = 1 and B_column_num = 7, B_No_0_flag = 8'b1010_0010;
After another B element is input with B_row_num = 3 and B_column_num = 4, the row number of the B data (B_row_num) has changed, so B_No_0_flag = 8'b1010_0010 is first written to the fourth random access memory 107, B_No_0_flag is then reset to 8'b0000_0000, and finally B_No_0_flag is updated to 8'b0001_0000 according to B_column_num = 4;
After another B element is input with B_row_num = 3 and B_column_num = 6, B_No_0_flag = 8'b0101_0000.
Step S203: the controller 109 fetches nonzero elements from the first random access memory 102 and the second random access memory 103, and reads the column flag bits and row flag bits from the third random access memory 106 and the fourth random access memory 107 to perform logic control. This specifically includes the following steps:
Step S2031: read one row of nonzero elements from the first random access memory 102, obtaining the row number A_row_num, column number A_column_num and value A_data of the A element.
Step S2032: read the third random access memory 106 once, obtaining the column flag bits A_No_0_flag of the nonzero elements in the current row of the A matrix.
Step S2033: read the fourth random access memory 107, obtaining the row flag bits B_No_0_flag of the nonzero elements in the current column of the B matrix.
Step S2034: A_No_0_flag and B_No_0_flag are fed into the address generation module 108 to obtain the addresses, in the second random access memory 103, of the elements to be read. The aim is to read out only those nonzero elements of the current column of the B matrix that take part in the operation; nonzero elements that do not take part are not read, which speeds up the computation considerably. The specific procedure is as follows:
A bitwise AND is performed, i.e. Valid_No_0_flag = A_No_0_flag & B_No_0_flag (& denotes the bitwise AND operation). The positions of the nonzero bits in the target flag Valid_No_0_flag are then obtained and denoted addr_0, addr_1, ..., addr_Valid_No_0_cnt-1, where Valid_No_0_cnt is the number of nonzero bits in Valid_No_0_flag. Next, addr_0, addr_1, ..., addr_Valid_No_0_cnt-1 are converted into their positions among all the nonzero bits of B_No_0_flag, giving addr_B_0, addr_B_1, ..., addr_B_Valid_No_0_cnt-1. For the row of the B matrix currently being processed, the addresses of Ram_1 to be read are:
B_addr = B_No_0_sum + addr_B_0
B_addr = B_No_0_sum + addr_B_1
...
B_addr = B_No_0_sum + addr_B_Valid_No_0_cnt-1.
Finally, the second random access memory 103 is read at each B_addr obtained; each A element is multiplied by the B element that has the same column number, and the result is sent to the accumulator 110.
After Ram_1 has been read, B_No_0_sum is accumulated, i.e. B_No_0_sum = B_No_0_sum + B_No_0_cnt, where B_No_0_cnt is the number of nonzero bits in B_No_0_flag for the current row.
To aid understanding, step S2034 is illustrated with the following example:
Suppose A_No_0_flag = 8'b0101_1010 and B_No_0_flag = 8'b1101_0110, so the number of nonzero elements in this column of the B matrix is B_No_0_cnt = 5. After the bitwise AND, the target flag (denoted Valid_No_0_flag above) is C_No_0_flag = A_No_0_flag & B_No_0_flag = 8'b0101_0010, so the number of effective nonzero elements is Valid_No_0_cnt = 3. The positions of the effective nonzero elements in the target flag are addr_0 = 1, addr_1 = 4, addr_2 = 6, and their positions among the row flag bits of the B matrix are addr_B_0 = 0, addr_B_1 = 2, addr_B_2 = 3. Addresses 0, 2 and 3 of the second random access memory 103 are then read in turn, each B element read is multiplied by the corresponding A element, and the result is sent to the accumulator 110. Finally, B_No_0_sum is updated: B_No_0_sum = 0 + B_No_0_cnt = 5.
The next round then begins. Once the calculation for that column of B is complete, the fourth random access memory 107 is read again, giving the row flag bits B_No_0_flag = 8'b0100_1110 of the B matrix, so the number of nonzero elements in this column of the B matrix is B_No_0_cnt = 4. The target flag is C_No_0_flag = A_No_0_flag & B_No_0_flag = 8'b0100_1010, so the number of effective nonzero elements is Valid_No_0_cnt = 3. The positions of the effective nonzero elements in the target flag are addr_0 = 1, addr_1 = 3, addr_2 = 6, and their positions among the row flag bits of the B matrix are addr_B_0 = 0, addr_B_1 = 2, addr_B_2 = 3. Addresses (5+0), (5+2) and (5+3) of the second random access memory 103 are then read in turn, each B element read is multiplied by the corresponding A element, and the result is sent to the accumulator 110. Finally, B_No_0_sum is updated: B_No_0_sum = 5 + B_No_0_cnt = 9.
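The address calculation in the two rounds above can be reproduced with a short Python sketch. It is only a software model of the address generation module 108, and the function names set_bit_positions and generate_addresses are hypothetical. The position of a bit among the nonzero bits of B_No_0_flag is computed here by counting the 1 bits below it, which is one possible way to perform the conversion described in step S2034.

```python
def set_bit_positions(flag):
    """Positions of the 1 bits in flag, lowest bit first."""
    return [i for i in range(flag.bit_length()) if (flag >> i) & 1]


def generate_addresses(a_flag, b_flag, b_no_0_sum):
    """Return (addr, addr_B, B_addr, updated B_No_0_sum) for one row/column pair."""
    valid_flag = a_flag & b_flag                      # Valid_No_0_flag
    addr = set_bit_positions(valid_flag)              # addr_0, addr_1, ...
    # Rank of each position among the 1 bits of b_flag: count the 1 bits below it.
    addr_b = [bin(b_flag & ((1 << pos) - 1)).count("1") for pos in addr]
    b_addr = [b_no_0_sum + k for k in addr_b]         # addresses to read in Ram_1
    b_no_0_cnt = bin(b_flag).count("1")               # B_No_0_cnt
    return addr, addr_b, b_addr, b_no_0_sum + b_no_0_cnt


a_flag = 0b0101_1010
first = generate_addresses(a_flag, 0b1101_0110, 0)
second = generate_addresses(a_flag, 0b0100_1110, first[3])
print(first)
print(second)
```

With the flag values of the example, the first call gives read addresses 0, 2, 3 and an updated B_No_0_sum of 5, and the second call gives read addresses 5, 7, 8 and an updated B_No_0_sum of 9.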
Step S204: step S203 is repeated. The accumulator 110 finally processes the results as follows:
When B_row_num of the B elements read from the second random access memory 103 is unchanged (the B elements still belong to the same column), Value = Value + A_data * B_data, and the row number of Value is recorded as Value_row_num = A_row_num and its column number as Value_column_num = B_row_num;
When B_row_num of the B elements read from the second random access memory 103 changes (a new row of B data begins), the value of Value, together with Value_row_num and Value_column_num, is written to the output module 111, and Value is then reset to 0.
Here, Value denotes the product result of the A and B matrices, Value_row_num denotes the row number of that product result in the result matrix, and Value_column_num denotes the column number of that product result in the result matrix.
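The accumulator behaviour can be modelled with the Python sketch below. The function name accumulate and the tuple layout of its input are hypothetical. Two points go beyond the text and are assumptions of this sketch: the running sum is also flushed when A_row_num changes, so that products from different rows of A are never merged when they happen to share a B_row_num, and the last running sum is flushed when the input ends.

```python
def accumulate(products):
    """Model of the accumulator: products is an iterable of
    (A_row_num, B_row_num, A_data, B_data) tuples in the order they are read.

    Returns a list of (Value_row_num, Value_column_num, Value) entries,
    standing in for what is written to the output module.
    """
    output = []
    value = 0
    key = None                       # (Value_row_num, Value_column_num) of the running sum
    for a_row_num, b_row_num, a_data, b_data in products:
        new_key = (a_row_num, b_row_num)
        if key is not None and new_key != key:
            output.append((key[0], key[1], value))   # write Value out
            value = 0                                # then reset it to 0
        key = new_key
        value += a_data * b_data                     # Value = Value + A_data * B_data
    if key is not None:
        output.append((key[0], key[1], value))       # assumed final flush
    return output


print(accumulate([(0, 0, 2, 3), (0, 0, 4, 5), (0, 1, 1, 7), (1, 1, 2, 2)]))
```

Each output entry carries its own row and column number, so only the entries that were actually computed need to be stored.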
Step S205: when the calculation is fully complete, i.e. when A_addr = A_DATA_NO_0_SUM or B_addr = B_DATA_NO_0_SUM, an interrupt signal IRQ is generated to notify the preceding module or the driver module to read the result of the sparse matrix product.
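The overall flow of steps S202 to S204 can be checked against an ordinary dense multiplication with the following Python sketch. It is a software reference model under stated assumptions, not the circuit: bit j of a flag stands for column j counted from 0, the A element matching a target-flag bit is located by the same bit-counting rule used for B (the text only speaks of the "corresponding" A element), B_No_0_sum is restarted for every row of A, and only nonzero results are kept. All function names are hypothetical.

```python
def popcount(x):
    return bin(x).count("1")


def compress_rows(rows):
    """Store only the nonzero values, row by row, plus one flag word per row."""
    values, flags = [], []
    for row in rows:
        flag = 0
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                flag |= 1 << j
        flags.append(flag)
    return values, flags


def sparse_matmul(a, b):
    """a is m x p and b is p x n (lists of lists). Returns {(row, col): value}."""
    b_t = [list(col) for col in zip(*b)]     # B is input by column, i.e. transposed
    a_vals, a_flags = compress_rows(a)
    b_vals, b_flags = compress_rows(b_t)
    result = {}
    a_base = 0
    for i, a_flag in enumerate(a_flags):
        b_base = 0                           # B_No_0_sum, restarted per row of A (assumed)
        for j, b_flag in enumerate(b_flags):
            valid = a_flag & b_flag          # target flag
            value = 0
            for pos in range(valid.bit_length()):
                if (valid >> pos) & 1:
                    below = (1 << pos) - 1
                    a_addr = a_base + popcount(a_flag & below)
                    b_addr = b_base + popcount(b_flag & below)
                    value += a_vals[a_addr] * b_vals[b_addr]
            if value != 0:
                result[(i, j)] = value
            b_base += popcount(b_flag)       # B_No_0_sum += B_No_0_cnt
        a_base += popcount(a_flag)
    return result


a = [[1, 0, 2], [0, 0, 3]]
b = [[0, 4], [5, 0], [6, 7]]
print(sparse_matmul(a, b))
```

Only the element pairs selected by the target flag are ever read, so the number of multiplications depends on how many nonzero positions the current row of A and the current column of B share.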
It can be seen that the circuit for realizing sparse matrix multiplication provided in this embodiment has the following features. First, it proposes a storage method for the A and B matrices: nonzero detection is performed on both matrices, and only their nonzero values and the corresponding row and column numbers are stored, which greatly saves storage resources and speeds up the calculation. Second, it includes the address generation module 108, which reads only the B elements that take part in the operation and does not read the elements of the current column that do not, which greatly speeds up the calculation. Third, reasonable control logic realizes fast multiplication of the nonzero data of the A and B matrices and obtains the row and column numbers of the calculation results; only the nonzero values of the calculation results and the corresponding row and column information are stored, which further saves storage resources.
In summary, the circuit for realizing sparse matrix multiplication provided in this embodiment can compute sparse matrix products continuously and quickly with high computational efficiency, and can be widely applied in popular fields including big data, image processing and machine learning.
In addition, the present invention also provides an FPGA board. The FPGA board comprises an FPGA board body and further comprises a circuit for realizing sparse matrix multiplication as described above.
The FPGA board of this embodiment is used to implement the aforementioned circuit for realizing sparse matrix multiplication. For its specific implementation, see the embodiments of the circuit above and the descriptions of the corresponding parts. Its effects correspond to those of the circuit embodiments above and are not described again here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or in software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The circuit for realizing sparse matrix multiplication and the FPGA board provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (8)
1. A circuit for realizing sparse matrix multiplication, characterized by comprising: a detection module, a first random access memory, a second random access memory, a first flag bit generation module, a second flag bit generation module, a third random access memory, a fourth random access memory, an address generation module, a controller, an accumulator and an output module;
wherein the detection module is configured to detect whether each element of a first matrix input by row is a nonzero element and to store the nonzero elements of the first matrix in the first random access memory, and is further configured to detect whether each element of a second matrix input by column is a nonzero element and to store the nonzero elements of the second matrix in the second random access memory;
the first flag bit generator is configured to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory and to store the column flag bits in the third random access memory, and is further configured to generate the row flag bits of each column of the second matrix according to the row numbers of the nonzero elements in the second random access memory and to store the row flag bits in the fourth random access memory;
the address generation module is configured to perform a bitwise AND operation on the column flag bits and the row flag bits to obtain target flag bits;
the controller is configured to read multiple pairs of nonzero elements from the first random access memory and the second random access memory respectively according to the target flag bits, and to multiply each pair of nonzero elements read, obtaining multiple first operation results;
the accumulator is configured to accumulate the multiple first operation results to obtain a second operation result, and to store the second operation result in the output module;
the output module is configured to organize the second operation results to obtain the matrix multiplication result of the first matrix and the second matrix.
2. The circuit as claimed in claim 1, characterized in that the address generation module is specifically configured to: in response to a start instruction sent by the controller, obtain the column flag bits and the row flag bits from the third random access memory and the fourth random access memory, and perform a bitwise AND operation on the column flag bits and the row flag bits to obtain the target flag bits, wherein the start instruction is generated and sent by the controller when it reads, row by row, the nonzero elements stored in the first random access memory;
the controller is specifically configured to: read nonzero elements from the second random access memory according to the target flag bits, form multiple pairs of nonzero elements from these nonzero elements and the nonzero elements read from the first random access memory, and multiply each pair of nonzero elements, obtaining multiple first operation results.
3. The circuit as claimed in claim 2, characterized in that the controller is specifically configured to:
determine, according to the target flag bits, the target addresses in the second random access memory of the nonzero elements that take part in the current multiplication, and read the nonzero elements from the second random access memory according to the target addresses.
4. The circuit as claimed in claim 3, characterized in that the controller is further configured to:
determine, according to the target flag bits, the nonzero elements that are in the same column as the nonzero elements taking part in the current multiplication but do not themselves take part in it, and delete them.
5. The circuit as claimed in claim 4, characterized in that the circuit further comprises:
an interrupt signal generation module, configured to generate an interrupt signal upon detecting that no unread nonzero elements remain in the first random access memory or the second random access memory, so as to inform a front-end module that the matrix multiplication of the first matrix and the second matrix has been completed.
6. The circuit as claimed in claim 1, characterized in that the first flag bit generator is specifically configured to:
obtain the nonzero elements in the first random access memory one by one;
judge whether the row number of the nonzero element is the same as the row number of the previous nonzero element;
if it is the same, update the row flag bits of the current row according to the column number of the nonzero element;
if it is not the same, store the value of the row flag bits of the current row in the third random access memory, initialize the row flag bits, and update the initialized row flag bits according to the nonzero element.
7. The circuit as claimed in any one of claims 1 to 6, characterized in that the detection module is specifically configured to:
store the values of the nonzero elements of the first matrix, the row numbers of the nonzero elements of the first matrix and the column numbers of the nonzero elements of the first matrix in the first random access memory, and further to store the values of the nonzero elements of the second matrix, the row numbers of the nonzero elements of the second matrix and the column numbers of the nonzero elements of the second matrix in the second random access memory.
8. An FPGA board, characterized by comprising an FPGA board body and further comprising a circuit for realizing sparse matrix multiplication as claimed in any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910016870.0A CN109740116A (en) | 2019-01-08 | 2019-01-08 | A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910016870.0A CN109740116A (en) | 2019-01-08 | 2019-01-08 | A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109740116A (en) | 2019-05-10 |
Family
ID=66363863
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910016870.0A Withdrawn CN109740116A (en) | 2019-01-08 | 2019-01-08 | A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740116A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110704799A (en) * | 2019-09-06 | 2020-01-17 | 苏州浪潮智能科技有限公司 | Data processing equipment and system |
CN111798363A (en) * | 2020-07-06 | 2020-10-20 | 上海兆芯集成电路有限公司 | Graphics processor |
US11409523B2 (en) * | 2020-07-06 | 2022-08-09 | Glenfly Technology Co., Ltd. | Graphics processing unit |
CN111798363B (en) * | 2020-07-06 | 2024-06-04 | 格兰菲智能科技有限公司 | Graphics processor |
WO2022053032A1 (en) * | 2020-09-11 | 2022-03-17 | 北京希姆计算科技有限公司 | Matrix calculation circuit, method, electronic device, and computer-readable storage medium |
CN112306660A (en) * | 2020-11-05 | 2021-02-02 | 山东云海国创云计算装备产业创新中心有限公司 | Data processing method and system based on RISC-V coprocessor |
WO2022148181A1 (en) * | 2021-01-08 | 2022-07-14 | 苏州浪潮智能科技有限公司 | Sparse matrix accelerated computing method and apparatus, device, and medium |
WO2023272917A1 (en) * | 2021-06-28 | 2023-01-05 | 华中科技大学 | Sparse matrix storage and computation system and method |
CN117155843A (en) * | 2023-10-31 | 2023-12-01 | 苏州元脑智能科技有限公司 | Data transmission method, device, routing node, computer network and medium |
CN117155843B (en) * | 2023-10-31 | 2024-02-23 | 苏州元脑智能科技有限公司 | Data transmission method, device, routing node, computer network and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740116A (en) | A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate | |
Imani et al. | Ultra-efficient processing in-memory for data intensive applications | |
CN107340993B (en) | Arithmetic device and method | |
CN112114776A (en) | Quantum multiplication method and device, electronic device and storage medium | |
WO2022148181A1 (en) | Sparse matrix accelerated computing method and apparatus, device, and medium | |
CN107423816A (en) | A kind of more computational accuracy Processing with Neural Network method and systems | |
GB2580153A (en) | Converting floating point numbers to reduce the precision | |
CN109284824A (en) | A kind of device for being used to accelerate the operation of convolution sum pond based on Reconfiguration Technologies | |
CN113033769B (en) | Probabilistic calculation neural network method and asynchronous logic circuit | |
TW202230165A (en) | Device and method of compute in memory | |
CN113805842A (en) | Integrative device of deposit and calculation based on carry look ahead adder realizes | |
CN108960414A (en) | Method for realizing single broadcast multiple operations based on deep learning accelerator | |
US20170168775A1 (en) | Methods and Apparatuses for Performing Multiplication | |
CN115879530A (en) | Method for optimizing array structure of RRAM (resistive random access memory) memory computing system | |
CN106021188A (en) | Parallel hardware architecture and parallel computing method for floating point matrix inversion | |
CN116362314A (en) | Integrated storage and calculation device and calculation method | |
CN113222129B (en) | Convolution operation processing unit and system based on multi-level cache cyclic utilization | |
US20230253032A1 (en) | In-memory computation device and in-memory computation method to perform multiplication operation in memory cell array according to bit orders | |
Ahn | Computation of deep belief networks using special-purpose hardware architecture | |
CN110245756A (en) | Method for handling the programming device of data group and handling data group | |
CN115481364A (en) | Parallel computing method for large-scale elliptic curve multi-scalar multiplication based on GPU (graphics processing Unit) acceleration | |
CN113378115A (en) | Near-memory sparse vector multiplier based on magnetic random access memory | |
CN114881239A (en) | Method and apparatus for constructing quantum generator, medium, and electronic apparatus | |
CN111221500B (en) | Massively parallel associative multiplier-accumulator | |
CN116151171B (en) | Full-connection I Xin Moxing annealing treatment circuit based on parallel tempering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20190510 |