CN109740116A - Circuit for implementing sparse matrix multiplication, and FPGA board - Google Patents


Info

Publication number
CN109740116A
Authority
CN
China
Prior art keywords
matrix
nonzero element
random access
access memory
flag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910016870.0A
Other languages
Chinese (zh)
Inventor
张贞雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201910016870.0A
Publication of CN109740116A
Legal status: Withdrawn

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a circuit for implementing sparse matrix multiplication. The circuit comprises multiple modules that cooperate so that, during the multiplication of sparse matrices, the nonzero elements of the matrices are filtered out. During the computation, a bitwise AND is performed on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits, and the nonzero elements that actually need to take part in the computation are then selected according to the target flag bits. This avoids computing on the large number of zero elements and on nonzero elements that do not need to take part, which significantly improves the efficiency of sparse matrix multiplication. The invention also provides an FPGA board whose effects correspond to those of the circuit.

Description

Circuit for implementing sparse matrix multiplication, and FPGA board
Technical Field
The present invention relates to the field of hardware design, and in particular to a circuit for implementing sparse matrix multiplication and to an FPGA board.
Background
Sparse matrices arise in almost every field of large-scale scientific and engineering computing, including currently popular fields such as machine learning, big data and image processing. In a sparse matrix the number of nonzero elements is far smaller than the total number of matrix elements, and the nonzero elements follow no regular distribution. Processing a sparse matrix with conventional matrix computation methods therefore wastes a great deal of storage and also greatly reduces computation speed. In scenarios with strict processing-speed requirements, a software implementation is too slow to meet real-time processing requirements.
Summary of the Invention
The object of the present invention is to provide a circuit for implementing sparse matrix multiplication and an FPGA board, in order to solve the problem that sparse matrix multiplication performed in software is slow and cannot meet real-time processing requirements.
To solve the above technical problem, the present invention provides a circuit for implementing sparse matrix multiplication, comprising: a detection module, a first random access memory, a second random access memory, a first flag bit generation module, a second flag bit generation module, a third random access memory, a fourth random access memory, an address generation module, a controller, an accumulator and an output module;
wherein the detection module is configured to detect whether each element of a first matrix input by rows is a nonzero element and to store the nonzero elements of the first matrix in the first random access memory, and is further configured to detect whether each element of a second matrix input by columns is a nonzero element and to store the nonzero elements of the second matrix in the second random access memory;
the first flag bit generation module is configured to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory and to store the column flag bits in the third random access memory; the row flag bits of each column of the second matrix are generated according to the row numbers of the nonzero elements in the second random access memory and stored in the fourth random access memory;
the address generation module is configured to perform a bitwise AND on the column flag bits and the row flag bits to obtain target flag bits;
the controller is configured to read multiple pairs of nonzero elements from the first random access memory and the second random access memory according to the target flag bits, and to multiply each pair of nonzero elements read, obtaining multiple first operation results;
the accumulator is configured to accumulate the multiple first operation results to obtain a second operation result, and to store the second operation result in the output module;
the output module is configured to organize the second operation results to obtain the matrix product of the first matrix and the second matrix.
Optionally, the address generation module is specifically configured to: in response to a start instruction sent by the controller, obtain the column flag bits and the row flag bits from the third random access memory and the fourth random access memory, and perform a bitwise AND on them to obtain the target flag bits, wherein the start instruction is generated and sent by the controller while it reads, row by row, the nonzero elements stored in the first random access memory;
the controller is specifically configured to: read nonzero elements from the second random access memory according to the target flag bits, pair them with the nonzero elements read from the first random access memory to form multiple pairs of nonzero elements, and multiply each pair to obtain multiple first operation results.
Optionally, the controller is specifically configured to:
determine, according to the target flag bits, the target address in the second random access memory of each nonzero element taking part in the current multiplication, and read the nonzero element from the second random access memory according to the target address.
Optionally, the controller is further configured to:
determine, according to the target flag bits, the nonzero elements that are in the same column as the nonzero elements taking part in the current multiplication but do not themselves take part in it, and delete them.
Optionally, the circuit further comprises:
an interrupt signal generation module, configured to generate an interrupt signal upon detecting that no unread nonzero element remains in the first random access memory or the second random access memory, so as to notify the front-end module that the matrix multiplication of the first matrix and the second matrix has been completed.
Optionally, the first flag bit generation module is specifically configured to:
obtain the nonzero elements in the first random access memory one by one;
determine whether the row number of the nonzero element is the same as the row number of the previous nonzero element;
if so, update the flag bits of the current row according to the column number of the nonzero element;
if not, store the value of the flag bits of the current row in the third random access memory, initialize the flag bits, and update the initialized flag bits according to the nonzero element.
Optionally, the detection module is specifically configured to:
store the value, the row number and the column number of each nonzero element of the first matrix in the first random access memory, and store the value, the row number and the column number of each nonzero element of the second matrix in the second random access memory.
In addition, the present invention provides an FPGA board, comprising an FPGA board body and further comprising the circuit for implementing sparse matrix multiplication described above.
The circuit for implementing sparse matrix multiplication provided by the present invention comprises multiple modules that cooperate so that, during the multiplication of sparse matrices, the nonzero elements of the matrices are filtered out. During the computation, a bitwise AND is performed on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits, and the nonzero elements that actually need to take part in the computation are then selected according to the target flag bits. This avoids computing on the large number of zero elements and on nonzero elements that do not need to take part, which significantly improves the efficiency of sparse matrix multiplication.
The present invention also provides an FPGA board whose effects correspond to those of the above circuit and are not repeated here.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of embodiment one of the circuit for implementing sparse matrix multiplication provided by the present invention;
Fig. 2 is a flow chart of the computation performed by embodiment one of the circuit for implementing sparse matrix multiplication provided by the present invention;
Fig. 3 is a schematic structural diagram of embodiment two of the circuit for implementing sparse matrix multiplication provided by the present invention.
Detailed Description
The core of the present invention is to provide a circuit for implementing sparse matrix multiplication and an FPGA board that avoid computing on the large number of zero elements and on nonzero elements that do not need to take part in the computation, which significantly improves the efficiency of sparse matrix multiplication.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment one of the circuit for implementing sparse matrix multiplication provided by the present invention is described below. Referring to Fig. 1, embodiment one comprises: a detection module 101, a first random access memory 102, a second random access memory 103, a first flag bit generation module 104, a second flag bit generation module 105, a third random access memory 106, a fourth random access memory 107, an address generation module 108, a controller 109, an accumulator 110 and an output module 111. The functions of the modules are as follows:
The detection module 101 is configured to detect whether each element of a first matrix input by rows is a nonzero element and to store the nonzero elements of the first matrix in the first random access memory 102, and is further configured to detect whether each element of a second matrix input by columns is a nonzero element and to store the nonzero elements of the second matrix in the second random access memory 103;
the first flag bit generation module 104 is configured to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory 102 and to store the column flag bits in the third random access memory 106; the row flag bits of each column of the second matrix are generated according to the row numbers of the nonzero elements in the second random access memory 103 and stored in the fourth random access memory 107;
the address generation module 108 is configured to perform a bitwise AND on the column flag bits and the row flag bits to obtain target flag bits;
the controller 109 is configured to read multiple pairs of nonzero elements from the first random access memory 102 and the second random access memory 103 according to the target flag bits, and to multiply each pair of nonzero elements read, obtaining multiple first operation results;
the accumulator 110 is configured to accumulate the multiple first operation results to obtain a second operation result, and to store the second operation result in the output module 111;
the output module 111 is configured to organize the second operation results to obtain the matrix product of the first matrix and the second matrix.
It should be noted that the first matrix and the second matrix are both sparse matrices, that is, the number of nonzero elements in each matrix is far smaller than the number of zero elements, and the nonzero elements follow no regular distribution.
The column flag bits of a row of the first matrix are a parameter that marks the positions of the nonzero elements in that row. Suppose the first matrix has 8 columns; the column flag bits of each row are then an 8-bit binary number. In one optional implementation, a "1" in the column flag bits indicates that the corresponding element of the row is a nonzero element, and a "0" indicates that it is a zero element. If the column flag bits of the second row of the first matrix are 10010011, then the elements in columns 1, 4, 7 and 8 of the second row are nonzero and the remaining elements are zero. Similarly, the row flag bits of a column of the second matrix are a parameter that marks the positions of the nonzero elements in that column.
The bitwise AND operation combines the two operands bit by bit according to the rule 0&0=0, 0&1=0, 1&0=0, 1&1=1; that is, a result bit is "1" only when both corresponding bits are "1", and is "0" otherwise.
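The bitwise AND of two flag words can be illustrated with a minimal Python sketch. The two 8-bit values and the helper name set_positions are hypothetical and chosen only for illustration; the sketch also assumes that bit k (counting from the least significant bit) marks position k, which is one possible convention and not necessarily the one used in the patent.

```python
# Hypothetical 8-bit flag words; a set bit marks a nonzero element.
col_flag_row_i = 0b10010011  # nonzero positions in one row of the first matrix
row_flag_col_j = 0b11000110  # nonzero positions in one column of the second matrix

# A result bit is 1 only where both operands have a 1.
target_flag = col_flag_row_i & row_flag_col_j


def set_positions(flag, width=8):
    """Return the bit indices (0 = least significant) that are set in flag."""
    return [k for k in range(width) if (flag >> k) & 1]


print(format(target_flag, "08b"), set_positions(target_flag))
```

Only the positions that survive the AND correspond to pairs in which both factors are nonzero.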
Matrix multiplication is a basic mathematical operation, defined as follows: let A be an m × p matrix and B a p × n matrix; the m × n matrix C is called the product of A and B, written C = AB, where the element in row i and column j of C is:
$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}$$
Thus, to compute one element c_{ij} of C, each element of row i of A is multiplied by the corresponding element of column j of B, and the products are summed. By performing a bitwise AND on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix, this embodiment discards the nonzero elements that would otherwise be multiplied by zero elements. In other words, the target flag bits obtained from the bitwise AND identify the nonzero elements of the first and second matrices that actually need to be multiplied, so that no nonzero element is multiplied by a zero element.
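A small software sketch shows how the target flag bits restrict the multiplications needed for a single element of C. The function name dot_with_flags and the sample vectors are hypothetical, and the bit ordering (bit k for position k) is an assumption made for the sketch.

```python
def dot_with_flags(a_row, b_col):
    """Compute sum_k a_row[k] * b_col[k], multiplying only where both are nonzero.

    Returns the accumulated value and the number of multiplications performed.
    """
    p = len(a_row)
    # Flag words: bit k is set when position k holds a nonzero element.
    col_flag = sum(1 << k for k in range(p) if a_row[k] != 0)
    row_flag = sum(1 << k for k in range(p) if b_col[k] != 0)
    target = col_flag & row_flag

    acc = 0
    multiplies = 0
    for k in range(p):
        if (target >> k) & 1:
            acc += a_row[k] * b_col[k]
            multiplies += 1
    return acc, multiplies


a_row = [3, 0, 0, 5, 0, 0, 2, 7]
b_col = [0, 4, 0, 6, 0, 0, 0, 1]
result, multiplies = dot_with_flags(a_row, b_col)
```

In this example each vector has four nonzero elements, but only two positions are nonzero in both, so only two products are formed.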
Based on the circuit provided by the above embodiment, the process by which the circuit performs sparse matrix multiplication is described below. In this process the first matrix taking part in the multiplication is called matrix A, the second matrix is called matrix B, and their product is matrix C. As shown in Fig. 2, the process comprises:
Step S101: Matrix A is input by rows and matrix B by columns. The detection module 101 performs nonzero detection on the elements of A and B, and the first random access memory 102 and the second random access memory 103 store the nonzero elements of A and B respectively.
Step S102: The first flag bit generation module 104 generates column flag bits according to the positions of the nonzero elements in each row of A and stores them in the third random access memory 106; the second flag bit generation module 105 generates row flag bits according to the positions of the nonzero elements in each column of B and stores them in the fourth random access memory 107.
Step S103: The address generation module 108 fetches the column flag bits from the third random access memory 106 and the row flag bits from the fourth random access memory 107, and performs a bitwise AND on them to obtain the target flag bits.
Step S104: The controller 109 selects elements from the first random access memory 102 and the second random access memory 103 according to the target flag bits, multiplies them, and sends the products to the accumulator 110.
Step S105: The accumulator 110 accumulates the products to obtain one element of matrix C.
Step S106: The output module 111 organizes the elements of C obtained by the accumulator 110 to produce the final matrix C.
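Steps S101 to S106 can be mirrored in software as a rough sketch. This is not the circuit itself: Python dictionaries and lists stand in for the four random access memories, the name sparse_matmul is hypothetical, and the bit ordering of the flag words is an assumption.

```python
def sparse_matmul(A, B):
    """Software sketch of steps S101-S106 for an m x p matrix A and a p x n matrix B."""
    m, p, n = len(A), len(B), len(B[0])

    # S101: keep only nonzero elements (A scanned by rows, B by columns).
    ram1 = {(i, k): A[i][k] for i in range(m) for k in range(p) if A[i][k] != 0}
    ram2 = {(k, j): B[k][j] for j in range(n) for k in range(p) if B[k][j] != 0}

    # S102: one flag word per row of A and one per column of B.
    col_flags = [0] * m
    for (i, k) in ram1:
        col_flags[i] |= 1 << k
    row_flags = [0] * n
    for (k, j) in ram2:
        row_flags[j] |= 1 << k

    C = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            target = col_flags[i] & row_flags[j]  # S103: bitwise AND
            acc = 0
            k = 0
            while target:
                if target & 1:
                    acc += ram1[(i, k)] * ram2[(k, j)]  # S104 multiply, S105 accumulate
                target >>= 1
                k += 1
            C[i][j] = acc  # S106: place the element in the result
    return C


A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
B = [[0, 4], [5, 0], [0, 6]]
C = sparse_matmul(A, B)
```

Because a set bit in the target flag guarantees that both operands were stored as nonzero, the dictionary lookups in the inner loop never miss.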
As can be seen, this embodiment provides a circuit for implementing sparse matrix multiplication that comprises multiple modules cooperating so that, during the multiplication of sparse matrices, the nonzero elements of the matrices are filtered out. During the computation, a bitwise AND is performed on the column flag bits of each row of the first matrix and the row flag bits of each column of the second matrix to obtain target flag bits, and the nonzero elements that actually need to take part in the computation are then selected according to the target flag bits. This avoids computing on the large number of zero elements and on nonzero elements that do not need to take part, which significantly improves the efficiency of sparse matrix multiplication.
Embodiment two of the circuit for implementing sparse matrix multiplication provided by the present invention is described in detail below. Embodiment two is implemented on the basis of embodiment one and extends it to some extent.
As shown in Fig. 3, embodiment two comprises: a detection module 101, a first random access memory 102, a second random access memory 103, a first flag bit generation module 104, a second flag bit generation module 105, a third random access memory 106, a fourth random access memory 107, an address generation module 108, a controller 109, an accumulator 110 and an output module 111. In addition, this embodiment further comprises an interrupt signal generation module 112.
The interrupt signal generation module 112 is configured to generate an interrupt signal upon detecting that no unread nonzero element remains in the first random access memory 102 or the second random access memory 103, so as to notify the front-end module that the matrix multiplication of the first matrix and the second matrix has been completed. For the basic functions of the remaining modules, refer to the description of embodiment one, which is not repeated here. The parts that this embodiment adds to embodiment one are described below:
In one optional implementation of this embodiment, the address generation module 108 does not read the column flag bits and row flag bits and perform the bitwise AND on its own initiative; it does so in response to an instruction sent by the controller 109. In other words, during matrix multiplication the controller 109 reads nonzero elements from the first random access memory 102 or the second random access memory 103 according to a certain rule and sends a corresponding start instruction to the address generation module 108 to trigger the subsequent operations. Assuming that the controller 109 triggers the subsequent operations by reading nonzero elements from the first random access memory 102, the modules divide the work as follows:
The address generation module 108 is specifically configured to: in response to a start instruction sent by the controller 109, obtain the column flag bits and the row flag bits from the third random access memory 106 and the fourth random access memory 107, and perform a bitwise AND on them to obtain the target flag bits, wherein the start instruction is generated and sent by the controller 109 while it reads, row by row, the nonzero elements stored in the first random access memory;
the controller 109 is specifically configured to: read nonzero elements from the second random access memory 103 according to the target flag bits, pair them with the nonzero elements read from the first random access memory 102 to form multiple pairs of nonzero elements, and multiply each pair to obtain multiple first operation results.
In one optional implementation, after obtaining the target flag bits, the controller also needs to determine from them the address of each nonzero element to be read, and then read that element from the random access memory. Assuming that the controller 109, having read an element from the first random access memory 102, needs to determine from the target flag bits which nonzero elements to read from the second random access memory 103, the process is as follows:
The controller 109 is specifically configured to: determine, according to the target flag bits, the target address in the second random access memory of each nonzero element taking part in the current multiplication, and read the nonzero element from the second random access memory according to the target address.
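The text does not spell out how a target address is derived from the target flag bits. One possible scheme, shown here purely as an assumption, is that the nonzero elements of a column are stored contiguously in order of increasing row index, so the offset of an element equals the number of set bits below its position in that column's row flag word. The function name offsets_in_column is hypothetical.

```python
def offsets_in_column(row_flag, target_flag):
    """For each set bit k of target_flag, count the set bits of row_flag below k.

    Assumption (not stated in the source): the nonzero elements of a column are
    stored contiguously in row order, so this count is the element's offset from
    the column's base address.
    """
    offsets = []
    k = 0
    t = target_flag
    while t:
        if t & 1:
            below = row_flag & ((1 << k) - 1)  # keep only the bits under position k
            offsets.append(bin(below).count("1"))
        t >>= 1
        k += 1
    return offsets


row_flag = 0b11000110     # column has nonzero elements at positions 1, 2, 6, 7
target_flag = 0b10000010  # positions 1 and 7 take part in this product
offsets = offsets_in_column(row_flag, target_flag)
```

Under this assumed layout, adding each offset to the base address of the column's storage would give the target address to read.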
In one optional implementation, nonzero elements in the first random access memory 102 and the second random access memory 103 that do not need to take part in the computation can be deleted. That is, once the nonzero elements of a given column of the second matrix that need to take part have been determined from the target flag bits, the remaining nonzero elements of that column can be deleted. The process is as follows:
The controller 109 is further configured to: determine, according to the target flag bits, the nonzero elements that are in the same column as the nonzero elements taking part in the current multiplication but do not themselves take part in it, and delete them.
In one optional implementation, because the first flag bit generation module 104 or the second flag bit generation module 105 has to generate the column flag bits of each row of the first matrix or the row flag bits of each column of the second matrix, it needs to determine during operation whether a set of column flag bits or row flag bits is complete. The process is described below for the first flag bit generation module 104; the second flag bit generation module 105 works analogously. The process comprises:
The first flag bit generation module 104 is specifically configured to: obtain the nonzero elements in the first random access memory 102 one by one; determine whether the row number of the nonzero element is the same as the row number of the previous nonzero element; if so, update the flag bits of the current row according to the column number of the nonzero element; if not, store the value of the flag bits of the current row in the third random access memory 106, initialize the flag bits, and update the initialized flag bits according to the nonzero element.
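The row-change logic of the flag bit generation module can be sketched as a streaming loop. The name generate_row_flags, the (row, col, value) tuple layout and the final flush of the last row are assumptions made for this sketch; the text above does not describe how the last row's flag bits are written out.

```python
def generate_row_flags(nonzeros):
    """Build one flag word per row from (row, col, value) tuples in row-major order.

    Returns a list of (row, flag) pairs standing in for the third random access memory.
    """
    ram3 = []
    current_row = None
    flag = 0
    for row, col, _value in nonzeros:
        if current_row is not None and row != current_row:
            ram3.append((current_row, flag))  # row changed: store the finished flag word
            flag = 0                          # and re-initialize it
        current_row = row
        flag |= 1 << col                      # mark this column as nonzero
    if current_row is not None:
        ram3.append((current_row, flag))      # assumed flush of the last row
    return ram3


flags = generate_row_flags([(0, 0, 3), (0, 3, 5), (2, 1, 4)])
```

Rows without any nonzero element (row 1 in this example) never appear in the stream, so no flag word is stored for them in this sketch.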
In one optional implementation, the detection module 101 not only detects nonzero elements but also determines their row and column numbers, and saves each nonzero element together with its row number and column number in the corresponding random access memory. Specifically:
The detection module 101 is specifically configured to: store the value, the row number and the column number of each nonzero element of the first matrix in the first random access memory 102, and store the value, the row number and the column number of each nonzero element of the second matrix in the second random access memory 103.
Based on the circuit provided by the above embodiment, the process by which the circuit performs sparse matrix multiplication is described in detail below. In this process the first matrix taking part in the multiplication is called matrix A and the second matrix is called matrix B. The process comprises:
Step S201: A driver module or an upstream module issues a pair of sparse matrices A and B. Note that the elements of A are issued by rows and the elements of B by columns; this transmission order speeds up computation of the sparse matrix product.
Step S202: The detection module 101 performs nonzero detection and row-number and column-number detection on the elements of A and B, discarding elements whose value is 0 and keeping the nonzero elements.
Specifically, step S202 can be divided into the following steps:
Step S2021: For matrix A, the first m*p data items of the DMA transfer are the elements of A. Nonzero elements are saved in Ram_0, whose width is RAM_DEPTH, together with the row number and column number of the current nonzero element. Here A_data_num is the input index of the current element of A.
Row number A_row_num=(A_data_num/p);
Column number A_column_num=(A_data_num%p);
The high ROW_WIDTH bits of Ram_0, addr[RAM_DEPTH-1:RAM_DEPTH-ROW_WIDTH], store the row number A_row_num; the middle COLUMN_WIDTH bits of Ram_0, addr[(RAM_DEPTH-ROW_WIDTH-1):(RAM_DEPTH-ROW_WIDTH-COLUMN_WIDTH)], store the column number A_column_num; the low DATA_WIDTH bits of Ram_0, addr[DATA_WIDTH-1:0], store the nonzero element of A.
Here RAM_WIDTH is the width of the RAM that stores the nonzero elements of A and B; ROW_WIDTH is the bit width used within that RAM to store the row number; COLUMN_WIDTH is the bit width used to store the column number; and DATA_WIDTH is the bit width used to store the nonzero data value. These satisfy the relationship RAM_WIDTH=ROW_WIDTH+COLUMN_WIDTH+DATA_WIDTH.
At the same time, the total number of nonzero elements over all rows of A is accumulated to give A_DATA_NO_0_SUM.
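The index arithmetic and word layout of step S2021 can be sketched in Python. The concrete widths (4, 4 and 8 bits) and the helper names pack_word, unpack_word and detect are assumptions made for illustration. The sketch uses RAM_WIDTH for the total word width, following the stated relationship RAM_WIDTH=ROW_WIDTH+COLUMN_WIDTH+DATA_WIDTH, and treats stored values as non-negative integers.

```python
ROW_WIDTH = 4      # assumed widths, for illustration only
COLUMN_WIDTH = 4
DATA_WIDTH = 8
RAM_WIDTH = ROW_WIDTH + COLUMN_WIDTH + DATA_WIDTH


def pack_word(data_num, value, p):
    """Pack row number, column number and value into one RAM word (row in the high bits)."""
    row = data_num // p   # A_row_num
    col = data_num % p    # A_column_num
    assert row < (1 << ROW_WIDTH) and col < (1 << COLUMN_WIDTH)
    assert 0 <= value < (1 << DATA_WIDTH)
    return (row << (COLUMN_WIDTH + DATA_WIDTH)) | (col << DATA_WIDTH) | value


def unpack_word(word):
    """Split a RAM word back into (row, col, value)."""
    value = word & ((1 << DATA_WIDTH) - 1)
    col = (word >> DATA_WIDTH) & ((1 << COLUMN_WIDTH) - 1)
    row = word >> (COLUMN_WIDTH + DATA_WIDTH)
    return row, col, value


def detect(stream, p):
    """Drop zero elements; return the packed words and their count (cf. A_DATA_NO_0_SUM)."""
    words = [pack_word(i, v, p) for i, v in enumerate(stream) if v != 0]
    return words, len(words)


# A 2 x 4 matrix streamed row by row: [[0, 7, 0, 0], [0, 0, 9, 0]]
words, nonzero_count = detect([0, 7, 0, 0, 0, 0, 9, 0], p=4)
```

Unpacking the two stored words recovers (0, 1, 7) and (1, 2, 9), that is, the row number, column number and value of each nonzero element.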
Step S2022: The first flag bit generation module 104 collects the column flag bits of the nonzero elements of each row of A. The procedure is as follows:
A_No_0_flag has bit width p (the number of columns of A) and is initialized to 0. While the row number of the A elements arriving at the first flag bit generation module 104 stays the same (A_row_num unchanged; by this point the A data have passed through the detection module 101 and carry their row and column numbers), the value of A_No_0_flag is updated according to the column number A_column_num. If the row number of the A element changes (A_row_num changes), A_No_0_flag is written to Ram_2 and reset to 0, and its value is then updated according to the column number A_column_num of the new A element.
Specifically, the bit of A_No_0_flag corresponding to the value of A_column_num is set to 1. For example, let the number of columns of A be p=8, that is, A has 8 columns.
For the current A element, A_row_num=1 and A_column_num=1, so A_No_0_flag=8'b0000_0001.
After another A element is input with A_row_num=1, A_column_num=3, A_No_0_flag=8'b0000_0101;
After another A element is input with A_row_num=1, A_column_num=5, A_No_0_flag=8'b0001_0101;
After another A element is input with A_row_num=3, A_column_num=4, the row number (A_row_num) has changed, so A_No_0_flag=8'b0001_0101 is first written to Ram_2, A_No_0_flag is then reset to 8'b0000_0000, and finally A_No_0_flag=8'b0000_1000 is set according to A_column_num=4;
After another A element is input with A_row_num=3, A_column_num=7, A_No_0_flag=8'b0100_1000.
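The worked example for A_No_0_flag can be replayed with a short Python sketch. The function name trace_flags is hypothetical. The mapping of column number c to bit c-1 is inferred from the values in this A-matrix example and is an assumption of the sketch, not a stated rule.

```python
def trace_flags(elements, p=8):
    """Replay the A_No_0_flag example.

    elements: (A_row_num, A_column_num) pairs in input order.
    Returns the flag value after each element and the words written to Ram_2.
    """
    ram_2 = []
    trace = []
    flag = 0
    prev_row = None
    for row, col in elements:
        if prev_row is not None and row != prev_row:
            ram_2.append(flag)   # row number changed: write the flag word out
            flag = 0             # then reset it
        flag |= 1 << (col - 1)   # assumed mapping: column c sets bit c-1
        prev_row = row
        trace.append(format(flag, "0{}b".format(p)))
    return trace, ram_2


trace, ram_2 = trace_flags([(1, 1), (1, 3), (1, 5), (3, 4), (3, 7)])
for value in trace:
    print(value)
```

The printed values match the five flag states listed in the example, and the single word written to Ram_2 is the completed flag word of the first row.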
Step S2023: for matrix B, the last p*n elements of the DMA transfer are the elements of matrix B. As with matrix A, the nonzero elements of B are saved in Ram_1, whose width is RAM_DEPTH, together with the row number and column number of each nonzero datum (note that because matrix B is input by column, the B matrix here is the transpose of B).
Here B_data_num is the input count of the current B datum.
Row number B_row_num=(B_data_num/p);
Column number B_column_num=(B_data_num%p);
The upper ROW_WIDTH bits of a Ram_0 word, addr[RAM_DEPTH-1:RAM_DEPTH-ROW_WIDTH], store the row number B_row_num; the middle COLUMN_WIDTH bits, addr[(RAM_DEPTH-ROW_WIDTH-1):(RAM_DEPTH-ROW_WIDTH-COLUMN_WIDTH)], store the column number B_column_num; the lower DATA_WIDTH bits, addr[DATA_WIDTH-1:0], store the nonzero element of B.
Here RAM_WIDTH denotes the width of the RAM that stores the nonzero elements of matrices A and B, ROW_WIDTH denotes the bit width used in that RAM for the row number, COLUMN_WIDTH denotes the bit width used for the column number, and DATA_WIDTH denotes the bit width used for the nonzero value. Note that RAM_WIDTH, ROW_WIDTH, COLUMN_WIDTH and DATA_WIDTH satisfy the relationship RAM_WIDTH=ROW_WIDTH+COLUMN_WIDTH+DATA_WIDTH.
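As an illustration of the index derivation and the word layout, the sketch below packs a row number, a column number and a value into one word and unpacks it again. The concrete widths (8, 8 and 16 bits) and the function names are assumptions chosen for the example and do not come from the text; the value is treated as a raw unsigned bit pattern.

```python
# Hypothetical field widths; the text only fixes their sum.
ROW_WIDTH = 8
COLUMN_WIDTH = 8
DATA_WIDTH = 16
RAM_WIDTH = ROW_WIDTH + COLUMN_WIDTH + DATA_WIDTH


def index_from_count(data_num, p):
    """Derive (row_num, column_num) from the input count, as in the text."""
    return data_num // p, data_num % p


def pack_word(row_num, column_num, data):
    """Place the row number in the upper bits, the column number in the
    middle bits and the nonzero value in the lower bits of one RAM word."""
    assert 0 <= row_num < (1 << ROW_WIDTH)
    assert 0 <= column_num < (1 << COLUMN_WIDTH)
    assert 0 <= data < (1 << DATA_WIDTH)
    return (row_num << (COLUMN_WIDTH + DATA_WIDTH)) | (column_num << DATA_WIDTH) | data


def unpack_word(word):
    """Recover (row_num, column_num, data) from a packed RAM word."""
    data = word & ((1 << DATA_WIDTH) - 1)
    column_num = (word >> DATA_WIDTH) & ((1 << COLUMN_WIDTH) - 1)
    row_num = (word >> (COLUMN_WIDTH + DATA_WIDTH)) & ((1 << ROW_WIDTH) - 1)
    return row_num, column_num, data


row_num, column_num = index_from_count(19, p=8)
word = pack_word(row_num, column_num, 0x1234)
print(hex(word), unpack_word(word))
```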
Meanwhile the sum of the nonzero element of cumulative all column of B matrix, obtain B_DATA_NO_0_SUM.
Step S2023: the second flag bit generation module 105 is used to collect the column flag bits of the nonzero elements in each row of matrix B. The process is as follows:
B_No_0_flag has a bit width of p (the number of columns of matrix B) and is initialized to 0. At this point the B elements have already passed through detection module 101, which supplies the row and column numbers. While the row number of the B elements input to the second flag bit generation module 105 stays the same (B_row_num unchanged), the value of B_No_0_flag is updated according to the column number B_column_num. If the row number of the B element changes (B_row_num changes), B_No_0_flag is first written to the fourth random access memory 107 and reset to 0, and is then updated according to the column number B_column_num of the B element.
Specifically, the bit of B_No_0_flag that corresponds to the value of B_column_num is set to 1. For example, suppose the number of columns of matrix B is p=8, i.e. B has 8 columns.
The current B matrix element has B_row_num=1, B_column_num=1, so B_No_0_flag=8'b0000_0010;
After another B element is input, with B_row_num=1, B_column_num=5, B_No_0_flag=8'b0010_0010;
After another B element is input, with B_row_num=1, B_column_num=7, B_No_0_flag=8'b1010_0010;
After another B element is input, with B_row_num=3, B_column_num=4, the row number of the B data (B_row_num) has changed, so B_No_0_flag=8'b1010_0010 is first written to the fourth random access memory, B_No_0_flag is then reset to 8'b0000_0000, and finally B_No_0_flag=8'b0001_0000 is set according to B_column_num=4;
After another B element is input, with B_row_num=3, B_column_num=6, B_No_0_flag=8'b0101_0000.
Step S203: controller 109 fetches nonzero elements from the first random access memory 102 and the second random access memory 103, reads the column flag bits and row flag bits from the third random access memory 106 and the fourth random access memory 107, and performs the logic control, which comprises the following steps:
Step S2031: read one nonzero element entry from the first random access memory 102 to obtain the row number A_row_num, the column number A_column_num and the value A_data of the A element.
Step S2032: read the third random access memory 106 once to obtain the column flag bits A_No_0_flag of the nonzero elements in the current row of matrix A.
Step S2033: read the fourth random access memory 107 to obtain the row flag bits B_No_0_flag of the nonzero elements in the current column of matrix B.
Step S2034: send A_No_0_flag and B_No_0_flag to address generation module 108 to obtain the addresses of the elements to be read from the second random access memory 103. The purpose is to read out only those nonzero elements of the current column of matrix B that participate in the operation; nonzero elements that do not participate are not read, which greatly increases the operation speed. The specific method is:
Perform a bitwise AND, i.e. Valid_No_0_flag=A_No_0_flag&B_No_0_flag (& denotes the bitwise AND operation). Then obtain the positions of the nonzero bits in the target flag Valid_No_0_flag and denote them addr_0, addr_1, ..., addr_Valid_No_0_cnt-1, where Valid_No_0_cnt is the number of nonzero bits in Valid_No_0_flag. Next, convert addr_0, addr_1, ..., addr_Valid_No_0_cnt-1 to their positions among all the nonzero bits of B_No_0_flag, giving addr_B_0, addr_B_1, ..., addr_B_Valid_No_0_cnt-1. For the row of the B matrix currently being processed (a column of the original B, since B is stored transposed), the addresses of Ram_1 to be read are:
B_addr=B_No_0_sum+addr_B_0
B_addr=B_No_0_sum+addr_B_1
………
B_addr=B_No_0_sum+addr_B_Valid_No_0_cnt-1.
Finally, the second random access memory 103 is read at each B_addr obtained, the A element and the B element that share the same column number are multiplied, and the result is sent to accumulator 110.
After Ram_1 has been read, B_No_0_sum is accumulated, i.e. B_No_0_sum=B_No_0_sum+B_No_0_cnt, where B_No_0_cnt is the number of nonzero bits in the B_No_0_flag of the current row.
To aid understanding, step S2034 is illustrated with the following example:
Assume A_No_0_flag=8'b0101_1010 and B_No_0_flag=8'b1101_0110, so the number of nonzero elements in this column of matrix B is B_No_0_cnt=5. After the bitwise AND, the target flag is C_No_0_flag=A_No_0_flag&B_No_0_flag=8'b0101_0010, so the number of valid nonzero elements is Valid_No_0_cnt=3. The positions of the valid nonzero elements in the target flag are addr_0=1, addr_1=4, addr_2=6, and their positions within the row flag bits of matrix B are addr_B_0=0, addr_B_1=2, addr_B_2=3. Addresses 0, 2 and 3 of the second random access memory 103 are then read in turn, each B element read is multiplied by the corresponding A element, and the results are sent to accumulator 110. Finally B_No_0_sum is updated: B_No_0_sum=0+B_No_0_cnt=5.
The next round then begins. Once the calculation for this column of B is complete, the fourth random access memory 107 is read again, giving the row flag bits B_No_0_flag=8'b0100_1110 of matrix B, so the number of nonzero elements in this column of B is B_No_0_cnt=4. The target flag is then C_No_0_flag=A_No_0_flag&B_No_0_flag=8'b0100_1010, so the number of valid nonzero elements is Valid_No_0_cnt=3. The positions of the valid nonzero elements in the target flag are addr_0=1, addr_1=3, addr_2=6, and their positions within the row flag bits of matrix B are addr_B_0=0, addr_B_1=2, addr_B_2=3. Addresses (5+0), (5+2) and (5+3) of the second random access memory 103 are then read in turn, each B element read is multiplied by the corresponding A element, and the results are sent to accumulator 110. Finally B_No_0_sum is updated: B_No_0_sum=5+B_No_0_cnt=9.
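The two rounds of this example can be reproduced with a short software model of the address generation. This is a sketch only: the function names are hypothetical, and bit positions are counted from the least significant bit starting at 0, which is the convention the example values imply.

```python
def set_bit_positions(flag):
    """Return the positions of the 1 bits of flag, least significant first."""
    return [i for i in range(flag.bit_length()) if (flag >> i) & 1]


def generate_addresses(a_flag, b_flag, b_sum):
    """Model of the address generation for one (A row, B column) pair.

    Returns (valid_flag, addr_b, b_addrs, next_b_sum), where addr_b holds
    the rank of each valid bit among the 1 bits of b_flag and b_addrs are
    the read addresses b_sum + addr_b.
    """
    valid_flag = a_flag & b_flag                   # bitwise AND of both flags
    b_positions = set_bit_positions(b_flag)
    rank = {bit: idx for idx, bit in enumerate(b_positions)}
    addr_b = [rank[bit] for bit in set_bit_positions(valid_flag)]
    b_addrs = [b_sum + offset for offset in addr_b]
    next_b_sum = b_sum + len(b_positions)          # B_No_0_sum + B_No_0_cnt
    return valid_flag, addr_b, b_addrs, next_b_sum


a_flag = 0b0101_1010
round1 = generate_addresses(a_flag, 0b1101_0110, b_sum=0)
round2 = generate_addresses(a_flag, 0b0100_1110, b_sum=round1[3])
print(round1)
print(round2)
```

In hardware the rank computation would presumably be a population count over the lower bits of B_No_0_flag; the dictionary lookup here is only a convenient software stand-in.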
Step S204: repeat step S203. The processing flow of accumulator 110 is as follows:
When B_row_num of the B elements read from the second random access memory 103 is unchanged (the B elements still belong to the same column), then Value=Value+A_data*B_data, and the row number of Value is recorded as Value_row_num=A_row_num and its column number as Value_column_num=B_row_num;
When B_row_num of the B elements read from the second random access memory 103 changes (B data of a new row), the value of Value, together with Value_row_num and Value_column_num, is written to output module 111, after which Value is reset to 0.
Here Value denotes an element of the product of matrix A and matrix B, Value_row_num denotes the row number of that product element in the result matrix, and Value_column_num denotes its column number in the result matrix.
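The accumulator behaviour can be sketched as follows. The function name and the tuple layout of the input are hypothetical; as in the text, a result is emitted when B_row_num changes, and the flush of the last pending Value at the end of the stream is an assumption.

```python
def accumulate(products):
    """Model of the accumulator.

    products: iterable of (a_row_num, b_row_num, a_data, b_data) tuples in
    the order in which the controller delivers operand pairs.
    Returns a list of (Value_row_num, Value_column_num, Value).
    """
    results = []
    value = 0
    value_row_num = None
    value_column_num = None
    for a_row_num, b_row_num, a_data, b_data in products:
        if value_column_num is not None and b_row_num != value_column_num:
            # B_row_num changed: write the finished Value to the output
            results.append((value_row_num, value_column_num, value))
            value = 0
        value += a_data * b_data
        value_row_num = a_row_num        # Value_row_num = A_row_num
        value_column_num = b_row_num     # Value_column_num = B_row_num
    if value_column_num is not None:
        results.append((value_row_num, value_column_num, value))  # assumed final flush
    return results


pairs = [(1, 0, 2, 3), (1, 0, 4, 5), (1, 1, 1, 7), (2, 0, 2, 2)]
out = accumulate(pairs)
print(out)
```

Because the model follows the text and only watches B_row_num, it relies on B_row_num changing between consecutive result elements, which holds whenever matrix B has more than one column.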
Step S205: after the calculation is fully complete, i.e. when A_addr=A_DATA_NO_0_SUM or B_addr=B_DATA_NO_0_SUM, an interrupt signal IRQ is generated to notify the preceding module or the driver module to read the result of the sparse matrix product.
As can be seen, the circuit for implementing sparse matrix multiplication provided in this embodiment has three notable aspects. First, it proposes a storage method for matrices A and B: nonzero detection is performed on A and B, and only their nonzero values and the corresponding row and column numbers are stored, which greatly saves storage resources and also speeds up the calculation. Second, it includes address generation module 108, so that only the B elements that participate in the operation are read and the elements of the current column that do not participate are skipped, which greatly speeds up the calculation. Third, well-designed control logic achieves fast multiplication of the nonzero data of matrices A and B and yields the row and column numbers of each result; only the nonzero values of the result and the corresponding row and column information are stored, which saves further storage resources.
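The three aspects can be tied together in a small end-to-end reference model. The sketch below is a software illustration only, not the circuit itself: the function name sparse_matmul is hypothetical, indices are 0-based, the A elements are read directly from the dense row rather than from a model of the first random access memory, and the result is returned as a dictionary of nonzero entries.

```python
def sparse_matmul(A, B):
    """Multiply A (m x p) by B (p x n), both given as lists of lists.

    Returns {(row, column): value} containing only nonzero results.
    """
    p = len(B)
    n = len(B[0])

    # Store B column by column: nonzero values only, plus one flag per column.
    b_store = []   # models the second RAM (nonzero B values, column-major)
    b_flags = []   # models the fourth RAM (one row flag per column of B)
    for j in range(n):
        flag = 0
        for k in range(p):
            if B[k][j] != 0:
                flag |= 1 << k
                b_store.append(B[k][j])
        b_flags.append(flag)

    result = {}
    for i, row in enumerate(A):
        # Column flag of the current A row (models one entry of the third RAM).
        a_flag = 0
        for k, v in enumerate(row):
            if v != 0:
                a_flag |= 1 << k

        b_sum = 0  # start address of the current B column in b_store
        for j, b_flag in enumerate(b_flags):
            valid = a_flag & b_flag
            value = 0
            rank = 0  # position among the nonzero bits of b_flag
            for k in range(p):
                if (b_flag >> k) & 1:
                    if (valid >> k) & 1:
                        value += row[k] * b_store[b_sum + rank]
                    rank += 1
            b_sum += rank
            if value != 0:
                result[(i, j)] = value
    return result


A = [[1, 0, 2], [0, 0, 0], [0, 3, 0]]
B = [[0, 4], [5, 0], [6, 0]]
C = sparse_matmul(A, B)
print(C)
```

Only the B values whose bit is set in the AND of the two flags are ever fetched from b_store, which is the saving the address generation module is described as providing.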
In summary, the circuit for implementing sparse matrix multiplication provided in this embodiment performs continuous, fast product calculation on sparse matrices with high computational efficiency, and can be widely applied in popular fields such as big data, image processing and machine learning.
In addition, the present invention also provides an FPGA board, which comprises an FPGA board body and further comprises a circuit for implementing sparse matrix multiplication as described above.
The FPGA board of this embodiment is used to implement the aforementioned circuit for sparse matrix multiplication. Its specific implementation can therefore be found in the embodiment of the circuit described above and in the descriptions of the corresponding parts; its function corresponds to that of the circuit embodiment and is not repeated here.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The circuit for implementing sparse matrix multiplication and the FPGA board provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principle, and these improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (8)

1. A circuit for implementing sparse matrix multiplication, characterized by comprising: a detection module, a first random access memory, a second random access memory, a first flag bit generation module, a second flag bit generation module, a third random access memory, a fourth random access memory, an address generation module, a controller, an accumulator, and an output module;
wherein the detection module is configured to detect whether each element of a first matrix input by row is a nonzero element and to store the nonzero elements of the first matrix in the first random access memory, and is further configured to detect whether each element of a second matrix input by column is a nonzero element and to store the nonzero elements of the second matrix in the second random access memory;
the first flag bit generator is configured to generate the column flag bits of each row of the first matrix according to the column numbers of the nonzero elements in the first random access memory and to store the column flag bits in the third random access memory, and is further configured to generate the row flag bits of each column of the second matrix according to the row numbers of the nonzero elements in the second random access memory and to store the row flag bits in the fourth random access memory;
the address generation module is configured to perform a bitwise AND operation on the column flag bits and the row flag bits to obtain target flag bits;
the controller is configured to read multiple pairs of nonzero elements from the first random access memory and the second random access memory respectively according to the target flag bits, and to multiply each pair of nonzero elements read to obtain multiple first operation results;
the accumulator is configured to accumulate the multiple first operation results to obtain a second operation result and to store the second operation result in the output module;
the output module is configured to organize the second operation results to obtain the matrix multiplication result of the first matrix and the second matrix.
2. The circuit according to claim 1, characterized in that the address generation module is specifically configured to: in response to a start instruction sent by the controller, obtain the column flag bits and the row flag bits from the third random access memory and the fourth random access memory, and perform a bitwise AND operation on the column flag bits and the row flag bits to obtain the target flag bits, wherein the start instruction is generated and sent by the controller when it reads, row by row, the nonzero elements stored in the first random access memory;
the controller is specifically configured to: read nonzero elements from the second random access memory according to the target flag bits, pair these nonzero elements with the nonzero elements read from the first random access memory to form multiple pairs of nonzero elements, and multiply each pair of nonzero elements to obtain multiple first operation results.
3. The circuit according to claim 2, characterized in that the controller is specifically configured to:
determine, according to the target flag bits, the target addresses in the second random access memory of the nonzero elements that participate in the current multiplication, and read the nonzero elements from the second random access memory according to the target addresses.
4. The circuit according to claim 3, characterized in that the controller is further configured to:
determine, according to the target flag bits, the nonzero elements that are located in the same column as the nonzero elements participating in the current multiplication but do not themselves participate in it, and delete them.
5. The circuit according to claim 4, characterized in that the circuit further comprises:
an interrupt signal generation module, configured to generate an interrupt signal upon detecting that no unread nonzero element remains in the first random access memory or the second random access memory, so as to notify a front-end module that the matrix multiplication of the first matrix and the second matrix has completed.
6. The circuit according to claim 1, characterized in that the first flag bit generator is specifically configured to:
obtain the nonzero elements in the first random access memory one by one;
judge whether the row number of the nonzero element is the same as the row number of the previous nonzero element;
if they are the same, update the row flag bits of the current row according to the column number of the nonzero element;
if they are not the same, store the value of the row flag bits of the current row in the third random access memory, initialize the row flag bits, and update the initialized row flag bits according to the nonzero element.
7. The circuit according to any one of claims 1 to 6, characterized in that the detection module is specifically configured to:
store the values of the nonzero elements in the first matrix, the row numbers of the nonzero elements in the first matrix and the column numbers of the nonzero elements in the first matrix in the first random access memory, and is further configured to store the values of the nonzero elements in the second matrix, the row numbers of the nonzero elements in the second matrix and the column numbers of the nonzero elements in the second matrix in the second random access memory.
8. An FPGA board, characterized by comprising an FPGA board body and further comprising a circuit for implementing sparse matrix multiplication according to any one of claims 1 to 7.
CN201910016870.0A 2019-01-08 2019-01-08 A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate Withdrawn CN109740116A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910016870.0A CN109740116A (en) 2019-01-08 2019-01-08 A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910016870.0A CN109740116A (en) 2019-01-08 2019-01-08 A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate

Publications (1)

Publication Number Publication Date
CN109740116A true CN109740116A (en) 2019-05-10

Family

ID=66363863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910016870.0A Withdrawn CN109740116A (en) 2019-01-08 2019-01-08 A kind of circuit that realizing sparse matrix multiplication operation and FPGA plate

Country Status (1)

Country Link
CN (1) CN109740116A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704799A (en) * 2019-09-06 2020-01-17 苏州浪潮智能科技有限公司 Data processing equipment and system
CN111798363A (en) * 2020-07-06 2020-10-20 上海兆芯集成电路有限公司 Graphics processor
US11409523B2 (en) * 2020-07-06 2022-08-09 Glenfly Technology Co., Ltd. Graphics processing unit
CN111798363B (en) * 2020-07-06 2024-06-04 格兰菲智能科技有限公司 Graphics processor
WO2022053032A1 (en) * 2020-09-11 2022-03-17 北京希姆计算科技有限公司 Matrix calculation circuit, method, electronic device, and computer-readable storage medium
CN112306660A (en) * 2020-11-05 2021-02-02 山东云海国创云计算装备产业创新中心有限公司 Data processing method and system based on RISC-V coprocessor
WO2022148181A1 (en) * 2021-01-08 2022-07-14 苏州浪潮智能科技有限公司 Sparse matrix accelerated computing method and apparatus, device, and medium
WO2023272917A1 (en) * 2021-06-28 2023-01-05 华中科技大学 Sparse matrix storage and computation system and method
CN117155843A (en) * 2023-10-31 2023-12-01 苏州元脑智能科技有限公司 Data transmission method, device, routing node, computer network and medium
CN117155843B (en) * 2023-10-31 2024-02-23 苏州元脑智能科技有限公司 Data transmission method, device, routing node, computer network and medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190510
