CN112799635B - Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation - Google Patents

Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Info

Publication number
CN112799635B
CN112799635B CN202110170947.7A CN202110170947A CN112799635B CN 112799635 B CN112799635 B CN 112799635B CN 202110170947 A CN202110170947 A CN 202110170947A CN 112799635 B CN112799635 B CN 112799635B
Authority
CN
China
Prior art keywords
matrix
sparse matrix
row
kxn
zero
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110170947.7A
Other languages
Chinese (zh)
Other versions
CN112799635A (en)
Inventor
周晓辉
袁博
华诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Computing Chip Shenzhen Information Technology Co ltd
Original Assignee
Computing Chip Shenzhen Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Computing Chip Shenzhen Information Technology Co ltd filed Critical Computing Chip Shenzhen Information Technology Co ltd
Priority to CN202110170947.7A priority Critical patent/CN112799635B/en
Publication of CN112799635A publication Critical patent/CN112799635A/en
Application granted granted Critical
Publication of CN112799635B publication Critical patent/CN112799635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443Sum of products
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The invention belongs to the field of computer application equipment, and particularly relates to a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation. The method comprises the following steps: a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN); b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout; c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci; d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN). In step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not; every other element of a row is retained only if its value is non-zero, and is discarded if it is zero.

Description

Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation
Technical Field
The invention belongs to the field of computer application equipment, and particularly relates to a novel method for solving inner products of a dense matrix and a sparse matrix by means of outer product accumulation.
Background
Matrix inner product multiplication underlies algorithms in numerous fields such as numerical computation, machine learning and engineering simulation, and the inner product of a dense matrix A (MxK) with a sparse matrix B (KxN) is a common matrix inner product scenario.
The product of a dense matrix A (MxK) and a sparse matrix B (KxN) is characterized by the presence of a certain number of zero elements in the sparse matrix B (KxN). These zero elements need not participate in the calculation for the result to remain correct, and if they are simply not stored, memory utilization improves.
However, skipping zero elements of the sparse matrix B (KxN) during computation and storage leads to discontinuous memory accesses, which is unfavorable for the cache hierarchy of modern processor architectures. The computational efficiency of an inner product involving a sparse matrix is therefore lower than that of a product of two dense matrices.
Here the inner product of a dense matrix A (4x4) and a sparse matrix B (4x3) is taken as an example; B (4x3) is a general sparse matrix, meaning that its zero elements are distributed with no regularity. FIG. 1 shows the first step of solving the inner product of the dense matrix A (4x4) and the sparse matrix B (4x3) by summation of outer products. In this step, the first column a1 (4x1) of the dense matrix A (4x4) and the first row b1 (1x3) of the sparse matrix B (4x3) are taken out and their outer product gives the result c1; likewise, the second column of A and the second row b2 of B give the result c2. The dense matrix A (4x4) has four columns and the sparse matrix B (4x3) has four rows, so the column-row outer products in one-to-one correspondence are c1 (4x3), c2 (4x3), c3 (4x3) and c4 (4x3). Adding the corresponding elements of the four outer product results c1, c2, c3 and c4 yields the inner product result. In ordinary matrix inner product multiplication, the value at position (i, j) of the target matrix is obtained as the inner product of the i-th row of the dense matrix A (4x4) with the j-th column of the sparse matrix B (4x3). To complete the whole inner product, all four columns of the dense matrix A (4x4) must be read and combined with the four rows of the sparse matrix B (4x3).
FIG. 2 illustrates the outer product computation logic of the first column a1 (4x1) of the dense matrix A (4x4) and the first row b1 (1x3) of the sparse matrix B (4x3). The outer product of the column vector a1 (4x1) and the row vector b1 (1x3) forms a matrix c1 of 4 rows and 3 columns; the computation is equivalent to multiplying each element of a1 by the row vector b1. The computation of the i-th column ai (4x1) of the dense matrix A (4x4) with the i-th row bi (1x3) of the sparse matrix B (4x3) follows the same logic as the computation of c1. When a zero element appears at some position of a row of the sparse matrix B (4x3), its effect on the result is that a zero vector is produced in the row direction at that position.
FIG. 3 illustrates the four matrix additions of the outer product results c1, c2, c3 and c4 described above, which follow the usual rule of adding corresponding elements. This completes the description of solving the inner product of the dense matrix A (4x4) and the sparse matrix B (4x3) by outer product accumulation. It can be seen that in this process no special treatment is given to the row-direction zero vectors produced when a column of the dense matrix A (4x4) is multiplied by a zero element in a row of the sparse matrix B (4x3). If such a row-direction zero vector were not computed and did not take part in the subsequent accumulation, the correctness of the inner product result would not be affected. In the existing outer product accumulation, however, the zero elements of the sparse matrix do participate in the outer product computation, which not only lowers the efficiency of the outer product computation but also requires writing the resulting zero values to memory afterwards, reducing the effective memory write bandwidth.
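For orientation, the baseline procedure described above can be summarized by the following short Python/NumPy sketch. It is an illustration only and is not taken from the patent; the matrix values are invented, with the zero pattern of the first two rows of B chosen to match the example used later in the description.

    import numpy as np

    # Dense matrix A (4x4); values invented for illustration.
    A = np.array([[1., 2., 0., 3.],
                  [4., 5., 6., 0.],
                  [0., 7., 8., 9.],
                  [1., 0., 2., 3.]])
    # Sparse matrix B (4x3), stored densely here with its zeros included.
    B = np.array([[0., 2., 0.],
                  [3., 4., 0.],
                  [0., 0., 5.],
                  [6., 0., 7.]])

    # Inner product by accumulation of outer products: one rank-1 update per
    # column ai of A and row bi of B, as in FIG. 1 - FIG. 3.
    C = np.zeros((A.shape[0], B.shape[1]))
    for i in range(A.shape[1]):
        C += np.outer(A[:, i], B[i, :])   # ci = ai (4x1) x bi (1x3), then accumulate

    assert np.allclose(C, A @ B)          # matches the ordinary row-by-column product

In this baseline every zero of B still generates multiplications and memory writes, which is exactly the inefficiency discussed above.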
Disclosure of Invention
Aiming at the above technical problems in the inner product multiplication of the dense matrix A (4x4) and the sparse matrix B (4x3), the invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation; the method is simple, effectively avoids invalid operations, and saves memory accesses.
In order to achieve the above object, the technical solution adopted by the present invention is as follows: the present invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation, comprising the following steps:
a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN);
b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout;
c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci;
d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN); in step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not, while every other element of a row is retained if its value is non-zero and discarded if it is zero.
Preferably, the method further comprises a step e: judge whether the last row of the sparse matrix B (KxN) has been processed; if so, the calculation is finished; if not, increase i by 1 and repeat steps c to e.
Preferably, in step b, the row-wise compressed sparse matrix B (KxN) comprises a total length after matrix compression, a column index array and a compressed element value array, wherein the total length after compression is the total number of elements retained from the sparse matrix B (KxN); the column index array records, for the elements retained in each row, their column index values under the original two-dimensional layout; and the compressed element value array is filled sequentially, in column order, with the compressed element values.
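A minimal sketch of the row-wise compression of step b is given below in Python/NumPy. The function name compress_rowwise is an assumption introduced here for illustration and is not terminology from the patent.

    import numpy as np

    def compress_rowwise(B):
        """Row-wise compression of step b: the first element of every row is
        kept unconditionally (even if it is zero); the remaining elements of
        the row are kept only if they are non-zero. Column indices start at 0."""
        col_idx, values = [], []
        for row in B:
            for j, v in enumerate(row):
                if j == 0 or v != 0:
                    col_idx.append(j)     # column index under the original 2-D layout
                    values.append(v)      # compressed element value, in column order
        total_len = len(values)           # total length after matrix compression
        return total_len, np.array(col_idx), np.array(values)

The three returned components correspond to the total length after matrix compression, the column index array and the compressed element value array described above.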
Compared with the prior art, the invention has the following advantages and positive effects:
1. The invention provides a novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation. The sparse matrix B (KxN) is compressed row-wise, retaining the first element of each row together with all other non-zero elements; this makes the positions of the compressed elements easy to locate while removing a large number of zero elements. Invalid numerical computation is thereby avoided, memory accesses and invalid operations are saved, computational efficiency is improved, and fast operation is guaranteed for techniques based on numerical computation, machine learning and engineering simulation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive labor.
FIG. 1, example of inner product multiplication solved by outer product accumulation of dense matrix A (4 x 4) and sparse matrix B (4 x 3)
FIG. 2, example of performing outer product computation of the first column of dense matrix A (4 x 4) and the first row of sparse matrix B (4 x 3)
FIG. 3, example of accumulation of outer product results of dense matrix A (4 x 4) and sparse matrix B (4 x 3)
FIG. 4, conversion process of the sparse matrix B (4 x 3) to the reduced row-wise compressed storage format
FIG. 5, example of the column index array logic of the reduced row-wise compression of the sparse matrix B (4 x 3)
FIG. 6, example of the column-row correspondence in the outer product operation between the dense matrix A (4 x 4) and the one-dimensional array formed by the reduced row-wise compression of the sparse matrix B (4 x 3)
FIG. 7, example of the process of accumulating the outer products of the columns of the dense matrix A (4 x 4) with the rows of the reduced row-wise compressed sparse matrix B (4 x 3), stored as a one-dimensional array
FIG. 8 is a flow chart of the accumulation of the outer products of the dense matrix A (MxK) and the reduced row-wise compressed sparse matrix B (KxN).
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, the present invention will be further described with reference to the accompanying drawings and examples. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and thus the present invention is not limited to the specific embodiments of the present disclosure.
This embodiment aims to solve the problems of the large amount of computation and the repeated computation on zero elements that arise when the inner product of the sparse matrix B (KxN) and the dense matrix A (MxK) is solved by outer product accumulation. The example of solving the inner product by outer product accumulation of the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) provided in this embodiment is the same as in the background art; the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) are used below to describe the technical solution of this embodiment, so that the technique provided by the invention can be better understood.
First, the reading of the original data of the sparse matrix B (4 x 3) and the dense matrix A (4 x 4) is completed; their specific values are shown in FIG. 1.
FIG. 4 illustrates the conversion process 400 of the sparse matrix B (4 x 3) into the SCSR storage scheme. Process 400 involves the two-dimensional layout 401 of the sparse matrix B (4 x 3) and the one-dimensional layout 402 of the reduced row-wise compressed sparse matrix B (4 x 3) in memory. In 401, non-zero elements are denoted b_ij and zero elements are represented by a black square "■". The column indices and row indices may start from 0 or from 1 without affecting the implementation of the method.
In 402, the total length after matrix compression 403, the column index array 404 and the compressed element value array 405 form all of the components of the row-wise compressed storage of the sparse matrix B (4 x 3); the memory addresses 406 are shown only to aid the explanation and are not needed in an actual implementation of the algorithm. The dashed arrows in 400 indicate the positions in 402 to which the elements of 401 are mapped after reduced row-wise compressed storage.
The compression rule of the method for the sparse matrix is that the first element of each row is retained whether its value is zero or not, while every non-starting element of a row is judged by its value: if the value is zero the element is discarded, and if it is non-zero the element is retained. Compared with a general row compression method, the method of the invention determines the range of the memory region holding the compressed elements of a row simply by checking whether a value in the column index array 404 is zero. Because the first element of each row is always retained, regardless of whether it is zero, and the remaining elements of each row are processed normally with zero elements discarded, the position of every compressed element can be located using only the column index array, and the data structure storing the compression result remains simple, so the elements are conveniently located.
The total length after matrix compression 403 is the total number of elements retained after compressing the sparse matrix B (4 x 3). It ensures that no out-of-bounds access occurs when the memory space of the column index array 404 and of the compressed element value array 405 is accessed.
The column index array 404 records, for the elements retained in each row, their column index values under the original two-dimensional layout 401. The compressed element value array 405 stores the compressed element values of the sparse matrix B (4 x 3) sequentially at consecutive memory addresses. The value stored at the N-th address offset addrN of the column index array 404 indicates the column number, under the two-dimensional layout of the sparse matrix B (4 x 3), of the element value stored at the same address offset addrN of the compressed element value array 405.
Although the element in the first row and first column of the two-dimensional layout 401 of the sparse matrix B (4 x 3) in FIG. 4 is zero, it is still retained in the compressed storage step and placed at the first position of the compressed element value array 405, and its column index value 0 is recorded at the first position of the column index array 404; this mapping is indicated by arrow 410. Arrow 411 indicates the mapping of the second, non-zero element in the first row of the sparse matrix B (4 x 3). The third element of the first row is zero; it is discarded and not mapped into the column index array 404 or the compressed element value array 405. The first element of the second row of the sparse matrix B (4 x 3) is non-zero and is therefore retained in the column index array 404 and the compressed element value array 405, as indicated by arrow 412.
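As an illustration of this mapping, applying the compress_rowwise sketch given earlier to a hypothetical B (4 x 3) whose first two rows follow the zero pattern just described (the last two rows are invented for illustration) yields the three components of the compressed storage:

    import numpy as np

    B = np.array([[0., 2., 0.],    # row 1: zero start kept, non-zero kept, zero dropped
                  [3., 4., 0.],    # row 2: non-zero start kept, non-zero kept, zero dropped
                  [0., 0., 5.],    # rows 3-4: values invented for illustration
                  [6., 0., 7.]])

    total_len, col_idx, values = compress_rowwise(B)
    print(total_len)   # 8  -> total length after matrix compression
    print(col_idx)     # [0 1 0 1 0 2 0 2]  -> column index array
    print(values)      # [0. 2. 3. 4. 0. 5. 6. 7.]  -> compressed element value array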
In the column index array 504 of FIG. 5, the value stored at the first address addr0 is rvalue0, whose actual value in this example is 0; the value stored at the second address addr1 is rvalue1, whose actual value is 1; and the value stored at the third address addr2 is rvalue2, whose actual value is 0. Since the actual value of rvalue2 is 0, and within each row one and only one stored element has a column index of zero, the position represented by rvalue2 is the start of a new row.
The range between two adjacent 0 values of rvalue in the column index array 504 constitutes the data range of one compressed row: the N-th occurrence of the value 0 indicates that the column indices that follow it are the column index data of the N-th row.
For the sparse matrix B (4 x 3) of this example, the compressed element range of the first row, which originally contained zero elements, runs from the position of rvalue0 up to but not including the position of rvalue2; the compressed element range of the second row, which also originally contained zero elements, runs from the position of rvalue2 up to but not including the position of rvalue4.
For the last row, row K, of the sparse matrix B (KxN), the compressed element range runs from the memory address offset of the K-th occurrence of the value 0 in the column index array 504 up to, but not including, the total length after matrix compression 503.
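The row-boundary rule described above (a stored column index of 0 can only mark the start of a row) can be expressed as the following sketch; the helper name row_ranges is an assumption introduced here for illustration.

    import numpy as np

    def row_ranges(col_idx, total_len):
        """Return half-open address ranges [start, end) of each compressed row,
        using only the column index array: the K-th zero marks the start of
        row K, and the last row ends at the total length after compression."""
        starts = np.flatnonzero(col_idx == 0)
        ends = np.append(starts[1:], total_len)
        return list(zip(starts, ends))

    # For the example above: row_ranges(col_idx, total_len) -> [(0, 2), (2, 4), (4, 6), (6, 8)]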
FIG. 6 illustrates the correspondence used in the outer product operation between the i-th column of the dense matrix A (4 x 4) and the i-th row of the reduced row-wise compressed sparse matrix B (4 x 3). Arrow 610 indicates the extent, within the one-dimensional compressed element array 605 of the compressed sparse matrix B (4 x 3), of the first row with which the first column 606 of the dense matrix A (4 x 4) performs its outer product. In the same way, arrow 611 indicates the second row of the compressed sparse matrix B (4 x 3) corresponding to the second column 607 of the dense matrix A (4 x 4), and arrows 612 and 613 indicate the third and fourth rows corresponding to the third and fourth columns 608 and 609, respectively.
FIG. 7 shows the outer product computation 722 of the first column 706 of the dense matrix A (4 x 4) with the first row 710 of the row-wise compressed sparse matrix B (4 x 3), which generates the outer product result 714; the result 714 contains the partial results 712 and 713 produced by multiplying the first column 706 with the stored elements of the compressed first row 710. Although the first element of the first row 710 of the row-wise compressed sparse matrix B (4 x 3) is zero, the outer product operation is still performed on it in the method of the invention.
The outer product result 714 is equivalent to c1 in FIG. 3. Mapping the result 714 into the two-dimensional row-column layout of c1 (4 x 3) forms the intermediate inner product result 718; in 718, memory regions whose zero values were generated by the outer product computation 722 are marked "0", while zero-valued memory regions not generated by the outer product computation are marked with a black block "■".
Similarly, the second column 707 of the dense matrix A (4 x 4) and the second row 711 of the row-wise compressed sparse matrix B (4 x 3) undergo the outer product computation 722 to generate the outer product result 717, which is equivalent to c2 in FIG. 3. The result 717 is mapped into the two-dimensional row-column layout of c2 (4 x 3), forming the intermediate inner product result 719. The intermediate results 718 and 719 undergo a matrix addition 720, in which corresponding elements are added, to form the intermediate inner product result 721, which is equivalent to c1 (4 x 3) + c2 (4 x 3). In FIG. 6, 608 and 612 perform an outer product to generate c3 (4 x 3), and 609 and 613 perform an outer product to generate c4 (4 x 3). Adding c3 (4 x 3) and c4 (4 x 3) to the intermediate result 721 forms the complete inner product result C (4 x 3) = A (4 x 4) B (4 x 3) = c1 (4 x 3) + c2 (4 x 3) + c3 (4 x 3) + c4 (4 x 3).
In the compressed rows 710 and 711, after the reduced row-wise compression of the sparse matrix B (4 x 3), the third element (a zero) of the first row and the third element (a zero) of the second row have been discarded. Compared with the uncompressed sparse matrix B (4 x 3) of FIG. 2, this removes both the memory storage and the outer product computation for those zero elements, which reduces the proportion of invalid operations and increases the proportion of valid computation.
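One outer-product step with a compressed row can be sketched as follows (illustrative helper with assumed names, not the patent's reference implementation): only the stored (column index, value) pairs of row i of B are multiplied by column i of A and written into C, so the zero elements discarded during compression generate neither multiplications nor memory writes.

    import numpy as np

    def outer_accumulate(C, a_col, b_cols, b_vals):
        """Accumulate the outer product of a dense column and one compressed row:
        C[:, j] += a_col * v for every stored pair (j, v) of the row."""
        for j, v in zip(b_cols, b_vals):
            C[:, j] += a_col * v          # rank-1 update restricted to stored columns
        return C

Note that a retained first element that happens to be zero is still multiplied, exactly as stated for row 710 above.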
FIG. 8 illustrates the data flow 800 involved in the invention. First, block 801 completes the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN). Block 802 performs the reduced row-wise data compression of the sparse matrix B (KxN); the specific row-wise compression process is illustrated in FIGS. 4-5. Next the outer product computation is performed: block 803 reads the i-th row of the reduced row-wise compressed sparse matrix B (KxN) and block 804 reads the i-th column of the dense matrix A (MxK), and their outer product generates ci; the column-row outer product computation is illustrated in FIG. 7. Block 806 then completes the outer product accumulation, which repeatedly accumulates the current result into the same block of memory by adding it to the value left there by the previous computation. Finally it is judged whether the last row of the sparse matrix B (KxN) has been processed. If so, the computation is finished; if not, block 808 increments i by 1 and control returns to block 803, so that blocks 803-807 are performed in turn.
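The complete data flow 800 can therefore be sketched end to end as below, combining the illustrative helpers compress_rowwise, row_ranges and outer_accumulate defined earlier. All names are assumptions for illustration, not the patent's reference implementation.

    import numpy as np

    def dense_times_rowwise_compressed(A, B):
        """Blocks 801-808: read A and B, compress B row-wise, then loop over i,
        multiplying column i of A with compressed row i of B and accumulating."""
        M, K = A.shape
        _, N = B.shape
        total_len, col_idx, values = compress_rowwise(B)        # blocks 801-802
        C = np.zeros((M, N))
        for i, (s, e) in enumerate(row_ranges(col_idx, total_len)):
            # blocks 803-806: read compressed row i of B and column i of A,
            # compute their outer product and accumulate it into C
            C = outer_accumulate(C, A[:, i], col_idx[s:e], values[s:e])
        return C                                                 # C = A (MxK) x B (KxN)

    # For the example matrices above, the result equals the baseline computation:
    # np.allclose(dense_times_rowwise_compressed(A, B), A @ B) -> True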
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in other forms. Any person skilled in the art may, without departing from the technical spirit of the present invention, modify or change the disclosed technical content into equivalent embodiments; any simple modification or equivalent change made to the above embodiments in accordance with the technical spirit of the present invention still falls within the protection scope of the technical solution of the present invention.

Claims (3)

1. A novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation, characterized by comprising the following steps:
a. first, complete the reading of the original data of the dense matrix A (MxK) and the sparse matrix B (KxN);
b. perform row-wise data compression on the sparse matrix B (KxN), compressing B (KxN) from its two-dimensional layout into a row-wise compressed one-dimensional layout;
c. read the i-th row of the row-wise compressed sparse matrix B (KxN) and the i-th column of the dense matrix A (MxK), and compute their outer product to generate ci;
d. accumulate the resulting outer products to obtain the inner product of the dense matrix A (MxK) and the sparse matrix B (KxN); in step b, the compression rule for the sparse matrix B (KxN) is that the first element of each row is always retained, whether its value is zero or not, while every other element of a row is retained if its value is non-zero and discarded if it is zero.
2. The novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation according to claim 1, further comprising a step e: judge whether the last row of the sparse matrix B (KxN) has been processed; if so, the calculation is finished; if not, increase i by 1 and repeat steps c to e.
3. The novel method for solving the inner product of a dense matrix and a sparse matrix by outer product accumulation according to claim 2, wherein in step b the row-wise compressed sparse matrix B (KxN) comprises a total length after matrix compression, a column index array and a compressed element value array, wherein the total length after compression is the total number of elements retained from the sparse matrix B (KxN); the column index array is used to record, for the elements retained in each row, their column index values under the original two-dimensional layout; and the compressed element value array is used to sequentially fill in the compressed element values in column order.
CN202110170947.7A 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation Active CN112799635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110170947.7A CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110170947.7A CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Publications (2)

Publication Number Publication Date
CN112799635A CN112799635A (en) 2021-05-14
CN112799635B true CN112799635B (en) 2022-11-15

Family

ID=75814748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110170947.7A Active CN112799635B (en) 2021-02-08 2021-02-08 Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation

Country Status (1)

Country Link
CN (1) CN112799635B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005115497A (en) * 2003-10-03 2005-04-28 Nec Corp Computing machine and method for repetitively finding solution to simultaneous linear equation
CN104112044A (en) * 2014-07-04 2014-10-22 同济大学 Efficient analyzing method for superfine line structure object electromagnetic property
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN111191784A (en) * 2018-11-14 2020-05-22 辉达公司 Transposed sparse matrix multiplied by dense matrix for neural network training
CN111859277A (en) * 2020-07-23 2020-10-30 中国人民解放军国防科技大学 Sparse matrix vector multiplication vectorization implementation method

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011040A1 (en) * 2008-07-11 2010-01-14 James Vannucci Device and method for solving a system of equations characterized by a coefficient matrix comprising a Toeplitz structure
WO2018022821A1 (en) * 2016-07-29 2018-02-01 Arizona Board Of Regents On Behalf Of Arizona State University Memory compression in a deep neural network
US10528864B2 (en) * 2016-08-11 2020-01-07 Nvidia Corporation Sparse convolutional neural network accelerator
US10452744B2 (en) * 2017-03-27 2019-10-22 Oracle International Corporation Memory management for sparse matrix multiplication
CN107679618B (en) * 2017-07-28 2021-06-11 赛灵思电子科技(北京)有限公司 Static strategy fixed-point training method and device
US11478603B2 (en) * 2017-12-31 2022-10-25 Neuroenhancement Lab, LLC Method and apparatus for neuroenhancement to enhance emotional response
US10572568B2 (en) * 2018-03-28 2020-02-25 Intel Corporation Accelerator for sparse-dense matrix multiplication
US10644721B2 (en) * 2018-06-11 2020-05-05 Tenstorrent Inc. Processing core data compression and storage system
CN109032670B (en) * 2018-08-08 2021-10-19 上海寒武纪信息科技有限公司 Neural network processing device and method for executing vector copy instruction
CN109726314B (en) * 2019-01-03 2020-09-01 中国人民解放军国防科技大学 Bitmap-based sparse matrix compression storage method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005115497A (en) * 2003-10-03 2005-04-28 Nec Corp Computing machine and method for repetitively finding solution to simultaneous linear equation
CN104112044A (en) * 2014-07-04 2014-10-22 同济大学 Efficient analyzing method for superfine line structure object electromagnetic property
CN110325988A (en) * 2017-01-22 2019-10-11 Gsi 科技公司 Sparse matrix multiplication in associated memory devices
CN107944555A (en) * 2017-12-07 2018-04-20 广州华多网络科技有限公司 Method, storage device and the terminal that neutral net is compressed and accelerated
CN111191784A (en) * 2018-11-14 2020-05-22 辉达公司 Transposed sparse matrix multiplied by dense matrix for neural network training
CN111859277A (en) * 2020-07-23 2020-10-30 中国人民解放军国防科技大学 Sparse matrix vector multiplication vectorization implementation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Preconditioned conjugate gradient method for sparse linear systems based on GPU; Zhang Jianfei et al.; Journal of Computer Applications (《计算机应用》); 2013-03-01 (No. 03); full text *

Also Published As

Publication number Publication date
CN112799635A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN111247527B (en) Method and device for determining characteristic images in convolutional neural network model
US20110107060A1 (en) Transposing array data on simd multi-core processor architectures
CN112732222B (en) Sparse matrix accelerated calculation method, device, equipment and medium
CN110580324B (en) Image matrix operation method and device, computer equipment and storage medium
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN112835552A (en) Method for solving inner product of sparse matrix and dense matrix by outer product accumulation
CN112633470B (en) Method, system, device and medium for optimizing neural network convolution residual structure
CN112286864A (en) Sparse data processing method and system for accelerating operation of reconfigurable processor
CN113762493A (en) Neural network model compression method and device, acceleration unit and computing system
CN112799635B (en) Novel method for solving inner product of dense matrix and sparse matrix through outer product accumulation
JP3802783B2 (en) Collision prevention memory device and address calculation and data routing method using the same
US7143072B2 (en) Method and a system for calculating the values of the neurons of a neural network
CN104572588B (en) Matrix inversion process method and apparatus
CN113485750A (en) Data processing method and data processing device
CN110737870B (en) Method and device for merging schuler matrixes on GPU
CN111667052A (en) Standard and nonstandard volume consistency transformation method for special neural network accelerator
CN111723906A (en) Accelerated calculation method and system of recurrent neural network and related device
CN116451755A (en) Acceleration method and device of graph convolution neural network and electronic equipment
CN108184127A (en) A kind of configurable more dimension D CT mapping hardware multiplexing architectures
US11361052B2 (en) Method of formatting a weight matrix, an accelerator using the formatted weight matrix, and a system including the accelerator
CN111783977B (en) Neural network training process intermediate value storage compression method and device based on regional gradient update
CN113094648B (en) Method for solving triangular matrix and matrix inner product by outer product accumulation
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
JP2022020464A (en) Neural network processing apparatus, neural network processing method, and computer program
CN112508187A (en) Machine learning model compression method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20210903

Address after: 518000 floor 71, Ping An financial center, 5033 Yitian Road, Fu'an community, Futian street, Futian District, Shenzhen, Guangdong Province

Applicant after: Computing chip (Shenzhen) Information Technology Co.,Ltd.

Address before: Room 911, block a, Qilu Software Park, 1766 Xinluo street, high tech Industrial Development Zone, Jinan City, Shandong Province

Applicant before: China Mdt InfoTech Ltd.

GR01 Patent grant
GR01 Patent grant