CN115309333A - Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof - Google Patents

Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof

Info

Publication number
CN115309333A
Authority
CN
China
Prior art keywords
sparse
matrix
strip
data storage
storage format
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210931011.6A
Other languages
Chinese (zh)
Inventor
陈长波
刘丽
杨文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN202210931011.6A priority Critical patent/CN115309333A/en
Publication of CN115309333A publication Critical patent/CN115309333A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0638 Organizing or formatting or addressing of data
    • G06F 3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0661 Format or protocol conversion arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/52 Multiplying; Dividing
    • G06F 7/523 Multiplying only

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides a data storage format for banded (strip-shaped) sparse matrices and a multiplication acceleration method based on it, belonging to the field of matrix data storage formats and high-performance multiplication. The data storage format exploits the special structure of the sparse bands of a matrix (namely, that all rows within the same sparse band have the same column indices), avoiding redundant storage of the repeated column indices within a band; it is therefore an efficient compressed storage format. On top of this format, the invention further provides a multiplication acceleration method that realizes row multiplexing of the dense matrix, reduces overall memory data transmission, and improves computational efficiency.

Description

Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof
Technical Field
The invention relates to a method for improving banded (strip-shaped) sparse matrix multiplication, belongs to the field of matrix data storage formats and high-performance multiplication, and particularly relates to a data storage format for banded sparse matrices and a multiplication acceleration method based on it.
Background
A sparse matrix is a matrix in which most elements are zero and the proportion of non-zero elements is very small, often below 1% of the total. Sparse matrix multiplication is widely used in large-scale scientific computing, deep learning, graph analytics, economic modeling, and other fields. In deep learning, the problem is usually expressed as sparse-dense matrix multiplication (SpMM): given a sparse matrix A of size m × k and a dense matrix B of size k × n, compute the matrix C of size m × n. In the specific context of deep learning, efficient sparse multiplication has become one of the main ways to improve training and inference performance and to reduce model size while maintaining accuracy. In particular, in Graph Neural Networks (GNNs) the structure of the input graph is represented as an adjacency matrix or adjacency list, which is usually sparse. SpMM is the main operation in GNNs and remains a time-consuming step even on parallel hardware such as GPUs, so accelerating the SpMM algorithm is of great significance for achieving high performance in the training and inference of GNN applications.
In order to save storage space and reduce the number of multiplications, a sparse matrix is usually stored in compressed form, i.e. only the non-zero elements of the matrix are stored. This efficient storage improves the overall performance of matrix multiplication; the most common such format is CSR, see document [1]. Let the sparse matrix A have size m × k and contain nnz non-zero elements. The CSR format compresses the non-zero elements by row and represents A with three arrays:
val[nnz]: the value of each non-zero element;
colInd[nnz]: the column index of each non-zero element;
rowPtr[m+1]: the starting offset of the first non-zero element of each row within val and colInd, i.e. the total number of non-zero elements before that row; the last entry therefore equals the total number nnz of non-zero elements of the matrix.
The core code for implementing SpMM based on the CSR format is as follows:
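The listing is published only as an image in the original document; the following is a minimal reconstruction of the standard row-wise CSR SpMM triple loop that the text describes. The function name and the row-major memory layout of B and C are our illustrative assumptions, not identifiers from the patent:

/* Sketch: C = A * B with A (m x k, nnz non-zeros) given by the CSR arrays
 * val/colInd/rowPtr, and B (k x n), C (m x n) dense, row-major. */
void spmm_csr(int m, int n,
              const int *rowPtr, const int *colInd, const float *val,
              const float *B, float *C) {
    for (int i = 0; i < m; ++i) {                 /* each sparse row of A   */
        for (int j = 0; j < n; ++j) {             /* each column of B and C */
            float acc = 0.0f;
            for (int t = rowPtr[i]; t < rowPtr[i + 1]; ++t)
                acc += val[t] * B[colInd[t] * n + j];   /* B[colInd[t]][j]  */
            C[i * n + j] = acc;
        }
    }
}

Note that colInd is traversed once per non-zero element for every output column j; for banded matrices this is exactly the redundant column-index traffic that the format proposed below removes.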
With the development of the technology, existing GPU-based sparse matrix multiplication mainly improves the sparse matrix storage format through blocking, gathering data into dense blocks before computing. Regarding sparse matrix storage formats, a common blocked storage format for general-purpose processors is the BCSR format proposed by Vuduc in 2005. BCSR is an extension of CSR: the whole matrix is partitioned into blocks of fixed size r × c, yielding ⌈m/r⌉ block rows (⌈·⌉ denotes rounding up), and the non-zero blocks are stored row by row in CSR fashion, so the overall layout is essentially the same as CSR. Compared with CSR, BCSR has the advantage of data contiguity, and the fixed-size blocks improve parallel optimization efficiency and help achieve load balance. However, BCSR treats and stores each block as a dense matrix, thereby explicitly introducing zero elements; especially when the non-zero elements are scattered across the sparse matrix, this incurs storage cost and performance penalties. BCSR also has the following limitations when storing non-zero sub-blocks: (1) the starting row and column indices of a block must be multiples of the block's row and column sizes, otherwise whole rows or columns of zero elements must be padded; (2) since the entire matrix uses a single block size, dense blocks of small original matrices must be padded with more zeros to maintain the block structure.
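As a concrete illustration of this prior-art layout (the structure and field names below are our own, not from Vuduc's paper or the patent):

/* Illustrative BCSR container for an m x k matrix split into r x c blocks. */
typedef struct {
    int r, c;           /* fixed block dimensions                            */
    int nBlockRows;     /* ceil(m / r)                                       */
    int *blockRowPtr;   /* nBlockRows + 1 offsets into blockColInd           */
    int *blockColInd;   /* one block-column index per stored non-zero block  */
    float *blockVal;    /* every stored block kept fully dense: numBlocks *  */
                        /*   r * c values, explicit zeros included (the      */
                        /*   padding cost criticized above)                  */
} BCSR;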
[1] M. Naumov, L. Chien, P. Vandermersch, and U. Kapasi. cuSPARSE library. 2010.
[2] Huang, G., et al. GE-SpMM: General-purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks. 2020.
[3] https://docs.nvidia.com/cuda/cusparse
Disclosure of Invention
Sparse matrices commonly exhibit a local feature structure: the banded sparse matrix, i.e. a sparse matrix containing several sparse bands, as in FIG. 1. In view of this, and exploiting this property, the invention provides a data storage format for banded sparse matrices, the BRCV format, as an improvement on the CSR sparse storage format: it reduces redundant storage space and, compared with the CSR format, enables efficient computation of the banded sparse matrix SpMM algorithm on GPUs.
In order to achieve the purpose, the invention provides the following technical scheme:
a data storage format of a strip sparse matrix is characterized by comprising five arrays: b. rc, rv, c, v for sparse bands containing p { D 1 ,…,D p An m-k dimensional strip-shaped sparse matrix A, b is a height accumulated value of a sparse band, rc is an accumulated value of a non-zero element number of a first row of the sparse band, rv is an accumulated value of the total number of all non-zero elements of the sparse band, c is a column subscript of the non-zero element of the first row of each sparse band, and v is a value of each non-zero element in the strip-shaped sparse matrix A; the sparse zone is a plurality of rows with the same number of non-zero elements of a plurality of continuous rows in the matrix and the same corresponding column subscripts.
Further, the storage format is selected by comparing the number p of sparse bands of the sparse matrix A with its dimension m in advance: when p < (2m - 1)/4, the data storage format of the banded sparse matrix is adopted for processing; otherwise the CSR format is adopted.
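For illustration, the five arrays can be grouped into a single structure together with the selection rule above. The structure, field names, and chooser function below are our own sketch, not identifiers from the patent:

/* Illustrative container for the banded (BRCV) data storage format. */
typedef struct {
    int p;       /* number of sparse bands                                  */
    int *b;      /* p+1 entries: cumulative band heights, b[0] = 0          */
    int *rc;     /* p+1 entries: cumulative first-row non-zero counts       */
    int *rv;     /* p entries: cumulative non-zero totals of D_0..D_i       */
    int *c;      /* column indices of each band's first row, concatenated   */
    float *v;    /* nnz entries: all non-zero values, stored row by row     */
} BRCV;

/* Selection rule from the description: prefer BRCV when p < (2m - 1) / 4. */
int use_brcv(int p, int m) { return 4 * p < 2 * m - 1; }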
A method for building the data storage format of a banded sparse matrix A, characterized by comprising the following steps (an illustrative code sketch is given after these steps):
S31: traverse the banded sparse matrix A and identify its p sparse bands {D_1, ..., D_p} in order; insert an empty sparse band D_0 before D_1, forming the band sequence {D_0, D_1, ..., D_p};
S32: traverse the band sequence, accumulating the heights of D_0 through D_i to obtain each value b[i]; this yields the (p+1)-dimensional array b = {b[i] | 0 ≤ i ≤ p};
S33: traverse the band sequence, accumulating the numbers of non-zero elements in the first rows of D_0 through D_i to obtain each value rc[i]; this yields the (p+1)-dimensional array rc = {rc[i] | 0 ≤ i ≤ p};
S34: traverse the first p sparse bands, accumulating the total numbers of non-zero elements over all rows of D_0 through D_i to obtain each value rv[i]; this yields the p-dimensional array rv = {rv[i] | 0 ≤ i ≤ p-1};
S35: traverse the sparse bands {D_1, ..., D_p} of A, extracting in order the column indices of all non-zero elements in the first row of each band and storing them sequentially in the array c;
S36: traverse the banded sparse matrix A and record the value v[i] of each non-zero element, obtaining the array v = {v[i] | 1 ≤ i ≤ nnz}, where nnz is the total number of non-zero elements of the banded sparse matrix;
S37: pack the five arrays b, rc, rv, c and v as the data storage format of the banded sparse matrix A.
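A minimal sketch of steps S31 to S37, assuming the input is given densely as A[i*k + j] and using the BRCV structure sketched earlier; band detection compares the zero/non-zero pattern of adjacent rows, which matches the band definition above. All helper names are our assumptions:

#include <stdlib.h>

/* Rows r0 and r1 belong to the same band iff their non-zero patterns match. */
static int same_pattern(const float *A, int k, int r0, int r1) {
    for (int j = 0; j < k; ++j)
        if ((A[r0 * k + j] != 0) != (A[r1 * k + j] != 0)) return 0;
    return 1;
}

BRCV brcv_build(const float *A, int m, int k) {
    BRCV f;
    int nnz = 0;
    for (int i = 0; i < m * k; ++i) nnz += (A[i] != 0);
    f.b  = (int *)calloc(m + 1, sizeof(int));    /* worst case: p = m bands */
    f.rc = (int *)calloc(m + 1, sizeof(int));
    f.rv = (int *)calloc(m, sizeof(int));
    f.c  = (int *)malloc(nnz * sizeof(int));
    f.v  = (float *)malloc(nnz * sizeof(float));
    int p = 0, nc = 0, nv = 0;
    for (int i = 0; i < m; ++i) {
        if (i == 0 || !same_pattern(A, k, i - 1, i)) {  /* S31: new band    */
            f.rv[p] = nv;                   /* S34: nnz of D_0..D_p so far  */
            ++p;
            f.b[p]  = f.b[p - 1];           /* S32: height prefix sums      */
            f.rc[p] = f.rc[p - 1];          /* S33: first-row nnz prefixes  */
            for (int j = 0; j < k; ++j)     /* S35: first-row column idx    */
                if (A[i * k + j] != 0) { f.c[nc++] = j; ++f.rc[p]; }
        }
        ++f.b[p];
        for (int j = 0; j < k; ++j)         /* S36: values, row by row      */
            if (A[i * k + j] != 0) f.v[nv++] = A[i * k + j];
    }
    f.p = p;                                /* S37: the packed format       */
    return f;
}

Run on the 7 × 7 matrix of embodiment 1 below, this sketch reproduces b = [0, 4, 7], rc = [0, 3, 5] and rv = [0, 12].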
For the multiplication C = A × B, where B is a k × n dense matrix, a multiplication acceleration method applied to the data storage format of the banded sparse matrix A is characterized by comprising the following steps:
S41: starting from the first sparse band D_1 of the banded sparse matrix A, compute from the array b the height h_i = b[i] - b[i-1] corresponding to the i-th sparse band D_i;
S42: compute from the array rc the number of non-zero elements per row of the i-th sparse band D_i: T = rc[i] - rc[i-1];
S43: traverse the i-th sparse band D_i; for the T non-zero elements of each row, obtain the value of each non-zero element from the array v and compute its column index col from the array c, thereby locating the elements in row col of matrix B with which the non-zero element is to be multiplied;
S44: multiply each row's non-zero elements of the sparse band D_i with the corresponding elements of row col of matrix B and accumulate the resulting products, obtaining the product of each row of D_i with B;
S45: repeat steps S41 to S44 to traverse all sparse bands {D_1, ..., D_p} of the banded sparse matrix A, and accumulate the products of D_1, ..., D_p with B to obtain the value of matrix C.
The specific algorithm of the multiplication acceleration method applied to the matrix multiplication C = A × B, using the data storage format of the banded sparse matrix, is as follows:
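The published listing is again an image; the following is a minimal sketch consistent with steps S41 to S45, using the BRCV structure sketched earlier. The identifiers and the row-major layout of B and C are our assumptions:

/* Sketch: C = A * B with A in BRCV form, B (k x n) and C (m x n) dense. */
void spmm_brcv(const BRCV *A, int n, const float *B, float *C) {
    int row = 0;                                 /* current output row of C */
    for (int i = 1; i <= A->p; ++i) {            /* S45: every sparse band  */
        int h = A->b[i] - A->b[i - 1];           /* S41: band height h_i    */
        int T = A->rc[i] - A->rc[i - 1];         /* S42: non-zeros per row  */
        const int   *cols = A->c + A->rc[i - 1]; /* column indices, shared  */
                                                 /*   by all h band rows    */
        const float *vals = A->v + A->rv[i - 1]; /* first value of the band */
        for (int r = 0; r < h; ++r, ++row) {
            for (int j = 0; j < n; ++j) {        /* S43/S44: multiply the   */
                float acc = 0.0f;                /*   row with rows cols[t] */
                for (int t = 0; t < T; ++t)      /*   of B and accumulate   */
                    acc += vals[r * T + t] * B[cols[t] * n + j];
                C[row * n + j] = acc;
            }
        }
    }
}

Because cols is identical for all h rows of a band, the same rows of B are revisited h times; this is the reuse that the shared-memory GPU implementation described later exploits.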
From the perspective of storage complexity, the storage space of the CSR format depends on the row dimension m of the sparse matrix and the total number nnz of non-zero elements of the whole matrix. For a banded matrix, the CSR format must store the column index of every non-zero element, yet the column indices of the non-zero elements are identical across the rows within a band, causing redundant storage. The data storage format of the banded sparse matrix proposed by the invention only needs to store the column indices of the first row of each band, so the storage space for column indices can be reduced, especially when the bands are tall, further lowering the storage complexity of the sparse matrix. Specifically, for a sparse matrix with a banded structure in which every row contains nz non-zero elements (so nnz = m·nz), the storage complexity of the CSR format is 2·nnz + m + 1 = 2·m·nz + m + 1 (1), while the storage complexity of the data storage format of the banded sparse matrix is (p+1) + (p+1) + p + p·nz + m·nz = m·nz + p·nz + 3p + 2 (2). Taking the difference, (1) - (2) = (m - p)·nz + m - 3p - 1 (3). Equation (3) shows that when p ≤ m and nz is large, the storage complexity of the CSR format is higher than that of the BRCV format; on the other hand, for nz = 1, equation (3) reduces to 2m - 4p - 1, so when p is much smaller than m the CSR format again requires higher storage complexity than the BRCV format.
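As a numeric illustration of equation (3) (the parameters are our own choice, matching the scale of embodiment 2): for m = 1024, p = m/16 = 64 and nz = 32, the CSR format needs 2·1024·32 + 1024 + 1 = 66561 storage units, while the BRCV format needs 1024·32 + 64·32 + 3·64 + 2 = 35010; the saving is (m - p)·nz + m - 3p - 1 = 960·32 + 1024 - 193 = 31551 units, i.e. roughly 47% of the CSR footprint.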
Further, the multiplications of the non-zero elements of each row of a sparse band D_i with the elements of the corresponding rows col of matrix B are mutually independent, so they can be accelerated by parallel computation. Likewise, the products of the individual sparse bands of the banded sparse matrix A with B are mutually independent processes and can also be accelerated in parallel.
Further, since under the banded structure every row within a sparse band of {D_1, ..., D_p} has the same column indices, the column index of each non-zero element of the i-th sparse band D_i can be determined by accessing the rc array (together with c); the rows of the dense matrix B required to multiply all the consecutive rows of D_i can therefore be read at once, realizing row multiplexing of matrix B in steps S43 to S45 and improving computational efficiency.
The invention has the following beneficial effects: the invention provides a data storage format of a banded sparse matrix and a multiplication acceleration method based on it. The data storage format exploits the special structure of the sparse bands of a matrix and is an efficient compressed storage format; on top of it, the multiplication acceleration method realizes row multiplexing of the dense matrix, reduces overall memory data transmission, and improves computational efficiency.
Drawings
To illustrate the purpose and technical solution of the present invention, the invention is explained with the following drawings:
FIG. 1 is a schematic diagram of a banded sparse matrix, where the black portions are sparse bands and the blank portions are 0;
FIG. 2 is the example banded sparse matrix A of embodiment 1 of the present invention;
FIG. 3 is a flow chart of the data storage format of a banded sparse matrix of the present invention;
FIG. 4 is a flow chart of the multiplication acceleration method for the data storage format of a banded sparse matrix according to the present invention;
FIG. 5 is the BCSR partition example of the banded sparse matrix A in embodiment 1 of the present invention;
FIG. 6 is the format diagram of the large-scale banded sparse matrix in embodiment 2 of the present invention;
FIG. 7 is a graph of the performance comparison experiment on an NVIDIA GPU RTX2060 in embodiment 2 of the present invention with matrix dimensions m = k = n = 1024, where the number of non-zero elements per row of the sparse matrix is 2^i (32 ≤ 2^i ≤ 512);
FIG. 8 is a graph of the performance comparison experiment on an NVIDIA GPU RTX2060 in embodiment 2 of the present invention with matrix dimensions m = k = n = 2048, where the number of non-zero elements per row of the sparse matrix is 2^i (32 ≤ 2^i ≤ 1024);
FIG. 9 is a graph of the performance comparison experiment on an NVIDIA GPU RTX3080 in embodiment 2 of the present invention with matrix dimensions m = k = n = 1024, where the number of non-zero elements per row of the sparse matrix is 2^i (32 ≤ 2^i ≤ 512);
FIG. 10 is a graph of the performance comparison experiment on an NVIDIA GPU RTX3080 in embodiment 2 of the present invention with matrix dimensions m = k = n = 2048, where the number of non-zero elements per row of the sparse matrix is 2^i (32 ≤ 2^i ≤ 1024).
Detailed Description
In order to make the technical solutions, the objects of the embodiments, and the advantages of the system architecture more apparent, the preferred embodiments of the present invention are described in detail below with reference to FIG. 2 and FIG. 4.
Example 1: consider the 7 × 7 banded sparse matrix A shown in FIG. 2 and the multiplication C = A × B, where B is a 7 × n dense matrix. To improve storage efficiency and the efficiency of multiplication with other matrices, the invention's data storage format of a banded sparse matrix and its multiplication acceleration method proceed, with reference to FIG. 3, by the following steps:
Step 1: traverse the banded sparse matrix A and identify its 2 sparse bands {D_1, D_2} in order; insert an empty sparse band D_0 before D_1, forming the band sequence {D_0, D_1, D_2};
Step 2: traverse the band sequence, accumulating the heights of D_0 through D_0, D_0 through D_1, and D_0 through D_2 to obtain b[0], b[1] and b[2] respectively, yielding the array b = [0, 4, 7];
Step 3: traverse the band sequence, accumulating the first-row non-zero counts of D_0 through D_0, D_0 through D_1, and D_0 through D_2 to obtain rc[0], rc[1] and rc[2], yielding the array rc = [0, 3, 5];
Step 4: traverse the first 2 sparse bands, accumulating the total non-zero counts over all rows of D_0 through D_0 and of D_0 through D_1 to obtain rv[0] and rv[1], yielding the array rv = [0, 12];
Step 5: traverse the sparse bands {D_1, D_2} of the banded matrix, extracting in order the column indices of all non-zero elements in the first row of each band and storing them sequentially in the array c, yielding c = [1, 5, 6, 2, 4];
Step 6: traverse the banded sparse matrix A and record the value of each non-zero element, obtaining the 18-element array v = [2, 7, 3, 2, 8, 2, 1, 4, 1, 3, 5, 1, 2, ...];
Step 7: pack the five arrays b, rc, rv, c and v to obtain the data storage format of the banded sparse matrix A of this embodiment.
In order to better show the advantages of the method of the present invention, this embodiment 1 further provides corresponding storage representations in the CSR format and the BCSR format, respectively.
With reference to FIG. 2 and the related introduction in the background art, when the banded sparse matrix A is stored in the CSR data format, the CSR data storage format is represented as follows: rowPtr[8] = [0, 3, 6, 9, 12, 14, 16, 18]; colInd[18] = [1, 5, 6, 1, 5, 6, 1, 5, 6, 1, 5, 6, 2, 4, 2, 4, 2, 4]; val[18] = [2, 7, 3, 2, 8, 2, 1, 4, 1, 3, 5, 1, 2, ...] (the 18 non-zero values of A in row order).
Further, when the banded sparse matrix A adopts the BCSR data storage format, the partition is as shown in FIG. 5, and the BCSR data storage format can be expressed as: rowPtr[5] = [0, 3, 6, 8, 10]; colInd[10] = [0, 2, 3, 1, 2, ...]; val[40] = [0,2,0,3, 0,7,0,2, 3,0,8,0, 0,2,0,4, 0,2,0,1, 1,0,4,0, 1,0,5,0, 3,0,1,0, 2,0,0,0, 2,0,0,0].
In summary, the CSR format needs 44 data-type units to store its array information and the BCSR format needs 55, while the data storage format of the banded sparse matrix proposed by the invention needs only 31. Moreover, since p = 2 < (2 × 7 - 1)/4 is satisfied, the data storage format of the banded sparse matrix of the method of the invention is indeed the better choice here.
Further, with reference to FIG. 4, the sparse-dense matrix multiplication (C = A × B) using the data storage format of the banded sparse matrix is carried out in the following steps:
Step (1): when computing the j-th column of matrix C, start from the first sparse band D_1 of the banded sparse matrix A; compute from the array b the corresponding height h_1 = b[1] - b[0] = 4, i.e. D_1 has 4 rows; compute from the array rc the number of non-zero elements per row of D_1: T_1 = rc[1] - rc[0] = 3;
Step (2): traverse the sparse band D_1, computing from the arrays c and rc the column index of the first non-zero element of the first row: col_1 = c[0 + rc[0]] = 1; the column index of the second non-zero element: col_2 = c[1 + rc[0]] = 5; and the column index of the third non-zero element: col_3 = c[2 + rc[0]] = 6; this locates the elements of rows col_i of matrix B with which these non-zero elements are respectively multiplied;
Step (3): traverse the sparse band D_1. For its first row, first initialize an intermediate result; from rv and T_1 compute the total number of non-zero elements in all preceding rows: R_1 = rv[0] + 0·T_1 = 0. Multiply the first non-zero element of the first row of the banded sparse matrix A, v_1 = v[R_1 + 0] = 2, with B[col_1][j] and accumulate the product into result; multiply the second non-zero element v_2 = v[R_1 + 1] = 7 with B[col_2][j] and accumulate into result; multiply the third non-zero element v_3 = v[R_1 + 2] = 3 with B[col_3][j] and accumulate, finally obtaining C[0][j] = result. The second, third and fourth rows of D_1 are computed the same way, giving the values of C[1][j], C[2][j] and C[3][j].
Step (4): repeat steps (1) to (3) to traverse the remaining sparse band D_2 of the banded sparse matrix A; compute from the array b the corresponding height h_2 = b[2] - b[1] = 3, i.e. D_2 has 3 rows; compute from the array rc the number of non-zero elements per row of D_2: T_2 = rc[2] - rc[1] = 2;
Step (5): traverse the sparse band D_2, computing from the arrays c and rc the column index of the first non-zero element of its first row: col_1 = c[0 + rc[1]] = 2, and the column index of the second non-zero element: col_2 = c[1 + rc[1]] = 4;
Step (6): traverse the sparse band D_2. For its first row (i.e. the fifth row of A), first initialize the intermediate result; from rv and T_2 compute the total number of non-zero elements in all preceding rows: R_2 = rv[1] + 0·T_2 = 12. Multiply the first non-zero element of the fifth row of the banded sparse matrix A, v_1 = v[R_2 + 0] = 2, with B[col_1][j] and accumulate the product into result; multiply the second non-zero element v_2 = v[R_2 + 1] = 7 with B[col_2][j] and accumulate, finally obtaining C[4][j] = result. The second and third rows of D_2 are computed the same way, giving the values of C[5][j] and C[6][j].
Step (7): starting from j = 0, loop over each column up to n - 1 to obtain the value of matrix C.
Example 2: to better demonstrate the advantages of the method of the invention, this embodiment performs the matrix multiplication C = A × B with banded sparse matrices A of different dimensions (m = k = n = 1024 and m = k = n = 2048), where A is randomly generated according to the format shown in FIG. 6; to account for the influence of sparsity, each row of the banded sparse matrix A is randomly given 2^5 to 2^10 non-zero elements, and B is a randomly generated dense matrix. The method of the invention is compared with prior-art methods in a floating-point throughput test on the GPU, where the prior-art methods are: (1) the SpMM algorithm based on the CSR format [2], marked CSR in the experimental result figures (FIG. 7 to FIG. 10); (2) the SpMM algorithm implemented by the cuSPARSE library provided by NVIDIA, see document [3], marked cuSPARSE in the experimental result figures (FIG. 7 to FIG. 10).
With reference to FIG. 6, since p = m/16 < (2m - 1)/4 is satisfied, it can be predicted that the data storage format of the banded sparse matrix of the method of the invention is the better choice for processing.
Through the program implementation, the floating-point performance comparison experiment of the three multiplication methods (including the method of the invention) on the GPU was completed; the experimental results are shown in FIG. 7 to FIG. 10. Floating-point performance here denotes the number of floating-point operations executed per unit time. The results show that the floating-point performance of the method of the invention is superior to the prior art in every tested dimension.
In the following, the advantage of the multiplication acceleration method using the data storage format of the banded sparse matrix is further explained from the viewpoint of the GPU implementation.
In this embodiment, the matrix C is divided into sub-blocks of size 16 × 32, and the thread-block size is likewise set to 16 × 32, so that one thread block is responsible for one sub-block of C and each thread computes one element of C; i.e. each thread block consumes one 16 × k sub-block of the banded sparse matrix A and one k × 32 sub-block of the dense matrix B. For an A sub-block stored in the CSR format, two shared-memory regions of size 16 × 32 are set up, into which the non-zero values and the column indices of the A sub-block are read in a loop, to reduce the time spent reading from global memory. For an A matrix stored in the data storage format of the banded sparse matrix, all rows of the A sub-block to be read during the computation of one block of C have the same column indices; therefore, compared with the CSR-format algorithm, the method can additionally set up a 32 × 32 shared-memory region holding the sub-block of the dense matrix B needed in each step, thereby realizing row multiplexing of the dense matrix B (a kernel sketch is given below).
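A hedged CUDA sketch of this mapping, under simplifying assumptions of our own that match embodiment 2: every band is exactly 16 rows tall and aligned with the 16 × 32 tiles of C as in FIG. 6, a uniform T non-zeros per row so that c and v can be indexed directly instead of through rc and rv, and n a multiple of 32. Kernel and parameter names are ours:

#define TILE_R 16
#define TILE_C 32

__global__ void spmm_brcv_tiled(int n, int T,
                                const int *c,     /* T indices per band     */
                                const float *v,   /* values, row-major      */
                                const float *B, float *C) {
    int band = blockIdx.y;                     /* one band per 16-row tile  */
    int row  = band * TILE_R + threadIdx.y;    /* C row owned by thread     */
    int col  = blockIdx.x * TILE_C + threadIdx.x;
    __shared__ float Bs[TILE_C][TILE_C];       /* 32 x 32 tile of B rows    */

    float acc = 0.0f;
    for (int t0 = 0; t0 < T; t0 += TILE_C) {   /* stage 32 B rows at a time */
        int nt = (T - t0 < TILE_C) ? (T - t0) : TILE_C;
        for (int r = threadIdx.y; r < nt; r += TILE_R)  /* cooperative load */
            Bs[r][threadIdx.x] = B[c[band * T + t0 + r] * n + col];
        __syncthreads();
        for (int t = 0; t < nt; ++t)           /* all 16 rows reuse Bs      */
            acc += v[row * T + t0 + t] * Bs[t][threadIdx.x];
        __syncthreads();
    }
    C[row * n + col] = acc;
}

/* Launch sketch: dim3 grid(n / TILE_C, m / TILE_R), block(TILE_C, TILE_R);
 * spmm_brcv_tiled<<<grid, block>>>(n, T, c, v, B, C);                      */

The key point is that the band's column indices are fetched once per band rather than once per row, and each staged B tile serves all 16 rows of the C tile, which is the row multiplexing of B described above.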
Finally, it is noted that the above-mentioned preferred embodiments illustrate rather than limit the invention, and that, although the invention has been described in detail with reference to the above-mentioned preferred embodiments, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the scope of the invention as defined by the appended claims.

Claims (7)

1. A data storage format of a banded sparse matrix, characterized by comprising five arrays: b, rc, rv, c and v; for an m × k banded sparse matrix A containing p sparse bands {D_1, ..., D_p}, b is the cumulative height of the sparse bands, rc is the cumulative number of non-zero elements in the first row of each sparse band, rv is the cumulative total of the non-zero elements over all rows of each sparse band, c holds the column indices of the non-zero elements of the first row of each sparse band, and v holds the value of each non-zero element in the banded sparse matrix A; a sparse band is a group of consecutive rows of the matrix that have the same number of non-zero elements with identical column indices.
2. The data storage format of a banded sparse matrix according to claim 1, characterized in that the storage format is selected by comparing in advance the number p of sparse bands of the sparse matrix A with its dimension m: when p < (2m - 1)/4, the data storage format of the banded sparse matrix is adopted for processing; otherwise, the CSR format is adopted for processing.
3. A method for building the data storage format of a banded sparse matrix A, applied to the data storage format of a banded sparse matrix according to claim 1, characterized by comprising the following steps:
S31: traversing the banded sparse matrix A and identifying its p sparse bands {D_1, ..., D_p} in order; inserting an empty sparse band D_0 before D_1 to form the band sequence {D_0, D_1, ..., D_p};
S32: traversing the band sequence and accumulating the heights of D_0 through D_i to obtain each value b[i], thereby obtaining the (p+1)-dimensional array b = {b[i] | 0 ≤ i ≤ p};
S33: traversing the band sequence and accumulating the numbers of non-zero elements in the first rows of D_0 through D_i to obtain each value rc[i], thereby obtaining the (p+1)-dimensional array rc = {rc[i] | 0 ≤ i ≤ p};
S34: traversing the first p sparse bands and accumulating the total numbers of non-zero elements over all rows of D_0 through D_i to obtain each value rv[i], thereby obtaining the p-dimensional array rv = {rv[i] | 0 ≤ i ≤ p-1};
S35: traversing the sparse bands {D_1, ..., D_p} of the banded sparse matrix A, extracting in order the column indices of all non-zero elements in the first row of each sparse band and storing them sequentially in the array c;
S36: traversing the banded sparse matrix A and recording the value v[i] of each non-zero element, thereby obtaining the array v = {v[i] | 1 ≤ i ≤ nnz}, where nnz is the total number of non-zero elements of the banded sparse matrix;
S37: packing the five arrays b, rc, rv, c and v as the data storage format of the banded sparse matrix A.
4. A multiplication acceleration method applied to the data storage format of a banded sparse matrix according to claim 1, for the multiplication C = A × B, where B is a k × n dense matrix, characterized by comprising the following steps:
S41: starting from the first sparse band D_1 of the banded sparse matrix A, computing from the array b the height h_i = b[i] - b[i-1] corresponding to the i-th sparse band D_i;
S42: computing from the array rc the number of non-zero elements per row of the i-th sparse band D_i: T = rc[i] - rc[i-1];
S43: traversing the i-th sparse band D_i; for the T non-zero elements of each row, obtaining the value of each non-zero element from the array v and computing its column index col from the array c, thereby locating the elements in row col of matrix B with which the non-zero element is to be multiplied;
S44: multiplying the non-zero elements of each row of the sparse band D_i with the corresponding elements of row col of matrix B and accumulating the resulting products, obtaining the product of each row of D_i with B;
S45: repeating steps S41 to S44 to traverse all sparse bands {D_1, ..., D_p} of the banded sparse matrix A, and accumulating the products of D_1, ..., D_p with B to obtain the value of matrix C.
5. The multiplication acceleration method for the data storage format of a banded sparse matrix according to claim 4, characterized in that, in step S44, the multiplications of the non-zero elements of each row of the sparse band D_i with the elements of the corresponding rows col of matrix B are mutually independent and can be accelerated by parallel computation.
6. The multiplication acceleration method for the data storage format of a banded sparse matrix according to claim 4, characterized in that, in step S45, the products of all the sparse bands of the banded sparse matrix A with B are mutually independent processes and can be accelerated by parallel computation.
7. The multiplication acceleration method for the data storage format of a banded sparse matrix according to claim 4, characterized in that, in steps S43 to S45, the column indices of the non-zero elements of the i-th sparse band D_i can be determined by accessing the rc array, whereby the rows of the dense matrix B required for the multiplication of all the consecutive rows of D_i are determined and read at once, realizing row multiplexing of matrix B and improving computational efficiency.
CN202210931011.6A 2022-07-29 2022-07-29 Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof Pending CN115309333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210931011.6A CN115309333A (en) 2022-07-29 2022-07-29 Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210931011.6A CN115309333A (en) 2022-07-29 2022-07-29 Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof

Publications (1)

Publication Number Publication Date
CN115309333A true CN115309333A (en) 2022-11-08

Family

ID=83859634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210931011.6A Pending CN115309333A (en) 2022-07-29 2022-07-29 Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof

Country Status (1)

Country Link
CN (1) CN115309333A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982398A (en) * 2023-03-13 2023-04-18 苏州浪潮智能科技有限公司 Graph structure data processing method, system, computer device and storage medium
CN115982398B (en) * 2023-03-13 2023-05-16 苏州浪潮智能科技有限公司 Graph structure data processing method, system, computer device and storage medium
CN117579225A (en) * 2023-11-21 2024-02-20 四川新视创伟超高清科技有限公司 Sparse matrix coding and data storage method for unstructured regular distribution
CN117579225B (en) * 2023-11-21 2024-05-10 四川新视创伟超高清科技有限公司 Sparse matrix coding and data storage method for unstructured regular distribution

Similar Documents

Publication Publication Date Title
CN115309333A (en) Data storage format of strip-shaped sparse matrix and multiplication acceleration method thereof
US10534839B2 (en) Method for matrix by vector multiplication for use in artificial neural network
KR101196566B1 (en) Multiprocessor system, and its information processing method
US20180046895A1 (en) Device and method for implementing a sparse neural network
US7669014B2 (en) Transpose memory and method thereof
EP3529701B1 (en) Hardware double buffering using a special purpose computational unit
CN109726314B (en) Bitmap-based sparse matrix compression storage method
US11544542B2 (en) Computing device and method
CN110399591B (en) Data processing method and device based on convolutional neural network
CN109840585B (en) Sparse two-dimensional convolution-oriented operation method and system
CN110807170A (en) Multi-sample multi-channel convolution neural network Same convolution vectorization implementation method
US11580402B2 (en) System and method for compressing activation data
CN114491402A (en) Calculation method for sparse matrix vector multiplication access optimization
CN115186802A (en) Block sparse method and device based on convolutional neural network and processing unit
US9213680B2 (en) Method and structure for fast in-place transformation of standard full and packed matrix data formats
US6961474B1 (en) Huffman encoder for encoding/decoding DCT coefficients
CN117454946A (en) Tensor core architecture system supporting unstructured sparse matrix computation
CN113592075B (en) Convolution operation device, method and chip
CN110766136A (en) Compression method of sparse matrix and vector
CN112953549B (en) Storage processing method and device for sparse matrix
CN113794709B (en) Hybrid coding method for binary sparse matrix
CN116051345A (en) Image data processing method, device, computer equipment and readable storage medium
CN115885249A (en) System and method for accelerating training of deep learning networks
CN107992555B (en) Method for storing and reading raster data
CN116167423B (en) Device and accelerator for realizing CNN convolution layer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination