CN105068787A - Heterogeneous parallel computing method for sparse matrix-vector multiplication - Google Patents

Heterogeneous parallel computing method for sparse matrix-vector multiplication

Info

Publication number
CN105068787A
Authority
CN
China
Prior art keywords
ell
gpu
sparse matrix
storage
data
Prior art date
Legal status
Pending
Application number
CN201510540568.7A
Other languages
Chinese (zh)
Inventor
董守斌 (Dong Shoubin)
张铃启 (Zhang Lingqi)
陈泽邦 (Chen Zebang)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201510540568.7A priority Critical patent/CN105068787A/en
Publication of CN105068787A publication Critical patent/CN105068787A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a heterogeneous parallel computing method for sparse matrix-vector multiplication. The method comprises the following steps: a CPU reads a sparse matrix stored on a hard disk and determines an adjustable parameter K of the sparse matrix; according to K, memory storage space is allocated, including the space required by an ELL (ELLPACK) storage structure and a CSR (Compressed Sparse Row) storage structure, while the GPU storage space required by the ELL storage structure is allocated at the same time; the matrix data are filled into the memory space allocated by the CPU to generate a hybrid storage structure; the data held in the ELL storage structure in memory are copied to the GPU storage space; finally, sparse matrix-vector multiplication is performed with the completed storage structure. With this computing method, a computer executing a sparse matrix-vector multiplication task uses the computing power of the CPU and the GPU at the same time, so that the CPU and the GPU can each deliver their best computing characteristics.

Description

A heterogeneous parallel computing method for sparse matrix-vector multiplication
Technical field
The present invention relates to a data storage method, and in particular to a heterogeneous parallel computing method for sparse matrix-vector multiplication.
Technical background
Sparse matrix-vector multiplication (SpMV) is one of the most common computations in scientific computing and engineering applications. Many data-mining applications encounter extremely sparse data, which are typically represented as sparse matrices. When the data scale is very large, it is important to implement SpMV on the CPU-GPU heterogeneous computing platforms that are popular today.
Generally speaking, a heterogeneous platform can perform SpMV computation in one of two modes.
In the first mode, CPU/GPU cooperative computing, the CPU prepares the data, sends them to the GPU for computation, and the GPU transfers the results back to the CPU when it finishes. Although the whole process looks parallel, the work of the CPU and the GPU is in fact serial: while the GPU computes, the CPU sits idle and its computing power is wasted.
In the second mode, CPU/GPU joint computing, the CPU prepares the data, partitions the task, transfers part of the data to the GPU, and then completes the whole computation together with the GPU. Compared with the first mode, this uses the computing power of the CPU while the GPU is computing. It is nevertheless still deficient: it treats the CPU and the GPU as equal computing nodes and assigns them similar tasks, which ignores the architectural differences between the CPU and the GPU and prevents both from reaching peak performance.
In sparse matrix-vector multiplication, the storage of the sparse matrix is a key issue that directly affects computational performance. The mainstream sparse matrix storage formats currently include the coordinate format (COO), the compressed sparse row format (CSR), the diagonal format (DIA) and the ELLPACK format (ELL). Each format has its own characteristics and its own optimal application scenarios. The technical report "Efficient Sparse Matrix-Vector Multiplication on CUDA" by Nathan Bell and Michael Garland shows that in SpMV tasks the DIA and ELL formats make better use of GPU bandwidth, although the performance of ELL depends strongly on the actual data distribution, and that a hybrid of COO and ELL performs best on the GPU, because it keeps the advantage ELL shows on the GPU while using COO to work around ELL's weakness. CSR is the most common sparse matrix storage format; it is a further compression of COO and, relative to the other formats, is better suited to processing on the CPU. None of these existing storage formats, however, is well suited to processing on a CPU-GPU heterogeneous computing platform.
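For orientation, the two formats combined below can be illustrated on a small example. This is an editorial sketch, not part of the patent; it uses 0-based indices and a padding index of -1, with array names matching those used later in the description:

    /* A = | 1 0 2 |
           | 0 3 0 |
           | 4 5 6 |                                                     */

    /* CSR: nonzeros packed row by row; csr_ptr[i]..csr_ptr[i+1] spans row i. */
    double csr_data[]    = {1, 2, 3, 4, 5, 6};
    int    csr_indexes[] = {0, 2, 1, 0, 1, 2};
    int    csr_ptr[]     = {0, 2, 3, 6};

    /* ELL with width K = 3 (the longest row), stored one slot column at a
       time; short rows are padded (value 0, column index -1).            */
    double ell_data[]    = {1, 3, 4,   2, 0, 5,   0, 0, 6};
    int    ell_indexes[] = {0, 1, 0,   2,-1, 1,  -1,-1, 2};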
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a heterogeneous parallel computing method for sparse matrix-vector multiplication.
The object of the present invention is achieved through the following technical solution: a heterogeneous parallel computing method for sparse matrix-vector multiplication, comprising the following steps:
S1, determining the value of the adjustable parameter K of the sparse matrix;
S2, the CPU reads the sparse matrix data stored on the hard disk and allocates memory storage space according to the adjustable parameter K of the sparse matrix; the memory storage space comprises the space needed by the ELL storage structure and the CSR storage structure; the GPU storage space needed by the ELL storage structure is allocated at the same time;
S3, the sparse matrix data are compressed and stored in the memory space allocated by the CPU, generating a hybrid storage structure;
S4, the data stored in the ELL storage structure are copied to the GPU storage space;
S5, sparse matrix-vector multiplication is performed with the completed storage structure: the CPU performs the computation corresponding to the CSR storage structure, the GPU performs the computation corresponding to the ELL storage structure, and the CPU and GPU results are added to obtain the final result.
In step S3, when the compressed sparse matrix is stored, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
In step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
In step S4, copying the data in the ELL storage structure to the GPU storage space specifically means copying the contents of the ELL part of the hybrid storage structure in memory to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. When the computer executes an SpMV task, the computing method of the present invention uses the computing power of the CPU and the GPU at the same time, so that each can deliver its optimal computing performance.
2. The method of the present invention is simple to realize and can easily be extended to a cluster environment.
Brief description of the drawings
Fig. 1 is a flowchart of the heterogeneous parallel computing method for sparse matrix-vector multiplication of the present invention.
Embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawing, but the embodiments of the present invention are not limited thereto.
A heterogeneous parallel computing method for sparse matrix-vector multiplication specifically comprises the following steps.
S1, determine the value of the adjustable parameter K of the sparse matrix.
The value of K controls how far the matrix is compressed into the ELL part. In a hybrid structure (ELL plus COO or CSR), K determines how much of the data is stored in the ELL part and how much in the COO/CSR part, and can therefore be regarded as setting the amounts of computation assigned to the GPU and the CPU: the larger K is, the larger the GPU's share of the computation and the smaller the CPU's. The optimal K makes the CPU and the GPU finish their computations at the same time. K can be determined simply as a mean value: when the data are compressed by rows, K is the average number of nonzeros per row; when the data are compressed by columns, K is the average number of nonzeros per column.
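A minimal sketch of this averaging rule for the row-wise case (an editorial illustration; row_nnz and choose_k are hypothetical names, not from the patent):

    // K = mean number of nonzeros per row, rounded to the nearest integer.
    // row_nnz[i] holds the nonzero count of row i; n is the number of rows.
    int choose_k(const int* row_nnz, int n) {
        long long total = 0;
        for (int i = 0; i < n; ++i) total += row_nnz[i];
        return (int)((total + n / 2) / n);  // e.g. 22 nonzeros / 10 rows -> K = 2
    }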
S2, the CPU reads the sparse matrix stored on the hard disk and allocates memory storage space according to the adjustable parameter K, comprising the space needed by the ELL storage structure and the CSR storage structure; the GPU storage space needed by the ELL storage structure is allocated at the same time.
S3, the matrix data are filled into the memory space allocated by the CPU, generating the hybrid storage structure.
S4, the data stored in the ELL storage structure are copied into the allocated GPU storage space.
S5, sparse matrix-vector multiplication is performed with the completed storage structure: the CPU performs the computation corresponding to the CSR storage structure, the GPU performs the computation corresponding to the ELL storage structure, and the CPU and GPU results are added to obtain the final result.
In step S2, the ELL storage structure comprises the arrays ell_data and ell_indexes, and the CSR storage structure comprises the arrays csr_data, csr_indexes and csr_ptr.
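A sketch of the corresponding allocations in step S2 (editorial and non-authoritative; the names follow the arrays above, and nnz_rest denotes the number of nonzeros that overflow the ELL part, i.e. the sum over the rows of max(row_nnz[i] - K, 0)):

    // Host-side hybrid structure for an n-row matrix with ELL width K.
    double* ell_data    = new double[(size_t)n * K];
    int*    ell_indexes = new int[(size_t)n * K];
    double* csr_data    = new double[nnz_rest];
    int*    csr_indexes = new int[nnz_rest];
    int*    csr_ptr     = new int[n + 1];

    // Matching ELL buffers on the GPU (CUDA runtime API).
    double *d_ell_data;    cudaMalloc(&d_ell_data,    (size_t)n * K * sizeof(double));
    int    *d_ell_indexes; cudaMalloc(&d_ell_indexes, (size_t)n * K * sizeof(int));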
In step S3, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
In step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
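One way to realize this filling for the row-wise case (an editorial sketch with 0-based indices and -1 as the padding index; rows and fill_hybrid are hypothetical names):

    #include <utility>
    #include <vector>

    // rows[i] lists the (column, value) pairs of row i as read from disk.
    // The first K nonzeros of each row go into the ELL slots (stored
    // slot-major); the overflow goes into the CSR arrays.
    void fill_hybrid(const std::vector<std::vector<std::pair<int,double>>>& rows,
                     int K, double* ell_data, int* ell_indexes,
                     double* csr_data, int* csr_indexes, int* csr_ptr) {
        int n = (int)rows.size(), k = 0;   // k = running CSR position
        csr_ptr[0] = 0;
        for (int i = 0; i < n; ++i) {
            for (int s = 0; s < K; ++s) {                  // ELL part
                bool have = s < (int)rows[i].size();
                ell_data[s * n + i]    = have ? rows[i][s].second : 0.0;
                ell_indexes[s * n + i] = have ? rows[i][s].first  : -1;
            }
            for (int s = K; s < (int)rows[i].size(); ++s) { // overflow -> CSR
                csr_data[k]    = rows[i][s].second;
                csr_indexes[k] = rows[i][s].first;
                ++k;
            }
            csr_ptr[i + 1] = k;
        }
    }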
In step S4, the contents of the ELL part of the hybrid storage structure in memory are copied to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory.
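A sketch of this transfer with the CUDA runtime (editorial; the buffer names follow the allocation sketch above):

    // Step S4: move the ELL part to the GPU, then release the host copy
    // so that only the CSR remainder stays in CPU memory.
    cudaMemcpy(d_ell_data, ell_data,
               (size_t)n * K * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ell_indexes, ell_indexes,
               (size_t)n * K * sizeof(int), cudaMemcpyHostToDevice);
    delete[] ell_data;
    delete[] ell_indexes;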
In step S5, the GPU performs sparse matrix-vector multiplication on the ELL structure stored in the GPU storage space while the CPU simultaneously performs sparse matrix-vector multiplication on the CSR structure stored in memory; the results obtained by the CPU and the GPU are then summed.
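The two concurrent computations of step S5 might look as follows (an editorial sketch, not the patent's code; one GPU thread per row, padding index -1 skipped):

    // GPU side: SpMV over the slot-major ELL part.
    __global__ void spmv_ell(int n, int K, const double* ell_data,
                             const int* ell_indexes, const double* x, double* y) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;
        double sum = 0.0;
        for (int s = 0; s < K; ++s) {
            int col = ell_indexes[s * n + row];    // slot s of row `row`
            if (col >= 0) sum += ell_data[s * n + row] * x[col];
        }
        y[row] = sum;
    }

    // CPU side: SpMV over the CSR remainder, run while the kernel executes.
    void spmv_csr(int n, const double* csr_data, const int* csr_indexes,
                  const int* csr_ptr, const double* x, double* y) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int j = csr_ptr[i]; j < csr_ptr[i + 1]; ++j)
                sum += csr_data[j] * x[csr_indexes[j]];
            y[i] = sum;
        }
    }
    // Final result per row: Result[i] = y_gpu[i] + y_cpu[i].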
To set forth the principle of the present invention more clearly, its implementation process is described below with an embodiment.
This embodiment compresses the data by rows. The CPU first reads the following 10×10 matrix ('.' denotes a zero entry):
col:  1  2  3  4  5  6  7  8  9 10
r1:   3  .  .  .  .  1  .  .  .  .
r2:   .  5  .  .  .  .  .  3  .  9
r3:   .  .  .  2  .  .  .  .  .  .
r4:   .  7  .  .  .  .  3  .  9  .
r5:   .  .  5  .  .  .  .  .  .  .
r6:   1  .  .  .  3  .  .  .  2  .
r7:   .  .  2  .  .  1  .  .  .  .
r8:   .  9  .  .  .  .  .  2  .  .
r9:   .  .  .  5  .  .  1  .  .  .
r10:  5  .  .  .  .  6  .  3  .  .
From the data read in, the matrix has 10 rows and 22 nonzeros in total, an average of 2.2 nonzeros per row, so K = 2 is taken. Memory storage space is allocated: specifically, ell_data and ell_indexes each have size 20, csr_data and csr_indexes each have size 4, and csr_ptr has size 11. The data are then filled into the ELL storage structure.
Filling the data into the arrays by rows (the ELL arrays stored slot-major, '-' marking a padded entry) gives:
ell_data[]={3,5,2,7,5,1,2,9,5,5,1,3,-,3,-,3,1,2,1,6}
ell_indexes[]={1,2,4,2,3,1,3,2,4,1,6,8,-,7,-,5,6,8,7,6}
The remaining data are then filled into the CSR structure:
csr_data[]={9,9,2,3}
csr_indexes[]={10,9,9,8}
csr_ptr[]={0,0,1,1,2,2,3,3,3,3,4}
The contents of the ELL part of the hybrid storage structure in memory are copied to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory, completing the matrix storage process.
Now suppose this matrix is multiplied by the following vector:
Vector={2,1,0,2,5,0,2,3,0,5}
Sparse matrix-vector multiplication with this storage structure proceeds as follows.
The GPU performs sparse matrix-vector multiplication on the ELL structure stored in the GPU storage space, obtaining:
GPU_Result={6,14,4,13,0,17,0,15,12,10}
The CPU performs sparse matrix-vector multiplication on the CSR structure stored in memory, obtaining:
CPU_Result={0,45,0,0,0,0,0,0,0,9}
Finally, CPU_Result and GPU_Result are added to obtain:
Result={6,59,4,13,0,17,0,15,12,19}
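For reference, a hedged sketch of how this example's step S5 could be driven from the host, following the kernel and function sketched above (editorial; d_x, d_y, y_cpu, y_gpu and result are illustrative buffers, with the vector already copied to d_x):

    int threads = 128, blocks = (n + threads - 1) / threads;   // n = 10 here
    spmv_ell<<<blocks, threads>>>(n, K, d_ell_data, d_ell_indexes, d_x, d_y);
    spmv_csr(n, csr_data, csr_indexes, csr_ptr, x, y_cpu);  // CPU works while the kernel runs
    cudaMemcpy(y_gpu, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) result[i] = y_gpu[i] + y_cpu[i];  // e.g. 6+0, 14+45, ...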
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (4)

1. A heterogeneous parallel computing method for sparse matrix-vector multiplication, characterized in that it comprises the following steps:
S1, determining the value of the adjustable parameter K of the sparse matrix;
S2, the CPU reading the sparse matrix data stored on the hard disk and allocating memory storage space according to the adjustable parameter K of the sparse matrix, said memory storage space comprising the space needed by the ELL storage structure and the CSR storage structure, and allocating at the same time the GPU storage space needed by the ELL storage structure;
S3, compressing the sparse matrix data and storing them in the memory space allocated by the CPU, generating a hybrid storage structure;
S4, copying the data stored in the ELL storage structure to the GPU storage space;
S5, performing sparse matrix-vector multiplication with the completed storage structure, the CPU performing the computation corresponding to the CSR storage structure, the GPU performing the computation corresponding to the ELL storage structure, and adding the CPU and GPU results to obtain the final result.
2. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S3, when the compressed sparse matrix is stored, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
3. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
4. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S4, copying the data in the ELL storage structure to the GPU storage space specifically means copying the contents of the ELL part of the hybrid storage structure in memory to the corresponding positions of the ELL storage structure in the GPU storage space, the CPU at the same time releasing the corresponding data space in memory.
CN201510540568.7A 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication Pending CN105068787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510540568.7A CN105068787A (en) 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication


Publications (1)

Publication Number Publication Date
CN105068787A true CN105068787A (en) 2015-11-18

Family

ID=54498168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510540568.7A Pending CN105068787A (en) 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication

Country Status (1)

Country Link
CN (1) CN105068787A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
US20120143932A1 (en) * 2010-12-06 2012-06-07 International Business Machines Corporation Data Structure For Tiling And Packetizing A Sparse Matrix
CN102436438A (en) * 2011-12-13 2012-05-02 华中科技大学 Sparse matrix data storage method based on ground power unit (GPU)
CN103383772A (en) * 2012-05-01 2013-11-06 辉达公司 System, method, and computer program product for performing graph coloring
US20140108481A1 (en) * 2012-10-14 2014-04-17 Microsoft Corporation Universal fpga/asic matrix-vector multiplication architecture
WO2014139140A1 (en) * 2013-03-15 2014-09-18 Hewlett-Packard Development Company, L.P. Co-processor-based array-oriented database processing
CN104461466A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model
US20150160371A1 (en) * 2013-12-06 2015-06-11 Schlumberger Technology Corporation Gpu accelerated deflation in geomechanics simulator
CN103984527A (en) * 2014-04-01 2014-08-13 杭州电子科技大学 Method optimizing sparse matrix vector multiplication to improve incompressible pipe flow simulation efficiency
CN104636273A (en) * 2015-02-28 2015-05-20 中国科学技术大学 Storage method of sparse matrix on SIMD multi-core processor with multi-level cache

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NATHAN BELL, MICHAEL GARLAND: "Efficient Sparse Matrix-Vector Multiplication on CUDA", NVIDIA Technical Report *
阳王东, 李肯立 (Yang Wangdong, Li Kenli): "Implementation and Optimization of Quasi-Diagonal Matrix-Vector Multiplication on CPU+GPU Heterogeneous Clusters", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407158A (en) * 2016-09-12 2017-02-15 东南大学 GPU accelerated method for performing batch processing of isomorphic sparse matrixes multiplied by full vectors
CN106407158B (en) * 2016-09-12 2019-01-29 东南大学 GPU-accelerated batch processing method for multiplying isomorphic sparse matrices by full vectors
CN106775594A (en) * 2017-01-13 2017-05-31 中国科学院软件研究所 Heterogeneous many-core implementation method of sparse matrix-vector multiplication based on the domestic Sunway 26010 processor
CN108733348A (en) * 2017-04-21 2018-11-02 上海寒武纪信息科技有限公司 Fused vector multiplier and method of performing operations using it
CN112905241A (en) * 2019-03-15 2021-06-04 英特尔公司 Sparse optimization for matrix accelerator architecture
CN112905241B (en) * 2019-03-15 2024-03-29 英特尔公司 Sparse optimization for matrix accelerator architecture
CN112818179A (en) * 2019-11-18 2021-05-18 中国科学院深圳先进技术研究院 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment
CN112818179B (en) * 2019-11-18 2022-06-21 中国科学院深圳先进技术研究院 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment
CN111428192A (en) * 2020-03-19 2020-07-17 湖南大学 Method and system for optimizing high performance computational architecture sparse matrix vector multiplication
CN113360188A (en) * 2021-05-18 2021-09-07 中国石油大学(北京) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN113360188B (en) * 2021-05-18 2023-10-31 中国石油大学(北京) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN117311948A (en) * 2023-11-27 2023-12-29 湖南迈曦软件有限责任公司 Automatic multiple substructure data processing method for heterogeneous parallelism of CPU and GPU
CN117311948B (en) * 2023-11-27 2024-03-19 湖南迈曦软件有限责任公司 Automatic multiple substructure data processing method for heterogeneous parallelism of CPU and GPU


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20151118)