CN105068787A - Heterogeneous parallel computing method for sparse matrix-vector multiplication - Google Patents

Heterogeneous parallel computing method for sparse matrix-vector multiplication

Info

Publication number
CN105068787A
Authority
CN
China
Prior art keywords
ell
gpu
sparse matrix
storage
data
Prior art date
Legal status
Pending
Application number
CN201510540568.7A
Other languages
Chinese (zh)
Inventor
董守斌 (Dong Shoubin)
张铃启 (Zhang Lingqi)
陈泽邦 (Chen Zebang)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201510540568.7A priority Critical patent/CN105068787A/en
Publication of CN105068787A publication Critical patent/CN105068787A/en
Pending legal-status Critical Current

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a heterogeneous parallel computing method for sparse matrix-vector multiplication. The method comprises the following steps: a CPU reads a sparse matrix stored on a hard disk and determines an adjustable parameter K of the sparse matrix; according to K, memory storage space is allocated, including the space required by an ELL (ELLPACK) storage structure and a CSR (Compressed Sparse Row) storage structure, while the GPU storage space required by the ELL storage structure is allocated at the same time; the matrix data are filled into the memory space allocated by the CPU to generate a hybrid storage structure; the data held in the ELL storage structure in memory are copied to the GPU storage space; finally, sparse matrix-vector multiplication is performed with the completed storage structure. With this computing method, a computer executing a sparse matrix-vector multiplication task uses the computing power of the CPU and the GPU at the same time, so that the CPU and the GPU can each deliver their best computing characteristics.

Description

A heterogeneous parallel computing method for sparse matrix-vector multiplication
Technical field
The present invention relates to a data storage method, and in particular to a heterogeneous parallel computing method for sparse matrix-vector multiplication.
Technical background
Sparse matrix-vector multiplication (SpMV) is one of the most common computations in scientific computing and engineering applications. Many data-mining applications encounter extremely sparse data, which are typically represented as sparse matrices. When the data scale is very large, it is important to implement SpMV on the CPU-GPU heterogeneous computing platforms that are popular today.
Generally speaking, a heterogeneous platform can perform SpMV computation in one of two modes.
In the first mode, CPU/GPU cooperative computing, the CPU prepares the data, sends them to the GPU for computation, and the GPU transfers the results back to the CPU when it finishes. Although the whole process looks parallel, the work of the CPU and the GPU is in fact serial: while the GPU computes, the CPU sits idle and its computing power is wasted.
In the second mode, CPU/GPU joint computing, the CPU prepares the data, partitions the task, transfers part of the data to the GPU, and then completes the whole computation together with the GPU. Compared with the first mode, this uses the computing power of the CPU while the GPU is computing. It is nevertheless still deficient: it treats the CPU and the GPU as equal computing nodes and assigns them similar tasks, which ignores the architectural differences between the CPU and the GPU and prevents both from reaching peak performance.
In sparse matrix-vector multiplication, the storage of the sparse matrix is a key issue that directly affects computational performance. The mainstream sparse matrix storage formats currently include the coordinate format (COO), the compressed sparse row format (CSR), the diagonal format (DIA) and the ELLPACK format (ELL). Each format has its own characteristics and its own optimal application scenarios. The technical report "Efficient Sparse Matrix-Vector Multiplication on CUDA" by Nathan Bell and Michael Garland shows that in SpMV tasks the DIA and ELL formats make better use of GPU bandwidth, although the performance of ELL depends strongly on the actual data distribution, and that a hybrid of COO and ELL performs best on the GPU, because it keeps the advantage ELL shows on the GPU while using COO to work around ELL's weakness. CSR is the most common sparse matrix storage format; it is a further compression of COO and, relative to the other formats, is better suited to processing on the CPU. None of these existing storage formats, however, is well suited to processing on a CPU-GPU heterogeneous computing platform.
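For orientation, the two formats combined below can be illustrated on a small example. This is an editorial sketch, not part of the patent; it uses 0-based indices and a padding index of -1, with array names matching those used later in the description:

    /* A = | 1 0 2 |
           | 0 3 0 |
           | 4 5 6 |                                                     */

    /* CSR: nonzeros packed row by row; csr_ptr[i]..csr_ptr[i+1] spans row i. */
    double csr_data[]    = {1, 2, 3, 4, 5, 6};
    int    csr_indexes[] = {0, 2, 1, 0, 1, 2};
    int    csr_ptr[]     = {0, 2, 3, 6};

    /* ELL with width K = 3 (the longest row), stored one slot column at a
       time; short rows are padded (value 0, column index -1).            */
    double ell_data[]    = {1, 3, 4,   2, 0, 5,   0, 0, 6};
    int    ell_indexes[] = {0, 1, 0,   2,-1, 1,  -1,-1, 2};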
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a heterogeneous parallel computing method for sparse matrix-vector multiplication.
The object of the present invention is achieved through the following technical solution: a heterogeneous parallel computing method for sparse matrix-vector multiplication, comprising the following steps:
S1, determining the value of the adjustable parameter K of the sparse matrix;
S2, the CPU reads the sparse matrix data stored on the hard disk and allocates memory storage space according to the adjustable parameter K of the sparse matrix; the memory storage space comprises the space needed by the ELL storage structure and the CSR storage structure; the GPU storage space needed by the ELL storage structure is allocated at the same time;
S3, the sparse matrix data are compressed and stored in the memory space allocated by the CPU, generating a hybrid storage structure;
S4, the data stored in the ELL storage structure are copied to the GPU storage space;
S5, sparse matrix-vector multiplication is performed with the completed storage structure: the CPU performs the computation corresponding to the CSR storage structure, the GPU performs the computation corresponding to the ELL storage structure, and the CPU and GPU results are added to obtain the final result.
In step S3, when the compressed sparse matrix is stored, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
In step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
In step S4, copying the data in the ELL storage structure to the GPU storage space specifically means copying the contents of the ELL part of the hybrid storage structure in memory to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. When the computer executes an SpMV task, the computing method of the present invention uses the computing power of the CPU and the GPU at the same time, so that each can deliver its optimal computing performance.
2. The method of the present invention is simple to realize and can easily be extended to a cluster environment.
Brief description of the drawings
Fig. 1 is a flowchart of the heterogeneous parallel computing method for sparse matrix-vector multiplication of the present invention.
Embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawing, but the embodiments of the present invention are not limited thereto.
A heterogeneous parallel computing method for sparse matrix-vector multiplication specifically comprises the following steps.
S1, determine the value of the adjustable parameter K of the sparse matrix.
The value of K controls how far the matrix is compressed into the ELL part. In a hybrid structure (ELL plus COO or CSR), K determines how much of the data is stored in the ELL part and how much in the COO/CSR part, and can therefore be regarded as setting the amounts of computation assigned to the GPU and the CPU: the larger K is, the larger the GPU's share of the computation and the smaller the CPU's. The optimal K makes the CPU and the GPU finish their computations at the same time. K can be determined simply as a mean value: when the data are compressed by rows, K is the average number of nonzeros per row; when the data are compressed by columns, K is the average number of nonzeros per column.
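A minimal sketch of this averaging rule for the row-wise case (an editorial illustration; row_nnz and choose_k are hypothetical names, not from the patent):

    // K = mean number of nonzeros per row, rounded to the nearest integer.
    // row_nnz[i] holds the nonzero count of row i; n is the number of rows.
    int choose_k(const int* row_nnz, int n) {
        long long total = 0;
        for (int i = 0; i < n; ++i) total += row_nnz[i];
        return (int)((total + n / 2) / n);  // e.g. 22 nonzeros / 10 rows -> K = 2
    }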
S2, the CPU reads the sparse matrix stored on the hard disk and allocates memory storage space according to the adjustable parameter K, comprising the space needed by the ELL storage structure and the CSR storage structure; the GPU storage space needed by the ELL storage structure is allocated at the same time.
S3, the matrix data are filled into the memory space allocated by the CPU, generating the hybrid storage structure.
S4, the data stored in the ELL storage structure are copied into the allocated GPU storage space.
S5, sparse matrix-vector multiplication is performed with the completed storage structure: the CPU performs the computation corresponding to the CSR storage structure, the GPU performs the computation corresponding to the ELL storage structure, and the CPU and GPU results are added to obtain the final result.
In step S2, the ELL storage structure comprises the arrays ell_data and ell_indexes, and the CSR storage structure comprises the arrays csr_data, csr_indexes and csr_ptr.
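A sketch of the corresponding allocations in step S2 (editorial and non-authoritative; the names follow the arrays above, and nnz_rest denotes the number of nonzeros that overflow the ELL part, i.e. the sum over the rows of max(row_nnz[i] - K, 0)):

    // Host-side hybrid structure for an n-row matrix with ELL width K.
    double* ell_data    = new double[(size_t)n * K];
    int*    ell_indexes = new int[(size_t)n * K];
    double* csr_data    = new double[nnz_rest];
    int*    csr_indexes = new int[nnz_rest];
    int*    csr_ptr     = new int[n + 1];

    // Matching ELL buffers on the GPU (CUDA runtime API).
    double *d_ell_data;    cudaMalloc(&d_ell_data,    (size_t)n * K * sizeof(double));
    int    *d_ell_indexes; cudaMalloc(&d_ell_indexes, (size_t)n * K * sizeof(int));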
In step S3, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
In step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
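One way to realize this filling for the row-wise case (an editorial sketch with 0-based indices and -1 as the padding index; rows and fill_hybrid are hypothetical names):

    #include <utility>
    #include <vector>

    // rows[i] lists the (column, value) pairs of row i as read from disk.
    // The first K nonzeros of each row go into the ELL slots (stored
    // slot-major); the overflow goes into the CSR arrays.
    void fill_hybrid(const std::vector<std::vector<std::pair<int,double>>>& rows,
                     int K, double* ell_data, int* ell_indexes,
                     double* csr_data, int* csr_indexes, int* csr_ptr) {
        int n = (int)rows.size(), k = 0;   // k = running CSR position
        csr_ptr[0] = 0;
        for (int i = 0; i < n; ++i) {
            for (int s = 0; s < K; ++s) {                  // ELL part
                bool have = s < (int)rows[i].size();
                ell_data[s * n + i]    = have ? rows[i][s].second : 0.0;
                ell_indexes[s * n + i] = have ? rows[i][s].first  : -1;
            }
            for (int s = K; s < (int)rows[i].size(); ++s) { // overflow -> CSR
                csr_data[k]    = rows[i][s].second;
                csr_indexes[k] = rows[i][s].first;
                ++k;
            }
            csr_ptr[i + 1] = k;
        }
    }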
In step S4, the contents of the ELL part of the hybrid storage structure in memory are copied to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory.
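A sketch of this transfer with the CUDA runtime (editorial; the buffer names follow the allocation sketch above):

    // Step S4: move the ELL part to the GPU, then release the host copy
    // so that only the CSR remainder stays in CPU memory.
    cudaMemcpy(d_ell_data, ell_data,
               (size_t)n * K * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ell_indexes, ell_indexes,
               (size_t)n * K * sizeof(int), cudaMemcpyHostToDevice);
    delete[] ell_data;
    delete[] ell_indexes;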
In step S5, the GPU performs sparse matrix-vector multiplication on the ELL structure stored in the GPU storage space while the CPU simultaneously performs sparse matrix-vector multiplication on the CSR structure stored in memory; the results obtained by the CPU and the GPU are then summed.
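The two concurrent computations of step S5 might look as follows (an editorial sketch, not the patent's code; one GPU thread per row, padding index -1 skipped):

    // GPU side: SpMV over the slot-major ELL part.
    __global__ void spmv_ell(int n, int K, const double* ell_data,
                             const int* ell_indexes, const double* x, double* y) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;
        double sum = 0.0;
        for (int s = 0; s < K; ++s) {
            int col = ell_indexes[s * n + row];    // slot s of row `row`
            if (col >= 0) sum += ell_data[s * n + row] * x[col];
        }
        y[row] = sum;
    }

    // CPU side: SpMV over the CSR remainder, run while the kernel executes.
    void spmv_csr(int n, const double* csr_data, const int* csr_indexes,
                  const int* csr_ptr, const double* x, double* y) {
        for (int i = 0; i < n; ++i) {
            double sum = 0.0;
            for (int j = csr_ptr[i]; j < csr_ptr[i + 1]; ++j)
                sum += csr_data[j] * x[csr_indexes[j]];
            y[i] = sum;
        }
    }
    // Final result per row: Result[i] = y_gpu[i] + y_cpu[i].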
To set forth the principle of the present invention more clearly, its implementation process is described below with an embodiment.
This embodiment compresses the data by rows. The CPU first reads the following 10×10 matrix ('.' denotes a zero entry):
col:  1  2  3  4  5  6  7  8  9 10
r1:   3  .  .  .  .  1  .  .  .  .
r2:   .  5  .  .  .  .  .  3  .  9
r3:   .  .  .  2  .  .  .  .  .  .
r4:   .  7  .  .  .  .  3  .  9  .
r5:   .  .  5  .  .  .  .  .  .  .
r6:   1  .  .  .  3  .  .  .  2  .
r7:   .  .  2  .  .  1  .  .  .  .
r8:   .  9  .  .  .  .  .  2  .  .
r9:   .  .  .  5  .  .  1  .  .  .
r10:  5  .  .  .  .  6  .  3  .  .
From the data read in, the matrix has 10 rows and 22 nonzeros in total, an average of 2.2 nonzeros per row, so K = 2 is taken. Memory storage space is allocated: specifically, ell_data and ell_indexes each have size 20, csr_data and csr_indexes each have size 4, and csr_ptr has size 11. The data are then filled into the ELL storage structure.
Filling the data into the arrays by rows (the ELL arrays stored slot-major, '-' marking a padded entry) gives:
ell_data[]={3,5,2,7,5,1,2,9,5,5,1,3,-,3,-,3,1,2,1,6}
ell_indexes[]={1,2,4,2,3,1,3,2,4,1,6,8,-,7,-,5,6,8,7,6}
The remaining data are then filled into the CSR structure:
csr_data[]={9,9,2,3}
csr_indexes[]={10,9,9,8}
csr_ptr[]={0,0,1,1,2,2,3,3,3,3,4}
The contents of the ELL part of the hybrid storage structure in memory are copied to the corresponding positions of the ELL storage structure in the GPU storage space; at the same time, the CPU releases the corresponding data space in memory, completing the matrix storage process.
Now suppose this matrix is multiplied by the following vector:
Vector={2,1,0,2,5,0,2,3,0,5}
Sparse matrix-vector multiplication with this storage structure proceeds as follows.
The GPU performs sparse matrix-vector multiplication on the ELL structure stored in the GPU storage space, obtaining:
GPU_Result={6,14,4,13,0,17,0,15,12,10}
The CPU performs sparse matrix-vector multiplication on the CSR structure stored in memory, obtaining:
CPU_Result={0,45,0,0,0,0,0,0,0,9}
Finally, CPU_Result and GPU_Result are added to obtain:
Result={6,59,4,13,0,17,0,15,12,19}
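For reference, a hedged sketch of how this example's step S5 could be driven from the host, following the kernel and function sketched above (editorial; d_x, d_y, y_cpu, y_gpu and result are illustrative buffers, with the vector already copied to d_x):

    int threads = 128, blocks = (n + threads - 1) / threads;   // n = 10 here
    spmv_ell<<<blocks, threads>>>(n, K, d_ell_data, d_ell_indexes, d_x, d_y);
    spmv_csr(n, csr_data, csr_indexes, csr_ptr, x, y_cpu);  // CPU works while the kernel runs
    cudaMemcpy(y_gpu, d_y, n * sizeof(double), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) result[i] = y_gpu[i] + y_cpu[i];  // e.g. 6+0, 14+45, ...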
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited to it; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the protection scope of the present invention.

Claims (4)

1. A heterogeneous parallel computing method for sparse matrix-vector multiplication, characterized in that it comprises the following steps:
S1, determining the value of the adjustable parameter K of the sparse matrix;
S2, the CPU reading the sparse matrix data stored on the hard disk and allocating memory storage space according to the adjustable parameter K of the sparse matrix, said memory storage space comprising the space needed by the ELL storage structure and the CSR storage structure, and allocating at the same time the GPU storage space needed by the ELL storage structure;
S3, compressing the sparse matrix data and storing them in the memory space allocated by the CPU, generating a hybrid storage structure;
S4, copying the data stored in the ELL storage structure to the GPU storage space;
S5, performing sparse matrix-vector multiplication with the completed storage structure, the CPU performing the computation corresponding to the CSR storage structure, the GPU performing the computation corresponding to the ELL storage structure, and adding the CPU and GPU results to obtain the final result.
2. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S3, when the compressed sparse matrix is stored, all the data that fit into the ELL part are first filled into the ELL structure, and the remaining data are then stored in the CSR structure in the usual CSR manner.
3. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S3, when the sparse matrix is compressed, the matrix data are filled into the arrays by rows if the compression is row-wise, and by columns if the compression is column-wise.
4. The heterogeneous parallel computing method for sparse matrix-vector multiplication according to claim 1, characterized in that: in step S4, copying the data in the ELL storage structure to the GPU storage space specifically means copying the contents of the ELL part of the hybrid storage structure in memory to the corresponding positions of the ELL storage structure in the GPU storage space, the CPU at the same time releasing the corresponding data space in memory.
CN201510540568.7A 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication Pending CN105068787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510540568.7A CN105068787A (en) 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication


Publications (1)

Publication Number Publication Date
CN105068787A true CN105068787A (en) 2015-11-18

Family

ID=54498168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510540568.7A Pending CN105068787A (en) 2015-08-28 2015-08-28 Heterogeneous parallel computing method for sparse matrix-vector multiplication

Country Status (1)

Country Link
CN (1) CN105068787A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
US20120143932A1 (en) * 2010-12-06 2012-06-07 International Business Machines Corporation Data Structure For Tiling And Packetizing A Sparse Matrix
CN102436438A (en) * 2011-12-13 2012-05-02 华中科技大学 Sparse matrix data storage method based on ground power unit (GPU)
CN103383772A (en) * 2012-05-01 2013-11-06 辉达公司 System, method, and computer program product for performing graph coloring
US20140108481A1 (en) * 2012-10-14 2014-04-17 Microsoft Corporation Universal fpga/asic matrix-vector multiplication architecture
WO2014139140A1 (en) * 2013-03-15 2014-09-18 Hewlett-Packard Development Company, L.P. Co-processor-based array-oriented database processing
CN104461466A (en) * 2013-09-25 2015-03-25 广州中国科学院软件应用技术研究所 Method for increasing computing speed through parallel computing based on MPI and OpenMP hybrid programming model
US20150160371A1 (en) * 2013-12-06 2015-06-11 Schlumberger Technology Corporation Gpu accelerated deflation in geomechanics simulator
CN103984527A (en) * 2014-04-01 2014-08-13 杭州电子科技大学 Method optimizing sparse matrix vector multiplication to improve incompressible pipe flow simulation efficiency
CN104636273A (en) * 2015-02-28 2015-05-20 中国科学技术大学 Storage method of sparse matrix on SIMD multi-core processor with multi-level cache

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NATHAN BELL, MICHAEL GARLAND: "Efficient Sparse Matrix-Vector Multiplication on CUDA", NVIDIA Technical Report *
阳王东, 李肯立 (Yang Wangdong, Li Kenli): "Implementation and Optimization of Quasi-Diagonal Matrix-Vector Multiplication on CPU+GPU Heterogeneous Clusters", Journal of Chinese Computer Systems (《小型微型计算机系统》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407158A (en) * 2016-09-12 2017-02-15 东南大学 GPU accelerated method for performing batch processing of isomorphic sparse matrixes multiplied by full vectors
CN106407158B (en) * 2016-09-12 2019-01-29 东南大学 GPU-accelerated batch processing method for multiplying isomorphic sparse matrices by full vectors
CN106775594A (en) * 2017-01-13 2017-05-31 中国科学院软件研究所 Heterogeneous many-core implementation method of sparse matrix-vector multiplication based on the domestic Sunway 26010 processor
CN108733348A (en) * 2017-04-21 2018-11-02 上海寒武纪信息科技有限公司 Fused vector multiplier and method of performing operations using it
CN112905241A (en) * 2019-03-15 2021-06-04 英特尔公司 Sparse optimization for matrix accelerator architecture
CN112905241B (en) * 2019-03-15 2024-03-29 英特尔公司 Sparse optimization for matrix accelerator architecture
CN112818179A (en) * 2019-11-18 2021-05-18 中国科学院深圳先进技术研究院 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment
CN112818179B (en) * 2019-11-18 2022-06-21 中国科学院深圳先进技术研究院 Graph traversal access and storage optimization method and system based on Hybrid storage format and electronic equipment
CN111428192A (en) * 2020-03-19 2020-07-17 湖南大学 Method and system for optimizing high performance computational architecture sparse matrix vector multiplication
CN113360188A (en) * 2021-05-18 2021-09-07 中国石油大学(北京) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN113360188B (en) * 2021-05-18 2023-10-31 中国石油大学(北京) Parallel processing method and device for optimizing sparse matrix-vector multiplication
CN117311948A (en) * 2023-11-27 2023-12-29 湖南迈曦软件有限责任公司 Automatic multiple substructure data processing method for heterogeneous parallelism of CPU and GPU
CN117311948B (en) * 2023-11-27 2024-03-19 湖南迈曦软件有限责任公司 Automatic multiple substructure data processing method for heterogeneous parallelism of CPU and GPU


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20151118)