CN112540718A - Sparse matrix storage method for Shenwei many-core architecture

Sparse matrix storage method for Shenwei many-core architecture

Info

Publication number
CN112540718A
Authority
CN
China
Prior art keywords
core
sparse matrix
many
column
computing
Prior art date
Legal status
Withdrawn
Application number
CN201910898286.2A
Other languages
Chinese (zh)
Inventor
陈德训
李芳
赵朋朋
刘鑫
徐金秀
孙唯哲
陈鑫
郭恒
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-09-23
Filing date: 2019-09-23
Publication date: 2021-03-23
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910898286.2A
Publication of CN112540718A
Status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0628: Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638: Organizing or formatting or addressing of data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a sparse matrix storage method for the Shenwei many-core architecture. The many-core processor consists of 4 heterogeneous groups, each comprising a control (master) core, a slave-core cluster of 64 computing cores, a heterogeneous-group interface and a storage controller, giving the chip 260 cores in total. The sparse matrix storage format comprises the steps of: S1, grouping the sparse matrix by rows over the core-group array of the Shenwei many-core processor, assigning several rows to each slave core, so that the 64 slave cores hold 64 groups; and S2, compressing and storing the non-zero elements of each group by columns, compressing the column coordinates by storing the number of non-zero elements contained in each column of the sparse matrix, and saving the column offsets, row coordinates and non-zero values. The invention provides a uniform data organization for the whole solution process of application software based on the many-core processor, thereby improving the adaptability of the application problem to the Shenwei many-core architecture.

Description

Sparse matrix storage method for Shenwei many-core architecture
Technical Field
The invention belongs to the technical field of scientific computing, and particularly relates to a sparse matrix storage method for the Shenwei many-core architecture.
Background
Sparse matrices are a key data structure, and often the performance bottleneck, of numerical simulation in many fields of natural and social science. Because the non-zero elements of a sparse matrix are few relative to its size, storing it in a two-dimensional array, as is done for dense matrices, is impractical. Considerable work exists on sparse matrix storage; the most basic formats are COO (coordinate list), CSR (compressed sparse row), CSC (compressed sparse column), ELLPACK, and so on.
In recent years, heterogeneous high-performance computing platforms based on many-core processors have become mainstream: core counts have grown, vector lengths have increased, and cache hierarchies have deepened in both levels and capacity. On these new architectures, traditional sparse matrix codes written for multi-core CPUs no longer scale with the hardware, so new sparse matrix storage formats and corresponding parallel algorithms must be developed for the Shenwei many-core processor.
On GPUs, new sparse matrix storage formats and algorithms have gradually appeared, such as the HYB hybrid format, the ELL-R method, the ELL method, the CSMR format and the BCCOO format, but these are not suited to the Shenwei many-core architecture. To improve the performance of sparse matrix algorithms on the Shenwei many-core processor, a new sparse matrix storage format needs to be developed.
When matrix operations are performed on a many-core processor with a large sparse matrix stored in a traditional format, problems of discrete memory access, write conflicts, poor data reuse and load imbalance arise. Some application software on the Shenwei many-core architecture locally optimizes particular sparse matrix operations for specific sparsity patterns and converts back to the original matrix format afterwards; this neither avoids the cost of format conversion nor preserves the integrity of the application software.
Disclosure of Invention
The invention aims to provide a sparse matrix storage method for the Shenwei many-core architecture. The storage format provides a uniform data organization for the whole solution process of application software based on the many-core processor, thereby improving the adaptability of the application problem to the Shenwei many-core architecture.
In order to achieve this purpose, the invention adopts the following technical scheme: a sparse matrix storage method for the Shenwei many-core architecture, wherein the many-core processor is composed of 4 heterogeneous groups, each heterogeneous group comprises a control core, a slave-core cluster composed of 64 computing cores, a heterogeneous-group interface and a storage controller, and the whole many-core processor has 260 cores in total;
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed in order;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns: compress the column coordinates by storing the number of non-zero elements contained in each column of the sparse matrix, saving the column offsets Col_p, the row coordinates Row_i and the non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the column offsets;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
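To make the layout of steps S1-S213 concrete, the following C sketch shows one computing core's storage. It is illustrative only and not part of the claimed method; the struct and field names mirror the Col_p, Row_i and Value arrays defined above.

typedef struct {
    int     nrows;   /* rows assigned to this computing core (about N/64)    */
    int     ncols;   /* number of columns of the original sparse matrix      */
    int     nnz;     /* non-zero elements stored in this group               */
    int    *Col_p;   /* column offsets, length ncols + 1; Col_p[ncols] = nnz */
    int    *Row_i;   /* local row coordinate of each non-zero element        */
    double *Value;   /* value of each non-zero element                       */
} CSGCBlock;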
Further improvements of the above technical scheme are as follows:
1. In this scheme, the computing cores are mainly responsible for fine-grained parallel computing tasks. A computing core can access main memory directly with discrete accesses or in batches through DMA, and cores within the many-core array can communicate efficiently via register communication.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the sparse matrix storage method facing the Shenwei many-core architecture adopts a sparse matrix group inner column compression storage format, can avoid the problem of write conflict and the problem of discrete access to a main memory, can improve the sparse matrix calculation efficiency, reduces the format conversion overhead during multiple matrix operations in application software, can provide a uniform data organization form for sparse matrix operations, and obtains higher solving efficiency on a many-core processor, thereby improving the many-core architecture adaptability of the application problems; the format has good adaptability of the Schmitt-Kernel framework to any sparse matrix type (asymmetric, non-centralized and irregular).
Drawings
FIG. 1 is a schematic diagram of a many-core processor architecture of the present invention;
FIG. 2 is a schematic diagram of the sparse matrix grouping of the present invention.
Detailed Description
The invention is further described below with reference to the following examples:
Example: a sparse matrix storage method for the Shenwei many-core architecture, wherein the many-core processor is composed of 4 heterogeneous groups; each heterogeneous group comprises a control core, a slave-core cluster composed of 64 computing cores, a heterogeneous-group interface and a storage controller, giving the whole many-core processor 260 cores in total. The control core provides the functions of a conventional multi-core CPU, such as computation, communication and I/O, and is also responsible for control operations such as loading and recovering slave-core tasks.
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed in order;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns: compress the column coordinates by storing the number of non-zero elements contained in each column of the sparse matrix, saving the column offsets Col_p, the row coordinates Row_i and the non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the column offsets;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
The computing cores are mainly responsible for fine-grained parallel computing tasks. A computing core can access main memory directly with discrete accesses or in batches through DMA, and cores within the many-core array can communicate efficiently via register communication.
The above-mentioned aspects of the invention are further explained as follows:
sparse matrix: english is sparse matrix, and in the matrix, if the number of elements with the numerical value of zero is far more than the number of non-zero elements and the distribution of the non-zero elements is not regular, the matrix is called sparse matrix.
Many cores: english is many core, and the processor integrates a plurality of computing cores and is oriented to the field of high-performance computing.
For the Shenwei many-core processor architecture, a sparse matrix storage format named CSGC ("Sparse Group Column compression") is proposed. The sparse matrix is first grouped by rows over the core-group array of the Shenwei many-core processor, each slave core receiving several rows as one group, so the 64 slave cores hold 64 groups. Next, the non-zero elements of each group are compressed and stored by columns: the column coordinates are compressed by saving the number of non-zero elements each column contains, which requires storing the column offsets (Col_p), the row coordinates (Row_i) and the non-zero values (Value).
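As an illustration of steps S211-S213, the following minimal C sketch compresses one core's row group, given here as a dense nrows × ncols block for clarity. The function name and the dense input are assumptions for illustration, not part of the patent; within a column it records non-zeros in increasing row order, though the format itself does not mandate an order.

static void csgc_from_dense(int nrows, int ncols, const double *block,
                            int *Col_p, int *Row_i, double *Value)
{
    int k = 0;                                   /* running count of non-zeros */
    for (int j = 0; j < ncols; ++j) {
        Col_p[j] = k;                            /* S211: offset of column j   */
        for (int i = 0; i < nrows; ++i)
            if (block[i * ncols + j] != 0.0) {
                Row_i[k] = i;                    /* S212: row coordinate       */
                Value[k] = block[i * ncols + j]; /* S213: non-zero value       */
                ++k;
            }
    }
    Col_p[ncols] = k;                            /* append total nnz at end    */
}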
The sparse matrix shown in FIG. 2 is distributed over the 64 slave cores by rows; slave cores No. 0 to No. 3 each receive three rows of the sparse matrix, and the remaining slave cores are handled similarly;
for slave core No. 0, CSGC:
Col_p = [0,1,2,4,4,5,6,6,7]
Row_i = [0,1,2,0,1,2,0]
Value = [3,4,6,5,-1,2,7]
for slave core No. 1, CSGC:
Col_p = [0,2,2,3,4,5,6,7,7]
Row_i = [0,1,2,0,1,2,0]
Value = [-2,4,5,6,7,8,15]
for slave core No. 2, CSGC:
Col_p = [0,0,1,1,2,3,4,5,6]
Row_i = [1,0,1,2,0,1]
Value = [7,-3,12,9,9,10]
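Reading the arrays back shows how the format is consumed: for every column j, the entries with indices Col_p[j] ≤ k < Col_p[j+1] belong to column j. A small self-contained check for slave core No. 0 (illustrative only, not part of the patent):

#include <stdio.h>

int main(void)
{
    int    Col_p[] = {0,1,2,4,4,5,6,6,7};    /* slave core No. 0, 8 columns */
    int    Row_i[] = {0,1,2,0,1,2,0};
    double Value[] = {3,4,6,5,-1,2,7};

    for (int j = 0; j < 8; ++j)                        /* each column      */
        for (int k = Col_p[j]; k < Col_p[j + 1]; ++k)  /* its non-zeros    */
            printf("a(%d,%d) = %g\n", Row_i[k], j, Value[k]);
    return 0;
}

This prints the seven stored elements a(0,0)=3, a(1,1)=4, a(2,2)=6, a(0,2)=5, a(1,4)=-1, a(2,5)=2 and a(0,7)=7; the empty columns 3 and 6 are skipped because Col_p does not advance there.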
Storing the sparse matrix with the CSGC method and performing sparse matrix-vector multiplication on the Shenwei many-core processor avoids both the write conflict problem and the discrete main-memory access problem. Sparse matrix-vector multiplication (SpMV for short) is taken as an example to introduce the many-core optimization algorithm. SpMV is a very important kernel in numerical computation: it computes b = A·x, where A is the sparse matrix, x is the known vector and b is the result vector. A simple example for a sparse matrix of size n × n is shown below.
b_i = Σ_{j=1..n} a_ij · x_j,  i = 1, …, n
Firstly, following the CSGC sparse matrix storage method, the sparse matrix is grouped by rows over the core-group array, and the result vector b is correspondingly grouped by rows over the core-group array; all computations involving a given entry of b execute sequentially within one slave core, so there is no write conflict.
Secondly, the sparse matrix non-zero elements required by each computing core are stored contiguously and compactly according to the CSGC method and can be accessed in batches through DMA.
Finally, if the LDM (local data memory) cannot hold the entire vector x during the computation, x is fetched in multiple batches through DMA. The discrete access problem of the sparse matrix is thereby resolved.
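Putting the pieces together, a per-core SpMV kernel over the CSGCBlock sketched earlier might look as follows. This is a hedged sketch, not the patent's implementation: on a real Shenwei system the block arrays and batches of x would first be staged into LDM via DMA, which is omitted here.

static void csgc_spmv(const CSGCBlock *blk, const double *x, double *b_local)
{
    /* b_local is this core's private slice of the result vector b; no
     * other computing core writes these entries, so there is no write
     * conflict, matching the argument above.                           */
    for (int i = 0; i < blk->nrows; ++i)
        b_local[i] = 0.0;
    for (int j = 0; j < blk->ncols; ++j)             /* column by column */
        for (int k = blk->Col_p[j]; k < blk->Col_p[j + 1]; ++k)
            b_local[blk->Row_i[k]] += blk->Value[k] * x[j];
}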
The data structure design of complete application software involves sparse matrix operations such as constructing sparse matrix coefficients, solving sparse linear systems and sparse matrix-vector multiplication; adopting the CSGC (within-group column compression) format improves sparse matrix computing efficiency and reduces the overhead of matrix format conversion.
With the sparse matrix storage method for the Shenwei many-core architecture, sparse matrix computing efficiency is improved and the cost of format conversion during repeated matrix operations in application software is reduced; the format adapts well to the Shenwei many-core architecture for any sparse matrix type (asymmetric, non-concentrated, irregular).
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (2)

1. A sparse matrix storage method for the Shenwei many-core architecture, characterized in that: the many-core processor is composed of 4 heterogeneous groups, each heterogeneous group comprises a control core, a slave-core cluster composed of 64 computing cores, a heterogeneous-group interface and a storage controller, and the whole many-core processor has 260 cores in total;
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed in order;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns: compress the column coordinates by storing the number of non-zero elements contained in each column of the sparse matrix, saving the column offsets Col_p, the row coordinates Row_i and the non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the column offsets;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
2. The sparse matrix storage method for the Shenwei many-core architecture according to claim 1, characterized in that: the computing cores are mainly responsible for fine-grained parallel computing tasks; a computing core can access main memory directly with discrete accesses or in batches through DMA, and cores within the many-core array can communicate efficiently via register communication.
CN201910898286.2A 2019-09-23 2019-09-23 Sparse matrix storage method for Shenwei many-core architecture Withdrawn CN112540718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910898286.2A CN112540718A (en) 2019-09-23 2019-09-23 Sparse matrix storage method for Shenwei many-core architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910898286.2A CN112540718A (en) 2019-09-23 2019-09-23 Sparse matrix storage method for Shenwei many-core architecture

Publications (1)

Publication Number Publication Date
CN112540718A (en) 2021-03-23

Family

ID=75013010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910898286.2A Withdrawn CN112540718A (en) 2019-09-23 2019-09-23 Sparse matrix storage method for Shenwei many-core architecture

Country Status (1)

Country Link
CN (1) CN112540718A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150067009A1 (en) * 2013-08-30 2015-03-05 Microsoft Corporation Sparse matrix data structure
CN104636273A (en) * 2015-02-28 2015-05-20 中国科学技术大学 Storage method of sparse matrix on SIMD multi-core processor with multi-level cache
CN106775594A (en) * 2017-01-13 2017-05-31 中国科学院软件研究所 A kind of Sparse Matrix-Vector based on the domestic processor of Shen prestige 26010 multiplies isomery many-core implementation method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505766A (en) * 2021-09-09 2021-10-15 北京智源人工智能研究院 Image target detection method and device, electronic equipment and storage medium
CN113505766B (en) * 2021-09-09 2022-01-04 北京智源人工智能研究院 Image target detection method and device, electronic equipment and storage medium
WO2023097970A1 (en) * 2021-12-01 2023-06-08 北京微电子技术研究所 Many-core definable distributed shared storage structure
CN114385972A (en) * 2021-12-20 2022-04-22 北京科技大学 Parallel computing method for directly solving structured triangular sparse linear equation set
CN114385972B (en) * 2021-12-20 2023-09-01 北京科技大学 Parallel computing method for directly solving structured triangular sparse linear equation set

Similar Documents

Publication Publication Date Title
CN108805266B (en) Reconfigurable CNN high-concurrency convolution accelerator
CN112540718A (en) Sparse matrix storage method for Shenwei many-core architecture
CN104636273B (en) A kind of sparse matrix storage method on SIMD many-core processors with Multi-Level Cache
WO2019205617A1 (en) Calculation method and apparatus for matrix multiplication
CN102110079B (en) Tuning calculation method of distributed conjugate gradient method based on MPI
US10417257B2 (en) Non-blocking database table alteration
CN110516316B (en) GPU acceleration method for solving Euler equation by interrupted Galerkin method
CN104361113A (en) OLAP (On-Line Analytical Processing) query optimization method in memory and flesh memory hybrid storage mode
CN111368484B (en) Cosmic N-body numerical simulation optimization method and system based on Shenwei architecture
CN111429974B (en) Molecular dynamics simulation short-range force parallel optimization method on super computer platform
CN115048215A (en) Method for realizing diagonal matrix SPMV (sparse matrix) on GPU (graphics processing Unit) based on mixed compression format
CN112560356A (en) Sparse matrix vector multiply many-core optimization method for many-core architecture
CN103593304A (en) Quantization method for efficiently using caches on basis of parallel device model
CN109271344B (en) Data preprocessing method based on parallel file reading of Shenwei chip architecture
Zhang et al. Evaluating low-memory GEMMs for convolutional neural network inference on FPGAS
CN113704691B (en) Small-scale symmetric matrix parallel tri-diagonalization method of Shenwei many-core processor
CN115906684A (en) Hydrodynamics multi-grid solver parallel optimization method for Shenwei architecture
CN105573834B (en) A kind of higher-dimension vocabulary tree constructing method based on heterogeneous platform
CN110244933B (en) Matrix transposition method based on CUDA
CN114969857A (en) Structural design optimization method, system, computer equipment and storage medium
Zhang et al. Yolov3-tiny Object Detection SoC Based on FPGA Platform
Brown et al. Nemo-cnn: An efficient near-memory accelerator for convolutional neural networks
KR102578762B1 (en) Neural processor
CN112559435A (en) Many-core parallel optimization method based on large-scale three-dimensional matrix transposition of on-chip blocking
CN103150290A (en) Novel numerical simulation method for three-dimensional incompressible pipe flow

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 2021-03-23