CN112540718A - Sparse matrix storage method for Shenwei many-core architecture - Google Patents
- Publication number
- CN112540718A (application CN201910898286.2A)
- Authority
- CN
- China
- Prior art keywords
- core
- sparse matrix
- many
- column
- computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a sparse matrix storage method for the Shenwei many-core architecture. The many-core processor consists of 4 heterogeneous groups; each heterogeneous group comprises a master core, a slave-core cluster of 64 slave cores, a heterogeneous-group interface, and a memory controller, giving the whole chip 260 computing cores. The sparse matrix storage format comprises the steps of: S1, grouping the sparse matrix by rows over the core-group array of the Shenwei many-core processor, assigning several rows to each slave core so that the 64 slave cores hold 64 groups; and S2, compressing and storing the non-zero elements of the sparse matrix within each group by columns, compressing the column coordinates by storing the number of non-zero elements contained in each column, and storing the column offsets, row coordinates, and non-zero values. The invention provides a uniform data organization for the whole solution process of application software based on the many-core processor, thereby improving the adaptability of the problem to the Shenwei many-core architecture.
Description
Technical Field
The invention belongs to the technical field of scientific computing, and particularly relates to a sparse matrix storage method for the Shenwei many-core architecture.
Background
Sparse matrices are key data structures, and performance bottlenecks, in numerical simulation across many fields of natural and social science. Since non-zero elements in a sparse matrix are extremely rare, it is wasteful to store one in a two-dimensional array the way a dense matrix is stored. Considerable work exists on sparse matrix storage; the most basic formats are COO (coordinate list), CSR (Compressed Sparse Row), CSC (Compressed Sparse Column), ELLPACK, and so on.
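As a toy illustration of one of these basic formats (a hypothetical 3×3 matrix, not taken from the patent), CSR keeps three arrays:

```python
# CSR (Compressed Sparse Row) sketch for a small hypothetical matrix:
# A = [[3, 0, 1],
#      [0, 2, 0],
#      [4, 0, 5]]
# row_ptr[i] gives the index in col_idx/values where row i starts.
row_ptr = [0, 2, 3, 5]        # row i spans values[row_ptr[i]:row_ptr[i+1]]
col_idx = [0, 2, 1, 0, 2]     # column of each non-zero
values  = [3, 1, 2, 4, 5]     # the non-zero values themselves

def csr_to_dense(n, row_ptr, col_idx, values):
    """Expand CSR arrays back into an n x n dense list of lists."""
    dense = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            dense[i][col_idx[k]] = values[k]
    return dense

print(csr_to_dense(3, row_ptr, col_idx, values))
# [[3, 0, 1], [0, 2, 0], [4, 0, 5]]
```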
In recent years, heterogeneous high-performance computing platforms based on many-core processors have gradually become mainstream: processor core counts have grown, vector lengths have increased, and cache hierarchies have deepened in both levels and capacity. On these new architectures, traditional sparse matrix code written for multi-core CPUs cannot scale with the improvement in hardware performance, so new sparse matrix storage formats and corresponding parallel algorithms must be developed for the Shenwei many-core processor.
On GPUs, new sparse matrix storage formats and algorithms have gradually appeared, such as the HYB hybrid format, the ELL-R method, the ELL method, the CSMR format, and the BCCOO format, but these are not suitable for the Shenwei many-core architecture. To improve the performance of sparse matrix algorithms on the Shenwei many-core processor, a new sparse matrix storage format must be studied.
When a large sparse matrix stored in a traditional format is used for matrix operations on a many-core processor, problems such as scattered memory access, write conflicts, poor data reuse, and load imbalance arise. Some application software for the Shenwei many-core architecture locally optimizes particular sparse matrix operations for specific sparsity patterns and converts back to the original matrix format afterward; this neither avoids the cost of format conversion nor preserves the integrity of the application software.
Disclosure of Invention
The invention aims to provide a sparse matrix storage method for the Shenwei many-core architecture. The storage format provides a uniform data organization for the whole solution process of application software based on the many-core processor, thereby improving the adaptability of the problem to the Shenwei many-core architecture.
To this end, the invention adopts the following technical scheme: a sparse matrix storage method for the Shenwei many-core architecture, wherein the many-core processor is composed of 4 heterogeneous groups; each heterogeneous group comprises a control core, a slave-core cluster of 64 computing cores, a heterogeneous-group interface, and a memory controller; the whole many-core processor has 260 computing cores.
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed sequentially;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns, compressing the column coordinates by storing the number of non-zero elements contained in each column, and storing the column offsets Col_p, row coordinates Row_i, and non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the offset array;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
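Steps S1-S2 can be sketched in Python (a minimal sketch; the function names and the dense-list input representation are illustrative assumptions, not from the patent):

```python
# Sketch of CSGC construction per steps S1-S2 (names are illustrative,
# not from the patent). The matrix is given as a dense list of lists;
# each slave core receives a contiguous block of rows (S1), and that
# block is compressed by columns (S2).

def csgc_compress_group(rows, n_cols):
    """Column-compress one group of rows (S21-S213):
    Col_p[j] is the index of the first non-zero of column j, and the
    total non-zero count is appended at the end (S211)."""
    col_p, row_i, value = [], [], []
    for j in range(n_cols):                 # scan column by column
        col_p.append(len(value))            # offset of column j's first nnz
        for i, row in enumerate(rows):
            if row[j] != 0:
                row_i.append(i)             # local row coordinate (S212)
                value.append(row[j])        # non-zero value (S213)
    col_p.append(len(value))                # append total nnz (S211)
    return col_p, row_i, value

def csgc_store(matrix, n_cores=64):
    """S1: split the N rows into n_cores sequential groups of N // n_cores
    rows each, then compress each group by columns (S2)."""
    per_core = len(matrix) // n_cores
    groups = [matrix[c * per_core:(c + 1) * per_core] for c in range(n_cores)]
    return [csgc_compress_group(g, len(matrix[0])) for g in groups]
```

For example, `csgc_store([[1, 0], [0, 2], [3, 0], [0, 0]], n_cores=2)` splits the four rows into two groups of two and compresses each group by columns.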
A further refinement of the above scheme is as follows:
1. In the scheme, the computing cores are mainly responsible for fine-grained parallel computing tasks. A computing core can access main memory directly (discretely) or in batches via DMA, and can communicate efficiently within the many-core array via register communication.
Due to the application of the technical scheme, compared with the prior art, the invention has the following advantages:
the sparse matrix storage method facing the Shenwei many-core architecture adopts a sparse matrix group inner column compression storage format, can avoid the problem of write conflict and the problem of discrete access to a main memory, can improve the sparse matrix calculation efficiency, reduces the format conversion overhead during multiple matrix operations in application software, can provide a uniform data organization form for sparse matrix operations, and obtains higher solving efficiency on a many-core processor, thereby improving the many-core architecture adaptability of the application problems; the format has good adaptability of the Schmitt-Kernel framework to any sparse matrix type (asymmetric, non-centralized and irregular).
Drawings
FIG. 1 is a schematic diagram of a many-core processor architecture of the present invention;
FIG. 2 is a schematic diagram of the sparse matrix grouping of the present invention.
Detailed Description
The invention is further described below with reference to the following examples:
example (b): a sparse matrix storage method facing a Shenwei many-core architecture is characterized in that a many-core processor is composed of 4 heterogeneous groups, each heterogeneous group comprises a control core, a slave core cluster composed of 64 computing cores, heterogeneous group interfaces and a storage controller, and the whole many-core processor is provided with 260 computing cores; the control core has the functions of a conventional multi-core CPU such as calculation, communication, I/O and the like, and is also responsible for control operations such as loading and recovery of slave core tasks;
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed sequentially;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns, compressing the column coordinates by storing the number of non-zero elements contained in each column, and storing the column offsets Col_p, row coordinates Row_i, and non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the offset array;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
The computing cores are mainly responsible for fine-grained parallel computing tasks. A computing core can access main memory directly (discretely) or in batches via DMA, and can communicate efficiently within the many-core array via register communication.
The above-mentioned aspects of the invention are further explained as follows:
sparse matrix: english is sparse matrix, and in the matrix, if the number of elements with the numerical value of zero is far more than the number of non-zero elements and the distribution of the non-zero elements is not regular, the matrix is called sparse matrix.
Many cores: english is many core, and the processor integrates a plurality of computing cores and is oriented to the field of high-performance computing.
For the architecture of the Shenwei many-core processor, a sparse matrix storage format named CSGC ('Sparse Group Column compression') is proposed. First, the sparse matrix is grouped by rows over the core-group array of the Shenwei many-core processor: each slave core is assigned several rows as one group, so the 64 slave cores hold 64 groups. Second, the non-zero elements of the sparse matrix within each group are compressed and stored by columns: the column coordinates are compressed by storing the number of non-zero elements contained in each column, which requires storing the column offsets (Col_p), the row coordinates (Row_i), and the non-zero values (Value).
The sparse matrix shown in FIG. 2 is distributed by rows over the 64 slave cores; slave cores No. 0 to No. 3 each receive three rows of the sparse matrix, and the other slave cores are handled similarly.
for slave core No. 0, CSGC:
Col_p= [0,1,2,4,4,5,6,6,7]
Row_i = [0,1,2,0,1,2,0]
Value= [3,4,6,5,-1,2,7]
for slave core No. 1, CSGC:
Col_p= [0,2,2,3,4,5,6,7,7]
Row_i = [0,1,2,0,1,2,0]
Value= [-2,4,5,6,7,8,15]
for slave core No. 2, CSGC:
Col_p= [0,0,1,1,2,3,4,5,6]
Row_i = [1,0,1,2,0,1]
Value= [7,-3,12,9,9,10]
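The arrays above can be sanity-checked by expanding them back into (local row, column, value) triples. A minimal sketch (the decoder function is illustrative, not part of the patent), applied to slave core No. 0:

```python
def csgc_expand(col_p, row_i, value):
    """Expand CSGC arrays into (local_row, col, value) triples.
    Column j holds entries value[col_p[j]:col_p[j+1]]; the last entry
    of col_p is the total non-zero count."""
    triples = []
    for j in range(len(col_p) - 1):
        for k in range(col_p[j], col_p[j + 1]):
            triples.append((row_i[k], j, value[k]))
    return triples

# Slave core No. 0 from the example above:
print(csgc_expand([0, 1, 2, 4, 4, 5, 6, 6, 7],
                  [0, 1, 2, 0, 1, 2, 0],
                  [3, 4, 6, 5, -1, 2, 7]))
# [(0, 0, 3), (1, 1, 4), (2, 2, 6), (0, 2, 5), (1, 4, -1), (2, 5, 2), (0, 7, 7)]
```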
Storing the sparse matrix with the CSGC method and performing sparse matrix-vector multiplication on the Shenwei many-core processor avoids both write conflicts and scattered main-memory access. Sparse matrix-vector multiplication (SpMV for short) is taken as the example to introduce the many-core optimization algorithm. SpMV is a very important kernel in numerical computation; it computes b = A·x, where A is the sparse matrix, x is the known vector, and b is the result vector. A simple example for a sparse matrix of size n × n is given below.
First, following the CSGC storage method, the sparse matrix is grouped by rows over the core-group array, and the result vector b is grouped by rows correspondingly; all computations involving a given element of b execute sequentially within one slave core, so there is no write conflict.
Second, the sparse matrix non-zero elements needed by each computing core are stored compactly and contiguously under the CSGC method, so they can be fetched in batches via DMA.
Finally, if the LDM (local data memory) cannot hold the entire x vector during the computation, x is fetched in multiple batches via DMA. The scattered-access problem of the sparse matrix is thereby resolved.
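The per-core computation described in these three paragraphs can be sketched as follows (a serial, illustrative stand-in for one slave core; names are assumptions, and DMA/LDM staging is omitted):

```python
# Per-core SpMV sketch under CSGC (illustrative only): each core
# computes its own slice of b = A*x from its group's column-compressed
# arrays, so writes to b never conflict across cores.

def spmv_group(col_p, row_i, value, x, n_local_rows):
    """Multiply one CSGC group by the vector x, producing that core's
    local slice of the result vector b."""
    b_local = [0.0] * n_local_rows
    for j in range(len(col_p) - 1):              # walk the columns
        for k in range(col_p[j], col_p[j + 1]):  # non-zeros of column j
            b_local[row_i[k]] += value[k] * x[j]
    return b_local

# Slave core No. 0 from the example above, with x = (1, 1, ..., 1):
print(spmv_group([0, 1, 2, 4, 4, 5, 6, 6, 7],
                 [0, 1, 2, 0, 1, 2, 0],
                 [3, 4, 6, 5, -1, 2, 7],
                 [1] * 8, 3))
# [15.0, 3.0, 8.0]
```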
The data structure design of the whole application involves sparse matrix operations such as sparse coefficient matrix construction, sparse linear system solving, and sparse matrix-vector multiplication; adopting the CSGC (in-group column compression) format improves sparse matrix computing efficiency and reduces matrix format conversion overhead.
With the sparse matrix storage method for the Shenwei many-core architecture, sparse matrix computing efficiency is improved and the cost of format conversion across the multiple matrix operations in application software is reduced; the format adapts well to the Shenwei many-core architecture for any sparse matrix type (asymmetric, non-concentrated, irregular).
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.
Claims (2)
1. A sparse matrix storage method for the Shenwei many-core architecture, characterized in that: the many-core processor is composed of 4 heterogeneous groups; each heterogeneous group comprises a control core, a slave-core cluster of 64 computing cores, a heterogeneous-group interface, and a memory controller; the whole many-core processor has 260 computing cores;
the sparse matrix storage format comprises the steps of:
S1, group the sparse matrix by rows over the core-group array of the Shenwei many-core processor: let the sparse matrix have N rows and the core-group array have 64 computing cores; each computing core is assigned N/64 rows, distributed sequentially;
S2, compress and store the non-zero elements of the sparse matrix within each group by columns, compressing the column coordinates by storing the number of non-zero elements contained in each column, and storing the column offsets Col_p, row coordinates Row_i, and non-zero values Value;
S21, record each non-zero element of the matrix in column order:
S211, record the column offset Col_p of the first non-zero element of each column, and append the total number of non-zero elements of the matrix at the end of the offset array;
S212, record the row coordinate Row_i of each non-zero element;
S213, record the value Value of each non-zero element.
2. The sparse matrix storage method for the Shenwei many-core architecture according to claim 1, characterized in that: the computing cores are mainly responsible for fine-grained parallel computing tasks; a computing core can access main memory directly (discretely) or in batches via DMA, and can communicate efficiently within the many-core array via register communication.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910898286.2A CN112540718A (en) | 2019-09-23 | 2019-09-23 | Sparse matrix storage method for Shenwei many-core architecture
Publications (1)
Publication Number | Publication Date |
---|---|
CN112540718A true CN112540718A (en) | 2021-03-23 |
Family
ID=75013010
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910898286.2A Withdrawn CN112540718A (en) | 2019-09-23 | 2019-09-23 | Sparse matrix storage method for Shenwei many-core architecture
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112540718A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113505766A (en) * | 2021-09-09 | 2021-10-15 | 北京智源人工智能研究院 | Image target detection method and device, electronic equipment and storage medium |
CN114385972A (en) * | 2021-12-20 | 2022-04-22 | 北京科技大学 | Parallel computing method for directly solving structured triangular sparse linear equation set |
WO2023097970A1 (en) * | 2021-12-01 | 2023-06-08 | 北京微电子技术研究所 | Many-core definable distributed shared storage structure |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150067009A1 (en) * | 2013-08-30 | 2015-03-05 | Microsoft Corporation | Sparse matrix data structure |
CN104636273A (en) * | 2015-02-28 | 2015-05-20 | 中国科学技术大学 | Storage method of sparse matrix on SIMD multi-core processor with multi-level cache |
CN106775594A (en) * | 2017-01-13 | 2017-05-31 | 中国科学院软件研究所 | A kind of Sparse Matrix-Vector based on the domestic processor of Shen prestige 26010 multiplies isomery many-core implementation method |
Worldwide Applications (1)
- 2019-09-23: CN application CN201910898286.2A, patent CN112540718A (en), not active — withdrawn
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20210323 |