CN112560356A - Sparse matrix vector multiply many-core optimization method for many-core architecture - Google Patents


Info

Publication number
CN112560356A
Authority
CN
China
Prior art keywords
vector, block, sparse matrix, small, many
Prior art date
Legal status
Withdrawn
Application number
CN201910919675.9A
Other languages
Chinese (zh)
Inventor
郭恒
陈鑫
刘鑫
陈德训
李芳
徐金秀
孙唯哲
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910919675.9A
Publication of CN112560356A
Legal status: Withdrawn

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method. Given a sparse matrix A with m rows and n columns and a vector x of length n, the method solves for the vector y of length m, where y = Ax is the product of the sparse matrix A and the vector x. It comprises the following steps: S1, define the x vector block size blk_x_size and partition the x vector by blocking its elements according to their subscripts; S2, using the x vector blocking information from S1 (i.e., the number of the x vector block in which each x vector element falls), determine, for every non-zero element of each row of the original sparse matrix A, the x vector block number corresponding to its column index, thereby collecting for each row of the sparse matrix the numbers of the x vector blocks it needs during sparse matrix-vector multiplication. The invention improves overall many-core acceleration performance, improves the locality of data access, and yields a marked optimization effect for unstructured-grid CFD applications.

Description

Sparse matrix vector multiply many-core optimization method for many-core architecture
Technical Field
The invention belongs to the technical field of sparse matrix-vector multiplication, and in particular relates to a sparse matrix-vector multiplication many-core optimization method for many-core architectures.
Background
As research on unstructured-grid CFD applications deepens and supercomputer technology develops rapidly, many-core acceleration of sparse matrix-vector multiplication has become one of the focal points of CFD application optimization research.
Because the non-zero elements of the sparse matrices generated by unstructured-grid CFD applications are loosely distributed and their indices span wide ranges, sparse matrix-vector multiplication (SpMV), one of the program's computational kernels, suffers from a pronounced discrete memory access problem, which also makes many-core optimization of unstructured-grid CFD applications difficult.
As the mesh scale grows, the discrete memory access problem in sparse matrix-vector multiplication becomes still more pronounced, causing a large performance loss for CFD applications and sometimes even becoming the main bottleneck of the entire application. To solve the many-core optimization problem for sparse matrix-vector multiplication, the properties of the sparse matrices arising in unstructured-grid CFD applications must be fully exploited and a more targeted, more detailed many-core optimization algorithm devised, so as to improve overall program performance.
For the many-core optimization of sparse matrix-vector multiplication, the traditional approach is to partition the original sparse matrix into row-column blocks of a suitable size, with each acceleration core responsible for computing part of the block data. This method can effectively reduce discrete memory access in sparse matrix-vector multiplication, but the sparse matrices generated in unstructured-grid CFD applications are often extremely sparse (each row typically holds only a few non-zero elements), and the distribution of their non-zero elements follows certain statistical regularities. Because the matrix is extremely sparse, the traditional fixed-width row-column blocking method suffers a severe load-balancing problem and cannot effectively use the data transmission bandwidth between the CPU and the many-core coprocessor, so its SpMV many-core optimization effect for CFD applications is poor.
Disclosure of Invention
The invention aims to provide a many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method that greatly improves the locality of data access, effectively uses the data transmission bandwidth between the CPU and the many-core coprocessor, and yields a marked optimization effect for unstructured-grid CFD applications.
To achieve this aim, the invention adopts the following technical scheme: a many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method comprising the following steps:
Given: a sparse matrix A with m rows and n columns and a vector x of length n; solve for the vector y of length m, where y = Ax is the product of the sparse matrix A and the vector x;
S1, define the x vector block size blk_x_size and partition the x vector by blocking its elements according to their subscripts;
S2, according to the x vector blocking information from S1 (i.e., the number of the x vector block in which each x vector element falls), determine, for every non-zero element of each row of the original sparse matrix A, the x vector block number corresponding to its column index, thereby collecting for each row of the sparse matrix the numbers of the x vector blocks it needs during sparse matrix-vector multiplication;
S3, merge all matrix rows that need exactly the same set of x vector blocks during sparse matrix-vector multiplication, forming the small_blocks;
S4, using value as the performance evaluation function for SpMV (defined in S41 below), merge some of the small_blocks into big_blocks under the constraint that the number of x vector blocks a merged block depends on does not exceed a threshold;
S5, build a sparse matrix A' from the big_block information, where each big_block records a number of rows of the sparse matrix A together with the x vector block information those rows need during sparse matrix-vector multiplication; map the rows of A to A' in big_block array order and write them into A' in that order;
S6, perform SpMV many-core acceleration on the matrix information produced by the blocking of preprocessing stages S1-S5: in the sparse matrix A' the non-zero elements are arranged in big_block array order; in the SpMV many-core acceleration algorithm the unit of computation is the big_block; within one core, the non-zero element data of a big_block is loaded in, the x vector block data the big_block needs is loaded in, and the computed results are then written back to main memory;
a big_block is the basic data unit processed by each compute core of the many-core processor.
Further refinements of the above technical scheme are as follows:
1. In the above scheme, in step S1 the x vector block size blk_x_size is 256 or 512; the element at subscript ind_x of the x vector falls into x block number ind_x / blk_x_size, rounded down (i.e., ⌊ind_x / blk_x_size⌋).
2. In the foregoing solution, the statistics in step S2 are computed as follows: if a non-zero element of the sparse matrix A lies in the column with column number c, then the number of the x vector block that non-zero element needs during sparse matrix-vector multiplication is ⌊c / blk_x_size⌋. The x vector block numbers corresponding to the non-zero elements of each row of matrix A can therefore be computed; within one row of A, if several non-zero elements yield the same x vector block number, that number is recorded only once in the list of x vector blocks the row requires.
3. In the foregoing scheme, in step S3 the small_blocks are built from the per-row x vector block information computed in S2: a small_block records the row numbers of several rows of matrix A, and the rows of A recorded in one small_block all require exactly the same set of x vector blocks during sparse matrix-vector multiplication.
4. In the foregoing solution, the operation in step S4 is specifically:
S41, for a small_block, suppose it contains k rows of the sparse matrix A, those k rows of A contain sum non-zero elements in total, and the rows in the small_block need n_blk_x x vector blocks during sparse matrix-vector multiplication; for this small_block, define the performance evaluation function value = sum / (n_blk_x × blk_x_size), and set the threshold on the number of x vector blocks to 10, i.e., if n_blk_x > 10 then value is 0;
S42, traverse the original small_blocks; on reaching the i-th small_block, compute the value obtained by merging it with each of the remaining small_blocks that have not yet taken part in a merge, select the j-th small_block that maximizes this value, and merge the i-th and j-th small_blocks into a new small_block;
S43, one traversal of the small_block array forms a new small_block array; traverse again on the basis of the new array, repeating 3-10 times; the small_block array finally formed is renamed the big block array, i.e., the big_block array.
5. In the above scheme, merging two small_blocks means: the rows in the two small_blocks are merged into a new small_block, and the x vector block information the rows of the new small_block need during sparse matrix-vector multiplication is recomputed.
6. In the above scheme, a big_block contains multiple rows of non-zero elements of the original sparse matrix, and the number of x vector blocks it needs during SpMV computation is bounded by the threshold, at most 10.
7. In the above scheme, the threshold is set to 8, 10 or 12.
Owing to the above technical scheme, the invention offers the following advantages over the prior art:
The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method fully exploits the structural characteristics of the sparse matrices in unstructured-grid CFD applications; through irregular blocking it makes full use of the data transmission bandwidth between the CPU and the many-core coprocessor, improves overall many-core acceleration performance, improves the locality of data access, and yields a marked optimization effect for unstructured-grid CFD applications. The method applies well to unstructured-grid CFD applications and holds good promise for other scientific applications whose unstructured meshes generate matrices with similar properties.
Drawings
FIG. 1 is a schematic diagram of the irregular partitioning of the sparse matrix according to the present invention.
Detailed Description
The invention is further described below with reference to the following examples:
example (b): a sparse matrix vector multiply many-core optimization method for many-core architecture comprises the following steps:
s1, known: a sparse matrix A with m rows and n columns and a vector x with the length of n; solving a vector y with the length of m, wherein the y is the dot product of the sparse matrix A and the vector x;
s1, defining the size blk _ x _ size of the x vector block, and blocking the x vector elements according to the subscripts of the x vector elements to block the x vector;
s2, counting the number of an original sparse matrix, namely the number of an x vector block corresponding to the column number of each row of non-zero elements in the sparse matrix A according to the blocking information of the x vector, namely the number information of the x vector block where the x vector element obtained by solving in S1 is located, and thus counting the number information of the x vector block needed by each row of the sparse matrix when the sparse matrix is multiplied by the vector;
s3, combining all matrix rows with the same number of x vector blocks required in sparse matrix vector multiplication to form all small _ blocks;
s4, taking block as a performance evaluation function when Spmv, and combining part small _ block to form big _ block on the premise that the number of the dependent x vector blocks does not exceed a threshold value;
s5, establishing a sparse matrix A ' according to big _ block information, wherein the sparse matrix A comprises a plurality of lines in the sparse matrix A and x vector block information required in sparse matrix vector multiplication calculation, mapping the lines in the A to the A ' according to big _ block array sequence, and writing the lines in the A to the A ' according to big _ block array sequence;
s6, carrying out Spmv many-core acceleration according to matrix information after the block division of the preprocessing stages S1-S5, namely in a sparse matrix A', non-zero elements are arranged according to a big _ block array sequence, in a Spmv many-core acceleration algorithm, a computing unit is big _ block, in one core, non-zero element data in big _ block is imported, x vector block data required in big _ block is imported, and then a computing result is written back to a main memory;
the big _ block is a basic data unit processed by each computing core in the many cores.
Further refinements of the above technical scheme are as follows:
In step S1 above, the x vector block size blk_x_size is 256 or 512; the element at subscript ind_x of the x vector falls into x block number ind_x / blk_x_size, rounded down (i.e., ⌊ind_x / blk_x_size⌋).
The statistics in step S2 are computed as follows: if a non-zero element of the sparse matrix A lies in the column with column number c, then the number of the x vector block that non-zero element needs during sparse matrix-vector multiplication is ⌊c / blk_x_size⌋. The x vector block numbers corresponding to the non-zero elements of each row of matrix A can therefore be computed; within one row of A, if several non-zero elements yield the same x vector block number, that number is recorded only once in the list of x vector blocks the row requires.
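A minimal sketch of these S2 statistics, assuming A is stored in CSR form; the function name and container choices are illustrative assumptions:

```cpp
// S2: for each row of A, collect the set of x vector block numbers its
// non-zero columns fall into, recording each block number only once per row.
#include <set>
#include <vector>

std::vector<std::vector<int>> required_x_blocks(
        const std::vector<int>& row_ptr,  // CSR row pointers of A
        const std::vector<int>& col,      // column index of each non-zero
        int blk_x_size) {
    int m = static_cast<int>(row_ptr.size()) - 1;
    std::vector<std::vector<int>> req(m);
    for (int r = 0; r < m; ++r) {
        std::set<int> blocks;  // deduplicates block numbers within the row
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            blocks.insert(col[k] / blk_x_size);  // = floor(c / blk_x_size)
        req[r].assign(blocks.begin(), blocks.end());  // sorted, unique
    }
    return req;
}
```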
In step S3 the small_blocks are built from the per-row x vector block information computed in S2: a small_block records the row numbers of several rows of matrix A, and the rows of A recorded in one small_block all require exactly the same set of x vector blocks during sparse matrix-vector multiplication.
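The S3 grouping can then be sketched as a single pass over the per-row lists from the S2 sketch above; because those lists are sorted and duplicate-free, they can key a map directly. The struct and function names are illustrative:

```cpp
// S3: rows whose required x-block lists are identical form one small_block.
#include <map>
#include <vector>

struct SmallBlock {
    std::vector<int> rows;      // row numbers of A in this small_block
    std::vector<int> x_blocks;  // the common set of required x vector blocks
};

std::vector<SmallBlock> build_small_blocks(
        const std::vector<std::vector<int>>& req) {  // from required_x_blocks()
    std::map<std::vector<int>, std::vector<int>> groups;  // block list -> rows
    for (int r = 0; r < static_cast<int>(req.size()); ++r)
        groups[req[r]].push_back(r);
    std::vector<SmallBlock> out;
    for (const auto& [blocks, rows] : groups)
        out.push_back({rows, blocks});
    return out;
}
```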
The operation in step S4 is specifically:
S41, for a small_block, suppose it contains k rows of the sparse matrix A, those k rows of A contain sum non-zero elements in total, and the rows in the small_block need n_blk_x x vector blocks during sparse matrix-vector multiplication; for this small_block, define the performance evaluation function value = sum / (n_blk_x × blk_x_size), and set the threshold on the number of x vector blocks to 10, i.e., if n_blk_x > 10 then value is 0;
S42, traverse the original small_blocks; on reaching the i-th small_block, compute the value obtained by merging it with each of the remaining small_blocks that have not yet taken part in a merge, select the j-th small_block that maximizes this value, and merge the i-th and j-th small_blocks into a new small_block;
S43, one traversal of the small_block array forms a new small_block array; traverse again on the basis of the new array, repeating 3-10 times; the small_block array finally formed is renamed the big block array, i.e., the big_block array.
Merging two small_blocks means: the rows in the two small_blocks are merged into a new small_block, and the x vector block information the rows of the new small_block need during sparse matrix-vector multiplication is recomputed. A sketch of this merge together with the S41-S43 greedy pass follows.
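The sketch below shows one S42 pass together with the S41 value function and the merge operation, reusing SmallBlock from the S3 sketch. The S41 threshold is hard-coded to 10 (the text also mentions 8 and 12); nnz_per_row and the other names are illustrative assumptions, and the O(n²) pairwise scan per pass is tolerable because, as noted below, the preprocessing runs only once.

```cpp
// S41-S43: greedy merging of small_blocks into big_blocks.
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// S41: value = sum / (n_blk_x * blk_x_size); 0 if more than 10 x blocks needed.
static double merged_value(const SmallBlock& a, const SmallBlock& b,
                           const std::vector<int>& nnz_per_row, int blk_x_size) {
    std::vector<int> u;  // union of the two required x-block sets
    std::set_union(a.x_blocks.begin(), a.x_blocks.end(),
                   b.x_blocks.begin(), b.x_blocks.end(), std::back_inserter(u));
    if (static_cast<int>(u.size()) > 10) return 0.0;  // S41 threshold
    long long sum = 0;  // total non-zeros of the merged rows
    for (int r : a.rows) sum += nnz_per_row[r];
    for (int r : b.rows) sum += nnz_per_row[r];
    return static_cast<double>(sum) / (static_cast<double>(u.size()) * blk_x_size);
}

// S42: each unmerged small_block is paired with the partner maximizing value.
// S43: calling this 3-10 times yields the final big_block array.
std::vector<SmallBlock> merge_pass(const std::vector<SmallBlock>& in,
                                   const std::vector<int>& nnz_per_row,
                                   int blk_x_size) {
    std::vector<SmallBlock> out;
    std::vector<bool> used(in.size(), false);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (used[i]) continue;
        used[i] = true;
        double best = 0.0;
        std::size_t best_j = i;
        for (std::size_t j = i + 1; j < in.size(); ++j) {
            if (used[j]) continue;
            double v = merged_value(in[i], in[j], nnz_per_row, blk_x_size);
            if (v > best) { best = v; best_j = j; }
        }
        SmallBlock merged = in[i];
        if (best_j != i) {  // merge rows and recompute the x-block union
            used[best_j] = true;
            merged.rows.insert(merged.rows.end(),
                               in[best_j].rows.begin(), in[best_j].rows.end());
            std::vector<int> u;
            std::set_union(in[i].x_blocks.begin(), in[i].x_blocks.end(),
                           in[best_j].x_blocks.begin(), in[best_j].x_blocks.end(),
                           std::back_inserter(u));
            merged.x_blocks = u;
        }
        out.push_back(merged);
    }
    return out;
}
```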
A big_block contains multiple rows of non-zero elements of the original sparse matrix, and the number of x vector blocks it needs during SpMV computation is bounded by the threshold, at most 10.
The threshold is set to 8, 10 or 12.
The above-mentioned aspects of the invention are further explained as follows:
at present, the solution of the large-scale sparse matrix linear equation set mainly adopts iteration methods such as PCG (pulse code generator) and the like. Spmv is a relatively time-consuming part of a single iteration of the PCG. In the PCG, the value of the sparse matrix is kept unchanged, so that the preprocessing part only needs to do once when the PCG solves a large-scale sparse matrix linear equation set, and the time can be ignored compared with the time consumed by iteration. In addition, for the mapping operation of the row elements of the original sparse matrix, the PCG is essentially to solve the equation set Ax ═ b, and simultaneously exchange the row elements at corresponding positions in a, x and b, and the equation is still true, that is, the mapping operation of the row of the sparse matrix only needs to be performed once in the preprocessing stage and the PCG iteration ending stage, and the loss caused by the mapping operation can be ignored compared with the whole iteration time.
After the optimization, the sparse matrix structure characteristic applied by the non-structural grid CFD can be fully utilized in the whole Spmv many-core acceleration process, the data transmission bandwidth between the CPU and the many-core coprocessor is fully utilized by utilizing an irregular blocking mode, and the whole many-core acceleration performance is improved. The experimental tests show that the many-core acceleration strategy provided by the invention can effectively improve the Spmv performance by 12-14 times compared with a main core within 10 ten thousand grids.
When the sparse matrix vector multiply-many-core optimization method facing the many-core architecture is adopted, the property of a sparse matrix in the application of the non-structural grid CFD is fully utilized, the data transmission bandwidth between a CPU and a many-core coprocessor is fully utilized in a sparse matrix row mapping and irregular blocking mode, and the Spmv many-core performance is improved.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
discrete memory access: the English language is discrete-time storage, and due to the data structure characteristics of the unstructured grid, stored data are discrete and irregular, so that the Cache hit rate in the calculation is low, and a CPU (central processing unit) often needs to frequently access a memory. In application programs, discrete access problems widely occur in problems such as flux calculation, large-scale linear equation set solution, sparse matrix vector multiplication and the like. This is a common phenomenon for scientific computing type applications.
Spmv: sparse Matrix-Vector Multiplication (Sparse Matrix-Vector Multiplication) is one of the common computational cores in the scientific computational problem, and the solved problem is in the form of y ═ Ax, wherein a and x are known Sparse moments and vectors respectively, y is a Vector to be solved, and the solving method is Matrix Vector Multiplication of the Sparse Matrix a and the known Vector x. Because the non-zero elements in the sparse matrix are irregularly distributed, the Spmv faces a more serious scattered memory access problem, and the point of reducing or avoiding the scattered memory access is the optimization of the Spmv.
Ax ═ b: a large scale sparse matrix linear system of equations. Knowing the sparse matrix a and the vector b, the x vector satisfying the condition is solved.
PCG: conjugate gradient method with preconditioner. The method is a common effective method for solving a large-scale sparse matrix equation set (Ax ═ b). As an iterative method, each step of iteration of the PCG has at least one Spmv operation, and the Spmv operation occupies a larger time proportion in a single iteration and is one of the key points of the optimization of the PCG.
CFD: and calculating fluid mechanics. The fluid flow problem is solved numerically.
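For reference, a textbook CSR implementation of SpMV (a generic sketch, not the patent's optimized kernel): the indirect read x[col[k]] is precisely the discrete memory access that the blocking scheme above is designed to tame.

```cpp
// Baseline CSR SpMV: y = A x.
#include <cstddef>
#include <vector>

void spmv_csr(const std::vector<int>& row_ptr, const std::vector<int>& col,
              const std::vector<double>& val, const std::vector<double>& x,
              std::vector<double>& y) {
    for (std::size_t r = 0; r + 1 < row_ptr.size(); ++r) {
        double acc = 0.0;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            acc += val[k] * x[col[k]];  // irregular, cache-unfriendly read
        y[r] = acc;
    }
}
```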
The above embodiments are merely illustrative of the technical concepts and features of the present invention, and the purpose of the embodiments is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (8)

1. A many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method, characterized in that: a sparse matrix A with m rows and n columns and a vector x of length n are known; a vector y of length m is solved for, where y = Ax is the product of the sparse matrix A and the vector x;
the sparse matrix-vector multiplication many-core optimization method comprises the following steps:
S1, define the x vector block size blk_x_size and block the x vector elements according to their subscripts;
S2, according to the x vector blocking information from S1 (i.e., the number of the x vector block in which each x vector element falls), determine, for every non-zero element of each row of the original sparse matrix A, the x vector block number corresponding to its column index, thereby collecting for each row of the sparse matrix the numbers of the x vector blocks it needs during sparse matrix-vector multiplication;
S3, merge all matrix rows that need exactly the same set of x vector blocks during sparse matrix-vector multiplication, forming the small_blocks;
S4, using value as the performance evaluation function for SpMV, merge some of the small_blocks into big_blocks under the constraint that the number of x vector blocks a merged block depends on does not exceed a threshold;
S5, build a sparse matrix A' from the big_block information, where each big_block records a number of rows of the sparse matrix A together with the x vector block information those rows need during sparse matrix-vector multiplication; map the rows of A to A' in big_block array order and write them into A' in that order;
S6, perform SpMV many-core acceleration on the matrix information produced by the blocking of preprocessing stages S1-S5: in the sparse matrix A' the non-zero elements are arranged in big_block array order; in the SpMV many-core acceleration algorithm the unit of computation is the big_block; within one core, the non-zero element data of a big_block is loaded in, the x vector block data the big_block needs is loaded in, and the computed results are then written back to main memory;
a big_block is the basic data unit processed by each compute core of the many-core processor.
2. The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method of claim 1, wherein: in step S1, the x vector block size blk_x_size is 256 or 512; the element at subscript ind_x of the x vector falls into x block number ind_x / blk_x_size, rounded down (i.e., ⌊ind_x / blk_x_size⌋).
3. The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method of claim 1, wherein: the statistics in step S2 are computed as follows: if a non-zero element of the sparse matrix A lies in the column with column number c, then the number of the x vector block that non-zero element needs during sparse matrix-vector multiplication is ⌊c / blk_x_size⌋; the x vector block numbers corresponding to the non-zero elements of each row of matrix A can therefore be computed, and within one row of A, if several non-zero elements yield the same x vector block number, that number is recorded only once in the list of x vector blocks the row requires.
4. The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method of claim 1, wherein: in step S3 the small_blocks are built from the per-row x vector block information computed in S2: a small_block records the row numbers of several rows of matrix A, and the rows of A recorded in one small_block all require exactly the same set of x vector blocks during sparse matrix-vector multiplication.
5. The many-core architecture-oriented sparse matrix vector multiply many-core optimization method of claim 1, wherein: the specific operation in step S4 is:
S41, for a small_block, suppose it contains k rows of the sparse matrix A, those k rows of A contain sum non-zero elements in total, and the rows in the small_block need n_blk_x x vector blocks during sparse matrix-vector multiplication; for this small_block, define the performance evaluation function value = sum / (n_blk_x × blk_x_size), and set the threshold on the number of x vector blocks to 10, i.e., if n_blk_x > 10 then value is 0;
S42, traverse the original small_blocks; on reaching the i-th small_block, compute the value obtained by merging it with each of the remaining small_blocks that have not yet taken part in a merge, select the j-th small_block that maximizes this value, and merge the i-th and j-th small_blocks into a new small_block;
S43, one traversal of the small_block array forms a new small_block array; traverse again on the basis of the new array, repeating 3-10 times; the small_block array finally formed is renamed the big block array, i.e., the big_block array.
6. The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method of claim 5, wherein: merging two small_blocks means: the rows in the two small_blocks are merged into a new small_block, and the x vector block information the rows of the new small_block need during sparse matrix-vector multiplication is recomputed.
7. The many-core-architecture-oriented sparse matrix-vector multiplication many-core optimization method of claim 1, wherein: a big_block contains multiple rows of non-zero elements of the original sparse matrix, and the number of x vector blocks it needs during SpMV computation is bounded by the threshold, at most 10.
8. The many-core architecture-oriented sparse matrix vector multiply many-core optimization method of claim 1, wherein: the threshold is set to 8, 10 or 12.
CN201910919675.9A 2019-09-26 2019-09-26 Sparse matrix vector multiply many-core optimization method for many-core architecture Withdrawn CN112560356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910919675.9A CN112560356A (en) 2019-09-26 2019-09-26 Sparse matrix vector multiply many-core optimization method for many-core architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910919675.9A CN112560356A (en) 2019-09-26 2019-09-26 Sparse matrix vector multiply many-core optimization method for many-core architecture

Publications (1)

Publication Number Publication Date
CN112560356A (en) 2021-03-26

Family

ID=75030138

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910919675.9A Withdrawn CN112560356A (en) 2019-09-26 2019-09-26 Sparse matrix vector multiply many-core optimization method for many-core architecture

Country Status (1)

Country Link
CN (1) CN112560356A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925320B (en) * 2021-09-28 2023-10-20 华为技术有限公司 Data processing method and related device
CN114329315A (en) * 2022-01-04 2022-04-12 中国空气动力研究与发展中心计算空气动力研究所 Static aeroelastic rapid solving method based on dynamic modal decomposition technology
CN114329315B (en) * 2022-01-04 2023-03-31 中国空气动力研究与发展中心计算空气动力研究所 Static aeroelastic rapid solving method based on dynamic modal decomposition technology
WO2024007652A1 (en) * 2022-07-06 2024-01-11 芯和半导体科技(上海)股份有限公司 Accelerated solving method for large sparse matrix, system, and storage medium

Similar Documents

Publication Publication Date Title
CN112560356A (en) Sparse matrix vector multiply many-core optimization method for many-core architecture
CN104915322B (en) A kind of hardware-accelerated method of convolutional neural networks
CN103336758B (en) The sparse matrix storage means of a kind of employing with the sparse row of compression of local information and the SpMV implementation method based on the method
CN108805272A (en) A kind of general convolutional neural networks accelerator based on FPGA
CN102110079B (en) Tuning calculation method of distributed conjugate gradient method based on MPI
Chen et al. A two-layered parallel static security assessment for large-scale grids based on GPU
CN107451097B (en) High-performance implementation method of multi-dimensional FFT on domestic Shenwei 26010 multi-core processor
CN116384312B (en) Circuit yield analysis method based on parallel heterogeneous computation
Yang et al. A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern
CN102841881A (en) Multiple integral computing method based on many-core processor
CN106021188A (en) Parallel hardware architecture and parallel computing method for floating point matrix inversion
CN112434451A (en) Finite element analysis method based on block parallel computation
CN107256203A (en) The implementation method and device of a kind of matrix-vector multiplication
Li et al. An experimental study on deep learning based on different hardware configurations
CN116303219A (en) Grid file acquisition method and device and electronic equipment
CN106547722A (en) A kind of big data parallel optimization method
Razumchik et al. Some Ergodicity And Truncation Bounds For A Small Scale Markovian Supercomputer Model.
CN104657108B (en) A kind of management method and system of the event queue of the software simulator of microprocessor
Li et al. Memory saving method for enhanced convolution of deep neural network
CN106919536A (en) A kind of accelerated method and its accelerator for being applied to triangular matrix and matrix multiplication
CN114969857A (en) Structural design optimization method, system, computer equipment and storage medium
CN112446004A (en) Unstructured grid DILU preconditioned child-many-core parallel optimization algorithm
CN109992860A (en) Electro-magnetic transient parallel simulation method and system based on GPU
Bleile et al. Thin-Threads: An Approach for History-Based Monte Carlo on GPUs
CN110021059B (en) High-efficiency Marking Cubes isosurface extraction method and system without redundant computation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210326