CN118193914A - LU decomposition method, device, equipment and storage medium for distributed platform
- Publication number: CN118193914A
- Application number: CN202410185225.2A
- Authority: CN
- Country: China
- Legal status: Pending (assumed; no legal analysis has been performed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
Abstract
The invention provides an LU decomposition method, device, equipment and storage medium for a distributed platform, comprising the following steps: acquiring an original sparse matrix, cutting it, and storing it on a distributed platform in a two-stage sparse storage format; and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition. According to the embodiment of the invention, on the one hand, the two-stage sparse storage format preserves the sparse structure of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data-dependency problem of LU decomposition. By coordinating the data transmission and computation of each process, large-scale cluster computing power can be invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a distributed platform-oriented LU decomposition method, apparatus, device, and storage medium.
Background
The purpose of sparse LU decomposition is to compute the solution vector x of the linear system Ax=b, where A is a sparse matrix and b is a dense vector. Two families of methods exist for solving Ax=b: direct methods, which obtain an exact solution in a finite number of steps, and iterative methods, which approach the solution through successive refinement. Because the LU decomposition direct method yields a more accurate solution, it is widely used in scientific computing, simulation, and many other fields.
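For orientation only (this is not the claimed method), the factorize-then-substitute pattern of a direct solve can be seen in a single-node call to SciPy's existing sparse LU wrapper:

```python
# Illustrative sketch only: a shared-memory direct solve of Ax = b via sparse LU,
# using SciPy's existing splu() wrapper. The invention distributes this work;
# here the call just shows the role the L and U factors play in a direct solve.
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)     # factorize once: A = Pr^-1 L U Pc^-1
x = lu.solve(b)  # each right-hand side then needs only two triangular solves
print(np.allclose(A @ x, b))  # True
```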
Because of its data dependencies, LU decomposition is typically deployed on shared-memory machines for the main computation. However, the tight dependency coupling inside LU decomposition means that a shared-memory deployment prevents the algorithm from fully exploiting large-scale cluster computing power.
Disclosure of Invention
The invention provides an LU decomposition method, device, equipment and storage medium for a distributed platform. It addresses the defect in the prior art that, owing to the tight dependency coupling in LU decomposition computed on shared-memory machines, the algorithm cannot fully exploit large-scale cluster computing power. By coordinating the data transmission and computation of each process, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
The invention provides an LU decomposition method for a distributed platform, which comprises the following steps:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
According to the LU decomposition method for the distributed platform provided by the present invention, the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform to perform sparse LU decomposition, including:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
According to the LU decomposition method for the distributed platform provided by the present invention, storing the allocated sub-matrix blocks in the two-stage sparse storage format includes:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
According to the LU decomposition method for the distributed platform provided by the present invention, storing the allocated non-zero submatrix blocks for computation in a first-layer sparse structure includes:
And storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array so as to store the position of the non-zero submatrix block.
According to the LU decomposition method for a distributed platform provided by the present invention, the storing the non-zero elements in the non-zero submatrix block in the second-layer sparse structure includes:
and storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing each non-zero element pointer in each column by adopting a sixth auxiliary array so as to store the position of the non-zero element in the non-zero sub-matrix block.
According to the LU decomposition method for the distributed platform provided by the present invention, mapping the sub-matrix blocks belonging to each process to the corresponding process for sparse LU decomposition includes:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
According to the LU decomposition method for the distributed platform provided by the present invention, the cutting of the original sparse matrix includes:
cutting the original sparse matrix according to a preset block size.
The invention also provides an LU decomposition device facing the distributed platform, which comprises:
The storage module is used for acquiring an original sparse matrix, cutting the original sparse matrix and storing the original sparse matrix on the distributed platform by adopting a two-stage sparse storage format;
And the mapping module is used for mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to carry out sparse LU decomposition.
The invention also provides an electronic device, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the LU decomposition method for the distributed platform.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the LU decomposition method for a distributed platform as set forth in any one of the above.
According to the LU decomposition method, device, equipment and storage medium for the distributed platform, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, the two-stage sparse storage format preserves the sparse structure of the original sparse matrix; on the other hand, it extends to the distributed platform and resolves the data-dependency problem of LU decomposition. By coordinating the data transmission and computation of each process, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
Drawings
In order to illustrate the invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the invention; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of an LU decomposition method for a distributed platform according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the two-stage sparse storage format provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data mapping method according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the dependency relationships of LU decomposition provided by an embodiment of the invention;
Fig. 5 is a schematic diagram comparing the scalability of the two-stage sparse storage format and the data mapping scheme for a distributed platform according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an LU decomposition device for a distributed platform according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
It is further noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
The term "at least one" in the present invention means one or more, and "a plurality" means two or more. The terms "first," "second," "third," "fourth," and the like in this disclosure, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
Fig. 1 is a flow chart of an LU decomposition method for a distributed platform according to an embodiment of the present invention. Referring to fig. 1, an embodiment of the present invention provides an LU decomposition method for a distributed platform, which specifically includes the following steps:
Step 101, an original sparse matrix is obtained, and the original sparse matrix is cut and stored on a distributed platform in a two-stage sparse storage format.
It should be noted that the execution body of the LU decomposition method for a distributed platform provided by the embodiment of the present invention may be an electronic device, a component in the electronic device, an integrated circuit, or a chip. The electronic device may be mobile or non-mobile. Illustratively, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a wearable device, an Ultra-Mobile Personal Computer (UMPC), a netbook, or a Personal Digital Assistant (PDA); the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a television (TV), a teller machine, a self-service machine, and the like. The embodiments of the present invention place no particular limitation on this.
The following description of the embodiment of the present invention takes a server as the execution body. In the embodiment of the invention, the original sparse matrix can first be acquired, cut, and stored on the distributed platform in a two-stage sparse storage format; the sub-matrix blocks of the LU decomposition are then mapped to different processes in the distributed platform, so that the data transmission and computation of each process can be coordinated to invoke large-scale cluster computing power to complete the sparse LU algorithm.
A sparse matrix is a matrix in which most elements are 0. In practical applications, data such as images, social networks, and texts can be represented as sparse matrices.
Specifically, the original sparse matrix may be divided into a plurality of sub-matrix blocks according to a division rule, where each sub-matrix block may include partial data of the sparse matrix, and the sub-matrix blocks are stored on the distributed platform in a two-stage sparse storage format.
According to the embodiment of the invention, storage in the two-stage sparse storage format maintains the sparsity among the sparse sub-matrix blocks while also maintaining the sparsity within each sub-matrix block, thereby optimizing distributed storage and distributed computing performance.
Step 102, mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
Specifically, the sub-matrix blocks obtained by cutting can be distributed to different processes, so that each process has a data block to be processed. Each process can perform sparse LU decomposition calculation on the sub-matrix blocks allocated to itself in parallel. After each process completes the calculation, communication and integration can be performed, so that the LU decomposition of the whole sparse matrix can obtain a correct result.
In an alternative scheme, tasks with large task quantity can be migrated to a process with less load to balance the overall load, so that the expandability of the distributed sparse LU decomposition algorithm is improved.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, the two-stage sparse storage format preserves the sparse structure of the original sparse matrix; on the other hand, it extends to the distributed platform and resolves the data-dependency problem of LU decomposition. By coordinating the data transmission and computation of each process, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
In an alternative embodiment, the mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition includes: mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition; each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
The CSC (Compressed Sparse Column) format is a compressed representation of a sparse matrix. In the embodiment of the invention, the original sparse matrix can be cut and then distributed to different processors, with the sub-matrix block as the minimum unit of operation; each small block matrix is stored in CSC format. The non-zero elements of a sub-matrix block are stored column by column, and three arrays represent the data in CSC format.
In distributed computing, a two-dimensional process network may be used to assign computing tasks to different processing units and to communicate and cooperate between the processing units. A two-dimensional process network may be used to represent a logically two-dimensional grid structure in which each grid node may represent a processing unit (e.g., a process or thread) and may communicate with its neighboring nodes.
In the embodiment of the invention, the sub-matrix blocks belonging to each process can be distributed to that process through a two-dimensional process network for sparse LU decomposition, and each process stores its allocated sub-matrix blocks in the two-stage sparse storage format. At the block level, the blocks themselves are compressed in CSC format.
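For illustration only, the sketch below shows one way such an assignment can be computed. The patent does not fix the rule that sends block (I, J) to a grid position; the 2D block-cyclic convention used here is an assumption standing in for whatever rule an implementation chooses:

```python
# A minimal sketch of mapping non-zero sub-matrix blocks onto a 2D process grid.
# Assumption: the patent does not state the exact rule; the 2D block-cyclic
# assignment used here is a common convention and stands in for illustration.
def owner(I: int, J: int, Pr: int, Pc: int) -> int:
    """Rank owning block (I, J) on a Pr x Pc process grid (block-cyclic)."""
    return (I % Pr) * Pc + (J % Pc)

# Distribute the non-zero blocks of a 4 x 4 block grid over a 2 x 2 grid.
nonzero_blocks = [(0, 0), (1, 0), (1, 1), (2, 2), (3, 1), (3, 3)]  # toy pattern
assignment: dict[int, list] = {}
for I, J in nonzero_blocks:
    assignment.setdefault(owner(I, J, 2, 2), []).append((I, J))
for rank in sorted(assignment):
    print(f"P{rank} owns blocks {assignment[rank]}")
```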
According to the embodiment of the invention, the sub-matrix blocks belonging to each process are mapped to the corresponding process for sparse LU decomposition, and each process stores the allocated sub-matrix blocks in the two-stage sparse storage format. This resolves the data-dependency relationships of LU decomposition and allows the computing power of a large-scale cluster to be fully invoked, so that sparsity accelerates the computation while good scalability is maintained.
In an alternative embodiment, storing the allocated sub-matrix blocks in the two-stage sparse storage format includes: storing the allocated non-zero submatrix blocks for computation in a first-layer sparse structure; and storing the non-zero elements in the non-zero submatrix blocks in a second-layer sparse structure.
In an embodiment of the present invention, each process may store the allocated non-zero sub-matrix blocks for computation in a first-layer sparse structure, and store the non-zero elements in these non-zero sub-matrix blocks in a second-layer sparse structure. Most elements of a sparse matrix are 0 and only a few are non-zero; a sparse storage format stores only the non-zero elements, which reduces the amount of storage space used and improves the efficiency of distributed computation on the sparse matrix.
According to the embodiment of the invention, the first layer of sparse structure is adopted to store the non-zero sub-matrix blocks, and the second layer of sparse structure is adopted to store the non-zero elements in the non-zero sub-matrix blocks, so that the sparsity among the sparse sub-matrix blocks can be maintained, and the sparsity inside the sparse sub-matrix blocks can be maintained, thereby optimizing the distributed storage and the distributed computing performance.
In an alternative embodiment, the storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure includes: and storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array so as to store the position of the non-zero submatrix block.
Fig. 2 is a schematic diagram of the two-stage sparse storage format according to an embodiment of the present invention. Referring to fig. 2, the consecutively numbered sub-matrix blocks are the non-zero blocks, and empty blocks are shown as blanks. In a two-dimensional process network, the non-zero sub-matrix blocks may be allocated to four processes P0, P1, P2 and P3: the sub-matrix blocks numbered 1, 2, 6, 8, 13, 14 and 16 are allocated to process P0; those numbered 7 and 15 to process P1; those numbered 4, 9, 10 and 12 to process P2; and those numbered 3, 5 and 11 to process P3.
Each process stores only the sub-matrix blocks needed for its computation, and uses each block's position in the original sparse matrix to accelerate communication.
The first auxiliary array may be denoted blk_ColumnPointer, the second blk_RowIndex, and the third blk_Value. In the embodiment of the invention, the first-layer sparse storage structure uses these three auxiliary arrays to record the positions of the sub-matrix blocks: blk_ColumnPointer stores the prefix (running count) of non-zero sub-matrix blocks per block-column, blk_RowIndex stores the block-row index of each non-zero sub-matrix block in each block-column, and blk_Value stores a pointer to each non-zero sub-matrix block in each block-column.
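A minimal sketch of this first-layer structure follows; the three array names are taken from the patent, while the Python wrapper and iterator around them are illustrative assumptions:

```python
# Sketch of the first-layer (block-level) CSC structure. blk_ColumnPointer,
# blk_RowIndex and blk_Value are the patent's array names; the dataclass
# wrapper and the per-column iterator are illustrative assumptions.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class BlockLevelCSC:
    blk_ColumnPointer: List[int]  # prefix counts of non-zero blocks per block-column
    blk_RowIndex: List[int]       # block-row index of every non-zero block, column by column
    blk_Value: List[Any]          # handle/pointer to each non-zero block's own storage

    def blocks_in_column(self, j: int):
        """Yield (block_row, block_handle) for each non-zero block in block-column j."""
        for k in range(self.blk_ColumnPointer[j], self.blk_ColumnPointer[j + 1]):
            yield self.blk_RowIndex[k], self.blk_Value[k]

# Block-column 0 holds blocks in block-rows 0 and 2; block-column 1 holds block-row 1.
m = BlockLevelCSC([0, 2, 3], [0, 2, 1], ["blk#1", "blk#2", "blk#3"])
print(list(m.blocks_in_column(0)))  # [(0, 'blk#1'), (2, 'blk#2')]
```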
According to the embodiment of the invention, the first layer of sparse structure is adopted to store the non-zero sub-matrix blocks for calculation, so that the sparsity among the sparse sub-matrix blocks can be maintained.
In an alternative embodiment, the storing the non-zero elements in the non-zero submatrix block in a second layer sparse structure includes: and storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing each non-zero element pointer in each column by adopting a sixth auxiliary array so as to store the position of the non-zero element in the non-zero sub-matrix block.
With continued reference to FIG. 2, the fourth auxiliary array may be denoted ColumnPointer, the fifth RowIndex, and the sixth Value. In the embodiment of the invention, the second-layer sparse storage structure uses these three auxiliary arrays to record the positions of the non-zero elements within a block: ColumnPointer stores the prefix of non-zero elements per column of the non-zero sub-matrix block, RowIndex stores the row index of each non-zero element in each column, and Value stores each non-zero element in each column.
In the second-layer sparse structure, the non-zero entries within a sub-matrix block are stored in compressed CSC format; the stored result of the sub-matrix block numbered 6 is shown in fig. 2.
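A matching sketch of the second-layer structure for a single sub-matrix block is given below; ColumnPointer, RowIndex and Value are the patent's names, and the dense round-trip exists only to check the encoding:

```python
# Sketch of the second-layer CSC structure inside one non-zero sub-matrix block.
# ColumnPointer / RowIndex / Value are the patent's array names; the class and
# the to_dense() check are illustrative assumptions.
import numpy as np

class CSCBlock:
    def __init__(self, n, ColumnPointer, RowIndex, Value):
        self.n = n                          # the block is n x n
        self.ColumnPointer = ColumnPointer  # prefix counts of non-zeros per column
        self.RowIndex = RowIndex            # row index of each non-zero, column by column
        self.Value = Value                  # the non-zero values themselves

    def to_dense(self) -> np.ndarray:
        A = np.zeros((self.n, self.n))
        for j in range(self.n):
            for k in range(self.ColumnPointer[j], self.ColumnPointer[j + 1]):
                A[self.RowIndex[k], j] = self.Value[k]
        return A

# A 3 x 3 block with non-zeros at (0,0), (2,0) and (1,2).
blk = CSCBlock(3, [0, 2, 2, 3], [0, 2, 1], [4.0, -1.0, 5.0])
print(blk.to_dense())
```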
According to the embodiment of the invention, storing the non-zero elements of the non-zero sub-matrix blocks in the second-layer sparse structure maintains the sparsity within each sub-matrix block, while the first layer maintains the sparsity among the blocks.
In an alternative embodiment, the mapping the sub-matrix blocks belonging to each process to the corresponding process for sparse LU decomposition includes: mapping the sub-matrix blocks belonging to each process into the corresponding process; in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
According to the embodiment of the invention, the sub-matrix blocks are stored using the two-stage sparse storage format and mapped to their corresponding processes according to the data mapping method, so that the format extends to the distributed platform, fully mobilizes the computing power of a large-scale distributed cluster to complete the sparse LU decomposition algorithm, and thereby fully exploits sparsity to accelerate computation while keeping good scalability.
Fig. 3 is a schematic diagram of a data mapping manner according to an embodiment of the present invention. Referring to fig. 3, five time slices of the LU numerical factorization under multiple processes are shown, mapping the sub-matrix blocks belonging to each process into the corresponding process. The sub-matrix blocks numbered 1, 2, 6, 8, 13, 14 and 16 are allocated to process P0; those numbered 7 and 15 to process P1; those numbered 4, 9, 10 and 12 to process P2; and those numbered 3, 5 and 11 to process P3.
In sparse LU decomposition, the LU factors of the matrix are a lower triangular matrix L and an upper triangular matrix U. The two factor matrices are obtained by performing a series of row transformations on the original sparse matrix, and in this process certain dependencies exist between the sub-matrix blocks.
Fig. 4 is a schematic diagram of the dependency relationships of LU decomposition according to an embodiment of the present invention. Referring to fig. 4, in the embodiment of the present invention, within each process and following the time-slice order shown in fig. 3, sparse LU decomposition may be performed on the allocated sub-matrix blocks based on the dependency relationships between the sub-matrix blocks, so as to ensure the correctness of the computation.
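For concreteness, the sketch below replays this dependency order as a single-process, right-looking blocked LU on dense blocks; it is a simplified stand-in (no pivoting, no communication, dense blocks assumed) for the distributed kernel, in which each trailing block would be updated by the process that owns it:

```python
# Single-process sketch of the dependency order in blocked right-looking LU.
# Assumptions: no pivoting, dense blocks, everything local. In the distributed
# method, the panel results below the diagonal (L) and to its right (U) are
# what each owning process must receive before updating its trailing blocks.
import numpy as np

def lu_blocked(A: np.ndarray, bs: int) -> np.ndarray:
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, bs):
        e = min(k + bs, n)
        # 1) factor the diagonal block in place: A[k:e, k:e] = L_kk @ U_kk
        for j in range(k, e):
            A[j+1:e, j] /= A[j, j]
            A[j+1:e, j+1:e] -= np.outer(A[j+1:e, j], A[j, j+1:e])
        L_kk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        U_kk = np.triu(A[k:e, k:e])
        # 2) panel solves: off-diagonal blocks depend only on the diagonal block
        A[e:, k:e] = A[e:, k:e] @ np.linalg.inv(U_kk)
        A[k:e, e:] = np.linalg.inv(L_kk) @ A[k:e, e:]
        # 3) Schur-complement update: trailing blocks depend on both panels
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # unit-lower L and upper U packed in one array

A = np.array([[4., 1, 0, 0], [1, 3, 1, 0], [0, 1, 2, 1], [0, 0, 1, 3]])
LU = lu_blocked(A, bs=2)
L = np.tril(LU, -1) + np.eye(4)
U = np.triu(LU)
print(np.allclose(L @ U, A))  # True
```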
In an alternative embodiment, the cutting the original sparse matrix includes: cutting the original sparse matrix according to a preset block size.
In the embodiment of the invention, when the original sparse matrix is cut, it can be cut according to a preset, fixed block size, yielding multiple sub-matrix blocks of the same size. Specifically, setting the preset block size to 4000 lets the processors in use exert as much of their computing capacity as possible while keeping high scalability.
In practical applications, the preset block size may also be set to other values. It should be noted that, if the preset block size is too large, communication times grow and the decomposition efficiency suffers; if it is too small, the compute resources tend to be underutilized.
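A minimal sketch of the cutting step follows, assuming the input arrives as COO triplets (the patent does not fix the input format); the experiments use a block size of 4000, while a small size keeps this toy output readable:

```python
# A sketch of cutting an original sparse matrix (given here as COO triplets)
# into sub-matrix blocks of a preset block size bs. The COO input format and
# the dict-of-blocks output are illustrative assumptions.
from collections import defaultdict

def cut_into_blocks(rows, cols, vals, bs):
    """Group (row, col, val) triplets by the (bs x bs) block they fall in."""
    blocks = defaultdict(list)
    for r, c, v in zip(rows, cols, vals):
        blocks[(r // bs, c // bs)].append((r % bs, c % bs, v))
    return blocks  # keys are (block_row, block_col); only non-zero blocks appear

rows = [0, 1, 5, 6, 7]
cols = [0, 4, 5, 2, 7]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
for key, entries in sorted(cut_into_blocks(rows, cols, vals, bs=4).items()):
    print(key, entries)
```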
In the embodiment of the invention, the two-stage sparse storage format and the data mapping scheme for the distributed platform allow the sparse LU decomposition method to be extended to a distributed platform, which helps invoke large-scale cluster computing power to complete the sparse LU decomposition algorithm.
Fig. 5 is a scalability comparison of the two-stage sparse storage format and the data mapping scheme for a distributed platform according to an embodiment of the present invention. Referring to fig. 5, 8 sparse matrices were tested on a distributed platform with 32 nodes, each node having 4 A100 GPUs (Graphics Processing Units) or 4 MI50 GPUs. The results show that the method provided by the embodiment of the invention fully exploits sparsity to accelerate computation and maintains good scalability: LU decomposition scales to a distributed system of 32 nodes and remains scalable with 128 processes.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, the two-stage sparse storage format preserves the sparse structure of the original sparse matrix; on the other hand, it extends to the distributed platform and resolves the data-dependency problem of LU decomposition. By coordinating the data transmission and computation of each process, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
The LU decomposition device for the distributed platform provided by the present invention is described below, and the LU decomposition device for the distributed platform described below and the LU decomposition method for the distributed platform described above may be referred to correspondingly.
Fig. 6 is a schematic structural diagram of an LU decomposition device for a distributed platform according to an embodiment of the present invention. Referring to fig. 6, an embodiment of the present invention provides an LU decomposition device for a distributed platform, and the device may specifically include the following modules:
The storage module 601 is configured to obtain an original sparse matrix, cut the original sparse matrix, and store the original sparse matrix on a distributed platform in a two-stage sparse storage format;
the mapping module 602 is configured to map the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition.
In an alternative embodiment, the mapping module is specifically configured to:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
In an alternative embodiment, the storage module is specifically configured to:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
In an alternative embodiment, the storage module is specifically configured to:
And storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array so as to store the position of the non-zero submatrix block.
In an alternative embodiment, the storage module is specifically configured to:
and storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing each non-zero element pointer in each column by adopting a sixth auxiliary array so as to store the position of the non-zero element in the non-zero sub-matrix block.
In an alternative embodiment, the mapping module is specifically configured to:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
In an alternative embodiment, the storage module is specifically configured to:
cutting the original sparse matrix according to a preset block size.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, the two-stage sparse storage format preserves the sparse structure of the original sparse matrix; on the other hand, it extends to the distributed platform and resolves the data-dependency problem of LU decomposition. By coordinating the data transmission and computation of each process, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
Fig. 7 illustrates the physical structure of an electronic device. As shown in fig. 7, the electronic device may include: a processor 710, a communication interface 720, a memory 730, and a communication bus 740, wherein the processor 710, the communication interface 720 and the memory 730 communicate with each other via the communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform the LU decomposition method for a distributed platform, the method comprising:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention may be embodied, in essence or in the part contributing to the prior art, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the LU decomposition method for a distributed platform provided by the methods described above, the method comprising:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An LU decomposition method for a distributed platform, characterized by comprising the following steps:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
2. The method of claim 1, wherein mapping the cut sub-matrix blocks onto different processes in the distributed platform for sparse LU decomposition comprises:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
3. The method of claim 2, wherein storing the allocated sub-matrix blocks in the two-stage sparse storage format comprises:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
4. A method according to claim 3, wherein storing the allocated non-zero sub-matrix blocks for computation in a first layer sparse structure comprises:
And storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array so as to store the position of the non-zero submatrix block.
5. A method according to claim 3, wherein said storing non-zero elements in said non-zero sub-matrix blocks in a second layer sparse structure comprises:
and storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing each non-zero element pointer in each column by adopting a sixth auxiliary array so as to store the position of the non-zero element in the non-zero sub-matrix block.
6. The method of claim 2, wherein mapping the sub-matrix blocks belonging to each process to a corresponding process for sparse LU decomposition comprises:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
7. The method of claim 1, wherein the cutting the original sparse matrix comprises:
cutting the original sparse matrix according to a preset block size.
8. An LU decomposition device for a distributed platform, comprising:
The storage module is used for acquiring an original sparse matrix, cutting the original sparse matrix and storing the original sparse matrix on the distributed platform by adopting a two-stage sparse storage format;
And the mapping module is used for mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to carry out sparse LU decomposition.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the LU decomposition method for a distributed platform according to any one of claims 1-7.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the LU decomposition method for a distributed platform according to any one of claims 1 to 7.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410185225.2A | 2024-02-19 | 2024-02-19 | LU decomposition method, device, equipment and storage medium for distributed platform |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN118193914A | 2024-06-14 |

Family

ID=91412791

Cited By (1)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118410214A | 2024-07-04 | 2024-07-30 | 浪潮智慧科技有限公司 | Meteorological data processing method, equipment and medium based on sparse matrix |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |