CN118193914A - LU decomposition method, device, equipment and storage medium for distributed platform

Info

Publication number
CN118193914A
CN118193914A (application CN202410185225.2A)
Authority
CN
China
Prior art keywords
sparse
matrix
sub
decomposition
zero
Prior art date
Legal status
Pending
Application number
CN202410185225.2A
Other languages
Chinese (zh)
Inventor
Liu Weifeng (刘伟峰)
Fu Xu (付旭)
Jin Zhou (金洲)
Current Assignee
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Priority date
Filing date
Publication date
Application filed by China University of Petroleum Beijing filed Critical China University of Petroleum Beijing
Priority to CN202410185225.2A
Publication of CN118193914A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an LU decomposition method, device, equipment and storage medium for a distributed platform, comprising the following steps: acquiring an original sparse matrix, cutting the original sparse matrix, and storing it on a distributed platform in a two-stage sparse storage format; and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition. According to the embodiment of the invention, on the one hand, storage in the two-stage sparse storage format preserves the sparse character of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data dependency problem of LU decomposition. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.

Description

LU decomposition method, device, equipment and storage medium for distributed platform
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a distributed platform-oriented LU decomposition method, apparatus, device, and storage medium.
Background
The purpose of sparse LU decomposition is to compute the solution vector x of the linear system Ax = b, where A is a sparse matrix and b is a dense vector. Two families of methods exist for solving Ax = b: direct methods, which obtain an exact solution in a finite number of steps, and iterative methods, which approach the solution through successive approximation. Direct methods based on LU decomposition yield more accurate solutions and are therefore widely used in scientific computing, simulation, and many other fields.
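For reference, such a direct solve first factors A = LU, with L lower triangular and U upper triangular, and then performs two triangular solves: Ly = b by forward substitution, followed by Ux = y by backward substitution. The result is exact up to rounding error, which is why direct methods are favored when accuracy matters.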
Because of the data dependencies in LU decomposition, the main computation is typically deployed on shared-memory machines. However, the tight dependency coupling within LU decomposition means that, when deployed on shared-memory machines, the LU decomposition algorithm cannot fully exploit large-scale cluster computing power.
Disclosure of Invention
The invention provides an LU decomposition method, device, equipment and storage medium for a distributed platform, to overcome the defect in the prior art that, owing to the tight dependency coupling of LU decomposition computed on shared-memory machines, the LU decomposition algorithm cannot fully exploit large-scale cluster computing power. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
The invention provides an LU decomposition method for a distributed platform, which comprises the following steps:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
According to the LU decomposition method for the distributed platform provided by the present invention, mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition includes:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
According to the LU decomposition method for the distributed platform provided by the present invention, storing the allocated sub-matrix blocks in the two-stage sparse storage format includes:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
According to the LU decomposition method for the distributed platform provided by the present invention, storing the allocated non-zero sub-matrix blocks for computation in a first-layer sparse structure includes:
storing the prefix sums of the non-zero sub-matrix blocks with a first auxiliary array, storing the row index of each non-zero sub-matrix block in each column with a second auxiliary array, and storing the pointer to each non-zero sub-matrix block in each column with a third auxiliary array, so as to record the positions of the non-zero sub-matrix blocks.
According to the LU decomposition method for a distributed platform provided by the present invention, storing the non-zero elements in the non-zero sub-matrix block in the second-layer sparse structure includes:
storing the prefix sums of the non-zero elements in the non-zero sub-matrix block with a fourth auxiliary array, storing the row index of each non-zero element in each column with a fifth auxiliary array, and storing the pointer to each non-zero element in each column with a sixth auxiliary array, so as to record the positions of the non-zero elements within the non-zero sub-matrix block.
According to the LU decomposition method for the distributed platform provided by the present invention, mapping the sub-matrix blocks belonging to each process to the corresponding process for sparse LU decomposition includes:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
According to the LU decomposition method for the distributed platform provided by the present invention, cutting the original sparse matrix includes:
cutting the original sparse matrix according to a preset block size.
The invention also provides an LU decomposition device for a distributed platform, comprising:
a storage module, configured to acquire an original sparse matrix, cut the original sparse matrix, and store it on a distributed platform in a two-stage sparse storage format;
and a mapping module, configured to map the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition.
The invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the LU decomposition method for a distributed platform described above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the LU decomposition method for a distributed platform as set forth in any one of the above.
According to the LU decomposition method, device, equipment and storage medium for a distributed platform provided by the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, storage in the two-stage sparse storage format preserves the sparse character of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data dependency problem of LU decomposition. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
Drawings
In order to more clearly illustrate the technical solutions of the invention or of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. It is apparent that the drawings in the following description show some embodiments of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flow chart of an LU decomposition method for a distributed platform according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a two-stage sparse storage format provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data mapping method according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the dependency relationships of LU decomposition provided by an embodiment of the invention;
Fig. 5 is a schematic diagram comparing the scalability of the two-stage sparse storage format and the data mapping method for a distributed platform according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an LU decomposition device for a distributed platform according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The term "at least one" in the present invention means one or more, and "a plurality" means two or more. The terms "first," "second," "third," "fourth," and the like in this disclosure, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "such as" in the embodiments should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
Fig. 1 is a flow chart of an LU decomposition method for a distributed platform according to an embodiment of the present invention. Referring to fig. 1, an embodiment of the present invention provides an LU decomposition method for a distributed platform, which specifically includes the following steps:
Step 101, an original sparse matrix is obtained, and the original sparse matrix is cut and stored on a distributed platform in a two-stage sparse storage format.
It should be noted that the execution body of the LU decomposition method for a distributed platform provided by the embodiment of the present invention may be an electronic device, a component in the electronic device, an integrated circuit, or a chip. The electronic device may be a mobile or a non-mobile electronic device. Illustratively, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm computer, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like; the embodiments of the present invention impose no particular limitation.
The following description of the embodiments of the present invention uses a server as the execution body. In the embodiment of the invention, the original sparse matrix can first be acquired, cut, and stored on the distributed platform in a two-stage sparse storage format; the sub-matrix blocks in the LU decomposition are then mapped to different processes in the distributed platform, so that the data transmission and computation of each process can be coordinated as a whole to invoke large-scale cluster computing power and complete the sparse LU algorithm.
A sparse matrix is a matrix in which most elements are 0. In practical applications, data such as images, social networks, and texts can be represented as sparse matrices.
Specifically, the original sparse matrix may be divided into a plurality of sub-matrix blocks according to a division rule, where each sub-matrix block may contain part of the data of the sparse matrix, and the sub-matrix blocks are stored on the distributed platform in the two-stage sparse storage format; a sketch of such a cut follows.
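As one illustration of the cutting step, the following C++ sketch indexes a uniform partition of an n x n matrix into B x B sub-matrix blocks; the helper names and the uniform block size are illustrative assumptions, not the patent's exact division rule.

```cpp
#include <cstdint>

// Number of B-sized blocks needed to cover n rows (or columns): ceiling division.
inline int64_t numBlocks(int64_t n, int64_t B) {
    return (n + B - 1) / B;
}

// Block coordinate (bi, bj) of the sub-matrix block containing entry (i, j)
// under a uniform B x B cut of the original matrix.
struct BlockCoord { int64_t bi, bj; };

inline BlockCoord blockOf(int64_t i, int64_t j, int64_t B) {
    return { i / B, j / B };
}
```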
According to the embodiment of the invention, storage in the two-stage sparse storage format maintains sparsity both among the sparse sub-matrix blocks and within each block, optimizing distributed storage and distributed computing performance.
Step 102, mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
Specifically, the sub-matrix blocks obtained by cutting can be distributed to different processes so that each process has data blocks to process. Each process can then perform the sparse LU decomposition computation on its allocated sub-matrix blocks in parallel. After the processes complete their computations, communication and integration are carried out so that the LU decomposition of the whole sparse matrix yields a correct result.
In an alternative scheme, heavily loaded tasks can be migrated to lightly loaded processes to balance the overall load, thereby improving the scalability of the distributed sparse LU decomposition algorithm.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, storage in the two-stage sparse storage format preserves the sparse character of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data dependency problem of LU decomposition. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
In an alternative embodiment, the mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition includes: mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition; each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
The CSC (Compressed Sparse Column) format is a compressed format for representing a sparse matrix. In the embodiment of the invention, the original sparse matrix can be cut and then sent to different processors, with the sub-matrix block as the minimum unit of operation; each small block matrix is stored in CSC format. Non-zero elements of a sub-matrix block are stored by column, and three arrays may be used to represent the data in CSC format.
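For illustration, the three CSC arrays for a single, hypothetical 3x3 sub-matrix block might look like this (the array names follow the common CSC convention also used later in the text):

```cpp
// Hypothetical 3x3 block:   [5 0 0]
//                           [0 8 3]
//                           [0 0 6]
// CSC stores the non-zeros column by column.
double Value[]         = {5.0, 8.0, 3.0, 6.0}; // non-zero values in column-major order
int    RowIndex[]      = {0,   1,   1,   2};   // row of each non-zero
int    ColumnPointer[] = {0, 1, 2, 4};         // column j owns Value[ColumnPointer[j] .. ColumnPointer[j+1])
```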
In distributed computing, a two-dimensional process network may be used to assign computing tasks to different processing units and to communicate and cooperate between the processing units. A two-dimensional process network may be used to represent a logically two-dimensional grid structure in which each grid node may represent a processing unit (e.g., a process or thread) and may communicate with its neighboring nodes.
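One common way to realize such a two-dimensional process network is a 2D block-cyclic distribution, sketched below; the cyclic formula is an assumption borrowed from standard distributed LU practice (ScaLAPACK-style layouts), since the text does not fix a particular mapping formula.

```cpp
// Rank of the process that owns sub-matrix block (bi, bj) on a Pr x Pc process
// grid, assuming a 2D block-cyclic distribution (illustrative assumption).
inline int ownerOf(int bi, int bj, int Pr, int Pc) {
    int pr = bi % Pr;    // row of the owning process in the grid
    int pc = bj % Pc;    // column of the owning process in the grid
    return pr * Pc + pc; // row-major rank within the grid
}
```

With a 2x2 grid (Pr = Pc = 2), for example, neighboring blocks in a block-row or block-column land on different processes, which spreads both the storage and the update work of each elimination step.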
In the embodiment of the invention, the sub-matrix blocks belonging to each process can be distributed to that process through the two-dimensional process network for sparse LU decomposition, with each process storing its allocated sub-matrix blocks in the two-stage sparse storage format. At the data-block level, each data block may be compressed using the CSC format.
According to the embodiment of the invention, the sub-matrix blocks belonging to each process are mapped to the corresponding process for sparse LU decomposition, and each process stores its allocated sub-matrix blocks in the two-stage sparse storage format. This resolves the data dependency relationships of LU decomposition and allows the computing power of a large-scale cluster to be fully invoked, so that sparsity can be fully exploited to accelerate the computation while good scalability is maintained.
In an alternative embodiment, the storing the allocated sub-matrix blocks in a two-level sparse storage format includes: storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure; and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
In an embodiment of the present invention, each process may store its allocated non-zero sub-matrix blocks for computation in a first-layer sparse structure, and store the non-zero elements of those blocks in a second-layer sparse structure. Most elements of a sparse matrix are 0 and only a few are non-zero; a sparse storage format keeps only the non-zero elements, which reduces the storage footprint and improves the efficiency of distributed computation on the sparse matrix.
According to the embodiment of the invention, storing the non-zero sub-matrix blocks in a first-layer sparse structure and the non-zero elements within them in a second-layer sparse structure maintains sparsity both among the sparse sub-matrix blocks and within each block, thereby optimizing distributed storage and distributed computing performance.
In an alternative embodiment, the storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure includes: and storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array so as to store the position of the non-zero submatrix block.
Fig. 2 is a schematic diagram of the two-stage sparse storage format according to an embodiment of the present invention. Referring to fig. 2, the consecutively numbered blocks denote non-zero sub-matrix blocks, while empty positions denote all-zero blocks. In the two-dimensional process network, the non-zero sub-matrix blocks are allocated to four processes P0, P1, P2 and P3: the sub-matrix blocks numbered 1, 2, 6, 8, 13, 14 and 16 are allocated to process P0; those numbered 7 and 15 to process P1; those numbered 4, 9, 10 and 12 to process P2; and those numbered 3, 5 and 11 to process P3.
Each process stores only the sub-matrix blocks needed for its computation and uses the position information of those blocks within the original sparse matrix to accelerate communication.
The first auxiliary array may be denoted blk_ColumnPointer, the second blk_RowIndex, and the third blk_Value. In the embodiment of the invention, the first-layer sparse storage structure uses the three auxiliary arrays blk_ColumnPointer, blk_RowIndex and blk_Value to store the positions of the sub-matrix blocks: blk_ColumnPointer stores the prefix sums of the non-zero sub-matrix blocks, blk_RowIndex stores the block-row index of each non-zero sub-matrix block in each block-column, and blk_Value stores a pointer to each non-zero sub-matrix block in each block-column.
According to the embodiment of the invention, the first layer of sparse structure is adopted to store the non-zero sub-matrix blocks for calculation, so that the sparsity among the sparse sub-matrix blocks can be maintained.
In an alternative embodiment, the storing the non-zero elements in the non-zero submatrix block in a second layer sparse structure includes: and storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing each non-zero element pointer in each column by adopting a sixth auxiliary array so as to store the position of the non-zero element in the non-zero sub-matrix block.
With continued reference to fig. 2, the fourth auxiliary array may be denoted ColumnPointer, the fifth RowIndex, and the sixth Value. In the embodiment of the invention, the second-layer sparse storage structure uses the three auxiliary arrays ColumnPointer, RowIndex and Value to store the positions of the non-zero elements within a sub-matrix block: ColumnPointer stores the prefix sums of the non-zero elements in the non-zero sub-matrix block, RowIndex stores the row index of each non-zero element in each column, and Value stores each non-zero element of each column.
In the second-layer sparse structure, the non-zero elements within a sub-matrix block may thus be stored in compressed CSC format; the stored result for the sub-matrix block numbered 6 is shown in fig. 2.
According to the embodiment of the invention, storing the non-zero elements of the non-zero sub-matrix blocks in the second-layer sparse structure maintains sparsity within the sparse sub-matrix blocks while preserving sparsity among them.
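Putting the two layers together, a minimal sketch of the resulting data structure might look as follows; the array names mirror those above, while the surrounding struct layout and element types are illustrative assumptions rather than the patent's exact definition.

```cpp
#include <vector>

// Second layer: one non-zero sub-matrix block, held in CSC form.
struct SparseBlock {
    std::vector<int>    ColumnPointer; // prefix sums of non-zeros per column
    std::vector<int>    RowIndex;      // row index of each non-zero within the block
    std::vector<double> Value;         // the non-zero values, column by column
};

// First layer: the non-zero blocks owned by one process, again in CSC form,
// but with whole sub-matrix blocks playing the role of scalar non-zeros.
struct TwoLevelMatrix {
    std::vector<int>          blk_ColumnPointer; // prefix sums of non-zero blocks per block-column
    std::vector<int>          blk_RowIndex;      // block-row index of each non-zero block
    std::vector<SparseBlock*> blk_Value;         // pointer to each non-zero block's storage
};
```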
In an alternative embodiment, the mapping the sub-matrix blocks belonging to each process to the corresponding process for sparse LU decomposition includes: mapping the sub-matrix blocks belonging to each process into the corresponding process; in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
According to the embodiment of the invention, the sub-matrix blocks are stored in the two-stage sparse storage format and, following the data mapping method, the blocks belonging to each process are mapped to the corresponding process. On this basis, the two-stage sparse storage format extends to the distributed platform, the computing power of a large-scale distributed cluster is fully mobilized to complete the sparse LU decomposition algorithm, and sparsity can be fully exploited to accelerate the computation while good scalability is maintained.
Fig. 3 is a schematic diagram of a data mapping method according to an embodiment of the present invention. Referring to fig. 3, five time slices of the LU numeric factorization under multiple processes are shown, with the sub-matrix blocks belonging to each process mapped to the corresponding process: the sub-matrix blocks numbered 1, 2, 6, 8, 13, 14 and 16 are allocated to process P0; those numbered 7 and 15 to process P1; those numbered 4, 9, 10 and 12 to process P2; and those numbered 3, 5 and 11 to process P3.
In sparse LU decomposition, the LU factors of the matrix are a lower triangular matrix L and an upper triangular matrix U. The two factor matrices are obtained by applying a series of row transformations to the original sparse matrix, and during this process certain dependencies exist between the sub-matrix blocks.
Fig. 4 is a schematic diagram of the dependency relationships of LU decomposition according to an embodiment of the present invention. Referring to fig. 4, in the embodiment of the present invention, each process may perform sparse LU decomposition on its allocated sub-matrix blocks in the time-slice order shown in fig. 3, respecting the dependencies between the sub-matrix blocks, so as to ensure the correctness of the computation.
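The dependency pattern of Fig. 4 matches the classic right-looking blocked LU factorization. The serial sketch below (dense blocks, no pivoting, n divisible by B — all simplifying assumptions for illustration) makes the three dependency classes explicit: each trailing block A(i,j) can only be updated after the diagonal block A(k,k) is factorized and the panel blocks A(i,k) and A(k,j) are solved.

```cpp
#include <vector>

// Serial right-looking blocked LU of a dense n x n row-major matrix A,
// illustrating the block dependency order; in the distributed version each
// step runs on the process owning the block, and the panels are communicated.
// Assumes n % B == 0 and that no pivoting is required.
void blockedLU(std::vector<double>& A, int n, int B) {
    auto at = [&](int r, int c) -> double& { return A[r * n + c]; };
    for (int k0 = 0; k0 < n; k0 += B) {
        const int k1 = k0 + B;
        // 1) Factorize the diagonal block: A(k,k) = L(k,k) * U(k,k).
        for (int p = k0; p < k1; ++p)
            for (int r = p + 1; r < k1; ++r) {
                at(r, p) /= at(p, p);
                for (int c = p + 1; c < k1; ++c)
                    at(r, c) -= at(r, p) * at(p, c);
            }
        // 2a) Column panel: L(i,k) = A(i,k) * U(k,k)^-1, depends on step 1.
        for (int r = k1; r < n; ++r)
            for (int p = k0; p < k1; ++p) {
                at(r, p) /= at(p, p);
                for (int c = p + 1; c < k1; ++c)
                    at(r, c) -= at(r, p) * at(p, c);
            }
        // 2b) Row panel: U(k,j) = L(k,k)^-1 * A(k,j), also depends on step 1.
        for (int p = k0; p < k1; ++p)
            for (int r = p + 1; r < k1; ++r)
                for (int c = k1; c < n; ++c)
                    at(r, c) -= at(r, p) * at(p, c);
        // 3) Schur update: A(i,j) -= L(i,k) * U(k,j), depends on both panels.
        for (int r = k1; r < n; ++r)
            for (int p = k0; p < k1; ++p)
                for (int c = k1; c < n; ++c)
                    at(r, c) -= at(r, p) * at(p, c);
    }
}
```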
In an alternative embodiment, the cutting the original sparse matrix includes: cutting the original sparse matrix according to a preset block size.
In the embodiment of the invention, when the original sparse matrix is cut, it can be cut according to a preset, fixed block size, yielding a plurality of sub-matrix blocks of equal size. Specifically, the preset block size can be set to 4000, which lets the processors in use exert as much of their computing capacity as possible while maintaining high scalability.
In practical applications, the preset block size may also be set to other values. It should be noted that if the preset block size is too large, communication time easily becomes excessive and decomposition efficiency suffers; if it is too small, the compute units are liable to be underutilized.
In the embodiment of the invention, the two-stage sparse storage format and the data mapping method for the distributed platform allow the sparse LU decomposition method to be extended to the distributed platform, which helps invoke large-scale cluster computing power to complete the sparse LU decomposition algorithm.
Fig. 5 is a schematic diagram comparing the scalability of the two-stage sparse storage format and the data mapping method for a distributed platform according to an embodiment of the present invention. Referring to fig. 5, 8 sparse matrices were tested on a distributed platform with 32 nodes, each node equipped with 4 A100 GPUs (Graphics Processing Units) or 4 MI50 GPUs. The results show that the method provided by the embodiment of the invention fully exploits sparsity to accelerate the computation and maintains good scalability: LU decomposition scales to a distributed system of 32 nodes and remains scalable at 128 processes.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, storage in the two-stage sparse storage format preserves the sparse character of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data dependency problem of LU decomposition. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
The LU decomposition device for a distributed platform provided by the invention is described below; the device described below and the LU decomposition method for a distributed platform described above may be referred to in correspondence with each other.
Fig. 6 is a schematic structural diagram of an LU decomposition device for a distributed platform according to an embodiment of the present invention. Referring to fig. 6, an embodiment of the present invention provides an LU decomposition device for a distributed platform, which may specifically include the following modules:
The storage module 601 is configured to obtain an original sparse matrix, cut the original sparse matrix, and store the original sparse matrix on a distributed platform in a two-stage sparse storage format;
the mapping module 602 is configured to map the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition.
In an alternative embodiment, the mapping module is specifically configured to:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
In an alternative embodiment, the storage module is specifically configured to:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
In an alternative embodiment, the storage module is specifically configured to:
storing the prefix sums of the non-zero sub-matrix blocks with a first auxiliary array, storing the row index of each non-zero sub-matrix block in each column with a second auxiliary array, and storing the pointer to each non-zero sub-matrix block in each column with a third auxiliary array, so as to record the positions of the non-zero sub-matrix blocks.
In an alternative embodiment, the storage module is specifically configured to:
storing the prefix sums of the non-zero elements in the non-zero sub-matrix block with a fourth auxiliary array, storing the row index of each non-zero element in each column with a fifth auxiliary array, and storing the pointer to each non-zero element in each column with a sixth auxiliary array, so as to record the positions of the non-zero elements within the non-zero sub-matrix block.
In an alternative embodiment, the mapping module is specifically configured to:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
In an alternative embodiment, the storage module is specifically configured to:
cutting the original sparse matrix according to a preset block size.
According to the embodiment of the invention, the original sparse matrix is acquired, cut, and stored on the distributed platform in the two-stage sparse storage format, and the sub-matrix blocks obtained by cutting are mapped to different processes in the distributed platform for sparse LU decomposition. On the one hand, storage in the two-stage sparse storage format preserves the sparse character of the original sparse matrix; on the other hand, the format extends to the distributed platform and resolves the data dependency problem of LU decomposition. By coordinating the data transmission and computation of each process as a whole, large-scale cluster computing power is invoked to complete the sparse LU decomposition algorithm, so that sparsity is fully exploited to accelerate the computation while good scalability is maintained.
Fig. 7 illustrates a schematic diagram of the physical structure of an electronic device. As shown in fig. 7, the electronic device may include: a processor 710, a communications interface 720, a memory 730, and a communication bus 740, through which the processor 710, the communications interface 720, and the memory 730 communicate with one another. The processor 710 may invoke logic instructions in the memory 730 to perform the LU decomposition method for a distributed platform, the method comprising:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the LU decomposition method for a distributed platform provided by the methods described above, the method comprising:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. The LU decomposition method for the distributed platform is characterized by comprising the following steps:
acquiring an original sparse matrix, cutting the original sparse matrix, and storing the original sparse matrix on a distributed platform by adopting a two-stage sparse storage format;
and mapping the sub-matrix blocks obtained by cutting to different processes in the distributed platform to perform sparse LU decomposition.
2. The method of claim 1, wherein mapping the cut sub-matrix blocks onto different processes in the distributed platform for sparse LU decomposition comprises:
mapping the sub-matrix blocks belonging to each process to the corresponding process through a two-dimensional process network to perform sparse LU decomposition;
each process stores the allocated sub-matrix blocks in a two-stage sparse storage format.
3. The method of claim 2, wherein storing the allocated sub-matrix blocks in a two-level sparse storage format comprises:
Storing the allocated non-zero submatrix blocks for computation in a first layer of sparse structure;
and storing non-zero elements in the non-zero submatrix block in a second-layer sparse structure.
4. A method according to claim 3, wherein storing the allocated non-zero sub-matrix blocks for computation in a first layer sparse structure comprises:
storing the prefix of the non-zero submatrix block by using a first auxiliary array, storing the row index of each non-zero submatrix block in each column by using a second auxiliary array, and storing the pointer of each non-zero submatrix block in each column by using a third auxiliary array, so as to store the position of the non-zero submatrix block.
5. A method according to claim 3, wherein said storing non-zero elements in said non-zero sub-matrix blocks in a second layer sparse structure comprises:
storing the prefix of the non-zero element in the non-zero sub-matrix block by adopting a fourth auxiliary array, storing the row index of each non-zero element in each column by adopting a fifth auxiliary array, and storing the pointer of each non-zero element in each column by adopting a sixth auxiliary array, so as to store the position of the non-zero element in the non-zero sub-matrix block.
6. The method of claim 2, wherein mapping the sub-matrix blocks belonging to each process to a corresponding process for sparse LU decomposition comprises:
mapping the sub-matrix blocks belonging to each process into the corresponding process;
in each process, sparse LU decomposition is performed on the allocated sub-matrix blocks based on the dependency relationship between the respective sub-matrix blocks in the sparse LU decomposition.
7. The method of claim 1, wherein the cutting the original sparse matrix comprises:
cutting the original sparse matrix according to a preset block size.
8. An LU decomposition device for a distributed platform, characterized by comprising:
a storage module, configured to acquire an original sparse matrix, cut the original sparse matrix, and store it on a distributed platform in a two-stage sparse storage format;
and a mapping module, configured to map the sub-matrix blocks obtained by cutting to different processes in the distributed platform for sparse LU decomposition.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the LU decomposition method for a distributed platform according to any one of claims 1-7.
10. A non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the LU decomposition method for a distributed platform according to any one of claims 1 to 7.
CN202410185225.2A 2024-02-19 2024-02-19 LU decomposition method, device, equipment and storage medium for distributed platform Pending CN118193914A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410185225.2A CN118193914A (en) 2024-02-19 2024-02-19 LU decomposition method, device, equipment and storage medium for distributed platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410185225.2A CN118193914A (en) 2024-02-19 2024-02-19 LU decomposition method, device, equipment and storage medium for distributed platform

Publications (1)

Publication Number Publication Date
CN118193914A (en) 2024-06-14

Family

ID=91412791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410185225.2A Pending CN118193914A (en) 2024-02-19 2024-02-19 LU decomposition method, device, equipment and storage medium for distributed platform

Country Status (1)

Country Link
CN (1) CN118193914A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118410214A (en) * 2024-07-04 2024-07-30 浪潮智慧科技有限公司 Meteorological data processing method, equipment and medium based on sparse matrix


Similar Documents

Publication Publication Date Title
CN108170639B (en) Tensor CP decomposition implementation method based on distributed environment
US20190266217A1 (en) Apparatus and method for matrix computation
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN111898733B (en) Deep separable convolutional neural network accelerator architecture
CN118193914A (en) LU decomposition method, device, equipment and storage medium for distributed platform
US20170206089A1 (en) Information processing apparatus and computational method
CN109145255B (en) Heterogeneous parallel computing method for updating sparse matrix LU decomposition row
EP4227886A1 (en) Matrix operation method and apparatus for image data, device, and storage medium
CN112668708B (en) Convolution operation device for improving data utilization rate
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
CN114035936A (en) Multidimensional parallel processing method, system and equipment based on artificial intelligence and readable storage medium
WO2022040575A1 (en) Tabular convolution and acceleration
WO2021036729A1 (en) Matrix computation method, computation device, and processor
CN113918120A (en) Computing device, neural network processing apparatus, chip, and method of processing data
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
EP4095719A1 (en) Sparse matrix multiplication in hardware
CN110688055B (en) Data access method and system in large graph calculation
CN118210624A (en) GPU parallel finite element method based on CUDA
CN117786299A (en) Sparse matrix solving method, system, equipment and medium
US20210294608A1 (en) Processing in memory methods for convolutional operations
CN115952391A (en) Data processing method and device, electronic equipment and storage medium
CN114817845B (en) Data processing method, device, electronic equipment and storage medium
CN105608056A (en) Flink based large-scale matrix parallelization computing method
CN115481364A (en) Parallel computing method for large-scale elliptic curve multi-scalar multiplication based on GPU (graphics processing Unit) acceleration
CN113900808A (en) MPI parallel data structure based on arbitrary polyhedron unstructured grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination