CN105045768A - Method and system for achieving GMRES algorithm - Google Patents

Method and system for achieving GMRES algorithm

Info

Publication number
CN105045768A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510551642.5A
Other languages
Chinese (zh)
Inventor
王明清
张清
张广勇
吴韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510551642.5A
Publication of CN105045768A
Legal status: Pending


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a method and system for implementing the GMRES algorithm. The method comprises the steps that a host CPU controls a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations, thereby obtaining coefficient matrix blocks and right-hand-side vector blocks; the host CPU controls the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs; and each CPU cooperatively controls the MICs corresponding to its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks. By means of this scheme, the GMRES algorithm is implemented in parallel through a method in which a CPU cluster controls multiple MICs, and the computation speed is improved.

Description

Method and system for implementing the GMRES algorithm
Technical field
The present invention relates to data processing technology, and in particular to a method and system for implementing the generalized minimal residual (GMRES, Generalized Minimal Residual) algorithm.
Background art
It is well known that solving mathematical-physics models is an indispensable part of engineering production and scientific research in many fields. With the development of computers, a series of numerical methods have appeared in succession, such as the finite difference method (FD, Finite Difference), the finite element method (FEM, Finite Element Method), the boundary element method (BEM, Boundary Element Method), and meshless methods (MM, Meshless Method). FEM in particular now has a fairly complete theoretical system, is widely applied in fields such as machine manufacturing, materials processing, and scientific research, and has become an important tool in engineering design. These numerical methods share a common feature: they discretize the mathematical-physics model derived from a practical problem into a system of linear algebraic equations in a specific way. In addition, structural analysis, network analysis, geodetic surveying, and data-related and optimization problems frequently lead to the problem of solving a system of linear equations. It is therefore no exaggeration to say that most scientific and engineering problems ultimately reduce to solving a system of linear equations. However, as problem scale grows, solving the linear system becomes a major bottleneck in engineering production and scientific research. For large-scale and even ultra-large-scale linear systems, the solution becomes very difficult, especially when the sparse coefficient matrix is asymmetric and has no special structure, in which case the solution may hardly be completed at all.
GMRES is a common method for solving systems with asymmetric sparse coefficient matrices and is one of the classic algorithms in Krylov subspace methods; it iterates by minimizing the residual over a Krylov subspace, and has the advantages of fast convergence and good stability. For the system of linear equations Ax = b, where A is the coefficient matrix and b is the right-hand-side vector, the m-th order Krylov subspace used by the GMRES algorithm is K_m = span(r_0, A·r_0, ..., A^(m-1)·r_0), where r_0 = b − A·x_0 and x_0 is the initial vector for x. GMRES approaches the exact solution of Ax = b by solving for the vector x_m ∈ K_m that minimizes the residual ||A·x_m − b||_2.
The basic process of solving a large-scale system of linear equations with the GMRES algorithm roughly comprises:
initializing the vector x to obtain x_0, and setting the error tolerance ε;
calculating r_0 according to the formula r_0 = b − A·x_0, and calculating β according to the formula β = ||r_0||_2, where ||·||_2 denotes the 2-norm;
using the Arnoldi orthogonalization procedure with r_0 and A to calculate an orthonormal basis V_m and an upper Hessenberg matrix H̄_m;
solving for the y_m that minimizes ||β·e_1 − H̄_m·y_m||_2, where e_1 = (1, 0, ..., 0) (this minimization is written out explicitly below);
calculating x_m according to the formula x_m = x'_0 + V_m·y_m, where x'_0 is the x_m calculated the previous time (x_0 is used in the first iteration); calculating r_m according to the formula r_m = b − A·x_m; when ||r_m||_2 < ε, x_m is the solution of Ax = b; when ||r_m||_2 ≥ ε, using the Arnoldi orthogonalization procedure with r_m and A to obtain a new orthonormal basis and a new upper Hessenberg matrix, and continuing the subsequent process.
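For clarity, the least-squares problem in the steps above can be written out explicitly. The following restatement is standard GMRES theory rather than text taken from the patent; within one cycle, x_0 denotes the current initial guess, V_m the orthonormal basis, and H̄_m the (m+1)×m upper Hessenberg matrix produced by the Arnoldi process:

    A V_m = V_{m+1} \bar{H}_m, \qquad x_m = x_0 + V_m y_m,
    r_m = b - A x_m = V_{m+1} \left( \beta e_1 - \bar{H}_m y_m \right),
    \| r_m \|_2 = \| \beta e_1 - \bar{H}_m y_m \|_2, \qquad
    y_m = \arg\min_{y \in \mathbb{R}^m} \| \beta e_1 - \bar{H}_m y \|_2 .

Because V_{m+1} has orthonormal columns, minimizing the small (m+1)-by-m least-squares problem on the right is equivalent to minimizing the full residual, which is what makes the method attractive for large systems.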
The existing method of implementing the GMRES algorithm roughly comprises: using a CPU to implement the GMRES algorithm.
In the existing method of implementing the GMRES algorithm, because the amount of computation of the GMRES algorithm is large while the computing capability of a CPU is limited, the computation speed is slow.
Summary of the invention
In order to solve the above problem, the present invention proposes a method and system for implementing the GMRES algorithm, which can improve the computation speed.
In order to achieve the above object, the present invention proposes a method for implementing the GMRES algorithm, comprising:
a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations, thereby obtaining coefficient matrix blocks and right-hand-side vector blocks;
the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
each CPU cooperatively controlling the many integrated core coprocessors (MIC) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
Preferably, the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks comprises:
the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows to obtain the coefficient matrix blocks and the right-hand-side vector blocks.
Preferably, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks comprises:
each CPU controlling each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controlling the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controlling the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sending the calculated β' to the host CPU to be combined into β;
each CPU solving for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
each CPU controlling the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and sending the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sending x_m to its own other processes and to each process of the other CPUs;
each CPU controlling each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculating ||r'_m||_2, and sending the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judging that ||r_m||_2 < ε and determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
Preferably, when the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks further comprises:
each CPU cooperatively controlling the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
Preferably, each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controlling the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and to return the calculation results to the host CPU to be combined, or to carry out low-order matrix-vector multiplications, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
The present invention also proposes a system for implementing the GMRES algorithm, at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
Preferably, the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
Preferably, the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
Compared with the prior art, the technical scheme of the present invention comprises: a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs; and each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks. With the scheme of the present invention, the GMRES algorithm is implemented in parallel by a method in which a CPU cluster controls multiple MICs, which improves the computation speed.
Brief description of the drawings
The accompanying drawings of the embodiments of the present invention are described below; the drawings are intended to provide a further understanding of the present invention and, together with the description, serve to explain the present invention without limiting its scope.
Fig. 1 is a flow chart of the method for implementing the GMRES algorithm according to the present invention;
Fig. 2 is a schematic diagram of the structure of the system for implementing the GMRES algorithm according to the present invention.
Detailed description of the embodiments
To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the accompanying drawings; this description shall not be used to limit the scope of the present invention. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
Referring to Fig. 1, the present invention proposes a method for implementing the GMRES algorithm, comprising:
Step 100: the host CPU controls the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations, obtaining coefficient matrix blocks and right-hand-side vector blocks. This specifically comprises:
the host CPU controls the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows, obtaining the coefficient matrix blocks and the right-hand-side vector blocks.
During the partitioning, it should be ensured as far as possible that the numbers of rows of the coefficient matrix blocks and right-hand-side vector blocks that each CPU assigns to its processes are equal, and the coefficient matrix block and the right-hand-side vector block assigned to the same process should correspond to each other.
For example, the coefficient matrix and the right-hand-side vector can be partitioned as follows: when the number of rows of the coefficient matrix (e.g. 101) is not divisible by the number of processes (e.g. 10), the remaining rows are given to one or more of the processes (e.g. one of the processes is over-allocated by 1 row). In general, the rows assigned to one process are contiguous; a minimal sketch of such a partition is given below.
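As a minimal illustration of the row-wise partition just described, the following C fragment is a sketch under illustrative assumptions (the names row_block, n_rows and n_procs, and the deterministic placement of the remainder rows, are not from the patent, which allows the remainder to be given to any process). It assigns contiguous row ranges whose sizes differ by at most one:

    #include <stdio.h>

    /* Compute the contiguous row range [first, first+count) assigned to one
     * process so that every process receives either floor(n/p) or
     * floor(n/p)+1 rows; the first 'rem' processes take the extra rows. */
    static void row_block(int n_rows, int n_procs, int rank, int *first, int *count)
    {
        int base = n_rows / n_procs;
        int rem  = n_rows % n_procs;
        *count = base + (rank < rem ? 1 : 0);
        *first = rank * base + (rank < rem ? rank : rem);
    }

    int main(void)
    {
        int first, count;
        for (int rank = 0; rank < 10; ++rank) {   /* e.g. 101 rows over 10 processes */
            row_block(101, 10, rank, &first, &count);
            printf("process %d: rows %d..%d (%d rows)\n",
                   rank, first, first + count - 1, count);
        }
        return 0;
    }

With 101 rows and 10 processes this prints 11 rows for one process and 10 for each of the others, matching the over-allocation by one row mentioned above.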
Step 101: the host CPU controls the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs.
In this step, the coefficient matrix blocks and right-hand-side vector blocks may be distributed arbitrarily, as long as it is ensured that, for any one process, the position of its coefficient matrix block within the coefficient matrix corresponds to the position of its right-hand-side vector block within the right-hand-side vector. For example, if the coefficient matrix block of rows 1 to 10 is given to some process, the right-hand-side vector block of rows 1 to 10 must also be given to that process.
The host CPU may send the distributed coefficient matrix blocks and right-hand-side vector blocks to each process through the message passing interface (MPI, Message Passing Interface). MPI is a message passing interface released in May 1994 and jointly maintained by numerous parallel computer manufacturers, software development organizations, and parallel application institutions; it is currently one of the most popular parallel programming environments in the world, and in particular a programming paradigm for distributed-memory scalable parallel computers, workstation networks, and clusters. MPI programs are mainly written as Fortran+MPI or C+MPI and provide hundreds of function call interfaces that can be invoked directly. MPI has many advantages: it is portable and easy to use; it provides complete asynchronous communication functionality; and it has a formal and detailed specification. MPI has been implemented on PCs, on MS Windows, on all major UNIX/Linux workstations, and on mainstream parallel machines, and in distributed-memory environments built on message passing the benefits brought by MPI standardization are obvious. One possible realization of this distribution step is sketched below.
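The patent does not spell out code for this distribution step; the following C/MPI sketch is one possible realization under simplifying assumptions (dense row-major storage of A, although the patent targets sparse matrices, and illustrative names such as scatter_blocks). It shows how the host process, taken here to be rank 0, could send matching row blocks of A and b to every process with MPI_Scatterv:

    #include <mpi.h>
    #include <stdlib.h>

    /* Scatter row blocks of a dense n x n matrix A and of the right-hand-side
     * vector b from rank 0 to all processes; assumes MPI_Init has been called.
     * A and b only need to be valid on rank 0. */
    void scatter_blocks(int n, const double *A, const double *b,
                        double **A_local, double **b_local, int *rows_local)
    {
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int *rowcnt = malloc(nprocs * sizeof(int));
        int *rowoff = malloc(nprocs * sizeof(int));
        int *elemcnt = malloc(nprocs * sizeof(int));
        int *elemoff = malloc(nprocs * sizeof(int));
        for (int p = 0, off = 0; p < nprocs; ++p) {
            rowcnt[p] = n / nprocs + (p < n % nprocs ? 1 : 0);
            rowoff[p] = off;
            elemcnt[p] = rowcnt[p] * n;   /* a row block of A has rowcnt[p]*n entries */
            elemoff[p] = off * n;
            off += rowcnt[p];
        }

        *rows_local = rowcnt[rank];
        *A_local = malloc((size_t)rowcnt[rank] * n * sizeof(double));
        *b_local = malloc((size_t)rowcnt[rank] * sizeof(double));

        /* Matching row blocks of A and b go to the same process. */
        MPI_Scatterv(A, elemcnt, elemoff, MPI_DOUBLE,
                     *A_local, elemcnt[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatterv(b, rowcnt, rowoff, MPI_DOUBLE,
                     *b_local, rowcnt[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

        free(rowcnt); free(rowoff); free(elemcnt); free(elemoff);
    }

In practice a sparse format such as CSR would be distributed instead of a dense block, but the pairing of each row block of A with the corresponding entries of b is the same.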
Step 102: each CPU cooperatively controls the many integrated core coprocessors (MIC, Many Integrated Core) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system. This specifically comprises:
each CPU controls each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controls the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controls the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controls the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sends the calculated β' to the host CPU to be combined into β;
each CPU solves for the y_m that minimizes ||β·e_1 − H̄_m·y_m||_2 according to the orthonormal basis and the upper Hessenberg matrix, where H̄_m is the upper Hessenberg matrix;
each CPU controls the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and sends the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sends x_m to its own other processes and to each process of the other CPUs;
each CPU controls each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculates ||r'_m||_2, and sends the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judges that ||r_m||_2 < ε and determines x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
When the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controls the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
In the first iteration, x_0 is generally initialized to the zero vector, so that x'_m = V'_m·y_m. One way the per-process norms could be combined into ||r_m||_2 is sketched below.
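How the per-process quantities are combined into ||r_m||_2 is described above only as a combined calculation on the host CPU; one natural realization, given here as an assumption rather than the patent's own code, is to reduce the local sums of squares to the host process and take a square root there. This works because the row blocks are disjoint, so the squared global norm is exactly the sum of the squared block norms:

    #include <mpi.h>
    #include <math.h>

    /* Each process owns rows_local entries of the residual r'_m; rank 0 (the
     * host process) returns ||r_m||_2 over the full vector, other ranks
     * return 0.0 here. */
    double global_residual_norm(const double *r_local, int rows_local)
    {
        double local_sq = 0.0, global_sq = 0.0;
        for (int i = 0; i < rows_local; ++i)
            local_sq += r_local[i] * r_local[i];

        MPI_Reduce(&local_sq, &global_sq, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        return rank == 0 ? sqrt(global_sq) : 0.0;
    }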
Each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controls the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and returns the calculation results to the host CPU to be combined, or carries out low-order matrix-vector multiplications, low-order vector additions, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
Here, "high-order" means that the order (dimension) of the matrix or vector participating in the operation is greater than or equal to a predetermined threshold.
Each CPU may also control the MIC corresponding to each of its processes to calculate the other operations lying between two adjacent high-order operations (high-order matrix-vector multiplication, high-order vector addition, high-order vector subtraction, high-order vector inner product, and high-order vector product), so as to reduce communication between the processes and improve operating efficiency.
When each CPU controls the MIC corresponding to each of its processes to calculate a high-order matrix-vector multiplication, high-order vector addition, high-order vector subtraction, high-order vector inner product, or high-order vector product in the Arnoldi orthogonalization procedure, the host CPU first controls the host process to partition the matrix or vector participating in the operation and to send the parts to the other processes; the MICs controlled by those processes first carry out the matrix-vector multiplication, vector addition, vector subtraction, vector inner product, or vector product on the partitioned matrix blocks or vector blocks, and the calculation results are then sent to the host process to be combined. One such local kernel is sketched below.
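As an example of the high-order matrix-vector multiplication kernel that a MIC would execute on its process's row block, the following C/OpenMP sketch computes the local product w' = A'·v; the dense storage and the names local_matvec, rows_local and A_local are illustrative assumptions (the patent itself targets sparse matrices), and the OpenMP loop is where the MIC's many threads supply the parallelism:

    #include <stddef.h>

    /* Local part of w = A * v: this process's row block A' (rows_local x n,
     * stored row-major) times the full vector v, giving rows_local entries of w. */
    void local_matvec(int rows_local, int n, const double *A_local,
                      const double *v, double *w_local)
    {
        #pragma omp parallel for
        for (int i = 0; i < rows_local; ++i) {
            double sum = 0.0;
            for (int j = 0; j < n; ++j)
                sum += A_local[(size_t)i * n + j] * v[j];
            w_local[i] = sum;
        }
    }

The partial results w' from all processes are then gathered by the host process and combined, as described above.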
MIC is a many-core processor launched by Intel for high-performance parallel computing. It was developed on the basis of the existing Xeon processor products and is a new architecture aimed at very high performance computing. MIC is not intended to replace the CPU in the computer architecture; it exists as a coprocessor. Compared with a general multi-core processor, a MIC card has a larger number of smaller, simpler cores and hardware threads: a MIC chip usually has more than 50 simplified x86 cores, each core supporting 4 hardware threads. In addition, it has a 512-bit vector width, and its double-precision peak floating-point performance reaches 1 TFLOPS (floating-point operations per second). MIC technology will accelerate the development of high-performance computing and quickly resolve the performance bottlenecks of high-performance computing applications.
The MIC may use multiple threads to calculate the high-order matrix-vector multiplications, high-order vector additions (or high-order vector subtractions), high-order vector inner products, and high-order vector products, and the number of threads each MIC can open is 4·(N_core − 1), where N_core is the number of cores of the MIC.
When MIC computation is needed, the CPU offloads the program that needs to run on the MIC onto the MIC card in offload mode, thereby realizing the computation on the MIC. For example, a program of the following form, in which the content in { } is the code that requires MIC computation, can be used to offload that code onto the MIC:
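The following is a hedged sketch of such an offload construct, assuming the Intel offload pragma syntax used for MIC cards; the function name vector_sub_on_mic, the variable names, and the vector-subtraction body are illustrative rather than the patent's actual listing. The braced block after the pragma is the program that runs on the MIC:

    /* Offload one high-order vector operation to MIC card 0; the in/out
     * clauses describe the data moved between host and card. */
    void vector_sub_on_mic(int n, double *v1, double *v2, double *w)
    {
    #pragma offload target(mic:0) in(v1 : length(n)) in(v2 : length(n)) out(w : length(n))
        {
            /* Up to 4*(N_core - 1) OpenMP threads may run this loop on the card. */
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                w[i] = v1[i] - v2[i];
        }
    }

Built with an offload-capable Intel compiler, the braced block executes on the MIC; a compiler without MIC support will typically ignore the unrecognized pragma and run the loop on the host, which makes the pattern convenient for incremental porting.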
With the scheme of the present invention, the GMRES algorithm is implemented by a method in which a CPU cluster controls multiple MICs, which improves the computation speed.
Referring to Fig. 2, the present invention also proposes a system for implementing the GMRES algorithm, at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
In the system of the present invention, the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
In the system of the present invention, the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
It should be noted that the above embodiments are only intended to facilitate understanding by those skilled in the art and are not intended to limit the scope of protection of the present invention; without departing from the inventive concept of the present invention, any obvious substitutions and improvements made to the present invention by those skilled in the art fall within the scope of protection of the present invention.

Claims (8)

1. A method for implementing the generalized minimal residual (GMRES) algorithm, characterized by comprising:
a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks;
the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
each CPU cooperatively controlling the many integrated core coprocessors (MIC) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
2. The method according to claim 1, characterized in that the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks comprises:
the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows to obtain the coefficient matrix blocks and the right-hand-side vector blocks.
3. The method according to claim 1, characterized in that each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks comprises:
each CPU controlling each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controlling the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, wherein b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controlling the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sending the calculated β' to the host CPU to be combined into β;
each CPU solving for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, wherein H̄'_m is the upper Hessenberg matrix block;
each CPU controlling the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x'_0 + V'_m·y_m, and sending the calculated x'_m to the host CPU to be combined into x_m, wherein x'_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sending x_m to its own other processes and to each process of the other CPUs;
each CPU controlling each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculating ||r'_m||_2, and sending the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judging that ||r_m||_2 < ε and determining x_m to be the solution of the system of linear equations, wherein ε is the set error tolerance.
4. The method according to claim 3, characterized in that, when the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks further comprises:
each CPU cooperatively controlling the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
5. The method according to claim 3, characterized in that each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controlling the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and to return the calculation results to the host CPU to be combined, or to carry out low-order matrix-vector multiplications, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
6. A system for implementing the generalized minimal residual (GMRES) algorithm, characterized by at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
7. The system according to claim 6, characterized in that the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, wherein b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, wherein H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x'_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, wherein x'_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, wherein ε is the set error tolerance.
8. The system according to claim 7, characterized in that the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
CN201510551642.5A 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm Pending CN105045768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510551642.5A CN105045768A (en) 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm


Publications (1)

Publication Number Publication Date
CN105045768A true CN105045768A (en) 2015-11-11

Family

ID=54452326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510551642.5A Pending CN105045768A (en) 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm

Country Status (1)

Country Link
CN (1) CN105045768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897163A (en) * 2017-03-08 2017-06-27 郑州云海信息技术有限公司 A kind of algebra system method for solving and system based on KNL platforms
CN113191105A (en) * 2021-03-22 2021-07-30 梁文毅 Electrical simulation method based on distributed parallel operation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5490278A (en) * 1991-07-12 1996-02-06 Matsushita Electric Industrial Co., Ltd. Data processing method and apparatus employing parallel processing for solving systems of linear equations
WO2000075854A1 (en) * 1999-06-03 2000-12-14 Schlumberger Technology Corporation An improved simulation method and apparatus
CN104182209A (en) * 2014-08-27 2014-12-03 中国科学院软件研究所 PETSc-based GCRO-DR algorithm parallel processing method
CN104408019A (en) * 2014-10-29 2015-03-11 浪潮电子信息产业股份有限公司 Method for realizing GMRES (generalized minimum residual) algorithm parallel acceleration on basis of MIC (many integrated cores) platform
CN104615584A (en) * 2015-02-06 2015-05-13 中国人民解放军国防科学技术大学 Method for vectorization computing of solution of large-scale trigonometric linear system of equations for GPDSP


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
柳有权 et al.: "A fast GMRES-GPU algorithm for solving large-scale sparse linear systems", 《计算机辅助设计与图形学学报》 *
陈志 et al.: "Algorithms for solving large-scale sparse linear systems", 《北京工业大学学报》 *
黄林显 et al.: "Research on a preconditioned parallel GMRES(m) algorithm for large-region groundwater simulation", 《现代地质》 *


Similar Documents

Publication Publication Date Title
US8713576B2 (en) Load balancing for parallel tasks
Deveci et al. Exploiting geometric partitioning in task mapping for parallel computers
US9009719B2 (en) Computer workload capacity estimation using proximity tables
US9479449B2 (en) Workload partitioning among heterogeneous processing nodes
US9477532B1 (en) Graph-data partitioning for workload-balanced distributed computation with cost estimation functions
US20130198426A1 (en) Heterogeneous parallel systems for accelerating simulations based on discrete grid numerical methods
Dhurandher et al. A cluster-based load balancing algorithm in cloud computing
CN106528490B (en) FPGA heterogeneous acceleration computing device and system
Chen et al. Tology-aware optimal data placement algorithm for network traffic optimization
US9727529B2 (en) Calculation device and calculation method for deriving solutions of system of linear equations and program that is applied to the same
Adlerborn et al. A parallel QZ algorithm for distributed memory HPC systems
CN104484234A (en) Multi-front load flow calculation method and system based on GPU (graphics processing unit)
Woźniak et al. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines
CN105260342A (en) Solving method and system for symmetric positive definite linear equation set
CN105045768A (en) Method and system for achieving GMRES algorithm
Vaidyanathan et al. Improving communication performance and scalability of native applications on intel xeon phi coprocessor clusters
KR101585980B1 (en) CR Algorithm Processing Method for Actively Utilizing Shared Memory of Multi-Proceoosr and Processor using the same
CN107220702B (en) Computer vision processing method and device of low-computing-capacity processing equipment
CN110837395A (en) Normalization processing method, device and system for multi-GPU parallel training
CN102722470B (en) Single-machine parallel solving method for linear equation group
CN115346099A (en) Image convolution method, chip, equipment and medium based on accelerator chip
Khan et al. Cost Optimization Technique of Task Allocation in Heterogeneous Distributed Computing System
Kouzinopoulos et al. Performance study of parallel hybrid multiple pattern matching algorithms for biological sequences
Gogolenko et al. Structured orthogonal inversion of block p-cyclic matrices on multicores with GPU accelerators
Markowski Determination of positive realization of two dmensional systems using digraph theory and gpu computing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151111