CN105045768A - Method and system for achieving GMRES algorithm - Google Patents

Method and system for achieving GMRES algorithm

Info

Publication number
CN105045768A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510551642.5A
Other languages
Chinese (zh)
Inventor
王明清
张清
张广勇
吴韶华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201510551642.5A
Publication of CN105045768A
Legal status: Pending


Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a method and system for implementing the GMRES algorithm. The method comprises the steps that a host CPU controls a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations, thereby obtaining coefficient matrix blocks and right-hand-side vector blocks; the host CPU controls the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs; and each CPU cooperatively controls the MICs corresponding to its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks. By means of this scheme, the GMRES algorithm is implemented in parallel through a method in which a CPU cluster controls multiple MICs, and the computation speed is improved.

Description

Method and system for implementing the GMRES algorithm
Technical field
The present invention relates to data processing technology, and in particular to a method and system for implementing the generalized minimal residual (GMRES, Generalized Minimal Residual) algorithm.
Background art
It is well known that solving mathematical-physics models is an indispensable part of engineering production and scientific research in many fields. With the development of computers, a series of numerical methods have appeared in succession, such as the finite difference method (FD, Finite Difference), the finite element method (FEM, Finite Element Method), the boundary element method (BEM, Boundary Element Method), and meshless methods (MM, Meshless Method). FEM in particular now has a fairly complete theoretical system, is widely applied in fields such as machine manufacturing, materials processing, and scientific research, and has become an important tool in engineering design. These numerical methods share a common feature: they discretize the mathematical-physics model derived from a practical problem into a system of linear algebraic equations in a specific way. In addition, structural analysis, network analysis, geodetic surveying, and data-related and optimization problems frequently lead to the problem of solving a system of linear equations. It is therefore no exaggeration to say that most scientific and engineering problems ultimately reduce to solving a system of linear equations. However, as problem scale grows, solving the linear system becomes a major bottleneck in engineering production and scientific research. For large-scale and even ultra-large-scale linear systems, the solution becomes very difficult, especially when the sparse coefficient matrix is asymmetric and has no special structure, in which case the solution may hardly be completed at all.
GMRES is a common method for solving systems with asymmetric sparse coefficient matrices and is one of the classic algorithms in Krylov subspace methods; it iterates by minimizing the residual over a Krylov subspace, and has the advantages of fast convergence and good stability. For the system of linear equations Ax = b, where A is the coefficient matrix and b is the right-hand-side vector, the m-th order Krylov subspace used by the GMRES algorithm is K_m = span(r_0, A·r_0, ..., A^(m-1)·r_0), where r_0 = b − A·x_0 and x_0 is the initial vector for x. GMRES approaches the exact solution of Ax = b by solving for the vector x_m ∈ K_m that minimizes the residual ||A·x_m − b||_2.
The basic process of solving a large-scale system of linear equations with the GMRES algorithm roughly comprises:
initializing the vector x to obtain x_0, and setting the error tolerance ε;
calculating r_0 according to the formula r_0 = b − A·x_0, and calculating β according to the formula β = ||r_0||_2, where ||·||_2 denotes the 2-norm;
using the Arnoldi orthogonalization procedure with r_0 and A to calculate an orthonormal basis V_m and an upper Hessenberg matrix H̄_m;
solving for the y_m that minimizes ||β·e_1 − H̄_m·y_m||_2, where e_1 = (1, 0, ..., 0) (this minimization is written out explicitly below);
calculating x_m according to the formula x_m = x'_0 + V_m·y_m, where x'_0 is the x_m calculated the previous time (x_0 is used in the first iteration); calculating r_m according to the formula r_m = b − A·x_m; when ||r_m||_2 < ε, x_m is the solution of Ax = b; when ||r_m||_2 ≥ ε, using the Arnoldi orthogonalization procedure with r_m and A to obtain a new orthonormal basis and a new upper Hessenberg matrix, and continuing the subsequent process.
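For clarity, the least-squares problem in the steps above can be written out explicitly. The following restatement is standard GMRES theory rather than text taken from the patent; within one cycle, x_0 denotes the current initial guess, V_m the orthonormal basis, and H̄_m the (m+1)×m upper Hessenberg matrix produced by the Arnoldi process:

    A V_m = V_{m+1} \bar{H}_m, \qquad x_m = x_0 + V_m y_m,
    r_m = b - A x_m = V_{m+1} \left( \beta e_1 - \bar{H}_m y_m \right),
    \| r_m \|_2 = \| \beta e_1 - \bar{H}_m y_m \|_2, \qquad
    y_m = \arg\min_{y \in \mathbb{R}^m} \| \beta e_1 - \bar{H}_m y \|_2 .

Because V_{m+1} has orthonormal columns, minimizing the small (m+1)-by-m least-squares problem on the right is equivalent to minimizing the full residual, which is what makes the method attractive for large systems.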
The existing method of implementing the GMRES algorithm roughly comprises: using a CPU to implement the GMRES algorithm.
In the existing method of implementing the GMRES algorithm, because the amount of computation of the GMRES algorithm is large while the computing capability of a CPU is limited, the computation speed is slow.
Summary of the invention
In order to solve the above problem, the present invention proposes a method and system for implementing the GMRES algorithm, which can improve the computation speed.
In order to achieve the above object, the present invention proposes a method for implementing the GMRES algorithm, comprising:
a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations, thereby obtaining coefficient matrix blocks and right-hand-side vector blocks;
the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
each CPU cooperatively controlling the many integrated core coprocessors (MIC) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
Preferably, the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks comprises:
the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows to obtain the coefficient matrix blocks and the right-hand-side vector blocks.
Preferably, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks comprises:
each CPU controlling each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controlling the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controlling the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sending the calculated β' to the host CPU to be combined into β;
each CPU solving for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
each CPU controlling the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and sending the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sending x_m to its own other processes and to each process of the other CPUs;
each CPU controlling each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculating ||r'_m||_2, and sending the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judging that ||r_m||_2 < ε and determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
Preferably, when the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks further comprises:
each CPU cooperatively controlling the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
Preferably, each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controlling the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and to return the calculation results to the host CPU to be combined, or to carry out low-order matrix-vector multiplications, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
The present invention also proposes a system for implementing the GMRES algorithm, at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
Preferably, the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
Preferably, the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
Compared with the prior art, the technical scheme of the present invention comprises: a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs; and each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks. With the scheme of the present invention, the GMRES algorithm is implemented in parallel by a method in which a CPU cluster controls multiple MICs, which improves the computation speed.
Brief description of the drawings
The accompanying drawings of the embodiments of the present invention are described below; the drawings are intended to provide a further understanding of the present invention and, together with the description, serve to explain the present invention without limiting its scope.
Fig. 1 is a flow chart of the method for implementing the GMRES algorithm according to the present invention;
Fig. 2 is a schematic diagram of the structure of the system for implementing the GMRES algorithm according to the present invention.
Detailed description of the embodiments
To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the accompanying drawings; this description shall not be used to limit the scope of the present invention. It should be noted that, provided there is no conflict, the embodiments in this application and the features in the embodiments may be combined with one another.
Referring to Fig. 1, the present invention proposes a method for implementing the GMRES algorithm, comprising:
Step 100: the host CPU controls the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations, obtaining coefficient matrix blocks and right-hand-side vector blocks. This specifically comprises:
the host CPU controls the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows, obtaining the coefficient matrix blocks and the right-hand-side vector blocks.
During the partitioning, it should be ensured as far as possible that the numbers of rows of the coefficient matrix blocks and right-hand-side vector blocks that each CPU assigns to its processes are equal, and the coefficient matrix block and the right-hand-side vector block assigned to the same process should correspond to each other.
For example, the coefficient matrix and the right-hand-side vector can be partitioned as follows: when the number of rows of the coefficient matrix (e.g. 101) is not divisible by the number of processes (e.g. 10), the remaining rows are given to one or more of the processes (e.g. one of the processes is over-allocated by 1 row). In general, the rows assigned to one process are contiguous; a minimal sketch of such a partition is given below.
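As a minimal illustration of the row-wise partition just described, the following C fragment is a sketch under illustrative assumptions (the names row_block, n_rows and n_procs, and the deterministic placement of the remainder rows, are not from the patent, which allows the remainder to be given to any process). It assigns contiguous row ranges whose sizes differ by at most one:

    #include <stdio.h>

    /* Compute the contiguous row range [first, first+count) assigned to one
     * process so that every process receives either floor(n/p) or
     * floor(n/p)+1 rows; the first 'rem' processes take the extra rows. */
    static void row_block(int n_rows, int n_procs, int rank, int *first, int *count)
    {
        int base = n_rows / n_procs;
        int rem  = n_rows % n_procs;
        *count = base + (rank < rem ? 1 : 0);
        *first = rank * base + (rank < rem ? rank : rem);
    }

    int main(void)
    {
        int first, count;
        for (int rank = 0; rank < 10; ++rank) {   /* e.g. 101 rows over 10 processes */
            row_block(101, 10, rank, &first, &count);
            printf("process %d: rows %d..%d (%d rows)\n",
                   rank, first, first + count - 1, count);
        }
        return 0;
    }

With 101 rows and 10 processes this prints 11 rows for one process and 10 for each of the others, matching the over-allocation by one row mentioned above.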
Step 101: the host CPU controls the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs.
In this step, the coefficient matrix blocks and right-hand-side vector blocks may be distributed arbitrarily, as long as it is ensured that, for any one process, the position of its coefficient matrix block within the coefficient matrix corresponds to the position of its right-hand-side vector block within the right-hand-side vector. For example, if the coefficient matrix block of rows 1 to 10 is given to some process, the right-hand-side vector block of rows 1 to 10 must also be given to that process.
The host CPU may send the distributed coefficient matrix blocks and right-hand-side vector blocks to each process through the message passing interface (MPI, Message Passing Interface). MPI is a message passing interface released in May 1994 and jointly maintained by numerous parallel computer manufacturers, software development organizations, and parallel application institutions; it is currently one of the most popular parallel programming environments in the world, and in particular a programming paradigm for distributed-memory scalable parallel computers, workstation networks, and clusters. MPI programs are mainly written as Fortran+MPI or C+MPI and provide hundreds of function call interfaces that can be invoked directly. MPI has many advantages: it is portable and easy to use; it provides complete asynchronous communication functionality; and it has a formal and detailed specification. MPI has been implemented on PCs, on MS Windows, on all major UNIX/Linux workstations, and on mainstream parallel machines, and in distributed-memory environments built on message passing the benefits brought by MPI standardization are obvious. One possible realization of this distribution step is sketched below.
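The patent does not spell out code for this distribution step; the following C/MPI sketch is one possible realization under simplifying assumptions (dense row-major storage of A, although the patent targets sparse matrices, and illustrative names such as scatter_blocks). It shows how the host process, taken here to be rank 0, could send matching row blocks of A and b to every process with MPI_Scatterv:

    #include <mpi.h>
    #include <stdlib.h>

    /* Scatter row blocks of a dense n x n matrix A and of the right-hand-side
     * vector b from rank 0 to all processes; assumes MPI_Init has been called.
     * A and b only need to be valid on rank 0. */
    void scatter_blocks(int n, const double *A, const double *b,
                        double **A_local, double **b_local, int *rows_local)
    {
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int *rowcnt = malloc(nprocs * sizeof(int));
        int *rowoff = malloc(nprocs * sizeof(int));
        int *elemcnt = malloc(nprocs * sizeof(int));
        int *elemoff = malloc(nprocs * sizeof(int));
        for (int p = 0, off = 0; p < nprocs; ++p) {
            rowcnt[p] = n / nprocs + (p < n % nprocs ? 1 : 0);
            rowoff[p] = off;
            elemcnt[p] = rowcnt[p] * n;   /* a row block of A has rowcnt[p]*n entries */
            elemoff[p] = off * n;
            off += rowcnt[p];
        }

        *rows_local = rowcnt[rank];
        *A_local = malloc((size_t)rowcnt[rank] * n * sizeof(double));
        *b_local = malloc((size_t)rowcnt[rank] * sizeof(double));

        /* Matching row blocks of A and b go to the same process. */
        MPI_Scatterv(A, elemcnt, elemoff, MPI_DOUBLE,
                     *A_local, elemcnt[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatterv(b, rowcnt, rowoff, MPI_DOUBLE,
                     *b_local, rowcnt[rank], MPI_DOUBLE, 0, MPI_COMM_WORLD);

        free(rowcnt); free(rowoff); free(elemcnt); free(elemoff);
    }

In practice a sparse format such as CSR would be distributed instead of a dense block, but the pairing of each row block of A with the corresponding entries of b is the same.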
Step 102: each CPU cooperatively controls the many integrated core coprocessors (MIC, Many Integrated Core) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system. This specifically comprises:
each CPU controls each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controls the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controls the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controls the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sends the calculated β' to the host CPU to be combined into β;
each CPU solves for the y_m that minimizes ||β·e_1 − H̄_m·y_m||_2 according to the orthonormal basis and the upper Hessenberg matrix, where H̄_m is the upper Hessenberg matrix;
each CPU controls the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and sends the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sends x_m to its own other processes and to each process of the other CPUs;
each CPU controls each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculates ||r'_m||_2, and sends the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judges that ||r_m||_2 < ε and determines x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
When the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controls the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
In the first iteration, x_0 is generally initialized to the zero vector, so that x'_m = V'_m·y_m. One way the per-process norms could be combined into ||r_m||_2 is sketched below.
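How the per-process quantities are combined into ||r_m||_2 is described above only as a combined calculation on the host CPU; one natural realization, given here as an assumption rather than the patent's own code, is to reduce the local sums of squares to the host process and take a square root there. This works because the row blocks are disjoint, so the squared global norm is exactly the sum of the squared block norms:

    #include <mpi.h>
    #include <math.h>

    /* Each process owns rows_local entries of the residual r'_m; rank 0 (the
     * host process) returns ||r_m||_2 over the full vector, other ranks
     * return 0.0 here. */
    double global_residual_norm(const double *r_local, int rows_local)
    {
        double local_sq = 0.0, global_sq = 0.0;
        for (int i = 0; i < rows_local; ++i)
            local_sq += r_local[i] * r_local[i];

        MPI_Reduce(&local_sq, &global_sq, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        return rank == 0 ? sqrt(global_sq) : 0.0;
    }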
Each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controls the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and returns the calculation results to the host CPU to be combined, or carries out low-order matrix-vector multiplications, low-order vector additions, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
Here, "high-order" means that the order (dimension) of the matrix or vector participating in the operation is greater than or equal to a predetermined threshold.
Each CPU may also control the MIC corresponding to each of its processes to calculate the other operations lying between two adjacent high-order operations (high-order matrix-vector multiplication, high-order vector addition, high-order vector subtraction, high-order vector inner product, and high-order vector product), so as to reduce communication between the processes and improve operating efficiency.
When each CPU controls the MIC corresponding to each of its processes to calculate a high-order matrix-vector multiplication, high-order vector addition, high-order vector subtraction, high-order vector inner product, or high-order vector product in the Arnoldi orthogonalization procedure, the host CPU first controls the host process to partition the matrix or vector participating in the operation and to send the parts to the other processes; the MICs controlled by those processes first carry out the matrix-vector multiplication, vector addition, vector subtraction, vector inner product, or vector product on the partitioned matrix blocks or vector blocks, and the calculation results are then sent to the host process to be combined. One such local kernel is sketched below.
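As an example of the high-order matrix-vector multiplication kernel that a MIC would execute on its process's row block, the following C/OpenMP sketch computes the local product w' = A'·v; the dense storage and the names local_matvec, rows_local and A_local are illustrative assumptions (the patent itself targets sparse matrices), and the OpenMP loop is where the MIC's many threads supply the parallelism:

    #include <stddef.h>

    /* Local part of w = A * v: this process's row block A' (rows_local x n,
     * stored row-major) times the full vector v, giving rows_local entries of w. */
    void local_matvec(int rows_local, int n, const double *A_local,
                      const double *v, double *w_local)
    {
        #pragma omp parallel for
        for (int i = 0; i < rows_local; ++i) {
            double sum = 0.0;
            for (int j = 0; j < n; ++j)
                sum += A_local[(size_t)i * n + j] * v[j];
            w_local[i] = sum;
        }
    }

The partial results w' from all processes are then gathered by the host process and combined, as described above.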
MIC is a many-core processor launched by Intel for high-performance parallel computing. It was developed on the basis of the existing Xeon processor products and is a new architecture aimed at very high performance computing. MIC is not intended to replace the CPU in the computer architecture; it exists as a coprocessor. Compared with a general multi-core processor, a MIC card has a larger number of smaller, simpler cores and hardware threads: a MIC chip usually has more than 50 simplified x86 cores, each core supporting 4 hardware threads. In addition, it has a 512-bit vector width, and its double-precision peak floating-point performance reaches 1 TFLOPS (floating-point operations per second). MIC technology will accelerate the development of high-performance computing and quickly resolve the performance bottlenecks of high-performance computing applications.
The MIC may use multiple threads to calculate the high-order matrix-vector multiplications, high-order vector additions (or high-order vector subtractions), high-order vector inner products, and high-order vector products, and the number of threads each MIC can open is 4·(N_core − 1), where N_core is the number of cores of the MIC.
When MIC computation is needed, the CPU offloads the program that needs to run on the MIC onto the MIC card in offload mode, thereby realizing the computation on the MIC. For example, a program of the following form, in which the content in { } is the code that requires MIC computation, can be used to offload that code onto the MIC:
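The following is a hedged sketch of such an offload construct, assuming the Intel offload pragma syntax used for MIC cards; the function name vector_sub_on_mic, the variable names, and the vector-subtraction body are illustrative rather than the patent's actual listing. The braced block after the pragma is the program that runs on the MIC:

    /* Offload one high-order vector operation to MIC card 0; the in/out
     * clauses describe the data moved between host and card. */
    void vector_sub_on_mic(int n, double *v1, double *v2, double *w)
    {
    #pragma offload target(mic:0) in(v1 : length(n)) in(v2 : length(n)) out(w : length(n))
        {
            /* Up to 4*(N_core - 1) OpenMP threads may run this loop on the card. */
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                w[i] = v1[i] - v2[i];
        }
    }

Built with an offload-capable Intel compiler, the braced block executes on the MIC; a compiler without MIC support will typically ignore the unrecognized pragma and run the loop on the host, which makes the pattern convenient for incremental porting.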
With the scheme of the present invention, the GMRES algorithm is implemented by a method in which a CPU cluster controls multiple MICs, which improves the computation speed.
Referring to Fig. 2, the present invention also proposes a system for implementing the GMRES algorithm, at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
In the system of the present invention, the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, where b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, where H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x''_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, where x''_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, where ε is the set error tolerance.
In the system of the present invention, the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
It should be noted that the above embodiments are only intended to facilitate understanding by those skilled in the art and are not intended to limit the scope of protection of the present invention; without departing from the inventive concept of the present invention, any obvious substitutions and improvements made to the present invention by those skilled in the art fall within the scope of protection of the present invention.

Claims (8)

1. A method for implementing the generalized minimal residual (GMRES) algorithm, characterized by comprising:
a host CPU controlling a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks;
the host CPU controlling the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
each CPU cooperatively controlling the many integrated core coprocessors (MIC) corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
2. The method according to claim 1, characterized in that the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks comprises:
the host CPU controlling the host process to partition the coefficient matrix and the right-hand-side vector of the system of linear equations by rows to obtain the coefficient matrix blocks and the right-hand-side vector blocks.
3. The method according to claim 1, characterized in that each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks comprises:
each CPU controlling each of its processes to initialize the solution of the system of linear equations and to set the error tolerance;
each CPU controlling the MIC corresponding to each of its processes to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, wherein b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
each CPU controlling the MIC corresponding to each of its processes to calculate β' according to the formula β' = ||r'_0||_2, and sending the calculated β' to the host CPU to be combined into β;
each CPU solving for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, wherein H̄'_m is the upper Hessenberg matrix block;
each CPU controlling the MIC corresponding to each of its processes to calculate x'_m according to the formula x'_m = x'_0 + V'_m·y_m, and sending the calculated x'_m to the host CPU to be combined into x_m, wherein x'_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
the host CPU sending x_m to its own other processes and to each process of the other CPUs;
each CPU controlling each of its processes to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculating ||r'_m||_2, and sending the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
the host CPU judging that ||r_m||_2 < ε and determining x_m to be the solution of the system of linear equations, wherein ε is the set error tolerance.
4. The method according to claim 3, characterized in that, when the host CPU judges that ||r_m||_2 ≥ ε, each CPU cooperatively controlling the MIC corresponding to each of its processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks further comprises:
each CPU cooperatively controlling the MIC corresponding to each of its processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
5. The method according to claim 3, characterized in that each CPU cooperatively controlling the MIC corresponding to each of its processes to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure according to the distributed coefficient matrix block and r'_0 comprises:
each CPU controlling the MIC corresponding to each of its processes to calculate the high-order matrix-vector multiplications, high-order vector subtractions, high-order vector inner products, and high-order vector products in the Arnoldi orthogonalization procedure and to return the calculation results to the host CPU to be combined, or to carry out low-order matrix-vector multiplications, low-order vector subtractions, low-order vector inner products, low-order vector products, or other low-order operations.
6. A system for implementing the generalized minimal residual (GMRES) algorithm, characterized by at least comprising:
a host CPU, one or more other CPUs, and one or more many integrated core coprocessors (MIC);
wherein the host CPU is configured to:
control a host process to partition the coefficient matrix and the right-hand-side vector of a system of linear equations to obtain coefficient matrix blocks and right-hand-side vector blocks; and control the host process to distribute the coefficient matrix blocks and right-hand-side vector blocks obtained by the partitioning to each of its own processes and each process of the other CPUs;
and the host CPU and the other CPUs are configured to:
cooperatively control the MIC corresponding to each of their processes to solve the system of linear equations with the GMRES algorithm according to the distributed coefficient matrix blocks and right-hand-side vector blocks, thereby obtaining the solution of the system.
7. The system according to claim 6, characterized in that the host CPU and the other CPUs are specifically configured to:
control each process to initialize the solution of the system of linear equations and to set the error tolerance;
control the MIC corresponding to each process to calculate r'_0 according to the formula r'_0 = b' − A'·x_0, wherein b' is the distributed right-hand-side vector block, A' is the distributed coefficient matrix block, and x_0 is the initialized solution of the system of linear equations;
cooperatively control the MIC corresponding to each process to calculate an orthonormal matrix and an upper Hessenberg matrix block with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_0;
control the MIC corresponding to each process to calculate β' according to the formula β' = ||r'_0||_2, and send the calculated β' to the host CPU to be combined into β;
solve for the y_m that minimizes ||β·e_1 − H̄'_m·y_m||_2 according to the orthonormal matrix and the upper Hessenberg matrix block, wherein H̄'_m is the upper Hessenberg matrix block;
control the MIC corresponding to each process to calculate x'_m according to the formula x'_m = x'_0 + V'_m·y_m, and send the calculated x'_m to the host CPU to be combined into x_m, wherein x'_0 is the x'_m calculated last time and V'_m is the orthonormal matrix;
control each process to calculate r'_m according to the formula r'_m = b' − A'·x_m, calculate ||r'_m||_2, and send the calculated ||r'_m||_2 to the host CPU to be combined into ||r_m||_2;
and the host CPU is further configured to:
send x_m to its own other processes and to each process of the other CPUs; and judge that ||r_m||_2 < ε, determining x_m to be the solution of the system of linear equations, wherein ε is the set error tolerance.
8. The system according to claim 7, characterized in that the host CPU is further configured to:
judge that ||r_m||_2 ≥ ε, and control the host process to send a notification message to its own other processes and to each process of the other CPUs;
and the host CPU and the other CPUs are further configured to:
receive the notification message, and cooperatively control the MIC corresponding to each of their processes to continue with the step of calculating an orthonormal basis and an upper Hessenberg matrix with the Arnoldi orthogonalization procedure, according to the distributed coefficient matrix block and r'_m.
CN201510551642.5A 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm Pending CN105045768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510551642.5A CN105045768A (en) 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm


Publications (1)

Publication Number Publication Date
CN105045768A true CN105045768A (en) 2015-11-11

Family

ID=54452326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510551642.5A Pending CN105045768A (en) 2015-09-01 2015-09-01 Method and system for achieving GMRES algorithm

Country Status (1)

Country Link
CN (1) CN105045768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106897163A (en) * 2017-03-08 2017-06-27 郑州云海信息技术有限公司 A kind of algebra system method for solving and system based on KNL platforms
CN113191105A (en) * 2021-03-22 2021-07-30 梁文毅 Electrical simulation method based on distributed parallel operation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5490278A (en) * 1991-07-12 1996-02-06 Matsushita Electric Industrial Co., Ltd. Data processing method and apparatus employing parallel processing for solving systems of linear equations
WO2000075854A1 (en) * 1999-06-03 2000-12-14 Schlumberger Technology Corporation An improved simulation method and apparatus
CN104182209A (en) * 2014-08-27 2014-12-03 中国科学院软件研究所 PETSc-based GCRO-DR algorithm parallel processing method
CN104408019A (en) * 2014-10-29 2015-03-11 浪潮电子信息产业股份有限公司 Method for realizing GMRES (generalized minimum residual) algorithm parallel acceleration on basis of MIC (many integrated cores) platform
CN104615584A (en) * 2015-02-06 2015-05-13 中国人民解放军国防科学技术大学 Method for vectorization computing of solution of large-scale trigonometric linear system of equations for GPDSP


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
柳有权 et al.: "A fast GMRES-GPU algorithm for solving large-scale sparse linear systems", 《计算机辅助设计与图形学学报》 *
陈志 et al.: "Algorithms for solving large-scale sparse linear systems", 《北京工业大学学报》 *
黄林显 et al.: "Research on a preconditioned parallel GMRES(m) algorithm for large-region groundwater simulation", 《现代地质》 *


Similar Documents

Publication Publication Date Title
US8713576B2 (en) Load balancing for parallel tasks
Deveci et al. Exploiting geometric partitioning in task mapping for parallel computers
US9009719B2 (en) Computer workload capacity estimation using proximity tables
US9479449B2 (en) Workload partitioning among heterogeneous processing nodes
US9477532B1 (en) Graph-data partitioning for workload-balanced distributed computation with cost estimation functions
US20130198426A1 (en) Heterogeneous parallel systems for accelerating simulations based on discrete grid numerical methods
Dhurandher et al. A cluster-based load balancing algorithm in cloud computing
CN106528490B (en) FPGA heterogeneous acceleration computing device and system
Chen et al. Tology-aware optimal data placement algorithm for network traffic optimization
US9727529B2 (en) Calculation device and calculation method for deriving solutions of system of linear equations and program that is applied to the same
Adlerborn et al. A parallel QZ algorithm for distributed memory HPC systems
CN104484234A (en) Multi-front load flow calculation method and system based on GPU (graphics processing unit)
Woźniak et al. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines
CN105260342A (en) Solving method and system for symmetric positive definite linear equation set
CN105045768A (en) Method and system for achieving GMRES algorithm
Vaidyanathan et al. Improving communication performance and scalability of native applications on intel xeon phi coprocessor clusters
KR101585980B1 (en) CR Algorithm Processing Method for Actively Utilizing Shared Memory of Multi-Proceoosr and Processor using the same
CN107220702B (en) Computer vision processing method and device of low-computing-capacity processing equipment
CN110837395A (en) Normalization processing method, device and system for multi-GPU parallel training
CN102722470B (en) Single-machine parallel solving method for linear equation group
CN115346099A (en) Image convolution method, chip, equipment and medium based on accelerator chip
Khan et al. Cost Optimization Technique of Task Allocation in Heterogeneous Distributed Computing System
Kouzinopoulos et al. Performance study of parallel hybrid multiple pattern matching algorithms for biological sequences
Gogolenko et al. Structured orthogonal inversion of block p-cyclic matrices on multicores with GPU accelerators
Markowski Determination of positive realization of two dmensional systems using digraph theory and gpu computing method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151111