CN106874113A

CN106874113A - Static security analysis computation method in a CPU + multi-GPU heterogeneous mode

Info

Publication number: CN106874113A
Application number: CN201710037124.0A
Authority: CN
Inventors: 陆娟娟; 温柏坚; 王毅; 陆进军; 郭文鑫; 闪鑫; 彭龙; 查国强; 王彬
Original assignee: Nari Technology Co Ltd; NARI Nanjing Control System Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Current assignee: Nari Technology Co Ltd; NARI Nanjing Control System Co Ltd; Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date: 2017-01-19
Filing date: 2017-01-19
Publication date: 2017-06-20
Also published as: WO2018133348A1

Abstract

The invention discloses a static security analysis computation method in a CPU + multi-GPU heterogeneous mode. To meet the need for fast static security analysis scanning of large power grids in practical engineering applications, the method runs on the CUDA (Compute Unified Device Architecture) platform and uses OpenMP multithreading to allocate CPU threads according to the system's GPU configuration and the computation demand, with each thread bound to exactly one GPU. Built on mixed CPU and GPU programming, the CPU + multi-GPU heterogeneous computing mode completes contingency computations in parallel through CPU-GPU cooperation. On the basis of the power flow calculation for a single contingency, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, and element-level fine-grained parallelism greatly improves the parallel processing capability of contingency scanning in static security analysis. The method provides strong technical support for online security analysis and early-warning scanning in integrated dispatching systems of interconnected power grids.

Description

Static security analysis computation method in a CPU + multi-GPU heterogeneous mode

Technical field

The present invention relates to the technical field of power system dispatching automation, and in particular to a static security analysis computation method in a CPU + multi-GPU heterogeneous mode.

Background art

With the rapid development of integrated power system dispatching technology, the scale of power grid computation keeps growing. When static security analysis is performed on a whole-network state estimation snapshot, computation speed struggles to meet demand. On the one hand, the number of contingencies requiring outage analysis increases significantly; even after screening with the DC method, the number of contingencies requiring detailed outage scanning remains considerable. On the other hand, every contingency requires an outage power flow simulation, and the growing number of computation nodes slows down each individual power flow calculation. In addition, the expansion of the grid greatly increases the number of devices and stability sections that must be monitored for limit violations, so the computational load keeps growing. The rapid development of modern power systems therefore poses a severe test for the computational performance of static security analysis applications in dispatching systems. Conventional serial algorithms combine DC-method screening with techniques such as partial factorization based on sparse matrices to filter out non-critical faults and effectively speed up individual outage analyses, but they still cannot meet engineering requirements.

Analysis of the solution process shows that the contingencies in static security analysis are decoupled from and independent of one another, so the analysis can be decomposed into multiple independent full power flow calculations. This makes it inherently suited to parallel processing, a property that has gradually attracted researchers' attention and prompted attempts to achieve breakthroughs with parallel algorithms.

In recent years, NVIDIA introduced the CUDA (Compute Unified Device Architecture) architecture, which provides a good parallel computing framework for GPUs (Graphics Processing Units) and is applicable to research areas of power systems that exhibit parallel characteristics. Meanwhile, on the CPU side, OpenMP multithreading for shared-memory parallel systems parallelizes loops automatically through the compiler and can effectively improve application performance on multiprocessor systems.

In summary, for fast-scanning scenarios in static security analysis, how to make effective use of GPU and CPU parallel techniques to meet the ever-increasing requirements of engineering sites on the computation scale and efficiency of static security analysis in dispatching automation systems is a research direction well worth exploring.

Summary of the invention

In view of the above deficiencies of the prior art, the technical problem to be solved by the invention is to provide a static security analysis computation method in a CPU + multi-GPU heterogeneous mode. To meet the need for fast static security analysis scanning of large power grids in practical engineering applications, the method runs on the CUDA platform and uses OpenMP multithreading to allocate CPU threads according to the system's GPU configuration and the computation demand, with each thread bound to exactly one GPU. Built on mixed CPU and GPU programming, the CPU + multi-GPU heterogeneous computing mode completes contingency computations in parallel through CPU-GPU cooperation. On the basis of the power flow calculation for a single contingency, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, and element-level fine-grained parallelism greatly improves the parallel processing capability of contingency scanning in static security analysis, providing strong technical support for online security analysis and early-warning scanning in integrated dispatching systems of interconnected power grids.

To achieve the above object, the invention provides a static security analysis computation method in a CPU + multi-GPU heterogeneous mode, characterized in that, on the CUDA platform, OpenMP multithreading for shared-memory parallel systems is used; the GPU configuration and the contingency computation demand are considered together to determine the number of CPU threads and build the CPU + multi-GPU heterogeneous mode; within each GPU card, the parallel scanning task is completed cooperatively in a CPU + single-GPU mode; and, on the basis of single-contingency power flow calculation, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, with element-level fine-grained parallelism greatly improving the parallel processing capability of contingency scanning in static security analysis.

The above static security analysis computation method in a CPU + multi-GPU heterogeneous mode is characterized by comprising the following steps:

1) Obtain a real-time state estimation snapshot and perform the base-case power flow calculation with the Newton-Raphson method, providing a shared data snapshot that can be reused;

2) According to user requirements, perform a topology scan of all network equipment to form the contingency set, and perform a deep topology search for each contingency to form the actual outaged branch-node information;

3) Initialize the CUDA architecture, pack the data according to the needs of multi-outage parallel computation, and allocate GPU memory;

4) According to the GPU configuration and the actual computational load, allocate the outage computing resources rationally, evaluate the number of GPUs to enable, and use OpenMP to create the corresponding number of threads so as to exploit the parallel computing capability of the GPUs to the fullest;

5) Under the coordination of the CPU, the power flow iteration process, including admittance matrix modification, Jacobian matrix evaluation, solution of the correction equations, state variable update, branch power flow calculation and equipment limit-violation checking, is handed entirely to the GPU for parallel execution. All outages iterate in a highly parallel manner; a kernel function is constructed for each step according to the parallel characteristics of its computing task, and the element-level parallel computing tasks within the iteration are completed at the fine-grained parallel level;

6) Determine whether all contingencies have been scanned; if not, return to step 4) and reallocate resources for the remaining outages; if all are complete, go to step 7);

7) Result display: according to the scan results, display the limit-violation and heavy-loading information of the equipment or sections affected by the outages, and compile scan result statistics according to the practical requirements of the dispatching system's static security analysis module.

The above static security analysis computation method in a CPU + multi-GPU heterogeneous mode is characterized in that step 4) specifically comprises:

(1) According to the outage computation demand and the GPU configuration, optimize the allocation of outage computing resources and evaluate the maximum number of outages that a single GPU can scan at one time:

$$M_{\max} = \frac{S_{\max}}{S} \tag{1}$$

where $S_{\max}$ is the total memory of a GPU card, $S$ is the memory required for a single power flow calculation, and $M_{\max}$ is the maximum number of outages that a single GPU card can compute;

According to formula (1), the number of GPU cards that need to be started is:

$$n = \begin{cases} \alpha\left(\dfrac{M_{cal}}{M_{\max}}\right) + 1, & M_{\max} \cdot N > M_{cal} \\ N, & M_{\max} \cdot N \le M_{cal} \end{cases} \tag{2}$$

where $\alpha(\cdot)$ denotes rounding to an integer, $n$ is the number of GPU cards to start in the current round, $N$ is the total number of GPU cards configured in the system, $M_{cal}$ is the total number of outages to be computed in the current round of static security analysis, and $M_{\max} \cdot N$ is the number of outages the GPUs can compute together in one pass;

According to formula (2) and the principle of equal distribution, the number of outages actually computed by each card is:

$$m = \begin{cases} \dfrac{M_{cal}}{n}, & M_{\max} \cdot N > M_{cal} \\ M_{\max}, & M_{\max} \cdot N \le M_{cal} \end{cases} \tag{3}$$

When $M_{\max} \cdot N \le M_{cal}$, not all outage computations can be completed in one pass even with every GPU participating. The remaining outages $M'_{cal}$ are then redistributed in a new round according to formulas (2) and (3) and computed in several batches, where:

$$M'_{cal} = M_{cal} - M_{\max} \cdot N \tag{4}$$

(2) On the basis of the shared, reusable base-case snapshot model data, OpenMP multithreading allocates a number of CPU threads equal to the number n of GPU cards actually running, with each CPU thread bound to exactly one GPU, building the CPU + multi-GPU heterogeneous mode; within each GPU card, parallel computation proceeds in the CPU + single-GPU heterogeneous mode.
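The allocation rule of formulas (1) to (4) can be illustrated with a minimal Python sketch. The function names `plan_round` and `plan_all_rounds` are hypothetical, the rounding operator α is assumed to be a floor, and the per-card share of formula (3) is assumed to be rounded up so that no outage is dropped; the source does not state either detail.

```python
import math


def plan_round(m_cal, n_gpus, s_max, s_single):
    """Plan one scan round; returns (cards_started, outages_per_card, remaining)."""
    m_max = s_max // s_single            # formula (1), truncated to whole outages
    if m_max < 1:
        raise ValueError("one power flow case does not fit in GPU memory")
    if m_max * n_gpus > m_cal:           # all outages fit in a single round
        n = m_cal // m_max + 1           # formula (2), alpha assumed to be floor
        per_card = math.ceil(m_cal / n)  # formula (3), rounded up (assumption)
        return n, per_card, 0
    # Every card runs at full capacity; the rest waits for the next round.
    return n_gpus, m_max, m_cal - m_max * n_gpus   # formulas (2) to (4)


def plan_all_rounds(m_cal, n_gpus, s_max, s_single):
    """Repeat plan_round until no outages remain, mirroring the loop of step 6)."""
    rounds = []
    while m_cal > 0:
        n, per_card, m_cal = plan_round(m_cal, n_gpus, s_max, s_single)
        rounds.append((n, per_card))
    return rounds
```

Under these assumptions, 1000 outages on 4 cards that each hold 100 cases would be planned as two full rounds followed by a final round on 3 cards with at most 67 outages each.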

The above static security analysis computation method in a CPU + multi-GPU heterogeneous mode is characterized in that step 5) comprises the following steps:

(1) On the basis of the base-case admittance matrix Y0 and according to the N-1 outaged branch-node information, define the kernel function fun_kernel_1, which performs the local modification of the node admittance matrix for all outages assigned to the card simultaneously and modifies the node injection information; equipment outages are simulated through these changes to the admittance matrix and the node information;
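As a rough illustration of the local admittance correction in sub-step (1), the following Python sketch removes one branch from a sparse admittance matrix by touching only four entries. It is not the patent's fun_kernel_1: the dictionary storage, the helper names `build_ybus` and `apply_outage`, and the simple pi-model branch without a transformer tap are all assumptions made for the example.

```python
def build_ybus(branches):
    """Assemble a sparse bus admittance matrix as {(row, col): complex}."""
    y = {}
    for i, j, y_series in branches:
        for key, val in (((i, i), y_series), ((j, j), y_series),
                         ((i, j), -y_series), ((j, i), -y_series)):
            y[key] = y.get(key, 0j) + val
    return y


def apply_outage(y_base, i, j, y_series, b_charging=0.0):
    """Return a copy of y_base with branch i-j removed (local 4-entry update)."""
    y = dict(y_base)                      # the shared base case stays untouched
    half = 1j * b_charging / 2            # half of the line charging at each end
    y[(i, i)] -= y_series + half
    y[(j, j)] -= y_series + half
    y[(i, j)] += y_series
    y[(j, i)] += y_series
    return y
```

Because each outage only copies and patches the shared base case, the corrections for different outages are independent of one another, which is the property that lets them run side by side on one card.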

(2) Using the outage admittance matrix Ym formed in (1), complete the element-parallel computation of the Jacobian matrix. To make it convenient to call CUDA kernel functions, each element of the block Jacobian matrix in the power flow calculation is computed with the following formulas:

$$\begin{aligned} H_{ii} &= V_i^2 B_{ii} + Q_i \\ H_{ij} &= -V_i V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}), \quad i \ne j \end{aligned} \tag{5}$$

$$\begin{aligned} N_{ii} &= -V_i^2 G_{ii} - P_i \\ N_{ij} &= -V_i V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}), \quad i \ne j \end{aligned} \tag{6}$$

$$\begin{aligned} J_{ii} &= V_i^2 G_{ii} - P_i \\ J_{ij} &= V_i V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}), \quad i \ne j \end{aligned} \tag{7}$$

$$\begin{aligned} L_{ii} &= V_i^2 B_{ii} - Q_i \\ L_{ij} &= -V_i V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}), \quad i \ne j \end{aligned} \tag{8}$$

where $G_{ii}, B_{ii}, G_{ij}, B_{ij}$ are nonzero elements of the admittance matrix, $\theta_{ij}$ is the phase angle difference between nodes i and j, $P_i, Q_i$ are the injected powers at node i, and $V_i, V_j$ are the voltage magnitudes at nodes i and j;

Every element of the block matrices involves only the state variables (node voltage magnitudes and phase angles), the admittance matrix elements, basic arithmetic and trigonometric functions, and the elements are computed independently of one another, so the computation has obvious parallel characteristics;

Four kernel functions are defined, one for the element values of each block matrix. On a single card, the H-matrix workload for m outages is:

$$H = m \times h_{nozero} \tag{9}$$

where $h_{nozero}$ is the number of nonzero elements of the H matrix and m is the number of outages;

In the CUDA architecture, a GPU has multiple SMs (streaming multiprocessors) and can schedule many threads. The h elements are not coupled in any way and exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined to carry out the vector multiplications and additions of formula (5). The CPU calls fun_kernel_2_H, and the GPU launches h GPU threads according to the arguments passed to fun_kernel_2_H, performing the operations of formula (5) in parallel;
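The one-thread-per-nonzero idea behind fun_kernel_2_H can be sketched on the CPU in Python, with a flat loop index standing in for the GPU thread index. The data layout (a dictionary per outage case), the names `h_element` and `build_h_all_cases`, and the assumption that all outage cases share one sparsity pattern are illustrative choices, not details given in the source.

```python
import math


def h_element(i, j, v, theta, g, b, q):
    """One H-block entry per formula (5); this is the work of a single thread."""
    if i == j:
        return v[i] ** 2 * b[(i, i)] + q[i]
    t = theta[i] - theta[j]
    return -v[i] * v[j] * (g[(i, j)] * math.sin(t) - b[(i, j)] * math.cos(t))


def build_h_all_cases(cases, pattern):
    """Fill H for every outage case; total work is m * h_nonzero, formula (9)."""
    h_nz = len(pattern)                 # nonzero positions, assumed shared by all cases
    total = len(cases) * h_nz
    out = [0.0] * total
    for tid in range(total):            # each iteration plays the role of one GPU thread
        c, e = divmod(tid, h_nz)        # outage index, nonzero index within H
        i, j = pattern[e]
        case = cases[c]
        out[tid] = h_element(i, j, case["v"], case["theta"],
                             case["g"], case["b"], case["q"])
    return out
```

No iteration reads a value written by another, so the loop body maps directly onto independent threads; the other three blocks of formulas (6) to (8) would follow the same pattern.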

(3) Using the outage admittance matrix Ym formed in (1) and the initial node injections, compute the node injected power residuals according to formula (10):

$$\begin{aligned} \Delta P_i &= P_{is} - V_i \sum_{j \in i} V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}) \\ \Delta Q_i &= Q_{is} - V_i \sum_{j \in i} V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}) \end{aligned} \tag{10}$$

where $P_{is}, Q_{is}$ are the active and reactive injections of node i. As the formula shows, each node's residual involves only the state variables (node voltage magnitudes and phase angles), the admittance matrix elements, basic arithmetic and trigonometric functions, and is computed independently of the power residuals of other nodes. The kernel function fun_kernel_3 is defined, and multiple threads in the SMs are enabled to complete the parallel computation of the power mismatches of all computation nodes for the m outages; the detailed parallel computation process is similar to step (2);
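A small Python sketch of formula (10) for a single node shows why the residuals parallelize cleanly. The function name `mismatch`, the dictionary-based admittance storage and the `neighbours` adjacency map (which must include node i itself, matching the sum over j ∈ i) are assumptions for illustration.

```python
import math


def mismatch(i, p_spec, q_spec, v, theta, g, b, neighbours):
    """Power residuals (dP_i, dQ_i) of node i per formula (10)."""
    p_calc = 0.0
    q_calc = 0.0
    for j in neighbours[i]:             # nodes connected to i, including i itself
        t = theta[i] - theta[j]
        p_calc += v[j] * (g[(i, j)] * math.cos(t) + b[(i, j)] * math.sin(t))
        q_calc += v[j] * (g[(i, j)] * math.sin(t) - b[(i, j)] * math.cos(t))
    return p_spec[i] - v[i] * p_calc, q_spec[i] - v[i] * q_calc
```

Each call reads shared state and writes only its own result, so one thread per (outage, node) pair would suffice on the GPU.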

(4) Check whether the power residuals satisfy the convergence criterion; if so, jump to (8); if not, proceed to (5) and continue iterating;

First, define the kernel function fun_kernel_4, with which the GPU checks the convergence of the node power residuals of each outage according to formula (11); if an outage satisfies the convergence condition, record that outage as converged;

$$\lVert \Delta P^t, \Delta Q^t \rVert < \varepsilon, \quad t \le T \tag{11}$$

where $\varepsilon$ is the power convergence criterion, $\Delta P^t, \Delta Q^t$ are the power deviations at iteration t, and T is the maximum number of iterations;

Second, because the outages converge after different numbers of iterations, a threshold is set: once a proportion k ≥ 80% of the outages have converged, the power flow iteration ends; otherwise all outages continue iterating until the condition k ≥ 80% is satisfied or the maximum number of iterations is reached, and the process then jumps to step (8);
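The batch stopping rule of sub-step (4) can be sketched as follows. The norm in formula (11) is not specified in the source, so this example assumes the infinity norm (largest absolute residual); the function name `check_convergence` and its argument layout are likewise hypothetical.

```python
def check_convergence(residuals, eps, t, t_max, ratio=0.8):
    """Return (per-outage converged flags, stop flag) for iteration t.

    residuals holds one list of dP/dQ values per outage case.
    """
    # Formula (11), with the norm assumed to be the largest absolute residual.
    converged = [max(abs(r) for r in case) < eps for case in residuals]
    k = sum(converged) / len(converged)       # share of converged outages
    stop = k >= ratio or t >= t_max           # 80% rule or iteration cap
    return converged, stop
```

Stopping on a converged share rather than on the slowest case keeps the synchronized batch from being held up by a few slowly converging or divergent outages.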

(5) Perform LU decomposition on the Jacobian matrix formed in (2) and, together with the node power residuals computed in (3), solve the system of linear equations; define the kernel function fun_kernel_5 to solve the linear systems with task-level parallelism;

(6) Update the initial state vector with the solution of the linear system obtained in (5) according to formula (12):

$$x_i^{k+1} = x_i^k + \Delta x_i^k, \quad i = 1, 2, \ldots, 2(n-1) - r \tag{12}$$

where $x_i^k$ and $x_i^{k+1}$ are the values of the i-th state variable of the power flow equations before and after the k-th iteration of the linear system, $\Delta x_i^k$ is the k-th correction, n is the number of system nodes, and r is the number of PV nodes;

As the formula shows, each node's update depends only on that node's own increment in the iteration and not on the computed values of any other node, so it is naturally parallel, and the node voltage update can be carried out by GPU threads in parallel;

Formula (12) is defined as the addition kernel function fun_kernel_6. For the parallel task of M contingencies, a total of M × (2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state variable update;
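The thread count M × (2(n-1)-r) and the element-wise update of formula (12) can be checked with a short Python sketch. The names `state_width` and `update_states` are hypothetical, and the sequential loop only stands in for the threads that fun_kernel_6 would launch.

```python
def state_width(n_nodes, n_pv):
    """Number of state variables per outage: 2(n-1) - r."""
    return 2 * (n_nodes - 1) - n_pv


def update_states(x, dx):
    """Apply x <- x + dx in place for all outages; returns the thread count used."""
    width = len(x[0])
    total = len(x) * width              # M * (2(n-1) - r) independent additions
    for tid in range(total):            # one loop iteration per would-be GPU thread
        c, i = divmod(tid, width)       # outage index, state-variable index
        x[c][i] += dx[c][i]             # formula (12)
    return total
```

For example, a 4-node system with one PV node has 5 state variables per outage, so two outages would need 10 threads for one update.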

(7) Jump to (2) and continue iterating;

(8) Using the power flow results of the currently converged outages, compute the branch power flows. Because each branch flow depends only on the branch parameters and the voltages at its two end nodes, the branch flows are mutually independent; define the kernel function fun_kernel_7 to complete the parallel computation of the branch power flows.

(9) Based on the branch power flow results of (8), check each branch or stability section for heavy loading and limit violations, and save the heavy-loading and limit-violation results caused by all current outages; define the kernel function fun_kernel_8 to complete the parallel heavy-loading and limit-violation check for branches and stability sections.
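Sub-steps (8) and (9) can be illustrated with a per-branch Python sketch. It assumes a simple pi-model branch in per-unit complex form; the names `branch_flow` and `check_branch` and the 90% heavy-loading threshold are hypothetical, since the source does not give the branch model or the thresholds used by fun_kernel_7 and fun_kernel_8.

```python
import cmath


def branch_flow(v_i, v_j, y_series, b_charging=0.0):
    """Complex power leaving node i on branch i-j (per unit)."""
    current = (v_i - v_j) * y_series + v_i * (1j * b_charging / 2)
    return v_i * current.conjugate()


def check_branch(flow, rating, heavy_ratio=0.9):
    """Classify one branch; heavy_ratio is an assumed threshold."""
    loading = abs(flow) / rating
    if loading > 1.0:
        return "over-limit"
    if loading >= heavy_ratio:
        return "heavy"
    return "normal"


# Example: 0.1 rad angle difference across a branch with reactance 0.1 p.u.
flow = branch_flow(cmath.rect(1.0, 0.1), 1.0 + 0j, -10j)
```

Each branch needs only its own parameters and the two end voltages, so both functions could run as one thread per (outage, branch) pair.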

The beneficial effects of the invention are as follows:

The OpenMP-based static security analysis method in a CPU + multi-GPU heterogeneous mode runs on the CUDA platform and can allocate the appropriate number of threads with OpenMP multithreading according to the system's GPU configuration and the computation demand. It builds a CPU + multi-GPU heterogeneous computing mode that completes contingency computations in parallel through CPU-GPU cooperation and, on the basis of single-contingency power flow calculation, runs the power flow iterations of multiple outages in parallel in a highly synchronized manner, effectively improving the parallel processing capability of contingency scanning in static security analysis. The method suits scenarios in which large integrated dispatching systems require fast contingency scanning for static security analysis, and is of significant practical value for improving the scanning efficiency of static security analysis in large power grids.

The concept, specific structure and technical effects of the invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.

Brief description of the drawings

Fig. 1 is the calculation flowchart of the CPU + multi-GPU heterogeneous mode;

Fig. 2 is a schematic diagram of the structure of the CPU + multi-GPU heterogeneous mode;

Fig. 3 is a schematic diagram of the parallel computation of the elements of the Jacobian block matrix H.

Detailed description of the embodiments

The object of the invention is to propose a fast parallel method suitable for static security analysis. To meet the need for fast static security analysis scanning of large power grids in practical engineering applications, the method runs on the CUDA platform and uses OpenMP multithreading to allocate CPU threads according to the system's GPU configuration and the computation demand, with each thread bound to exactly one GPU. Built on mixed CPU and GPU programming, the CPU + multi-GPU heterogeneous computing mode completes contingency computations in parallel through CPU-GPU cooperation. On the basis of the power flow calculation for a single contingency, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, and element-level fine-grained parallelism greatly improves the parallel processing capability of contingency scanning in static security analysis, providing strong technical support for online security analysis and early-warning scanning in integrated dispatching systems of interconnected power grids.

As shown in Fig. 1, a static security analysis computation method in a CPU + multi-GPU heterogeneous mode is characterized in that, on the CUDA platform, OpenMP multithreading for shared-memory parallel systems is used; the GPU configuration and the contingency computation demand are considered together to determine the number of CPU threads and build the CPU + multi-GPU heterogeneous mode; within each GPU card, the parallel scanning task is completed cooperatively in a CPU + single-GPU mode; and, on the basis of single-contingency power flow calculation, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, with element-level fine-grained parallelism greatly improving the parallel processing capability of contingency scanning in static security analysis.

4) According to the GPU configuration and the actual computational load, allocate the outage computing resources rationally, evaluate the number of GPUs to enable, and use OpenMP to create the corresponding number of threads so as to exploit the parallel computing capability of the GPUs to the fullest; OpenMP is a set of compiler directives for multithreaded programming on shared-memory parallel systems;

In this embodiment, step 4) specifically comprises:

$$M'_{cal} = M_{cal} - M_{\max} \cdot N \tag{4}$$

(2) On the basis of the shared, reusable base-case snapshot model data, OpenMP multithreading allocates a number of CPU threads equal to the number n of GPU cards actually running, with each CPU thread bound to exactly one GPU, building the CPU + multi-GPU heterogeneous mode shown in Fig. 2; within each GPU card, parallel computation proceeds in the CPU + single-GPU heterogeneous mode.
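The one-host-thread-per-GPU arrangement can be mimicked in Python with the standard `threading` module as a stand-in for OpenMP. This is only a structural sketch: `scan_on_gpu` and `run_round` are hypothetical names, and the worker body merely records which outages it was given instead of selecting a device and launching kernels.

```python
import threading


def scan_on_gpu(gpu_id, outages, results):
    """Placeholder for the per-card work of step 5).

    A real worker would bind this host thread to device gpu_id and launch
    the kernels; here it only records the outages it was assigned.
    """
    results[gpu_id] = list(outages)


def run_round(outages, n_cards):
    """Start one host thread per active card and split the outages evenly."""
    per_card = -(-len(outages) // n_cards)        # ceiling division
    results = {}
    threads = []
    for gpu_id in range(n_cards):
        chunk = outages[gpu_id * per_card:(gpu_id + 1) * per_card]
        t = threading.Thread(target=scan_on_gpu, args=(gpu_id, chunk, results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()                                  # wait until every card has finished
    return results
```

Each thread writes only to its own key in `results`, which mirrors the way every host thread in the described mode owns exactly one card and its share of the outages.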

In this embodiment, step 5) comprises the following steps:

$$H = m \times h_{nozero} \tag{9}$$

In the CUDA architecture, a GPU has multiple SMs (streaming multiprocessors) and can schedule many threads. The h elements are not coupled in any way and exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined to carry out the vector multiplications and additions of formula (5). The multithreaded computation of the H matrix elements is shown in Fig. 3: the CPU calls fun_kernel_2_H, the GPU launches h GPU threads according to the arguments passed to fun_kernel_2_H, each thread corresponds to one nonzero element of the H matrix, and all threads execute formula (5) concurrently at time t0, together completing the generation of the Jacobian matrix elements;

(3) Using the outage admittance matrix Ym formed in (1) and the initial node injections, compute the node injected power residuals according to formula (10):

$$\lVert \Delta P^t, \Delta Q^t \rVert < \varepsilon, \quad t \le T \tag{11}$$

(7) Jump to (2) and continue iterating;

The preferred embodiments of the invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art under the concept of the invention shall fall within the scope of protection defined by the claims.

Claims

1. A static security analysis computation method in a CPU + multi-GPU heterogeneous mode, characterized in that, on the CUDA platform, OpenMP multithreading for shared-memory parallel systems is used; the GPU configuration and the contingency computation demand are considered together to determine the number of CPU threads and build the CPU + multi-GPU heterogeneous mode; within each GPU card, the parallel scanning task is completed cooperatively in a CPU + single-GPU mode; and, on the basis of single-contingency power flow calculation, the power flow iterations of multiple outages run in parallel in a highly synchronized manner, with element-level fine-grained parallelism greatly improving the parallel processing capability of contingency scanning in static security analysis.

2. The static security analysis computation method in a CPU + multi-GPU heterogeneous mode as claimed in claim 1, characterized by comprising the following steps:

1) Obtain a real-time state estimation snapshot and perform the base-case power flow calculation with the Newton-Raphson method, providing a shared data snapshot that can be reused;

2) According to user requirements, perform a topology scan of all network equipment to form the contingency set, and perform a deep topology search for each contingency to form the actual outaged branch-node information;

3) Initialize the CUDA architecture, pack the data according to the needs of multi-outage parallel computation, and allocate GPU memory;

4) According to the GPU configuration and the actual computational load, allocate the outage computing resources rationally, evaluate the number of GPUs to enable, and use OpenMP to create the corresponding number of threads so as to exploit the parallel computing capability of the GPUs to the fullest;

5) Under the coordination of the CPU, the power flow iteration process, including admittance matrix modification, Jacobian matrix evaluation, solution of the correction equations, state variable update, branch power flow calculation and equipment limit-violation checking, is handed entirely to the GPU for parallel execution; all outages iterate in a highly parallel manner; a kernel function is constructed for each step according to the parallel characteristics of its computing task, and the element-level parallel computing tasks within the iteration are completed at the fine-grained parallel level;

6) Determine whether all contingencies have been scanned; if not, return to step 4) and reallocate resources for the remaining outages; if all are complete, go to step 7);

7) Result display: according to the scan results, display the limit-violation and heavy-loading information of the equipment or sections affected by the outages, and compile scan result statistics according to the practical requirements of the dispatching system's static security analysis module.

3. The static security analysis computation method in a CPU + multi-GPU heterogeneous mode as claimed in claim 1, characterized in that step 4) specifically comprises:

(1) According to the outage computation demand and the GPU configuration, optimize the allocation of outage computing resources and evaluate the maximum number of outages that a single GPU can scan at one time:

$$M_{\max} = \frac{S_{\max}}{S} \tag{1}$$

where $S_{\max}$ is the total memory of a GPU card, $S$ is the memory required for a single power flow calculation, and $M_{\max}$ is the maximum number of outages that a single GPU card can compute;

$$n = \begin{cases} \alpha\left(\dfrac{M_{cal}}{M_{\max}}\right) + 1, & M_{\max} \cdot N > M_{cal} \\ N, & M_{\max} \cdot N \le M_{cal} \end{cases} \tag{2}$$

where $\alpha(\cdot)$ denotes rounding to an integer, $n$ is the number of GPU cards to start in the current round, $N$ is the total number of GPU cards configured in the system, $M_{cal}$ is the total number of outages to be computed in the current round of static security analysis, and $M_{\max} \cdot N$ is the number of outages the GPUs can compute together in one pass;

$$m = \begin{cases} \dfrac{M_{cal}}{n}, & M_{\max} \cdot N > M_{cal} \\ M_{\max}, & M_{\max} \cdot N \le M_{cal} \end{cases} \tag{3}$$

When $M_{\max} \cdot N \le M_{cal}$, not all outage computations can be completed in one pass even with every GPU participating; the remaining outages $M'_{cal}$ are then redistributed in a new round according to formulas (2) and (3) and computed in several batches, where:

$$M'_{cal} = M_{cal} - M_{\max} \cdot N \tag{4}$$

(2) On the basis of the shared, reusable base-case snapshot model data, OpenMP multithreading allocates a number of CPU threads equal to the number n of GPU cards actually running, with each CPU thread bound to exactly one GPU, building the CPU + multi-GPU heterogeneous mode; within each GPU card, parallel computation proceeds in the CPU + single-GPU heterogeneous mode.

4. The static security analysis computation method in a CPU + multi-GPU heterogeneous mode as claimed in claim 1, characterized in that step 5) comprises the following steps:

(1) On the basis of the base-case admittance matrix Y0 and according to the N-1 outaged branch-node information, define the kernel function fun_kernel_1, which performs the local modification of the node admittance matrix for all outages assigned to the card simultaneously and modifies the node injection information; equipment outages are simulated through these changes to the admittance matrix and the node information;

(2) Using the outage admittance matrix Ym formed in (1), complete the element-parallel computation of the Jacobian matrix; to make it convenient to call CUDA kernel functions, each element of the block Jacobian matrix in the power flow calculation is computed with the following formulas:

$$\begin{aligned} H_{ii} &= V_i^2 B_{ii} + Q_i \\ H_{ij} &= -V_i V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}), \quad i \ne j \end{aligned} \tag{5}$$

$$\begin{aligned} N_{ii} &= -V_i^2 G_{ii} - P_i \\ N_{ij} &= -V_i V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}), \quad i \ne j \end{aligned} \tag{6}$$

$$\begin{aligned} J_{ii} &= V_i^2 G_{ii} - P_i \\ J_{ij} &= V_i V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}), \quad i \ne j \end{aligned} \tag{7}$$

$$\begin{aligned} L_{ii} &= V_i^2 B_{ii} - Q_i \\ L_{ij} &= -V_i V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}), \quad i \ne j \end{aligned} \tag{8}$$

where $G_{ii}, B_{ii}, G_{ij}, B_{ij}$ are nonzero elements of the admittance matrix, $\theta_{ij}$ is the phase angle difference between nodes i and j, $P_i, Q_i$ are the injected powers at node i, and $V_i, V_j$ are the voltage magnitudes at nodes i and j;

Every element of the block matrices involves only the state variables (node voltage magnitudes and phase angles), the admittance matrix elements, basic arithmetic and trigonometric functions, and the elements are computed independently of one another, so the computation has obvious parallel characteristics;

Four kernel functions are defined, one for the element values of each block matrix; on a single card, the H-matrix workload for m outages is:

$$H = m \times h_{nozero} \tag{9}$$

In the CUDA architecture, a GPU has multiple SMs (streaming multiprocessors) and can schedule many threads. The h elements are not coupled in any way and exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined to carry out the vector multiplications and additions of formula (5); the CPU calls fun_kernel_2_H, and the GPU launches h GPU threads according to the arguments passed to fun_kernel_2_H, performing the operations of formula (5) in parallel;

(3) Using the outage admittance matrix Ym formed in (1) and the initial node injections, compute the node injected power residuals according to formula (10):

$$\begin{aligned} \Delta P_i &= P_{is} - V_i \sum_{j \in i} V_j (G_{ij}\cos\theta_{ij} + B_{ij}\sin\theta_{ij}) \\ \Delta Q_i &= Q_{is} - V_i \sum_{j \in i} V_j (G_{ij}\sin\theta_{ij} - B_{ij}\cos\theta_{ij}) \end{aligned} \tag{10}$$

where $P_{is}, Q_{is}$ are the active and reactive injections of node i. As the formula shows, each node's residual involves only the state variables (node voltage magnitudes and phase angles), the admittance matrix elements, basic arithmetic and trigonometric functions, and is computed independently of the power residuals of other nodes; the kernel function fun_kernel_3 is defined, and multiple threads in the SMs are enabled to complete the parallel computation of the power mismatches of all computation nodes for the m outages; the detailed parallel computation process is similar to step (2);

(4) Check whether the power residuals satisfy the convergence criterion; if so, jump to (8); if not, proceed to (5) and continue iterating;

(5) Perform LU decomposition on the Jacobian matrix formed in (2) and, together with the node power residuals computed in (3), solve the system of linear equations; define the kernel function fun_kernel_5 to solve the linear systems with task-level parallelism;

(6) Define the addition kernel function fun_kernel_6; for the parallel task of M contingencies, a total of M × (2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state variable update;

(7) Jump to (2) and continue iterating;

(8) Using the power flow results of the currently converged outages, compute the branch power flows; because each branch flow depends only on the branch parameters and the voltages at its two end nodes, the branch flows are mutually independent; define the kernel function fun_kernel_7 to complete the parallel computation of the branch power flows.

(9) Based on the branch power flow results of (8), check each branch or stability section for heavy loading and limit violations, and save the heavy-loading and limit-violation results caused by all current outages; define the kernel function fun_kernel_8 to complete the parallel heavy-loading and limit-violation check for branches and stability sections.

5. The static security analysis computation method in a CPU + multi-GPU heterogeneous mode as claimed in claim 4, characterized in that sub-step (4) of step 5) is specifically:

First, define the kernel function fun_kernel_4, with which the GPU checks the convergence of the node power residuals of each outage according to formula (11); if an outage satisfies the convergence condition, record that outage as converged;

$$\lVert \Delta P^t, \Delta Q^t \rVert < \varepsilon, \quad t \le T \tag{11}$$

Second, because the outages converge after different numbers of iterations, a threshold is set: once a proportion k ≥ 80% of the outages have converged, the power flow iteration ends; otherwise all outages continue iterating until the condition k ≥ 80% is satisfied or the maximum number of iterations is reached, and the process then jumps to step (8).

6. The static security analysis computation method in a CPU + multi-GPU heterogeneous mode as claimed in claim 4, characterized in that, in sub-step (6) of step 5), the addition kernel function fun_kernel_6 is defined as:

$$x_i^{k+1} = x_i^k + \Delta x_i^k, \quad i = 1, 2, \ldots, 2(n-1) - r \tag{12}$$

where $x_i^k$ and $x_i^{k+1}$ are the values of the i-th state variable of the power flow equations before and after the k-th iteration of the linear system, $\Delta x_i^k$ is the k-th correction, n is the number of system nodes, and r is the number of PV nodes.