CN106026107A - QR decomposition method of power flow Jacobian matrix for GPU acceleration - Google Patents

QR decomposition method of power flow Jacobian matrix for GPU acceleration

Info

Publication number: CN106026107A
Authority: CN (China)
Prior art keywords: thread, threadID, variable, matrix, row
Legal status: Granted
Application number: CN201610592223.0A
Other languages: Chinese (zh)
Other versions: CN106026107B (en)
Inventors: 周赣, 孙立成, 张旭, 柏瑞, 冯燕钧, 秦成明, 傅萌
Current Assignee: Southeast University
Original Assignee: Southeast University
Application filed by Southeast University
Priority to CN201610592223.0A (granted as CN106026107B)
Publication of CN106026107A
Application granted; publication of CN106026107B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00: Circuit arrangements for AC mains or AC distribution networks
    • H02J3/04: Circuit arrangements for AC mains or AC distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06: Controlling transfer of power between connected networks; controlling sharing of load between connected networks
    • H02J2203/00: Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer-assisted design [CAD]

Abstract

The invention discloses a GPU-accelerated QR decomposition method for the power flow Jacobian matrix. The method comprises: performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of J into parallelizable levels according to the sparsity structure of R; and, on the GPU, invoking the per-level QR decomposition kernel SparseQR in order of increasing level. By combining CPU control flow and basic data processing with GPU processing of intensive floating-point computation, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and addresses the heavy time cost of power flow calculation in power system steady-state security analysis.

Description

A GPU-accelerated QR decomposition method for the power flow Jacobian matrix
Technical field
The invention belongs to the field of high-performance computing applications in power systems, and in particular relates to a GPU thread design method for the GPU-accelerated QR decomposition of the power flow Jacobian matrix.
Background
Power flow calculation is one of the most widely used, most fundamental and most important electrical computations in power systems. In studies of power system operating modes and in network planning, power flow calculations are required to compare the feasibility, reliability and economy of operating modes or planned supply schemes. At the same time, monitoring the operating state of a power system in real time also requires a large number of fast power flow calculations. Therefore, offline power flow calculation is used in system programming, planning and operating-mode scheduling, while online power flow calculation is used for real-time monitoring of the power system operating state.
In actual production, both offline and online power flow calculation place high demands on computation speed. In offline power flow studies for planning, design and operating-mode arrangement, complications such as equipment commissioning schemes mean that many operating scenarios must be simulated and the power flow workload is large, so the time of a single power flow calculation affects the overall simulation duration. Online power flow calculation performed during power system operation is highly time-sensitive and must deliver power flow results in real time: for example, in contingency analysis, which assesses the impact of equipment outages on static security, the system must compute the power flow distribution under a large number of anticipated contingencies and produce operating-mode adjustment schemes in real time.
In traditional Newton-Raphson power flow calculation, solving the correction equations accounts for about 70% of the power flow calculation time, so the speed of this step governs the overall performance of the program. With the slowdown in CPU speed growth, the time of a single power flow calculation has reached a bottleneck. Current acceleration methods for power flow calculation concentrate mainly on coarse-grained acceleration of many power flow cases using clusters and multi-core servers; in actual production there is little research on accelerating the internal computation of a single power flow case.
The GPU is a many-core parallel processor whose number of processing units far exceeds that of a CPU. Traditionally the GPU was responsible only for graphics rendering, with most other processing handed to the CPU. The modern GPU is a programmable multi-core, multi-threaded processor with powerful computing capability and high memory bandwidth. Under the general-purpose computing model, the GPU works as a coprocessor of the CPU, and high-performance computing is achieved by partitioning tasks reasonably between the two.
Solving sparse linear systems contains exploitable parallelism. After symbolic QR decomposition of the coefficient matrix, the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R are obtained, and the columns of the matrix J are partitioned into levels according to the sparsity structure of R. The columns within one level are mutually independent, with no dependency relations, and can naturally be processed in parallel, which suits GPU acceleration. Therefore, with reasonable scheduling between CPU and GPU, the QR decomposition of the coefficient matrix and the solution of the sparse linear system can be completed rapidly. Scholars at home and abroad have begun to study GPU acceleration of sparse linear system solvers, but thread design has not been deeply optimized: computational thread design has been studied merely from the viewpoint of workload distribution, while thread computation patterns and data indexing have not been investigated in depth, so programs cannot give full play to the advantages of the GPU.
The above problems therefore urgently need to be solved.
Summary of the invention
Objective of the invention: to remedy the deficiencies of the prior art, the present invention provides a GPU-accelerated QR decomposition method for the power flow Jacobian matrix that greatly reduces the QR decomposition time of the power flow Jacobian matrix and thereby increases power flow calculation speed.
Power flow calculation: a power engineering term; given the network topology, component parameters, and generation and load parameters of a power system, it computes the distribution of active power, reactive power and voltage in the power network.
Parallel computing: in contrast to serial computing, an approach in which multiple instructions are executed at once, with the aims of raising computation speed and of solving large and complex computational problems by scaling up the problem size.
GPU: graphics processing unit (English: Graphics Processing Unit, abbreviated GPU).
The invention discloses a GPU-accelerated QR decomposition method for the power flow Jacobian matrix, the method comprising:
(1) the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; according to the sparsity structure of R, the columns of J are partitioned into parallelizable levels;
(2) the GPU invokes the per-level QR decomposition kernel SparseQR in order of increasing level.
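Steps (1) and (2) together form a simple host-side control loop. A minimal Python sketch of that loop (the names launch_layered_qr, levels and sparse_qr_level are illustrative stand-ins; the actual per-level computation is the CUDA kernel SparseQR described below):

```python
def launch_layered_qr(J, levels, sparse_qr_level):
    """Host-side driver for step (2): the per-level kernel is invoked in
    order of increasing level. Columns inside one level are mutually
    independent; on the GPU each column maps to one thread block."""
    for k in sorted(levels):      # levels processed in increasing order
        cols = levels[k]          # Map_k: the Levelnum(k) columns of level k
        sparse_qr_level(J, cols)  # stands in for SparseQR<Levelnum(k), 128>
```

Because the driver only orders the launches, any callable obeying the level order can stand in for the kernel when testing the control flow.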
In step (1), the n columns of J are grouped by the parallelization layering into MaxLevel levels; columns belonging to the same level undergo QR decomposition in parallel. The number of columns in level k is Levelnum(k), where k denotes the level index; the column indices of level k are stored in the mapping table Map_k.
Preferably, in step (2), the per-level QR decomposition kernel is defined as SparseQR<N_blocks, N_threads>, with the thread block size N_threads fixed at 128. When level k is computed, the number of thread blocks is N_blocks = Levelnum(k), and the total number of threads is N_blocks × N_threads. In order of increasing level, the kernel SparseQR<Levelnum(k), N_threads> is called to decompose all columns belonging to level k. The computation flow of SparseQR<Levelnum(k), N_threads> is:
(2.1) CUDA automatically assigns each thread its thread block index blockID and its in-block thread index threadID;
(2.2) blockID and threadID are assigned to the variables bid and t; thereafter bid and t index the t-th thread of the bid-th thread block;
(2.3) the bid-th thread block is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j), where V(i:n, i) is the vector formed by elements i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i to n of column j of the Jacobian matrix J; then column j of J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of J is updated: J(j, j) = a, J(j+1:n, j) = 0.
Preferably, in step (2.4):
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed by the following steps:
1) test whether the thread index t is less than n-i; if not, this thread stops;
2) accumulate the per-thread variable Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), by the following steps:
1) test whether the thread index t is less than n-i; if not, the thread stops;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) set t = t+128 and return to 1).
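The strided partial sums (steps 1-4) followed by the halving tree reduction in shared memory (steps 6-11) can be simulated serially to check the arithmetic. A NumPy sketch with the 128-thread block size taken from the text (the function name block_dot is illustrative; in the actual β computation the factor of 2 is folded into the per-thread products):

```python
import numpy as np

NTHREADS = 128  # thread block size fixed by the patent

def block_dot(x, y):
    """Serial simulation of the kernel's dot product: each 'thread' t
    accumulates elements t, t+128, t+256, ... into Temp (steps 1-3),
    writes it to Cache[threadID] (step 4), then a halving tree
    reduction combines the 128 partial sums (steps 6-11)."""
    cache = np.zeros(NTHREADS)
    for t in range(NTHREADS):
        temp = 0.0
        k = t
        while k < len(x):        # step 1): stop past the vector length
            temp += x[k] * y[k]  # step 2)
            k += NTHREADS        # step 3)
        cache[t] = temp          # step 4)
    m = NTHREADS // 2            # step 6)
    while m != 0:                # step 7)
        for t in range(m):       # step 8): low threads fold in the high half
            cache[t] += cache[t + m]
        m //= 2                  # step 10)
    return cache[0]              # step 12)
```

On the GPU the inner loops run concurrently across the 128 threads with a barrier between reduction rounds; the serial model only verifies that the strided-sum-plus-tree-reduction scheme reproduces the ordinary dot product.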
Further, in step (2.5):
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Secondly, V(j:n, j) = J(j:n, j) - a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1, as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = J(t+j, j) - a·e_j(t+j);
3) set t = t+128 and return to 1).
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
Beneficial effects: compared with the prior art, the invention has the following benefits. First, the CPU performs symbolic QR decomposition of the power flow Jacobian matrix J, and the sparse format of R makes it possible to avoid unnecessary floating-point computation. Second, according to the sparsity structure of R, the columns of J are assigned to levels that can be computed in parallel, and the layering result is passed to the GPU. Third, the GPU invokes the per-level QR decomposition kernel SparseQR in order of increasing level, and the GPU threads use a reduction algorithm for dot-product operations, raising computational efficiency. Finally, by combining CPU control flow and basic data processing with GPU processing of intensive floating-point computation, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and addresses the heavy time cost of power flow calculation in power system static security analysis.
Brief description of the drawings:
Fig. 1 shows the example computation times of the present invention;
Fig. 2 shows the example layering result of the present invention;
Fig. 3 is the flow chart of the present invention.
Detailed description of the invention:
As shown in Fig. 3, a GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to the present invention comprises:
(1) the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; the sparsity structure of J after symbolic decomposition equals V+R; according to the sparsity structure of R, the columns of J are partitioned into parallelizable levels.
(2) the GPU invokes the per-level QR decomposition kernel SparseQR in order of increasing level.
1. Symbolic QR decomposition of the power flow Jacobian matrix J on the CPU
First, the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; the sparsity structure of J after symbolic decomposition equals V+R. Then the n columns of J are grouped by the parallelization layering into MaxLevel levels, and columns belonging to the same level undergo QR decomposition in parallel; the number of columns in level k is Levelnum(k), where k denotes the level index, and the column indices of level k are stored in the mapping table Map_k. Finally, the CPU transfers the data required by the GPU computation to the GPU, including: the Jacobian matrix J and its dimension n, the upper triangular matrix R, the number of levels MaxLevel, the per-level column count Levelnum, and the mapping table Map.
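The parallelization layering itself admits a compact sketch. Assuming the sparsity pattern of R is given column-wise as sets of nonzero row indices (0-based here; the function name levelize is illustrative, and the actual symbolic analysis follows CSparse):

```python
from collections import defaultdict

def levelize(R_pattern, n):
    """Assign each of the n columns of J to a level: column j depends on
    every earlier column i < j with R(i, j) != 0, so its level is one
    more than the deepest of its dependencies. Columns sharing a level
    are independent and can be factorized in parallel (one per block)."""
    level = [0] * n
    for j in range(n):
        deps = [i for i in R_pattern[j] if i < j]
        level[j] = 1 + max((level[i] for i in deps), default=0)
    levels = defaultdict(list)  # level k -> Map_k column list
    for j, k in enumerate(level):
        levels[k].append(j)     # Levelnum(k) = len(levels[k])
    return dict(levels)
```

For example, if column 2 depends on column 0 and column 3 depends on column 2, columns 0 and 1 share level 1, column 2 sits in level 2, and column 3 in level 3.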
The principles of symbolic QR decomposition and parallelization layering are described in "Direct Methods for Sparse Linear Systems", Timothy A. Davis, SIAM, Philadelphia, 2006. The symbolic QR decomposition and parallelization layering routines used in this patent follow CSparse: A Concise Sparse Matrix Package, version 3.1.4, Copyright (c) 2006-2014, Timothy A. Davis, Oct 10, 2014. The theory of the Householder transformation is described in: Hu Bingxin, Li Ning, Lv Jun. A complex QR decomposition algorithm implemented recursively with Householder transformations [J]. Journal of System Simulation, 2004, 16(11): 2432-2434.
2. The GPU launches the per-level QR decomposition kernel SparseQR in order of increasing level
The per-level QR decomposition kernel is defined as SparseQR<N_blocks, N_threads>, with the thread block size N_threads fixed at 128. When level k is computed, the number of thread blocks is N_blocks = Levelnum(k), and the total number of threads is N_blocks × N_threads. In order of increasing level, the kernel SparseQR<Levelnum(k), N_threads> is called to decompose all columns belonging to level k.
The computation flow of SparseQR<Levelnum(k), N_threads> is:
(1) CUDA automatically assigns each thread its thread block index blockID and its in-block thread index threadID;
(2) blockID and threadID are assigned to the variables bid and t; thereafter bid and t index the t-th thread of the bid-th thread block;
(3) the bid-th thread block is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j), where V(i:n, i) is the vector formed by elements i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i to n of column j of the Jacobian matrix J; then column j of J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i).
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed by the following steps:
1) test whether the thread index t is less than n-i; if not, this thread stops;
2) accumulate the per-thread variable Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), by the following steps:
1) test whether the thread index t is less than n-i; if not, the thread stops;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) set t = t+128 and return to 1).
(5) the Householder transformation vector of column j is computed:
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Secondly, V(j:n, j) = J(j:n, j) - a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1, as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = J(t+j, j) - a·e_j(t+j);
3) set t = t+128 and return to 1).
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
(6) column j of J is updated: J(j, j) = a, J(j+1:n, j) = 0.
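For reference, steps (4)-(6) can be checked end to end with a dense, serial Python model (0-based indices; every R(i, j) treated as nonzero, so the sparse skips do not appear; this is a sketch of the arithmetic, not of the CUDA implementation):

```python
import numpy as np

def householder_qr(A):
    """Serial model of the per-column algorithm: step (4) applies the
    earlier reflections to column j, step (5) builds the normalised
    Householder vector V(j:n, j) = (J(j:n, j) - a*e_j) / b, and step (6)
    writes column j of R back into J."""
    J = A.astype(float).copy()
    n = J.shape[0]
    V = np.zeros((n, n))
    for j in range(n):
        for i in range(j):                      # step (4)
            beta = 2.0 * (V[i:, i] @ J[i:, j])  # beta = 2 V(i:n,i)^T J(i:n,j)
            J[i:, j] -= beta * V[i:, i]         # J(i:n,j) -= beta V(i:n,i)
        a = np.sqrt(J[j:, j] @ J[j:, j])        # step (5): a^2 = ||J(j:n,j)||^2
        v = J[j:, j].copy()
        v[0] -= a                               # V(j:n,j) = J(j:n,j) - a e_j
        b = np.sqrt(v @ v)                      # b^2 = V^T V
        if b > 0.0:
            v /= b                              # V(j:n,j) /= b
        V[j:, j] = v
        J[j, j] = a                             # step (6)
        J[j + 1:, j] = 0.0
    return V, J                                 # J now holds R
```

Reconstructing Q as the product of the reflections H_j = I - 2 V(:, j) V(:, j)ᵀ and checking that Q·R reproduces the input confirms that the β update, the vector V(j:n, j) = J(j:n, j) - a·e_j, and the final assignment J(j, j) = a are mutually consistent.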
The GPU computing platform used in the present invention is equipped with a Tesla K20C GPU card and an Intel Xeon E5-2620 CPU; the peak memory bandwidth of the GPU reaches 208 GB/s, its single-precision floating-point peak reaches 3.52 Tflops, and the CPU frequency is 2 GHz. The CPU computing platform is equipped with an Intel Core i7-3520M 2.90 GHz CPU. A sparse Jacobian matrix example of dimension 16557 was tested on the CPU and GPU platforms; Fig. 1 shows the computation times of the 16557-dimension sparse Jacobian matrix with different computing libraries. The QR solution time with the two-level parallel strategy on the GPU is 67 ms, which lags behind the factorization time of KLU and is slightly faster than the computation time of UMFPACK. The main reason the GPU algorithm lags behind KLU is that the number of columns per level shrinks rapidly as the level index grows: level 1 contains 1642 columns and each of the first 30 levels has more than 100 columns, but beyond level 207 only 1 or 2 columns remain per level. In the earlier levels many columns can be computed in parallel, whereas the columns of the later levels must be processed serially, so the computing capability of the GPU cannot be fully exploited. Fig. 2 shows the parallelism curve of the 16557-dimension sparse Jacobian matrix (the number of columns contained in each level of the elimination tree; the vertical axis uses a base-10 logarithmic scale).

Claims (5)

1. the QR decomposition method of the direction of energy Jacobian matrix of a GPU acceleration, it is characterised in that: institute The method of stating includes:
(1) CPU carries out QR symbol decomposition to Jacobian matrix J, obtain Household transformation matrix V and the sparsity structure of upper triangular matrix R battle array;According to the sparsity structure of R battle array, row each to matrix J are carried out parallel Hierarchies;
(2) GPU presses order calculating layering QR decomposition kernel function SparseQR that level is incremented by.
2. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that: in step (1), the n columns of J are grouped by the parallelization layering into MaxLevel levels; columns belonging to the same level undergo QR decomposition in parallel; the number of columns in level k is Levelnum(k), where k denotes the level index; the column indices of level k are stored in the mapping table Map_k.
3. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that: in step (2), the per-level QR decomposition kernel is defined as SparseQR<N_blocks, N_threads>, with the thread block size N_threads fixed at 128; when level k is computed, the number of thread blocks is N_blocks = Levelnum(k), and the total number of threads is N_blocks × N_threads; in order of increasing level, the kernel SparseQR<Levelnum(k), N_threads> is called to decompose all columns belonging to level k; the computation flow of SparseQR<Levelnum(k), N_threads> is:
(2.1) CUDA automatically assigns each thread its thread block index blockID and its in-block thread index threadID;
(2.2) blockID and threadID are assigned to the variables bid and t; thereafter bid and t index the t-th thread of the bid-th thread block;
(2.3) the bid-th thread block is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j), where V(i:n, i) is the vector formed by elements i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i to n of column j of the Jacobian matrix J; then column j of J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of J is updated: J(j, j) = a, J(j+1:n, j) = 0.
4. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 3, characterized in that, in step (2.4):
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed by the following steps:
1) test whether the thread index t is less than n-i; if not, this thread stops;
2) accumulate the per-thread variable Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated with the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), by the following steps:
1) test whether the thread index t is less than n-i; if not, the thread stops;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) set t = t+128 and return to 1).
5. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 3, characterized in that, in step (2.5):
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Secondly, V(j:n, j) = J(j:n, j) - a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1, as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = J(t+j, j) - a·e_j(t+j);
3) set t = t+128 and return to 1).
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed as follows:
1) test whether the thread index t is less than n-j; if not, this thread stops;
2) accumulate the per-thread variable Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) each thread writes its value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads of the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, go to 8); otherwise the reduction is finished, jump to 12);
8) threads with threadID less than M execute Cache[threadID] += Cache[threadID+M]; the remaining threads do nothing;
9) synchronize the threads of the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed as follows:
1) test whether the thread index t is less than n-j; if not, the thread stops;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
CN201610592223.0A 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix Active CN106026107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592223.0A CN106026107B (en) 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix


Publications (2)

Publication Number Publication Date
CN106026107A true CN106026107A (en) 2016-10-12
CN106026107B CN106026107B (en) 2019-01-29

Family

ID=57113899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610592223.0A Active CN106026107B (en) 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix

Country Status (1)

Country Link
CN (1) CN106026107B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368454A (en) * 2017-06-22 2017-11-21 东南大学 A GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower-triangular equation systems
CN107368368A (en) * 2017-06-22 2017-11-21 东南大学 A GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper-triangular equation systems
CN110718919A (en) * 2019-09-25 2020-01-21 北京交通大学 GPU acceleration-based large power grid static safety analysis fault screening method
CN111313413A (en) * 2020-03-20 2020-06-19 国网浙江省电力公司 Power system state estimation method based on parallel acceleration of graphics processor
CN111416441A (en) * 2020-04-09 2020-07-14 东南大学 Power grid topology analysis method based on GPU hierarchical acceleration

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110078226A1 (en) * 2009-09-30 2011-03-31 International Business Machines Corporation Sparse Matrix-Vector Multiplication on Graphics Processor Units
CN103617150A (en) * 2013-11-19 2014-03-05 国家电网公司 GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system
CN103793590A (en) * 2012-11-01 2014-05-14 同济大学 GPU-based computation method for quickly solving power flow in distribution networks
CN104158182A (en) * 2014-08-18 2014-11-19 国家电网公司 Large-scale power grid flow correction equation parallel solving method
CN104484234A (en) * 2014-11-21 2015-04-01 中国电力科学研究院 Multi-front load flow calculation method and system based on GPU (graphics processing unit)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIANG YANGDOU: "Research on Parallel Computation of Optimal Power Flow in Power Systems on the CUDA Platform", China Master's Theses Full-text Database *
MU SHUAI ET AL.: "Research on Multi-level Parallel QR Decomposition Algorithms Based on GPU", Computer Simulation *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368454A (en) * 2017-06-22 2017-11-21 东南大学 A GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower-triangular equation systems
CN107368368A (en) * 2017-06-22 2017-11-21 东南大学 A GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper-triangular equation systems
CN110718919A (en) * 2019-09-25 2020-01-21 北京交通大学 GPU acceleration-based large power grid static safety analysis fault screening method
CN110718919B (en) * 2019-09-25 2021-06-01 北京交通大学 GPU acceleration-based large power grid static safety analysis fault screening method
CN111313413A (en) * 2020-03-20 2020-06-19 国网浙江省电力公司 Power system state estimation method based on parallel acceleration of graphics processor
CN111416441A (en) * 2020-04-09 2020-07-14 东南大学 Power grid topology analysis method based on GPU hierarchical acceleration

Also Published As

Publication number Publication date
CN106026107B (en) 2019-01-29

Similar Documents

Publication Publication Date Title
CN106026107A (en) QR decomposition method of power flow Jacobian matrix for GPU acceleration
CN106157176B (en) A GPU-accelerated LU decomposition method for power flow Jacobian matrices
CN103617150A (en) GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system
WO2018133348A1 (en) Static security analysis computation method, apparatus, and computer storage medium
CN101694940B (en) Optimal power flow implementation method considering transient security constraints
CN104158182B (en) A parallel solution method for large-scale power grid power flow correction equations
CN103607466B (en) A wide-area multi-stage distributed parallel grid analysis method based on cloud computing
CN105391057A (en) GPU thread design method of power flow Jacobian matrix calculation
CN101706741A (en) Method for partitioning dynamic tasks of CPU and GPU based on load balance
CN106407158A (en) A GPU-accelerated method for batch processing of isomorphic sparse matrices multiplied by dense vectors
CN105576648A (en) A static security analysis double-layer parallel method based on a GPU-CPU heterogeneous computing platform
CN101833438A (en) A general data processing method based on multi-level parallelism
CN106354479B (en) A GPU-accelerated QR decomposition method for large numbers of isomorphic sparse matrices
CN102841881A (en) Multiple integral computing method based on many-core processor
CN102902657A (en) Method for accelerating FFT (Fast Fourier Transform) by using GPU (Graphic Processing Unit)
CN109753682B (en) Finite element stiffness matrix simulation method based on GPU (graphics processing Unit) end
CN107368454A (en) A GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower-triangular equation systems
CN106021943B (en) A DC fault screening method designed by combining GPU software and hardware architecture features
CN107256203A (en) An implementation method and device for matrix-vector multiplication
CN105955712A (en) Direct current fault screening method based on GPU acceleration
CN107423259A (en) A domino-optimized GPU-accelerated back-substitution method for triangular equation systems in power systems
Wenjie et al. An expansion-aided synchronous conservative time management algorithm on GPU
CN109992860A (en) Electromagnetic transient parallel simulation method and system based on GPU
CN107368368A (en) A GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper-triangular equation systems
CN103793745B (en) A distributed particle swarm optimization method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 210009 No. 87 Dingjiaqiao, Gulou District, Nanjing City, Jiangsu Province

Applicant after: Southeast University

Address before: No. 2, four archway in Xuanwu District, Nanjing, Jiangsu

Applicant before: Southeast University

GR01 Patent grant
GR01 Patent grant