CN106026107A - QR decomposition method of power flow Jacobian matrix for GPU acceleration - Google Patents
- Publication number
- CN106026107A (Application CN201610592223.0A)
- Authority
- CN
- China
- Prior art keywords
- thread
- threadid
- variable
- matrix
- row
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for ac mains or ac distribution networks
- H02J3/04—Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
- H02J3/06—Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
Abstract
The invention discloses a QR decomposition method for the power flow Jacobian matrix with GPU acceleration. The method comprises the steps of: performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of matrix J into parallelizable levels according to the sparsity structure of R; and computing the level-wise QR decomposition kernel function SparseQR on the GPU in order of increasing level. By combining CPU control of program flow and basic data processing with GPU processing of intensive floating-point computation, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and addresses the excessive time consumption of power flow calculation in power system steady-state security analysis.
Description
Technical field
The invention belongs to the field of high-performance computing applications in power systems, and in particular relates to a GPU thread design method for the GPU-accelerated QR decomposition of the power flow Jacobian matrix.
Background art
Power flow calculation is the most widely used, most basic, and most important electrical computation in power systems. In studies of power system operation modes and in planning, power flow calculations are required to compare the feasibility, reliability, and economy of operation modes or planned supply schemes. Likewise, real-time monitoring of the operating state of a power system requires a large number of fast power flow calculations. Accordingly, off-line power flow calculation is used in system programming and in planning operation modes, while on-line power flow calculation is used in real-time monitoring of the power system operating state.
In actual production, both off-line and on-line power flow calculations place high demands on computation speed. In off-line power flow for planning, design, and operation-mode arrangement, the complexity of equipment commissioning schemes means that many operating scenarios must be simulated; the power flow workload is large, and the time of a single power flow calculation affects the overall simulation duration. On-line power flow calculation during power system operation is highly time-sensitive and must deliver power flow results in real time. For example, in the contingency power flow analysis that assesses the static security impact of equipment outages, the system must compute the power flow distribution under a large number of anticipated contingencies and produce operation-mode adjustment schemes in real time.
In traditional Newton-Raphson power flow calculation, solving the correction equations accounts for about 70% of the power flow calculation time, so the speed of this step dominates overall program performance. As the growth of CPU speed slows, the time of a single power flow calculation has reached a bottleneck. Current acceleration methods for power flow calculation mainly use clusters and multi-core servers for coarse-grained acceleration across many power flow cases; in actual production, research on accelerating the arithmetic inside a single power flow calculation is comparatively scarce.
A GPU is a many-core parallel processor whose number of processing units far exceeds that of a CPU. Traditionally the GPU was responsible only for graphics rendering, with most other processing left to the CPU. The modern GPU is a multi-core, multi-threaded programmable processor with powerful computing capability and high memory bandwidth. In the general-purpose computing model, the GPU works as a coprocessor of the CPU, and high-performance computing is achieved by reasonable distribution and decomposition of tasks between them.
Solving sparse linear systems contains inherent parallelism. After symbolic QR decomposition of the coefficient matrix of the equation system, the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R are obtained, and the columns of matrix J can be partitioned into parallelizable levels according to the sparsity structure of R. Within each level the computations for the columns are mutually independent, with no dependency relations, so they can naturally be computed in parallel and are well suited to GPU acceleration. Reasonable scheduling between the CPU and the GPU can therefore rapidly complete the QR decomposition of the coefficient matrix and the solution of the sparse linear system. Scholars at home and abroad have begun to study GPU-accelerated solution of sparse linear systems, but thread design has not been deeply optimized: it has been studied only from the standpoint of workload distribution, while thread computation patterns and data indexing have not been investigated, so existing programs cannot fully exploit the advantages of the GPU.
It is therefore highly desirable to solve the above problems.
Summary of the invention
Object of the invention: in view of the deficiencies of the prior art, the present invention provides a GPU-accelerated QR decomposition method for the power flow Jacobian matrix that greatly reduces the QR decomposition time of the power flow Jacobian matrix and thereby increases power flow calculation speed.
Power flow calculation: a power engineering term referring to the computation of the distribution of active power, reactive power, and voltage in a power network, given the network topology, component parameters, and generation and load parameters.
Parallel computing: in contrast to serial computing, an algorithmic approach that can execute multiple instructions at once, aiming to increase computation speed and to solve large and complex computational problems by scaling up the problem size.
GPU: graphics processing unit (English: Graphics Processing Unit, abbreviated GPU).
The invention discloses a GPU-accelerated QR decomposition method for the power flow Jacobian matrix, the method comprising:
(1) the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; according to the sparsity structure of R, the columns of matrix J are partitioned into parallelizable levels;
(2) the GPU computes the level-wise QR decomposition kernel function SparseQR in order of increasing level.
In step (1), the parallel levelization assigns the n columns of matrix J to MaxLevel levels; columns belonging to the same level undergo QR decomposition in parallel. The number of columns in level k is Levelnum(k), where k denotes the level number; a mapping table Mapk stores the indices of all columns in level k.
Preferably, in step (2), the level-wise QR decomposition kernel function is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128; when level k is computed, the number of thread blocks is Nblocks = Levelnum(k), and the total number of threads is Nblocks × Nthreads; in order of increasing level, the kernel function SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k.
The computation flow of SparseQR<Levelnum(k), Nthreads> is:
(2.1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2.2) blockID and threadID are assigned to variables bid and t, so that bid and t index the t-th thread in the bid-th thread block;
(2.3) the bid-th thread block is responsible for the QR decomposition of column j = Mapk(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following computation is performed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of the i-th column of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i through n of the j-th column of the Jacobian matrix J; then the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of the j-th column is computed;
(2.6) the j-th column of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
Preferably, in step (2.4):
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-i; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
then, the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), with the concrete steps:
1) check whether the thread number t is less than n-i; if not, the thread stops executing;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128, return to 1).
Further, in step (2.5):
first, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) a² = Cache(0), with a taken as the square root of a²;
secondly, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128, return to 1);
then, the variable b² = V(j:n, j)ᵀV(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) b² = Cache(0), with b taken as the square root of b²;
finally, V(j:n, j) = V(j:n, j)/b is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128, return to 1).
Beneficial effects: compared with the prior art, the invention has the following benefits. First, the CPU performs symbolic QR decomposition of the power flow Jacobian matrix J, and the sparsity structure of R makes it possible to avoid unnecessary floating-point computation. Second, according to the sparsity structure of R, the columns of J are assigned to levels that can be computed in parallel, and the levelization result is passed to the GPU. Third, the GPU computes the level-wise QR decomposition kernel function SparseQR in order of increasing level, and the GPU threads use a reduction algorithm to process dot-product operations, improving computational efficiency. Finally, by combining CPU control of program flow and basic data processing with GPU processing of intensive floating-point operations, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and solves the problem of excessive power flow calculation time in power system static security analysis.
Brief description of the drawings:
Fig. 1 shows example computation times of the present invention;
Fig. 2 shows an example levelization result of the present invention;
Fig. 3 is a flow diagram of the present invention.
Detailed description of the invention:
As shown in Fig. 3, the GPU-accelerated QR decomposition method of the power flow Jacobian matrix of the present invention comprises:
(1) the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R, the sparsity structure of J after symbolic decomposition being equal to V+R; according to the sparsity structure of R, the columns of matrix J are partitioned into parallelizable levels;
(2) the GPU computes the level-wise QR decomposition kernel function SparseQR in order of increasing level.
1. Symbolic QR decomposition of the power flow Jacobian matrix J on the CPU
First, the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R, the sparsity structure of J after symbolic decomposition being equal to V+R. Then the parallel levelization assigns the n columns of matrix J to MaxLevel levels; columns belonging to the same level undergo QR decomposition in parallel; the number of columns in level k is Levelnum(k), where k denotes the level number; a mapping table Mapk stores the indices of all columns in level k. Finally, the CPU transfers the data required by the GPU computation to the GPU, including: the Jacobian matrix J, its dimension n, the upper triangular matrix R, the number of levels MaxLevel, the per-level column count Levelnum, and the mapping table Map.
The principles of symbolic QR decomposition and parallel levelization are described in "Direct Methods for Sparse Linear Systems", Timothy A. Davis, SIAM, Philadelphia, 2006. The symbolic QR decomposition and parallel levelization routines used in this patent are from CSparse: A Concise Sparse Matrix Package, version 3.1.4, Copyright (c) 2006-2014, Timothy A. Davis, Oct 10, 2014. The Householder transformation principle is described in: Hu Bingxin, Li Ning, Lv Jun. A complex QR decomposition algorithm implemented recursively with Householder transformations [J]. Journal of System Simulation, 2004, 16(11): 2432-2434.
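As an illustration of the parallel levelization described above, the following is a minimal Python sketch (an illustrative reconstruction, not the patent's CSparse-based implementation; the function name levelize and the set-based pattern representation are assumptions). It places each column j one level above the deepest level of any column i < j with R(i, j) ≠ 0, so that columns within one level share no dependency and can be decomposed in parallel:

```python
def levelize(R_pattern, n):
    """Assign each column j (0-based) to a level such that columns in
    the same level depend on no other column of that level.
    R_pattern: set of (i, j) pairs with R(i, j) != 0 and i < j."""
    level = [0] * n
    for j in range(n):
        deepest = -1                      # deepest level among columns j depends on
        for i in range(j):
            if (i, j) in R_pattern:
                deepest = max(deepest, level[i])
        level[j] = deepest + 1
    max_level = max(level) + 1 if n else 0
    # Map[k] lists the columns of level k (the patent's Mapk table)
    Map = [[j for j in range(n) if level[j] == k] for k in range(max_level)]
    Levelnum = [len(m) for m in Map]      # columns per level
    return Map, Levelnum

# Example: a 4x4 pattern where column 2 depends on column 0, column 3 on column 2
Map, Levelnum = levelize({(0, 2), (2, 3)}, 4)
```

In this example columns 0 and 1 land in level 0 and can be factorized concurrently, while columns 2 and 3 form two further serial levels.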
2. The GPU launches the level-wise QR decomposition kernel function SparseQR in order of increasing level
The level-wise QR decomposition kernel function is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128. When level k is computed, the number of thread blocks is Nblocks = Levelnum(k), and the total number of threads is Nblocks × Nthreads. In order of increasing level, the kernel function SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k.
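The launch order can be sketched as follows (a hedged illustration of the host-side control flow only; launch_levels and decompose_column are hypothetical names, and on the GPU the blocks of one launch run concurrently rather than in the sequential inner loop shown):

```python
def launch_levels(Map, Levelnum, decompose_column):
    """Mirror of the host loop: for each level k launch
    SparseQR<Levelnum(k), 128>; block bid handles column Map[k][bid]."""
    log = []
    for k, cols in enumerate(Map):        # strictly increasing level order
        assert Levelnum[k] == len(cols)   # Nblocks = Levelnum(k)
        for bid, j in enumerate(cols):    # these blocks run concurrently on a GPU
            decompose_column(j)           # body of SparseQR for column j
            log.append((k, bid, j))
    return log
```

The essential point illustrated is that levels are serialized while all columns of one level are issued in a single kernel launch.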
The computation flow of SparseQR<Levelnum(k), Nthreads> is:
(1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2) blockID and threadID are assigned to variables bid and t, so that bid and t index the t-th thread in the bid-th thread block;
(3) the bid-th thread block is responsible for the QR decomposition of column j = Mapk(bid) of the Jacobian matrix J;
(4) in the bid-th thread block, variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following computation is performed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of the i-th column of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i through n of the j-th column of the Jacobian matrix J; then the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i).
Concretely, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed by the following steps:
1) check whether the thread number t is less than n-i; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
then, the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), with the concrete steps:
1) check whether the thread number t is less than n-i; if not, the thread stops executing;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128, return to 1).
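The twelve-step procedure above is a strided accumulation followed by a binary tree reduction in shared memory. The following sequential Python simulation (an illustrative sketch, not the patent's CUDA kernel; block_dot is a hypothetical name) mirrors steps 1)-12) for a block of 128 simulated threads:

```python
def block_dot(x, y, nthreads=128):
    """Simulate the patent's thread-block dot product: each of the
    nthreads threads accumulates a strided partial sum (steps 1-3),
    writes it to the shared array Cache (step 4), and a tree reduction
    halving M from 64 down to 1 combines the partials (steps 6-11)."""
    n = len(x)
    cache = [0.0] * nthreads
    for t0 in range(nthreads):            # each simulated thread t0
        t = t0
        temp = 0.0
        while t < n:                      # strided loop: t += 128
            temp += x[t] * y[t]
            t += nthreads
        cache[t0] = temp                  # shared-memory write
    m = nthreads // 2
    while m != 0:                         # tree reduction with halving stride
        for tid in range(m):              # only threads tid < m are active
            cache[tid] += cache[tid + m]
        # (a __syncthreads() barrier sits here in the CUDA kernel)
        m //= 2
    return cache[0]
```

In the kernel, β = 2V(i:n, i)ᵀJ(i:n, j) is obtained by applying this pattern to the two subvectors, with the factor 2 folded into the per-thread accumulation.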
(5) The Householder transformation vector of the j-th column is computed:
first, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) a² = Cache(0), with a taken as the square root of a²;
secondly, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128, return to 1);
then, the variable b² = V(j:n, j)ᵀV(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) b² = Cache(0), with b taken as the square root of b²;
finally, V(j:n, j) = V(j:n, j)/b is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128, return to 1).
(6) The j-th column of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
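Steps (5) and (6) together form one Householder column step. The following sequential Python sketch follows the patent's formulas (a² = J(j:n, j)ᵀJ(j:n, j), v = J(j:n, j) - a·ej, normalization by b, then J(j, j) = a); it is an illustration under the assumption of dense list-of-lists storage, and householder_column_step is a hypothetical name:

```python
import math

def householder_column_step(J, V, j, n):
    """Sketch of steps (5)-(6) for column j (0-based): build the
    normalized Householder vector v in column j of V and overwrite
    column j of J with its R-column. Assumes columns 0..j-1 have
    already been applied to column j (the step (4) updates)."""
    a2 = sum(J[r][j] * J[r][j] for r in range(j, n))  # a^2 = J(j:n,j)^T J(j:n,j)
    a = math.sqrt(a2)
    for r in range(j, n):                             # v = x - a*e_j
        V[r][j] = J[r][j] - (a if r == j else 0.0)
    b = math.sqrt(sum(V[r][j] ** 2 for r in range(j, n)))
    if b > 0.0:
        for r in range(j, n):                         # normalize so H = I - 2vv^T
            V[r][j] /= b
    J[j][j] = a                                       # R(j, j) = a
    for r in range(j + 1, n):
        J[r][j] = 0.0                                 # zero the subdiagonal
    return a
```

Note that production Householder codes usually choose the sign of a opposite to J(j, j) to avoid cancellation; the sketch keeps the patent's stated a = √(a²).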
The GPU computing platform used in the present invention is equipped with a Tesla K20C GPU card and an Intel Xeon E5-2620 CPU; the peak memory bandwidth of the GPU is 208 GB/s, its peak single-precision floating-point throughput is 3.52 Tflops, and the CPU frequency is 2 GHz. The CPU computing platform is equipped with an Intel Core i7-3520M 2.90 GHz CPU. A sparse Jacobian matrix test case of dimension 16557 was run on the CPU and GPU platforms respectively; Fig. 1 shows the computation times of the 16557-dimension sparse Jacobian matrix in different computation libraries. The QR solution time using the two-level parallel strategy on the GPU is 67 ms, which lags behind the LU decomposition time of KLU and is slightly faster than the computation time of UMFPACK. The main reason the GPU-accelerated algorithm lags behind KLU is that the number of columns per level shrinks rapidly as the level number increases: level 1 contains 1642 columns, and each of the first 30 levels contains more than 100 columns, but beyond level 207 only 1 to 2 columns remain per level. The early levels contain many columns that can be computed in parallel, while the columns in later levels must execute serially, so the computing power of the GPU cannot be fully exploited. Fig. 2 shows the parallelism curve of the 16557-dimension sparse Jacobian matrix (the number of columns in each level of the elimination tree; the ordinate is shown as a base-10 logarithm).
Claims (5)
1. A GPU-accelerated QR decomposition method of the power flow Jacobian matrix, characterized in that the method comprises:
(1) the CPU performs symbolic QR decomposition of the Jacobian matrix J to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; according to the sparsity structure of R, the columns of matrix J are partitioned into parallelizable levels;
(2) the GPU computes the level-wise QR decomposition kernel function SparseQR in order of increasing level.
2. The GPU-accelerated QR decomposition method of the power flow Jacobian matrix according to claim 1, characterized in that: in step (1), the parallel levelization assigns the n columns of matrix J to MaxLevel levels; columns belonging to the same level undergo QR decomposition in parallel; the number of columns in level k is Levelnum(k), where k denotes the level number; a mapping table Mapk stores the indices of all columns in level k.
3. The GPU-accelerated QR decomposition method of the power flow Jacobian matrix according to claim 1, characterized in that: in step (2), the level-wise QR decomposition kernel function is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128; when level k is computed, the number of thread blocks is Nblocks = Levelnum(k), and the total number of threads is Nblocks × Nthreads; in order of increasing level, the kernel function SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k;
the computation flow of SparseQR<Levelnum(k), Nthreads> is:
(2.1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2.2) blockID and threadID are assigned to variables bid and t, so that bid and t index the t-th thread in the bid-th thread block;
(2.3) the bid-th thread block is responsible for the QR decomposition of column j = Mapk(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following computation is performed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of the i-th column of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i through n of the j-th column of the Jacobian matrix J; then the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of the j-th column is computed;
(2.6) the j-th column of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
4. The GPU-accelerated QR decomposition method of the power flow Jacobian matrix according to claim 3, characterized in that, in step (2.4):
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-i; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
then, the j-th column of the Jacobian matrix J is updated using the formula J(i:n, j) = J(i:n, j) - βV(i:n, i), with the concrete steps:
1) check whether the thread number t is less than n-i; if not, the thread stops executing;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128, return to 1).
5. The GPU-accelerated QR decomposition method of the power flow Jacobian matrix according to claim 3, characterized in that, in step (2.5):
first, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) a² = Cache(0), with a taken as the square root of a²;
secondly, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128, return to 1);
then, the variable b² = V(j:n, j)ᵀV(j:n, j) is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, this thread stops executing;
2) each thread accumulates a private variable Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128, return to 1);
4) each thread assigns the value of Temp to the shared-memory variable Cache[threadID];
5) the threads in the thread block are synchronized;
6) variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise end the reduction and jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) the threads in the thread block are synchronized;
10) M /= 2;
11) return to 7);
12) b² = Cache(0), with b taken as the square root of b²;
finally, V(j:n, j) = V(j:n, j)/b is computed, with the concrete steps:
1) check whether the thread number t is less than n-j; if not, the thread stops executing;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128, return to 1).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610592223.0A (CN106026107B) | 2016-07-26 | 2016-07-26 | QR decomposition method of the power flow Jacobian matrix accelerated by GPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106026107A | 2016-10-12 |
CN106026107B | 2019-01-29 |
Family
ID=57113899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610592223.0A (granted as CN106026107B, active) | QR decomposition method of the power flow Jacobian matrix accelerated by GPU | 2016-07-26 | 2016-07-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106026107B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368368A | 2017-06-22 | 2017-11-21 | | GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper triangular equation systems |
CN107368454A | 2017-06-22 | 2017-11-21 | | GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower triangular equation systems |
CN110718919A | 2019-09-25 | 2020-01-21 | | Fault screening method for static security analysis of large power grids based on GPU acceleration |
CN111313413A | 2020-03-20 | 2020-06-19 | | Power system state estimation method based on graphics-processor parallel acceleration |
CN111416441A | 2020-04-09 | 2020-07-14 | | Power grid topology analysis method based on GPU hierarchical acceleration |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110078226A1 (en) * | 2009-09-30 | 2011-03-31 | International Business Machines Corporation | Sparse Matrix-Vector Multiplication on Graphics Processor Units |
CN103793590A (en) * | 2012-11-01 | 2014-05-14 | 同济大学 | GPU-based computation method for quickly solving power flow in distribution networks |
CN103617150A (en) * | 2013-11-19 | 2014-03-05 | 国家电网公司 | GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system |
CN104158182A (en) * | 2014-08-18 | 2014-11-19 | 国家电网公司 | Large-scale power grid flow correction equation parallel solving method |
CN104484234A (en) * | 2014-11-21 | 2015-04-01 | 中国电力科学研究院 | Multi-front load flow calculation method and system based on GPU (graphics processing unit) |
Non-Patent Citations (2)
Title |
---|
梁阳豆 (Liang Yangdou): "Research on Parallel Computation of Optimal Power Flow of Power Systems on the CUDA Platform", China Master's Theses Full-text Database * |
穆帅 (Mu Shuai) et al.: "Research on a Multi-level Parallel QR Decomposition Algorithm Based on GPU", Computer Simulation * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368368A (en) * | 2017-06-22 | 2017-11-21 | Southeast University | A GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper-triangular equation systems
CN107368454A (en) * | 2017-06-22 | 2017-11-21 | Southeast University | A GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower-triangular equation systems
CN110718919A (en) * | 2019-09-25 | 2020-01-21 | 北京交通大学 | GPU acceleration-based large power grid static safety analysis fault screening method |
CN110718919B (en) * | 2019-09-25 | 2021-06-01 | 北京交通大学 | GPU acceleration-based large power grid static safety analysis fault screening method |
CN111313413A (en) * | 2020-03-20 | 2020-06-19 | 国网浙江省电力公司 | Power system state estimation method based on parallel acceleration of graphics processor |
CN111416441A (en) * | 2020-04-09 | 2020-07-14 | 东南大学 | Power grid topology analysis method based on GPU hierarchical acceleration |
Also Published As
Publication number | Publication date |
---|---|
CN106026107B (en) | 2019-01-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106026107A (en) | QR decomposition method of power flow Jacobian matrix for GPU acceleration | |
CN106157176B (en) | A GPU-accelerated LU decomposition method for the power flow Jacobian matrix
CN103617150A (en) | GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system | |
WO2018133348A1 (en) | Static security analysis computation method, apparatus, and computer storage medium | |
CN105576648B (en) | Static security analysis double-layer parallel method based on GPU-CPU heterogeneous computing platform | |
CN101694940B (en) | Optimal power flow implementation method considering transient security constraints | |
CN104158182B (en) | A parallel solution method for large-scale power grid power flow correction equations
CN106407158A (en) | A GPU-accelerated method for batched multiplication of isomorphic sparse matrices by dense vectors
CN101706741A (en) | Method for partitioning dynamic tasks of CPU and GPU based on load balance | |
CN105391057B (en) | A GPU thread design method for power flow Jacobian matrix computation
CN106354479B (en) | A GPU-accelerated QR decomposition method for large numbers of isomorphic sparse matrices
Chen et al. | A two-layered parallel static security assessment for large-scale grids based on GPU | |
CN101833438A (en) | A general data processing method based on multi-level parallelism
CN102902657A (en) | Method for accelerating FFT (Fast Fourier Transform) by using GPU (Graphic Processing Unit) | |
CN102841881A (en) | Multiple integral computing method based on many-core processor | |
CN107368454A (en) | A GPU-accelerated forward-substitution method for large numbers of isomorphic sparse lower-triangular equation systems
CN109753682B (en) | Finite element stiffness matrix simulation method based on GPU (graphics processing Unit) end | |
CN117767289A (en) | Determination method and device for micro-grid energy scheduling, electronic equipment and storage medium | |
CN107256203A (en) | Implementation method and device for matrix-vector multiplication
CN107368368A (en) | A GPU-accelerated back-substitution method for large numbers of isomorphic sparse upper-triangular equation systems
CN107423259A (en) | A domino-optimized GPU-accelerated back-substitution method for upper-triangular equation systems in power systems
CN105955712A (en) | Direct current fault screening method based on GPU acceleration | |
Wenjie et al. | An expansion-aided synchronous conservative time management algorithm on GPU | |
CN109992860A (en) | Electro-magnetic transient parallel simulation method and system based on GPU | |
CN103793745B (en) | A distributed particle swarm optimization method
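This patent and several of the similar documents above build on QR factorization via Householder transformations (the matrix V in the abstract). As background only, here is a minimal dense Householder QR sketch in Python/NumPy; it illustrates the underlying linear algebra, not the patent's sparse, layered GPU kernel (`SparseQR`), and the function name `householder_qr` is ours:

```python
import numpy as np

def householder_qr(A):
    """Factor A (m x n, m >= n) as Q @ R using Householder reflections.

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular).
    """
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(n):
        x = R[k:, k]
        sigma = np.linalg.norm(x)
        if sigma == 0.0:
            continue  # column is already zero below the diagonal
        # Householder vector v reflects x onto a multiple of e1;
        # the sign choice avoids cancellation.
        v = x.copy()
        v[0] += sigma if x[0] >= 0 else -sigma
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T to the trailing submatrix, accumulate Q.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R
```

A sparse implementation such as the one claimed here would additionally precompute the nonzero structure of R symbolically on the CPU, then group mutually independent columns into levels so one GPU kernel launch can eliminate a whole level in parallel.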
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
CB02 | Change of applicant information |
Address after: 210009 No. 87 Dingjiaqiao, Gulou District, Nanjing City, Jiangsu Province
Applicant after: Southeast University
Address before: No. 2 Sipailou, Xuanwu District, Nanjing, Jiangsu
Applicant before: Southeast University |
GR01 | Patent grant | ||
GR01 | Patent grant |