CN106026107B - A GPU-accelerated QR decomposition method for the power flow Jacobian matrix - Google Patents

A GPU-accelerated QR decomposition method for the power flow Jacobian matrix

Info

Publication number
CN106026107B
CN106026107B (application CN201610592223.0A)
Authority
CN
China
Prior art keywords: thread, threadID, variable, cache, matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610592223.0A
Other languages
Chinese (zh)
Other versions
CN106026107A (en)
Inventor
周赣
孙立成
张旭
柏瑞
冯燕钧
秦成明
傅萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201610592223.0A priority Critical patent/CN106026107B/en
Publication of CN106026107A publication Critical patent/CN106026107A/en
Application granted granted Critical
Publication of CN106026107B publication Critical patent/CN106026107B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/04Circuit arrangements for ac mains or ac distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

Disclosed herein is a GPU-accelerated QR decomposition method for the power flow Jacobian matrix. The method comprises: performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of J into parallelizable levels according to the sparsity structure of R; and invoking the layered QR decomposition kernel SparseQR on the GPU in order of increasing level. By combining CPU control of the program flow and handling of basic data with GPU processing of the intensive floating-point operations, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and addresses the high time cost of power flow calculation in static security analysis of power systems.

Description

A GPU-accelerated QR decomposition method for the power flow Jacobian matrix
Technical field
The invention belongs to the field of high-performance computing applications in power systems, and more particularly relates to a GPU thread design method for GPU-accelerated QR decomposition of the power flow Jacobian matrix.
Background art
Power flow calculation is the most widely used, most basic, and most important type of electrical computation in power systems. In studies of power system operating modes and in system planning, power flow calculations are required to compare the feasibility, reliability, and economy of operating modes or planned supply schemes. At the same time, monitoring the operating state of a power system in real time also requires a large number of fast power flow calculations. Offline power flow calculation is therefore used when planning operating modes and designing the system, while online power flow calculation is used for real-time monitoring of the system's operating state.
In actual production, both offline and online power flow calculations place high demands on computing speed. In offline power flow studies for planning, design, and operating-mode arrangement, situations such as equipment commissioning schemes are complex: many operating modes must be simulated, the power flow workload is large, and the time of a single power flow calculation affects the overall simulation duration. Online power flow calculation during system operation is highly time-sensitive and must deliver power flow results in real time. For example, in contingency analysis, to assess the influence of equipment outages on static security, the system must compute the power flow distribution under a large number of anticipated contingencies and produce operating-mode adjustment schemes in real time.
In traditional Newton-Raphson power flow calculation, solving the correction equations accounts for about 70% of the power flow computation time, so the speed of solving these equations dominates the overall performance of the program. As the growth of CPU clock speeds slows, the computation time of a single power flow calculation has reached a bottleneck. Current acceleration methods for power flow calculation concentrate on coarse-grained acceleration of multiple power flows using clusters or multi-core servers; research on accelerating the computations inside a single power flow is rarely applied in production.
A GPU is a many-core parallel processor whose number of processing units far exceeds that of a CPU. Traditionally the GPU was only responsible for graphics rendering, and most processing was left to the CPU. Today's GPU is a multi-core, multi-threaded programmable processor with powerful computing capability and high memory bandwidth. Under the general-purpose computing model, the GPU works as a coprocessor of the CPU, and high-performance computation is achieved by decomposing tasks and distributing them reasonably between the two.
Solving sparse linear systems offers parallelism. After symbolic QR decomposition of the coefficient matrix, the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R are obtained, and the columns of the matrix J can be partitioned into parallelizable levels according to the sparsity structure of R. The columns within one level are mutually independent, with no dependency relations between them, and can naturally be computed in parallel, which suits GPU acceleration. With reasonable scheduling between the CPU and the GPU, the QR decomposition of the coefficient matrix can thus be completed quickly and the sparse linear system solved. Scholars at home and abroad have begun to study GPU-accelerated solution of sparse linear systems, but without deep optimization of the thread design: thread design has been studied merely from the distribution of the computational load, the threads' computation and data-indexing patterns have not been investigated in depth, and the resulting programs cannot give full play to the advantages of the GPU.
There is therefore an urgent need to solve the above problems.
Summary of the invention
Object of the invention: in view of the deficiencies of the prior art, the present invention provides a GPU-accelerated QR decomposition method for the power flow Jacobian matrix that greatly reduces the time of QR decomposition of the power flow Jacobian matrix and increases power flow calculation speed.
Power flow calculation: an electrical engineering term for the computation of the distribution of active power, reactive power, and voltage in a power network under given network topology, component parameters, and generation and load parameters.
Parallel computing: a class of algorithms that execute multiple instructions at once, with the goals of increasing computation speed relative to serial computation and of solving large, complex computational problems by scaling up the problem size.
GPU: Graphics Processing Unit.
The invention discloses a GPU-accelerated QR decomposition method for the power flow Jacobian matrix, the method comprising:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of J into parallelizable levels according to the sparsity structure of R;
(2) invoking the layered QR decomposition kernel SparseQR on the GPU in order of increasing level.
In step (1), the parallelization layering groups the n columns of matrix J into MaxLevel levels, and the columns belonging to the same level are QR-decomposed in parallel; level k contains Levelnum(k) columns, where k is the level number, and the numbers of all columns in level k are stored in the mapping table Map_k.
Preferably, in step (2), the layered QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128; when computing level k, the number of thread blocks is Nblocks = Levelnum(k) and the total number of threads is Nblocks × Nthreads. In order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k. The computation flow of SparseQR<Levelnum(k), Nthreads> is as follows:
(2.1) CUDA automatically assigns each thread a thread block index blockID and an in-block thread index threadID;
(2.2) blockID and threadID are assigned to the variables bid and t, respectively; bid and t then index thread t within thread block bid;
(2.3) thread block bid is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(2.4) in thread block bid, the variable i is incremented from 1 to j−1; whenever R(i, j) ≠ 0, the following computation is performed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by rows i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by rows i to n of column j of the Jacobian matrix J; then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) − βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of the matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
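For illustration only, the per-column work of steps (2.3)-(2.6) can be sketched as a serial CPU reference in Python. This sketch is not part of the claimed GPU implementation; the function name and the r_nonzero predicate standing in for the test R(i, j) ≠ 0 are assumptions of the sketch, and the sign of a is taken as the non-negative square root, as in the text:

```python
import math

def qr_decompose_column(J, V, j, r_nonzero):
    """Serial sketch of steps (2.3)-(2.6) for one column j of the n x n
    matrix J (a list of rows). r_nonzero(i, j) stands in for the test
    R(i, j) != 0 from the symbolic decomposition; V stores the
    normalized Householder vectors column by column."""
    n = len(J)
    for i in range(j):                                  # step (2.4)
        if r_nonzero(i, j):
            # beta = 2 V(i:n, i)^T J(i:n, j), then J(i:n, j) -= beta V(i:n, i)
            beta = 2.0 * sum(V[k][i] * J[k][j] for k in range(i, n))
            for k in range(i, n):
                J[k][j] -= beta * V[k][i]
    x = [J[k][j] for k in range(j, n)]                  # step (2.5)
    a = math.sqrt(sum(t * t for t in x))                # a^2 = x^T x
    x[0] -= a                                           # v = x - a e_j
    b = math.sqrt(sum(t * t for t in x))                # b^2 = v^T v
    if b > 0.0:                                         # guard: column already equals a e_j
        for k in range(j, n):
            V[k][j] = x[k - j] / b                      # V(j:n, j) = v / b
    J[j][j] = a                                         # step (2.6)
    for k in range(j + 1, n):
        J[k][j] = 0.0
```

Running the sketch over all columns of a small dense example leaves J upper triangular, reproducing the R factor of a QR decomposition with a non-negative diagonal.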
Preferably, in step (2.4):
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) − βV(i:n, i); the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) J(t+i, j) = J(t+i, j) − βV(t+i, i);
3) set t = t+128 and return to 1).
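The 128-thread dot-product reduction of steps 1)-12) can be illustrated with a serial Python sketch (an illustrative model, not the CUDA code; note that Cache is zero-initialized here so that threads which terminate at step 1) contribute zero to the sum):

```python
def block_dot(x, y, nthreads=128):
    """Serial model of the 128-thread block dot product of steps 1)-12):
    thread t accumulates elements t, t+128, t+256, ... into a private Temp,
    writes it to shared memory Cache[t], and a tree reduction then halves
    the active stride M until Cache[0] holds the complete sum."""
    n = len(x)
    cache = [0.0] * nthreads                 # shared memory, zero-initialized
    for tid in range(nthreads):              # on a GPU these run concurrently
        t, temp = tid, 0.0
        while t < n:                         # steps 1)-3): strided accumulation
            temp += x[t] * y[t]
            t += nthreads
        cache[tid] = temp                    # step 4)
    m = nthreads // 2                        # step 6): M = 128/2
    while m != 0:                            # steps 7)-11): tree reduction
        for tid in range(m):
            cache[tid] += cache[tid + m]
        m //= 2
    return cache[0]                          # step 12)
```

The strided accumulation lets 128 threads cover a vector of any length, and the tree reduction finishes in log2(128) = 7 synchronized halving rounds.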
Further, in step (2.5):
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Next, V(j:n, j) = J(j:n, j) − a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = J(t+j, j) − a·e_j(t+j);
3) set t = t+128 and return to 1);
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
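A serial Python sketch of step (2.5) alone (illustrative; the function name is an assumption of the sketch) shows that with a = √(xᵀx) and the normalized vector v, the reflection H = I − 2vvᵀ maps the column onto a·e_j, which is exactly the update of step (2.6):

```python
import math

def householder_vector(x):
    """Serial sketch of step (2.5): a = sqrt(x^T x), v = x - a*e_0,
    b = sqrt(v^T v), return (v / b, a). The text leaves the sign of a
    implicit; the non-negative root is used here."""
    a = math.sqrt(sum(t * t for t in x))
    v = list(x)
    v[0] -= a                                # v = x - a e_0
    b = math.sqrt(sum(t * t for t in v))     # b^2 = v^T v
    return [t / b for t in v], a             # unit Householder vector, a
```

Applying H = I − 2vvᵀ to x then yields (a, 0, ..., 0), so storing J(j, j) = a and zeroing J(j+1:n, j) in step (2.6) is consistent with the reflection.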
Beneficial effects: compared with the prior art, the invention has the following advantages. First, symbolic QR decomposition of the power flow Jacobian matrix J is performed on the CPU, and the sparse structure of R eliminates unnecessary floating-point computation. Second, the columns of J are assigned, according to the sparsity structure of R, to levels that can be computed in parallel, and the layering result is transferred to the GPU. Furthermore, the layered QR decomposition kernel SparseQR is invoked on the GPU in order of increasing level, and a reduction algorithm is used within GPU thread blocks to handle dot-product operations, improving computational efficiency. Finally, by combining CPU control of the program flow and handling of basic data with GPU processing of the intensive floating-point operations, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and addresses the high time cost of power flow calculation in static security analysis of power systems.
Brief description of the drawings:
Fig. 1 shows the computation times of the test case of the invention;
Fig. 2 shows the layering result of the test case of the invention;
Fig. 3 is a flow diagram of the invention.
Specific embodiment:
As shown in Fig. 3, a GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to the invention comprises:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R, where the sparsity structure of J after symbolic decomposition equals V + R; partitioning the columns of J into parallelizable levels according to the sparsity structure of R;
(2) invoking the layered QR decomposition kernel SparseQR on the GPU in order of increasing level.
1. Symbolic QR decomposition of the power flow Jacobian matrix J on the CPU
First, symbolic QR decomposition of the Jacobian matrix J is performed on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; the sparsity structure of J after symbolic decomposition equals V + R. Then the parallelization layering groups the n columns of J into MaxLevel levels, and the columns belonging to the same level are QR-decomposed in parallel; level k contains Levelnum(k) columns, where k is the level number, and the numbers of all columns in level k are stored in the mapping table Map_k. Finally, the CPU transfers the data needed by the GPU to the GPU; the data include the Jacobian matrix J, the dimension n, the upper triangular matrix R, the number of levels MaxLevel, the per-level column counts Levelnum, and the mapping tables Map.
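The parallelization layering can be illustrated with a short Python sketch (illustrative only; R_pattern is a dense boolean stand-in for the sparsity structure of R): column j depends on every column i < j with R(i, j) ≠ 0, so its level is one greater than the deepest level among its dependencies.

```python
def level_schedule(R_pattern, n):
    """Groups the n columns of J into levels: column j depends on every
    column i < j with R(i, j) != 0, so level(j) = 1 + max level of its
    dependencies (level 1 if it has none). Returns the mapping tables
    Map_1, Map_2, ... as lists of 0-based column numbers."""
    level = [0] * n
    for j in range(n):
        deps = [i for i in range(j) if R_pattern[i][j]]
        level[j] = 1 + max((level[i] for i in deps), default=0)
    max_level = max(level) if n else 0       # MaxLevel
    return [[j for j in range(n) if level[j] == k]
            for k in range(1, max_level + 1)]
```

The length of each returned list plays the role of Levelnum(k), and the number of lists is MaxLevel.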
For the principles of symbolic QR decomposition and parallelization layering, see "Direct Methods for Sparse Linear Systems", Timothy A. Davis, SIAM, Philadelphia, 2006. The symbolic QR decomposition and parallelization layering routines used in this patent follow CSparse: A Concise Sparse Matrix Package, Version 3.1.4, Copyright (c) 2006-2014, Timothy A. Davis, Oct 10, 2014. For the theory of the Householder transformation, see: Hu Bingxin, Li Ning, Lv Jun. A recursive complex QR decomposition algorithm using Householder transformations [J]. Journal of System Simulation, 2004, 16(11): 2432-2434.
2. Launching the layered QR decomposition kernel SparseQR on the GPU in order of increasing level
The layered QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128; when computing level k, the number of thread blocks is Nblocks = Levelnum(k) and the total number of threads is Nblocks × Nthreads. In order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k.
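The level-ordered launch sequence can be modeled serially in Python (illustrative only; in a real SparseQR launch the Levelnum(k) blocks of one level run concurrently on the GPU, while here they are a plain loop):

```python
def launch_levels(layers, decompose_column, nthreads=128):
    """Serial model of the level-ordered launches SparseQR<Levelnum(k), 128>:
    one launch per level k; within a launch, block bid handles column
    Map_k(bid). Blocks of the same launch are independent of each other,
    so a GPU schedules them concurrently."""
    for layer in layers:                     # levels in increasing order
        nblocks = len(layer)                 # Nblocks = Levelnum(k)
        total_threads = nblocks * nthreads   # Nblocks x Nthreads threads in flight
        for bid in range(nblocks):           # independent blocks of one launch
            decompose_column(layer[bid])     # column j = Map_k(bid)
```

The sequential outer loop enforces the inter-level dependencies; only the inner loop is parallel on the GPU.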
The computation flow of SparseQR<Levelnum(k), Nthreads> is as follows:
(1) CUDA automatically assigns each thread a thread block index blockID and an in-block thread index threadID;
(2) blockID and threadID are assigned to the variables bid and t, respectively; bid and t then index thread t within thread block bid;
(3) thread block bid is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(4) in thread block bid, the variable i is incremented from 1 to j−1; whenever R(i, j) ≠ 0, the following computation is performed:
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by rows i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by rows i to n of column j of the Jacobian matrix J; the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) − βV(i:n, i); the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) J(t+i, j) = J(t+i, j) − βV(t+i, i);
3) set t = t+128 and return to 1).
(5) the Householder transformation vector of column j is computed:
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Next, V(j:n, j) = J(j:n, j) − a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = J(t+j, j) − a·e_j(t+j);
3) set t = t+128 and return to 1);
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
(6) column j of the matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
The GPU computing platform used in the invention is equipped with one Tesla K20C GPU card and an Intel Xeon E5-2620 CPU; the peak memory bandwidth of the GPU is 208 GB/s, its single-precision floating-point peak is 3.52 Tflops, and the CPU frequency is 2 GHz. The CPU computing platform is equipped with an Intel Core i7-3520M 2.90 GHz CPU. A 16557-dimensional sparse Jacobian matrix test case was run on the CPU and GPU computing platforms; Fig. 1 shows the computation times of this matrix on the different computing libraries. The QR algorithm with the two-level parallel strategy on the GPU solves in 67 ms, which lags behind the factorization time of KLU and is slightly faster than the computation time of UMFPACK. The main reason the GPU-accelerated algorithm lags behind KLU is that the number of columns per level drops rapidly as the level number increases: level 1 contains 1642 columns and each of the first 30 levels contains more than 100 columns, but from there up to level 207 only 1 or 2 columns remain per level. The front levels contain many columns that can be computed in parallel, while the columns in the rear levels must be processed serially, so the computing performance of the GPU cannot be fully exploited. Fig. 2 shows the parallelism curve of the 16557-dimensional sparse Jacobian matrix (the number of columns per level of the elimination tree, with the vertical coordinate on a base-10 logarithmic scale).

Claims (3)

1. A GPU-accelerated QR decomposition method for the power flow Jacobian matrix, characterized in that the method comprises:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparsity structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of J into parallelizable levels according to the sparsity structure of R, wherein the parallelization layering groups the n columns of J into MaxLevel levels and the columns belonging to the same level are QR-decomposed in parallel; level k contains Levelnum(k) columns, where k is the level number, and the numbers of all columns in level k are stored in the mapping table Map_k;
(2) invoking the layered QR decomposition kernel SparseQR on the GPU in order of increasing level, wherein the layered QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>, the thread block size Nthreads is fixed at 128, and when computing level k the number of thread blocks is Nblocks = Levelnum(k) and the total number of threads is Nblocks × Nthreads; in order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k; the computation flow of SparseQR<Levelnum(k), Nthreads> is as follows:
(2.1) CUDA automatically assigns each thread a thread block index blockID and an in-block thread index threadID;
(2.2) blockID and threadID are assigned to the variables bid and t, respectively; bid and t then index thread t within thread block bid;
(2.3) thread block bid is responsible for the QR decomposition of column j = Map_k(bid) of the Jacobian matrix J;
(2.4) in thread block bid, the variable i is incremented from 1 to j−1; whenever R(i, j) ≠ 0, the following computation is performed:
first, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed, where V(i:n, i) is the vector formed by rows i to n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by rows i to n of column j of the Jacobian matrix J; then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) − βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of the matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
2. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that in step (2.4):
First, the intermediate variable β = 2V(i:n, i)ᵀJ(i:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache[0];
Then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) − βV(i:n, i); the specific steps are:
1) check whether the thread number t is less than n−i; otherwise the thread terminates;
2) J(t+i, j) = J(t+i, j) − βV(t+i, i);
3) set t = t+128 and return to 1).
3. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that in step (2.5):
First, the variable a² = J(j:n, j)ᵀJ(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a² = Cache[0];
Next, V(j:n, j) = J(j:n, j) − a·e_j(j:n) is computed, where e_j is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = J(t+j, j) − a·e_j(t+j);
3) set t = t+128 and return to 1);
Then the variable b² = V(j:n, j)ᵀV(j:n, j) is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) accumulate into the thread-local variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) set t = t+128 and return to 1);
4) assign the value of the thread-local Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation is finished, jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID+M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b² = Cache[0];
Finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread number t is less than n−j; otherwise the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) set t = t+128 and return to 1).
CN201610592223.0A 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix Active CN106026107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592223.0A CN106026107B (en) 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix


Publications (2)

Publication Number Publication Date
CN106026107A CN106026107A (en) 2016-10-12
CN106026107B true CN106026107B (en) 2019-01-29

Family

ID=57113899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610592223.0A Active CN106026107B (en) 2016-07-26 2016-07-26 A GPU-accelerated QR decomposition method for the power flow Jacobian matrix

Country Status (1)

Country Link
CN (1) CN106026107B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368454A (en) * 2017-06-22 2017-11-21 东南大学 A kind of GPU of the sparse lower trigonometric equation group of a large amount of isomorphisms pushes away method before accelerating
CN107368368A (en) * 2017-06-22 2017-11-21 东南大学 A kind of GPU of the sparse upper trigonometric equation group of a large amount of isomorphisms accelerates back substitution method
CN110718919B (en) * 2019-09-25 2021-06-01 北京交通大学 GPU acceleration-based large power grid static safety analysis fault screening method
CN111313413A (en) * 2020-03-20 2020-06-19 国网浙江省电力公司 Power system state estimation method based on parallel acceleration of graphics processor
CN111416441B (en) * 2020-04-09 2021-08-10 东南大学 Power grid topology analysis method based on GPU hierarchical acceleration

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103617150A (en) * 2013-11-19 2014-03-05 国家电网公司 GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system
CN103793590A (en) * 2012-11-01 2014-05-14 同济大学 GPU-based computation method for quickly solving power flow in distribution networks
CN104158182A (en) * 2014-08-18 2014-11-19 国家电网公司 Large-scale power grid flow correction equation parallel solving method
CN104484234A (en) * 2014-11-21 2015-04-01 中国电力科学研究院 Multi-front load flow calculation method and system based on GPU (graphics processing unit)

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8364739B2 (en) * 2009-09-30 2013-01-29 International Business Machines Corporation Sparse matrix-vector multiplication on graphics processor units



Similar Documents

Publication Publication Date Title
CN106157176B (en) A kind of LU decomposition method for the direction of energy Jacobian matrix that GPU accelerates
CN106026107B (en) A kind of QR decomposition method for the direction of energy Jacobian matrix that GPU accelerates
CN106407158B (en) A kind of batch processing isomorphism sparse matrix that GPU accelerates multiplies the processing method of full vector
CN103617150A (en) GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system
CN101694940B (en) Optimal power flow implementation method considering transient security constraints
CN103607466B (en) A kind of wide-area multi-stage distributed parallel grid analysis method based on cloud computing
CN105391057B (en) A kind of GPU threaded design methods that direction of energy Jacobi battle array calculates
CN104504257B (en) A kind of online Prony analysis methods calculated based on Dual parallel
CN101976835A (en) Parallel computation method for Newton power flow of large-scale electric power system
CN105790266A (en) Microgrid parallel multi-target robust optimization scheduling integrated control method
CN103996147A (en) Comprehensive evaluation method for power distribution network
Tian et al. Product cooperative disassembly sequence and task planning based on genetic algorithm
Li et al. Optimum design of ship cabin equipment layout based on SLP method and genetic algorithm
CN106505575A (en) A kind of Line Flow economic load dispatching method based on Granule Computing
CN106354479B (en) A kind of GPU acceleration QR decomposition method of a large amount of isomorphism sparse matrixes
CN103617494A (en) Wide-area multi-stage distributed parallel power grid analysis system
CN107368454A (en) A kind of GPU of the sparse lower trigonometric equation group of a large amount of isomorphisms pushes away method before accelerating
CN115051360A (en) Online computing method and device for operation risk of electric power system based on integrated knowledge migration
CN107423259A (en) A kind of GPU of domino optimization accelerates trigonometric equation group back substitution method on electric power
Tan et al. A fast and stable forecasting model to forecast power load
CN109698516A (en) The maximum capacity computing system and method for renewable energy access power distribution network
CN107368368A (en) A kind of GPU of the sparse upper trigonometric equation group of a large amount of isomorphisms accelerates back substitution method
Li et al. A hierarchical visualization analysis model of power big data
CN107392429A (en) Under the direction of energy that a kind of GPU accelerates method is pushed away before trigonometric equation group
Han et al. Research on the Application of Modern Power System Based on Automatic Control Technology

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210009 No. 87 Dingjiaqiao, Gulou District, Nanjing City, Jiangsu Province

Applicant after: Southeast University

Address before: No. 2, four archway in Xuanwu District, Nanjing, Jiangsu

Applicant before: Southeast University

GR01 Patent grant