
CN106026107B - GPU-accelerated QR decomposition method for the power flow Jacobian matrix - Google Patents

GPU-accelerated QR decomposition method for the power flow Jacobian matrix

Info

Publication number
CN106026107B
CN106026107B (application CN201610592223.0A)
Authority
CN
China
Prior art keywords
thread
threadid
variable
matrix
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610592223.0A
Other languages
Chinese (zh)
Other versions
CN106026107A (en)
Inventor
周赣
孙立成
张旭
柏瑞
冯燕钧
秦成明
傅萌
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201610592223.0A priority Critical patent/CN106026107B/en
Publication of CN106026107A publication Critical patent/CN106026107A/en
Application granted granted Critical
Publication of CN106026107B publication Critical patent/CN106026107B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for AC mains or AC distribution networks
    • H02J3/04Circuit arrangements for AC mains or AC distribution networks for connecting networks of the same frequency but supplied from different sources
    • H02J3/06Controlling transfer of power between connected networks; Controlling sharing of load between connected networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20 Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed is a GPU-accelerated QR decomposition method for the power flow Jacobian matrix. The method comprises: performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparse structures of the Householder transformation matrix V and the upper triangular matrix R; partitioning the columns of J into parallelizable levels according to the sparse structure of R; and computing the level-wise QR decomposition kernel SparseQR on the GPU in order of increasing level. By combining CPU control of the program flow and preparation of basic data with GPU processing of the intensive floating-point operations, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and solves the problem of time-consuming power flow calculation in static security analysis of power systems.

Description

GPU-accelerated QR decomposition method for the power flow Jacobian matrix
Technical field
The invention belongs to the field of high-performance computing applications in power systems, and in particular relates to a GPU thread design method for the GPU-accelerated QR decomposition of the power flow Jacobian matrix.
Background art
Power flow calculation is the most widely used, most basic, and most important type of electrical computation in power systems. In studies of power system operating modes and in system planning, power flow calculations are required to compare the feasibility, reliability, and economy of operating modes or planned supply schemes. Meanwhile, monitoring the operating state of a power system in real time also demands a large volume of fast power flow calculations. Therefore, offline power flow calculation is used when planning and designing operating modes, while online power flow calculation is used for real-time monitoring of the operating state of the power system.
In actual production, both offline and online power flow calculations place high demands on calculation speed. In offline power flow studies involving planning, design, and operating-mode arrangement, complications such as equipment commissioning schedules require many operating scenarios to be simulated; the power flow workload is large, and the time of a single power flow calculation affects the overall simulation duration. Online power flow calculations carried out during system operation are highly time-sensitive and must deliver results in real time; for example, in contingency analysis of the effect of equipment outages on static security, the system needs to compute the power flow distribution under a large number of anticipated contingencies and produce operating-mode adjustment schemes in real time.
In traditional Newton-Raphson power flow calculation, solving the correction equations accounts for about 70% of the power flow calculation time, so the speed of solving the correction equations determines the overall performance of the program. As the growth of CPU clock speeds slows, the computation time of a single power flow calculation has reached a bottleneck at this stage. Current acceleration methods for power flow calculation mainly use clusters and multi-core servers to accelerate many power flows at coarse granularity; little research on accelerating the computation inside a single power flow has been applied in production.
A GPU is a many-core parallel processor whose number of processing units far exceeds that of a CPU. Traditionally the GPU was responsible only for graphics rendering, while most processing was left to the CPU. Today's GPUs are multi-core, multi-threaded programmable processors with powerful computing capability and high memory bandwidth. Under the general-purpose computing model, the GPU works as a coprocessor of the CPU, and high-performance computation is completed by reasonably decomposing and distributing tasks between them.
Solving sparse linear systems of equations exhibits parallelism. After symbolic QR decomposition of the coefficient matrix of the system, the sparse structures of the Householder transformation matrix V and the upper triangular matrix R are obtained; according to the sparse structure of R, the columns of matrix J are partitioned into parallelizable levels. The columns within each level are mutually independent, with no data dependence, and can naturally be computed in parallel, which suits GPU acceleration. Therefore, with reasonable scheduling between the CPU and GPU, the QR decomposition of the coefficient matrix can be completed rapidly and the sparse linear system solved. Scholars at home and abroad have begun to study GPU-based methods for accelerating the solution of sparse linear systems, but without deeply optimized thread design: computational threads are designed merely from the distribution of workload, and the data-indexing patterns of thread computations have not been investigated in depth, so such programs cannot fully exploit the advantages of the GPU.
The above problems therefore urgently need to be solved.
Summary of the invention
Object of the invention: in view of the deficiencies of the prior art, the present invention provides a GPU-accelerated QR decomposition method for the power flow Jacobian matrix that greatly reduces the QR decomposition time of the power flow Jacobian matrix and increases power flow calculation speed.
Power flow calculation: an electrical engineering term referring to the calculation of the distribution of active power, reactive power, and voltage in a power network under given network topology, component parameters, and generation and load parameters.
Parallel computing: a form of computation in which multiple instructions are executed at once, with the aim of increasing computation speed relative to serial computation and of solving large, complex computational problems by scaling up the problem size.
GPU: graphics processing unit (English: Graphics Processing Unit, abbreviation: GPU).
The invention discloses a GPU-accelerated QR decomposition method for the power flow Jacobian matrix, the method comprising:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparse structures of the Householder transformation matrix V and the upper triangular matrix R; and partitioning the columns of J into parallelizable levels according to the sparse structure of R;
(2) computing the level-wise QR decomposition kernel SparseQR on the GPU in order of increasing level.
In step (1), the parallelization layering groups the n columns of matrix J into MaxLevel levels, and the columns belonging to the same level are QR-decomposed in parallel; the number of columns in each level is Levelnum(k), where k is the level number; the indices of all columns in level k are stored in the mapping table Mapk.
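A minimal Python sketch of this layering rule (illustrative only, not the CSparse-based routine the patent relies on; the names `layer_columns` and `level_of` are ours, and the symbolic pattern of R is modeled as a dense boolean matrix):

```python
def layer_columns(R_pattern):
    """Assign each column j the level 1 + max(level of columns i < j with R[i][j] != 0).

    Columns in the same level have no dependence on each other through R,
    so they can be QR-decomposed in parallel.
    """
    n = len(R_pattern)
    level_of = [0] * n
    for j in range(n):
        deps = [level_of[i] for i in range(j) if R_pattern[i][j]]
        level_of[j] = 1 + max(deps, default=0)
    max_level = max(level_of, default=0)          # MaxLevel
    # Map[k] lists the columns in level k+1; Levelnum[k] = len(Map[k]).
    Map = [[j for j in range(n) if level_of[j] == k + 1] for k in range(max_level)]
    Levelnum = [len(cols) for cols in Map]
    return max_level, Levelnum, Map
```

For example, in a 3×3 pattern whose only strictly upper nonzero is R(1, 3), the first two columns land in level 1 and the third column in level 2.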
Preferably, in step (2), the level-wise QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>; its thread block size Nthreads is fixed at 128. When computing level k, the number of thread blocks is Nblocks = Levelnum(k), so the total number of threads is Nblocks × Nthreads. In order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k. The computation flow of SparseQR<Levelnum(k), Nthreads> is:
(2.1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2.2) blockID and threadID are assigned to variables bid and t; thereafter bid and t index the t-th thread in the bid-th thread block;
(2.3) the bid-th thread block is responsible for QR-decomposing column j = Mapk(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
First, the intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i through n of column j of the Jacobian matrix J; then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
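As a serial reference for steps (2.1)-(2.6), the work of one thread block on a single column j can be modeled in NumPy as below. This is a dense, single-threaded sketch under the assumption that `R_pattern` holds the symbolic nonzeros of R; the actual kernel runs 128 threads per block over sparse storage, and the function name is ours:

```python
import numpy as np

def decompose_column(J, V, R_pattern, j, n):
    """Serial model of one thread block: eliminate column j against earlier
    Householder vectors, then build column j's own Householder vector."""
    for i in range(j):                         # step (2.4): earlier columns i
        if R_pattern[i][j]:                    # only where R(i, j) != 0
            beta = 2.0 * (V[i:n, i] @ J[i:n, j])
            J[i:n, j] -= beta * V[i:n, i]      # J(i:n, j) = J(i:n, j) - beta*V(i:n, i)
    x = J[j:n, j]                              # step (2.5): build the reflector
    a = np.sqrt(x @ x)                         # a^2 = x^T x
    v = x.copy()
    v[0] -= a                                  # v = x - a*e_j
    b = np.sqrt(v @ v)                         # b^2 = v^T v
    if b > 0.0:
        v /= b
    V[j:n, j] = v
    J[j, j] = a                                # step (2.6)
    J[j + 1:n, j] = 0.0
```

Calling this for j = 0..n-1 with a fully dense upper pattern reduces J to the upper triangular factor of a Householder QR decomposition, with the normalized reflectors stored in V.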
Preferably, in step (2.4):
First, the intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
Then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i); the specific steps are:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128; return to 1);
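Steps 1)-12) above are a strided partial-sum pass followed by a tree reduction in shared memory. A plain-Python simulation of the 128-thread pattern (illustrative only; the list `cache` stands in for the shared-memory array Cache, and the barriers are implicit here because the "threads" run sequentially):

```python
def block_dot(x, y, nthreads=128):
    """Simulate an nthreads-wide thread block computing the dot product of x and y."""
    cache = [0.0] * nthreads                   # shared-memory array Cache
    for tid in range(nthreads):
        t = tid                                # each thread starts at its own index
        while t < len(x):                      # steps 1)-3): stride by the block size
            cache[tid] += x[t] * y[t]
            t += nthreads
    M = nthreads // 2                          # step 6)
    while M != 0:                              # steps 7)-11): halving tree reduction
        for tid in range(M):                   # threads with threadID < M accumulate
            cache[tid] += cache[tid + M]
        M //= 2
    return cache[0]                            # step 12): result is in Cache(0)
```

For example, `block_dot(list(range(300)), [1.0] * 300)` returns 44850.0, the sum of 0 through 299.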
Further, in step (2.5):
First, the variable a^2 = J(j:n, j)^T J(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a^2 = Cache(0);
Next, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128; return to 1);
Then the variable b^2 = V(j:n, j)^T V(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b^2 = Cache(0);
Finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128; return to 1).
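The four sub-computations above construct the normalized Householder vector v = (x - a·e_1)/b from the subcolumn x = J(j:n, j), where a^2 = x^T x and b^2 = (x - a·e_1)^T (x - a·e_1). A small NumPy check of this construction (illustrative; the steps above take a = +||x||, whereas production implementations usually choose the sign of a opposite to the first element of x to avoid cancellation):

```python
import numpy as np

def householder_vector(x):
    """Build the normalized reflector v so that (I - 2 v v^T) x = a * e_1."""
    a = np.sqrt(x @ x)          # a^2 = x^T x
    v = x.copy()
    v[0] -= a                   # v = x - a * e_1
    b = np.sqrt(v @ v)          # b^2 = v^T v
    if b > 0.0:
        v /= b                  # V(j:n, j) = V(j:n, j) / b
    return v, a

x = np.array([3.0, 4.0, 0.0])
v, a = householder_vector(x)
reflected = x - 2.0 * v * (v @ x)   # apply H = I - 2 v v^T
# reflected is [5.0, 0.0, 0.0]: x is mapped onto a * e_1 with a = ||x|| = 5
```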
Beneficial effects: compared with the prior art, the invention has the following benefits. First, the invention performs symbolic QR decomposition of the power flow Jacobian matrix J on the CPU, and the sparse format of R makes it possible to avoid unnecessary floating-point computation. Second, the columns of J are assigned, according to the sparse structure of R, to levels that can be computed in parallel, and the layering result is transferred to the GPU. Furthermore, the level-wise QR decomposition kernel SparseQR is computed on the GPU in order of increasing level, and dot products are handled inside the GPU threads with a reduction algorithm, improving computational efficiency. Finally, by combining CPU control of the program flow and preparation of basic data with GPU processing of the intensive floating-point operations, the invention improves the efficiency of QR decomposition of the power flow Jacobian matrix and solves the problem of time-consuming power flow calculation in static security analysis of power systems.
Detailed description of the invention:
Fig. 1 shows the computation times of the example of the invention;
Fig. 2 shows the layering result of the example of the invention;
Fig. 3 is a flow diagram of the invention.
Specific embodiment:
As shown in Fig. 3, the GPU-accelerated QR decomposition method for the power flow Jacobian matrix of the present invention comprises:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparse structures of the Householder transformation matrix V and the upper triangular matrix R, the sparse structure of J after symbolic decomposition being equal to V + R; and partitioning the columns of J into parallelizable levels according to the sparse structure of R;
(2) computing the level-wise QR decomposition kernel SparseQR on the GPU in order of increasing level.
One. Symbolic QR decomposition of the power flow Jacobian matrix J on the CPU
First, symbolic QR decomposition of the Jacobian matrix J is performed on the CPU to obtain the sparse structures of the Householder transformation matrix V and the upper triangular matrix R; the sparse structure of J after symbolic decomposition equals V + R. Then the parallelization layering groups the n columns of matrix J into MaxLevel levels, and the columns belonging to the same level are QR-decomposed in parallel; the number of columns in each level is Levelnum(k), where k is the level number, and the indices of all columns in level k are stored in the mapping table Mapk. Finally, the CPU transfers the data required for GPU computation to the GPU; the data comprise the Jacobian matrix J, the dimension n, the upper triangular matrix R, the number of levels MaxLevel, the per-level column count Levelnum, and the mapping table Map.
For the principles of symbolic QR decomposition and parallelization layering, see "Direct Methods for Sparse Linear Systems", Timothy A. Davis, SIAM, Philadelphia, 2006. The symbolic QR decomposition and parallelization layering routines used in this patent follow CSparse: A Concise Sparse Matrix Package, version 3.1.4, Copyright (c) 2006-2014, Timothy A. Davis, Oct 10, 2014. For the theory of the Householder transformation, see: Hu Bingxin, Li Ning, Lv Jun. A complex QR decomposition algorithm using recursively implemented Householder transformations [J]. Journal of System Simulation, 2004, 16(11): 2432-2434.
Two. Launching the level-wise QR decomposition kernel SparseQR on the GPU in order of increasing level
The level-wise QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>; its thread block size Nthreads is fixed at 128. When computing level k, the number of thread blocks is Nblocks = Levelnum(k), so the total number of threads is Nblocks × Nthreads. In order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k.
The computation flow of SparseQR<Levelnum(k), Nthreads> is:
(1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2) blockID and threadID are assigned to variables bid and t; thereafter bid and t index the t-th thread in the bid-th thread block;
(3) the bid-th thread block is responsible for QR-decomposing column j = Mapk(bid) of the Jacobian matrix J;
(4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
First, the intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of column i of the Householder transformation matrix V, and J(i:n, j) is the vector formed by elements i through n of column j of the Jacobian matrix J; then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i).
The intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed by the following specific steps:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
Column j of the Jacobian matrix J is then updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i); the specific steps are:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128; return to 1).
(5) The Householder transformation vector of column j is computed:
First, the variable a^2 = J(j:n, j)^T J(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a^2 = Cache(0);
Next, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128; return to 1);
Then the variable b^2 = V(j:n, j)^T V(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b^2 = Cache(0);
Finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128; return to 1).
(6) Column j of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
The GPU computing platform used in the present invention is equipped with a Tesla K20C GPU card and an Intel Xeon E5-2620 CPU; the peak memory bandwidth of the GPU is 208 GB/s, its single-precision floating-point peak is 3.52 Tflops, and the CPU clock is 2 GHz. The CPU computing platform is equipped with an Intel Core i7-3520M 2.90 GHz CPU. A 16557-dimensional sparse Jacobian matrix example was tested on the CPU and GPU computing platforms; Fig. 1 shows the computation times of the 16557-dimensional sparse Jacobian matrix on the different computing libraries. On the GPU, the QR algorithm with the two-level parallel strategy solves in 67 ms, which lags behind the factorization time of KLU and is slightly faster than the computation time of UMFPACK. The main reason the GPU-accelerated algorithm lags behind KLU is that the number of columns per level decreases rapidly as the level number increases: level 1 contains 1642 columns, and each of the first 30 levels contains more than 100 columns, but from there up to level 207 only 1 to 2 columns remain per level. The front levels offer many columns for parallel computation, while the columns in the rear levels must execute serially, so the computing power of the GPU cannot be fully exploited. Fig. 2 shows the degree-of-parallelism curve of the 16557-dimensional sparse Jacobian matrix (the number of columns per level of the elimination tree, with the vertical axis on a base-10 logarithmic scale).

Claims (3)

1. A GPU-accelerated QR decomposition method for the power flow Jacobian matrix, characterized in that the method comprises:
(1) performing symbolic QR decomposition of the Jacobian matrix J on the CPU to obtain the sparse structures of the Householder transformation matrix V and the upper triangular matrix R; and partitioning the columns of J into parallelizable levels according to the sparse structure of R, wherein the parallelization layering groups the n columns of matrix J into MaxLevel levels, the columns belonging to the same level are QR-decomposed in parallel, the number of columns in each level is Levelnum(k), k being the level number, and the indices of all columns in level k are stored in the mapping table Mapk;
(2) computing the level-wise QR decomposition kernel SparseQR on the GPU in order of increasing level; the level-wise QR decomposition kernel is defined as SparseQR<Nblocks, Nthreads>, with the thread block size Nthreads fixed at 128; when computing level k, the number of thread blocks is Nblocks = Levelnum(k) and the total number of threads is Nblocks × Nthreads; in order of increasing level, the kernel SparseQR<Levelnum(k), Nthreads> is called to decompose all columns belonging to level k; the computation flow of SparseQR<Levelnum(k), Nthreads> is:
(2.1) CUDA automatically assigns each thread a thread block index blockID and a within-block thread index threadID;
(2.2) blockID and threadID are assigned to variables bid and t; thereafter bid and t index the t-th thread in the bid-th thread block;
(2.3) the bid-th thread block is responsible for QR-decomposing column j = Mapk(bid) of the Jacobian matrix J;
(2.4) in the bid-th thread block, the variable i is incremented from 1 to j-1; whenever R(i, j) ≠ 0, the following is computed:
first, the intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed, where V(i:n, i) is the vector formed by elements i through n of column i of the Householder transformation matrix V and J(i:n, j) is the vector formed by elements i through n of column j of the Jacobian matrix J; then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i);
(2.5) the Householder transformation vector of column j is computed;
(2.6) column j of matrix J is updated: J(j, j) = a, J(j+1:n, j) = 0.
2. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that in step (2.4):
first, the intermediate variable β = 2V(i:n, i)^T J(i:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += 2V(i+t, i) × J(i+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) β = Cache(0);
then column j of the Jacobian matrix J is updated by the formula J(i:n, j) = J(i:n, j) - βV(i:n, i); the specific steps are:
1) check whether the thread index t is less than n-i; if not, the thread terminates;
2) J(t+i, j) = J(t+i, j) - βV(t+i, i);
3) t = t + 128; return to 1).
3. The GPU-accelerated QR decomposition method for the power flow Jacobian matrix according to claim 1, characterized in that in step (2.5):
first, the variable a^2 = J(j:n, j)^T J(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += J(j+t, j) × J(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) a^2 = Cache(0);
next, V(j:n, j) = J(j:n, j) - a·ej(j:n) is computed, where ej is the unit vector whose j-th element is 1; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = J(t+j, j) - a·ej(t+j);
3) t = t + 128; return to 1);
then the variable b^2 = V(j:n, j)^T V(j:n, j) is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) accumulate into the in-thread variable Temp: Temp += V(j+t, j) × V(j+t, j);
3) t = t + 128; return to 1);
4) assign the value of Temp to the shared-memory variable Cache[threadID];
5) synchronize the threads in the thread block;
6) set the variable M = 128/2;
7) if M is not equal to 0, execute 8); otherwise the computation ends: jump to 12);
8) if threadID is less than M, execute Cache[threadID] += Cache[threadID + M]; otherwise the thread does nothing;
9) synchronize the threads in the thread block;
10) M /= 2;
11) return to 7);
12) b^2 = Cache(0);
finally, V(j:n, j) = V(j:n, j)/b is computed; the specific steps are:
1) check whether the thread index t is less than n-j; if not, the thread terminates;
2) V(t+j, j) = V(t+j, j)/b;
3) t = t + 128; return to 1).
CN201610592223.0A 2016-07-26 2016-07-26 GPU-accelerated QR decomposition method for the power flow Jacobian matrix Expired - Fee Related CN106026107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592223.0A CN106026107B (en) 2016-07-26 2016-07-26 GPU-accelerated QR decomposition method for the power flow Jacobian matrix


Publications (2)

Publication Number Publication Date
CN106026107A CN106026107A (en) 2016-10-12
CN106026107B true CN106026107B (en) 2019-01-29

Family

ID=57113899


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368368A (en) * 2017-06-22 2017-11-21 东南大学 A kind of GPU of the sparse upper trigonometric equation group of a large amount of isomorphisms accelerates back substitution method
CN107368454A (en) * 2017-06-22 2017-11-21 东南大学 A kind of GPU of the sparse lower trigonometric equation group of a large amount of isomorphisms pushes away method before accelerating
CN110718919B (en) * 2019-09-25 2021-06-01 北京交通大学 GPU acceleration-based large power grid static safety analysis fault screening method
CN111313413A (en) * 2020-03-20 2020-06-19 国网浙江省电力公司 A Power System State Estimation Method Based on Graphics Processor Parallel Acceleration
CN111416441B (en) * 2020-04-09 2021-08-10 东南大学 Power grid topology analysis method based on GPU hierarchical acceleration

Citations (4)

Publication number Priority date Publication date Assignee Title
CN103617150A (en) * 2013-11-19 2014-03-05 国家电网公司 GPU (graphic processing unit) based parallel power flow calculation system and method for large-scale power system
CN103793590A (en) * 2012-11-01 2014-05-14 同济大学 GPU-based computation method for quickly solving power flow in distribution networks
CN104158182A (en) * 2014-08-18 2014-11-19 国家电网公司 Large-scale power grid flow correction equation parallel solving method
CN104484234A (en) * 2014-11-21 2015-04-01 中国电力科学研究院 Multi-front load flow calculation method and system based on GPU (graphics processing unit)

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8364739B2 (en) * 2009-09-30 2013-01-29 International Business Machines Corporation Sparse matrix-vector multiplication on graphics processor units



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210009 No. 87 Dingjiaqiao, Gulou District, Nanjing City, Jiangsu Province
Applicant after: SOUTHEAST University
Address before: No. 2, four archway in Xuanwu District, Nanjing, Jiangsu
Applicant before: Southeast University

GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190129