WO2018133348A1 - A static security analysis calculation method, apparatus and computer storage medium - Google Patents

A static security analysis calculation method, apparatus and computer storage medium


Publication number
WO2018133348A1
Authority
WO
WIPO (PCT)
Prior art keywords
gpu
cpu
parallel
breaking
calculation
Prior art date
Application number
PCT/CN2017/093226
Other languages
English (en)
French (fr)
Inventor
陆娟娟
温柏坚
王毅
陆进军
郭文鑫
闪鑫
彭龙
查国强
卢建刚
徐展强
王彬
Original Assignee
国电南瑞科技股份有限公司
广东电网有限责任公司电力调度控制中心
国电南瑞南京控制系统有限公司
Priority date
Filing date
Publication date
Application filed by 国电南瑞科技股份有限公司, 广东电网有限责任公司电力调度控制中心, 国电南瑞南京控制系统有限公司
Publication of WO2018133348A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/167Interprocessor communication using a common memory, e.g. mailbox
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Definitions

  • The invention relates to the technical field of automatic power system dispatching, and in particular to a static security analysis calculation method, apparatus and computer storage medium in CPU+multi-GPU heterogeneous mode.
  • It has been found that the anticipated faults in static security analysis are mutually decoupled and independent, so the problem can be decomposed into multiple independent full power flow calculations. This inherent suitability for parallelization has gradually attracted the attention of researchers, who have begun to use parallel algorithms to seek performance breakthroughs.
  • CUDA (Compute Unified Device Architecture): NVIDIA's parallel computing platform
  • GPU (Graphics Processing Unit)
  • At the same time, in the CPU field, OpenMP multi-threading technology for shared-memory parallel systems allows the compiler to automatically parallelize loops, which can effectively improve the performance of applications on multi-processor systems.
  • Embodiments of the present invention provide a CPU+multi-GPU heterogeneous mode static security analysis calculation method, apparatus, and computer storage medium, to meet the need in practical engineering applications for fast scanning in large-grid static security analysis.
  • On the CUDA platform, OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread uniquely corresponding to a single GPU. Based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is built that cooperates to complete the parallel computation of anticipated faults.
  • The power flow iterations of the multiple outages are highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated fault scanning in static security analysis, providing powerful technical support for online security analysis and early-warning scanning of interconnected large-grid integrated dispatching systems.
  • An embodiment of the present invention provides a CPU+multi-GPU heterogeneous mode static security analysis calculation method. On the Compute Unified Device Architecture (CUDA) platform, OpenMP multi-threading technology for shared-memory parallel systems is used; the GPU configuration and the anticipated fault calculation requirements are considered together to determine the number of CPU threads and build the CPU+multi-GPU heterogeneous mode. Within each GPU card, parallel scan tasks are completed cooperatively in CPU+single-GPU mode. On the basis of single anticipated-fault power flow calculation, the iterative processes of multiple outage power flows run highly synchronously in parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated fault scanning.
  • The CPU+multi-GPU heterogeneous mode static security analysis calculation method specifically includes the following steps:
  • Step 1) Obtain a real-time state estimation section, perform a base-case power flow calculation based on the Newton-Raphson algorithm, and provide a data section for reuse and sharing;
  • Step 2) Perform topology scanning of the entire network's devices according to user requirements, form an anticipated fault set, perform a deep topology search for each anticipated fault, and form the actual outage branch-node information;
  • Step 3) Initialize the CUDA architecture, package the data according to the multi-outage parallel computing requirement, and allocate GPU memory space;
  • Step 4) According to the GPU configuration and the actual calculation load, rationally allocate the outage computing resources, evaluate the number of GPUs that need to be enabled, and use OpenMP to spawn the corresponding number of threads to maximize GPU parallel computing capability;
  • Step 5) Under the coordination of the CPU, the power flow iteration process, including admittance matrix correction, Jacobian matrix solution, modified equation solution, state quantity update, branch power flow calculation, and device limit check, is completed in parallel by the GPU. All outage fault iterations are highly parallel; corresponding kernel functions are constructed according to the parallel characteristics of each computation step, completing element-level parallel computing tasks at fine granularity during the iteration;
  • Step 6) Judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate the remaining outage faults; if all are completed, proceed to step 7);
  • Step 7) Display the results: according to the scan results, show the device or section limit-violation and overload information caused by each fault outage, and compile scan result statistics according to the practical requirements of the dispatching system's static security analysis module.
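The outer scan flow of steps 4) to 7) can be sketched as a round-based dispatch loop. This is an illustrative sketch only; the helper names (run_scan, scan_on_gpus) are hypothetical and not from the patent.

```python
import math

def run_scan(total_outages, gpu_total, per_gpu_capacity, scan_on_gpus):
    """Dispatch outage power-flow scans in rounds of at most
    n * per_gpu_capacity outages, where n <= gpu_total (steps 4-6)."""
    remaining = total_outages
    results = []
    while remaining > 0:
        # Step 4: evaluate the GPUs needed this round (formula of step (41)).
        n = min(math.ceil(remaining / per_gpu_capacity), gpu_total)
        batch = min(remaining, n * per_gpu_capacity)
        # Step 5: one CPU thread per GPU drives the parallel power flows.
        results.extend(scan_on_gpus(n, batch))
        remaining -= batch  # Step 6: re-allocate whatever is left.
    return results
```

Each round enables at most n GPUs and consumes at most n × M_max outages, matching the re-allocation loop of step 6).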
  • Step 4) specifically includes:
  • Step (41) According to the outage calculation requirement and the GPU configuration, optimize the allocation of outage computing resources and evaluate the maximum number of outages a single GPU can scan in one pass, M_max = ⌊S_max / S⌋, where:
  • S_max is the total memory space of the GPU card;
  • S is the memory size required for a single power flow calculation;
  • M_max is the maximum number of outages that a single GPU card can calculate;
  • The number of GPU cards that need to be started is calculated as n = min(⌈M_cal / M_max⌉, N), where:
  • ⌈·⌉ denotes rounding up;
  • n is the number of GPU cards that need to be activated for this round;
  • N is the total number of GPU cards configured in the system;
  • M_cal is the total number of outages required for this round of static security analysis;
  • M_max × N is the single-pass multi-GPU outage capacity;
  • Step (42) Based on the shared, multiplexed base-case section model data and OpenMP multi-threading technology, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards, each CPU thread uniquely corresponding to a single GPU, to construct the CPU+multi-GPU heterogeneous mode; within a single GPU card, parallel computing proceeds in CPU+single-GPU heterogeneous mode.
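A numeric illustration of the allocation rules of steps (41) and (42), assuming M_max = ⌊S_max / S⌋ and n = min(⌈M_cal / M_max⌉, N); the concrete memory figures below are hypothetical.

```python
import math

def max_outages_per_gpu(s_max, s_per_flow):
    """M_max = floor(S_max / S): outages one GPU card can hold at once."""
    return s_max // s_per_flow

def gpus_to_start(m_cal, m_max, n_total):
    """n = min(ceil(M_cal / M_max), N): cards to activate this round."""
    return min(math.ceil(m_cal / m_max), n_total)

# Example: a 16 GB card where one power flow needs 64 MB (sizes in MB).
m_max = max_outages_per_gpu(16 * 1024, 64)   # 256 outages per card
n = gpus_to_start(m_cal=1000, m_max=m_max, n_total=8)   # 4 cards
```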
  • Step 5) comprises the following steps:
  • Step (51) Based on the base-case admittance matrix Y0 and the N-1 outage branch-node information, define a kernel function fun_kernel_1, simultaneously perform partial correction of the node admittance matrix for the outage faults assigned to the card, and modify the node injection information, simulating device outages through changes in the admittance matrix and node information;
  • Step (52) Form the outage admittance matrix Ym according to step (51) and complete the parallel computation of the Jacobian matrix elements.
  • The power flow calculation uses the following formula for each element in the Jacobian block matrix:
  • G_ii, B_ii, G_ij, B_ij are the non-zero elements of the admittance matrix; θ_ij is the phase angle difference between nodes i and j; P_i, Q_i are the injected powers of node i; V_i, V_j are the voltage magnitudes of nodes i and j;
  • Each element of the block matrix is obtained from the state-quantity node voltage phase angles and the admittance elements through the four basic arithmetic operations and trigonometric functions; the solution processes do not affect each other and have obvious parallel characteristics;
  • h_nozero is the number of non-zero elements of the H matrix, and m is the number of outages;
  • The GPU has multiple Streaming Multiprocessors (SMs) that provide multi-thread execution.
  • The kernel function fun_kernel_2_H completes the vector multiplication and vector addition operations of equation (5): the CPU calls fun_kernel_2_H, and the GPU launches h threads according to the function parameters passed in, completing the parallel evaluation of equation (5);
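Equation (5) itself is not reproduced in this text, so the sketch below assumes the standard polar-coordinate Newton-Raphson off-diagonal H element, H_ij = V_i·V_j·(G_ij·sin θ_ij − B_ij·cos θ_ij). Each tuple is evaluated independently, mirroring the one-thread-per-non-zero-element mapping of fun_kernel_2_H.

```python
import math

def h_elements(entries):
    """entries: list of (G_ij, B_ij, V_i, V_j, theta_ij) tuples, one per
    non-zero H element. Each evaluation touches only its own tuple, so on
    a GPU every element maps to its own thread."""
    return [v_i * v_j * (g * math.sin(th) - b * math.cos(th))
            for g, b, v_i, v_j, th in entries]
```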
  • Step (53) According to the outage admittance matrix Ym and the initial node injections formed in step (51), calculate the node injection power residuals according to equation (10):
  • P_is and Q_is are respectively the active and reactive injections of node i.
  • The injection residual of each node is obtained from the state-quantity node voltage phase angles and the admittance elements through the four basic arithmetic operations and trigonometric functions, independently of the residual solution of every other node. Define the fun_kernel_3 kernel function and enable multiple threads on the streaming multiprocessors (SMs) to complete in parallel the power mismatch computations of all nodes for the m outage faults.
  • The detailed parallel computing process is similar to step (52);
  • Step (54) Check whether the power residuals satisfy the convergence criterion; if so, jump to step (58); if not, proceed to step (55) for further iteration;
  • ΔP_t, ΔQ_t are the power mismatches of iteration t, and T is the maximum number of iterations;
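Equation (11) is not reproduced in this text; a common form of the convergence test of step (54), assumed here for illustration, stops when the largest power mismatch falls below a tolerance ε or the iteration cap T is reached.

```python
def converged(dp, dq, eps=1e-6):
    """True when every active (dp) and reactive (dq) mismatch is below eps."""
    return max(max(abs(x) for x in dp), max(abs(x) for x in dq)) < eps

def should_stop(dp, dq, t, max_iter, eps=1e-6):
    """Stop on convergence, or when iteration count t reaches T."""
    return converged(dp, dq, eps) or t >= max_iter
```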
  • Step (55) Perform LU decomposition on the Jacobian matrix formed in step (52) and solve the linear equations for the node power residuals calculated in step (53); define the fun_kernel_5 kernel function for task-level parallel solution of the linear equation systems;
  • Step (56) Use the linear equation solution obtained in step (55) to update the initial state vector according to equation (12), where:
  • n is the number of system nodes
  • r is the number of PV nodes
  • In the iterative process each node's state is updated only by its own increment and does not depend on any other node's computed value, a naturally parallel property, so the node voltage update task can be implemented in parallel by GPU multi-threading;
  • Formula (12) is defined as a kernel function fun_kernel_6. For m anticipated-fault parallel tasks, a total of m × (2(n-1) - r) threads are required to execute fun_kernel_6 to complete one state quantity update;
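The thread budget of the update step can be checked with a one-line helper: each of the m outage cases updates n−1 voltage angles and (n−1)−r voltage magnitudes (the slack node and the r PV-node magnitudes are fixed), which is consistent with the m × (2(n−1) − r) count above.

```python
def update_threads(m, n, r):
    """Threads needed for one state-quantity update: m outage cases,
    each touching (n-1) angles plus (n-1)-r magnitudes."""
    return m * (2 * (n - 1) - r)
```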
  • Step (57) Jump to step (52) to continue the iteration;
  • Step (59) According to the branch power flow calculation result of step (58), perform limit-violation and overload checks on each branch or stability section, save the violation and overload results caused by all current outage faults, and define the fun_kernel_8 kernel function to complete the branch or stability section check in parallel.
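A minimal sketch of the limit check of step (59). The flat flow/rating arrays are a hypothetical data layout; the patent's fun_kernel_8 would perform the same comparison with one GPU thread per branch.

```python
def overloads(flows, ratings):
    """Indices of branches whose |flow| exceeds the rating. Every
    comparison is independent, so each branch maps to its own thread."""
    return [i for i, (f, lim) in enumerate(zip(flows, ratings))
            if abs(f) > lim]
```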
  • An embodiment of the present invention provides a CPU+multi-GPU heterogeneous mode static security analysis and calculation apparatus, the apparatus including:
  • a memory for storing a computer program;
  • a processor connected to the memory and configured to implement, by executing the computer program, the CPU+multi-GPU heterogeneous mode static security analysis calculation method provided by any one of the foregoing technical solutions.
  • An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the CPU+multi-GPU heterogeneous mode static security analysis calculation method provided by any one of the foregoing technical solutions.
  • With the CPU+multi-GPU heterogeneous mode static security analysis calculation method, apparatus and computer storage medium, on the Compute Unified Device Architecture (CUDA) platform, OpenMP multi-threading technology can allocate the corresponding number of threads according to the system's GPU configuration and computing requirements, building a CPU+multi-GPU heterogeneous computing mode that cooperates to complete the parallel computation of anticipated faults.
  • On the basis of single anticipated-fault power flow calculation, the iterative processes of multiple outage power flows run highly synchronously in parallel, effectively improving the parallel processing capability of anticipated fault scanning in static security analysis. The method is suitable for integrated large-scale dispatching systems and has significant application value in improving the scanning and calculation efficiency of static security analysis.
  • FIG. 1 is a flowchart of a CPU+multi-GPU heterogeneous mode calculation according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a CPU+multi-GPU heterogeneous mode according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the parallel calculation of the Jacobian block matrix H elements according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a CPU+multi-GPU heterogeneous mode static security analysis and calculation device according to an embodiment of the present invention.
  • Each thread uniquely corresponds to a single GPU; based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed and coordinated to complete anticipated-fault parallel computing.
  • The embodiment of the invention provides a CPU+multi-GPU heterogeneous mode static security analysis and calculation method.
  • OpenMP multi-threading technology based on the shared-memory parallel system is adopted, and the GPU configuration and the anticipated fault calculation requirements are considered together.
  • On the basis of single anticipated-fault power flow calculation, the iterative processes of multiple outage power flows run highly synchronously in parallel.
  • This improves the parallel processing capability of anticipated fault scanning in static security analysis.
  • The above-mentioned CPU+multi-GPU heterogeneous mode static security analysis calculation method includes the following steps, as shown in FIG. 1:
  • Step 1) Obtain a real-time state estimation section, perform a base-case power flow calculation based on the Newton-Raphson algorithm, and provide a data section for reuse and sharing;
  • Step 2) Perform topology scanning of the entire network's devices according to user requirements, form an anticipated fault set, perform a deep topology search for each anticipated fault, and form the actual outage branch-node information;
  • Step 3) Initialize the CUDA architecture, package the data according to the multi-outage parallel computing requirement, and allocate GPU memory space;
  • Step 4) According to the GPU configuration and the actual calculation load, rationally allocate the outage computing resources, evaluate the number of GPUs that need to be enabled, and use OpenMP to spawn the corresponding number of threads to maximize GPU parallel computing capability.
  • OpenMP is a set of compiler-directive-based schemes for multithreaded programming of shared-memory parallel systems;
  • Step 5) Under the coordination of the CPU, the power flow iteration process, including admittance matrix correction, Jacobian matrix solution, modified equation solution, state quantity update, branch power flow calculation, and device limit check, is completed in parallel by the GPU. All outage fault iterations are highly parallel; corresponding kernel functions are constructed according to the parallel characteristics of each computation step, completing element-level parallel computing tasks at fine granularity during the iteration;
  • Step 6) Judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate the remaining outage faults; if all are completed, proceed to step 7);
  • Step 7) Display the results: according to the scan results, show the device or section limit-violation and overload information caused by each fault outage, and compile scan result statistics according to the practical requirements of the dispatching system's static security analysis module.
  • Step 4) specifically includes:
  • Step (41) According to the outage calculation requirement and the GPU configuration, optimize the allocation of outage computing resources and evaluate the maximum number of outages a single GPU can scan in one pass, M_max = ⌊S_max / S⌋, where:
  • S_max is the total memory space of the GPU card;
  • S is the memory size required for a single power flow calculation;
  • M_max is the maximum number of outages that a single GPU card can calculate;
  • The number of GPU cards that need to be started is calculated as n = min(⌈M_cal / M_max⌉, N), where:
  • ⌈·⌉ denotes rounding up;
  • n is the number of GPU cards that need to be activated for this round;
  • N is the total number of GPU cards configured in the system;
  • M_cal is the total number of outages required for this round of static security analysis;
  • M_max × N is the single-pass multi-GPU outage capacity;
  • Step (42) Based on the shared, multiplexed base-case section model data and OpenMP multi-threading technology, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards, each CPU thread uniquely corresponding to a single GPU, to construct the CPU+multi-GPU heterogeneous mode shown in FIG. 2; within a single GPU card, parallel operations proceed in CPU+single-GPU heterogeneous mode.
  • Step 5) includes the following steps:
  • Step (51) Based on the base-case admittance matrix Y0 and the N-1 outage branch-node information, define a kernel function fun_kernel_1, simultaneously perform partial correction of the node admittance matrix for the outage faults assigned to the card, and modify the node injection information, simulating device outages through changes in the admittance matrix and node information;
  • Step (52) Form the outage admittance matrix Ym according to step (51) and complete the parallel computation of the Jacobian matrix elements.
  • The power flow calculation uses the following formula for each element in the Jacobian block matrix:
  • G_ii, B_ii, G_ij, B_ij are the non-zero elements of the admittance matrix; θ_ij is the phase angle difference between nodes i and j; P_i, Q_i are the injected powers of node i; V_i, V_j are the voltage magnitudes of nodes i and j;
  • Each element of the block matrix is obtained from the state-quantity node voltage phase angles and the admittance elements through the four basic arithmetic operations and trigonometric functions; the solution processes do not affect each other and have obvious parallel characteristics;
  • h_nozero is the number of non-zero elements of the H matrix, and m is the number of outages;
  • The GPU has multiple Streaming Multiprocessors (SMs) that provide multi-thread execution.
  • The kernel function fun_kernel_2_H completes the vector multiplication and vector addition operations of equation (5); the H-matrix element multi-thread parallel computation is shown in FIG. 3.
  • The CPU calls fun_kernel_2_H, and the GPU launches h threads according to the function parameters passed in, each thread corresponding to a non-zero element of the H matrix; concurrent execution according to equation (5) completes the generation of the Jacobian matrix elements;
  • Step (53) According to the outage admittance matrix Ym and the initial node injections formed in step (51), calculate the node injection power residuals according to equation (10):
  • P_is and Q_is are respectively the active and reactive injections of node i.
  • The injection residual of each node is obtained from the state-quantity node voltage phase angles and the admittance elements through the four basic arithmetic operations and trigonometric functions, independently of the residual solution of every other node. Define the fun_kernel_3 kernel function and enable multiple threads on the streaming multiprocessors (SMs) to complete in parallel the power mismatch computations of all nodes for the m outage faults.
  • The detailed parallel computing process is similar to step (52);
  • Step (54) Check whether the power residuals satisfy the convergence criterion; if so, jump to step (58); if not, proceed to step (55) for further iteration;
  • ΔP_t, ΔQ_t are the power mismatches of iteration t, and T is the maximum number of iterations;
  • Step (55) Perform LU decomposition on the Jacobian matrix formed in step (52) and solve the linear equations for the node power residuals calculated in step (53); define the fun_kernel_5 kernel function for task-level parallel solution of the linear equation systems;
  • Step (56) Use the linear equation solution obtained in step (55) to update the initial state vector according to equation (12), where:
  • n is the number of system nodes;
  • r is the number of PV nodes;
  • In the iterative process each node's state is updated only by its own increment and does not depend on any other node's computed value, a naturally parallel property, so the node voltage update task can be implemented in parallel by GPU multi-threading;
  • Formula (12) is defined as a kernel function fun_kernel_6. For m anticipated-fault parallel tasks, a total of m × (2(n-1) - r) threads are required to execute fun_kernel_6 to complete one state quantity update;
  • Step (57) Jump to step (52) to continue the iteration;
  • Step (59) According to the branch power flow calculation result of step (58), perform limit-violation and overload checks on each branch or stability section, save the violation and overload results caused by all current outage faults, define the fun_kernel_8 kernel function, and complete the branch or stability section check in parallel.
  • FIG. 4 is a schematic structural diagram of a CPU+multi-GPU heterogeneous mode static security analysis and calculation device according to an embodiment of the present invention.
  • The apparatus includes:
  • a memory 10 for storing a computer program;
  • a processor 20 connected to the memory 10 and configured to implement, by executing the computer program, the CPU+multi-GPU heterogeneous mode static security analysis calculation method provided by any one of the foregoing technical solutions.
  • The processor 20 is configured to perform the following steps:
  • Step 1) Obtain a real-time state estimation section, perform a base-case power flow calculation based on the Newton-Raphson algorithm, and provide a data section for reuse and sharing;
  • Step 2) Perform topology scanning of the entire network's devices according to user requirements, form an anticipated fault set, perform a deep topology search for each anticipated fault, and form the actual outage branch-node information;
  • Step 3) Initialize the CUDA architecture, package the data according to the multi-outage parallel computing requirement, and allocate GPU memory space;
  • Step 4) According to the GPU configuration and the actual calculation load, rationally allocate the outage computing resources, evaluate the number of GPUs that need to be enabled, and use OpenMP to spawn the corresponding number of threads to maximize GPU parallel computing capability;
  • Step 5) Under the coordination of the CPU, the power flow iteration process, including admittance matrix correction, Jacobian matrix solution, modified equation solution, state quantity update, branch power flow calculation, and device limit check, is completed in parallel by the GPU. All outage fault iterations are highly parallel; corresponding kernel functions are constructed according to the parallel characteristics of each computation step, completing element-level parallel computing tasks at fine granularity during the iteration;
  • Step 6) Judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate the remaining outage faults; if all are completed, proceed to step 7);
  • Step 7) Display the results: according to the scan results, show the device or section limit-violation and overload information caused by each fault outage, and compile scan result statistics according to the practical requirements of the dispatching system's static security analysis module.
  • The step 4) performed by the processor 20 specifically includes:
  • Step (41) According to the outage calculation requirement and the GPU configuration, optimize the allocation of outage computing resources and evaluate the maximum number of outages a single GPU can scan in one pass, M_max = ⌊S_max / S⌋, where:
  • S_max is the total memory space of the GPU card;
  • S is the memory size required for a single power flow calculation;
  • M_max is the maximum number of outages that a single GPU card can calculate;
  • The number of GPU cards that need to be started is calculated as n = min(⌈M_cal / M_max⌉, N), where:
  • ⌈·⌉ denotes rounding up;
  • n is the number of GPU cards that need to be activated for this round;
  • N is the total number of GPU cards configured in the system;
  • M_cal is the total number of outages required for this round of static security analysis;
  • M_max × N is the single-pass multi-GPU outage capacity;
  • Step (42) Based on the shared, multiplexed base-case section model data and OpenMP multi-threading technology, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards, each CPU thread uniquely corresponding to a single GPU, to construct the CPU+multi-GPU heterogeneous mode; within a single GPU card, parallel computing proceeds in CPU+single-GPU heterogeneous mode.
  • the performing, by the processor 20, the step 5) specifically includes:
  • Step (51) based on the ground state admittance Y0, according to the N-1 breaking branch-node information, define a kernel function fun_ker nel_1, and perform partial correction of the node admittance array simultaneously for the breaking fault assigned to the card. And modify the node injection information to simulate device breaking through changes in the admittance array and node information;
  • Step (52) forming a breaking admittance Ym according to the step (51), and completing the parallel computing task of the Jacobian matrix element.
  • the current calculation formula uses the following formula for each element in the Jacobian block matrix:
  • G ii , B ii , G ij , B ij are admittance matrix non-zero elements, ⁇ ij is i, j node phase angle difference, P i , Q i are i node injection power, V i , V j respectively Is the i, j node voltage amplitude;
  • Each element of the block matrix is the voltage phase angle of the state quantity node, the four basic operations of the admittance element and the trigonometric function, and the solution process does not affect each other, and has obvious parallel characteristics;
  • h nozero is the non-zero number of the H array, and m is the number of breaks;
  • the GPU has multiple Stream Processors (SMs) that provide multiple thread calls.
  • SMs Stream Processors
  • the kernel function fun_ker nel_2_H completes the vector multiplication and vector addition operations according to equation (5).
  • the CPU calls the fun_ker nel_2_H function, and the GPU transmits according to fun_ker nel_2_H.
  • Into the function parameter enables h GPU multi-threading, while completing the parallel operation of equation (5);
  • Step (53) calculating the node injection power residual according to equation (10) according to the breaking admittance Ym and the initial node injection amount formed in step (51):
  • P is and Q is respectively the active and reactive injection quantities of node i.
  • the residuals of the injection quantities of each node are the voltage phase angle of the state quantity node, the four elements of the admittance element and the trigonometric function.
  • the basic operation independent of other node power residual solving processes, defines the fun_ker nel_3 kernel function, enables multiple threads in the stream processor (SM), and completes the parallel computing tasks of all the power imbalances of the computing nodes of the m breaking faults.
  • the detailed parallel computing process is similar to step (52);
  • Step (54) checking whether the power residual satisfies the convergence judgment basis, if yes, skips to step (58), and if not, proceeds to step (55) to iterate;
  • Step (55): perform LU decomposition on the Jacobian matrix formed in step (52) and solve the linear equation system with the node power residuals computed in step (53); the kernel function fun_kernel_5 is defined to solve the linear equation systems with task-level parallelism;
  • Step (56): an addition-operation kernel function fun_kernel_6 is defined; for m anticipated-fault parallel tasks, a total of m×(2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state-quantity update;
  • Step (57): jump to step (52) to continue the iteration;
  • Step (58): perform branch power-flow calculation on the currently converged outage power-flow results; since each branch flow depends only on the branch parameters and the node voltages at its two ends, the kernel function fun_kernel_7 is defined to complete the parallel branch power-flow calculation;
  • Step (59): according to the branch power-flow results of step (58), perform overload and limit-violation checks on each branch or stability section, save the violation results caused by all current outage faults, and define the kernel function fun_kernel_8 to complete the parallel overload/limit-violation check of branches and stability sections.
  • step (54) of step 5) is specifically: the kernel function fun_kernel_4 is defined, and the GPU checks the power residual of each outage for convergence according to equation (11); if the convergence condition is satisfied, the power flow of that outage fault is recorded as converged;
  • ε is the power convergence criterion, ΔPt and ΔQt are the power deviations at iteration t, and T is the maximum number of iterations;
  • step (56) of step 5) defines the addition-operation kernel function fun_kernel_6, where:
  • n is the number of system nodes;
  • r is the number of PV nodes.
  • the memory 10 in this embodiment may be a storage structure including various storage media, for example random-access media, read-only media, or flash memory, and may be used for non-transitory storage of the computer program.
  • the processor 20 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), or a Field-Programmable Gate Array (FPGA), and the above operations can be implemented by executing the computer program stored in the memory 10.
  • the processor 20 and the memory 10 may be connected through a bus interface 30 (not shown in FIG. 4).
  • the bus interface 30 may include an Inter-Integrated Circuit (IIC) bus.
  • this embodiment further provides a storage medium storing a computer program which, when executed by a processor, implements the CPU+multi-GPU heterogeneous-mode static security analysis calculation method of any one of the foregoing embodiments.
  • the computer storage medium may be any of various types of storage media, preferably a non-transitory storage medium in this embodiment.
  • the disclosed device and method may be implemented in other manners; the device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division; in actual implementation there may be other divisions: multiple units or components may be combined or integrated into another system, and some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
  • the units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in the embodiments of the present invention may be integrated into one processing module, each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the foregoing storage medium includes: a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
  • the technical solution of the embodiments of the present invention addresses the requirement for fast static security analysis scanning of large power grids in practical engineering applications: OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread corresponding uniquely to a single GPU; based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed to complete parallel computation of anticipated faults cooperatively; the iterative processes of multiple outage power flows are made highly synchronous and parallel, greatly improving the parallel processing capability of anticipated-fault scanning in static security analysis and providing strong technical support for on-line security analysis and early-warning scanning in the integrated dispatching systems of large interconnected power grids.
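The thread-per-GPU pattern described above can be sketched with Python threads standing in for OpenMP threads. This is an illustrative stand-in only: the CUDA device calls are omitted and all function names here are hypothetical, not from the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_on_gpu(gpu_id, contingencies):
    # Stand-in for one CPU thread driving one GPU card: in the real
    # system this thread would select its CUDA device and launch kernels.
    return {gpu_id: [f"fault-{c}-done" for c in contingencies]}

def heterogeneous_scan(batches):
    # One OpenMP-style thread per GPU; batches[i] is assigned to GPU i,
    # so each thread corresponds uniquely to a single device.
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        results = pool.map(scan_on_gpu, range(len(batches)), batches)
    merged = {}
    for r in results:
        merged.update(r)
    return merged
```

With two cards and batches `[[1, 2], [3]]`, GPU 0 scans faults 1 and 2 while GPU 1 scans fault 3 concurrently.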

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Public Health (AREA)
  • Supply And Distribution Of Alternating Current (AREA)

Abstract

A CPU+multi-GPU heterogeneous-mode static security analysis calculation method. Aiming at the requirement for fast static security analysis scanning of large power grids in practical engineering applications, on a Compute Unified Device Architecture (CUDA) platform, according to the GPU configuration of the system and the computational demand, OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread corresponding uniquely to a single GPU. Based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed to complete parallel computation of anticipated faults cooperatively. On the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning, providing strong technical support for on-line security analysis and early-warning scanning in the integrated dispatching systems of large interconnected power grids. A CPU+multi-GPU heterogeneous-mode static security analysis calculation device and a computer storage medium are also disclosed.

Description

Static security analysis calculation method and device, and computer storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 201710037124.0 filed on January 19, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of power system automated dispatching, and in particular to a CPU+multi-GPU heterogeneous-mode static security analysis calculation method and device, and a computer storage medium.
Background
With the rapid development of integrated power system dispatching technology, the computational scale of power grids keeps growing. When static security analysis is performed on a whole-network state-estimation snapshot, the calculation speed can hardly meet the demand. On the one hand, the number of anticipated contingencies requiring outage analysis increases dramatically; even after screening by the DC method, the number of anticipated faults requiring detailed outage scanning remains considerable. On the other hand, each anticipated contingency requires an outage power-flow simulation, and the growing number of computation nodes makes a single power-flow calculation slow. In addition, the expansion of the grid greatly increases the number of devices and stability sections that must be monitored for limit violations, so the computational load keeps growing. The rapid development of modern power systems therefore poses a severe test to the computational performance of the static security analysis application of dispatching systems. Traditional serial algorithms combine DC-method screening with engineering techniques such as partial factorization based on sparse matrices to filter out non-severe faults and effectively speed up a single outage analysis, but they still cannot meet the needs of engineering applications.
It has been found in the solution process that the anticipated faults of static security analysis are mutually decoupled and independent, and can be decomposed into multiple independent full power-flow calculations, giving an inherent advantage for parallel processing. This property has gradually attracted researchers' attention, and attempts have begun to seek breakthroughs with parallel algorithms.
In recent years, NVIDIA introduced the Compute Unified Device Architecture (CUDA), providing a good parallel computing framework for the development of the Graphics Processing Unit (GPU) that is applicable to research fields of power systems with parallel characteristics. Meanwhile, on the CPU side, the OpenMP multi-threading technology for shared-memory parallel systems lets the compiler parallelize loops automatically, effectively improving application performance on multi-processor systems.
In summary, for the fast-scanning application scenario of static security analysis, how to use GPU parallelism and CPU parallelism effectively to meet the ever-increasing demands of engineering sites on the computational scale and efficiency of the static security analysis application of dispatching automation systems is a research direction worth exploring.
Summary
To overcome the above drawbacks of the prior art, embodiments of the present invention are expected to provide a CPU+multi-GPU heterogeneous-mode static security analysis calculation method and device, and a computer storage medium. Aiming at the requirement for fast static security analysis scanning of large power grids in practical engineering applications, on a Compute Unified Device Architecture (CUDA) platform, according to the GPU configuration of the system and the computational demand, OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread corresponding uniquely to a single GPU; based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed to complete parallel computation of anticipated faults cooperatively; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning, providing strong technical support for on-line security analysis and early-warning scanning in the integrated dispatching systems of large interconnected power grids.
To achieve the above objective, the technical solution of the embodiments of the present invention is realized as follows:
In a first aspect, an embodiment of the present invention provides a CPU+multi-GPU heterogeneous-mode static security analysis calculation method, including: on a Compute Unified Device Architecture (CUDA) platform, using OpenMP multi-threading technology based on a shared-memory parallel system, determining the number of CPU threads by jointly considering the GPU configuration and the anticipated-fault computing demand, and constructing a CPU+multi-GPU heterogeneous mode; inside each GPU card, the parallel scanning task is completed cooperatively in CPU+single-GPU mode; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning in static security analysis.
In an embodiment, the CPU+multi-GPU heterogeneous-mode static security analysis calculation method specifically includes the following steps:
Step 1): obtain a real-time state-estimation snapshot and perform base-case power-flow calculation based on the Newton-Raphson algorithm, providing a data snapshot that can be reused and shared;
Step 2): according to user requirements, perform a topology scan of all devices of the network to form the anticipated-fault set, and perform a deep topology search for each anticipated fault to form the actual outage branch-node information;
Step 3): initialize the CUDA architecture, pack the data according to the multi-outage parallel computing demand, and allocate GPU memory space;
Step 4): according to the GPU configuration and the actual computational load, rationally allocate the outage computing resources, estimate the number of GPUs to be enabled, and use OpenMP technology to spawn the corresponding number of threads so as to exploit the GPU parallel computing capability to the greatest extent;
Step 5): under the overall coordination of the CPU, the power-flow iteration process, including admittance-matrix correction, Jacobian matrix solving, correction-equation solving, state-quantity updating, branch power-flow calculation, and device limit-violation checking, is handed over entirely to the GPUs for parallel completion; the iterations of all outage faults are highly parallel, and corresponding kernel functions are constructed according to the parallel characteristics of each computing task to complete the element-level parallel computing tasks of the iteration process at the fine-grained parallel level;
Step 6): judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate resources for the remaining outage faults; if all are complete, proceed to step 7);
Step 7): result presentation: according to the scan results, display the device or section limit-violation and overload information caused by fault outages, and compile scan-result statistics according to the practical requirements of the static security analysis module of the dispatching system.
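The outer loop of steps 4)-6) — allocate a batch, scan it, and repeat for the remaining faults — can be sketched as follows. This is an illustrative Python sketch: the scan itself is reduced to a stand-in, and `batch_cap` stands for the one-pass multi-GPU capacity M_max·N.

```python
def static_security_scan(contingencies, batch_cap):
    # Steps 4)-6): split the anticipated-fault set into rounds of at
    # most batch_cap faults, scan each round, repeat until none remain.
    results = []
    remaining = list(contingencies)
    rounds = 0
    while remaining:
        batch, remaining = remaining[:batch_cap], remaining[batch_cap:]
        results.extend(f"scanned-{c}" for c in batch)  # stand-in for step 5)
        rounds += 1
    return results, rounds
```

For 5 faults and a per-round capacity of 2, three rounds are needed, matching the batched re-allocation of equation (4).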
In an embodiment, step 4) specifically includes:
Step (41): according to the outage computing demand and the GPU configuration, optimally allocate the outage computing resources, and estimate the maximum number of outages a single GPU can scan in one pass:
Figure PCTCN2017093226-appb-000001
where S_max is the total memory space of the GPU card, S is the memory required by a single power-flow calculation, and M_max is the maximum number of outages a single GPU card can compute;
According to equation (1), the number of GPU cards to be started is:
Figure PCTCN2017093226-appb-000002
where α denotes rounding, n is the number of GPU cards to be started in this round, N is the total number of GPU cards configured in the system, M_cal is the total number of outages to be computed in this round of static security analysis, and M_max·N is the one-pass outage computing capacity of the multiple GPUs;
According to equation (2) and the principle of even distribution, the number of outages actually computed by each card is:
Figure PCTCN2017093226-appb-000003
When M_max·N ≤ M_cal, all GPUs participating in the computation cannot complete all outage calculations in one pass; the remaining outages M_cal′ are then re-allocated in a new round according to equations (2)-(3) and computed in multiple batches, where:
M_cal′ = M_cal - M_max·N        (4)
Step (42): on the basis of the shared and reused base-case snapshot model data, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards using OpenMP multi-threading technology, each CPU thread corresponding uniquely to a single GPU, to construct the CPU+multi-GPU heterogeneous mode; within a single GPU card, parallel computation is performed in CPU+single-GPU heterogeneous mode.
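The allocation of equations (1)-(4) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the patent's rounding operator α is not fully specified, so floor is assumed for equation (1) (a partial power flow cannot fit in memory) and ceiling for equations (2)-(3).

```python
import math

def allocate(S_max, S, M_cal, N):
    # Eq (1): per-card outage capacity, memory-limited (floor assumed).
    M_max = S_max // S
    # Eq (2): cards to start this round, capped at the N installed cards.
    n = min(N, math.ceil(M_cal / M_max))
    # Eq (4): outages deferred to later rounds when capacity is exceeded.
    leftover = max(0, M_cal - M_max * N)
    # Eq (3): even split of this round's outages over the n started cards.
    per_card = math.ceil(min(M_cal, M_max * N) / n)
    return M_max, n, per_card, leftover
```

For example, with 1000 units of card memory, 10 units per power flow, 250 outages, and 4 cards, only 3 cards are started with about 84 outages each and nothing is deferred; with 500 outages all 4 cards run full and 100 outages wait for the next round.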
In an embodiment, step 5) includes the following steps:
Step (51): on the basis of the base-case admittance matrix Y0 and according to the N-1 outage branch-node information, define the kernel function fun_kernel_1, perform local correction of the node admittance matrix simultaneously for the outage faults assigned to the card, and modify the node injection information, simulating device outages through the changes of the admittance matrix and node information;
Step (52): form the outage admittance matrix Ym from step (51) and complete the element-level parallel computation of the Jacobian matrix; to facilitate calling the CUDA kernel functions, each element of the power-flow Jacobian block matrix is computed by the following formulas:
Figure PCTCN2017093226-appb-000004
Figure PCTCN2017093226-appb-000005
Figure PCTCN2017093226-appb-000006
Figure PCTCN2017093226-appb-000007
where Gii, Bii, Gij and Bij are non-zero elements of the admittance matrix, θij is the phase-angle difference between nodes i and j, Pi and Qi are the injected power at node i, and Vi and Vj are the voltage magnitudes at nodes i and j respectively;
Each element of the block matrix involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and the solution processes do not affect one another, exhibiting obvious parallelism;
Four kernel functions are defined to compute the element values of the corresponding blocks; within a single card, the computational load of the Jacobian H matrices of m outage faults is:
h = m × hnozero         (9)
where hnozero is the number of non-zero elements of the H matrix and m is the number of outages;
In the CUDA architecture, the GPU has multiple Streaming Multiprocessors (SMs) that provide many threads. There is no coupling whatsoever among the h elements, which therefore exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined according to equation (5) to perform the vector-multiply and vector-add operations; the CPU calls fun_kernel_2_H, and the GPU launches h GPU threads according to the parameters passed to fun_kernel_2_H, completing the parallel operations of equation (5) simultaneously;
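The element-level parallelism of step (52) can be sketched in Python, with one logical thread per non-zero H element standing in for the h GPU threads of fun_kernel_2_H. This is illustrative only: the off-diagonal expression below is the standard polar-form Newton-Raphson H element, which may differ in sign convention from the patent's equation (5), and all names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def h_element(V, theta, G, B, i, j):
    # Standard polar-form off-diagonal H element:
    # H_ij = V_i * V_j * (G_ij * sin(theta_ij) - B_ij * cos(theta_ij))
    t = theta[i] - theta[j]
    return V[i] * V[j] * (G[i][j] * math.sin(t) - B[i][j] * math.cos(t))

def build_h_parallel(V, theta, G, B, nonzeros):
    # One logical thread per non-zero element, mirroring the
    # h = m * hnozero independent GPU threads: no element depends
    # on any other, so they may all run concurrently.
    with ThreadPoolExecutor() as pool:
        vals = pool.map(lambda ij: h_element(V, theta, G, B, *ij), nonzeros)
    return dict(zip(nonzeros, vals))
```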
Step (53): from the outage admittance matrix Ym formed in step (51) and the initial node injections, compute the node injected-power residuals according to equation (10):
Figure PCTCN2017093226-appb-000008
where Pis and Qis are respectively the active and reactive injections of node i. As the above equation shows, the residual of each node's injection involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and is independent of the residual computation of every other node; the kernel function fun_kernel_3 is defined and multiple threads in the streaming multiprocessors (SMs) are enabled to complete the parallel computation of the power imbalances of all computation nodes of the m outage faults; the detailed parallel computing process is similar to step (52);
Step (54): check whether the power residuals satisfy the convergence criterion; if yes, jump to step (58); if not, proceed to step (55) to iterate;
First, define the kernel function fun_kernel_4; the GPU checks the power residual of each outage for convergence according to equation (11), and if the convergence condition is satisfied, the power flow of that outage fault is recorded as converged;
||ΔPt, ΔQt|| < ε, t ≤ T      (11)
where ε is the power convergence criterion, ΔPt and ΔQt are the power deviations at iteration t, and T is the maximum number of iterations;
Then, since the outage faults converge in different numbers of iterations, it is stipulated that once k ≥ 80% of the outage faults have converged, the power-flow iteration ends; otherwise all outage faults continue the power-flow iteration until the k ≥ 80% convergence condition is satisfied or the maximum number of iterations is reached, whereupon the process jumps to step (58);
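The batch convergence rule of step (54) — record each outage as converged when its mismatch falls below ε, and stop the whole batch once k ≥ 80% of the outages have converged — can be sketched as follows (illustrative Python; the function name is hypothetical):

```python
def check_convergence(residuals, eps, k_ratio=0.8):
    # residuals: per-outage mismatch norm ||dP_t, dQ_t||.
    # An outage is converged when its residual falls below eps (eq. 11);
    # the batch iteration ends once k >= k_ratio of outages converged.
    converged = [r < eps for r in residuals]
    done = sum(converged) >= k_ratio * len(residuals)
    return converged, done
```

With four of five outages below tolerance, exactly the 80% threshold is reached and the batch iteration stops.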
Step (55): perform LU decomposition on the Jacobian matrix formed in step (52) and, together with the node power residuals computed in step (53), solve the linear equation system; the kernel function fun_kernel_5 is defined to solve the linear equation systems with task-level parallelism;
Step (56): update the initial state vector with the solution of the linear equation system obtained in step (55) according to equation (12):
Figure PCTCN2017093226-appb-000009
where
Figure PCTCN2017093226-appb-000010
are respectively the values of the state vector of the i-th power-flow equation of the linear equation system before and after the k-th iteration,
Figure PCTCN2017093226-appb-000011
is the k-th correction value, n is the number of system nodes, and r is the number of PV nodes;
As the above equation shows, during the iteration the update of each node's vector depends only on that node's own increment and not on the computed value of any other node; it is naturally parallel, and the node-voltage update task can be performed by GPU multi-threading in parallel;
Equation (12) is defined as an addition-operation kernel function fun_kernel_6; for m anticipated-fault parallel tasks, a total of m×(2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state-quantity update;
Step (57): jump to step (52) to continue the iteration;
Step (58): perform branch power-flow calculation according to the currently converged outage power-flow results; since each branch flow depends only on the branch parameters and the node voltages at its two ends and branches are mutually independent, the kernel function fun_kernel_7 is defined to complete the parallel branch power-flow calculation;
Step (59): according to the branch power-flow results of step (58), perform overload and limit-violation checks on each branch or stability section, save the violation results caused by all current outage faults, and define the kernel function fun_kernel_8 to complete the parallel overload/limit-violation check of branches and stability sections.
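Steps (52)-(57) form a Newton iteration per contingency: build the Jacobian, compute the mismatch, solve the correction equation, update the state, and repeat. A minimal scalar skeleton of that loop is sketched below (illustrative Python; the callbacks stand in for the Jacobian construction and LU solve of the patent, and the name is hypothetical).

```python
def contingency_power_flow(mismatch, jacobian, x0, eps=1e-8, max_iter=20):
    # Skeleton of steps (52)-(57) for one contingency, in one dimension.
    x = x0
    for _ in range(max_iter):
        f = mismatch(x)            # step (53): power residual
        if abs(f) < eps:           # step (54): convergence check
            return x, True
        dx = -f / jacobian(x)      # step (55): solve J * dx = -f
        x = x + dx                 # step (56): state-quantity update
    return x, False                # not converged within T iterations
```

As a toy usage, solving f(x) = x² - 2 = 0 with J(x) = 2x from x₀ = 1 converges to √2 in a few iterations.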
In a second aspect, an embodiment of the present invention provides a CPU+multi-GPU heterogeneous-mode static security analysis calculation device, the device including:
a memory configured to store a computer program;
a processor connected to the memory and configured to implement, by executing the computer program, the CPU+multi-GPU heterogeneous-mode static security analysis calculation method provided by any one of the foregoing technical solutions.
In a third aspect, an embodiment of the present invention further provides a computer storage medium storing computer-executable instructions for executing the CPU+multi-GPU heterogeneous-mode static security analysis calculation method provided by any one of the foregoing technical solutions.
The beneficial effects of the technical solutions of the embodiments of the present invention are:
The OpenMP-based CPU+multi-GPU heterogeneous-mode static security analysis calculation method and device and the computer storage medium described in the embodiments of the present invention can, on a Compute Unified Device Architecture (CUDA) platform, allocate a corresponding number of threads with OpenMP multi-threading technology according to the GPU configuration of the system and the computational demand, and construct a CPU+multi-GPU heterogeneous computing mode to complete parallel computation of anticipated faults cooperatively; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, effectively improving the parallel processing capability of anticipated-fault scanning in static security analysis; the solutions are applicable to the fast anticipated-fault scanning scenario of static security analysis in large-scale integrated dispatching systems and are of great application value for improving the scanning efficiency of static security analysis of large power grids.
Brief Description of the Drawings
FIG. 1 is a flowchart of CPU+multi-GPU heterogeneous-mode computation provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the CPU+multi-GPU heterogeneous mode provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the parallel computation of the elements of the Jacobian block matrix H provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the composition of the CPU+multi-GPU heterogeneous-mode static security analysis calculation device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments of the present invention.
The purpose of the present invention is to propose a fast parallel method suitable for static security analysis. Aiming at the requirement for fast static security analysis scanning of large power grids in practical engineering applications, on a Compute Unified Device Architecture (CUDA) platform, according to the GPU configuration of the system and the computational demand, OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread corresponding uniquely to a single GPU; based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed to complete parallel computation of anticipated faults cooperatively; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning, providing strong technical support for on-line security analysis and early-warning scanning in the integrated dispatching systems of large interconnected power grids.
An embodiment of the present invention provides a CPU+multi-GPU heterogeneous-mode static security analysis calculation method in which, on the CUDA platform, OpenMP multi-threading technology based on a shared-memory parallel system is used; the number of CPU threads is determined by jointly considering the GPU configuration and the anticipated-fault computing demand, and a CPU+multi-GPU heterogeneous mode is constructed; inside each GPU card, the parallel scanning task is completed cooperatively in CPU+single-GPU mode; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning in static security analysis.
The above CPU+multi-GPU heterogeneous-mode static security analysis calculation method includes the following steps, as shown in FIG. 1:
Step 1): obtain a real-time state-estimation snapshot and perform base-case power-flow calculation based on the Newton-Raphson algorithm, providing a data snapshot that can be reused and shared;
Step 2): according to user requirements, perform a topology scan of all devices of the network to form the anticipated-fault set, and perform a deep topology search for each anticipated fault to form the actual outage branch-node information;
Step 3): initialize the CUDA architecture, pack the data according to the multi-outage parallel computing demand, and allocate GPU memory space;
Step 4): according to the GPU configuration and the actual computational load, rationally allocate the outage computing resources, estimate the number of GPUs to be enabled, and use OpenMP technology to spawn the corresponding number of threads so as to exploit the GPU parallel computing capability to the greatest extent; OpenMP is a set of guiding compiler directives for multi-threaded programming of shared-memory parallel systems;
Step 5): under the overall coordination of the CPU, the power-flow iteration process, including admittance-matrix correction, Jacobian matrix solving, correction-equation solving, state-quantity updating, branch power-flow calculation, and device limit-violation checking, is handed over entirely to the GPUs for parallel completion; the iterations of all outage faults are highly parallel, and corresponding kernel functions are constructed according to the parallel characteristics of each computing task to complete the element-level parallel computing tasks of the iteration process at the fine-grained parallel level;
Step 6): judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate resources for the remaining outage faults; if all are complete, proceed to step 7);
Step 7): result presentation: according to the scan results, display the device or section limit-violation and overload information caused by fault outages, and compile scan-result statistics according to the practical requirements of the static security analysis module of the dispatching system.
In this embodiment, step 4) specifically includes:
Step (41): according to the outage computing demand and the GPU configuration, optimally allocate the outage computing resources, and estimate the maximum number of outages a single GPU can scan in one pass:
Figure PCTCN2017093226-appb-000012
where S_max is the total memory space of the GPU card, S is the memory required by a single power-flow calculation, and M_max is the maximum number of outages a single GPU card can compute;
According to equation (1), the number of GPU cards to be started is:
Figure PCTCN2017093226-appb-000013
where α denotes rounding, n is the number of GPU cards to be started in this round, N is the total number of GPU cards configured in the system, M_cal is the total number of outages to be computed in this round of static security analysis, and M_max·N is the one-pass outage computing capacity of the multiple GPUs;
According to equation (2) and the principle of even distribution, the number of outages actually computed by each card is:
Figure PCTCN2017093226-appb-000014
When M_max·N ≤ M_cal, all GPUs participating in the computation cannot complete all outage calculations in one pass; the remaining outages M_cal′ are then re-allocated in a new round according to equations (2)-(3) and computed in multiple batches, where:
M_cal′ = M_cal - M_max·N      (4)
Step (42): on the basis of the shared and reused base-case snapshot model data, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards using OpenMP multi-threading technology, each CPU thread corresponding uniquely to a single GPU, to construct the CPU+multi-GPU heterogeneous mode; as shown in FIG. 2, parallel computation is performed on each single GPU card in CPU+single-GPU heterogeneous mode.
In this embodiment, step 5) includes the following steps:
Step (51): on the basis of the base-case admittance matrix Y0 and according to the N-1 outage branch-node information, define the kernel function fun_kernel_1, perform local correction of the node admittance matrix simultaneously for the outage faults assigned to the card, and modify the node injection information, simulating device outages through the changes of the admittance matrix and node information;
Step (52): form the outage admittance matrix Ym from step (51) and complete the element-level parallel computation of the Jacobian matrix; to facilitate calling the CUDA kernel functions, each element of the power-flow Jacobian block matrix is computed by the following formulas:
Figure PCTCN2017093226-appb-000015
Figure PCTCN2017093226-appb-000016
Figure PCTCN2017093226-appb-000017
Figure PCTCN2017093226-appb-000018
where Gii, Bii, Gij and Bij are non-zero elements of the admittance matrix, θij is the phase-angle difference between nodes i and j, Pi and Qi are the injected power at node i, and Vi and Vj are the voltage magnitudes at nodes i and j respectively;
Each element of the block matrix involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and the solution processes do not affect one another, exhibiting obvious parallelism;
Four kernel functions are defined to compute the element values of the corresponding blocks; within a single card, the computational load of the Jacobian H matrices of m outage faults is:
h = m × hnozero       (9)
where hnozero is the number of non-zero elements of the H matrix and m is the number of outages;
In the CUDA architecture, the GPU has multiple Streaming Multiprocessors (SMs) that provide many threads. There is no coupling whatsoever among the h elements, which therefore exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined according to equation (5) to perform the vector-multiply and vector-add operations; the multi-threaded parallel computation of the H-matrix elements is shown in FIG. 3: the CPU calls fun_kernel_2_H, the GPU launches h GPU threads according to the parameters passed to fun_kernel_2_H, each thread corresponds to one non-zero element of the H matrix and executes concurrently at time t0 according to equation (5), jointly completing the generation of the Jacobian matrix elements;
Step (53): from the outage admittance matrix Ym formed in step (51) and the initial node injections, compute the node injected-power residuals according to equation (10):
Figure PCTCN2017093226-appb-000019
where Pis and Qis are respectively the active and reactive injections of node i. As the above equation shows, the residual of each node's injection involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and is independent of the residual computation of every other node; the kernel function fun_kernel_3 is defined and multiple threads in the streaming multiprocessors (SMs) are enabled to complete the parallel computation of the power imbalances of all computation nodes of the m outage faults; the detailed parallel computing process is similar to step (52);
Step (54): check whether the power residuals satisfy the convergence criterion; if yes, jump to step (58); if not, proceed to step (55) to iterate;
First, define the kernel function fun_kernel_4; the GPU checks the power residual of each outage for convergence according to equation (11), and if the convergence condition is satisfied, the power flow of that outage fault is recorded as converged;
||ΔPt, ΔQt|| < ε, t ≤ T       (11)
where ε is the power convergence criterion, ΔPt and ΔQt are the power deviations at iteration t, and T is the maximum number of iterations;
Then, since the outage faults converge in different numbers of iterations, it is stipulated that once k ≥ 80% of the outage faults have converged, the power-flow iteration ends; otherwise all outage faults continue the power-flow iteration until the k ≥ 80% convergence condition is satisfied or the maximum number of iterations is reached, whereupon the process jumps to step (58);
Step (55): perform LU decomposition on the Jacobian matrix formed in step (52) and, together with the node power residuals computed in step (53), solve the linear equation system; the kernel function fun_kernel_5 is defined to solve the linear equation systems with task-level parallelism;
Step (56): update the initial state vector with the solution of the linear equation system obtained in step (55) according to equation (12):
Figure PCTCN2017093226-appb-000020
where
Figure PCTCN2017093226-appb-000021
are respectively the values of the state vector of the i-th power-flow equation of the linear equation system before and after the k-th iteration,
Figure PCTCN2017093226-appb-000022
is the k-th correction value, n is the number of system nodes, and r is the number of PV nodes;
As the above equation shows, during the iteration the update of each node's vector depends only on that node's own increment and not on the computed value of any other node; it is naturally parallel, and the node-voltage update task can be performed by GPU multi-threading in parallel;
Equation (12) is defined as an addition-operation kernel function fun_kernel_6; for M anticipated-fault parallel tasks, a total of M×(2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state-quantity update;
Step (57): jump to step (52) to continue the iteration;
Step (58): perform branch power-flow calculation according to the currently converged outage power-flow results; since each branch flow depends only on the branch parameters and the node voltages at its two ends and branches are mutually independent, the kernel function fun_kernel_7 is defined to complete the parallel branch power-flow calculation;
Step (59): according to the branch power-flow results of step (58), perform overload and limit-violation checks on each branch or stability section, save the violation results caused by all current outage faults, and define the kernel function fun_kernel_8 to complete the parallel overload/limit-violation check of branches and stability sections.
To implement the above method, an embodiment of the present invention further provides a CPU+multi-GPU heterogeneous-mode static security analysis calculation device. FIG. 4 is a schematic diagram of the composition of the device, which includes:
a memory 10 configured to store a computer program;
a processor 20 connected to the memory 10 and configured to implement, by executing the computer program, the CPU+multi-GPU heterogeneous-mode static security analysis calculation method provided by any one of the foregoing technical solutions.
As an implementation, the processor 20 is configured to execute the following steps:
Step 1): obtain a real-time state-estimation snapshot and perform base-case power-flow calculation based on the Newton-Raphson algorithm, providing a data snapshot that can be reused and shared;
Step 2): according to user requirements, perform a topology scan of all devices of the network to form the anticipated-fault set, and perform a deep topology search for each anticipated fault to form the actual outage branch-node information;
Step 3): initialize the CUDA architecture, pack the data according to the multi-outage parallel computing demand, and allocate GPU memory space;
Step 4): according to the GPU configuration and the actual computational load, rationally allocate the outage computing resources, estimate the number of GPUs to be enabled, and use OpenMP technology to spawn the corresponding number of threads so as to exploit the GPU parallel computing capability to the greatest extent;
Step 5): under the overall coordination of the CPU, the power-flow iteration process, including admittance-matrix correction, Jacobian matrix solving, correction-equation solving, state-quantity updating, branch power-flow calculation, and device limit-violation checking, is handed over entirely to the GPUs for parallel completion; the iterations of all outage faults are highly parallel, and corresponding kernel functions are constructed according to the parallel characteristics of each computing task to complete the element-level parallel computing tasks of the iteration process at the fine-grained parallel level;
Step 6): judge whether all anticipated faults have been scanned; if not, return to step 4) and re-allocate resources for the remaining outage faults; if all are complete, proceed to step 7);
Step 7): result presentation: according to the scan results, display the device or section limit-violation and overload information caused by fault outages, and compile scan-result statistics according to the practical requirements of the static security analysis module of the dispatching system.
As an implementation, the processor 20 executing step 4) specifically includes:
Step (41): according to the outage computing demand and the GPU configuration, optimally allocate the outage computing resources,
and estimate the maximum number of outages a single GPU can scan in one pass:
Figure PCTCN2017093226-appb-000023
where S_max is the total memory space of the GPU card, S is the memory required by a single power-flow calculation, and M_max is the maximum number of outages a single GPU card can compute;
According to equation (1), the number of GPU cards to be started is:
Figure PCTCN2017093226-appb-000024
where α denotes rounding, n is the number of GPU cards to be started in this round, N is the total number of GPU cards configured in the system, M_cal is the total number of outages to be computed in this round of static security analysis, and M_max·N is the one-pass outage computing capacity of the multiple GPUs;
According to equation (2) and the principle of even distribution, the number of outages actually computed by each card is:
Figure PCTCN2017093226-appb-000025
When M_max·N ≤ M_cal, all GPUs participating in the computation cannot complete all outage calculations in one pass; the remaining outages M_cal′ are then re-allocated in a new round according to equations (2)-(3) and computed in multiple batches, where:
M_cal′ = M_cal - M_max·N        (4)
Step (42): on the basis of the shared and reused base-case snapshot model data, allocate the corresponding number of CPU threads according to the number n of actually running GPU cards using OpenMP multi-threading technology, each CPU thread corresponding uniquely to a single GPU, to construct the CPU+multi-GPU heterogeneous mode; within a single GPU card, parallel computation is performed in CPU+single-GPU heterogeneous mode.
As an implementation, the processor 20 executing step 5) specifically includes:
Step (51): on the basis of the base-case admittance matrix Y0 and according to the N-1 outage branch-node information, define the kernel function fun_kernel_1, perform local correction of the node admittance matrix simultaneously for the outage faults assigned to the card, and modify the node injection information, simulating device outages through the changes of the admittance matrix and node information;
Step (52): form the outage admittance matrix Ym from step (51) and complete the element-level parallel computation of the Jacobian matrix; to facilitate calling the CUDA kernel functions, each element of the power-flow Jacobian block matrix is computed by the following formulas:
Figure PCTCN2017093226-appb-000026
Figure PCTCN2017093226-appb-000027
Figure PCTCN2017093226-appb-000028
Figure PCTCN2017093226-appb-000029
where Gii, Bii, Gij and Bij are non-zero elements of the admittance matrix, θij is the phase-angle difference between nodes i and j, Pi and Qi are the injected power at node i, and Vi and Vj are the voltage magnitudes at nodes i and j respectively;
Each element of the block matrix involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and the solution processes do not affect one another, exhibiting obvious parallelism;
Four kernel functions are defined to compute the element values of the corresponding blocks; within a single card, the computational load of the Jacobian H matrices of m outage faults is:
h = m × hnozero         (9)
where hnozero is the number of non-zero elements of the H matrix and m is the number of outages;
In the CUDA architecture, the GPU has multiple Streaming Multiprocessors (SMs) that provide many threads. There is no coupling whatsoever among the h elements, which therefore exhibit a high degree of element-level fine-grained parallelism. The kernel function fun_kernel_2_H is defined according to equation (5) to perform the vector-multiply and vector-add operations; the CPU calls fun_kernel_2_H, and the GPU launches h GPU threads according to the parameters passed to fun_kernel_2_H, completing the parallel operations of equation (5) simultaneously;
Step (53): from the outage admittance matrix Ym formed in step (51) and the initial node injections, compute the node injected-power residuals according to equation (10):
Figure PCTCN2017093226-appb-000030
where Pis and Qis are respectively the active and reactive injections of node i. As the above equation shows, the residual of each node's injection involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and is independent of the residual computation of every other node; the kernel function fun_kernel_3 is defined and multiple threads in the streaming multiprocessors (SMs) are enabled to complete the parallel computation of the power imbalances of all computation nodes of the m outage faults; the detailed parallel computing process is similar to step (52);
Step (54): check whether the power residuals satisfy the convergence criterion; if yes, jump to step (58); if not, proceed to step (55) to iterate;
Step (55): perform LU decomposition on the Jacobian matrix formed in step (52) and, together with the node power residuals computed in step (53), solve the linear equation system; the kernel function fun_kernel_5 is defined to solve the linear equation systems with task-level parallelism;
Step (56): define the addition-operation kernel function fun_kernel_6; for M anticipated-fault parallel tasks, a total of M×(2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state-quantity update;
Step (57): jump to step (52) to continue the iteration;
Step (58): perform branch power-flow calculation according to the currently converged outage power-flow results; since each branch flow depends only on the branch parameters and the node voltages at its two ends and branches are mutually independent, the kernel function fun_kernel_7 is defined to complete the parallel branch power-flow calculation;
Step (59): according to the branch power-flow results of step (58), perform overload and limit-violation checks on each branch or stability section, save the violation results caused by all current outage faults, and define the kernel function fun_kernel_8 to complete the parallel overload/limit-violation check of branches and stability sections.
Step (54) of step 5) is specifically:
First, define the kernel function fun_kernel_4; the GPU checks the power residual of each outage for convergence according to equation (11), and if the convergence condition is satisfied, the power flow of that outage fault is recorded as converged;
||ΔPt, ΔQt|| < ε, t ≤ T       (11)
where ε is the power convergence criterion, ΔPt and ΔQt are the power deviations at iteration t, and T is the maximum number of iterations;
Then, since the outage faults converge in different numbers of iterations, it is stipulated that once k ≥ 80% of the outage faults have converged, the power-flow iteration ends; otherwise all outage faults continue the power-flow iteration until the k ≥ 80% convergence condition is satisfied or the maximum number of iterations is reached, whereupon the process jumps to step (58).
Step (56) of step 5) defines the addition-operation kernel function fun_kernel_6 as:
Figure PCTCN2017093226-appb-000031
where
Figure PCTCN2017093226-appb-000032
are respectively the values of the state vector of the i-th power-flow equation of the linear equation system before and after the k-th iteration,
Figure PCTCN2017093226-appb-000033
is the k-th correction value, n is the number of system nodes, and r is the number of PV nodes.
In practical applications, the memory 10 of this embodiment may be a storage structure including various storage media, for example random-access media, read-only media, or flash memory, and may be used for non-transitory storage of the computer program.
In practical applications, the processor 20 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), or a Field-Programmable Gate Array (FPGA), and the above operations can be implemented by executing the computer program stored in the memory 10.
In this embodiment, the processor 20 and the memory 10 may be connected through a bus interface 30 (not shown in FIG. 4); in this embodiment, the bus interface 30 may include an Inter-Integrated Circuit (IIC) bus, etc.
This embodiment further provides a storage medium storing a computer program which, when executed by a processor, implements the CPU+multi-GPU heterogeneous-mode static security analysis calculation method of any one of the foregoing embodiments.
The computer storage medium may be any of various types of storage media, preferably a non-transitory storage medium in this embodiment.
Those skilled in the art should understand that the functions of the programs in the storage medium of this embodiment can be understood with reference to the description of the CPU+multi-GPU heterogeneous-mode static security analysis calculation method of the embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other manners. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and in actual implementation there may be other divisions: multiple units or components may be combined or integrated into another system, and some features may be ignored or not executed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing module, each unit may stand alone as one unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the foregoing storage medium includes: a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
The preferred embodiments of the present invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain, according to the concept of the present invention and on the basis of the prior art, through logical analysis, reasoning, or limited experiments shall fall within the scope of protection determined by the claims.
Industrial Applicability
The technical solution of the embodiments of the present invention addresses the requirement for fast static security analysis scanning of large power grids in practical engineering applications: on the CUDA platform, according to the GPU configuration of the system and the computational demand, OpenMP multi-threading technology is used to allocate a corresponding number of threads, each thread corresponding uniquely to a single GPU; based on mixed CPU and GPU programming, a CPU+multi-GPU heterogeneous computing mode is constructed to complete parallel computation of anticipated faults cooperatively; on the basis of single anticipated-fault power-flow calculation, the iterative processes of multiple outage power flows are made highly synchronous and parallel, and element-level fine-grained parallelism greatly improves the parallel processing capability of anticipated-fault scanning, providing strong technical support for on-line security analysis and early-warning scanning in the integrated dispatching systems of large interconnected power grids.

Claims (8)

  1. A CPU+multi-GPU heterogeneous-mode static security analysis calculation method, comprising: on a Compute Unified Device Architecture (CUDA) platform, using OpenMP multi-threading technology based on a shared-memory parallel system; determining the number of CPU threads by jointly considering the GPU configuration and the anticipated-fault computing demand; constructing a CPU+multi-GPU heterogeneous mode, wherein inside each GPU card the parallel scanning task is completed cooperatively in CPU+single-GPU mode; and, on the basis of single anticipated-fault power-flow calculation, making the iterative processes of multiple outage power flows synchronous and parallel, completing the element-level parallel computing tasks of the iteration process at the element-level fine-grained parallel level.
  2. The CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to claim 1, wherein the method comprises the following steps:
    Step 1): obtaining a real-time state-estimation snapshot and performing base-case power-flow calculation based on the Newton-Raphson algorithm, providing a data snapshot that can be reused and shared;
    Step 2): according to user requirements, performing a topology scan of all devices of the network to form the anticipated-fault set, and performing a deep topology search for each anticipated fault to form the actual outage branch-node information;
    Step 3): initializing the CUDA architecture, packing the data according to the multi-outage parallel computing demand, and allocating GPU memory space;
    Step 4): according to the GPU configuration and the actual computational load, rationally allocating the outage computing resources, estimating the number of GPUs to be enabled, and using OpenMP technology to spawn the corresponding number of threads so as to exploit the GPU parallel computing capability to the greatest extent;
    Step 5): under the overall coordination of the CPU, handing over the power-flow iteration process, including admittance-matrix correction, Jacobian matrix solving, correction-equation solving, state-quantity updating, branch power-flow calculation, and device limit-violation checking, entirely to the GPUs for parallel completion, the iterations of all outage faults being highly parallel, constructing corresponding kernel functions according to the parallel characteristics of each computing task, and completing the element-level parallel computing tasks of the iteration process at the fine-grained parallel level;
    Step 6): judging whether all anticipated faults have been scanned; if not, returning to step 4) and re-allocating resources for the remaining outage faults; if all are complete, proceeding to step 7);
    Step 7): result presentation: according to the scan results, displaying the device or section limit-violation and overload information caused by fault outages, and compiling scan-result statistics according to the practical requirements of the static security analysis module of the dispatching system.
  3. The CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to claim 2, wherein step 4) specifically comprises:
    Step (41): according to the outage computing demand and the GPU configuration, optimally allocating the outage computing resources,
    and estimating the maximum number of outages a single GPU can scan in one pass:
    Figure PCTCN2017093226-appb-100001
    where S_max is the total memory space of the GPU card, S is the memory required by a single power-flow calculation, and M_max is the maximum number of outages a single GPU card can compute;
    According to equation (1), the number of GPU cards to be started is:
    Figure PCTCN2017093226-appb-100002
    where α denotes rounding, n is the number of GPU cards to be started in this round, N is the total number of GPU cards configured in the system, M_cal is the total number of outages to be computed in this round of static security analysis, and M_max·N is the one-pass outage computing capacity of the multiple GPUs;
    According to equation (2) and the principle of even distribution, the number of outages actually computed by each card is:
    Figure PCTCN2017093226-appb-100003
    When M_max·N ≤ M_cal, all GPUs participating in the computation cannot complete all outage calculations in one pass; the remaining outages M_cal′ are then re-allocated in a new round according to equations (2)-(3) and computed in multiple batches, where:
    M_cal′ = M_cal - M_max·N     (4)
    Step (42): on the basis of the shared and reused base-case snapshot model data, allocating the corresponding number of CPU threads according to the number n of actually running GPU cards using OpenMP multi-threading technology, each CPU thread corresponding uniquely to a single GPU, to construct the CPU+multi-GPU heterogeneous mode, parallel computation being performed within a single GPU card in CPU+single-GPU heterogeneous mode.
  4. The CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to claim 2, wherein step 5) comprises the following steps:
    Step (51): on the basis of the base-case admittance matrix Y0 and according to the N-1 outage branch-node information, defining the kernel function fun_kernel_1, performing local correction of the node admittance matrix simultaneously for the outage faults assigned to the card, and modifying the node injection information, simulating device outages through the changes of the admittance matrix and node information;
    Step (52): forming the outage admittance matrix Ym from step (51) and completing the element-level parallel computation of the Jacobian matrix; to facilitate calling the CUDA kernel functions, each element of the power-flow Jacobian block matrix is computed by the following formulas:
    Figure PCTCN2017093226-appb-100004
    Figure PCTCN2017093226-appb-100005
    Figure PCTCN2017093226-appb-100006
    Figure PCTCN2017093226-appb-100007
    where Gii, Bii, Gij and Bij are non-zero elements of the admittance matrix, θij is the phase-angle difference between nodes i and j, Pi and Qi are the injected power at node i, and Vi and Vj are the voltage magnitudes at nodes i and j respectively;
    each element of the block matrix involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and the solution processes do not affect one another, exhibiting obvious parallelism;
    four kernel functions are defined to compute the element values of the corresponding blocks; within a single card, the computational load of the Jacobian H matrices of m outage faults is:
    h = m × hnozero    (9)
    where hnozero is the number of non-zero elements of the H matrix and m is the number of outages;
    in the CUDA architecture, the GPU has multiple SM streaming multiprocessors that provide many threads; there is no coupling whatsoever among the h elements, which therefore exhibit a high degree of element-level fine-grained parallelism; the kernel function fun_kernel_2_H is defined according to equation (5) to perform the vector-multiply and vector-add operations; the CPU calls fun_kernel_2_H, and the GPU launches h GPU threads according to the parameters passed to fun_kernel_2_H, completing the parallel operations of equation (5) simultaneously;
    Step (53): from the outage admittance matrix Ym formed in step (51) and the initial node injections, computing the node injected-power residuals according to equation (10):
    Figure PCTCN2017093226-appb-100008
    where Pis and Qis are respectively the active and reactive injections of node i; as the above equation shows, the residual of each node's injection involves only the four basic arithmetic operations and trigonometric functions of the state quantities (node voltage phase angles) and the admittance-matrix elements, and is independent of the residual computation of every other node; the kernel function fun_kernel_3 is defined and multiple threads in the streaming multiprocessors SM are enabled to complete the parallel computation of the power imbalances of all computation nodes of the m outage faults, the detailed parallel computing process being similar to step (52);
    Step (54): checking whether the power residuals satisfy the convergence criterion; if yes, jumping to step (58); if not, proceeding to step (55) to iterate;
    Step (55): performing LU decomposition on the Jacobian matrix formed in step (52) and, together with the node power residuals computed in step (53), solving the linear equation system; the kernel function fun_kernel_5 is defined to solve the linear equation systems with task-level parallelism;
    Step (56): defining an addition-operation kernel function fun_kernel_6; for m anticipated-fault parallel tasks, a total of m×(2(n-1)-r) threads execute fun_kernel_6 synchronously to complete one state-quantity update;
    Step (57): jumping to step (52) to continue the iteration;
    Step (58): performing branch power-flow calculation according to the currently converged outage power-flow results; since each branch flow depends only on the branch parameters and the node voltages at its two ends and branches are mutually independent, the kernel function fun_kernel_7 is defined to complete the parallel branch power-flow calculation;
    Step (59): according to the branch power-flow results of step (58), performing overload and limit-violation checks on each branch or stability section, saving the violation results caused by all current outage faults, and defining the kernel function fun_kernel_8 to complete the parallel overload/limit-violation check of branches and stability sections.
  5. The CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to claim 4, wherein step (54) of step 5) is specifically:
    first, defining the kernel function fun_kernel_4, the GPU checking the power residual of each outage for convergence according to equation (11), and recording the power flow of an outage fault as converged if the convergence condition is satisfied;
    ||ΔPt, ΔQt|| < ε, t ≤ T    (11)
    where ε is the power convergence criterion, ΔPt and ΔQt are the power deviations at iteration t, and T is the maximum number of iterations;
    then, since the outage faults converge in different numbers of iterations, stipulating that once k ≥ 80% of the outage faults have converged the power-flow iteration ends; otherwise all outage faults continue the power-flow iteration until the k ≥ 80% convergence condition is satisfied or the maximum number of iterations is reached, whereupon the process jumps to step (58).
  6. The CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to claim 4, wherein step (56) of step 5) defines the addition-operation kernel function fun_kernel_6 as:
    Figure PCTCN2017093226-appb-100009
    where
    Figure PCTCN2017093226-appb-100010
    are respectively the values of the state vector of the i-th power-flow equation of the linear equation system before and after the k-th iteration,
    Figure PCTCN2017093226-appb-100011
    is the k-th correction value, n is the number of system nodes, and r is the number of PV nodes.
  7. A CPU+multi-GPU heterogeneous-mode static security analysis calculation device, the device comprising:
    a memory configured to store a computer program;
    a processor connected to the memory and configured to implement, by executing the computer program, the CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to any one of claims 1 to 6.
  8. A computer storage medium storing a computer program which, when executed by a processor, implements the CPU+multi-GPU heterogeneous-mode static security analysis calculation method according to any one of claims 1 to 6.
PCT/CN2017/093226 2017-01-19 2017-07-17 一种静态安全分析计算方法、装置及计算机存储介质 WO2018133348A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710037124.0 2017-01-19
CN201710037124.0A CN106874113A (zh) 2017-01-19 2017-01-19 一种cpu+多gpu异构模式静态安全分析计算方法

Publications (1)

Publication Number Publication Date
WO2018133348A1 true WO2018133348A1 (zh) 2018-07-26

Family

ID=59158723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/093226 WO2018133348A1 (zh) 2017-01-19 2017-07-17 一种静态安全分析计算方法、装置及计算机存储介质

Country Status (2)

Country Link
CN (1) CN106874113A (zh)
WO (1) WO2018133348A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109634830A (zh) * 2018-12-19 2019-04-16 哈尔滨工业大学 一种基于多特征耦合的cuda程序一体化性能预测方法
CN109711554A (zh) * 2018-09-07 2019-05-03 天翼电子商务有限公司 一种基于基础设施大数据的应用弹性管理装置
CN110718919A (zh) * 2019-09-25 2020-01-21 北京交通大学 基于gpu加速的大电网静态安全分析故障筛选的方法

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874113A (zh) * 2017-01-19 2017-06-20 国电南瑞科技股份有限公司 一种cpu+多gpu异构模式静态安全分析计算方法
CN109871559B (zh) * 2017-12-04 2023-05-02 广东电网有限责任公司电力调度控制中心 一种电力系统信息故障扫描的分析方法
CN108897616B (zh) * 2018-06-04 2021-08-24 四川大学 基于并行运算的非下采样轮廓波变换优化方法
CN108879691B (zh) * 2018-06-21 2020-09-04 清华大学 一种大规模连续潮流计算的方法及装置
CN109038543B (zh) * 2018-06-27 2021-10-15 国网辽宁省电力有限公司 一种基于cpu+gpu混合异构的状态估计计算方法
CN108804386B (zh) * 2018-07-09 2022-03-29 东北电力大学 一种电力系统负荷裕度的并行化计算方法
CN109167354B (zh) * 2018-10-08 2022-02-22 国网天津市电力公司电力科学研究院 一种基于文件交换的电网预想故障并行分析计算方法
CN109167357A (zh) * 2018-10-18 2019-01-08 国网山东省电力公司泰安供电公司 一种优化电网静态安全分析时间的方法
CN109388496A (zh) * 2018-11-01 2019-02-26 北京视甄智能科技有限公司 一种基于多gpu卡的图像并发处理方法、装置及系统
CN111355231A (zh) * 2018-12-24 2020-06-30 中国电力科学研究院有限公司 一种配电网拓扑辨识方法及系统
CN109857564A (zh) * 2019-03-05 2019-06-07 上海交通大学 基于细粒度的gpu的资源管理方法及其应用的gpu
CN110175775A (zh) * 2019-05-24 2019-08-27 浙江大学 基于图形处理器和中央处理器协同架构的大规模病态潮流计算方法
CN110543711B (zh) * 2019-08-26 2021-07-20 中国原子能科学研究院 一种数值堆热工水力子通道模拟的并行实现与优化方法
CN111478333B (zh) * 2020-04-14 2021-11-30 广东电网有限责任公司广州供电局 一种提升配电网灾后恢复用并行静态安全分析方法
CN111930471B (zh) * 2020-08-14 2023-05-26 中国科学院上海高等研究院 一种基于gpu的并行仿真评估选择方法
CN112083956B (zh) * 2020-09-15 2022-12-09 哈尔滨工业大学 一种面向异构平台的复杂指针数据结构自动管理系统
CN117687779B (zh) * 2023-11-30 2024-04-26 山东诚泉信息科技有限责任公司 基于异构多核计算平台的复杂电波传播预测快速计算方法
CN117555695B (zh) * 2024-01-10 2024-05-14 深圳本贸科技股份有限公司 一种基于并行计算实现异构计算的优化方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103677761A (zh) * 2013-12-11 2014-03-26 中国资源卫星应用中心 一种卫星遥感数据快速处理系统
CN103885040A (zh) * 2014-03-29 2014-06-25 江西理工大学 一种基于cpu-gpu异构计算的圆迹合成孔径雷达回波生成方法
CN105576648A (zh) * 2015-11-23 2016-05-11 中国电力科学研究院 一种基于gpu-cpu异构计算平台的静态安全分析双层并行方法
CN106157176A (zh) * 2016-07-26 2016-11-23 东南大学 一种gpu加速的电力潮流雅可比矩阵的lu分解方法
CN106874113A (zh) * 2017-01-19 2017-06-20 国电南瑞科技股份有限公司 一种cpu+多gpu异构模式静态安全分析计算方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103677761A (zh) * 2013-12-11 2014-03-26 中国资源卫星应用中心 一种卫星遥感数据快速处理系统
CN103885040A (zh) * 2014-03-29 2014-06-25 江西理工大学 一种基于cpu-gpu异构计算的圆迹合成孔径雷达回波生成方法
CN105576648A (zh) * 2015-11-23 2016-05-11 中国电力科学研究院 一种基于gpu-cpu异构计算平台的静态安全分析双层并行方法
CN106157176A (zh) * 2016-07-26 2016-11-23 东南大学 一种gpu加速的电力潮流雅可比矩阵的lu分解方法
CN106874113A (zh) * 2017-01-19 2017-06-20 国电南瑞科技股份有限公司 一种cpu+多gpu异构模式静态安全分析计算方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LU , JUANJUAN ET AL.: "Design and Application of Parallel Static Security Analysis Based on GPU", JOURNAL OF FRONTIERS OF COMPUTER SCIENCE & TECHNOLOGY, 31 December 2016 (2016-12-31), pages 79 - 83 *
ZHOU, GAN ET AL.: "A Novel GPU-Accelerated Strategy for Contingency Screening of Static Security Analysis", ELECTRICAL POWER AND ENERGY SYSTEMS, vol. 83, 31 December 2016 (2016-12-31), pages 33 - 39, XP029641940 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711554A (zh) * 2018-09-07 2019-05-03 天翼电子商务有限公司 一种基于基础设施大数据的应用弹性管理装置
CN109634830A (zh) * 2018-12-19 2019-04-16 哈尔滨工业大学 一种基于多特征耦合的cuda程序一体化性能预测方法
CN109634830B (zh) * 2018-12-19 2022-06-07 哈尔滨工业大学 一种基于多特征耦合的cuda程序一体化性能预测方法
CN110718919A (zh) * 2019-09-25 2020-01-21 北京交通大学 基于gpu加速的大电网静态安全分析故障筛选的方法

Also Published As

Publication number Publication date
CN106874113A (zh) 2017-06-20

Similar Documents

Publication Publication Date Title
WO2018133348A1 (zh) 一种静态安全分析计算方法、装置及计算机存储介质
Wang et al. Distributed optimization approaches for emerging power systems operation: A review
Van Werkhoven et al. Performance models for CPU-GPU data transfers
CN106339351B (zh) 一种sgd算法优化系统及方法
CN107085562B (zh) 一种基于高效复用数据流的神经网络处理器及设计方法
CN106250349A (zh) 一种高能效异构计算系统
CN103617150A (zh) 一种基于gpu的大规模电力系统潮流并行计算系统及其方法
CN105389772A (zh) 基于图形处理器的数据处理方法和装置
CN103365727A (zh) 一种云计算环境中的主机负载预测方法
CN112948123B (zh) 一种基于Spark的网格水文模型分布式计算方法
CN109272110A (zh) 基于光子神经网络芯片的光电融合智能信号处理系统
CN103413273A (zh) 一种基于gpu加速实现图像复原处理方法
Kim et al. Efficient large-scale deep learning framework for heterogeneous multi-GPU cluster
NL2023815B1 (en) Numerical simulation method for unstructured grid tides and tidal currents based on gpu computation technology
Zhong et al. Parallel graph processing on graphics processors made easy
Dinavahi et al. Parallel dynamic and transient simulation of large-scale power systems: A high performance computing solution
Wu et al. Hierarchical task mapping of cell-based AMR cosmology simulations
CN115345285A (zh) 基于gpu的时序图神经网络训练方法、系统及电子设备
CN103051473B (zh) 一种网络运维保障效果仿真方法及系统
Khaitan et al. Parallelizing power system contingency analysis using D programming language
CN109992860A (zh) 基于gpu的电磁暂态并行仿真方法和系统
Yao et al. Towards edge-enabled distributed computing framework for heterogeneous android-based devices
Nesi et al. Summarizing task-based applications behavior over many nodes through progression clustering
Wang et al. Design and Key Algorithm Research of High Performance Power Big Data Server Software Intelligent System
Li et al. petaPar: a scalable Petascale framework for meshfree/particle simulation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17892956

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17892956

Country of ref document: EP

Kind code of ref document: A1