CN110635476B

CN110635476B - Knowledge migration-based cross-regional interconnected power grid dynamic scheduling rapid optimization method

Info

Publication number: CN110635476B
Application number: CN201910932990.5A
Authority: CN
Inventors: 唐昊; 金国平; 吕凯; 王珂; 王刚; 杨胜春
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2021-01-15
Anticipated expiration: 2039-09-29
Also published as: CN110635476A

Abstract

The invention discloses a knowledge migration-based method for rapidly optimizing the dynamic scheduling of a cross-regional interconnected power grid. First, in a source task pre-learning stage, the optimal knowledge matrix obtained by optimizing each source task is stored in a knowledge base as experience knowledge. Then, in a target task learning stage, the source task most similar to the target task is identified and its optimal knowledge matrix is migrated to form the initial knowledge matrix of the target task, so that the target task is optimized quickly. Finally, the optimal knowledge matrix of the target task is itself stored in the knowledge base as experience knowledge. Under the resulting strategy, the dispatching agency can select a reasonable action scheme at each scheduling moment according to the actual operating state of the power grid, thereby realizing dynamic scheduling of the cross-regional interconnected power grid. The hierarchical learning and knowledge migration mechanisms of the invention mitigate, to a certain extent, the curse of dimensionality (the "dimension disaster") of reinforcement learning, accelerate convergence of the algorithm, and speed up the solution of the scheduling strategy.

Description

Knowledge migration-based cross-regional interconnected power grid dynamic scheduling rapid optimization method

Technical Field

The invention belongs to the field of cross-region interconnected power grid dispatching, and particularly relates to a knowledge migration-based cross-region interconnected power grid dynamic dispatching rapid optimization method.

Background

Cross-regional power grid interconnection is one of the important means of optimizing the allocation of resources nationwide and improving their utilization efficiency. Building cross-provincial and cross-regional interconnected power grids allows the benefits of a large power grid to be fully realized, including mutual balancing of surpluses and shortages, optimal resource allocation, reserve sharing and emergency support, and can greatly improve the accommodation of new-energy generation.

Existing research on the joint optimization of inter-regional tie lines and intra-regional generating units in cross-regional interconnected power grid systems is scarce. Although some studies apply reinforcement learning to solve the tie-line transmission plan of a cross-regional interconnected power grid, they do not consider the curse of dimensionality (the "dimension disaster") caused by the continuous growth of the problem scale. In addition, traditional reinforcement learning methods treat different learning tasks as mutually independent and require re-modeling and re-solving for each task, whereas in fact different learning tasks are often related to one another.

Disclosure of Invention

To address the defects of the prior art, the invention provides a knowledge migration-based method for rapidly optimizing the dynamic scheduling of a cross-regional interconnected power grid. A hierarchical Q-learning algorithm decomposes the huge knowledge matrix required by the original cooperative scheduling into several smaller knowledge matrices, which reduces the number of state-action pairs and mitigates the curse of dimensionality (the "dimension disaster") to a certain extent. In addition, by mining the relationship between tasks, the invention provides a method for measuring the similarity between a source scheduling task and a target scheduling task and, on that basis, a knowledge migration mechanism, so that past learning experience can be used to accelerate the learning of the target task, speeding up convergence of the algorithm and reducing the learning cost.
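The effect of the decomposition on table size can be illustrated with a small count. The following is a minimal sketch with made-up sizes; the region counts, level counts, and function names are assumptions for illustration and are not taken from the patent. It shows only the general point that a joint table grows multiplicatively with the number of regions while layered tables grow additively.

```python
from math import prod

def flat_table_size(states_per_region, actions_per_region):
    """Entries in one joint table over all regions (per-region sizes multiply)."""
    return prod(states_per_region) * prod(actions_per_region)

def layered_table_size(states_per_region, actions_per_region,
                       upper_states, upper_actions):
    """Entries in one upper table plus one lower table per region (sizes add)."""
    lower = sum(s * a for s, a in zip(states_per_region, actions_per_region))
    return upper_states * upper_actions + lower

# Hypothetical sizes: 3 regions with 50 local states and 10 local actions each;
# the upper layer is assumed to see 100 states and 9 tie-line actions.
states = [50, 50, 50]
actions = [10, 10, 10]
flat = flat_table_size(states, actions)
layered = layered_table_size(states, actions, upper_states=100, upper_actions=9)
print(flat, layered)
```

With these illustrative numbers the joint table has 125,000,000 entries and the layered tables have 2,400 in total. The real ratio depends on how the states and actions of each layer are discretized.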

The invention adopts the following technical scheme for solving the technical problems:

A knowledge migration-based method for rapidly optimizing the dynamic scheduling of a cross-regional interconnected power grid is carried out according to the following steps:

Step 1, in a multi-region cross-regional interconnected power grid connected by DC tie lines, assuming that the predicted wind power output of region z at any time t in the dispatching day is

the predicted photovoltaic output power is

and the predicted total load power demand is

Step 2, determining, for the actual wind power output in region z

at time t relative to the predicted value

the state level of the output deviation power

Step 3, determining, for the actual photovoltaic generation output in region z

at time t relative to the predicted value

the state level of the output deviation power

Step 4, determining, before direct load control (DLC) is implemented, for the actual load demand power $P_l^{z,t}$ in region z at time t relative to the predicted value

the state level of the load demand deviation power

and, when DLC is implemented, the load curtailment level within decision period k

After DLC is implemented, the load demand power of region z at any time t in decision period k

can be characterized by formula (1):

wherein

is the DLC load curtailment power of region z in decision period k, and

is the DLC load rebound power of region z in decision period k;

Step 5, determining the power adjustment levels of the class I and class II thermal generator sets in region z at time t

and their real-time generated power levels

and obtaining the real-time generated power of the class III thermal generator set from the regional power balance equation;

Step 6, determining, for inter-regional tie line l of the cross-regional interconnected power grid at time t, the power adjustment level

and the transmission power level

Step 7, determining the upper-layer and lower-layer states and actions of the cross-regional interconnected power grid system at decision time $t_k$. The upper-layer state can be characterized by formula (2):

wherein

is the state information of region z at decision time $t_k$, and

is the power level of the DC tie line at decision time $t_k$; Z is the total number of regions in the cross-regional interconnected power grid system, and L is the total number of inter-regional tie lines. The upper-layer action can be characterized by formula (3):

The lower-layer state can be characterized by formula (4):

wherein $L_z$ is the total number of tie lines connected to region z in the cross-regional interconnected power grid system, and

is the transmission power level of DC tie line $l_z$ connected to region z. The lower-layer action can be characterized by formula (5):

Step 8, determining the total cost incurred by the upper and lower layers of the cross-regional interconnected power grid system in decision period k

and determining the optimization objectives of the upper and lower layers of the system;

Step 9, pre-learning the source task with a hierarchical Q-learning algorithm, specifically as follows:

Step 9.1, initializing the upper-layer knowledge matrix $Q_{up}$ of the cross-regional interconnected power grid system and the lower-layer knowledge matrix of each region

Step 9.2, initializing the system model parameters and the learning parameters;

Step 9.3, initializing the current learning step m = 0 and the current decision period k = 0;

step 9.4, determining the upper-layer state of the system at the current decision time

Step 9.5, the upper layer selects, according to $Q_{up}$ and a greedy strategy, its action at decision time $t_k$

Step 9.6, lower-layer region z receives the upper-layer action

and determines its state at decision time $t_k$

Step 9.7, lower-layer region z selects, according to

and a greedy strategy, its action at decision time $t_k$

Step 9.8, calculating the cost of lower-layer region z in decision period k

and updating the knowledge matrix of lower-layer region z

Step 9.9, feeding the cost of each lower-layer region in decision period k

back to the upper layer, calculating the total cost of the upper layer

and updating the upper-layer knowledge matrix $Q_{up}$;

Step 9.10, setting k := k + 1; if k is less than the total number K of decision periods, returning to Step 9.4; otherwise, setting k = 0;

Step 9.11, setting m := m + 1; if m is less than the total number M of learning steps, updating the learning rate and returning to Step 9.4; otherwise, ending the procedure and storing the source-load (generation-side and load-side) power forecast information of the source task, together with the optimal knowledge matrices obtained in Step 9, in the knowledge base as experience knowledge;

Step 10, rapidly optimizing the target task with a knowledge migration-based hierarchical Q-learning algorithm:

Step 10.1, defining the net load forecast power as the similarity element, calculating the similarity distance between the net load forecast power of the target task and that of each source task in every region, and measuring the similarity between the target task and the source task by this similarity distance;

Step 10.2, selecting the source task with the minimum similarity distance to the target task for migration and initializing each knowledge matrix of the target task, so that the upper-layer knowledge matrix $Q_{up}$ of the target task and the knowledge matrix of lower-layer region z

can be characterized by formulas (6) and (7):

wherein

are, respectively, the optimal upper-layer knowledge matrix and the optimal knowledge matrix of lower-layer region z of the source task with the minimum similarity distance;

Step 10.3, initializing the model parameters and learning parameters of the target task of cross-regional interconnected power grid scheduling optimization and rapidly optimizing the target task; the procedure is the same as Steps 9.3 to 9.11 and is not repeated here.
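The two-layer learning loop of Steps 9 and 10 can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the patented implementation: the update rule is the standard one-step Q-learning rule for cost minimization, the "greedy strategy" is taken to be epsilon-greedy, the learning-rate decay is arbitrary, and `toy_cost`, the state encodings, and all numeric values are hypothetical stand-ins for the missing formulas.

```python
import numpy as np

def eps_greedy(q_row, eps, rng):
    """Return the lowest-cost action with probability 1 - eps, else a random one."""
    if rng.random() < eps:
        return int(rng.integers(len(q_row)))
    return int(np.argmin(q_row))

def q_update(Q, s, a, cost, s_next, alpha, gamma):
    """One-step Q-learning update for a cost-minimizing agent (assumed rule)."""
    target = cost + gamma * Q[s_next].min()
    Q[s, a] += alpha * (target - Q[s, a])

def toy_cost(z, k, a_up, a_low):
    """Hypothetical stand-in for the cost of lower-layer region z in period k."""
    demand = (k + z) % 3  # made-up demand level, not from the patent
    return float(abs(demand + 1 - a_up - a_low))

def learn(K=4, Z=2, n_up=3, n_low=3, M=300, Q_up=None, Q_low=None, seed=0):
    """Toy version of Steps 9.1-9.11; pass Q_up/Q_low to warm-start (Step 10.2)."""
    rng = np.random.default_rng(seed)
    # Step 9.1: zero tables unless migrated matrices are supplied.
    Q_up = np.zeros((K, n_up)) if Q_up is None else Q_up.copy()
    if Q_low is None:
        Q_low = [np.zeros((K * n_up, n_low)) for _ in range(Z)]
    else:
        Q_low = [q.copy() for q in Q_low]
    gamma, eps = 0.9, 0.2                      # Step 9.2 (assumed values)
    daily_costs = []
    a_up = eps_greedy(Q_up[0], eps, rng)
    for m in range(M):                         # Steps 9.3 and 9.11
        alpha = max(0.05, 0.5 * 0.99 ** m)     # assumed learning-rate decay
        day_cost = 0.0
        for k in range(K):                     # Steps 9.4 and 9.10
            k_next = (k + 1) % K
            # Draw the next upper action early so lower next states are known.
            a_up_next = eps_greedy(Q_up[k_next], eps, rng)
            total = 0.0
            for z in range(Z):
                s_low = k * n_up + a_up                          # Step 9.6
                a_low = eps_greedy(Q_low[z][s_low], eps, rng)    # Step 9.7
                c = toy_cost(z, k, a_up, a_low)                  # Step 9.8
                s_low_next = k_next * n_up + a_up_next
                q_update(Q_low[z], s_low, a_low, c, s_low_next, alpha, gamma)
                total += c
            q_update(Q_up, k, a_up, total, k_next, alpha, gamma)  # Step 9.9
            day_cost += total
            a_up = a_up_next
        daily_costs.append(day_cost)
    return Q_up, Q_low, daily_costs

# Pre-learn a source task, then warm-start a target task from its matrices.
src_up, src_low, _ = learn(M=300, seed=0)
tgt_up, tgt_low, tgt_costs = learn(M=50, Q_up=src_up, Q_low=src_low, seed=1)
```

In this sketch the migration of Step 10.2 is simply a copy of the source matrices into the target task before learning starts. Whether the warm start shortens learning on a real dispatch model depends on how similar the two tasks are, which is what the similarity distance of Step 10.1 is meant to measure.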

The method for rapidly optimizing the dynamic scheduling of the cross-regional interconnected power grid based on knowledge migration is characterized in that in the step 10.1, the similarity distance between the source task and the target task is calculated according to the following steps:

Step 1, using the Euclidean distance to reflect the differences between the specific values of the time series:

In power grid operation, to account for the uncertainty of regional load and the intermittency and randomness of new-energy generation, the concept of net load is introduced: intermittent new-energy generation is treated as negative load, so the net load of a region is its total load minus its total new-energy generation output. In the source task psi and the target task phi, the net load forecast power of region z at time t can be expressed as formulas (8) and (9), respectively:

wherein

are the net load forecast powers of region z at time t in the source task psi and the target task phi, respectively;

are, respectively, the load demand forecast power, the wind power forecast power and the photovoltaic generation forecast power of region z at time t in the source task psi;

are, respectively, the load demand forecast power, the wind power forecast power and the photovoltaic generation forecast power of region z at time t in the target task phi;

Sampling

with an assumed time series length of $N_s$, so that the sampling interval is $\Delta t = T/N_s$, yields two time series

characterized by formulas (10) and (11), respectively:

The Euclidean distance between the time series

and

can be characterized by formula (12):

Step 2, using the dynamic time warping (DTW) distance to reflect the trend and fluctuation of the time series:

Sampling, at the sampling interval $\Delta t$, the derivative functions of the net load power of region z in the source task psi and the target task phi

yields two time series

which can be characterized by formulas (13) and (14), respectively:

An $N_s \times N_s$ matrix Γ is constructed, whose elements are characterized by formula (15):

A set of adjacent elements in matrix Γ is called a warping path, denoted $H = \{h_1, \dots, h_s, \dots, h_m\}$, where m is the total number of elements in the path and element $h_s$ is the coordinate of the s-th point on the path. The objective of the dynamic time warping algorithm is to find an optimal warping path such that, for the sequences

and

the total warping cost is minimized, which can be characterized by formula (16):

wherein

is the minimum total warping cost, that is, for the time series

and

the dynamic time warping distance between them;

Step 3, calculating the similarity distance between the target task and the source task from the Euclidean distance and the dynamic time warping distance of their net load forecast powers in each region; the similarity distance

can be characterized by formula (17):

wherein $\lambda_e$ and $\lambda_d$ are the weighting coefficients of the Euclidean distance and the dynamic time warping distance, respectively.
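The three steps above can be sketched as follows. Because formulas (8) to (17) are not reproduced in this text, several details are assumptions: the DTW local cost is taken as the absolute difference, the derivative series is approximated with finite differences, and the per-region weighted distances are simply summed. The function names and the example data are hypothetical.

```python
import numpy as np

def net_load(load, wind, pv):
    """Net load forecast: total load minus wind and photovoltaic forecasts."""
    return (np.asarray(load, float) - np.asarray(wind, float)
            - np.asarray(pv, float))

def euclidean_distance(x, y):
    """Euclidean distance between two equally sampled series."""
    return float(np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2)))

def dtw_distance(x, y):
    """Classic dynamic time warping with |x_i - y_j| as the local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def similarity_distance(src, tgt, dt, lam_e=0.5, lam_d=0.5):
    """src, tgt: arrays of shape (Z, N_s) of net load forecasts per region."""
    total = 0.0
    for x, y in zip(src, tgt):
        d_e = euclidean_distance(x, y)
        # Finite-difference stand-in for sampling the derivative function.
        d_d = dtw_distance(np.gradient(x, dt), np.gradient(y, dt))
        total += lam_e * d_e + lam_d * d_d  # assumed aggregation over regions
    return total

# Hypothetical single-region example with hourly samples over one day.
t = np.arange(24.0)
target = np.array([100 + 20 * np.sin(2 * np.pi * t / 24)])
knowledge_base = {
    "near": target + 1.0,  # same shape, small constant offset
    "far": np.array([100 + 20 * np.cos(2 * np.pi * t / 24)]),
}
distances = {name: similarity_distance(src, target, dt=1.0)
             for name, src in knowledge_base.items()}
best = min(distances, key=distances.get)  # source task chosen for migration
```

In this example the offset copy is selected, because its derivative series matches the target's and only the Euclidean term contributes. The weights `lam_e` and `lam_d` play the role of the weighting coefficients in formula (17); the default of 0.5 each is arbitrary.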

By applying a machine learning algorithm to power dispatch optimization, the invention provides an intelligent solution to the cross-regional interconnected power grid scheduling problem and supports economic and environmentally friendly operation of the power grid. Compared with the prior art, the invention has the following beneficial effects:

1. For the cross-regional interconnected power grid scheduling problem, the invention takes the randomness of both the generation side and the load side into account, uses flexible load as a schedulable resource for collaborative optimization, and solves for the strategy with a Q-learning algorithm;

2. The invention adopts a hierarchical Q-learning algorithm, which reduces the scale of the knowledge matrices and mitigates the curse of dimensionality (the "dimension disaster") to a certain extent;

3. The invention adopts a knowledge migration mechanism that exploits past learning experience to accelerate the learning and optimization of the target task, which speeds up convergence of the algorithm and reduces the learning cost.

Drawings

Fig. 1 is a schematic diagram of a cross-regional interconnected power grid system architecture according to the present invention;

fig. 2 is an algorithm flowchart for solving the problem of dynamic scheduling of the cross-regional interconnected power grid according to the present invention.

Detailed Description

The method for optimizing the dynamic scheduling of the cross-regional interconnected power grid in this embodiment is applied to the cross-regional interconnected power grid system shown in Fig. 1, which comprises conventional generator sets, photovoltaic generator sets, wind turbine sets, rigid loads and flexible loads in each region, together with the DC tie lines connecting the regions. At each decision time, the dispatching agency obtains the output and power demand of each unit of the cross-regional interconnected power grid through detection and communication equipment and, following the strategy obtained by the dynamic scheduling optimization method, selects the optimal action: adjusting the output power of the conventional generator sets, adjusting the transmission power of the DC tie lines, and curtailing flexible load demand, so as to improve the operating benefit of the cross-regional interconnected power grid system.

Referring to fig. 2, the method for optimizing the dynamic scheduling of the cross-regional interconnected power grid in this embodiment is performed according to the following steps:

The predicted value of the photovoltaic output power is

The predicted value of the total power demand of the load is

Step 2, the output deviation power of the actual wind power output in region z

at time t relative to the predicted value

is discretized into

for a total of

state levels; the state level of the wind power output deviation power of region z at time t is

Step 3, the output deviation power of the actual photovoltaic generation output in region z

at time t relative to the predicted value

is discretized into

for a total of

state levels; the state level of the photovoltaic output deviation power of region z at time t is

Step 4, before direct load control (DLC) is implemented, the deviation power of the actual load demand power $P_l^{z,t}$ in region z at time t relative to the predicted value

is discretized into

for a total of

state levels; the state level of the load demand deviation power of region z at time t is

When DLC is implemented, the DLC load demand adjustment of region z in decision period k

is discretized into

for a total of

state levels; the load curtailment level of region z in decision period k is

After DLC is implemented, the load demand power of region z can be characterized by formula (1):

wherein

is the DLC load curtailment power of region z in decision period k, and

is the DLC load rebound power of region z in decision period k;

Step 5, the power change interval of the class I thermal generator set in region z, within its ramping constraint limits, is discretized into

for a total of

state levels; the power adjustment level of the class I thermal generator set at time t is

The allowable output power range of the class I thermal generator set is discretized into

for a total of

state levels; the generated power level of the class I thermal generator set at time t is

Similarly, the power change interval of the class II thermal generator set in region z, within its ramping constraint limits, is discretized into

for a total of

state levels; the power adjustment level of the class II thermal generator set at time t is

The allowable output power range of the class II thermal generator set is discretized into

for a total of

state levels; the generated power level of the class II thermal generator set at time t is

Step 6, the power change interval of an inter-regional tie line of the cross-regional interconnected power grid within one period is discretized into

for a total of

state levels; the power adjustment level of the tie line at time t is

The allowable transmission power range of the DC tie line is discretized into

for a total of

state levels; the transmission power level of tie line l at time t is

wherein

is the state information of region z at decision time $t_k$,

the lower layer state can be characterized by formula (4):

and determining the optimization objectives of the upper and lower layers of the system:

the total cost incurred by lower-layer region z of the system in decision period k

includes the DLC load compensation cost

the operating cost of the thermal generator sets

the wind and solar curtailment cost

the peak-valley difference cost

and the power balance constraint cost

The optimization objective of lower-layer region z is to find, given the tie-line transmission plan, an optimal strategy

that minimizes the daily operating cost of region z;

the cost incurred by the upper layer of the system in decision period k

is the sum of the costs incurred by the lower-layer regions; the optimization objective of the upper layer is to find an optimal strategy

that minimizes the daily operating cost of the upper layer of the system;

Step 9.2, initializing the system model parameters and the learning parameters;

Step 9.6, lower-layer region z receives the upper-layer action

and determines its state at decision time $t_k$

Step 9.7, lower-layer region z selects, according to

and a greedy strategy, its action at decision time $t_k$

and the knowledge matrix of lower-layer region z is updated at the same time

Step 9.9, the cost of each lower-layer region in decision period k

is fed back to the upper layer, and the upper-layer knowledge matrix $Q_{up}$ is updated;

can be characterized by formulas (6) and (7):

wherein

In a specific implementation, the similarity distance in Step 10.1 is calculated according to the following steps:

wherein

are, respectively, the load demand forecast power, the wind power forecast power and the photovoltaic generation forecast power of region z at time t in the source task psi;

Sampling

yields two time series characterized by formulas (10) and (11), respectively:

The Euclidean distance between the time series

and

can be characterized by formula (12):

sampling yields two time series

which can be characterized by formulas (13) and (14), respectively:

and

the total warping cost is minimized, which can be characterized by formula (16):

wherein

is the minimum total warping cost, that is, for the time series

and

the dynamic time warping distance between them;

can be characterized by formula (17):

The method can effectively cope with the randomness of new-energy generation and load demand in the cross-regional interconnected power grid and ensure its safe and economic operation. The hierarchical learning and knowledge migration mechanisms mitigate the curse of dimensionality (the "dimension disaster") of reinforcement learning to a certain extent, accelerate convergence of the algorithm, and speed up the solution of the scheduling strategy.
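Steps 2 to 6 of this embodiment repeatedly discretize a continuous power quantity into a finite number of state levels. A minimal sketch of one way to do this is given below, assuming equal-width levels over a known range; the text does not say how the level boundaries are chosen, and the helper name `to_level` and the numeric ranges are hypothetical.

```python
import numpy as np

def to_level(value, lower, upper, n_levels):
    """Map a continuous value in [lower, upper] to an integer level 0..n_levels-1.

    Values outside the range are clipped to the nearest boundary level.
    Equal-width levels are an assumption made for this sketch.
    """
    if n_levels < 2:
        raise ValueError("n_levels must be at least 2")
    edges = np.linspace(lower, upper, n_levels + 1)
    clipped = min(max(value, lower), upper)
    # Use interior edges only, so the upper boundary falls in the last level.
    return int(np.digitize(clipped, edges[1:-1]))

# Hypothetical example: a wind deviation between -10 MW and 10 MW, 4 levels.
levels = [to_level(v, -10.0, 10.0, 4) for v in (-10.0, -5.0, -0.1, 0.0, 10.0, 12.0)]
print(levels)
```

The same helper could be applied to deviation powers, power adjustment ranges, and transmission power ranges alike, each with its own bounds and number of levels.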

Claims

1. A knowledge migration-based method for rapidly optimizing the dynamic scheduling of a cross-regional interconnected power grid, characterized by comprising the following steps:

The predicted value of the photovoltaic output power is

The predicted value of the total power demand of the load is

Step 2, determining, for the actual wind power output in region z

at time t relative to the predicted value

the state level of the output deviation power

at time t relative to the predicted value

the state level of the output deviation power

the state level of the load demand deviation power

and the load curtailment level within decision period k when DLC is implemented

can be characterized by formula (1):

wherein

is the DLC load curtailment power of region z in decision period k, and

is the DLC load rebound power of region z in decision period k;

and the real-time generated power levels

and obtaining the real-time generated power of the class III thermal generator set from the regional power balance equation

and the transmission power level

wherein

is the state information of region z at decision time $t_k$, and

is the power level of the DC tie line at decision time $t_k$; Z is the total number of regions in the cross-regional interconnected power grid system, and L is the total number of inter-regional tie lines; the upper-layer action can be characterized by formula (3):

the lower-layer state can be characterized by formula (4):

is the transmission power level of DC tie line $l_z$ connected to region z; the lower-layer action can be characterized by formula (5):

Step 9.2, initializing the system model parameters and the learning parameters;

Step 9.6, lower-layer region z receives the upper-layer action

and determines its state at decision time $t_k$

Step 9.7, lower-layer region z selects, according to

and a greedy strategy, its action at decision time $t_k$

and the knowledge matrix of lower-layer region z is updated at the same time

Step 9.9, the cost of each lower-layer region in decision period k

is fed back to the upper layer, the total cost of the upper layer is calculated

and the upper-layer knowledge matrix $Q_{up}$ is updated;

can be characterized by formulas (6) and (7):

wherein

are, respectively, the optimal upper-layer knowledge matrix and the optimal knowledge matrix of lower-layer region z of the source task with the minimum similarity distance;

Step 10.3, initializing the model parameters and learning parameters of the target task of cross-regional interconnected power grid scheduling optimization and rapidly optimizing the target task, the procedure being the same as Steps 9.3 to 9.11.

2. The knowledge migration-based method for rapidly optimizing the dynamic scheduling of a cross-regional interconnected power grid according to claim 1, characterized in that:

the similarity distance in Step 10.1 is calculated as follows:

wherein

sampling

yields two time series characterized by formulas (10) and (11), respectively:

the Euclidean distance between the time series

and

can be characterized by formula (12):

sampling yields two time series

which can be characterized by formulas (13) and (14), respectively:

a set of adjacent elements in matrix Γ is called a warping path, denoted $H = \{h_1, \dots, h_s, \dots, h_m\}$, where m is the total number of elements in the path and element $h_s$ is the coordinate of the s-th point on the path; the objective of the dynamic time warping algorithm is to find an optimal warping path such that, for the sequences

and

the total warping cost is minimized, which can be characterized by formula (16):

wherein

is the minimum total warping cost, that is, for the time series

and

the dynamic time warping distance between them;

can be characterized by formula (17):