CN113541192A

CN113541192A - Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Info

Publication number: CN113541192A
Application number: CN202110850832.2A
Authority: CN
Inventors: 李辉; 谭宏涛; 周芷汀; 郑杰; 彭瀚峰; 王嘉瑶; 青和; 向学位; 姚然; 全瑞坤
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2021-07-27
Filing date: 2021-07-27
Publication date: 2021-10-22

Abstract

The invention relates to an offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning, and belongs to the technical field of wind power generation. The method comprises the following steps: s1: establishing an offshore wind plant reactive power-voltage coordination control model; s2: establishing a Markov decision process model based on a wind power plant reactive-voltage control strategy, and defining a state action and a reward function of the system; s3: training a reactive power-voltage coordination control model based on a depth certainty gradient strategy and the random output data of the wind turbine generator set to realize the mapping from the state of the wind turbine generator set to a reactive power instruction; s4: an offshore wind power plant reactive-voltage coordination control strategy online deployment process. Compared with the traditional method, the method does not depend on accurate modeling of the system, can effectively reduce the voltage deviation of the wind power plant, and has better model solving precision and instant response speed.

Description

Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Technical Field

The invention belongs to the technical field of wind power generation, and relates to an offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning.

Background

At present, an offshore wind farm is mainly based on an alternating current transmission grid-connected mode, a large amount of charging power is generated due to the capacitance effect of an alternating current submarine cable, the voltage at the tail end of the cable is raised, and the out-of-limit risk of the voltage of a wind turbine generator is higher when the wind turbine generator is closer to the tail end of a feeder line. In addition, the reactive compensation equipment is difficult to install in severe offshore environment, so that the reactive-voltage regulation control function of the wind turbine generator is more prominent, and the wind turbine generator has higher economy. The single unit has limited regulating capacity, and geographical positions and operating states of the units are different, so that how to comprehensively plan the operating states of all the units to realize reactive-voltage coordination control is a research hotspot.

Most of the existing methods depend on the structural model parameters of the wind power plant, repeated solution is needed for each new scene, and the instant response capability needs to be further improved. The artificial intelligence method does not depend on detailed modeling of the system and constructs the mapping relation between the operation state of the wind power plant and the decision action through operation data information, and the instant decision response capability of the wind power plant capability management platform is greatly improved.

Based on the above, the offshore wind power plant reactive power-voltage coordination control method based on deep reinforcement learning is provided in consideration of the characteristics of artificial intelligence reinforcement learning and the characteristics of the wind power plant reactive power-voltage coordination control model.

Disclosure of Invention

In view of the above, the present invention provides a method for offshore wind farm reactive-voltage coordination control based on deep reinforcement learning. The method comprises the steps of firstly establishing a reactive power-voltage coordination control model and a Markov decision process model of the offshore wind power plant, then training the reactive power-voltage coordination control model based on a depth certainty gradient strategy and combination of random output data of a wind generation set, realizing mapping from the state of the wind generation set to a reactive power instruction, and entering an online deployment stage when the model training is finished.

In order to achieve the purpose, the invention provides the following technical scheme:

the offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning comprises the following steps:

s1: establishing an offshore wind plant reactive power-voltage coordination control model;

s2: establishing a Markov decision process model based on a wind power plant reactive-voltage coordination control strategy, and defining the state, action and reward function of the system;

s3: training a reactive power-voltage coordination control model based on a depth certainty gradient strategy and the random output data of the wind turbine generator set to realize the mapping from the state of the wind turbine generator set to a reactive power instruction;

s4: an offshore wind power plant reactive-voltage coordination control strategy online deployment process.

Optionally, the S1 specifically includes the following steps:

s11: constructing a mathematical expression objective function of a wind power plant reactive-voltage coordination control model:

in order to minimize the voltage deviation DeltaV of all nodes of the system_iWhile taking system loss P into account_loss：

In the formula, N and N respectively represent the number of all nodes and the number of wind driven generators in a wind power plant grid-connected system; subscript i denotes the ith node;

for maximum amplitude reduction of the system voltage deviation:

in the formula, V_i ^refThe rated voltage of the node is taken as 1p.u. for the given value of the voltage of the ith node;

the system losses are:

in the formula, G_ijAnd B_ijThe real part and the imaginary part of the admittance between the nodes i and j respectively; theta_ij＝θ_i-θ_jIs the phase angle difference between nodes i and j;

s12: the equality constraint of the mathematical expression of the wind power plant reactive-voltage coordination control model is a node power balance equation of the system, namely a power flow constraint equation:

s13: the wind power plant reactive-voltage coordination control model is mathematically expressed by inequality constraint:

constraint equation of voltage limit values of all nodes of wind power plant:

in the formula, V_i ^minAnd V_i ^maxThe upper and lower limit values of the ith node voltage are respectively;

the output reactive power range constraint equation of the wind turbine generator is as follows:

in the formula, Q_k ^minAnd Q_k ^maxRespectively outputting reactive upper and lower limit values for the kth wind turbine generator set;

the output current constraint equation of the wind power converter is as follows:

in the formula I_k ^maxFor the current upper limit of the kth wind turbine generator set converter, all the units adopt the same converter, and the current upper limit constraints are the same.

Optionally, the S2 specifically includes the following steps:

s21: at time t, the environment is in state S_t＝[P_i,t，Q_i,t，V_i,t，P_loss,t]The agent makes action A according to the state information_t＝[Q_1,t…Q_n,t]；

Wherein, the state space S refers to the state set of the environment, the invention selects the information of active power, idle power, voltage, loss and the like output by the wind turbine as the state input, and the state space S refers to the state set of the environment_t＝[P_i,t，Q_i,t，V_i,t，P_loss,t]In, P_i,t、Q_i,t、V_i,t、P_loss,tRespectively representing active and reactive power output, voltage and system loss of the ith node at the moment t; the action space a refers to an action response set made by the agent, namely a given reactive value of each wind turbine is determined according to the state of the wind farm, A_t＝[Q_1,t…Q_n,t]；

S22: after the environment executes the action, the next state S is entered_t+1＝[P_i,t+1，Q_i,t+1，V_i,t+1，P_loss,t]Setting the reward function as the first target function with negative number in the step S11, taking the five constraint equations in the steps S12 and S13 as the formulas (1) - (5), and setting the punishment machineCalculating the reward obtained:

in the formula, rho is a penalty factor, and the value is far larger than the target function Obj.

Optionally, the S3 specifically includes the following steps:

s31: a Markov decision process formed by wind power plant reactive-voltage coordination control is a continuous space optimization problem, and a good solving effect is obtained based on a depth certainty gradient strategy algorithm (DDPG); consists of an actor-critic Deep Neural Network (DNN);

wherein the operator network is formed by a parameter theta^πComposition, as a policy function, implementing an environmental state s_tTo action a_tThe mapping of the wind power station realizes a strategy function, namely a reactive action instruction is made according to the state of the wind power station; criticc, from parameter θ^QComposition as a function of action-value Q(s)_t,a_t) Evaluating the action of the actor;

s32: the specific process of the reactive power-voltage coordination control strategy offline training needs system initialization at first: randomly generating data in an active power range output by the wind turbine generator; initializing operator-critical network parameters; initializing a sequence buffer R and motion random noise;

s33: experience generation: the intelligent agent is based on the environmental state s of the wind power plant_tMaking a reactive action command a_t：

In the formula, N is random noise introduced to increase randomness of motion output and further enlarge learning accuracy;

entering a next state s after the environment performs the action_t+1And earn a prize r_t：

Same as the formula in S22;

s34: and (3) recovering experience: experience with last step agent interaction with environment(s)_t a_t r_ts_t+1) Storing the data format of the sequence R into a buffer area to form a sequence R;

s35: and (3) policy updating: randomly screening small-batch data M from the sequence buffer R, and updating the operator-critical network parameters by using a gradient descent strategy;

s36: and (3) finishing training: with the continuous interaction between the intelligent agent and the environment, the operator-critical network parameters are gradually converged, the actions which can be made by the intelligent agent are gradually stable, and when the acquired reward tends to be stable and does not obviously change, the intelligent agent is considered to be capable of realizing the mapping of the reactive power-voltage coordination control strategy, and the online deployment stage can be entered after the model training is finished.

Optionally, the S4 specifically includes the following steps:

s41: collecting real-time measured data of a wind power plant;

s42: and determining a reactive-voltage control strategy of the current time interval according to the real-time measured data by utilizing the final completion result of the off-line training.

The invention has the beneficial effects that: the method can effectively reduce the voltage deviation of the wind power plant, does not depend on accurate modeling of a system, and has better model solving precision and instant response speed.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.

Drawings

For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a block diagram of the implementation process of offshore wind farm reactive-voltage coordination control based on deep reinforcement learning according to the present invention;

FIG. 2 is a diagram of an intelligent voltage regulation model structure of a wind farm according to the present invention;

FIG. 3 shows the result of the interaction between the agent and the environment to obtain the reward curve after the reinforcement learning model is trained according to the present invention;

FIG. 4 is a graph of active power output data from a wind farm test when all wind turbines are operating in the maximum power mode according to the present invention;

FIG. 5 is a comparison graph of PCC voltages of the generator sets under different control methods of the present invention;

FIG. 6 is a comparison chart of VPI indexes of the generator set under different control methods of the invention;

FIG. 7 is a diagram of the reactive output waveform of the unit under the sensitivity analysis method of the present invention;

FIG. 8 is a reactive output waveform of the unit under the control method of the present invention;

FIG. 9 is a comparison graph of the optimized target values of the generator set under different control methods of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.

Wherein the showings are for the purpose of illustrating the invention only and not for the purpose of limiting the same, and in which there is shown by way of illustration only and not in the drawings in which there is no intention to limit the invention thereto; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.

In the embodiment, a 5MW high-speed permanent magnet wind turbine generator set is selected for modeling, an Inter I7-9700CPU is adopted for training, experiments are carried out on the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given.

Fig. 1 shows a specific process of the method of the present invention, and the present invention aims to provide an intelligent reactive-voltage control method for an offshore wind farm. The method comprises the steps of firstly establishing a reactive power-voltage coordination control model and a Markov decision process model of the offshore wind power plant, then training the reactive power-voltage coordination control model based on a depth certainty gradient strategy and combination of random output data of a wind generation set, realizing mapping from the state of the wind generation set to a reactive power instruction, and entering an online deployment stage when the model training is finished, and specifically comprises the following steps:

s2: establishing a Markov decision process model based on a reactive power-voltage coordination control strategy, and defining the state, action and reward function of the system;

Further, in step S1, establishing an offshore wind farm reactive-voltage coordination control model specifically includes:

in order to minimize the voltage deviation DeltaV of all nodes of the system_iWhile taking system loss P into account_loss(per unit value in the present invention):

in the formula, N and N respectively represent the number of all nodes and the number of wind driven generators in a wind power plant grid-connected system; the index i indicates the ith node.

For maximum amplitude reduction of the system voltage deviation:

in the formula, V_i ^refGiven for the ith node voltage, the nominal voltage (1p.u.) of the node is taken.

The system losses are:

in the formula, G_ijAnd B_ijThe real part and the imaginary part of the admittance between the nodes i and j respectively; theta_ij＝θ_i-θ_jIs the phase angle difference between nodes i and j.

constraint equation of voltage limit values of all nodes of wind power plant:

in the formula, V_i ^minAnd V_i ^maxRespectively, the ith node voltage upper and lower limit values.

in the formula, Q_k ^minAnd Q_k ^maxAnd respectively outputting reactive upper and lower limit values for the kth wind turbine generator set.

in the formula I_k ^maxThe current upper limit of the current transformer of the kth wind turbine generator set is set (in the invention, all the generator sets adopt the same current transformer, and the current upper limit is restrained the same).

Further, step S2 is based on the markov decision process model of the coordination control strategy, which specifically includes:

s21: at time t, the environment is in state S_t＝[P_i,t，Q_i,t，V_i,t，P_loss,t]The agent makes action A according to the state information_t＝[Q_1,t…Q_n,t]。

Wherein, the state space S refers to the state set of the environment, the invention selects the information of active power, idle power, voltage, loss and the like output by the wind turbine as the state input, and the state space S refers to the state set of the environment_t＝[P_i,t，Q_i,t，V_i,t，P_loss,t]In, P_i,t、Q_i,t、V_i,t、P_loss,tRespectively representing the active power, the reactive power, the voltage and the system loss of the ith node at the moment t. The action space a refers to an action response set made by the agent, namely a given reactive value of each wind turbine is determined according to the state of the wind farm, A_t＝[Q_1,t…Q_n,t]；

S22: after the environment executes the action, the next state S is entered_t+1＝[P_i,t+1，Q_i,t+1，V_i,t+1，P_loss,t]The reward function is set as the first objective function with negative number in step S11, and the five constraint equations in steps S12 and S13 are given as equations (1) - (5), and the penalty mechanism is set to calculate the reward obtained:

Further, step S3 is to train the reactive-voltage coordination control model based on the depth certainty gradient strategy in combination with the unit random output data, specifically:

s31: a Markov decision process formed by wind power plant reactive-voltage coordination control is a continuous space optimization problem, and a better solving effect can be obtained based on a depth deterministic gradient strategy algorithm (DDPG). The method consists of an actor-critic (actor-critic) Deep Neural Network (DNN).

Wherein the operator network (by parameter theta)^πComposition) as a policy function, implementing an environmental state s_tTo action a_tThe mapping of the wind power station realizes a strategy function, namely a reactive action instruction is made according to the state of the wind power station; criticc (by parameter θ)^QComposition) as a function of action-value (Q(s)_t,a_t) Make action evaluation on the actor.

S32: the specific process of the reactive power-voltage coordination control strategy offline training needs system initialization at first: randomly generating active output data (within the output active range of the wind turbine generator) of the wind turbine generator; initializing operator-critical network parameters; a sequence buffer R and motion random noise are initialized.

In the formula, N is random noise introduced to increase randomness of motion output and further increase learning accuracy.

The formula is the same as the formula in S22;

s34: and (3) recovering experience: experience with last step agent interaction with environment(s)_t a_t r_ts_t+1) The data format of (2) is stored in a buffer to form a sequence R.

S35: and (3) policy updating: randomly screening small-batch data M from the sequence buffer R, and updating the operator-critical network parameters by using a gradient descent strategy, wherein the parameter updating process is schematically shown in FIG. 2.

S37: according to the method for interacting the wind power plant environment and the intelligent agent, the reinforcement learning model is trained. The training period is 10000, each period comprises 100 steps, and the training is carried out by adopting an InterI 7-9700CPU, which takes 32 hours. Training results as shown in fig. 3, the environment is explored by the agent randomly acting during the initial stage of training, and the reward obtained during this stage is low. After 500 training periods, the intelligent agent obtains the reward according to the learning experience and greatly improves, after about 6000 training periods, the intelligent agent obtains the reward gradually and stably, and the model training is converged.

Further, step S4 is to perform an online deployment process of the offshore wind farm reactive-voltage coordination control strategy, specifically:

s41: and collecting real-time measured data of the wind power plant.

In order to verify the effectiveness of the proposed method, three different control methods are used for comparison verification:

1) no additional control;

2) the wind power plant reactive-voltage coordination control method based on deep reinforcement learning has the advantages that the optimization targets are voltage deviation and system loss, and response is realized by deploying an online reinforcement learning model;

3) the reactive-voltage coordination control optimization method based on sensitivity linearization has the same optimization target as that of the method, and an MATLAB is adopted to call a Yalmip solver to solve.

The results are shown in fig. 4 to 9, and the test output data of the wind power plant is shown in fig. 4 when all the wind turbines are set to operate in the maximum power mode. In the unit terminal voltage waveform without reactive-voltage control, the terminal voltage range of the unit is [1.012,1.015] p.u., the average value is 1.0136p.u., it can be seen that the voltage of the feeder line head end unit exceeds the rated value, and the closer to the tail row unit terminal voltage, the higher the voltage out-of-limit risk exists. The terminal voltage range of the generator set under the sensitivity method is [1.0005,1.003] p.u., and the average value is 1.0017 p.u.; the proposed method is [0.9987,1.001] p.u., with an average value of 1.0p.u., a 1.3% reduction in the average deviation compared to the uncontrolled voltage.

For example, as shown in fig. 5, the PCC voltage performance under the two control methods is similar, and always remains around the rated value of 1p.u, which is 1.2% lower than the voltage deviation under the non-control condition.

To further analyze the performance of the methods presented herein in controlling voltage deviation, a voltage deviation index coefficient (VPI) is introduced for analysis and comparison, such as VPI (t | | | V (t) -V^ref(t)||₂As shown. As shown in fig. 6, the voltage VPI indexes under different control methods have an average value of 14.5e-3 in the case of no reactive-voltage control, and have average values of 2.3e-3 and 1e-3 in the case of sensitivity analysis and the proposed reactive-voltage control, respectively, it can be seen that the voltage deviation optimization performance of the proposed method is superior to that of the sensitivity analysis.

For the comparison of system loss under different control methods, under the reactive-voltage control, the output reactive power of the unit is increased, and the loss of the system is slightly increased. The two optimization methods consider system loss, so the influence on the system loss is small, and the system loss performance under the two optimization methods has no great difference.

The reactive output waveform of the unit is as shown in fig. 7 and 8, because the voltage in the wind power plant exceeds the rated value, the unit needs to absorb reactive power, and because the voltage deviation of the unit is larger closer to the rear row unit, the output reactive power of the rear row unit is higher than that of the front row unit. The average value of the reactive output of the unit under the control of the sensitivity analysis method is-0.3535 MVar, and the mentioned method is-0.4154 MVar higher than the former, so the voltage deviation under the mentioned method is smaller, but the loss is slightly higher than that of the sensitivity analysis method.

To further compare the advantages of the methods presented herein, the objective function values under different control methods are compared, as shown in fig. 9, the objective function values of the presented methods are all smaller than those of the sensitivity analysis method in the whole process, which indicates that the optimization of the methods presented herein achieves better results.

The total time for solving is 152s by adopting a sensitivity linear method, the total time for solving the model of the method provided by the invention is 2.9s, the single solving time is only 0.006s, and the real-time performance is better. Compared with a traditional solver, the method has a better model optimization result. In conclusion, the method provided by the invention can simultaneously improve the solving precision and speed of the offshore wind plant reactive-voltage optimization model.

Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims

1. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning is characterized by comprising the following steps of: the method comprises the following steps:

2. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 1, characterized in that: the S1 specifically includes the following steps:

for maximum amplitude reduction of the system voltage deviation:

the system losses are:

constraint equation of voltage limit values of all nodes of wind power plant:

3. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 2, characterized in that: the S2 specifically includes the following steps:

S22: after the environment executes the action, the next state S is entered_t+1＝[P_i,t+1，Q_i,t+1，V_i,t+1，P_loss,t]The reward function is set as the first objective function in step S11 taking negative number, and the five constraint equations in steps S12 and S13 are taken as equations (1) to (5), and the penalty mechanism is set to calculate the obtained reward:

4. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 3, characterized in that: the S3 specifically includes the following steps:

a_t＝π(s_t∣θ^π)+N

Same as the formula in S22;

5. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 4, characterized in that: the S4 specifically includes the following steps:

s41: collecting real-time measured data of a wind power plant;