CN113541192A - Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning - Google Patents

Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Info

Publication number
CN113541192A
CN113541192A (application CN202110850832.2A)
Authority
CN
China
Prior art keywords
reactive
voltage
coordination control
wind
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110850832.2A
Other languages
Chinese (zh)
Inventor
李辉
谭宏涛
周芷汀
郑杰
彭瀚峰
王嘉瑶
青和
向学位
姚然
全瑞坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110850832.2A priority Critical patent/CN113541192A/en
Publication of CN113541192A publication Critical patent/CN113541192A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/381Dispersed generators
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/12Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load
    • H02J3/16Circuit arrangements for ac mains or ac distribution networks for adjusting voltage in ac networks by changing a characteristic of the network load by adjustment of reactive power
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • H02J3/46Controlling of the sharing of output between the generators, converters, or transformers
    • H02J3/466Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • H02J2300/28The renewable source being wind energy
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/40Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation wherein a plurality of decentralised, dispersed or local energy generation technologies are operated simultaneously
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00Energy generation through renewable energy sources
    • Y02E10/70Wind energy
    • Y02E10/76Power conversion electric or electronic aspects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/30Reactive power compensation

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Control Of Electric Generators (AREA)
  • Wind Motors (AREA)

Abstract

The invention relates to an offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning, and belongs to the technical field of wind power generation. The method comprises the following steps: S1: establishing an offshore wind farm reactive power-voltage coordination control model; S2: establishing a Markov decision process model based on the wind farm reactive power-voltage control strategy, and defining the state, action and reward function of the system; S3: training the reactive power-voltage coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, to realize the mapping from wind turbine states to reactive power instructions; S4: online deployment of the offshore wind farm reactive power-voltage coordination control strategy. Compared with traditional methods, the method does not depend on accurate modeling of the system, can effectively reduce the voltage deviation of the wind farm, and offers better solution accuracy and faster real-time response.

Description

Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of wind power generation, and relates to an offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning.
Background
At present, offshore wind farms are mainly grid-connected through AC transmission. The capacitive effect of AC submarine cables generates a large amount of charging power, which raises the voltage at the far end of the cable; the closer a wind turbine is to the end of a feeder, the higher its risk of voltage limit violation. In addition, reactive power compensation equipment is difficult to install and maintain in the harsh offshore environment, which makes the reactive power-voltage regulation capability of the wind turbines themselves more prominent and more economical. However, a single turbine has limited regulation capacity, and the turbines differ in geographical position and operating state, so how to coordinate the operating states of all turbines to realize reactive power-voltage coordination control is a research hotspot.
Most existing methods depend on the structural and parametric model of the wind farm and require a new optimization to be solved for every new scenario, so their real-time response capability needs further improvement. Artificial intelligence methods, by contrast, do not depend on detailed modeling of the system: they construct the mapping between the operating state of the wind farm and the decision action from operating data, which greatly improves the real-time decision response of the wind farm's energy management platform.
Based on the above, and considering the characteristics of reinforcement learning and of the wind farm reactive power-voltage coordination control model, an offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning is proposed.
Disclosure of Invention
In view of the above, the present invention provides an offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning. The method first establishes a reactive power-voltage coordination control model and a Markov decision process model of the offshore wind farm, then trains the coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, thereby realizing the mapping from wind turbine states to reactive power instructions; once training is finished, the model enters the online deployment stage.
In order to achieve the purpose, the invention provides the following technical scheme:
The offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning comprises the following steps:
s1: establishing an offshore wind farm reactive power-voltage coordination control model;
s2: establishing a Markov decision process model based on the wind farm reactive power-voltage coordination control strategy, and defining the state, action and reward function of the system;
s3: training the reactive power-voltage coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, to realize the mapping from wind turbine states to reactive power instructions;
s4: online deployment of the offshore wind farm reactive power-voltage coordination control strategy.
Optionally, the S1 specifically includes the following steps:
s11: constructing the objective function of the wind farm reactive power-voltage coordination control model:
the objective is to minimize the voltage deviation ΔV_i of all system nodes while taking the system loss P_loss into account:

$$\mathrm{Obj}=\min\Big(\sum_{i=1}^{N}\Delta V_i + P_{loss}\Big)$$

where N and n denote the number of nodes and the number of wind turbines in the grid-connected wind farm system, respectively, and the subscript i denotes the i-th node;
to reduce the system voltage deviation as much as possible:

$$\Delta V_i=\big|V_i-V_i^{ref}\big|$$

where $V_i^{ref}$ is the voltage reference of the i-th node, taken as the rated node voltage of 1 p.u.;
the system loss is:

$$P_{loss}=\sum_{(i,j)}G_{ij}\big(V_i^2+V_j^2-2V_iV_j\cos\theta_{ij}\big)$$

where the sum runs over the network branches, $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the admittance between nodes i and j, and $\theta_{ij}=\theta_i-\theta_j$ is the phase-angle difference between nodes i and j;
s12: the equality constraints of the wind farm reactive power-voltage coordination control model are the nodal power balance equations of the system, i.e., the power flow constraint equations:

$$P_i=V_i\sum_{j=1}^{N}V_j\big(G_{ij}\cos\theta_{ij}+B_{ij}\sin\theta_{ij}\big)\qquad(1)$$

$$Q_i=V_i\sum_{j=1}^{N}V_j\big(G_{ij}\sin\theta_{ij}-B_{ij}\cos\theta_{ij}\big)\qquad(2)$$
s13: the inequality constraints of the wind farm reactive power-voltage coordination control model are:
voltage limit constraint for every node of the wind farm:

$$V_i^{min}\le V_i\le V_i^{max}\qquad(3)$$

where $V_i^{min}$ and $V_i^{max}$ are the lower and upper voltage limits of the i-th node;
reactive power output range constraint of the wind turbines:

$$Q_k^{min}\le Q_k\le Q_k^{max}\qquad(4)$$

where $Q_k^{min}$ and $Q_k^{max}$ are the lower and upper reactive power output limits of the k-th wind turbine;
output current constraint of the wind power converters:

$$\frac{\sqrt{P_k^2+Q_k^2}}{V_k}\le I_k^{max}\qquad(5)$$

where $I_k^{max}$ is the current upper limit of the k-th wind turbine converter; all turbines use the same converter, so the current upper limit constraint is identical for all of them.
Optionally, the S2 specifically includes the following steps:
s21: at time t, the environment is in state $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$, and the agent takes action $A_t=[Q_{1,t},\dots,Q_{n,t}]$ according to this state information;
wherein the state space S is the set of environment states: the invention selects the active power, reactive power, voltage and loss information output by the wind turbines as the state input, and in $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$ the quantities $P_{i,t}$, $Q_{i,t}$, $V_{i,t}$ and $P_{loss,t}$ denote the active power output, reactive power output and voltage of the i-th node and the system loss at time t, respectively; the action space A is the set of action responses made by the agent, i.e., the reactive power setpoint of each wind turbine determined from the wind farm state, $A_t=[Q_{1,t},\dots,Q_{n,t}]$;
s22: after the environment executes the action, it enters the next state $S_{t+1}=[P_{i,t+1},Q_{i,t+1},V_{i,t+1},P_{loss,t+1}]$; the reward function is set as the negative of the objective function of step S11, and with the five constraints of steps S12 and S13 denoted as equations (1)-(5), a penalty mechanism is set to calculate the obtained reward:

$$r_t=\begin{cases}-\mathrm{Obj} & \text{if constraints (1)-(5) are satisfied}\\ -\mathrm{Obj}-\rho & \text{otherwise}\end{cases}$$

where ρ is a penalty factor whose value is much larger than the objective function Obj.
Optionally, the S3 specifically includes the following steps:
s31: the Markov decision process formed by wind farm reactive power-voltage coordination control is a continuous-space optimization problem, for which the deep deterministic policy gradient (DDPG) algorithm obtains a good solving effect; it consists of an actor-critic deep neural network (DNN);
the actor network, with parameters $\theta^\pi$, serves as the policy function and realizes the mapping from the environment state $s_t$ to the action $a_t$, i.e., it issues a reactive power action instruction according to the state of the wind farm; the critic network, with parameters $\theta^Q$, serves as the action-value function $Q(s_t,a_t)$ and evaluates the actor's action;
s32: the offline training of the reactive power-voltage coordination control strategy first requires system initialization: randomly generating active power data within the output range of the wind turbines; initializing the actor-critic network parameters; initializing the experience buffer R and the action exploration noise;
s33: experience generation: the agent issues a reactive power action command $a_t$ according to the wind farm environment state $s_t$:

$$a_t=\pi(s_t\mid\theta^\pi)+\mathcal{N}_t$$

where $\mathcal{N}_t$ is random noise introduced to increase the randomness of the action output and thereby improve exploration;
after the environment executes the action, it enters the next state $s_{t+1}$ and obtains the reward $r_t$, computed with the same formula as in S22;
s34: experience storage: the experience $(s_t, a_t, r_t, s_{t+1})$ from the last agent-environment interaction is stored in the buffer to form the sequence R;
s35: policy updating: a small batch of data M is randomly sampled from the buffer R, and the actor-critic network parameters are updated by gradient descent;
s36: training completion: as the agent keeps interacting with the environment, the actor-critic network parameters gradually converge and the agent's actions gradually stabilize; when the obtained reward becomes stable and no longer changes significantly, the agent is considered able to realize the mapping of the reactive power-voltage coordination control strategy, and once training is finished the model can enter the online deployment stage.
Optionally, the S4 specifically includes the following steps:
s41: collecting real-time measured data of the wind farm;
s42: using the model obtained from the completed offline training, determining the reactive power-voltage control strategy of the current time interval from the real-time measured data.
The invention has the beneficial effects that: the method can effectively reduce the voltage deviation of the wind farm, does not depend on accurate modeling of the system, and offers better solution accuracy and faster real-time response.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a block diagram of the implementation process of offshore wind farm reactive-voltage coordination control based on deep reinforcement learning according to the present invention;
FIG. 2 is a diagram of an intelligent voltage regulation model structure of a wind farm according to the present invention;
FIG. 3 shows the result of the interaction between the agent and the environment to obtain the reward curve after the reinforcement learning model is trained according to the present invention;
FIG. 4 is a graph of active power output data from a wind farm test when all wind turbines are operating in the maximum power mode according to the present invention;
FIG. 5 is a comparison graph of PCC voltages of the generator sets under different control methods of the present invention;
FIG. 6 is a comparison chart of VPI indexes of the generator set under different control methods of the invention;
FIG. 7 is a diagram of the reactive output waveform of the unit under the sensitivity analysis method of the present invention;
FIG. 8 is a reactive output waveform of the unit under the control method of the present invention;
FIG. 9 is a comparison graph of the optimized target values of the generator set under different control methods of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are provided for the purpose of illustrating the invention only and are not intended to limit it; to better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product; it will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if there is an orientation or positional relationship indicated by terms such as "upper", "lower", "left", "right", "front", "rear", etc., based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not an indication or suggestion that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes, and are not to be construed as limiting the present invention, and the specific meaning of the terms may be understood by those skilled in the art according to specific situations.
In the embodiment, a 5 MW high-speed permanent magnet wind turbine is selected for modeling, training is performed on an Intel i7-9700 CPU, experiments are carried out under the premise of the technical scheme of the invention, and a detailed implementation mode and a specific operation process are given.
Fig. 1 shows the specific process of the method of the present invention, which aims to provide an intelligent reactive power-voltage control method for an offshore wind farm. The method first establishes a reactive power-voltage coordination control model and a Markov decision process model of the offshore wind farm, then trains the coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, realizing the mapping from wind turbine states to reactive power instructions; once training is finished, the model enters the online deployment stage. The method specifically comprises the following steps:
s1: establishing an offshore wind farm reactive power-voltage coordination control model;
s2: establishing a Markov decision process model based on the wind farm reactive power-voltage coordination control strategy, and defining the state, action and reward function of the system;
s3: training the reactive power-voltage coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, to realize the mapping from wind turbine states to reactive power instructions;
s4: online deployment of the offshore wind farm reactive power-voltage coordination control strategy.
Further, in step S1, establishing an offshore wind farm reactive-voltage coordination control model specifically includes:
s11: constructing the objective function of the wind farm reactive power-voltage coordination control model:
the objective is to minimize the voltage deviation ΔV_i of all system nodes while taking the system loss P_loss into account (per unit values are used throughout the invention):

$$\mathrm{Obj}=\min\Big(\sum_{i=1}^{N}\Delta V_i + P_{loss}\Big)$$

where N and n denote the number of nodes and the number of wind turbines in the grid-connected wind farm system, respectively, and the subscript i denotes the i-th node.
To reduce the system voltage deviation as much as possible:

$$\Delta V_i=\big|V_i-V_i^{ref}\big|$$

where $V_i^{ref}$ is the voltage reference of the i-th node, taken as the rated node voltage (1 p.u.).
The system loss is:

$$P_{loss}=\sum_{(i,j)}G_{ij}\big(V_i^2+V_j^2-2V_iV_j\cos\theta_{ij}\big)$$

where the sum runs over the network branches, $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the admittance between nodes i and j, and $\theta_{ij}=\theta_i-\theta_j$ is the phase-angle difference between nodes i and j.
S12: the equality constraint of the mathematical expression of the wind power plant reactive-voltage coordination control model is a node power balance equation of the system, namely a power flow constraint equation:
Figure BDA0003182439860000064
Figure BDA0003182439860000065
s13: the inequality constraints of the wind farm reactive power-voltage coordination control model are:
voltage limit constraint for every node of the wind farm:

$$V_i^{min}\le V_i\le V_i^{max}\qquad(3)$$

where $V_i^{min}$ and $V_i^{max}$ are the lower and upper voltage limits of the i-th node.
Reactive power output range constraint of the wind turbines:

$$Q_k^{min}\le Q_k\le Q_k^{max}\qquad(4)$$

where $Q_k^{min}$ and $Q_k^{max}$ are the lower and upper reactive power output limits of the k-th wind turbine.
Output current constraint of the wind power converters:

$$\frac{\sqrt{P_k^2+Q_k^2}}{V_k}\le I_k^{max}\qquad(5)$$

where $I_k^{max}$ is the current upper limit of the k-th wind turbine converter (in the invention, all turbines use the same converter, so the current upper limit constraint is identical for all of them).
Further, step S2 establishes the Markov decision process model based on the coordination control strategy, which specifically includes:
s21: at time t, the environment is in state $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$, and the agent takes action $A_t=[Q_{1,t},\dots,Q_{n,t}]$ according to this state information.
Here the state space S is the set of environment states: the invention selects the active power, reactive power, voltage and loss information output by the wind turbines as the state input, and in $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$ the quantities $P_{i,t}$, $Q_{i,t}$, $V_{i,t}$ and $P_{loss,t}$ denote the active power output, reactive power output and voltage of the i-th node and the system loss at time t, respectively. The action space A is the set of action responses made by the agent, i.e., the reactive power setpoint of each wind turbine determined from the wind farm state, $A_t=[Q_{1,t},\dots,Q_{n,t}]$;
S22: after the environment executes the action, it enters the next state $S_{t+1}=[P_{i,t+1},Q_{i,t+1},V_{i,t+1},P_{loss,t+1}]$; the reward function is set as the negative of the objective function of step S11, and with the five constraints of steps S12 and S13 denoted as equations (1)-(5), a penalty mechanism is set to calculate the obtained reward:

$$r_t=\begin{cases}-\mathrm{Obj} & \text{if constraints (1)-(5) are satisfied}\\ -\mathrm{Obj}-\rho & \text{otherwise}\end{cases}$$

where ρ is a penalty factor whose value is much larger than the objective function Obj.
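For concreteness, this reward can be sketched as follows (a minimal illustration, not from the patent text; the limit values, the penalty factor, and the assumption that the turbines sit at the first n nodes are all placeholders). Constraints (1)-(2) are satisfied implicitly whenever the state comes from a converged power flow, so only (3)-(5) are checked here:

```python
import numpy as np

RHO = 1e3  # penalty factor rho; assumed much larger than typical Obj values

def reward(V, P_wt, Q_wt, P_loss, V_ref=1.0,
           V_lim=(0.95, 1.05), Q_lim=(-2.0, 2.0), I_max=1.2):
    """Negated objective of S11, minus a penalty if any constraint (3)-(5) is violated.

    V: node voltages (p.u.); P_wt, Q_wt: turbine outputs, assumed to sit at the
    first n nodes. All limit values are illustrative, not taken from the patent.
    """
    obj = np.abs(V - V_ref).sum() + P_loss                        # Obj of step S11
    n = len(P_wt)
    feasible = (
        np.all((V >= V_lim[0]) & (V <= V_lim[1]))                 # (3) node voltages
        and np.all((Q_wt >= Q_lim[0]) & (Q_wt <= Q_lim[1]))       # (4) reactive range
        and np.all(np.hypot(P_wt, Q_wt) / V[:n] <= I_max)         # (5) converter current
    )
    return -obj if feasible else -obj - RHO
```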
Further, step S3 trains the reactive power-voltage coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated turbine output data, specifically:
s31: the Markov decision process formed by wind farm reactive power-voltage coordination control is a continuous-space optimization problem, for which the deep deterministic policy gradient (DDPG) algorithm obtains a good solving effect. The method consists of an actor-critic deep neural network (DNN).
The actor network (with parameters $\theta^\pi$) serves as the policy function and realizes the mapping from the environment state $s_t$ to the action $a_t$, i.e., it issues a reactive power action instruction according to the state of the wind farm; the critic network (with parameters $\theta^Q$) serves as the action-value function $Q(s_t,a_t)$ and evaluates the actor's action.
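A minimal PyTorch sketch of such an actor-critic pair is given below; the layer widths, the tanh output scaling and the symmetric reactive limit q_max are illustrative assumptions, not values specified in the patent:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network pi(s | theta_pi): wind-farm state -> reactive commands."""
    def __init__(self, state_dim, n_turbines, q_max=2.0, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_turbines), nn.Tanh(),   # bounded output in [-1, 1]
        )
        self.q_max = q_max   # scales tanh output into the reactive range (assumed symmetric)

    def forward(self, state):
        return self.q_max * self.net(state)

class Critic(nn.Module):
    """Action-value network Q(s_t, a_t | theta_Q): evaluates the actor's action."""
    def __init__(self, state_dim, n_turbines, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_turbines, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```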
S32: the specific process of the reactive power-voltage coordination control strategy offline training needs system initialization at first: randomly generating active output data (within the output active range of the wind turbine generator) of the wind turbine generator; initializing operator-critical network parameters; a sequence buffer R and motion random noise are initialized.
S33: experience generation: the intelligent agent is based on the environmental state s of the wind power planttMaking a reactive action command at
Figure BDA0003182439860000072
In the formula, N is random noise introduced to increase randomness of motion output and further increase learning accuracy.
Entering a next state s after the environment performs the actiont+1And earn a prize rt
Figure BDA0003182439860000081
The formula is the same as the formula in S22;
s34: and (3) recovering experience: experience with last step agent interaction with environment(s)t at rtst+1) The data format of (2) is stored in a buffer to form a sequence R.
S35: and (3) policy updating: randomly screening small-batch data M from the sequence buffer R, and updating the operator-critical network parameters by using a gradient descent strategy, wherein the parameter updating process is schematically shown in FIG. 2.
S36: and (3) finishing training: with the continuous interaction between the intelligent agent and the environment, the operator-critical network parameters are gradually converged, the actions which can be made by the intelligent agent are gradually stable, and when the acquired reward tends to be stable and does not obviously change, the intelligent agent is considered to be capable of realizing the mapping of the reactive power-voltage coordination control strategy, and the online deployment stage can be entered after the model training is finished.
S37: according to the method for interacting the wind power plant environment and the intelligent agent, the reinforcement learning model is trained. The training period is 10000, each period comprises 100 steps, and the training is carried out by adopting an InterI 7-9700CPU, which takes 32 hours. Training results as shown in fig. 3, the environment is explored by the agent randomly acting during the initial stage of training, and the reward obtained during this stage is low. After 500 training periods, the intelligent agent obtains the reward according to the learning experience and greatly improves, after about 6000 training periods, the intelligent agent obtains the reward gradually and stably, and the model training is converged.
Further, step S4 carries out the online deployment process of the offshore wind farm reactive power-voltage coordination control strategy, specifically:
S41: collecting real-time measured data of the wind farm.
S42: using the model obtained from the completed offline training, determining the reactive power-voltage control strategy of the current time interval from the real-time measured data; a minimal deployment loop is sketched below.
In order to verify the effectiveness of the proposed method, three different control methods are compared:
1) no additional control;
2) the proposed wind farm reactive power-voltage coordination control method based on deep reinforcement learning, whose optimization targets are voltage deviation and system loss, with responses produced by the online-deployed reinforcement learning model;
3) a reactive power-voltage coordination control optimization method based on sensitivity linearization, with the same optimization targets as the proposed method, solved in MATLAB via the YALMIP toolbox.
The results are shown in fig. 4 to fig. 9. With all wind turbines set to operate in the maximum power mode, the test output data of the wind farm is shown in fig. 4. Without reactive power-voltage control, the turbine terminal voltages lie in the range [1.012, 1.015] p.u. with an average of 1.0136 p.u.; even the turbines at the head of the feeder exceed the rated voltage, and the closer a turbine is to the feeder end, the higher its terminal voltage and its risk of voltage limit violation. Under the sensitivity method the terminal voltage range is [1.0005, 1.003] p.u. with an average of 1.0017 p.u.; under the proposed method it is [0.9987, 1.001] p.u. with an average of 1.0 p.u., a 1.3% reduction in the average deviation compared with the uncontrolled case.
As shown in fig. 5, the PCC voltage performance under the two control methods is similar and always remains around the rated value of 1 p.u., which is 1.2% lower than the voltage deviation in the uncontrolled case.
To further analyze the voltage deviation control performance of the proposed method, a voltage deviation index (VPI) is introduced for comparison, defined as

$$VPI(t)=\big\|V(t)-V^{ref}(t)\big\|_2$$

As shown in fig. 6, the average VPI is 14.5e-3 without reactive power-voltage control, and 2.3e-3 and 1e-3 under sensitivity analysis and the proposed reactive power-voltage control, respectively; the voltage deviation optimization performance of the proposed method is therefore superior to that of the sensitivity analysis.
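The index is straightforward to compute; a one-function sketch:

```python
import numpy as np

def vpi(V, V_ref=1.0):
    """VPI(t) = || V(t) - V_ref(t) ||_2, with voltages in per unit."""
    return np.linalg.norm(np.asarray(V, dtype=float) - V_ref)

print(vpi([1.012, 1.015]))   # uncontrolled terminal-voltage extremes from the example above
```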
Regarding the system loss under the different control methods: under reactive power-voltage control the turbines output more reactive power, so the system loss increases slightly. Since both optimization methods take system loss into account, their influence on it is small, and there is no great difference in system loss between the two.
The reactive power output waveforms of the turbines are shown in fig. 7 and fig. 8. Because the voltage inside the wind farm exceeds the rated value, the turbines need to absorb reactive power, and because the voltage deviation is larger for turbines closer to the feeder end, those turbines output more reactive power than the ones at the head of the feeder. The average reactive power output is -0.3535 MVar under the sensitivity analysis method and -0.4154 MVar under the proposed method; since the proposed method absorbs more reactive power, its voltage deviation is smaller, but its loss is slightly higher than that of the sensitivity analysis method.
To further compare the advantages of the proposed method, the objective function values under the different control methods are compared. As shown in fig. 9, the objective function value of the proposed method is smaller than that of the sensitivity analysis method throughout the whole process, which indicates that the proposed optimization achieves better results.
The total solving time of the sensitivity linearization method is 152 s, while the total solving time of the proposed method is 2.9 s, with a single solution taking only 0.006 s, giving better real-time performance. Compared with a traditional solver, the method also yields a better optimization result. In conclusion, the proposed method improves both the solution accuracy and the solution speed of the offshore wind farm reactive power-voltage optimization model.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (5)

1. An offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning, characterized by comprising the following steps:
s1: establishing an offshore wind farm reactive power-voltage coordination control model;
s2: establishing a Markov decision process model based on the wind farm reactive power-voltage coordination control strategy, and defining the state, action and reward function of the system;
s3: training the reactive power-voltage coordination control model based on the deep deterministic policy gradient strategy combined with randomly generated wind turbine output data, to realize the mapping from wind turbine states to reactive power instructions;
s4: online deployment of the offshore wind farm reactive power-voltage coordination control strategy.
2. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 1, characterized in that: the S1 specifically includes the following steps:
s11: constructing the objective function of the wind farm reactive power-voltage coordination control model:
the objective is to minimize the voltage deviation ΔV_i of all system nodes while taking the system loss P_loss into account:

$$\mathrm{Obj}=\min\Big(\sum_{i=1}^{N}\Delta V_i + P_{loss}\Big)$$

where N and n denote the number of nodes and the number of wind turbines in the grid-connected wind farm system, respectively, and the subscript i denotes the i-th node;
to reduce the system voltage deviation as much as possible:

$$\Delta V_i=\big|V_i-V_i^{ref}\big|$$

where $V_i^{ref}$ is the voltage reference of the i-th node, taken as the rated node voltage of 1 p.u.;
the system loss is:

$$P_{loss}=\sum_{(i,j)}G_{ij}\big(V_i^2+V_j^2-2V_iV_j\cos\theta_{ij}\big)$$

where the sum runs over the network branches, $G_{ij}$ and $B_{ij}$ are the real and imaginary parts of the admittance between nodes i and j, and $\theta_{ij}=\theta_i-\theta_j$ is the phase-angle difference between nodes i and j;
s12: the equality constraints of the wind farm reactive power-voltage coordination control model are the nodal power balance equations of the system, i.e., the power flow constraint equations:

$$P_i=V_i\sum_{j=1}^{N}V_j\big(G_{ij}\cos\theta_{ij}+B_{ij}\sin\theta_{ij}\big)\qquad(1)$$

$$Q_i=V_i\sum_{j=1}^{N}V_j\big(G_{ij}\sin\theta_{ij}-B_{ij}\cos\theta_{ij}\big)\qquad(2)$$
s13: the inequality constraints of the wind farm reactive power-voltage coordination control model are:
voltage limit constraint for every node of the wind farm:

$$V_i^{min}\le V_i\le V_i^{max}\qquad(3)$$

where $V_i^{min}$ and $V_i^{max}$ are the lower and upper voltage limits of the i-th node;
reactive power output range constraint of the wind turbines:

$$Q_k^{min}\le Q_k\le Q_k^{max}\qquad(4)$$

where $Q_k^{min}$ and $Q_k^{max}$ are the lower and upper reactive power output limits of the k-th wind turbine;
output current constraint of the wind power converters:

$$\frac{\sqrt{P_k^2+Q_k^2}}{V_k}\le I_k^{max}\qquad(5)$$

where $I_k^{max}$ is the current upper limit of the k-th wind turbine converter; all turbines use the same converter, so the current upper limit constraint is identical for all of them.
3. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 2, characterized in that: the S2 specifically includes the following steps:
s21: at time t, the environment is in state $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$, and the agent takes action $A_t=[Q_{1,t},\dots,Q_{n,t}]$ according to this state information;
wherein the state space S is the set of environment states: the active power, reactive power, voltage and loss information output by the wind turbines is selected as the state input, and in $S_t=[P_{i,t},Q_{i,t},V_{i,t},P_{loss,t}]$ the quantities $P_{i,t}$, $Q_{i,t}$, $V_{i,t}$ and $P_{loss,t}$ denote the active power output, reactive power output and voltage of the i-th node and the system loss at time t, respectively; the action space A is the set of action responses made by the agent, i.e., the reactive power setpoint of each wind turbine determined from the wind farm state, $A_t=[Q_{1,t},\dots,Q_{n,t}]$;
s22: after the environment executes the action, it enters the next state $S_{t+1}=[P_{i,t+1},Q_{i,t+1},V_{i,t+1},P_{loss,t+1}]$; the reward function is set as the negative of the objective function of step S11, and with the five constraints of steps S12 and S13 denoted as equations (1)-(5), a penalty mechanism is set to calculate the obtained reward:

$$r_t=\begin{cases}-\mathrm{Obj} & \text{if constraints (1)-(5) are satisfied}\\ -\mathrm{Obj}-\rho & \text{otherwise}\end{cases}$$

where ρ is a penalty factor whose value is much larger than the objective function Obj.
4. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 3, characterized in that: the S3 specifically includes the following steps:
s31: the Markov decision process formed by wind farm reactive power-voltage coordination control is a continuous-space optimization problem, for which the deep deterministic policy gradient (DDPG) algorithm obtains a good solving effect; it consists of an actor-critic deep neural network (DNN);
the actor network, with parameters $\theta^\pi$, serves as the policy function and realizes the mapping from the environment state $s_t$ to the action $a_t$, i.e., it issues a reactive power action instruction according to the state of the wind farm; the critic network, with parameters $\theta^Q$, serves as the action-value function $Q(s_t,a_t)$ and evaluates the actor's action;
s32: the offline training of the reactive power-voltage coordination control strategy first requires system initialization: randomly generating active power data within the output range of the wind turbines; initializing the actor-critic network parameters; initializing the experience buffer R and the action exploration noise;
s33: experience generation: the agent issues a reactive power action command $a_t$ according to the wind farm environment state $s_t$:

$$a_t=\pi(s_t\mid\theta^\pi)+\mathcal{N}_t$$

where $\mathcal{N}_t$ is random noise introduced to increase the randomness of the action output and thereby improve exploration;
after the environment executes the action, it enters the next state $s_{t+1}$ and obtains the reward $r_t$, computed with the same formula as in s22;
s34: experience storage: the experience $(s_t, a_t, r_t, s_{t+1})$ from the last agent-environment interaction is stored in the buffer to form the sequence R;
s35: policy updating: a small batch of data M is randomly sampled from the buffer R, and the actor-critic network parameters are updated by gradient descent;
s36: training completion: as the agent keeps interacting with the environment, the actor-critic network parameters gradually converge and the agent's actions gradually stabilize; when the obtained reward becomes stable and no longer changes significantly, the agent is considered able to realize the mapping of the reactive power-voltage coordination control strategy, and once training is finished the model can enter the online deployment stage.
5. The offshore wind farm reactive-voltage coordination control method based on deep reinforcement learning of claim 4, characterized in that: the S4 specifically includes the following steps:
s41: collecting real-time measured data of the wind farm;
s42: using the model obtained from the completed offline training, determining the reactive power-voltage control strategy of the current time interval from the real-time measured data.
CN202110850832.2A 2021-07-27 2021-07-27 Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning Pending CN113541192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110850832.2A CN113541192A (en) 2021-07-27 2021-07-27 Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110850832.2A CN113541192A (en) 2021-07-27 2021-07-27 Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN113541192A true CN113541192A (en) 2021-10-22

Family

ID=78089212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110850832.2A Pending CN113541192A (en) 2021-07-27 2021-07-27 Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113541192A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114069650A (en) * 2022-01-17 2022-02-18 南方电网数字电网研究院有限公司 Power distribution network closed loop current regulation and control method and device, computer equipment and storage medium
CN114188997A (en) * 2021-12-07 2022-03-15 国网甘肃省电力公司电力科学研究院 Dynamic reactive power optimization method for high-ratio new energy power access area power grid
CN114721409A (en) * 2022-06-08 2022-07-08 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN116154771A (en) * 2023-04-17 2023-05-23 阿里巴巴达摩院(杭州)科技有限公司 Control method of power equipment, equipment control method and electronic equipment
CN116169857A (en) * 2023-04-19 2023-05-26 山东科迪特电力科技有限公司 Voltage control method and device for cascade switching circuit
CN117967499A (en) * 2024-04-02 2024-05-03 山东大学 Wind power plant grouping wake optimization method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HONGTAO TAN et al.: "Reactive-Voltage Coordinated Control of Offshore Wind Farm Based on Deep Reinforcement Learning", 2021 3rd Asia Energy and Electrical Engineering Symposium (AEEES) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114188997A (en) * 2021-12-07 2022-03-15 国网甘肃省电力公司电力科学研究院 Dynamic reactive power optimization method for high-ratio new energy power access area power grid
CN114069650A (en) * 2022-01-17 2022-02-18 南方电网数字电网研究院有限公司 Power distribution network closed loop current regulation and control method and device, computer equipment and storage medium
CN114069650B (en) * 2022-01-17 2022-04-15 南方电网数字电网研究院有限公司 Power distribution network closed loop current regulation and control method and device, computer equipment and storage medium
CN114721409A (en) * 2022-06-08 2022-07-08 山东大学 Underwater vehicle docking control method based on reinforcement learning
CN116154771A (en) * 2023-04-17 2023-05-23 阿里巴巴达摩院(杭州)科技有限公司 Control method of power equipment, equipment control method and electronic equipment
CN116169857A (en) * 2023-04-19 2023-05-26 山东科迪特电力科技有限公司 Voltage control method and device for cascade switching circuit
CN117967499A (en) * 2024-04-02 2024-05-03 山东大学 Wind power plant grouping wake optimization method and system
CN117967499B (en) * 2024-04-02 2024-06-25 山东大学 Wind power plant grouping wake optimization method and system

Similar Documents

Publication Publication Date Title
CN113541192A (en) Offshore wind farm reactive power-voltage coordination control method based on deep reinforcement learning
CN110535146B (en) Electric power system reactive power optimization method based on depth determination strategy gradient reinforcement learning
CN112615379A (en) Power grid multi-section power automatic control method based on distributed multi-agent reinforcement learning
CN114217524A (en) Power grid real-time self-adaptive decision-making method based on deep reinforcement learning
CN114784823A (en) Micro-grid frequency control method and system based on depth certainty strategy gradient
CN113363998A (en) Power distribution network voltage control method based on multi-agent deep reinforcement learning
CN103904641A (en) Method for controlling intelligent power generation of island micro grid based on correlated equilibrium reinforcement learning
CN116468159A (en) Reactive power optimization method based on dual-delay depth deterministic strategy gradient
CN116760047A (en) Power distribution network voltage reactive power control method and system based on safety reinforcement learning algorithm
CN114362187A (en) Active power distribution network cooperative voltage regulation method and system based on multi-agent deep reinforcement learning
CN117039981A (en) Large-scale power grid optimal scheduling method, device and storage medium for new energy
CN112381359A (en) Multi-critic reinforcement learning power economy scheduling method based on data mining
CN115588998A (en) Graph reinforcement learning-based power distribution network voltage reactive power optimization method
CN109950933B (en) Wind-solar-storage combined peak regulation optimization method based on improved particle swarm optimization
Tan et al. Reactive-voltage coordinated control of offshore wind farm based on deep reinforcement learning
CN114048576B (en) Intelligent control method for energy storage system for stabilizing power transmission section tide of power grid
CN116300440A (en) DC-DC converter control method based on TD3 reinforcement learning algorithm
CN110210113B (en) Wind power plant dynamic equivalent parameter intelligent checking method based on deterministic strategy gradient
CN115102228A (en) Multi-target coordination frequency optimization method and device for wind power plant containing flywheel energy storage
CN114400675A (en) Active power distribution network voltage control method based on weight mean value deep double-Q network
Tongyu et al. Based on deep reinforcement learning algorithm, energy storage optimization and loss reduction strategy for distribution network with high proportion of distributed generation
CN114421470B (en) Intelligent real-time operation control method for flexible diamond type power distribution system
CN118117717B (en) Storage battery matching method based on unstable power supply
Chen Deep Reinforcement Learning-based Data-Driven load Frequency Control for Microgrid
Chen et al. Real-Time Optimal Dispatch of Microgrid Based on Deep Deterministic Policy Gradient Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211022)