CN115395532A

CN115395532A - Time-lag wind power system wide-area damper control method based on reinforcement learning

Info

Publication number: CN115395532A
Application number: CN202210994492.5A
Authority: CN
Inventors: 谢兴旺
Original assignee: Wuchang University of Technology
Current assignee: Wuchang University of Technology
Priority date: 2022-08-18
Filing date: 2022-08-18
Publication date: 2022-11-25

Abstract

The invention discloses a reinforcement-learning-based wide-area damper control method for a time-lag wind power system. The method comprises: constructing a reinforcement-learning-based time-lag wind power wide-area damping control (TDWADC) system, which comprises doubly-fed wind turbine generators and a reinforcement-learning-based TDWADC controller; analyzing the geometric controllability/observability of the system and selecting the one or more feedback signals with the highest geometric observability for the inter-area oscillation mode as input signals of the TDWADC; and controlling the doubly-fed wind turbine generators with the reinforcement-learning-based TDWADC controller. The beneficial effects are that the TDWADC controller can suppress low-frequency oscillation of the power system effectively and in time, which improves the safety and stability of the power grid, enables the grid to absorb the electric energy generated by wind farms promptly and on a large scale, and improves the economic and social benefits of wind power generation enterprises.

Description

Time-lag wind power system wide-area damper control method based on reinforcement learning

Technical Field

The invention relates to the technical field of large-scale wind power grid-connected power generation system control, in particular to a time-lag wind power system wide-area damper control method based on reinforcement learning.

Background

The growing interconnection of power systems and the transmission and exchange of electric energy between regions pose a severe test for power system stability. Stability problems such as transient stability and small-disturbance stability are closely tied to the interconnected, parallel operation of large power grids. As interconnected power systems continue to grow in scale and complexity, control devices that rely only on local unit signals find it difficult to ensure stable operation.

The Wide Area Measurement System (WAMS) is a dynamic monitoring system for power networks that is based on networked control technology and synchronized by the Global Positioning System (GPS). Phasor measurement units (PMUs) distributed at different geographic locations measure in real time physical quantities such as the internal potential, power angle, angular velocity and bus voltage of each generator, which makes the whole power network observable. The time-stamped data measured by the PMUs are transmitted over a communication network to the phasor data concentrator (PDC). These data provide the basis for screening out wide-area signals with good controllability over inter-area oscillation and for constructing a wide-area damping controller (WADC) that coordinates control of the wide-area power system and thereby suppresses inter-area low-frequency oscillation in the interconnected grid.

A WAMS built on phasor measurement unit (PMU) technology provides reliable support for analyzing the dynamic behavior and control strategies of large-scale interconnected power systems. It offers a new technical approach and operating platform for stability analysis and control of wide-area power systems, and a new and powerful means for comprehensive real-time monitoring of large interconnected grids.

The most obvious difference between the PMU/WAMS-based controller and the traditional local control is that a communication network is added to form a networked wide-area time-lag system.

When each PMU substation in the WAMS transmits measured phasor data to a wide-area controller over the network, the data pass through the sensors (voltage and current transformers), synchronous sampling, phasor calculation and data packaging, the substation communication module, the communication link, data synchronization and processing in the phasor data concentrator, and delivery to the controller. Each of these stages introduces its own time lag. Time lag is therefore unavoidable when wide-area signals are used, and its magnitude depends on factors such as the distance between measurement stations, the communication medium, the communication protocol and the load on the communication line. Test results show that wide-area signals experience a time lag in communication networks built on different media: the minimum time lag of optical fiber and digital microwave links is 100-150 ms, and with satellite communication the transmission time lag can reach 700 ms.

The communication delay degrades the controller's performance; it can cause the controller to fail or even to act in the opposite direction. Because the large time lag in transmitting wide-area measurements over the communication network is one of the main causes of controller failure, deteriorating operating conditions and system instability, its influence must be taken into account when wide-area measurements are used for closed-loop control of the power system.

Disclosure of Invention

The invention provides a reinforcement-learning-based wide-area damper control method for a time-lag wind power system. It aims to solve the problem that conventional PSS control has difficulty adapting to the steadily increasing penetration of renewable energy, represented by wind power generation, which causes large fluctuations of voltage and power at the point of common coupling between the wind farm and the grid.

The application provides a time-lag wind power system wide area damper control method based on reinforcement learning, which comprises the following steps:

S101: constructing a reinforcement-learning-based time-lag wind power wide-area damping (TDWADC) control system; the TDWADC control system comprises a plurality of groups of doubly-fed wind turbine generators and a reinforcement-learning-based TDWADC controller;

wherein each doubly-fed wind turbine generator comprises: a wind turbine, a gearbox, a doubly-fed induction generator (DFIG), a transformer, a rotor-side frequency converter, a grid-side frequency converter and an overvoltage protection circuit (Crowbar);

the wind turbine is connected with the gearbox through mechanical transmission; the gearbox is connected with the DFIG through a transmission bearing; the DFIG is connected with the transformer through electromagnetic coupling and is connected to the AC grid through the transformer; the DFIG is electrically connected with the output end of the Crowbar overvoltage protection circuit and with the input end of the rotor-side frequency converter; the output end of the rotor-side frequency converter is electrically connected with the input end of the grid-side frequency converter; the output end of the grid-side frequency converter is electrically connected with one end of the transformer;

s102: controlling the doubly-fed wind turbine generator set by adopting a TDWADC controller based on reinforcement learning;

the TDWADC controller based on reinforcement learning comprises three parts of control: reinforcement learning control, voltage outer loop PI control and current inner loop PI control;

the input signal of the reinforcement learning control is a wide-area feedback power signal generated after a plurality of groups of double-fed wind turbine generators are connected through a communication network; the output signal of the reinforcement learning control is accessed to the voltage outer loop PI control;

the input signals of the reinforcement learning control are selected as follows: the wide-area feedback power signals are analyzed with the modal geometric controllability/observability method, and the one or more wide-area feedback power signals with the highest geometric observability for the inter-area oscillation mode are selected as input signals of the reinforcement learning control;

the voltage outer loop PI control is used for controlling a power grid side frequency converter;

and the current inner loop PI control is used for controlling the rotor side frequency converter to output specified active power and reactive power and finishing the suppression of the power oscillation of the grid-connected access point of the wind turbine.

Compared with the prior art, the invention has the beneficial effects that: the damping controller can effectively suppress the low-frequency oscillation of the power system in time, so that the safety and the stability of a power grid are improved, the power grid can absorb electric energy generated by a wind power plant in time on a large scale, and the economic and social benefits of a wind power generation enterprise are improved.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic structural diagram of a time-lag wind power wide area damping TDWADC control system based on reinforcement learning;

FIG. 3 is a reinforcement learning control schematic;

FIG. 4 is a schematic diagram of the structure of the Critic network;

fig. 5 is a schematic diagram of the architecture of the Actor network;

FIG. 6 is a schematic diagram of a reinforcement learning process;

FIG. 7 is a basic control block diagram of the rotor-side frequency converter;

FIG. 8 is a diagram of a grid side converter architecture;

FIG. 9 is a grid side converter control block diagram;

FIG. 10 is a schematic diagram of the control principle of a reinforcement learning based TDWADC controller;

FIG. 11 is a schematic diagram of a 16-machine time-lag wind power generation system;

FIG. 12 is a power angle deviation response curve between generators 1 and 3 with a time lag t = 300 ms;

FIG. 13 is a power angle deviation response curve between generators 1 and 3 with a time lag t = 600 ms.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

First, the related terms and their abbreviations are explained in a unified manner as follows.

Time-lag Wide-Area Damping Controller (TDWADC);

Power System Stabilizer (PSS);

Reinforcement Learning (RL);

Time lag (TD);

Wide Area Measurement System (WAMS);

Phasor Measurement Unit (PMU);

Phasor Data Concentrator (PDC);

Low-Frequency Oscillation (LFO);

referring to fig. 1, fig. 1 is a schematic flow chart of the method of the present invention;

the invention provides a method for designing an oscillation damping controller of a wind storage power generation system based on reinforcement learning. The method comprises the following steps:

S101: constructing a reinforcement-learning-based time-lag wind power wide-area damping (TDWADC) control system; the TDWADC control system comprises a plurality of groups of doubly-fed wind turbine generators and a reinforcement-learning-based TDWADC controller;

referring to fig. 2, fig. 2 is a schematic structural diagram of a time-lag wind power wide area damping TDWADC control system based on reinforcement learning;

the wind turbine is connected with the gearbox through mechanical transmission; the gearbox is connected with the doubly-fed induction generator (DFIG) through a transmission bearing; the DFIG is connected with the transformer through electromagnetic coupling and is connected to the AC grid through the transformer; the DFIG is electrically connected with the output end of the Crowbar overvoltage protection circuit and with the input end of the rotor-side frequency converter; the output end of the rotor-side frequency converter is electrically connected with the input end of the grid-side frequency converter; the output end of the grid-side frequency converter is electrically connected with one end of the transformer;

The doubly-fed wind turbine generator is described mainly by two models: a wind turbine model and a doubly-fed induction generator model.

The wind turbine model converts wind energy into mechanical energy. It takes the wind speed, the pitch angle and the per-unit mechanical speed of the generator as inputs, and outputs the mechanical torque acting on the rotor of the induction machine.

Based on aerodynamics, the wind-energy capture characteristic of a wind turbine can be represented by the simplified output mechanical power

$$P_w = \tfrac{1}{2}\,\rho\,\pi R^2\,C_p(\lambda,\beta)\,v_w^3 \qquad (1.1)$$

where

1) $P_w$ is the mechanical power output by the wind turbine (W), $R$ is the blade radius (m), $\rho$ is the air density (kg/m³), and $v_w$ is the equivalent wind speed (m/s);

2) $C_p$ is the wind energy utilization coefficient, which depends on $\lambda$ and $\beta$, where $\lambda$ is the tip speed ratio and $\beta$ is the blade pitch angle (°).

(1) Tip speed ratio

The tip speed ratio is defined as the ratio of the linear velocity of the blade tip to the wind speed:

$$\lambda = \frac{\omega_m R}{v_w} \qquad (1.2)$$

where $\omega_m$ is the mechanical rotational speed of the wind turbine.

The turbine speed $\omega_m$ is in turn expressed through $\omega_r$, the per-unit mechanical speed of the generator (p.u.), $p$, the number of generator pole pairs, and $GR$, the gear ratio.

(2) Blade pitch angle

The blade pitch angle is the angle between the blade and the plane of the rotor. The smaller the pitch angle, the larger the windward area of the blade and the more wind energy is captured.

Taking the mechanical efficiency of the gearbox into account, the actual output power of the wind turbine is

$$P_{w\_out} = \eta\,P_w \qquad (1.4)$$

Where η is the gearbox efficiency.
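The wind turbine relations above can be checked numerically. The following is a minimal sketch, assuming the standard aerodynamic expression $P_w = \tfrac{1}{2}\rho\pi R^2 C_p v_w^3$ and treating $C_p$ as a supplied number rather than a function of $\lambda$ and $\beta$. The function names and all numeric values are hypothetical and are not taken from the patent.

```python
import math


def tip_speed_ratio(omega_m, radius, v_wind):
    """Blade-tip linear speed divided by wind speed: lambda = omega_m * R / v_w."""
    return omega_m * radius / v_wind


def turbine_power(rho, radius, v_wind, c_p):
    """Mechanical power captured by the rotor in watts (assumed standard form)."""
    return 0.5 * rho * math.pi * radius ** 2 * c_p * v_wind ** 3


def shaft_power(p_w, eta):
    """Power after gearbox losses, P_w_out = eta * P_w (equation 1.4)."""
    return eta * p_w


# Illustrative numbers only (assumptions, not patent data).
rho, radius, v_wind = 1.225, 40.0, 10.0   # kg/m^3, m, m/s
lam = tip_speed_ratio(omega_m=2.0, radius=radius, v_wind=v_wind)
p_w = turbine_power(rho, radius, v_wind, c_p=0.4)
p_out = shaft_power(p_w, eta=0.95)
print(f"lambda = {lam:.2f}, P_w = {p_w / 1e6:.3f} MW, P_w_out = {p_out / 1e6:.3f} MW")
```

Because the captured power scales with the cube of the wind speed, small wind fluctuations translate into comparatively large power fluctuations at the grid connection point.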

From this, the per unit value of the output mechanical torque can be obtained as

It should be noted that the doubly-fed induction generator model is as follows:

The mathematical model of the doubly-fed generator consists of voltage equations, flux linkage equations, and torque and power equations. Following the motor convention, in the synchronously rotating coordinate system the stator and rotor voltage equations are:

where $p$ is the differential operator $d/dt$, $\psi$ denotes flux linkage, subscripts $s$ and $r$ denote stator and rotor variables, and subscripts $d$ and $q$ denote the d-axis and q-axis variables of the synchronous coordinate system; the same notation applies below. All variables in equation (2.1), including time, are per-unit values. The base value of time $t$ is $1/\omega_b$, i.e. the time taken to sweep 1 rad at the angular frequency $\omega_b$. $\omega_b$ is the base angular frequency, $\omega_b = 2\pi f$, with $f$ = 50 Hz for a 50 Hz system.

The per-unit stator active and reactive power equations are:

With the d axis of the synchronously rotating coordinate system oriented along the stator flux linkage, the stator and rotor flux linkage equations are:

Neglecting the stator resistance ($R_s = 0$) and the stator flux-linkage transients ($p\psi_{sd} = p\psi_{sq} = 0$), the stator voltage in equation (2.3) simplifies to:

where $U_s$ is the stator phase-voltage amplitude and $\omega_1$ is the electrical angular velocity of the stator quantities. Substituting $\psi_{sd} = U_s/\omega_1$ and equation (2.4) into the power equations, and denoting the stator output powers by $P_{sout}$ and $Q_{sout}$, i.e. $P_{sout} = -P_s$, $Q_{sout} = -Q_s$, gives:

Equation (2.5) shows that $P_{sout}$ and $Q_{sout}$ can be controlled in a decoupled manner through $i_{rq}$ and $i_{rd}$, which are in turn controlled through $u_{rd}$ and $u_{rq}$. The relationship between $i_{rq}$, $i_{rd}$ and $u_{rd}$, $u_{rq}$ is derived as follows:

From the stator flux linkage equation:

Substituting the stator current equation, expressed in terms of $i_{rq}$ and $i_{rd}$, into the rotor flux linkage equation gives

where:

Substituting equation (2.7) into the rotor voltage equation in equation (2.1) yields the relationship between $i_{rq}$, $i_{rd}$ and $u_{rd}$, $u_{rq}$:

where $\omega_{slip} = \omega_1 - \omega_r$ is the per-unit value of the angular velocity deviation (slip) signal.

S102: controlling the double-fed wind turbine generator set by adopting a TDWADC controller based on reinforcement learning;

the input signal of the reinforcement learning control is a wide-area feedback power signal generated after a plurality of groups of double-fed wind turbine generators are connected through a communication network;

It should be noted that the communication network added among the groups of doubly-fed wind turbine generators turns the system into a networked wide-area time-lag system. In wide-area damping control, the time-lag element is approximated by a Padé rational polynomial and treated as an equivalent rational transfer function, after which the wide-area damping controller is designed.

The time-lag element can generally be represented by $e^{-as}$. The closed-loop transfer function of the whole time-lag system is as follows:

where $G_c(s)$ is the controller, $G(s)$ is the controlled object, $e^{-as}$ is the time-lag element, and $a$ is the delay time. Equation (3.1) shows that the denominator of the closed-loop transfer function contains the time-lag element $e^{-as}$, so the closed-loop transfer function has infinitely many poles and the closed-loop system may be unstable. Communication time lags can cause the controller to fail and the operating state to deteriorate, and can ultimately destabilize the system. As power systems become increasingly information-driven, the time lag of long-distance communication, the time lag of data acquisition and preprocessing, and the response delay of controllers and actuators to input signals have a growing influence on stable operation.

In this application, the time-lag element $e^{-as}$ is represented by a first-order Padé approximation (equation 2.10):

$$e^{-as} \approx \frac{1 - \dfrac{a}{2}s}{1 + \dfrac{a}{2}s} \qquad (2.10)$$
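How well a first-order Padé term tracks a pure delay can be checked numerically. The sketch below assumes the standard first-order form $(1 - as/2)/(1 + as/2)$ and compares its frequency response with $e^{-as}$ at a few frequencies in the usual low-frequency oscillation range. The delay value and frequencies are illustrative assumptions, and the helper names are hypothetical.

```python
import cmath
import math


def delay_exact(a, omega):
    """Frequency response of the pure delay e^{-a s} at s = j*omega."""
    return cmath.exp(-1j * a * omega)


def delay_pade1(a, omega):
    """First-order Pade approximation (1 - a s/2) / (1 + a s/2) at s = j*omega."""
    s = 1j * omega
    return (1 - a * s / 2) / (1 + a * s / 2)


def phase_error(a, f_hz):
    """Absolute phase difference (rad) between the exact delay and the Pade term."""
    omega = 2 * math.pi * f_hz
    return abs(cmath.phase(delay_exact(a, omega)) - cmath.phase(delay_pade1(a, omega)))


a = 0.3  # assumed delay of 300 ms
errors = {f_hz: phase_error(a, f_hz) for f_hz in (0.2, 0.5, 1.0)}
for f_hz, err in errors.items():
    print(f"{f_hz:.1f} Hz: phase error {math.degrees(err):.2f} deg")
```

The approximation is all-pass (unit gain at every frequency), so only the phase differs from the true delay. The phase error grows with the product of delay and frequency, which is why longer delays are harder to approximate with a first-order term.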

To address the influence of the time lag, the method selects the wide-area feedback power signals with the modal geometric controllability/observability method: by performing geometric controllability/observability analysis on the wide-area feedback power signals, the one or more signals with the highest geometric observability for the inter-area oscillation mode are selected as input signals of the reinforcement learning control.

The modal geometric controllability/observability method is explained as follows.

Assume the linearized model of the whole system is:

$$\dot{x} = Ax + Bu, \qquad y = Cx \qquad (2.11)$$

where $A$, $B$ and $C$ are the state matrix, the input matrix and the output matrix, respectively.

Let matrix $A$ have $n$ distinct eigenvalues $\lambda_k$ ($k = 1, 2, \ldots, n$) with corresponding right and left eigenvectors $\varphi_k$ and $\psi_k$. The geometric controllability $gm_{ci}(k)$ and the geometric observability $gm_{oj}(k)$ corresponding to mode $k$ are, respectively:

$$gm_{ci}(k) = \cos\big(\alpha(\psi_k, b_i)\big) = \frac{|b_i^T \psi_k|}{\|\psi_k\|\,\|b_i\|}, \qquad gm_{oj}(k) = \cos\big(\theta(\varphi_k, c_j^T)\big) = \frac{|c_j \varphi_k|}{\|\varphi_k\|\,\|c_j\|} \qquad (2.12)$$

where $b_i$ is the $i$th column of matrix $B$, corresponding to the $i$th input; $c_j$ is the $j$th row of matrix $C$, corresponding to the $j$th output; $|z|$ and $\|z\|$ are the modulus and the Euclidean norm of $z$; $\alpha(\psi_k, b_i)$ is the geometric angle between the $i$th input and the $k$th left eigenvector; and $\theta(\varphi_k, c_j^T)$ is the geometric angle between the $j$th output and the $k$th right eigenvector.

The integrated geometric controllability/observability of the $k$th mode is therefore defined as:

$$gm_{cok}(i,j) = gm_{ci}(k)\,gm_{oj}(k) \qquad (2.13)$$

Through the geometric controllability/observability analysis, the one or more feedback signals with the highest geometric observability for the inter-area oscillation mode are selected as the input signals of the wide-area damping controller TDWADC.
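The geometric measures can be evaluated directly from the state-space matrices. The following is a minimal NumPy sketch, assuming the usual cosine form of the measures (the normalized inner product of an input column with a left eigenvector, and of an output row with a right eigenvector). The function names and the small three-state test system are hypothetical and are not the 16-machine system of the embodiment.

```python
import numpy as np


def geometric_measures(A, B, C):
    """Return (eigenvalues, gm_c, gm_o) with gm_c[k, i] and gm_o[k, j] per mode k."""
    lam, Phi = np.linalg.eig(A)      # columns of Phi are right eigenvectors
    Psi = np.linalg.inv(Phi).T       # columns of Psi are left eigenvectors, same ordering
    gm_c = np.abs(Psi.T @ B) / np.outer(
        np.linalg.norm(Psi, axis=0), np.linalg.norm(B, axis=0))
    gm_o = np.abs(C @ Phi).T / np.outer(
        np.linalg.norm(Phi, axis=0), np.linalg.norm(C, axis=1))
    return lam, gm_c, gm_o


def joint_measure(gm_c, gm_o, k):
    """gm_cok(i, j) = gm_ci(k) * gm_oj(k) for a chosen mode k."""
    return np.outer(gm_c[k], gm_o[k])


# Toy system (assumption): one lightly damped oscillator plus one decoupled real mode.
A = np.array([[0.0, 1.0, 0.0],
              [-4.0, -0.2, 0.0],
              [0.0, 0.0, -1.0]])
B = np.array([[0.0], [1.0], [0.0]])          # single input acting on the oscillator
C = np.array([[1.0, 0.0, 0.0],               # candidate signal 0: oscillator position
              [0.0, 0.0, 1.0]])              # candidate signal 1: decoupled state

lam, gm_c, gm_o = geometric_measures(A, B, C)
k_osc = int(np.argmax(lam.imag))             # index of the oscillatory mode
joint = joint_measure(gm_c, gm_o, k_osc)     # shape: (inputs, outputs)
best_output = int(np.argmax(joint[0]))       # candidate signal with the largest joint measure
print("joint measure for the oscillatory mode:", joint, "-> choose signal", best_output)
```

Taking the left eigenvectors from the inverse of the right eigenvector matrix keeps both sets in the same mode ordering, which a separate eigen-decomposition of the transposed matrix would not guarantee.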

For the output signal of the reinforcement learning control, the output signal is accessed to the voltage outer loop PI control;

and the current inner loop PI control is used for controlling the rotor side frequency converter to output specified active power and reactive power so as to complete the suppression of power oscillation of the grid-connected access point of the wind turbine.

Referring to fig. 3, fig. 3 is a schematic diagram of the reinforcement learning control. The reinforcement learning control comprises a state converter, an Actor network and a Critic network. Its control principle is as follows:

A preset signal w(t), chosen according to the actual situation, is subtracted from the output y(t) of the controlled photovoltaic power generation system to generate an error signal e(t). The state converter converts e(t) into the input state signal x(t) of the reinforcement learning network. The state signal x(t) is fed to the Actor network to obtain the output signal u_n(t). The state signal x(t) and the error reinforcement learning signal r(t) are fed together to the Critic network to obtain the output signal n(t). The signals u_n(t) and n(t) are combined to obtain the control input signal u(t) of the controlled photovoltaic power generation system, and u(t) acts on the controlled system to produce the output signal y(t), which closes the control loop. The Actor network and the Critic network also update their weight coefficients online through the temporal-difference signal δ_TD(t).

Two BP neural networks implement the policy function of the Actor network and the value function of the Critic network, respectively.

Referring to fig. 4, fig. 4 is a schematic structural diagram of the Critic network;

The input of the Critic network is the state signal

$$x_c(t) = [x_1(t), x_2(t), \ldots, x_n(t), r(t)]^T \qquad (1)$$

The error function of the Critic network is shown in formula (2),

where λ is the discount coefficient, 0 < λ < 1;

r(t) is defined as:

where ε is a constant with ε > 0;

The transfer function of the hidden-layer neurons of the Critic network is a bipolar sigmoid function, as shown in formula (4):

The output of the Critic network is the performance index function J(t); the hidden layer uses a sigmoid activation function and the output layer uses a linear activation function. The inputs and outputs of the hidden-layer and output-layer neurons of the Critic network are given by formula (5):

where $N_c$ is the number of hidden-layer neurons of the evaluation (Critic) network, $q_i$ and $p_i$ are the input and output of the $i$th hidden-layer neuron, and the two weight symbols denote the weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively;
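The structure described for the Critic network (a bipolar sigmoid hidden layer followed by a linear output) can be sketched as a forward pass. This is a minimal illustration that assumes the common definition of the bipolar sigmoid, $(1 - e^{-x})/(1 + e^{-x})$, which equals $\tanh(x/2)$. The weight layout, function names and numeric values are hypothetical and are not the patent's formulas (4) and (5).

```python
import math


def bipolar_sigmoid(x):
    """Bipolar sigmoid (1 - e^-x) / (1 + e^-x), computed as tanh(x / 2) for stability."""
    return math.tanh(x / 2.0)


def critic_forward(x_c, w_in, w_out):
    """One forward pass: hidden inputs q, hidden outputs p, scalar output J.

    x_c   : input vector [x_1, ..., x_n, r]
    w_in  : one weight row per hidden neuron (input layer -> hidden layer)
    w_out : one weight per hidden neuron (hidden layer -> linear output)
    """
    q = [sum(w * x for w, x in zip(row, x_c)) for row in w_in]
    p = [bipolar_sigmoid(q_i) for q_i in q]
    J = sum(w * p_i for w, p_i in zip(w_out, p))
    return J, q, p


# Illustrative values only (assumptions): two state components plus r(t), two hidden neurons.
x_c = [1.0, -1.0, 0.0]
w_in = [[1.0, 1.0, 0.0],
        [2.0, 0.0, 0.0]]
w_out = [0.5, 1.0]
J, q, p = critic_forward(x_c, w_in, w_out)
print("q =", q, "p =", p, "J =", J)
```

The bipolar sigmoid is odd and bounded in (-1, 1), so a zero hidden input contributes nothing to J(t), unlike the unipolar sigmoid whose output at zero is 0.5.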

The weight update of the Critic network is given by formula (6):

$\eta_c(t)$ is the learning rate of the Critic network. The gradient from the hidden layer to the output layer, obtained by the backpropagation gradient-descent method, is given by formula (7):

The gradient from the input layer to the hidden layer is given by formula (8):

referring to fig. 5, fig. 5 is a schematic diagram of the architecture of an Actor network;

The input of the Actor network is:

$$x_a(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T \qquad (9)$$

The inputs and outputs of the hidden-layer and output-layer neurons of the Actor network are given by formula (10):

where $N_a$ is the number of hidden-layer neurons of the Actor network, $h_i$ and $g_i$ are the input and output of the $i$th hidden-layer neuron, and the two weight symbols denote the weights from the input layer to the hidden layer and from the hidden layer to the output layer, respectively;

The weight update of the Actor network is given by formula (11):

$\eta_a(t)$ is the learning rate of the Actor network. The gradients from the hidden layer to the output layer and from the input layer to the hidden layer are given by formulas (12) and (13):

where $\omega_{nj}$ and $\omega_j$ are the weight coefficients of the Actor network and the Critic network, respectively.

The control input signal u(t) is given by formula (14):

$$u(t) = u_1(t) + \eta_m(0, \rho(t)) \qquad (14)$$

where $\eta_m$ depends on the Critic network output $J(t)$ through $\rho(t) = [1 + \exp(2J(t))]^{-1}$.
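Formula (14) can be read as the Actor output plus an exploration term whose size shrinks as the Critic output J(t) grows. The sketch below assumes that $\eta_m(0, \rho)$ denotes a zero-mean Gaussian sample and that $\rho(t)$ is used as its standard deviation; the source does not state whether $\rho$ is a variance or a standard deviation, so this is an interpretation. The function names are hypothetical.

```python
import math
import random


def rho(J):
    """Noise scale rho = 1 / (1 + exp(2 J)); it decreases monotonically as J grows."""
    return 1.0 / (1.0 + math.exp(2.0 * J))


def control_input(u1, J, rng):
    """u = u1 + eta, with eta drawn from a zero-mean Gaussian.

    Assumption: rho(J) is used as the standard deviation of the Gaussian.
    """
    return u1 + rng.gauss(0.0, rho(J))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
for J in (-2.0, 0.0, 2.0):
    samples = [control_input(1.0, J, rng) for _ in range(5)]
    print(f"J = {J:+.1f}, rho = {rho(J):.4f}, samples = {[round(s, 3) for s in samples]}")
```

With this reading, a large J(t) makes the controller follow the Actor output almost exactly, while a small or negative J(t) widens the random search around it.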

Referring to fig. 6, fig. 6 is a schematic diagram of a reinforcement learning process; the reinforcement learning process is as follows:

1. Initialize the learning rates, weights, number of iterations and error threshold of the Actor network and the Critic network;

2. Calculate the values of J(t) and u_1(t) according to formulas (5) and (10);

3. Initialize the iteration count i = 1; judge whether i is smaller than the maximum number of iterations Nc; if so, go to step 4; otherwise, go to step 7;

4. Calculate the value of the temporal-difference function δ_TD according to formula (2);

5. Judge whether the error threshold δ_0 > δ_TD holds; if so, update the weights of the Critic network according to formulas (7) and (8); otherwise, go to step 7;

6. Update the weights of the Actor network according to formulas (12) and (13);

7. Set t = t + 1 and continue the calculation with the updated weights;

8. Calculate the control signal u(t) according to formula (14) and apply it to the controlled system;

9. Update the system state vector;

10. Judge whether t is greater than the simulation time T; if so, end the reinforcement learning process; otherwise, set i = i + 1 and return to the iteration-count check in step 3.

And the voltage outer loop PI control and the current inner loop PI control are used for controlling the rotor side frequency converter to output specified active power and reactive power, and finishing the suppression of the output power oscillation of the wind turbine grid-connected access point.

Regarding the control of the rotor-side inverter, the following is detailed:

The DFIG rotor winding is connected to the power grid through a power electronic frequency converter that controls the voltage at the rotor slip rings; with a suitable decoupling control method, the active and reactive power generated by the DFIG can be controlled separately.

From the DFIG model formulas above, the output powers $P_{sout}$ and $Q_{sout}$ of the doubly-fed induction generator can be controlled separately by changing $i_{qr}$ and $i_{dr}$, and $i_{qr}$, $i_{dr}$ are ultimately controlled through $u_{qr}$, $u_{dr}$. This yields a structure with a power outer-loop PI control and a current inner-loop PI control. The basic control block diagram of the rotor-side frequency converter is shown in fig. 7. Note that the current inner-loop PI control is an essential control means in this application, whereas the power outer-loop PI control can use conventional PI control and is not described in detail here.
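The outer-loop/inner-loop arrangement can be illustrated with two discrete PI controllers in cascade. This is a minimal sketch of a generic cascade, not the patent's controller: the gains, the sampling step, the `PI` class, the `cascade_step` helper and the optional `u_damp` term (standing in for a supplementary damping signal added at the outer loop) are all hypothetical.

```python
import math


class PI:
    """Discrete PI controller with rectangular (forward Euler) integration."""

    def __init__(self, kp, ki, dt):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def step(self, ref, meas):
        err = ref - meas
        self.integral += err * self.dt
        return self.kp * err + self.ki * self.integral


def cascade_step(outer, inner, outer_ref, outer_meas, i_meas, u_damp=0.0):
    """One sample of the cascade.

    The outer loop turns its tracking error (plus an optional supplementary
    signal u_damp, an assumption for illustration) into a current reference;
    the inner loop turns the current error into a converter voltage command.
    """
    i_ref = outer.step(outer_ref + u_damp, outer_meas)
    u_cmd = inner.step(i_ref, i_meas)
    return i_ref, u_cmd


# Illustrative gains and signals only (assumptions).
outer = PI(kp=1.0, ki=0.0, dt=0.001)
inner = PI(kp=3.0, ki=0.0, dt=0.001)
i_ref, u_cmd = cascade_step(outer, inner, outer_ref=1.0, outer_meas=0.5, i_meas=0.2)
print(f"i_ref = {i_ref:.3f}, u_cmd = {u_cmd:.3f}")
```

In a cascade of this kind the inner current loop is normally tuned to be much faster than the outer loop, so the outer loop can treat the current as following its reference almost instantly.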

with regard to the grid-side frequency converter control, the present application is elaborated as follows:

The grid-side frequency converter is controlled to keep the voltage of the DC-link capacitor at a preset constant value, regardless of the direction and magnitude of the rotor power, and to regulate the reactive power generated by the whole wind turbine generator to a set reference value according to the reactive power requirement.

The positive directions of the electrical quantities of the grid-side frequency converter are defined in fig. 8. $u_{dc}$ is the DC bus voltage, $i_{dcr}$ is the DC current flowing to the rotor-side frequency converter, $i_{dcg}$ is the DC current output from the grid-side frequency converter, $L_g$ is the filter inductance, and $R_g$ is the resistance of the filter inductor.

The voltage equation in an arbitrary dq rotating coordinate system is

where all variables in the formula, including $\omega_c$ and time $t$, are per-unit values.

When the d axis is oriented along the grid voltage vector, $u_{gq} = 0$, and the power injected by the grid into the grid-side frequency converter is:

Thus, by controlling $i_{gd}$ and $i_{gq}$ separately, decoupled control of the powers $P_g$ and $Q_g$ exchanged with the grid is achieved.

According to instantaneous power theory, reactive power is only exchanged among the three phases and only active power affects the DC voltage, so the DC voltage can be controlled through the active current $i_{gd}$.

Using the amplitude-invariant ABC-to-dq transformation and neglecting DC-side losses, in actual (non-per-unit) values $P_g = u_{dc} i_{dcg} = \tfrac{3}{2} u_{gd} i_{gd}$. Substituting the relation between the AC-side voltage and the DC voltage (m is the modulation ratio):

to obtain

And a DC circuit

To obtain

Treating $i_{dcr}$ as a disturbance, the transfer function from the output active current to the DC voltage can be expressed as:

As with the rotor-side frequency converter, the grid-side frequency converter is ultimately controlled through its voltage, so the relationship between the AC voltage and current of the converter must be established. From formula (3.4), the AC-side voltages $u_{gcd}$ and $u_{gcq}$ of the converter are:

the control of the grid-side frequency converter is shown in fig. 9.

Referring to fig. 10, fig. 10 is a schematic diagram illustrating a control principle of the TDWADC controller based on reinforcement learning.

where $I_{dref}$ is the d-axis reference of the inner-loop current generated by the active power outer loop, which controls the active power output of the wind turbine.

The q-axis reference $I_{qref}$ of the inner-loop current, output by the reactive power PI control, adjusts the reactive power output of the wind turbine.

As shown in FIG. 9, the wide-area damping controller of the wind power generation system is placed in the voltage outer loop: the wide-area damping control signal is added to the voltage outer loop to adjust the inner-loop q-axis current reference $I_{qref}$ and hence the reactive power output of the wind turbine, thereby suppressing power oscillation at the grid connection point of the wind turbine.

Finally, referring to fig. 11, fig. 11 is a schematic diagram of a 16-machine time-lag wind power generation system as an embodiment. For the unit, the relevant parameter settings in the application are shown in tables 1-3;

TABLE 1 reinforcement learning network parameter settings

TABLE 2 Fan parameter settings

TABLE 3 Modal analysis results of the 16-machine system

As can be seen from table 3, the 16-machine system has 4 inter-area oscillation modes, with modes 1 and 3 being sufficiently damped and modes 2 and 4 being insufficiently damped. Mode 4 has a higher frequency and a larger damping ratio than mode 2. The TDWADC feedback signal selection therefore primarily targets mode 2.

For the above modes, the geometric observability/controllability results of the present application are given in table 4.

TABLE 4 results of geometric observability/controllability in different modes

As can be seen from table 4, among the six candidate signals, signal 5 has a smaller joint geometric controllability/observability than signal 1 for mode 2, but for the other three modes the joint geometric controllability/observability of signal 5 is the smallest. Taking both aspects into account, the transmitted power signal P68-52 on tie line 68-52 is selected as the feedback signal of the TDWADC controller: it has a relatively large joint geometric controllability/observability for mode 2 and at the same time a small influence on the other 3 modes, so the TDWADC controller can effectively improve the damping of mode 2 while affecting the damping of the other 3 modes as little as possible.
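
The indices compared above can be computed as sketched below. The sketch assumes the commonly used geometric definitions, in which the controllability measure is the cosine of the angle between the left eigenvector of a mode and an input column of B, the observability measure is the cosine of the angle between an output row of C and the right eigenvector, and the joint index is their product. The function names and the toy vectors are illustrative assumptions; the eigenvectors of the linearized system are taken as given.

```python
import math


def geometric_measure(a, b):
    """|<a, b>| / (||a|| * ||b||) for two equally long (possibly complex) vectors."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(abs(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(abs(y) ** 2 for y in b))
    return abs(inner) / (norm_a * norm_b)


def joint_index(left_vec, right_vec, b_col, c_row):
    """Joint geometric controllability/observability of one mode for one input/output pair."""
    gmc = geometric_measure(left_vec, b_col)   # controllability of the mode from the input
    gmo = geometric_measure(c_row, right_vec)  # observability of the mode in the output
    return gmc * gmo


# Toy two-state example (hypothetical vectors, not data from table 4)
left = [1 + 1j, 0]
right = [1, 0]
index_a = joint_index(left, right, b_col=[1, 1], c_row=[1, 0])
index_b = joint_index(left, right, b_col=[0, 1], c_row=[1, 0])
```

A signal would then be judged by its index for the target mode together with its indices for the remaining modes, as described in the text.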

Two case studies were also carried out in this application, comparing the conventional PSS control with the reinforcement learning based TDWADC control for time lags of t = 200 ms and t = 600 ms, respectively; the results are shown in fig. 12 and 13.
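
In a discrete-time simulation, a fixed time lag on the wide-area feedback signal can be modelled with a simple first-in first-out buffer, as in the following minimal sketch. The class name and the 50 ms sample time are assumptions made for illustration; the simulation settings of the present application are not reproduced here.

```python
from collections import deque


class TransportDelay:
    """Delay a sampled signal by a fixed number of samples."""

    def __init__(self, delay_s: float, dt: float, initial: float = 0.0):
        self.n = round(delay_s / dt)  # delay expressed in whole samples
        self.buf = deque([initial] * self.n, maxlen=self.n)

    def step(self, value: float) -> float:
        if self.n == 0:
            return value
        delayed = self.buf[0]      # oldest stored sample
        self.buf.append(value)     # maxlen drops the oldest sample automatically
        return delayed


# 200 ms lag at a hypothetical 50 ms sample time: 4 samples of delay
delay_200 = TransportDelay(delay_s=0.2, dt=0.05)
outputs = [delay_200.step(float(k)) for k in range(1, 7)]
# 600 ms lag at the same sample time: 12 samples of delay
delay_600 = TransportDelay(delay_s=0.6, dt=0.05)
```

The first four outputs equal the initial value, after which the input sequence reappears shifted by four samples.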

As can be seen from fig. 12 and 13, the RL-TDWADC (RL stands for reinforcement learning, and TDWADC stands for time-lag wide-area damping controller) adopted by the present application can better suppress low-frequency oscillation of the power system.

The invention has the beneficial effects that: the low-frequency oscillation of the power system can be effectively inhibited in time, on one hand, the safety and the stability of the power grid are improved, on the other hand, the power grid can absorb the electric energy generated by the wind power plant in time and on a large scale, and the economic and social benefits of a wind power generation enterprise are improved.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

The above-described embodiments of the present invention should not be construed as limiting the scope of the present invention. Any other corresponding changes and modifications made according to the technical idea of the present invention should be included in the protection scope of the claims of the present invention.

Claims

1. A time-lag wind power system wide-area damper control method based on reinforcement learning is characterized by comprising the following steps:

the input signal selection process of the reinforcement learning control is specifically as follows: the wide-area feedback power signals are selected by a modal geometric controllability/observability method, in which a geometric controllability/observability analysis is performed on the wide-area feedback power signals, and the one or more wide-area feedback power signals with the highest geometric observability corresponding to the inter-area oscillation mode are selected as input signals of the reinforcement learning controller.

2. The reinforcement learning-based time-lag wind power system wide-area damper control method as claimed in claim 1, wherein: the reinforcement learning control includes: a state converter, an Actor network and a Critic network; the principle of independent control is as follows:

the output quantity y(t) is subtracted from a signal w(t), preset according to the actual situation, to generate an error signal e(t); the error signal e(t) is converted by the state converter into the input state signal x(t) of the reinforcement learning network; the state signal x(t) is input into the Actor network to obtain the output signal u_n(t); the state signal x(t) and the error reinforcement learning signal r(t) are input together into the Critic network to obtain an output signal n(t); the output signal u_n(t) is combined with the output signal n(t) to obtain the control input signal u(t) of the controlled photovoltaic power generation system; u(t) acts on the controlled photovoltaic power generation system to obtain the output signal y(t), forming closed-loop control; the weight coefficients of the Actor network and the Critic network are also updated online by means of a temporal-difference signal δ_TD(t).

3. The reinforcement learning-based time lag wind power system wide area damper control method of claim 2, characterized in that: two BP networks are used to implement the policy function of the Actor network and the value function of the Critic network, respectively.
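
For illustration only, the signal flow recited in claim 2 can be sketched as a single function. All callables passed in below (state converter, Actor, Critic, reinforcement signal and combination rule) are hypothetical stand-ins chosen to exercise the flow; they are not the networks or formulas of the present application.

```python
def closed_loop_step(w, y, state_converter, actor, critic, reward, combine):
    """One pass through the signal flow: w, y -> e -> x -> (u_n, n) -> u."""
    e = w - y                 # error signal e(t)
    x = state_converter(e)    # state signal x(t)
    u_n = actor(x)            # Actor output u_n(t)
    r = reward(e)             # reinforcement signal r(t)
    n = critic(x, r)          # Critic output n(t)
    u = combine(u_n, n)       # control input u(t)
    return u, e, n, r


# Hypothetical stand-ins, used only to exercise the signal flow
u, e, n, r = closed_loop_step(
    w=1.0,
    y=0.25,
    state_converter=lambda err: [err],
    actor=lambda state: 2.0 * state[0],
    critic=lambda state, rew: 0.0,
    reward=lambda err: -abs(err),
    combine=lambda u_n, n_out: u_n + n_out,
)
```

In a full implementation the returned u would be applied to the controlled system, the new output y measured, and the step repeated, with the network weights updated online between steps.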

4. The reinforcement learning-based time lag wind power system wide area damper control method of claim 3, characterized in that:

the input of the Critic network is a state signal

x_c(t) = [x_1(t), x_2(t), …, x_n(t), r(t)]^T (1),

the Critic network error function is shown as formula (2),

where λ is the discount coefficient, 0< λ <1;

r (t) is defined as:

wherein ε is a constant with ε > 0;

the transfer function of the hidden layer neuron of the Critic network adopts a bipolar sigmoid function, and is shown as the formula (4):

wherein N_c is the number of hidden layer neurons of the evaluation network, q_i and p_i are respectively the input and output of the i-th neuron of the hidden layer, and ω_c^(1) and ω_c^(2) respectively represent the weights from the input layer to the hidden layer and from the hidden layer to the output layer;

the updating calculation of the Critic network weight value is as the following formula (6):

η_c(t) is the learning rate of the Critic network;

the gradient from the hidden layer to the output layer is obtained by the back-propagation gradient descent method, as shown in formula (7):
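
For illustration only, the forward pass of a Critic network with bipolar sigmoid hidden neurons can be sketched as follows. The bipolar sigmoid is the standard function (1 - e^(-q)) / (1 + e^(-q)). The temporal-difference form r(t) + λ n(t) - n(t-1) is a commonly used one and is an assumption here, since it is not taken from formula (2); the weight values are likewise hypothetical.

```python
import math


def bipolar_sigmoid(q: float) -> float:
    """Bipolar sigmoid (1 - e^-q) / (1 + e^-q), which equals tanh(q / 2)."""
    return math.tanh(q / 2.0)


def critic_forward(x_c, w1, w2):
    """Return the Critic output n and the hidden outputs p.

    w1[i][j] is the weight from input j to hidden neuron i;
    w2[i] is the weight from hidden neuron i to the output.
    """
    q = [sum(w_ij * x_j for w_ij, x_j in zip(row, x_c)) for row in w1]
    p = [bipolar_sigmoid(q_i) for q_i in q]
    n = sum(w_i * p_i for w_i, p_i in zip(w2, p))
    return n, p


def td_error(r: float, n_now: float, n_prev: float, lam: float) -> float:
    """Assumed temporal-difference signal: r(t) + lam * n(t) - n(t-1)."""
    return r + lam * n_now - n_prev


# Hypothetical weights and inputs
n_out, hidden = critic_forward([0.5, -0.5, 0.1], [[1.0, 1.0, 0.0], [0.0, 0.0, 2.0]], [0.5, 1.0])
delta = td_error(r=1.0, n_now=2.0, n_prev=0.5, lam=0.5)
```

Using tanh(q / 2) rather than the exponential form avoids overflow for large negative q while giving the same value.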

5. The reinforcement learning-based time-lag wind power system wide-area damper control method as claimed in claim 4, wherein: the input of the Actor network is as follows:

x_a(t) = [x_1(t), x_2(t), …, x_n(t)]^T (9)

the input and output of the neurons of the hidden layer and the output layer of the Actor network are as follows (10):

wherein N_a is the number of hidden layer neurons of the network, h_i and g_i are respectively the input and output of the i-th neuron of the hidden layer, and ω_a^(1) and ω_a^(2) respectively represent the weights from the input layer to the hidden layer and from the hidden layer to the output layer;

the formula for updating the weights of the Actor network is shown in equation (11):

η_a(t) is the learning rate of the Actor network; the gradient calculations from the hidden layer to the output layer and from the input layer to the hidden layer are shown as formula (12) and formula (13):

wherein ω_nj and ω_j are the weight coefficients of the Actor network and the Critic network, respectively.

The expression of the control input signal u(t) is shown as formula (14):

u(t) = u_1(t) + η_m(0, ρ(t)) (14)
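
For illustration only, formula (14) can be read as the Actor output plus a zero-mean random exploration term. The sketch below assumes that η_m(0, ρ(t)) denotes a Gaussian sample with mean 0 and standard deviation ρ(t); whether ρ(t) is a standard deviation or a variance, and how ρ(t) is obtained, are not fixed by this sketch. The function name is hypothetical.

```python
import random


def control_input(u_1: float, rho: float, rng=None) -> float:
    """Return u(t) = u_1(t) + eta, with eta drawn from a Gaussian N(0, rho).

    Assumption: rho is used as the standard deviation of the exploration term.
    """
    source = rng if rng is not None else random
    return u_1 + source.gauss(0.0, rho)


# With rho = 0 the exploration term vanishes and u(t) equals u_1(t)
u_deterministic = control_input(u_1=0.3, rho=0.0)

# With rho > 0 the samples scatter around u_1(t); a seeded generator keeps the run repeatable
rng = random.Random(0)
samples = [control_input(0.3, 0.1, rng) for _ in range(10000)]
mean_u = sum(samples) / len(samples)
```

A smaller ρ(t) concentrates the control input around the Actor output, while a larger ρ(t) widens the exploration.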