CN110880774B - Self-adaptive adjustment inverter controller - Google Patents

Self-adaptive adjustment inverter controller

Info

Publication number
CN110880774B
Authority
CN
China
Prior art keywords
inverter
formula
module
learning
action
Prior art date
Legal status
Active
Application number
CN201911165356.XA
Other languages
Chinese (zh)
Other versions
CN110880774A (en)
Inventor
魏俊
叶圣永
张玉鸿
韩宇奇
刘旭娜
张文涛
赵达维
李达
吕学海
陈博
Current Assignee
Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Original Assignee
Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd filed Critical Economic and Technological Research Institute of State Grid Sichuan Electric Power Co Ltd
Priority to CN201911165356.XA priority Critical patent/CN110880774B/en
Publication of CN110880774A publication Critical patent/CN110880774A/en
Application granted granted Critical
Publication of CN110880774B publication Critical patent/CN110880774B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/24Arrangements for preventing or reducing oscillations of power in networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a self-adaptive adjustment inverter controller which comprises a dq conversion module, an output active reactive power and end voltage effective value calculation module, a modulation wave signal amplitude calculation module, a simulation rotor motion equation module, a reinforcement learning control module and a dq inverse conversion and PWM (pulse width modulation) module.

Description

Self-adaptive adjustment inverter controller
Technical Field
The invention relates to the technical field of power electronic inverters, in particular to a self-adaptive adjusting inverter controller.
Background
In view of energy and environmental pressures, more and more renewable energy sources are connected to the power grid through power electronic generation equipment. As a device that converts direct current into alternating current, the inverter is widely used in wind power, energy storage, photovoltaics and other fields. The earliest inverter control strategies adopted a two-layer structure, with an inner current loop and an outer power or voltage loop. However, such a control strategy responds so quickly that it is unfavorable to the frequency stability of the power system, and the output power of the inverter cannot be adjusted adaptively according to the system voltage and frequency. Droop control strategies for inverters were subsequently proposed. Although droop control can automatically adjust the output active and reactive power according to the system frequency and voltage, which benefits system frequency and voltage control, it still suffers from an excessively fast response. In view of the beneficial effect of the rotor inertia of synchronous generators on power system frequency stability, virtual synchronous generator (VSG) control technology was proposed; its core idea is to emulate the rotor motion equation of a synchronous generator in the inverter control strategy. However, while reducing the response speed and increasing the inertia of the power system, this technology also inherits the tendency of synchronous generator rotors to develop low-frequency oscillations. To address this problem, a series of control strategies starting from the moment of inertia J and the damping coefficient D have been proposed. In one approach, a small-signal model of a single grid-connected VSG inverter is built, and suitable values of J and D are selected by means of the root locus so that stability is guaranteed and good dynamic characteristics are obtained. However, a controller designed by such a linearization method has difficulty adapting to the complex operating conditions of the inverter, and once the operating condition differs from the initial design condition there is a risk of instability or degraded dynamic performance. A series of measures from the viewpoint of adaptive control have also been proposed, such as VSG control algorithms that adjust J according to the rate of change dω/dt of the VSG virtual angular frequency, or that adjust the parameters J and D according to the output power P and its rate of change dP/dt. However, such adaptive control strategies still face the problem of selecting the controller parameters. In practice these parameters are often chosen by simulation trial and error, which can hardly cover all possible operating states of the inverter and cannot guarantee that the designed controller remains stable in a complex operating environment.
In recent years, artificial intelligence technologies have been applied ever more widely in power systems. In the field of power equipment control, reinforcement learning (RL) technology has attracted attention. RL can be regarded as an important machine learning method within artificial intelligence, and also as an independent branch of Markov decision process (MDP) and dynamic optimization methods. In a reinforcement learning method, the controller is regarded as an agent that needs no prior knowledge of the environment; by exploring its own control actions, observing the rewards obtained and continuously updating and iterating, it gradually obtains an optimized mapping strategy from states to actions. In other words, reinforcement learning makes behavioral decisions through the actions, states and rewards produced by the continuous interaction between the agent and the environment. At present, power equipment controllers designed with RL algorithms include DC supplementary damping controllers, dynamic quadrature boosters, static var compensators (SVC), power system stabilizers (PSS), and the like. Research shows that controllers designed with RL algorithms can effectively balance the mixed objective of stability and dynamic performance, exhibit good environmental adaptability, and are well suited to systems with many uncertain factors and large disturbances, such as power systems. If a reinforcement learning module is introduced into the inverter controller to evaluate the oscillation states excited by system disturbances under different J and D, the controller parameters can be adjusted adaptively through online learning and training, so that the inversion function is realized while a better oscillation suppression effect is obtained. Therefore, as the penetration of renewable energy increases further and power systems contain a large proportion of power electronic devices with diverse control strategies and complex operating conditions, how to realize inverter control with an RL algorithm has become a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a self-adaptive adjustment inverter controller that uses a reinforcement learning algorithm and introduces an online learning and optimization mechanism to realize adaptive adjustment of the virtual moment of inertia and virtual damping coefficient of the inverter, so as to suit inverter operation and control in the oscillation-prone environment of a power electronics-based power system with a high proportion of grid-connected renewable energy.
The invention is realized by the following technical scheme:
a self-adaptive adjustment inverter controller comprises a dq transformation module, an output active reactive power and end voltage effective value calculation module, a modulation wave signal amplitude calculation module, a simulation rotor motion equation module, a reinforcement learning control module and a dq inverse transformation and PWM (pulse Width modulation) modulation module, wherein the inverter controller simulates a synchronous generator rotor motion equation and adjusts virtual inertia and a damping coefficient on line through the reinforcement learning control module so as to obtain a good low-frequency oscillation suppression effect of an electric power system.
In this scheme for adjusting the two virtual coefficients J and D of an inverter under the virtual synchronous generator control strategy, the adjustment follows the idea of optimizing the strategy through online learning based on a reinforcement learning algorithm, rather than adaptively adjusting J and D from the inverter output power P and the virtual angular frequency ω and their derivatives as in existing methods. Because the reinforcement learning algorithm imitates human intelligence to a certain extent, this control strategy helps the inverter adapt to more complex power system operating conditions and oscillation environments.
The invention is particularly suitable for inverter controllers operating in oscillation-prone environments. On the basis of virtual synchronous generator control technology, the controller adds the function of online reinforcement-learning-based optimization and adjustment of its own parameters.
Furthermore, the dq conversion module is used for decomposing the three-phase voltage ea, eb, ec across the capacitor of the inverter LC filter and the three-phase current ia, ib, ic flowing through the inductor into the synchronous rotating coordinate system of the inverter, obtaining their dq-axis components ud, uq and id, iq.
The dq inverse transformation and PWM modulation module is used for performing dq inverse transformation on the calculation results of the modulation wave signal amplitude calculation module and the simulated rotor motion equation module to obtain the modulation wave signals ua, ub, uc, and for generating PWM control signals according to a PWM algorithm to drive the three-phase inverter bridge, thereby realizing the inversion function.
Further, the calculation formula of the output active and reactive power and terminal voltage effective value calculation module is shown in formula (1):
[Formula (1), rendered as an image in the original: calculation of the active power P, reactive power Q and output voltage amplitude U from the dq-axis components ud, uq, id and iq; s denotes the Laplace operator.]
in the formula, P is active power, Q is reactive power, U is the output voltage amplitude, and s is the Laplace operator.
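As an illustration of this module, a minimal Python sketch follows. Since formula (1) is only available as an image, the sketch assumes the conventional instantaneous-power expressions in the dq frame together with a first-order low-pass filter (cut-off omega_c) standing in for the Laplace-operator term; all function and parameter names are illustrative, not taken from the patent.

import numpy as np

def abc_to_dq(xa, xb, xc, theta):
    """Amplitude-invariant Park transformation of three-phase quantities into
    the dq frame rotating at the modulation-wave phase theta."""
    d = (2.0 / 3.0) * (xa * np.cos(theta)
                       + xb * np.cos(theta - 2.0 * np.pi / 3.0)
                       + xc * np.cos(theta + 2.0 * np.pi / 3.0))
    q = -(2.0 / 3.0) * (xa * np.sin(theta)
                        + xb * np.sin(theta - 2.0 * np.pi / 3.0)
                        + xc * np.sin(theta + 2.0 * np.pi / 3.0))
    return d, q

def power_and_voltage(ud, uq, id_, iq, p_filt, q_filt, omega_c, dt):
    """P, Q and terminal-voltage amplitude U from dq components; a discrete
    first-order lag replaces the Laplace-operator filtering of formula (1)."""
    p_inst = 1.5 * (ud * id_ + uq * iq)        # instantaneous active power
    q_inst = 1.5 * (uq * id_ - ud * iq)        # instantaneous reactive power
    p_filt += omega_c * dt * (p_inst - p_filt)
    q_filt += omega_c * dt * (q_inst - q_filt)
    U = np.hypot(ud, uq)                       # output voltage amplitude
    return p_filt, q_filt, U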
Further, the modulation wave signal amplitude calculation module is used for calculating the modulation wave amplitude Eq by using the formula:
Eq=∫Ke[(Uref-U)-n(Q-Qref)]d(t) (2)
in the formula, Uref is the set value of the inverter terminal voltage; Qref is the set value of the inverter reactive power; Ke is the amplification gain, and n is the droop coefficient of the reactive-voltage link; Eq is the q-axis component of the inverter modulation wave signal, and the 0-axis and d-axis components of the modulation wave signal are both 0.
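Formula (2) is a plain integrator, so its discrete-time form is straightforward; the sketch below is illustrative only and the names are assumptions.

def update_eq(eq, U, Q, Uref, Qref, Ke, n, dt):
    """One discrete-time integration step of formula (2): the q-axis modulation
    amplitude Eq accumulates Ke*[(Uref - U) - n*(Q - Qref)] over each interval dt."""
    eq += Ke * ((Uref - U) - n * (Q - Qref)) * dt
    return eq

In a digital controller this update would run once per sampling period together with the other module calculations.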
Further, the simulated rotor motion equation module is used for calculating the virtual angular frequency ω of the inverter and the phase angle θ of the modulation wave, with the calculation formulas shown in formulas (3) and (4):
[Formula (3), rendered as an image in the original: the simulated rotor motion (swing) equation relating J, D, m, Pref, P, ω and ω0.]
θ=∫ω d(t) (4)
in the formula, J is the virtual moment of inertia of the inverter; m is the active droop coefficient; Pref is the set value of the active power output by the inverter; ω0 is the rated angular frequency of the power system in which the inverter is located; D is the virtual damping coefficient of the inverter; θ is the phase of the inverter modulation wave.
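Formula (3) itself is only available as an image; as a hedged illustration, the sketch below integrates one commonly used VSG form of the swing equation, J·dω/dt = (Pref − P)/ω0 − (m + D)·(ω − ω0), together with formula (4) giving θ as the integral of ω. The assumed form of the equation and all names are illustrative, not the patent's exact expression.

def swing_step(omega, theta, P, Pref, J, D, m, omega0, dt):
    """One forward-Euler step of a simulated rotor motion equation (assumed form,
    see above) plus the phase integration of formula (4)."""
    domega_dt = ((Pref - P) / omega0 - (m + D) * (omega - omega0)) / J
    omega += domega_dt * dt
    theta += omega * dt            # formula (4): theta = integral of omega
    return omega, theta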
Further, the reinforcement learning control module is used for adjusting the virtual moment of inertia and the damping coefficient on line, and comprises the following steps:
step 1: determining a control action set a, a state set s, a reward function R, a learning time step and a Q value updating strategy;
step 2: establishing a mathematical model of the inverter, and obtaining a controller parameter after preliminary optimization by utilizing offline pre-learning of a Q learning algorithm;
step 3: putting the inverter into online operation, continuing to learn online with the Q learning algorithm during operation, and further updating the action strategy to adapt to the complex operating environment of the power grid.
Further, the expression of the control action set a in step 1 is as follows:
a∈{(J,D) | J∈[Jmin,Jmax], D∈[Dmin,Dmax]} (5)
in the formula, Jmin and Jmax are the bounds of the preset range of the virtual moment of inertia; Dmin and Dmax are the bounds of the preset range of the virtual damping coefficient;
the expression of the state set s in step 1 is as follows:
[Formula (6), rendered as an image in the original: the state set s, made up of the deviation combinations (ΔP, Δω).]
in the formula, ΔP is the difference between the inverter output active power at a given time and its set value; Δω is the rotational speed deviation of the virtual synchronous generator at that time, and (ΔP, Δω) represents the deviation combination of the inverter output power P and the virtual angular frequency ω at that time.
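For illustration, the discretized action set of formula (5) and a state mapping consistent with formula (6) can be sketched as follows; the bin edges and helper names are assumptions (the patent only states that ΔP and Δω are each split into discrete ranges, 6 in the embodiment).

import numpy as np
from itertools import product

def build_action_set(j_min, j_max, d_min, d_max, dj, dd):
    """All candidate (J, D) pairs of formula (5), obtained by discretizing the
    two preset ranges with steps dj and dd."""
    j_vals = np.round(np.arange(j_min, j_max + 1e-9, dj), 4)
    d_vals = np.round(np.arange(d_min, d_max + 1e-9, dd), 4)
    return [(float(j), float(d)) for j, d in product(j_vals, d_vals)]

def discretize_state(dP, domega, p_edges, w_edges):
    """Map the deviation pair (dP, domega) of formula (6) onto a grid cell; the
    bin edges p_edges / w_edges are illustrative assumptions."""
    return (int(np.digitize(dP, p_edges)), int(np.digitize(domega, w_edges)))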
Further, the expression of the reward function R in step 1 is as follows:
Ri=-∫_t^(t+step)(aP|ΔP|+aω|Δω|)dt (7)
in the formula, Ri is the reward value obtained in the ith iteration; t is a given time; t + step is the time one learning time step later; aP and aω are the weight coefficients of the active power deviation and the rotational speed deviation in the evaluation index, respectively;
the expression of the Q value updating strategy in the step 1 is as follows:
Q(si,ai) ← Q(si,ai) + α[Ri + γ·max_a Q(si+1,a) − Q(si,ai)] (8)
in the formula, si and ai denote the state and action of the ith iteration; Q(si,ai) is the Q value corresponding to si and ai in the ith iteration; α is the learning factor, indicating the degree of confidence given to the newly learned update; γ is the discount factor, determining how strongly returns distant in time affect the current value.
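Formula (8) is rendered as an image in the original; the definitions of α and γ above match the standard tabular Q-learning update, which can be sketched as follows (Q stored as a dict; names illustrative).

def q_update(Q, actions, s, a, r, s_next, alpha, gamma):
    """Standard tabular Q-learning update consistent with the definitions above:
    move Q(s, a) towards r + gamma * max over a' of Q(s_next, a')."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q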
Further, the Q learning algorithm in step 2 is a cyclic iterative process, and obtains the preliminarily optimized controller parameters through offline learning, including the following steps:
S11: initialize all reinforcement learning parameters, i.e. set the discount factor γ, the learning factor α, the probability ε of the ε-greedy strategy and the Q-table matrix convergence set value q, and initialize the Q table as a zero matrix;
S12: observe the state si at the current moment;
S13: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S14: after the action is performed, observe the state si+1 at the next moment;
S15: calculate the reward value Ri under the current action from the reward function, formula (7);
S16: update the Q-table matrix by equation (8);
S17: set i = i + 1; judge whether the algorithm convergence condition is met; if not, return to S12, otherwise end the offline learning.
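A minimal sketch of the offline pre-learning loop S11–S17 follows. The environment object env (a simulation model of the inverter returning a discretized state and a reward) and the convergence test on successive Q-table snapshots are assumptions made for illustration, not part of the original disclosure.

import numpy as np

def offline_pretrain(env, actions, alpha, gamma, epsilon, q_tol,
                     check_every=500, max_iter=100000):
    """Offline Q-learning following steps S11-S17 on a hypothetical simulation
    model env with observe() -> state and apply(action) -> (next_state, reward)."""
    Q, rng, snapshot = {}, np.random.default_rng(), {}      # S11: empty (zero) Q table
    s = env.observe()                                        # S12: observe current state
    for i in range(1, max_iter + 1):
        if rng.random() < epsilon:                           # S13: epsilon-greedy policy pi
            a = actions[rng.integers(len(actions))]
        else:
            a = max(actions, key=lambda x: Q.get((s, x), 0.0))
        s_next, r = env.apply(a)                             # S14 + S15: next state and reward
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best_next - old)   # S16: Q-table update
        s = s_next
        if i % check_every == 0:                             # S17: convergence check
            delta = np.sqrt(sum((Q[k] - snapshot.get(k, 0.0)) ** 2 for k in Q))
            if delta < q_tol:
                break
            snapshot = dict(Q)
    return Q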
Further, in step 3, the Q learning algorithm continues to be used for online learning and further updating the action strategy during operation; the Q learning algorithm comprises the following steps:
S21: observe the current state si;
S22: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S23: after the action is performed, observe the state si+1 at the next moment;
S24: obtain the reward value Ri under the current action from the reward function;
S25: update the Q-table matrix by equation (8);
S26: set i = i + 1 and return to step S21.
Further, in the iteration steps of the Q learning algorithm, the strategy π is an ε-greedy strategy: when selecting an action ai, with probability ε one of the actions in the action set represented by formula (5) is chosen at random as ai, and with probability 1−ε the action with the maximum Q value in the current state si is chosen as ai.
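The policy π described above reduces to a single ε-greedy choice; a minimal sketch (names illustrative):

import random

def epsilon_greedy(Q, actions, state, epsilon):
    """Policy pi of steps S13/S22: with probability epsilon pick a random action
    from the action set of formula (5); otherwise pick the action with the
    largest Q value in the current state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))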
Compared with the prior art, the invention has the following advantages and beneficial effects:
In this scheme for adjusting the two virtual coefficients J and D of an inverter under the virtual synchronous generator control strategy, the adjustment follows the idea of optimizing the strategy through online learning based on a reinforcement learning algorithm, rather than adaptively adjusting J and D from the inverter output power P and the virtual angular frequency ω and their derivatives as in existing methods. Because the reinforcement learning algorithm imitates human intelligence to a certain extent, this control strategy helps the inverter adapt to more complex power system operating conditions and oscillation environments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a main circuit topology of an inverter controller of the present invention and a control block diagram thereof;
fig. 2 is a control block diagram of the inverter controller of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example:
as shown in fig. 2, the adaptive regulation inverter controller of the present invention includes a dq conversion module, an output active and reactive and terminal voltage effective value calculation module, a modulated wave signal amplitude calculation module, an analog rotor motion equation module, a reinforcement learning control module, and a dq inverse conversion and PWM modulation module:
A dq transformation module: used for decomposing the three-phase voltage ea, eb, ec across the capacitor of the inverter LC filter and the three-phase current ia, ib, ic flowing through the inductor into the synchronous rotating coordinate system of the inverter, obtaining their dq-axis components ud, uq and id, iq.
The output active and reactive power and terminal voltage effective value calculation module: its calculation formula is shown in formula (1):
[Formula (1), rendered as an image in the original: calculation of the active power P, reactive power Q and output voltage amplitude U from the dq-axis components ud, uq, id and iq; s denotes the Laplace operator.]
in the formula, P is active power, Q is reactive power, U is the output voltage amplitude, and s is the Laplace operator.
Modulation wave signal amplitude calculation module: the formula used for calculating the amplitude Eq of the modulated wave is as follows:
Eq=∫Ke[(Uref-U)-n(Q-Qref)]d(t) (2)
in the formula, Uref is the set value of the inverter terminal voltage; Qref is the set value of the inverter reactive power; Ke is the amplification gain, and n is the droop coefficient of the reactive-voltage link; Eq is the q-axis component of the inverter modulation wave signal, and the 0-axis and d-axis components of the modulation wave signal are both 0.
A simulated rotor motion equation module: obtains the virtual angular frequency ω of the inverter and the phase angle θ of the modulation wave according to formulas (3) and (4):
[Formula (3), rendered as an image in the original: the simulated rotor motion (swing) equation relating J, D, m, Pref, P, ω and ω0.]
θ=∫ω d(t) (4)
in the formula, J is the virtual moment of inertia of the inverter; m is the active droop coefficient; Pref is the set value of the active power output by the inverter; ω0 is the rated angular frequency of the power system in which the inverter is located; D is the virtual damping coefficient of the inverter; θ is the phase of the inverter modulation wave.
A reinforcement learning control module: the method comprises the following steps:
step 1: and determining a control action set a, a state set s, a reward function R, a step of learning time each time and a Q value updating strategy. In conjunction with the inverter control strategy, the action set a here is:
a∈{(J,D) | J∈[Jmin,Jmax], D∈[Dmin,Dmax]} (5)
in the formula, Jmin and Jmax are the bounds of the preset range of the virtual moment of inertia; Dmin and Dmax are the bounds of the preset range of the virtual damping coefficient. By means of discretization, {(J, D)} represents the set of combinations within the two parameter adjustment ranges, i.e. the candidate combinations of J and D.
The state set s is:
[Formula (6), rendered as an image in the original: the state set s, made up of the deviation combinations (ΔP, Δω).]
in the formula, ΔP is the difference between the inverter output active power at a given time and its set value; Δω is the rotational speed deviation of the virtual synchronous generator at that time; (ΔP, Δω) represents the deviation combination of the inverter output power P and the virtual angular frequency ω at that time, and the state spaces of ΔP and Δω are each divided into 6 different states to reflect the oscillation condition of the inverter during the dynamic process.
The reward function R is:
Ri=-∫_t^(t+step)(aP|ΔP|+aω|Δω|)dt (7)
in the formula, Ri is the reward value obtained in the ith iteration; t is a given time; t + step is the time one learning time step later; aP and aω are the weight coefficients of the active power deviation and the rotational speed deviation in the evaluation index, respectively. Because the inverter is controlled in discrete time in the implementation above, the reward value over each step period is calculated in the discrete form:
Ri = -Σj=1..N (aP|ΔPj| + aω|Δωj|)·Δt
in the formula, j = 1 corresponds to the start of the window at time t, and N = step/Δt (rounded) is the number of intervals of length Δt within one learning time step, so j = N corresponds to time t + step. The interval Δt is equal to the sampling and control period of the inverter controller; for medium- and low-power inverters Δt is typically 1/5000 second.
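The discrete reward accumulation described above can be sketched as follows; with step = 0.1 s and Δt = 1/5000 s this sums roughly N = 500 samples per learning step (function and parameter names are illustrative).

def step_reward(dP_samples, domega_samples, aP, aomega, dt):
    """Discrete form of the reward over one learning time step: accumulate
    aP*|dP| + aomega*|domega| at every controller sampling instant, scale by dt
    and negate, so larger oscillations yield a more negative reward."""
    total = sum(aP * abs(dp) + aomega * abs(dw)
                for dp, dw in zip(dP_samples, domega_samples))
    return -total * dt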
Step 2: and establishing a mathematical model of the inverter, and obtaining the preliminarily optimized controller parameters by utilizing offline pre-learning of a Q learning algorithm. The Q learning algorithm is a cyclic iterative process, and the controller parameters after preliminary optimization are obtained through off-line learning. The whole process is divided into 7 steps from S11 to S17:
S11: initialize all reinforcement learning parameters, i.e. set the discount factor γ, the learning factor α, the probability ε of the ε-greedy strategy and the Q-table matrix convergence set value q, and initialize the Q table as a zero matrix;
S12: observe the state si at the current moment;
S13: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S14: after the action is performed, observe the state si+1 at the next moment;
S15: calculate the reward value Ri under the current action from the reward function, formula (7);
S16: update the Q-table matrix by equation (8);
S17: set i = i + 1; judge whether the algorithm convergence condition is met; if not, return to S12, otherwise end the offline learning;
The iterative convergence condition of the algorithm is that the 2-norm of the Q-table matrix is smaller than the set value q.
And step 3: and putting the inverter into online operation, continuously utilizing a Q learning algorithm to learn online in the operation process, and further updating an action strategy to adapt to the complex operation environment of the power grid. The steps of the Q learning algorithm are a loop iteration process, and the whole process is divided into 6 steps from S21 to S26:
S21: observe the current state si;
S22: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S23: after the action is performed, observe the state si+1 at the next moment;
S24: obtain the reward value Ri under the current action from the reward function;
S25: update the Q-table matrix by equation (8);
S26: set i = i + 1 and return to step S21.
The strategy π involved in the iteration steps of the Q learning algorithm in reinforcement learning steps 2 and 3 above is an ε-greedy strategy: when selecting an action ai, with probability ε one of the actions in the action set represented by formula (5) is chosen at random as ai, and with probability 1−ε the action with the maximum Q value in the current state si is chosen as ai.
A dq inverse transformation and PWM modulation module: places Eq calculated by formula (2) on the q axis, sets the d-axis and 0-axis signals to 0, performs the dq inverse transformation using ω and θ obtained from formulas (3) and (4) to get the modulation wave signals ua, ub, uc, and generates PWM control signals according to a PWM algorithm to drive the three-phase inverter bridge, thereby realizing the inversion function.
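A sketch of the inverse transformation stage is given below; it assumes the amplitude-invariant inverse Park transform with the d-axis and 0-axis components set to zero, which is one common way to realize the operation described above (names illustrative).

import numpy as np

def modulation_waves(Eq, theta):
    """Inverse dq transformation with only the q-axis amplitude Eq non-zero,
    giving the three-phase modulation waves ua, ub, uc that the PWM stage
    compares with its carrier."""
    ua = -Eq * np.sin(theta)
    ub = -Eq * np.sin(theta - 2.0 * np.pi / 3.0)
    uc = -Eq * np.sin(theta + 2.0 * np.pi / 3.0)
    return ua, ub, uc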
Specific parameters of an inverter controller participating in the suppression of low-frequency oscillations of the power system are given below as an example. The rated voltage of the three-phase inverter is 380 V, the rated frequency is 50 Hz, and the rated power is 100 kW. Correspondingly, the rated DC-side voltage of the inverter is Udc = 750 V. The power electronic devices of the three-phase full-bridge inverter circuit are IGBTs (Insulated Gate Bipolar Transistors), Infineon model F150R12RT4. The PWM carrier frequency is 5 kHz. The inductance L of the LC filter is 2 mH and the capacitance is 13 μF, model C67S1136-002700. The energy-storage capacitors on the DC bus are Hitachi capacitors with a capacitance of 2200 μF and a withstand voltage of 450 V; six capacitors are used, every two connected in series and the three series pairs connected in parallel, giving a total capacitance of 3300 μF. The current sensors are LEM HAS150-S.
The parameters in the inverter controller are set to Uref = 1.00, ω0 = 1.00, Pref = 0.85 and Qref = 0.00, all in per unit. The droop coefficients of the virtual synchronous generator are selected as m = n = 0.1, Ke = 100, with initial values J0 = 1.0 and D0 = 1.00. The relevant reinforcement learning parameters are selected as follows: Jmin = 0.01, Jmax = 5.01, Dmin = 0.01, Dmax = 5.01. When discretization is used to determine the action set {J, D}, ΔJ = 0.2 and ΔD = 0.2, i.e. the action set {J, D} = {(0.01, 0.01), (0.01, 0.21), (0.21, 0.01), ..., (5.01, 5.01)}, 676 combinations in total. step is taken as 0.1 s. The weight coefficients aP and aω are 1 and 314, respectively. The discount factor γ is 0.7. The learning factor α is 0.5. The ε-greedy probability ε is 0.1. The set value q for the 2-norm convergence condition of the Q-table matrix iteration is 0.1.
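The size of the action set quoted above can be checked directly: with both ranges running from 0.01 to 5.01 in steps of 0.2 there are 26 values per parameter and 26 × 26 = 676 combinations. A short illustrative check:

import numpy as np
from itertools import product

j_vals = np.round(np.arange(0.01, 5.01 + 1e-9, 0.2), 2)   # 0.01, 0.21, ..., 5.01
d_vals = np.round(np.arange(0.01, 5.01 + 1e-9, 0.2), 2)
action_set = [(float(j), float(d)) for j, d in product(j_vals, d_vals)]
assert len(j_vals) == 26 and len(action_set) == 676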
Fig. 1 is a main circuit topology and a control block diagram of an inverter controller according to the present invention, which includes modules such as a power circuit, a current and voltage measuring unit, a controller, and a driving circuit. The following describes the structure and function of each part in detail with reference to fig. 1:
1) The power circuit part: this part comprises the inverter DC side, the three-phase full-bridge inverter circuit, the LC filter, the closing switch KM and so on, and is mainly used for power transmission. In Fig. 1, Lf and rf are the inductance and the equivalent resistance of the inductor L of the inverter LC filter; the resistance is usually so small compared with the inductance that it can be neglected. Cf is the capacitor C of the inverter LC filter. r and L are the equivalent resistance and inductance of the transmission line from the inverter to the connected grid. Udc is the DC side, such as a battery. The three-phase full-bridge inverter circuit comprises 6 fully controlled power electronic devices, which are switched on or off under the control of the voltage signals output by the drive circuit, thereby realizing the inversion function.
2) Current and voltage measuring unit part: this part mainly consists of the voltage and current sensors that measure the inverter port voltage and output current. The three-phase voltage measured across the capacitor C of the LC filter is denoted ea, eb, ec, and the current measured through the inductor L is denoted ia, ib, ic.
3) The controller section: this part mainly implements the control function. The measured three-phase voltages ea, eb, ec and current signals ia, ib, ic are sent to the control module, which generates and outputs control signals according to the control strategy; these signals are used by the driving circuit to switch the power electronic devices on and off. The specific control strategy of the controller of the present invention is shown in Fig. 2.
4) A drive circuit section: this part is mainly to control the on/off of the power electronics according to the PWM signal.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A self-adaptive adjustment inverter controller is characterized by comprising a dq conversion module, an output active reactive power and end voltage effective value calculation module, a modulation wave signal amplitude calculation module, a simulation rotor motion equation module, a reinforcement learning control module and a dq inverse conversion and PWM modulation module, wherein the inverter controller simulates a synchronous generator rotor motion equation and adjusts virtual rotational inertia and a damping coefficient on line through the reinforcement learning control module so as to obtain a better low-frequency oscillation suppression effect of an electric power system;
the reinforcement learning control module is used for adjusting the virtual moment of inertia and the damping coefficient on line, and comprises the following steps:
step 1: determining a control action set a, a state set s, a reward function R, a learning time step and a Q value updating strategy;
step 2: establishing a mathematical model of the inverter, and obtaining a controller parameter after preliminary optimization by utilizing offline pre-learning of a Q learning algorithm;
and step 3: and putting the inverter into online operation, continuously utilizing a Q learning algorithm to learn online in the operation process, and further updating an action strategy to adapt to the complex operation environment of the power grid.
2. The adaptive-adjustment inverter controller of claim 1, wherein the dq conversion module is configured to decompose the three-phase voltage ea, eb, ec across the capacitor of the inverter LC filter and the three-phase current ia, ib, ic flowing through the inductor into the synchronous rotating coordinate system of the inverter to obtain their dq-axis components ud, uq and id, iq;
the dq inverse transformation and PWM module is configured to perform dq inverse transformation on the calculation results of the modulation wave signal amplitude calculation module and the simulated rotor motion equation module to obtain the modulation wave signals ua, ub, uc, and to generate PWM control signals according to a PWM algorithm to drive the three-phase inverter bridge, thereby realizing the inversion function.
3. The adaptive-adjustment inverter controller according to claim 1, wherein a calculation formula of the output active reactive power and terminal voltage effective value calculation module is as shown in formula (1):
[Formula (1), rendered as an image in the original: calculation of the active power P, reactive power Q and output voltage amplitude U from the dq-axis components ud, uq, id and iq; s denotes the Laplace operator.]
in the formula, P is active power, Q is reactive power, U is an output voltage amplitude, and s is a Laplace operator;
the modulation wave signal amplitude calculation module is used for calculating a modulation wave amplitude Eq, and the formula is as follows:
Eq=∫Ke[(Uref-U)-n(Q-Qref)]d(t) (2)
in the formula, Uref is the set value of the inverter terminal voltage; Qref is the set value of the inverter reactive power; Ke is the amplification gain, and n is the droop coefficient of the reactive-voltage link; Eq is the q-axis component of the inverter modulation wave signal, and the 0-axis and d-axis components of the modulation wave signal are both 0.
4. The adaptive-adjustment inverter controller according to claim 1, wherein the simulated rotor motion equation module is configured to calculate the virtual angular frequency ω of the inverter and the phase angle θ of the modulation wave, with the calculation formulas shown in formulas (3) and (4):
[Formula (3), rendered as an image in the original: the simulated rotor motion (swing) equation relating J, D, m, Pref, P, ω and ω0.]
θ=∫ω d(t) (4)
in the formula, J is the virtual moment of inertia of the inverter; m is the active droop coefficient; Pref is the set value of the active power output by the inverter; ω0 is the rated angular frequency of the power system in which the inverter is located; D is the virtual damping coefficient of the inverter; θ is the phase of the inverter modulation wave.
5. The adaptive-tuning inverter controller according to claim 1, wherein the expression of the control action set a in step 1 is as follows:
a∈{(J,D) | J∈[Jmin,Jmax], D∈[Dmin,Dmax]} (5)
in the formula, Jmin and Jmax are the bounds of the preset range of the virtual moment of inertia; Dmin and Dmax are the bounds of the preset range of the virtual damping coefficient;
the expression of the state set s in step 1 is as follows:
[Formula (6), rendered as an image in the original: the state set s, made up of the deviation combinations (ΔP, Δω).]
in the formula, ΔP is the difference between the inverter output active power at a given time and its set value; Δω is the rotational speed deviation of the virtual synchronous generator at that time, and (ΔP, Δω) represents the deviation combination of the inverter output power P and the virtual angular frequency ω at that time.
6. The adaptive-tuning inverter controller according to claim 1, wherein the reward function R in step 1 is expressed as follows:
Ri=-∫_t^(t+step)(aP|ΔP|+aω|Δω|)dt (7)
in the formula, Ri is the reward value obtained in the ith iteration; t is a given time; t + step is the time one learning time step later; aP and aω are the weight coefficients of the active power deviation and the rotational speed deviation in the evaluation index, respectively;
the expression of the Q value updating strategy in the step 1 is as follows:
Q(si,ai) ← Q(si,ai) + α[Ri + γ·max_a Q(si+1,a) − Q(si,ai)] (8)
in the formula, si and ai denote the state and action of the ith iteration; Q(si,ai) is the Q value corresponding to si and ai in the ith iteration; α is the learning factor, indicating the degree of confidence given to the newly learned update; γ is the discount factor, determining how strongly returns distant in time affect the current value.
7. The adaptive-adjustment inverter controller according to claim 6, wherein the Q learning algorithm in step 2 is a cyclic iterative process in which the preliminarily optimized controller parameters are obtained through offline learning, comprising the following steps:
S11: initialize all reinforcement learning parameters, i.e. set the discount factor γ, the learning factor α, the probability ε of the ε-greedy strategy and the Q-table matrix convergence set value q, and initialize the Q table as a zero matrix;
S12: observe the state si at the current moment;
S13: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S14: after the action is performed, observe the state si+1 at the next moment;
S15: calculate the reward value Ri under the current action from the reward function, formula (7);
S16: update the Q-table matrix by equation (8);
S17: if the algorithm convergence condition is not met, return to S12; otherwise, end the offline learning.
8. The adaptive-adjustment inverter controller according to claim 6, wherein in step 3 the Q learning algorithm continues to be used for online learning and further updating of the action strategy during operation, the Q learning algorithm comprising the following steps:
S21: observe the current state si;
S22: according to the current state si, select one action ai from the action set represented by formula (5) according to the policy π;
S23: after the action is performed, observe the state si+1 at the next moment;
S24: obtain the reward value Ri under the current action from the reward function;
S25: update the Q-table matrix by equation (8);
S26: set i = i + 1 and return to step S21.
9. The adaptive-tuning inverter controller according to claim 7 or 8, wherein the strategy pi is a greedy strategy.
CN201911165356.XA 2019-11-25 2019-11-25 Self-adaptive adjustment inverter controller Active CN110880774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911165356.XA CN110880774B (en) 2019-11-25 2019-11-25 Self-adaptive adjustment inverter controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911165356.XA CN110880774B (en) 2019-11-25 2019-11-25 Self-adaptive adjustment inverter controller

Publications (2)

Publication Number Publication Date
CN110880774A CN110880774A (en) 2020-03-13
CN110880774B true CN110880774B (en) 2021-01-05

Family

ID=69730201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911165356.XA Active CN110880774B (en) 2019-11-25 2019-11-25 Self-adaptive adjustment inverter controller

Country Status (1)

Country Link
CN (1) CN110880774B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111355234A (en) * 2020-03-18 2020-06-30 国网浙江嘉善县供电有限公司 Micro-grid frequency control method based on reinforcement learning
CN112187074B (en) * 2020-09-15 2022-04-19 电子科技大学 Inverter controller based on deep reinforcement learning
CN112187073B (en) * 2020-09-15 2021-11-09 电子科技大学 Inverter controller with additional damping control
CN113098058B (en) * 2021-04-06 2023-05-02 广东电网有限责任公司电力科学研究院 Self-adaptive optimization control method, device, equipment and medium for moment of inertia
CN113131771B (en) * 2021-04-25 2022-09-27 合肥工业大学 Inverter optimization control method based on reinforcement learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105006834A (en) * 2015-06-10 2015-10-28 合肥工业大学 Optimal virtual inertia control method based on virtual synchronous generator
CN107482939A (en) * 2017-09-08 2017-12-15 中南大学 A kind of inverter control method
CN109067220A (en) * 2018-07-16 2018-12-21 电子科技大学 A kind of circuit control device with damping Real Time Control Function

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105006834A (en) * 2015-06-10 2015-10-28 合肥工业大学 Optimal virtual inertia control method based on virtual synchronous generator
CN107482939A (en) * 2017-09-08 2017-12-15 中南大学 A kind of inverter control method
CN109067220A (en) * 2018-07-16 2018-12-21 电子科技大学 A kind of circuit control device with damping Real Time Control Function

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Cheng Chong et al., "Self-adaptive control method of rotor inertia of a virtual synchronous generator", Automation of Electric Power Systems, 2015-10-10, Vol. 39, No. 19, pp. 82-89 *

Also Published As

Publication number Publication date
CN110880774A (en) 2020-03-13

Similar Documents

Publication Publication Date Title
CN110880774B (en) Self-adaptive adjustment inverter controller
CN112187074B (en) Inverter controller based on deep reinforcement learning
CN110829461B (en) Inverter controller with function of participating in system low-frequency oscillation suppression
CN108418256B (en) Virtual synchronous machine self-adaptive control method based on output differential feedback
CN108493984B (en) Virtual synchronous generator control method suitable for photovoltaic grid-connected system
WO2021110171A1 (en) P-u droop characteristic-based virtual direct current motor control method
CN110212513B (en) Flexible virtual capacitor control method for stabilizing voltage fluctuation of direct-current micro-grid bus
CN106786733A (en) A kind of control method, the apparatus and system of virtual synchronous generator
WO2024021206A1 (en) Method and system for energy storage system control based on grid-forming converter, storage medium, and device
CN116014748A (en) Active support-based low-voltage ride through control method and device for energy storage converter
CN110266044B (en) Microgrid grid-connected control system and method based on energy storage converter
Hua et al. Research on power point tracking algorithm considered spinning reserve capacity in gird-connected photovoltaic system based on VSG control strategy
CN114759575A (en) Virtual synchronous double-fed fan subsynchronous oscillation suppression method and system
CN111193262B (en) Fuzzy self-adaptive VSG control method considering energy storage capacity and SOC constraint
CN110518625B (en) Grid-connected inverter direct-current component suppression method with variable learning rate BP-PID control
CN115903457B (en) Control method of low-wind-speed permanent magnet synchronous wind driven generator based on deep reinforcement learning
Pavković et al. Modeling, parameterization and damping optimum-based control system design for an airborne wind energy ground station power plant
CN116865331A (en) Virtual DC motor low voltage ride through method based on dynamic matrix predictive control
CN108736517B (en) VSG-based inverter type distributed power supply adaptive damping control method
CN110854903B (en) Island microgrid reactive power distribution control method based on self-adaptive virtual impedance
Dehghani et al. Dynamic behavior control of induction motor with STATCOM
CN110247427A (en) A kind of gird-connected inverter resonance intelligence suppressing method of electrical network parameter online recognition
CN112187073B (en) Inverter controller with additional damping control
Ge et al. Adaptive Virtual Synchronous Generator Modulation Strategy Based on Moment of Inertia, Damping Coefficient and Virtual Impedance
CN114696320B (en) New energy power generation equipment self-synchronous voltage source control and low voltage ride through control dual-mode switching control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant