CN115296341A - Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning - Google Patents

Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning

Info

Publication number
CN115296341A
Authority
CN
China
Prior art keywords
power electronic
electronic transformer
feed network
reinforcement learning
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211052731.1A
Other languages
Chinese (zh)
Inventor
邹志翔
汤建
刘星琪
张益�
徐若凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202211052731.1A priority Critical patent/CN115296341A/en
Publication of CN115296341A publication Critical patent/CN115296341A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/38Arrangements for parallely feeding a single network by two or more generators, converters or transformers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/01Arrangements for reducing harmonics or ripples
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J3/00Circuit arrangements for ac mains or ac distribution networks
    • H02J3/24Arrangements for preventing or reducing oscillations of power in networks
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2203/00Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
    • H02J2203/20Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J2300/00Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
    • H02J2300/20The dispersed energy generation being of renewable origin
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02EREDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/40Arrangements for reducing harmonics

Abstract

The invention discloses a power electronic transformer feed network adaptive stability controller based on reinforcement learning, in the technical field of power system control. The controller is used to suppress wide frequency domain instability phenomena within the small-signal stability range of a power electronic transformer feed network system, including filter resonance and multi-time-scale coupling instability between controllers; it optimizes the match between the complex frequency domain impedance of the power electronic transformer and the equivalent impedance of the power grid, improving the stability of the feed network system. The adaptive stability controller is integrated into the low-voltage side converter of the power electronic transformer. Beyond improving the stability of the feed network system, after training the adaptive stability controller automatically adjusts the parameters of the stability control strategy even when the system parameters of the power electronic transformer feed network change or the instability phenomenon differs from those previously encountered, improving both system stability and the robustness of the stability control strategy and allowing flexible application in a variety of scenarios.

Description

Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning
Technical Field
The invention belongs to the technical field of power system control, and particularly relates to a power electronic transformer feed network adaptive stability controller based on reinforcement learning.
Background
At present, distributed renewable energy is connected to the power grid at high penetration, and the traditional feed network faces many challenges, in particular broadband oscillation arising under the background of a high proportion of renewable energy and a high proportion of power electronic equipment ("double high" for short). Harmonics generated by such instability can propagate to other grids through conventional transformers and transmission lines and affect the operation of the whole power system. A power electronic transformer can regulate its output voltage, improving the power quality of the system as well as the reliability and hosting capacity of the grid; introducing power electronic transformers into the traditional grid to build a new type of feed network system is therefore an effective measure against the oscillation problem.
However, with a high proportion of distributed renewable energy, the power electronic transformer feed network still risks instability, especially when the number of converters connected to the low-voltage side of the power electronic transformer increases or the parameters of system elements change. Modifying the control strategy of the power electronic transformer can effectively improve the stability of the feed network system, but the result depends strongly on the choice of stability control strategy and the tuning of its parameters. Considering the uncertainty of power levels and element parameters in the power electronic transformer feed network system and the broadband nature of system oscillations, and facing the varied stability problems caused by multi-time-scale coupling in a "double-high" power system, the robustness of a fixed stability control strategy with fixed control parameters is seriously challenged.
Disclosure of Invention
In view of the defects in the prior art, the present invention aims to provide a power electronic transformer feed network adaptive stability controller based on reinforcement learning, so as to solve the problems noted in the background art.
The purpose of the invention can be realized by the following technical scheme:
the self-adaptive stability controller based on reinforcement learning for the power electronic transformer feed network is used for solving the wide frequency domain instability phenomenon in the small signal stability range of a power electronic transformer feed network system, comprises filter resonance and controller multi-time scale coupling instability phenomenon, optimizes the matching degree of complex frequency domain impedance of a power electronic transformer and power network equivalent impedance, and improves the stability of a power electronic transformer feed network system, the self-adaptive stability controller is integrated in a power electronic transformer low-voltage side converter, and comprises a stability controller and a self-adaptive controller, and the self-adaptive controller adjusts the parameters of the stability controller;
the self-adaptive controller is trained by adopting a reinforcement learning algorithm, the training structure comprises a state observation module, a reinforcement learning algorithm module and a selection action strategy module, and the state observation module, the reinforcement learning algorithm module and the selection action strategy module form an intelligent body and are used for interactive training with a power electronic transformer feed network.
Preferably, the stability controller is based on the principle of complex frequency domain impedance reshaping, and the stability controller reshapes the complex frequency domain impedance of the low-voltage side converter of the power electronic transformer.
Preferably, the control strategies of the stability controller include passive control, active damping, virtual impedance, digital filter and lead-lag compensation control strategies.
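For illustration only, the sketch below shows one member of the strategy families just listed: a lead-lag compensator G(s) = k(1 + s*T1)/(1 + s*T2) discretized with the bilinear (Tustin) transform. The gain, time constants and sampling period are assumed placeholder values, not parameters disclosed in this patent.

```python
# Minimal sketch of a discrete lead-lag stability compensator, one of the
# strategy families listed above. All numeric values are illustrative
# assumptions, not parameters taken from this patent.
class LeadLagCompensator:
    """G(s) = k * (1 + s*T1) / (1 + s*T2), discretized with the Tustin method."""

    def __init__(self, k=1.2, T1=1e-3, T2=2e-4, Ts=1e-4):
        a = 2.0 / Ts                               # bilinear transform: s -> a*(z-1)/(z+1)
        self.b0 = k * (1 + a * T1) / (1 + a * T2)  # feed-forward coefficients
        self.b1 = k * (1 - a * T1) / (1 + a * T2)
        self.a1 = (1 - a * T2) / (1 + a * T2)      # feedback coefficient
        self.u_prev = 0.0                          # previous input sample
        self.y_prev = 0.0                          # previous output sample

    def step(self, u):
        """Filter one input sample and return the compensator output."""
        y = self.b0 * u + self.b1 * self.u_prev - self.a1 * self.y_prev
        self.u_prev, self.y_prev = u, y
        return y
```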
Preferably, the state observation module extracts state information of the power electronic transformer feed network system and, according to this state information, gives the agent a reward for its action;
the reinforcement learning algorithm module trains on the state information of the power electronic transformer feed network system, the action taken by the agent and the reward obtained after the action is taken, and optimizes the action selection strategy;
the action selection strategy module selects an action according to the state information of the power electronic transformer feed network system and then changes the parameters of the stability controller, so that the power electronic transformer feed network system is stabilized.
Preferably, the state information extracted by the state observation module comprises the output voltage, output current, filter inductor current and power of the low-voltage side converter of the power electronic transformer, including their RMS values, average values and harmonic components.
Preferably, the algorithms used in the reinforcement learning algorithm module include Q learning, deep reinforcement learning, DDPG, and Actor-Critic algorithms.
The reinforcement learning method for the power electronic transformer feed network adaptive stability controller comprises the following steps:
firstly, the quantities relevant to the reinforcement learning algorithm are initialized, including the learning rate, the discount coefficient, the parameters of the neural network and the training convergence flag;
then training proceeds episode by episode. In each episode, the parameters of the power electronic transformer feed network system are initialized and the state observation module acquires the current state information S_t; the action selection strategy module selects an action a_t according to the current state information and updates the parameters of the stability control strategy; after the parameters change, the system state changes correspondingly, and the state observation module observes the new system state S_{t+1} and acquires the reward R_t for taking action a_t in state S_t;
then the reinforcement learning algorithm updates the action selection strategy according to this information. After the update, it is checked whether the episode can end, either because the state has reached the target state or because the number of action steps has reached the set maximum; if the end condition is not met, the updated action selection strategy module selects an action a_{t+1} according to state S_{t+1}, and the preceding process is repeated until the episode ends. After each episode, the convergence status of the agent is checked to decide whether training can end;
and finally, if the number of training episodes reaches the set value or the agent satisfies the convergence condition, training terminates; otherwise, the parameters of the power electronic transformer feed network are re-initialized and each training process is repeated until the termination condition is reached. The final optimized action selection strategy is obtained when training finishes.
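For concreteness, the sketch below shows the kind of value update the steps above imply when tabular Q-learning is the chosen algorithm; the state/action encoding, table size, learning rate and discount coefficient are illustrative assumptions rather than values specified by this patent.

```python
import numpy as np

# Minimal tabular Q-learning update, assuming states and actions have been
# discretized into small index sets. alpha (learning rate) and gamma (discount
# coefficient) correspond to the quantities initialized above; all sizes and
# values here are illustrative assumptions.
N_STATES, N_ACTIONS = 20, 5
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma = 0.1, 0.95

def q_update(s_t, a_t, r_t, s_next):
    """One value update after taking action a_t in state s_t and receiving reward r_t."""
    td_target = r_t + gamma * Q[s_next].max()
    Q[s_t, a_t] += alpha * (td_target - Q[s_t, a_t])
```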
Preferably, after training is completed, the state observation module and the action selection strategy module are transplanted into the adaptive controller, when the power electronic transformer feed network system is unstable, the state observation module identifies the state of the power electronic transformer feed network system, and the action selection module adjusts parameters of the stable controller according to the current system state so as to stabilize the power electronic transformer feed network system.
The invention has the beneficial effects that:
1. The adaptive stability controller provided by the invention not only improves the stability of the feed network system; after training, it also automatically adjusts the parameters of the stability control strategy even when the parameters of the power electronic transformer feed network system change or the instability phenomenon differs from those previously encountered, improving both system stability and the robustness of the stability control strategy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the prior-art solutions, the drawings used in describing the embodiments or the prior art are briefly introduced below; those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a block diagram of one application of the present invention;
FIG. 2 is a control block diagram of a low-side converter of a power electronic transformer according to the present invention;
FIG. 3 is a block diagram of adaptive stability controller training in accordance with the present invention;
FIG. 4 is a flow chart of the adaptive stability controller training in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention addresses wide frequency domain instability phenomena (including but not limited to filter resonance and multi-time-scale coupling instability between controllers) within the small-signal stability range of a power electronic transformer feed network system, and adds a stability controller to the control of the low-voltage side converter of the power electronic transformer to improve the stability of the feed network system. A reinforcement learning algorithm makes the parameter adjustment of the stability controller adaptive: after training, the adaptive controller automatically adjusts the parameters of the stability controller according to the externally input state information so as to maintain the stability of the feed network system under different operating conditions and in different scenarios.
Apart from the stability control, the controller typically includes a voltage control stage and a current control stage; the voltage and phase-angle reference values may be set directly or generated from power control through a phase-locking mechanism. The voltage reference is compared with the measured voltage and processed by the voltage controller and the stability controller to produce the current reference; the current reference is compared with the measured current and processed by the current controller, whose output regulates the output voltage of the power electronic transformer.
In the power electronic transformer feed network adaptive stability controller based on reinforcement learning, an adaptive stability controller is added to the control of the low-voltage side converter of the power electronic transformer to reshape the equivalent complex frequency domain impedance of the converter, optimize the match between the low-voltage side converter impedance and the low-voltage power grid, and improve the stability of the feed network system. The adaptive stability controller is integrated with the voltage controller of the low-voltage side converter of the power electronic transformer. To improve the robustness of the stability control strategy, a reinforcement learning algorithm is used to make the parameter adjustment adaptive: after training, the adaptive controller 9 automatically adjusts the parameters of the stability controller 7 according to changes in the externally input state information, so as to maintain the stability of the feed network system under different operating conditions and in different scenarios.
Adaptive stability control is realized through reinforcement learning training, and the training framework comprises a state observation module, a reinforcement learning algorithm module and an action selection strategy module. Together these three modules form a reinforcement learning agent, which is trained through interaction with the power electronic transformer feed network; through training, the agent ultimately selects the optimal action according to the system state. The state observation module extracts the current system state information and gives a reward for the action according to that information; the reinforcement learning algorithm module trains on the system state, the action taken by the agent and the reward obtained after the action, and updates the action selection strategy module. The action selection strategy module selects a suitable action according to the current state and changes the parameters of the stability controller accordingly.
Fig. 1 is a structural diagram of one application of the invention, comprising the medium-voltage power grid 1 on the medium-voltage side of the power electronic transformer, the power electronic transformer 2, the filter 3, the low-voltage alternating-current bus 4, and the new energy power generation converter group 5 connected to the low-voltage side of the power electronic transformer;
the power electronic transformer 2 is formed by cascading an AC/DC rectifier 2-1, an isolated DC/DC converter 2-2 and a DC/AC inverter 2-3, the isolated DC/DC converter 2-2 can decouple the medium-voltage side and the low-voltage side of the power electronic transformer, and a capacitor is connected to the direct-current side of the power electronic transformer to reduce voltage ripples. The LC filter 3 is connected to the low-voltage side of the power electronic transformer and filters harmonic components;
the new energy power generation converter group 5 is connected behind the filter 3, the control can adopt current control, power control, voltage control, droop control and the like, the centralized control can be realized by adopting a stability control strategy on a low-voltage side converter of the power electronic transformer coupled with the new energy power generation converter group 5, a stability control module is not required to be added on all new energy converters, and functions related to an artificial intelligence algorithm in the new energy converters can also be realized by the low-voltage side of the power electronic transformer in most occasions.
The control strategy of the low-voltage side converter of the power electronic transformer is shown in Fig. 2: a classic dual-loop control strategy implemented in the rotating (dq-axis) coordinate system, comprising a voltage controller 6 (VC), a current controller 8 (CC) and other elements;
without a stability control strategy, the measured voltage obtained through coordinate transformation is compared with its reference and fed to the voltage controller 6, which directly outputs the current reference; the current reference is compared with the measured dq-axis current and fed to the current controller 8, whose output regulates the output voltage through PWM. The voltage controller 6 and the current controller 8 may use PI control to ensure tracking without steady-state error. The adaptive stability controller comprises the stability controller 7 and the adaptive controller 9; the stability controller, based on the impedance reshaping principle, may adopt control strategies such as active damping, virtual impedance, digital filters and lead-lag compensation;
because the problem of wide frequency domain stability is frequently encountered in a 'double-high' power system, the stabilizing controller 7 is connected in series behind the voltage controller 6 to reshape the complex frequency domain impedance of the converter and improve the stability of the feed network system, the parameters of the stabilizing controller 7 are adjusted by the adaptive controller 9, the input quantity of the stabilizing controller is a certain state variable of the system, the input quantity is selected as a filter inductance current in fig. 2, and the output quantity is the parameters of the stabilizing controller, and through the adaptive controller 9, the system can realize the adaptive adjustment of the parameters of the stabilizing controller 7 according to the system state, thereby improving the robustness of a stabilizing control strategy. While an example of an application is shown and described, in practical applications, the stabilizing controller may also be integrated into the original control system in a parallel or series-parallel manner, and the access position is not limited to controlling the forward channel, and such changes and modifications fall within the scope of the claimed invention.
The control of the low-voltage side converter of the power electronic transformer may also be implemented in the two-phase stationary (αβ-axis) coordinate system; in that case, the voltage controller 6 and the current controller 8 adopt PR control to achieve control without steady-state error.
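A schematic, non-authoritative rendering of the control path just described is sketched below: voltage PI loop, a stability controller in series on the forward channel, then the current PI loop, with the adaptive controller retuning the stabilizer from the filter inductor current. The gains, sampling period and retuning rule are assumptions for illustration, not values or logic taken from this patent.

```python
# Sketch of the dq-axis control path of Fig. 2: voltage PI (VC 6), a stability
# controller (7) in series on the forward channel, then current PI (CC 8), with
# the adaptive controller (9) retuning the stabilizer from the filter inductor
# current. All gains, the sample time and the retuning rule are assumptions.
class PI:
    """Discrete PI controller with forward-Euler integration."""
    def __init__(self, kp, ki, Ts):
        self.kp, self.ki, self.Ts = kp, ki, Ts
        self.integ = 0.0

    def step(self, err):
        self.integ += self.ki * err * self.Ts
        return self.kp * err + self.integ

Ts = 1e-4
voltage_ctrl = PI(kp=0.5, ki=50.0, Ts=Ts)   # voltage controller 6 (assumed gains)
current_ctrl = PI(kp=2.0, ki=200.0, Ts=Ts)  # current controller 8 (assumed gains)
stab_gain = 1.0                             # stand-in parameter of stability controller 7

def control_step(v_ref, v_meas, i_meas):
    """One dq-axis control period; returns the modulation reference passed to PWM."""
    i_ref = voltage_ctrl.step(v_ref - v_meas)  # outer voltage loop
    i_ref = stab_gain * i_ref                  # stability controller 7 in series (a plain gain here)
    return current_ctrl.step(i_ref - i_meas)   # inner current loop

def adaptive_update(i_Lf_samples):
    """Stand-in for adaptive controller 9: retune the stabilizer parameter from a
    feature of the filter inductor current (placeholder rule, not the trained policy)."""
    global stab_gain
    ripple = max(i_Lf_samples) - min(i_Lf_samples)
    stab_gain = 1.0 if ripple < 1.0 else 0.8
```

In a real implementation the stability controller 7 would be a dynamic block such as the lead-lag sketch shown earlier rather than a plain gain; the point of the sketch is only the placement of the blocks in the forward channel.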
FIG. 3 is the training block diagram of the adaptive stability controller of the present invention. Training employs a reinforcement learning algorithm, and the environment is the power electronic transformer feed network system whose stability is controlled by the adaptive stability controller.
The training framework comprises a state observation module, a reinforcement learning algorithm module and an action selection strategy module. Together these three modules form a reinforcement learning agent, which is trained through continuous interaction with the power electronic transformer feed network so that it can select the optimal action according to the state.
The state observation module extracts the current system state information, including but not limited to the harmonic orders and harmonic amplitudes of the current, voltage or power waveforms, and gives a reward for the action according to the current state information (a minimal sketch of such feature extraction is given below);
the reinforcement learning algorithm is trained according to the system state, the action taken by the intelligent agent and the reward obtained after the action is taken, the strategy of selecting the action is continuously updated, and the training algorithm can adopt Q learning, deep reinforcement learning, DDPG or Actor-Critic algorithm and the like;
the action selection strategy module selects a proper action according to the current state to change the parameters of the stable controller.
Fig. 4 is a flowchart of the training method of the power electronic transformer feed network adaptive stability controller based on reinforcement learning. The training proceeds as follows:
Firstly, the quantities relevant to the reinforcement learning algorithm are initialized, including the learning rate, the discount coefficient, the parameters of the neural network and the training convergence flag.

Then training proceeds episode by episode. Considering that the parameters of system elements may change in actual operation, the element parameters may, where necessary, be assigned randomly according to realistic conditions during training to increase the robustness of the adaptive control algorithm. In each episode, the parameters of the power electronic transformer feed network system are initialized and the state observation module acquires the current state information S_t; the action selection strategy module selects an action a_t according to the current state information and updates the parameters of the stability control strategy; after the parameters change, the system state changes correspondingly, and the state observation module observes the new system state S_{t+1} and acquires the reward R_t for taking action a_t in state S_t.

Then the reinforcement learning algorithm updates the action selection strategy according to this information. After the update, it is checked whether the episode can end, either because the state has reached the target state or because the number of action steps has reached the set maximum; if the end condition is not met, the updated action selection strategy module selects an action a_{t+1} according to state S_{t+1}, and the preceding process is repeated until the episode ends. After each episode, the convergence status of the agent is checked to decide whether training can end: if the number of training episodes reaches the set value or the agent satisfies the convergence condition, training terminates; otherwise, the parameters of the power electronic transformer feed network are re-initialized and the training process is repeated until the termination condition is reached. The final optimized action selection strategy is obtained when training finishes.
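Rendering the flow of Fig. 4 as a minimal episode loop: the environment interface (reset/apply/observe), the epsilon-greedy exploration and all hyperparameter values below are assumptions standing in for whichever algorithm (Q-learning, DDPG, Actor-Critic, etc.) is actually chosen.

```python
import random

# Episode-style training loop following the flow of Fig. 4. The env/agent
# interfaces, epsilon-greedy policy and hyperparameters are illustrative
# assumptions; the agent could implement Q-learning, DDPG, Actor-Critic, etc.
MAX_EPISODES, MAX_STEPS, EPSILON = 500, 50, 0.1

def train(env, agent):
    for episode in range(MAX_EPISODES):
        env.reset_with_random_parameters()        # re-initialize feed network parameters
        s_t = env.observe()                       # state observation module
        for _ in range(MAX_STEPS):
            if random.random() < EPSILON:         # exploration
                a_t = agent.random_action()
            else:                                 # exploitation of the current policy
                a_t = agent.best_action(s_t)
            env.apply_action(a_t)                 # update stability controller parameters
            s_next, r_t = env.observe(), env.reward()
            agent.update(s_t, a_t, r_t, s_next)   # policy / value update
            s_t = s_next
            if env.reached_target_state():        # target state reached: end the episode
                break
        if agent.converged():                     # training convergence flag
            break
    return agent
```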
After training is finished, the state observation module and the action selection strategy module are transplanted into the adaptive controller. When the power electronic transformer feed network system becomes unstable, the state observation module identifies the state of the feed network system, and the action selection module adjusts the parameters of the stability controller according to the current system state so as to restore the stability of the power electronic transformer feed network system.
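The online use described above then reduces to a single adaptation step per observation window; in the sketch below, `observe`, `agent` and `stabilizer` are assumed interfaces (for example, the sketches given earlier in this description), not the patent's own implementation.

```python
# Online use of the trained modules inside the adaptive controller. `observe`,
# `agent` and `stabilizer` are assumed interfaces, not the patent's implementation.
def adaptive_control_step(waveform_samples, observe, agent, stabilizer):
    """One online adaptation step: observe the system state, select an action with
    the trained policy (no exploration online), and retune the stability controller."""
    rms, mean, harmonics = observe(waveform_samples)  # state observation module
    state = agent.encode(rms, mean, harmonics)        # map features into the policy's state space
    action = agent.best_action(state)                 # action selection strategy module
    stabilizer.set_parameters(action)                 # adjust the stability controller parameters
    return action
```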
In the description herein, references to the description of "one embodiment," "an example," "a specific example," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description merely illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention.

Claims (8)

1. A power electronic transformer feed network adaptive stability controller based on reinforcement learning, used to suppress wide frequency domain instability phenomena within the small-signal stability range of a power electronic transformer feed network system, including filter resonance and multi-time-scale coupling instability between controllers, to optimize the match between the complex frequency domain impedance of the power electronic transformer and the equivalent impedance of the power grid, and to improve the stability of the feed network system, characterized in that the adaptive stability controller is integrated into the low-voltage side converter of the power electronic transformer, comprises a stability controller and an adaptive controller, and the adaptive controller adjusts the parameters of the stability controller;
the self-adaptive controller is trained by adopting a reinforcement learning algorithm, the training structure comprises a state observation module, a reinforcement learning algorithm module and a selection action strategy module, and the state observation module, the reinforcement learning algorithm module and the selection action strategy module form an intelligent body and are used for interactive training with a power electronic transformer feed network.
2. The power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 1, characterized in that the stability controller is based on the principle of complex frequency domain impedance reshaping and reshapes the complex frequency domain impedance of the low-voltage side converter of the power electronic transformer.
3. The power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 1, characterized in that the control strategies of the stability controller include passive control, active damping, virtual impedance, digital filter and lead-lag compensation control strategies.
4. The power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 1, characterized in that the state observation module extracts state information of the power electronic transformer feed network system and, according to this state information, gives the agent a reward for its action;
the reinforcement learning algorithm module trains on the state information of the power electronic transformer feed network system, the action taken by the agent and the reward obtained after the action is taken, and optimizes the action selection strategy;
the action selection strategy module selects an action according to the state information of the power electronic transformer feed network system and then changes the parameters of the stability controller, so that the power electronic transformer feed network system is stabilized.
5. The power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 4, characterized in that the state information extracted by the state observation module comprises the output voltage, output current, filter inductor current and power of the low-voltage side converter of the power electronic transformer, including their RMS values, average values and harmonic components.
6. The power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 4, characterized in that the algorithms used in the reinforcement learning algorithm module include Q-learning, deep reinforcement learning, DDPG and Actor-Critic algorithms.
7. A reinforcement learning method for the power electronic transformer feed network adaptive stability controller based on reinforcement learning according to any one of claims 1-6, characterized in that the reinforcement learning method is as follows:
firstly, the quantities relevant to the reinforcement learning algorithm are initialized, including the learning rate, the discount coefficient, the parameters of the neural network and the training convergence flag;
then training proceeds episode by episode. In each episode, the parameters of the power electronic transformer feed network system are initialized and the state observation module acquires the current state information S_t; the action selection strategy module selects an action a_t according to the current state information and updates the parameters of the stability control strategy; after the parameters change, the system state changes correspondingly, and the state observation module observes the new system state S_{t+1} and acquires the reward R_t for taking action a_t in state S_t;
then the reinforcement learning algorithm updates the action selection strategy according to this information. After the update, it is checked whether the episode can end, either because the state has reached the target state or because the number of action steps has reached the set maximum; if the end condition is not met, the updated action selection strategy module selects an action a_{t+1} according to state S_{t+1}, and the preceding process is repeated until the episode ends. After each episode, the convergence status of the agent is checked to decide whether training can end;
and finally, if the number of training episodes reaches the set value or the agent satisfies the convergence condition, training terminates; otherwise, the parameters of the power electronic transformer feed network are re-initialized and each training process is repeated until the termination condition is reached. The final optimized action selection strategy is obtained when training finishes.
8. The reinforcement learning method for the power electronic transformer feed network adaptive stability controller based on reinforcement learning according to claim 7, characterized in that, after training is completed, the state observation module and the action selection strategy module are transplanted into the adaptive controller; when the power electronic transformer feed network system becomes unstable, the state observation module identifies the system state of the power electronic transformer feed network, and the action selection module adjusts the parameters of the stability controller according to the current system state so as to stabilize the power electronic transformer feed network system.
CN202211052731.1A 2022-08-30 2022-08-30 Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning Pending CN115296341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052731.1A CN115296341A (en) 2022-08-30 2022-08-30 Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211052731.1A CN115296341A (en) 2022-08-30 2022-08-30 Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN115296341A true CN115296341A (en) 2022-11-04

Family

ID=83832738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052731.1A Pending CN115296341A (en) 2022-08-30 2022-08-30 Power electronic transformer feed network self-adaptive stability controller based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN115296341A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852143A (en) * 2020-06-28 2021-12-28 北京小米移动软件有限公司 Electric energy management method and device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination