CN101667012A - Method for controlling reinforcement learning adaptive proportion integration differentiation-based distribution static synchronous compensator - Google Patents
- Publication number
- CN101667012A (application CN 200810051135 / CN200810051135A)
- Authority
- CN
- China
- Prior art keywords
- voltage
- value
- reinforcement learning
- proportion integration
- integration differentiation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Feedback Control In General (AREA)
Abstract
The invention discloses a method for controlling a distribution static synchronous compensator based on reinforcement learning adaptive proportion integration differentiation (PID), which comprises the following steps: deducing the current-voltage conversion equation in the d-q coordinate system from the instantaneous power balance formula; in the control process, regulating the error between the given voltage value and the measured value, and the error between the DC-side capacitor voltage command value and the measured value, with the reinforcement learning adaptive PID control algorithm so as to obtain the active and reactive command signals; performing the voltage-current conversion in the mathematical model to obtain the voltage signals; and carrying out the coordinate transformation to obtain the required voltage modulation signals. By using the reinforcement learning adaptive PID control algorithm, the method avoids the unstable controller performance caused by equivalent-parameter variation in conventional PID control, realizes the adaptive capability of the controller, and improves the control accuracy.
Description
Technical field
The present invention relates to a control method for a distribution static synchronous compensator, and in particular to a control method for a distribution static synchronous compensator (DSTATCOM) based on reinforcement learning adaptive PID.
Background technology
With the development of science and technology, China's power system has developed rapidly. Inductive loads occupy a large proportion of industrial and household electricity loads, and inductive loads must absorb reactive power to operate normally; at the same time, the large harmonic currents these loads produce also consume reactive power. The consumption of large amounts of reactive power causes a series of power quality problems in the distribution network, such as voltage fluctuation, flicker and three-phase imbalance. The distribution system also contains many rapidly varying impact loads, such as electric arc furnaces, which can cause voltage flicker and unbalance the system power and voltage. On the other hand, with the rapid development of the national economy and of science and technology, all industries place ever higher requirements on power quality; in particular, with the widespread application of electronic devices and precision equipment, users expect power supply enterprises to provide efficient, high-quality electric energy. Once power quality problems occur, they cause equipment failure in mild cases and damage to the whole system in severe cases, and the resulting losses are difficult to estimate. Power quality therefore concerns the safe, stable, economical and reliable operation of the entire power system and its equipment, as well as the overall benefit and development strategy of the national economy. Just as power systems urgently needed advanced transmission and distribution technology to improve power quality and system stability, with the rapid development of power electronics and modern control technology, a new technology for transforming transmission and distribution capability, the Flexible AC Transmission System (FACTS), quietly arose. The distribution static synchronous compensator (DSTATCOM) represents the development trend of reactive power compensation devices in future power systems; by combining power electronics with modern control technology, it can comprehensively solve multiple power quality problems in the distribution network. Conventional controller design for a distribution network STATCOM is based on a local linearization model; because of the nonlinearity of the DSTATCOM model and the uncertainty of its equivalent parameters, its control is difficult and complex. At present traditional PID control is most widely used; with PID control, when the equivalent parameters are measured inaccurately or change, controller performance degrades or even becomes unstable, and in serious cases control misoperation can burn out the DSTATCOM device. Realizing the adaptive ability of the DSTATCOM controller is therefore of great importance.
Summary of the invention
The technical problem to be solved by the invention is to provide a distribution static synchronous compensator control method based on reinforcement learning adaptive PID with good dynamic and static properties, so as to realize the adaptive ability of the DSTATCOM controller.
To solve the above technical problem, the invention provides a distribution static synchronous compensator control method based on reinforcement learning adaptive proportion integration differentiation. According to the instantaneous power balance principle, the mathematical model of the distribution static synchronous compensator is established and transformed from the stationary frame into the dq0 coordinate system by a transformation matrix, showing that the distribution static synchronous compensator system is a typical two-input, two-output coupled nonlinear system. The method is characterized in that: the error between the voltage command value and the actual measured value is regulated by reinforcement learning adaptive proportion integration differentiation to form the reactive command current signal; the error between the DC capacitor voltage command value and the actual measured value is regulated by reinforcement learning adaptive proportion integration differentiation to form the active command current signal. The reactive and active command current signals are transformed through the voltage-current relation in the mathematical model to form the reactive and active command voltage signals, which, after the dq/abc coordinate transform, serve as modulation signals; after triangular-carrier modulation, pulse-width modulation (PWM) drive signals are produced to control the action of the Intelligent Power Module and generate the voltage to be compensated, thereby keeping the DC capacitor voltage and the point-of-common-coupling (PCC) voltage constant.
The present invention introduces the reinforcement learning adaptive PID control algorithm into the control method of the DSTATCOM. In the control process, the reinforcement learning algorithm trains and learns K_P, K_I and K_D, and a decoupling requirement is added to the learning, so that the control system can automatically adjust the K_P, K_I and K_D values according to variations of the model parameters and reach a satisfactory control result.
The DSTATCOM control method based on reinforcement learning adaptive PID of the present invention keeps the DC capacitor voltage and the system node voltage constant and realizes effective reactive power compensation. The control algorithm derives the current-voltage model in the dq0 coordinate system from the instantaneous power balance principle, and proposes a reinforcement learning adaptive PID control algorithm suited to this control system. The method avoids the unstable controller performance that occurs in traditional PID control when the equivalent parameters change, realizes the adaptive capability of the controller, and improves the control accuracy.
Description of drawings
Fig. 1 is the DSTATCOM main circuit structure diagram of the present invention;
Fig. 2 is the control system schematic diagram of the present invention;
Fig. 3 is the control algorithm flow chart of the reinforcement learning adaptive PID.
Embodiment
With reference to Fig. 1, u denotes the three-phase voltage of the grid; e and i denote the three-phase output voltage and current of the DSTATCOM, respectively; the resistance R and inductance L represent the device losses, the line reactance and the leakage reactance of the coupling transformer. The system three-phase voltage is assumed to be given by formula (1), and the DSTATCOM output voltage by formula (2), where K is the voltage-ratio coefficient and δ, the controlled quantity, is the phase angle between the DSTATCOM output voltage u_c and the system voltage u_s.
The current-voltage conversion formulas in the dq0 coordinate system are obtained from the instantaneous power balance as follows. According to the power balance principle, the DSTATCOM output power equals the power injected into the system plus the power consumed by the equivalent resistance and reactance, that is:

P_e = P_o + P_f    (3)

Q_e = Q_o + Q_f    (4)

Selecting the d-axis of the synchronous rotating frame to coincide with the voltage vector at the PCC access point gives:

u_d = u,  u_q = 0    (5)

Substituting formula (5) into the power balance formula yields:

e_d = i_d·R - i_q·ωL + u
e_q = i_q·R + i_d·ωL    (6)

The formulas above realize the conversion from the currents i_d, i_q to the voltages e_d, e_q in the d-q coordinate system.
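Equation (6) can be checked numerically. In the sketch below the parameter values (R, L, ω, u) and the current operating point are illustrative assumptions, not values taken from the patent:

```python
import math

# Numeric check of equation (6): converting the d-q axis currents
# (i_d, i_q) into the DSTATCOM command voltages (e_d, e_q).
# All numeric values below are illustrative assumptions.

def dq_current_to_voltage(i_d, i_q, R, L, omega, u):
    """e_d = i_d*R - i_q*omega*L + u ;  e_q = i_q*R + i_d*omega*L"""
    e_d = i_d * R - i_q * omega * L + u
    e_q = i_q * R + i_d * omega * L
    return e_d, e_q

e_d, e_q = dq_current_to_voltage(i_d=10.0, i_q=-5.0, R=0.1, L=2e-3,
                                 omega=2 * math.pi * 50, u=311.0)
```

Note how the cross-coupling terms i_q·ωL and i_d·ωL tie the two channels together; this is the coupling that the decoupling requirement in the learning process later addresses.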
As can be seen from the two formulas above, the formation of the DSTATCOM voltage control commands e_d, e_q is closely related to the equivalent parameters R and L. If conventional PID control is used to form i_q* and i_d*, controller performance is excellent when the equivalent parameters are measured accurately and remain unchanged; however, the equivalent parameters are difficult to measure accurately and may vary with operating conditions and with device aging over long-term operation, which makes them uncertain. To improve the robustness of the controller against disturbances of the system's major parameters and to realize the adaptive ability of the system, the present invention proposes a DSTATCOM control method based on reinforcement learning adaptive PID.
With reference to Fig. 2, the error between the voltage command value U_pcc* and the actual measured value U_abc is regulated by reinforcement learning PID to form the reactive command current signal i_q*; the error between the DC capacitor voltage command value U_dc* and the actual measured value U_dc is regulated by reinforcement learning PID to form the active command current signal i_d*. i_q* and i_d* are transformed through the voltage-current relation of the mathematical model to form the reactive command voltage signal e_q* and the active command voltage signal e_d*. After the dq/abc coordinate transform, e_q* and e_d* serve as modulation signals; after triangular-carrier modulation, pulse-width modulation (PWM) drive signals are produced to control the action of the Intelligent Power Module, generating the voltage to be compensated and thereby keeping the DC capacitor voltage and the point-of-common-coupling (PCC) voltage constant.
Introduction to the reinforcement learning adaptive PID control algorithm:
Reinforcement learning is a machine learning method that adapts to its environment using environmental feedback as input; trial-and-error search and delayed reward are its two key characteristics. Among reinforcement learning algorithms, Q-learning performs comparatively well. In Q-learning, the action-value function Q(s, a) being learned is defined as the maximum discounted cumulative reward obtainable when starting from state s and taking a as the first action: the Q value equals the immediate reward for executing the action, plus the value (discounted by γ) of following the optimal policy from the resulting state. After the reinforcement learning process is finished, the system obtains the relatively optimal action for each state through the mapping of the Q matrix.
Traditional PID parameter tuning uses fixed formulas; the tuning effect is not ideal and adapts poorly to a changing power grid environment. Applying the reinforcement learning algorithm to PID parameter tuning gives the PID parameters better adaptability. The Q-learning algorithm adjusts the PID parameters K_P, K_I and K_D as follows:
1. Initialize the Q value matrix of the parameters; the Q value matrix records each state s and the cumulative reward expected from selecting action a in that state. Here a state is the current K_P, K_I and K_D values, and an action a adjusts these parameter values and outputs the corresponding K_P, K_I and K_D values. At the same time, initialize the learning factor α and the discount factor γ;
2. Select and execute an action a according to the current state s and the action selection policy π;
3. Calculate the reward r from the input K_P, K_I and K_D values, and enter the new state s_1;
4. Update the Q value using the formula Q(s, a) = (1 - α)·Q(s, a) + α·(r + γ·max Q(s', a')), where α is the learning factor; the learning factor decreases gradually as the number of learning iterations increases and finally reaches zero, which marks the end of the learning process, since the Q value is then no longer updated. As can be seen from the formula, the Q-learning iteration is off-policy: it always takes the maximum Q value as the iteration input. After repeated iteration, Q(s_t, a_t) gradually approaches the ideal Q(s, a). Adjust the learning factor and return to step 2 until the learning factor is 0;
5. Output the resulting K_P, K_I and K_D values;
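Steps 1-5 can be sketched as a toy Q-learning loop. Everything below is an illustrative assumption rather than the patent's implementation: the discrete candidate gain values, the state encoded as an index triple into those value lists, actions that nudge one gain up or down, and a reward computed by running an incremental PID loop on a simple first-order plant as a stand-in for the real DSTATCOM dynamics:

```python
import random

# Toy Q-learning loop for tuning (K_P, K_I, K_D); all numeric choices
# (gain grids, plant, episode count, epsilon-greedy policy) are assumptions.

KP_VALS = [0.2, 0.5, 1.0]
KI_VALS = [0.05, 0.1, 0.2]
KD_VALS = [0.0, 0.01, 0.05]
ACTIONS = [(axis, d) for axis in range(3) for d in (-1, 1)]  # (which gain, +/-1)

def step_state(state, action):
    """Apply an action: move one gain index up or down, clamped to the grid."""
    axis, d = action
    sizes = (len(KP_VALS), len(KI_VALS), len(KD_VALS))
    s = list(state)
    s[axis] = min(max(s[axis] + d, 0), sizes[axis] - 1)
    return tuple(s)

def reward(state):
    """Negative accumulated |error| of an incremental PID run on y' = -y + u."""
    kp, ki, kd = KP_VALS[state[0]], KI_VALS[state[1]], KD_VALS[state[2]]
    y, u, e1, e2, cost = 0.0, 0.0, 0.0, 0.0, 0.0
    for _ in range(50):                     # setpoint 1.0, time step 0.1
        e = 1.0 - y
        u += kp * e + ki * (e - e1) + kd * (e - 2 * e1 + e2)  # step-6 pairing
        e2, e1 = e1, e
        y += 0.1 * (-y + u)
        cost += abs(e)
    return -cost

Q = {}                              # the Q "matrix" as a dict keyed by (state, action)
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning factor, discount factor, exploration
state = (0, 0, 0)
random.seed(0)
for _ in range(400):
    # step 2: choose an action with an epsilon-greedy selection policy pi
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda act: Q.get((state, act), 0.0))
    s1 = step_state(state, a)       # step 3: act, enter the new state...
    r = reward(s1)                  # ...and compute its reward
    # step 4: Q(s,a) = (1-alpha)*Q(s,a) + alpha*(r + gamma*max Q(s',a'))
    best_next = max(Q.get((s1, act), 0.0) for act in ACTIONS)
    Q[(state, a)] = (1 - alpha) * Q.get((state, a), 0.0) + alpha * (r + gamma * best_next)
    state = s1
    alpha *= 0.99                   # the learning factor decays toward zero

best = max(Q, key=Q.get)[0]         # step 5: read off the preferred gain triple
```

The dict-based Q table stands in for the Q value matrix of step 1; in the patent's setting the reward would be computed from the closed-loop DSTATCOM response rather than from a toy plant.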
6. The PID controller adopts the incremental PID control algorithm. The control increment is:

Δu(k) = K_P·X(1) + K_I·X(2) + K_D·X(3)

where: X(1) = e(k); X(2) = e(k) - e(k-1); X(3) = e(k) - 2e(k-1) + e(k-2); and

u(k) = u(k-1) + Δu(k)
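The incremental PID law of step 6 can be transcribed directly; the class name and interface below are illustrative, while the X(1..3) terms and the update u(k) = u(k-1) + Δu(k) follow the formulas as stated:

```python
# Incremental PID controller implementing the step-6 formulas verbatim,
# including the stated pairing of X(1..3) with K_P, K_I and K_D.

class IncrementalPID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0   # e(k-1)
        self.e2 = 0.0   # e(k-2)
        self.u = 0.0    # u(k-1)

    def update(self, e):
        x1 = e                              # X(1) = e(k)
        x2 = e - self.e1                    # X(2) = e(k) - e(k-1)
        x3 = e - 2 * self.e1 + self.e2      # X(3) = e(k) - 2e(k-1) + e(k-2)
        du = self.kp * x1 + self.ki * x2 + self.kd * x3
        self.u += du                        # u(k) = u(k-1) + du(k)
        self.e2, self.e1 = self.e1, e
        return self.u

pid = IncrementalPID(kp=1.0, ki=0.5, kd=0.1)   # assumed gain values
u1 = pid.update(2.0)   # first call: e(k-1) = e(k-2) = 0
u2 = pid.update(1.0)
```

Because only the increment Δu(k) is computed each step, the controller keeps just two past errors as state, which is what makes it convenient to re-parameterize on the fly with the gains the Q-learning loop outputs.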
In the training and learning process of the reinforcement learning adaptive PID control algorithm, a decoupling-control requirement is added, so that the result of the training and learning also satisfies the decoupling-control function and the control accuracy of the system is improved.
Claims (2)
1. A distribution static synchronous compensator control method based on reinforcement learning adaptive proportion integration differentiation, in which the mathematical model of the distribution static synchronous compensator is first established according to the instantaneous power balance principle and transformed from the stationary frame into the dq0 coordinate system by a transformation matrix, characterized in that: the error between the voltage command value and the actual measured value is regulated by the reinforcement learning adaptive proportion integration differentiation control algorithm to form the reactive command current signal; the error between the DC capacitor voltage command value and the actual measured value is regulated by reinforcement learning adaptive proportion integration differentiation to form the active command current signal; after the reactive and active command current signals are obtained, they are transformed through the voltage-current relation in the mathematical model to form the reactive and active command voltage signals, which, after the dq/abc coordinate transform, serve as modulation signals; after triangular-carrier modulation, pulse-width modulation (PWM) drive signals are produced to control the action of the Intelligent Power Module and generate the voltage to be compensated.
2. The distribution static synchronous compensator control method based on reinforcement learning adaptive proportion integration differentiation according to claim 1, characterized in that the reinforcement learning adaptive proportion integration differentiation control algorithm is implemented as follows:
(1) initialize the Q value matrix of the parameters; the Q value matrix records each state s and the cumulative reward expected from selecting action a in that state;
(2) select and execute an action a according to the current state s and the action selection policy π;
(3) calculate the reward r from the input K_P, K_I and K_D values, and enter the new state s_1;
(4) update the Q value using the formula Q(s, a) = (1 - α)·Q(s, a) + α·(r + γ·max Q(s', a')); after repeated iteration, Q(s_t, a_t) gradually approaches Q(s, a); adjust the learning factor and return to step (2) until the learning factor is 0;
(5) output the resulting K_P, K_I and K_D values;
(6) feed the output K_P, K_I and K_D values into the adaptive proportion integration differentiation controller, and control with the incremental adaptive proportion integration differentiation control algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200810051135 CN101667012A (en) | 2008-09-03 | 2008-09-03 | Method for controlling reinforcement learning adaptive proportion integration differentiation-based distribution static synchronous compensator |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101667012A true CN101667012A (en) | 2010-03-10 |
Family
ID=41803657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200810051135 Pending CN101667012A (en) | 2008-09-03 | 2008-09-03 | Method for controlling reinforcement learning adaptive proportion integration differentiation-based distribution static synchronous compensator |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101667012A (en) |
- 2008-09-03: CN application CN 200810051135 filed (patent CN101667012A/en), status Pending
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101917010A (en) * | 2010-07-27 | 2010-12-15 | 荣信电力电子股份有限公司 | Compound control structure for balanced output of multiple sets of automatically controlled power equipment |
CN102787915A (en) * | 2012-06-06 | 2012-11-21 | 哈尔滨工程大学 | Diesel engine electronic speed adjusting method based on reinforced study of proportion integration differentiation (PID) controller |
CN106707752A (en) * | 2016-12-21 | 2017-05-24 | 大连理工大学 | Improved algorithm for solving state feedback gain matrix of current source STATCOM (static synchronous compensator) |
CN107943022A (en) * | 2017-10-23 | 2018-04-20 | 清华大学 | A kind of PID locomotive automatic Pilot optimal control methods based on intensified learning |
CN108014926A (en) * | 2018-02-05 | 2018-05-11 | 吉林建筑大学 | The adjustable electrostatic precipitator of voltage and method |
CN108014926B (en) * | 2018-02-05 | 2024-05-03 | 吉林建筑大学 | Electrostatic dust collection device and method with adjustable voltage |
CN110095654A (en) * | 2019-05-09 | 2019-08-06 | 东北电力大学 | A kind of power grid inductance detection method |
CN110488759A (en) * | 2019-08-09 | 2019-11-22 | 西安交通大学 | A kind of numerically-controlled machine tool feeding control compensation methods based on Actor-Critic algorithm |
CN112542161A (en) * | 2020-12-10 | 2021-03-23 | 长春工程学院 | BP neural network voice recognition method based on double-layer PID optimization |
CN112542161B (en) * | 2020-12-10 | 2022-08-12 | 长春工程学院 | BP neural network voice recognition method based on double-layer PID optimization |
CN116581770A (en) * | 2022-11-24 | 2023-08-11 | 长春工程学院 | Micro-grid system VSG double-droop control method based on self-adaptive neural network |
CN116581770B (en) * | 2022-11-24 | 2024-02-20 | 长春工程学院 | Micro-grid system VSG double-droop control method based on self-adaptive neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Open date: 20100310 |