CN113839578B - Multi-level converter neutral point voltage balance system and method based on reinforcement learning - Google Patents


Info

Publication number
CN113839578B
Authority
CN
China
Prior art keywords
module
voltage
reinforcement learning
strategy
level converter
Prior art date
Legal status
Active
Application number
CN202111201653.2A
Other languages
Chinese (zh)
Other versions
CN113839578A (en)
Inventor
叶舒
张峰
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202111201653.2A
Publication of CN113839578A
Application granted
Publication of CN113839578B
Status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H02 GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02M APPARATUS FOR CONVERSION BETWEEN AC AND AC, BETWEEN AC AND DC, OR BETWEEN DC AND DC, AND FOR USE WITH MAINS OR SIMILAR POWER SUPPLY SYSTEMS; CONVERSION OF DC OR AC INPUT POWER INTO SURGE OUTPUT POWER; CONTROL OR REGULATION THEREOF
    • H02M1/00 Details of apparatus for conversion
    • H02M1/08 Circuits specially adapted for the generation of control voltages for semiconductor devices incorporated in static converters
    • H02M1/088 Circuits specially adapted for the generation of control voltages for semiconductor devices incorporated in static converters for the simultaneous control of series or parallel connected semiconductor devices
    • H02M7/00 Conversion of ac power input into dc power output; Conversion of dc power input into ac power output
    • H02M7/42 Conversion of dc power input into ac power output without possibility of reversal
    • H02M7/44 Conversion of dc power input into ac power output without possibility of reversal by static converters
    • H02M7/48 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode
    • H02M7/483 Converters with outputs that each can have more than two voltages levels
    • H02M7/487 Neutral point clamped inverters
    • H02M7/53 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal
    • H02M7/537 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters
    • H02M7/5387 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters in a bridge configuration
    • H02M7/53871 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters in a bridge configuration with automatic control of output voltage or current
    • H02M7/53873 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters in a bridge configuration with automatic control of output voltage or current with digital control
    • H02M7/53875 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters in a bridge configuration with automatic control of output voltage or current with analogue control of three-phase output
    • H02M7/539 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters with automatic control of output wave form or frequency
    • H02M7/5395 Conversion of dc power input into ac power output without possibility of reversal by static converters using discharge tubes with control electrode or semiconductor devices with control electrode using devices of a triode or transistor type requiring continuous application of a control signal using semiconductor devices only, e.g. single switched pulse inverters with automatic control of output wave form or frequency by pulse-width modulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Power Engineering (AREA)
  • Rectifiers (AREA)

Abstract

The invention relates to a reinforcement-learning-based neutral point voltage balancing system and method for multi-level converters. The system comprises an input module, a multi-level converter module, a voltage and current detection module, a main control module, a man-machine interaction module and an auxiliary module. The auxiliary module is connected to the main control module, which comprises an FPGA module together with an IGBT driving module and a DSP module that are each connected to the FPGA module. The DSP module is connected to the voltage and current detection module and to the man-machine interaction module, and the multi-level converter module is connected to the input module, the IGBT driving module and the voltage and current detection module. A neutral point voltage controller module based on a deep reinforcement learning algorithm (DRL-NPVC) is deployed in either the DSP module or the man-machine interaction module. Compared with the prior art, the invention offers fast response, strong disturbance rejection and low cost.

Description

Multi-level converter neutral point voltage balance system and method based on reinforcement learning
Technical Field
The invention relates to the technical field of multi-level converter control, and in particular to a reinforcement-learning-based neutral point (midpoint) voltage balancing system and method for multi-level converters.
Background
With the development of power electronics, microelectronics and modern control theory, multilevel converters have become an important means of medium- and high-voltage, high-power electrical energy conversion. The output waveform of a multilevel converter is closer to a sine wave, and the topology offers low output harmonic content, low voltage stress on each switching device, low dv/dt, low switching loss and low EMI. The neutral point clamped (NPC) converter is a widely used multilevel topology: its DC side is split by series capacitors, and the converter produces multiple output levels by clamping to the junction of the two capacitors. This junction is the neutral point, so balancing the DC-side neutral point voltage is essential for correct output levels and for safe, stable operation of the system. Many methods exist for suppressing neutral point voltage fluctuation, and closed-loop control is among the most common.
Classical control methods are simple to implement and stable, but clearly lack self-learning, self-adaptation and fault tolerance. Methods based on modern control theory improve on them and control nonlinear systems well, yet still require considerable prior knowledge and lack the ability to self-optimize. In conventional neutral point clamped multi-level converters the DC-side capacitor voltages are difficult to balance quickly: the balancing algorithms have poor real-time performance and poor robustness, the output current and voltage contain large harmonics, performance is strongly affected by the external environment and by load changes, and the algorithms lack generality.
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a reinforcement-learning-based multi-level converter neutral point voltage balancing system and method with fast response, strong disturbance rejection and low cost.
The aim of the invention can be achieved by the following technical scheme:
According to one aspect of the invention, a reinforcement-learning-based multi-level converter neutral point voltage balancing system is provided, comprising an input module, a multi-level converter module, a voltage and current detection module, a main control module, a man-machine interaction module and an auxiliary module. The auxiliary module is connected to the main control module; the main control module comprises an FPGA module together with an IGBT driving module and a DSP module that are each connected to the FPGA module; the DSP module is connected to the voltage and current detection module and to the man-machine interaction module; and the multi-level converter module is connected to the input module, the IGBT driving module and the voltage and current detection module;
and the DSP module or the man-machine interaction module is provided with a neutral point voltage controller module DRL-NPVC based on a deep reinforcement learning algorithm.
As a preferred technical solution, the input module comprises an AC input terminal and an uncontrolled rectifier; the AC input terminal is connected to the AC power grid, and the uncontrolled rectifier converts its power into an independent DC supply connected to the multi-level converter module.
As a preferred technical solution, the multi-level converter module comprises neutral point clamped multi-level converters of various level counts; the DC side of the multi-level converter uses two capacitors connected in series, and multiple levels are generated by clamping to the neutral point potential of the capacitors; each phase includes a plurality of IGBTs with anti-parallel diodes; and the AC output terminal of the multilevel converter is connected to a load through a filter.
As a preferred technical solution, the voltage and current detection module is connected to the DSP module and acquires real-time circuit signals, including the output current, the output voltage and the DC-side capacitor voltages;
the FPGA module is connected to the IGBT driving module and sends out the PWM control signals;
the IGBT driving module is connected to the multi-level converter module and controls the switching of the IGBTs.
As a preferred technical solution, the DSP module runs the space vector pulse width modulation (SVPWM) strategy of the multilevel converter.
As a preferred technical solution, the neutral point voltage controller module DRL-NPVC, which is based on a deep reinforcement learning algorithm, comprises an input sub-module, a judging sub-module and an output sub-module;
the input sub-module acquires the DC-side capacitor voltages, the load-side current and voltage, and other circuit parameter values through the voltage and current detection module; the judging sub-module judges whether the current voltage control strategy is optimal: if so, it executes the strategy and updates the network parameters, and if not, it continues iterating until the optimal strategy is selected; the output sub-module outputs the currently selected optimal strategy and parameters, which are used to optimize the SVPWM strategy of the multi-level converter and thereby balance the neutral point voltage.
As a preferred technical solution, if the DRL-NPVC module is deployed in the man-machine interaction module, a corresponding communication port is also required for communication with the DSP module.
As a preferred technical solution, the man-machine interaction module comprises a host computer and a display screen and is connected to the DSP module to provide man-machine interaction.
As a preferred technical solution, the auxiliary module comprises an auxiliary power supply and peripheral equipment; the auxiliary power supply is an uninterruptible power supply connected to the FPGA module, the DSP module and the IGBT driving module, keeping the system voltage stable and meeting their power supply requirements.
According to another aspect of the present invention, there is provided a method for the reinforcement-learning-based multi-level converter neutral point voltage balancing system according to claim 1, comprising the following steps:
S1: Determine the neutral point voltage balancing requirements of the multi-level converter according to the actual operating conditions. The multilevel converter uses SVPWM modulation. The DC-side neutral point voltage is balanced using an optimal voltage-regulating factor fed back in real time by the deep reinforcement learning controller: by changing the duty ratios of the switching sequence, the factor regulates the charge flowing through the neutral point and thus balances the DC-side capacitor voltages;
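As a rough illustration of step S1, the sketch below assumes a three-level NPC converter in which a voltage-regulating factor k in [-1, 1] splits each small-vector dwell time between its two redundant switching states (called P-type and N-type here). The function names, the symmetric split formula, the equal capacitances and the ideal stiff DC bus are assumptions for illustration, not details taken from the patent. Under those assumptions v_C1 + v_C2 stays constant, so a net charge Q injected into the neutral point changes v_C1 - v_C2 by -Q/C.

```python
from __future__ import annotations


def split_small_vector(t_small: float, k: float) -> tuple[float, float]:
    """Split a small-vector dwell time into its two redundant states.

    k is a hypothetical voltage-regulating factor in [-1, 1]:
    k = 0 gives an equal split, k = 1 uses only the P-type state,
    k = -1 uses only the N-type state.
    """
    if not -1.0 <= k <= 1.0:
        raise ValueError("k must lie in [-1, 1]")
    t_p = 0.5 * (1.0 + k) * t_small
    t_n = 0.5 * (1.0 - k) * t_small
    return t_p, t_n


def np_voltage_change(i_np: float, t_small: float, k: float, c: float) -> float:
    """Change of (v_C1 - v_C2) over one period caused by the small vector.

    Assumptions: the P-type state injects current i_np into the neutral
    point O and the N-type state injects -i_np; both DC-link capacitors
    have capacitance c; the DC bus is stiff, so v_C1 + v_C2 is constant
    and d(v_C1 - v_C2)/dt = -i_O / c.
    """
    t_p, t_n = split_small_vector(t_small, k)
    charge_into_o = i_np * t_p - i_np * t_n  # net charge injected into O
    return -charge_into_o / c
```

A controller would choose the sign and size of k so that the injected charge opposes the measured imbalance; in the patent's scheme, choosing that factor is what the DRL controller learns.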
S2: Establish a Markov decision process (MDP) for the deep-reinforcement-learning-based multi-level neutral point voltage balance controller. The adjustment of the voltage-regulating factor in each sampling period of the SVPWM process is mapped to a reinforcement learning process based on iterative updating of action values. The neutral point voltage balancing problem is modeled mathematically with the MDP and the Bellman equations, which fixes the control objective, the environment state and the immediate reward. The problem is thereby abstracted, simplified and converted into solving for the optimal state-value function and action-value function:
$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$
$$q_\pi(s,a) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$
where $v_\pi(s)$ is the state-value function, expressed as an expectation $\mathbb{E}_\pi$ under policy $\pi$; $\gamma \in [0,1]$ is the reward discount factor; $\pi$ is the agent's policy; $s$ is the environment state and $S_t$ is the environment state at time $t$; $a$ is the action taken; $q_\pi(s,a)$ is the action-value function; $R_t$ is the environment reward at time $t$; and $A_t$ is the action taken at time $t$;
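The Bellman expectation equation for $v_\pi$ can be checked numerically with iterative policy evaluation, which repeatedly applies its right-hand side as an update until a fixed point is reached. The two-state "balanced/unbalanced" MDP below, its transition probabilities, its reward (-1 for landing in the unbalanced state, -0.1 for a corrective action) and the function names are hypothetical and serve only to make the equations concrete.

```python
# Hypothetical toy MDP: the neutral point is either "bal" or "unbal".
STATES = ("bal", "unbal")
ACTIONS = ("hold", "correct")

# TRANSITION[s][a] maps next state -> probability P(s' | s, a).
TRANSITION = {
    "bal": {"hold": {"bal": 0.9, "unbal": 0.1}, "correct": {"bal": 1.0}},
    "unbal": {"hold": {"unbal": 1.0}, "correct": {"bal": 0.9, "unbal": 0.1}},
}

# Deterministic policy pi(a | s): hold when balanced, correct when not.
POLICY = {
    "bal": {"hold": 1.0, "correct": 0.0},
    "unbal": {"hold": 0.0, "correct": 1.0},
}


def reward(s, a, s_next):
    """Immediate reward R_{t+1}: penalise imbalance and corrective action."""
    return (-1.0 if s_next == "unbal" else 0.0) - (0.1 if a == "correct" else 0.0)


def q_from_v(v, s, a, gamma):
    """q_pi(s, a) = sum over s' of P(s'|s,a) * [r + gamma * v_pi(s')]."""
    return sum(p * (reward(s, a, s2) + gamma * v[s2])
               for s2, p in TRANSITION[s][a].items())


def evaluate_policy(gamma=0.9, tol=1e-12):
    """Iterate v(s) <- sum_a pi(a|s) q(s, a) until the largest change < tol."""
    v = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            new_v = sum(POLICY[s][a] * q_from_v(v, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v
```

Taking a maximum over actions instead of averaging under $\pi$ gives the optimality form that Q-learning-type methods, including the DDQN of step S3, approximate.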
S3: Establish the Q-network model and compute the network parameters with the deep reinforcement learning algorithm DDQN (Double Deep Q Network). DDQN is an off-policy algorithm in which the target policy and the behavior policy are separated, which avoids over-regulation of the neutral point voltage. Because the actions are discrete, as in Q-learning, a deep neural network is constructed to approximate the target Q value as Q(s, a). The network input is the environment state $S_t$, i.e. the feature vector of circuit parameters, and the network output is the action value $q_\pi(s,a)$ of each action in the action set;
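The difference between a plain DQN target and the Double DQN target named in step S3 fits in a few lines. In DDQN the online network selects the next action and the target network evaluates it, which reduces the overestimation bias that comes from taking a maximum over noisy estimates. The sketch treats each network's output for the next state as a plain list of Q values, one per discrete candidate voltage-regulating factor; that action encoding and the function names are assumptions.

```python
def dqn_target(reward, q_target_next, gamma, done):
    """Plain DQN: the target network both selects and evaluates a'."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, q_online_next, q_target_next, gamma, done):
    """Double DQN target for one transition (s, a, r, s').

    q_online_next / q_target_next: Q values of s' from the online and the
    target network, one entry per discrete action (hypothetical encoding).
    """
    if done:
        return reward
    # The online network selects the greedy next action ...
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates that action.
    return reward + gamma * q_target_next[a_star]
```

For example, with reward -0.2, gamma 0.9, online values [1.0, 2.0, 0.5] and target values [3.0, 1.5, 0.0], DQN uses the target network's own maximum and gives 2.5, while DDQN evaluates the online network's choice (index 1) and gives 1.15.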
S4: Initialize the DDQN network parameters, then optimize iteratively over iterations 1 to T to find the network parameters that achieve the desired objective, and determine the optimal voltage-regulating factor that meets the system requirements;
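To show the shape of the iterative optimization in step S4 without a neural-network library, the sketch below swaps in tabular Double Q-learning, the table-based predecessor of DDQN, on a deliberately crude, hypothetical neutral-point model: the state is an integer deviation in -5..5, the actions are five candidate voltage-regulating factors, a random disturbance of -1, 0 or +1 acts at each step, and the reward penalizes the remaining deviation plus a small regulation cost. None of these dynamics, constants or names come from the patent; a real implementation would replace the tables with the Q network of step S3 and the toy model with the converter or a simulation of it.

```python
import random

N = 5                                  # hypothetical deviation range -N..N
ACTIONS = (-1.0, -0.5, 0.0, 0.5, 1.0)  # hypothetical candidate factors k
STATES = range(-N, N + 1)


def step(d, k, rng):
    """Toy neutral-point model (assumption, not the patent's circuit).

    A random disturbance of -1, 0 or +1 is added and factor k pushes the
    deviation down by round(2k); the reward penalises the remaining
    deviation plus a small regulation cost.
    """
    disturbance = rng.choice((-1, 0, 1))
    d_next = max(-N, min(N, d + disturbance - int(round(2 * k))))
    reward = -abs(d_next) - 0.1 * abs(k)
    return d_next, reward


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=3000, steps=50, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Double Q-learning; returns the greedy factor per state."""
    rng = random.Random(seed)
    qa = {s: [0.0] * len(ACTIONS) for s in STATES}
    qb = {s: [0.0] * len(ACTIONS) for s in STATES}
    for _ in range(episodes):
        d = rng.randint(-N, N)
        for _ in range(steps):
            # Epsilon-greedy behaviour policy on the sum of both tables.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = argmax([x + y for x, y in zip(qa[d], qb[d])])
            d_next, r = step(d, ACTIONS[a], rng)
            # Randomly pick the table to update; the other table evaluates
            # the action that the updated table considers best.
            upd, other = (qa, qb) if rng.random() < 0.5 else (qb, qa)
            a_star = argmax(upd[d_next])
            target = r + gamma * other[d_next][a_star]
            upd[d][a] += alpha * (target - upd[d][a])
            d = d_next
    return {s: ACTIONS[argmax([x + y for x, y in zip(qa[s], qb[s])])]
            for s in STATES}
```

Calling `train()` returns a greedy factor for every deviation state; in this toy model one would expect positive factors for large positive deviations and negative factors for large negative ones.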
S5: Feed the optimal voltage-regulating factor back to the SVPWM (space vector pulse width modulation) strategy to achieve effective balance control of the neutral point voltage of the multi-level converter.
Compared with the prior art, the invention has the following advantages:
1) The invention achieves a good voltage balancing effect with fast response;
2) By continuously taking actions and receiving feedback, the controller learns and iteratively optimizes the voltage regulation behavior according to the direction, magnitude and duration of the voltage deviation;
3) The algorithm is robust: it adjusts and responds in real time to changes in the environment and effectively handles neutral point voltage imbalance caused by load changes or power disturbances;
4) The control strategy can be embedded in the DSP module or the host computer without adding hardware, which reduces cost;
5) The invention effectively reduces the harmonic content of the output voltage and current;
6) The invention avoids over-regulation and effectively reduces switching loss;
7) The algorithm is general and highly transferable, and is applicable to neutral point clamped multi-level converters with different level counts and topologies;
8) The complexity of the neutral point voltage balancing algorithm does not grow with the complexity of the modulation strategy, so the invention is well suited to neutral point voltage balancing in multilevel converters with higher level counts and complex topologies.
Drawings
FIG. 1 is a block diagram of a system according to the present invention;
FIG. 2 is a diagram of a system framework in accordance with the present invention;
FIG. 3 (a) is a schematic diagram of a single-phase three-level NPC converter module;
FIG. 3 (b) is a schematic diagram of a single-phase three-level T-type converter module;
FIG. 4 (a) is a schematic diagram of a single-phase five-level NNPP converter module;
FIG. 4 (b) is a schematic diagram of a single-phase five-level ANPC converter module;
FIG. 5 is a diagram of the sub-modules of the deep reinforcement learning controller;
FIG. 6 is a flow chart of neutral point voltage balance control based on deep reinforcement learning according to the present invention;
FIG. 7 is a schematic diagram of the working principle of the deep reinforcement learning controller.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the present invention.
Deep learning (DL) is a machine learning approach based on learning data representations. It mainly uses deep neural networks, whose multi-layer structure applies successive nonlinear transformations that combine low-level features into high-level, easily distinguishable representations, so that complex learning tasks can be accomplished with a simple model. Reinforcement learning (RL) is another machine learning methodology. Its basic idea is that an agent, while interacting with its environment, keeps taking actions that change the state and receives feedback from the environment in the form of rewards or penalties, and thereby gradually develops behavior that maximizes the expected return. Combining the perception capability of DL with the decision-making capability of RL yields deep reinforcement learning (DRL), which gives the agent end-to-end control from raw input to output. DRL is a highly general perception-and-control framework that can solve complex decision and control tasks on high-dimensional raw data.
Without a model or accumulated prior knowledge, a deep reinforcement learning method obtains feedback while interacting with the environment, iterates and updates itself, and thereby continuously improves its control performance, which makes it a promising intelligent control method. Applying DRL to neutral point voltage balancing of multi-level converters is a novel engineering application of DRL. DRL adapts effectively to different environments and changes and continuously updates the optimal control strategy parameters from environment feedback, which helps achieve a fast, adaptive and robust neutral point voltage control strategy.
In this embodiment, as shown in FIG. 1, the deep-reinforcement-learning-based multi-level converter neutral point voltage balancing system provided by the invention comprises an input module, a multi-level converter module, a voltage and current detection module, a DRL-NPVC module, a main control module, a man-machine interaction module and an auxiliary module.
The input module is a relatively independent power supply network. As shown in FIG. 2, AC power from the grid is rectified and filtered into DC, which is connected to the multi-level converter module and serves as its power supply.
The multi-level converter module is a neutral point clamped multi-level converter of any of various level counts and topologies. It is powered by the input module, and its AC output terminal is connected to a load through a filter. The multi-level converter includes, but is not limited to, the three-level NPC converter, the three-level T-type converter, the five-level NNPP converter and the five-level ANPC converter shown in FIG. 3 (a), FIG. 3 (b), FIG. 4 (a) and FIG. 4 (b), respectively. As shown in FIG. 2, the DC side of the neutral point clamped multilevel converter has two capacitors connected in series, and multiple levels are generated by clamping to the neutral point potential of the two capacitors. Taking the five-level NNPP converter shown in FIG. 4 (a) as an example, each of phases A, B and C consists of 12 IGBTs with anti-parallel diodes and 2 floating capacitors, and the five levels are generated by clamping to the neutral point voltage and the floating capacitor voltages. Its modulation must therefore account not only for DC-side capacitor voltage balance but also for floating capacitor voltage balance, capacitor voltage ripple suppression, the maximum acceptable total harmonic distortion and so on. Consequently, as the level count and topological complexity of the converter increase, neutral point voltage balancing becomes harder: all of these factors must be considered together, and the controller must adapt in real time to changing operating conditions and loads.
The voltage and current detection module consists mainly of voltage and current sensors and is connected to the DSP module in the main control module. It monitors and samples circuit signals in real time to capture the operating state of the system; these signals include, but are not limited to, the converter output voltage and current, the DC-side capacitor voltages and currents, and the floating capacitor voltages and currents. The signals are A/D converted and fed back to the multi-level converter modulation strategy running in the DSP module, where they serve as control signals for regulating the operating state of the converter.
The DRL-NPVC module is a converter neutral point voltage balance controller based on deep reinforcement learning. Its main functions are implemented in software, and it can be deployed in the DSP module or in a host computer connected to the DSP, as required. As shown in FIG. 5, the DRL-NPVC module comprises an input sub-module, a judging sub-module and an output sub-module. Taking deployment in the DSP module as an example, DRL-NPVC performs the iterative optimization of the control strategy inside the DSP module. Its input is the digital circuit signal obtained in real time from the A/D sampling module. The judging sub-module treats the multi-level converter as the environment, takes actions, receives the feedback given by the operating condition of the circuit, and keeps learning iteratively while judging whether the current voltage balance control parameter is optimal; if not, learning continues, and if so, the parameter is output. The output sub-module is connected to the SVPWM strategy inside the DSP module and feeds the optimal voltage balance control parameter back into the modulation strategy to realize multi-level voltage control.
The main control module is built on the DSP module, the FPGA module and the power switch device driving module, and implements multi-level PWM modulation. Because a multilevel converter requires a large number of gate-drive signals, a single DSP module can hardly perform the modulation on its own; an FPGA is therefore added to the control system to extend the DSP's modulation capability, and the two together complete the drive modulation of the multilevel converter.
The DSP module generates the three-phase multi-channel PWM modulation signals and transmits them to the FPGA module. The operation control is divided into three states: standby, running and fault. After the DSP program is initialized, it enters the standby state and waits for sampling and serial communication. Once the samples meet the requirements and a command arrives over the serial link, it enters the normal running state: the sampled data are converted into electrical quantities and a software protection check is performed. If a protection signal appears, the DSP enters the fault state, sends a shutdown command to the FPGA, resets the sub-modules, reports a fault signal to the upper computer over the serial link, and remains in the fault state. If no protection signal is generated, the DSP continues to run the core SVPWM modulation algorithm and sends switching-state control signals to the FPGA module. Afterwards, information to be saved or inspected can be handled through serial communication. After each pass of the core program, the DSP checks whether a shutdown command has been received: if not, the above actions are repeated; if so, it shuts down and resets the sub-modules.
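The standby/running/fault behavior described above can be summarized as a small state machine. The sketch below is a hypothetical Python model (the names `DspState` and `next_state` and the boolean inputs are illustrative, not taken from the patent); it models only the state transitions, not the sampling, SVPWM or serial I/O, and it assumes that after a shutdown command the DSP returns to standby.

```python
from enum import Enum, auto


class DspState(Enum):
    STANDBY = auto()
    RUNNING = auto()
    FAULT = auto()


def next_state(state, sampling_ok, start_cmd, protection_trip, shutdown_cmd):
    """One pass of a hypothetical DSP supervisory loop; returns the next state."""
    if state is DspState.STANDBY:
        # Wait until the samples are valid and a start command arrives over serial.
        if sampling_ok and start_cmd:
            return DspState.RUNNING
        return DspState.STANDBY
    if state is DspState.RUNNING:
        if protection_trip:
            # Software protection tripped: stop the FPGA, reset sub-modules,
            # report to the upper computer, and latch the fault.
            return DspState.FAULT
        if shutdown_cmd:
            # Shut down and reset; returning to STANDBY is an assumption.
            return DspState.STANDBY
        # No fault and no shutdown: run SVPWM and send switching states to the FPGA.
        return DspState.RUNNING
    # FAULT is latched until an external reset (not modeled here).
    return DspState.FAULT
```

For example, `next_state(DspState.STANDBY, True, True, False, False)` moves the controller into `RUNNING`, and any protection trip while running latches `FAULT`.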
The FPGA module is connected to the DSP module and the IGBT driving module. Its main function is to output the multi-channel PWM drive signals of the power devices according to the level states of the signals generated by the DSP, thereby controlling the high-frequency switching of the multi-level converter. The FPGA filters out narrow pulses to avoid frequent on/off switching of the power devices, and computes the overlap duration from the rising and falling edges of each power device's modulation signal to insert the dead time.
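A minimal sketch of the two FPGA-side signal-conditioning functions, assuming the gate command is a binary sequence sampled once per FPGA clock tick. Narrow-pulse filtering is modeled as holding the previous level over any interior run shorter than `min_width` ticks, and dead time as a turn-on delay on each gate of a complementary pair. Both are common approaches chosen for illustration; the patent does not specify the FPGA logic, and the function names are hypothetical.

```python
def filter_narrow_pulses(cmd, min_width):
    """Hold the previous level over any interior run shorter than min_width ticks.

    cmd is a binary gate command sampled once per FPGA clock tick.
    """
    out = list(cmd)
    n = len(out)
    i = 0
    while i < n:
        j = i
        while j < n and out[j] == out[i]:
            j += 1
        # Only interior runs are filtered; the first and last runs are kept.
        if 0 < i and j < n and j - i < min_width:
            out[i:j] = [out[i - 1]] * (j - i)
        i = j
    return out


def insert_dead_time(cmd, dead_ticks):
    """Derive complementary upper/lower gate signals with a turn-on delay.

    A gate is switched on only after its command has been continuously
    asserted for more than dead_ticks ticks, so the two never overlap.
    """
    upper, lower = [], []
    on_run = off_run = 0
    for c in cmd:
        on_run = on_run + 1 if c else 0
        off_run = 0 if c else off_run + 1
        upper.append(1 if on_run > dead_ticks else 0)
        lower.append(1 if off_run > dead_ticks else 0)
    return upper, lower
```

For `insert_dead_time([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], 1)`, each gate switches on one tick after its command edge, so the upper and lower gates are never on in the same tick. In practice the filter would run before the dead-time stage.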
The IGBT driving module receives the control commands sent by the FPGA module and is connected to the multi-level converter module to drive the high-frequency switching devices in the circuit.
The man-machine interaction module is connected to the DSP module and comprises an upper computer, a display screen, serial communication equipment and the like, enabling human-machine interaction. Because DRL-NPVC is a software module, it can also be deployed in the upper computer system: the iterative optimization of the algorithm runs there, and the resulting optimal voltage regulation parameters are returned to the SVPWM algorithm in the DSP module to optimize the modulation strategy of the multi-level converter.
As shown in FIGs. 6 and 7, the multi-level neutral-point voltage balancing method based on deep reinforcement learning provided by the invention comprises the following steps:
Step 1: Determine the system modulation requirements, i.e. the control targets. The voltage regulation factor is used to suppress the neutral-point voltage fluctuation of the converter and to improve system stability. The control targets are:

$$\min\ \left|\frac{\Delta V_{dc}}{V_{dc}}\right| \quad \text{s.t.}\quad \mathrm{THD} < \delta$$

where SLR is the switching loss rate; δ and … can be adjusted as required. In this embodiment, δ = 5% and …
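To make the Step 1 targets concrete, the sketch below computes the two quantities in the formula: the normalized neutral-point deviation and the THD of an output waveform. It assumes a two-capacitor DC link with $\Delta V_{dc} = V_{C1} - V_{C2}$ and $V_{dc} = V_{C1} + V_{C2}$, and a waveform sampled over exactly one fundamental period; the switching-loss-rate constraint is left out. Function names are hypothetical.

```python
import math


def harmonic_amplitude(samples, h):
    """Peak amplitude of harmonic h via a single-bin DFT.

    Assumes samples cover exactly one fundamental period (no leakage).
    """
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * h * k / n) for k, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * h * k / n) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n


def thd(samples, max_harmonic=40):
    """THD = sqrt(sum_{h>=2} V_h^2) / V_1, truncated at max_harmonic (< n/2)."""
    v1 = harmonic_amplitude(samples, 1)
    vh = [harmonic_amplitude(samples, h) for h in range(2, max_harmonic + 1)]
    return math.sqrt(sum(v * v for v in vh)) / v1


def np_deviation(v_c1, v_c2):
    """|dVdc / Vdc|, taking dVdc = Vc1 - Vc2 and Vdc = Vc1 + Vc2 (assumption)."""
    return abs(v_c1 - v_c2) / (v_c1 + v_c2)
```

A 256-sample sine wave carrying a 3% third harmonic gives a THD of about 0.03, which satisfies δ = 5%; capacitor voltages of 405 V and 395 V give a normalized deviation of 0.0125.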
Step 2: Modulate the multi-level converter with a seven-segment space vector pulse width modulation (SVPWM) strategy based on the g/h coordinate system. For any given reference voltage vector $U_{ref}$, its coordinates are $(U_a, U_b, U_c)$ in the three-phase stationary a-b-c coordinate system, $(U_\alpha, U_\beta)$ in the αβ coordinate system, and $(U_{rg}, U_{rh})$ in the g/h coordinate system. The transformation relationship between the αβ and g/h coordinate systems, obtained from their geometric relationship, is as follows:
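As a sketch only, the code below assumes the common 60° g/h convention (g axis aligned with α, h axis leading it by 60°). Under that convention a vector $g\,e_g + h\,e_h$ has $U_\alpha = g + h/2$ and $U_\beta = (\sqrt{3}/2)\,h$; inverting gives $U_g = U_\alpha - U_\beta/\sqrt{3}$ and $U_h = 2U_\beta/\sqrt{3}$. The amplitude-invariant Clarke transform is used for a-b-c to αβ. The convention, the `level_step` normalization and the function names are assumptions and may differ from the patent's.

```python
import math

SQRT3 = math.sqrt(3.0)


def clarke(u_a, u_b, u_c):
    """Amplitude-invariant Clarke transform: a-b-c -> alpha-beta."""
    u_alpha = (2.0 / 3.0) * (u_a - 0.5 * u_b - 0.5 * u_c)
    u_beta = (u_b - u_c) / SQRT3
    return u_alpha, u_beta


def alphabeta_to_gh(u_alpha, u_beta, level_step=1.0):
    """alpha-beta -> 60-degree g/h frame, g along alpha and h at +60 degrees.

    Dividing by level_step (assumed: the length of one small-triangle side)
    puts the switching-state vectors on integer g/h coordinates.
    """
    u_g = (u_alpha - u_beta / SQRT3) / level_step
    u_h = (2.0 * u_beta / SQRT3) / level_step
    return u_g, u_h
```

A unit vector along α maps to (1, 0), and a unit vector at 60° maps to (0, 1), which confirms the axis orientation.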
Step 3: For an arbitrary spatial reference voltage vector $U_{ref}$, select the three vertices of the small triangle nearest to it in the space-vector diagram, denoted $U_1$, $U_2$ and $U_3$, and use them to synthesize the reference voltage vector. The dwell times $T_1$, $T_2$ and $T_3$ are calculated from the contribution of each vertex to $U_{ref}$.
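Working in normalized g/h coordinates makes Step 3 inexpensive: the reference lies in a unit rhombus whose corner is $(\lfloor g \rfloor, \lfloor h \rfloor)$, and the rhombus's short diagonal splits it into two small triangles. The sketch below follows that widely used nearest-three-vector approach, assuming level-step normalization so that switching-state vectors sit on integer coordinates. It returns the vertices and the duty ratios $T_i/T_s$, which are the barycentric coordinates of $U_{ref}$ in the triangle. Function names are hypothetical, and the patent's own dwell-time formulas may be written differently.

```python
import math


def nearest_three_vectors(u_g, u_h):
    """Nearest-three-vector selection in a normalized 60-degree g/h frame.

    Returns [(vertex, duty), ...] for the small triangle containing the
    reference; vertices are integer (g, h) pairs and duties sum to 1.
    """
    g0, h0 = math.floor(u_g), math.floor(u_h)
    fg, fh = u_g - g0, u_h - h0
    if fg + fh < 1.0:
        # Lower triangle of the unit rhombus.
        return [((g0, h0), 1.0 - fg - fh),
                ((g0 + 1, h0), fg),
                ((g0, h0 + 1), fh)]
    # Upper triangle of the unit rhombus.
    return [((g0 + 1, h0 + 1), fg + fh - 1.0),
            ((g0 + 1, h0), 1.0 - fh),
            ((g0, h0 + 1), 1.0 - fg)]


def dwell_times(u_g, u_h, t_s):
    """T1, T2, T3 for one switching period t_s."""
    return [duty * t_s for _, duty in nearest_three_vectors(u_g, u_h)]
```

For instance, the reference (0.75, 0.5) falls in the upper triangle with duties 0.25, 0.5 and 0.25, and 0.25·(1, 1) + 0.5·(1, 0) + 0.25·(0, 1) = (0.75, 0.5).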
Step 4: According to the neutral-point voltage balance condition, introduce a voltage regulation factor λ. Redistributing the dwell times $T_1$, $T_2$ and $T_3$ changes the charge flowing through the neutral point during the switching period, which achieves balanced control of the neutral-point voltage. In this embodiment, the randomly initialized value is λ = 0.2.
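The patent does not spell out how λ redistributes $T_1$, $T_2$ and $T_3$. One common scheme for neutral-point-clamped converters, assumed here for illustration, splits the dwell time $T$ of a small vector between its two redundant switching states (P-type and N-type), which drive neutral-point current in opposite directions: $T_P = \frac{1+\lambda}{2}T$ and $T_N = \frac{1-\lambda}{2}T$. The sign convention for the neutral-point current and the function names are hypothetical.

```python
def split_redundant_time(t_small, lam):
    """Split a small-vector dwell time between its P-type and N-type states.

    lam in [-1, 1]: lam = 0 shares the time equally, lam > 0 favors P-type.
    """
    if not -1.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [-1, 1]")
    t_p = 0.5 * (1.0 + lam) * t_small
    t_n = 0.5 * (1.0 - lam) * t_small
    return t_p, t_n


def neutral_point_charge(t_small, lam, i_np):
    """Net charge into the neutral point over one period.

    Assumes the P-type state draws +i_np and the N-type state -i_np from the
    neutral point, so the result equals lam * t_small * i_np.
    """
    t_p, t_n = split_redundant_time(t_small, lam)
    return i_np * t_p - i_np * t_n
```

With λ = 0.2 (the embodiment's initial value) and a 100 µs small-vector time, the split is 60 µs P-type and 40 µs N-type, and the net neutral-point charge is 0.2·T·i_np; λ = 0 leaves the neutral-point charge unchanged.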
Step 5: The neutral-point voltage balance problem of the multi-level converter fits the description of a discrete-event dynamic system: the future evolution of the voltage state depends only on the current state, not on its past evolution. The problem therefore satisfies the Markov property and is described mathematically with a Markov model.
Step 6: Determine the basic elements of reinforcement learning: the environment state $S_t$ at time t, the action taken $A_t$, the corresponding delayed reward $R_{t+1}$, the agent's policy π, the state-value function $v_\pi(s)$, the action-value function $q_\pi(s, a)$, and the reward discount factor γ.
Step 7: Map the regulation behavior of the voltage regulation factor during modulation onto a reinforcement learning process based on iterative updates of the value function or the action-value function, and describe both functions with a Markov decision process and the Bellman equations:

$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$

$$q_\pi(s, a) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$
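The Bellman expectation equations can be checked numerically with iterative policy evaluation on a small tabular MDP. The sketch below is generic, not a converter model: `transitions` and `policy` are hypothetical dictionaries, and the update applies the Step 7 Bellman equation for $v_\pi$ until the largest change falls below `tol`. `q_from_v` uses the identity $\mathbb{E}_\pi[q_\pi(S_{t+1}, A_{t+1}) \mid S_{t+1}] = v_\pi(S_{t+1})$, so it agrees with the $q_\pi$ equation.

```python
def evaluate_policy(transitions, policy, gamma=0.9, tol=1e-10):
    """Iterative policy evaluation using the Bellman expectation equation for v_pi.

    transitions[s][a] -> list of (prob, next_state, reward, is_end)
    policy[s][a]      -> pi(a | s)
    """
    v = {s: 0.0 for s in transitions}
    while True:
        max_change = 0.0
        for s in transitions:
            new_v = 0.0
            for a, pi_sa in policy[s].items():
                for p, s_next, r, is_end in transitions[s][a]:
                    bootstrap = 0.0 if is_end else gamma * v[s_next]
                    new_v += pi_sa * p * (r + bootstrap)
            max_change = max(max_change, abs(new_v - v[s]))
            v[s] = new_v
        if max_change < tol:
            return v


def q_from_v(transitions, v, gamma=0.9):
    """q_pi(s, a) = E[R + gamma * v_pi(S')], equivalent to the q_pi Bellman form."""
    return {
        (s, a): sum(p * (r + (0.0 if is_end else gamma * v[s2])) for p, s2, r, is_end in outs)
        for s, acts in transitions.items()
        for a, outs in acts.items()
    }
```

A single state with reward 1 per step and γ = 0.9 converges to $v = 1/(1 - 0.9) = 10$.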
Step 8: Build the Q-network model. A DDQN algorithm is used to construct the network model of the voltage balance problem and to solve for the network parameters. The algorithm inputs are: the number of iterations T, the action set A, the step size α, the reward discount factor γ, the exploration probability ε, the current network Q, the target network Q′, and the number of samples m used for gradient descent. The output of the algorithm is the Q-network parameters.
Step 9: Randomly initialize the values $v_\pi(s)$ and $q_\pi(s, a)$ for all states and actions; randomly initialize all parameters ω of the current network Q, and initialize the parameters of the target network Q′ to those of the current network, ω′ = ω; empty the experience replay set ER.
Step 10: Iterate over rounds 1 to T:
Step 10.1: Initialize the environment state S and represent the neutral-point voltage offset on the DC side of the converter, together with other circuit parameters, as the feature vector φ(S) of the current environment. With φ(S) as input, compute the Q values of all voltage regulation actions in the current network Q, then use the ε-greedy method to select an action: with probability 1 − ε, the action $a_{max}(S_j, \omega)$ with the largest Q value in the current network; otherwise, a random action.
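The action selection in Step 10.1 can be sketched as follows. The function assumes the current network's Q values for φ(S) are already available as a list. The patent does not list the action set, so any mapping from action index to a voltage regulation step (for example, a hypothetical set of λ increments such as −0.05, 0 and +0.05) is an assumption.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action index, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # a_max: index of the largest Q value (first one on ties).
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With Q values [0.1, 0.7, 0.3] and ε = 0, the greedy index 1 is returned; passing a seeded `random.Random` instance as `rng` makes exploration reproducible during testing.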
Step 10.2: Execute the selected voltage balance control action in state S to obtain the feature vector φ(S′) of the new state S′ and the reward R, and determine whether S′ is a terminal state (is_end). Store the five elements {φ(S), A, R, φ(S′), is_end} in the experience replay set ER.
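A minimal experience replay set ER for Steps 9, 10.2 and 10.4 is shown below. It assumes a fixed capacity with oldest-first eviction, which the patent does not specify; `Transition` and `ReplayBuffer` are hypothetical names whose fields mirror the five stored elements.

```python
import random
from collections import deque, namedtuple

# Field names mirror the five elements stored in step 10.2.
Transition = namedtuple("Transition", ["phi_s", "action", "reward", "phi_s_next", "is_end"])


class ReplayBuffer:
    """Experience replay set ER with a bounded capacity (the bound is an assumption)."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, phi_s, action, reward, phi_s_next, is_end):
        self._buf.append(Transition(phi_s, action, reward, phi_s_next, is_end))

    def sample(self, m, rng=random):
        """Draw m distinct transitions uniformly at random (step 10.4)."""
        return rng.sample(list(self._buf), m)

    def clear(self):
        """Empty ER, as done at initialization in step 9."""
        self._buf.clear()

    def __len__(self):
        return len(self._buf)
```

Pushing five transitions into a buffer of capacity 3 keeps only the last three, and `sample(m)` then draws a minibatch of m of them for the target computation.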
Step 10.3: Update the current environment state: S = S′.
Step 10.4: Sample m transitions $\{\phi(S_j), A_j, R_j, \phi(S'_j), is\_end_j\}$, j = 1, 2, …, m, from the experience replay set ER and compute the target value $y_j$. If the state is terminal, $y_j = R_j$; otherwise

$$y_j = R_j + \gamma\, Q'\big(\phi(S'_j),\ a_{max}(S'_j, \omega),\ \omega'\big)$$

where $a_{max}(S'_j, \omega) = \arg\max_a Q(\phi(S'_j), a, \omega)$ is selected by the current network.
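The Double-DQN target in Step 10.4 separates action selection (current network Q, parameters ω) from action evaluation (target network Q′, parameters ω′). In the sketch below, both networks are stand-in callables that return a list of Q values for a feature vector; the function name and batch layout are assumptions.

```python
def ddqn_targets(batch, q_online, q_target, gamma):
    """Double-DQN targets for a minibatch.

    batch items: (phi_s, action, reward, phi_s_next, is_end)
    q_online(phi) / q_target(phi): callables returning a list of Q values,
    standing in for the current network Q and the target network Q'.
    """
    targets = []
    for _phi_s, _action, reward, phi_next, is_end in batch:
        if is_end:
            targets.append(reward)
            continue
        # Select the next action with the current network ...
        q_next = q_online(phi_next)
        a_max = max(range(len(q_next)), key=q_next.__getitem__)
        # ... but evaluate it with the target network.
        targets.append(reward + gamma * q_target(phi_next)[a_max])
    return targets
```

If the current network outputs [1, 3] and the target network [10, 2] for the next state, then a_max = 1 and y = 1 + 0.9·2 = 2.8 for a reward of 1, whereas a plain DQN target, which maximizes directly over Q′, would give 1 + 0.9·10 = 10. This decoupling is what limits over-estimation.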
Step 10.5: Update all parameters ω of the current Q network by gradient descent over the m samples; after a fixed interval, update the parameters of the target network Q′: ω′ = ω.
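Step 10.5 can be illustrated with a linear function approximator standing in for the deep network, because the update has the same shape: a gradient step on the squared error between $y_j$ and $Q(\phi(S_j), A_j; \omega)$, followed by a periodic hard copy ω′ = ω. `LinearQ`, `gradient_step`, `sync_target` and the sync interval are hypothetical; a real DDQN would use a neural network and an optimizer from a deep learning framework.

```python
import copy


class LinearQ:
    """Hypothetical linear stand-in for the Q network: Q(phi, a) = w[a] . phi."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def __call__(self, phi):
        return [sum(wk * xk for wk, xk in zip(row, phi)) for row in self.w]


def gradient_step(q, batch, targets, alpha):
    """One gradient-descent step on the mean of (y_j - Q(phi_j, a_j))^2 / 2 over the batch."""
    m = len(batch)
    # Compute all TD errors with the pre-update weights, then apply the update.
    errors = [y - q(t[0])[t[1]] for t, y in zip(batch, targets)]
    for t, err in zip(batch, errors):
        phi, a = t[0], t[1]
        for k, x in enumerate(phi):
            q.w[a][k] += alpha * err * x / m


def sync_target(q_online, q_target, step, every):
    """Hard update w' = w every `every` steps (the interval is an assumption)."""
    if step % every == 0:
        q_target.w = copy.deepcopy(q_online.w)
```

A deep copy is used so that later updates to the current network do not leak into the target network between synchronizations.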
Step 10.6: If S′ is a terminal state, the current round of iteration is complete; otherwise, return to the action-selection part of step 10.1 and continue from the current state S.
Step 11: Feed the obtained optimal Q-network parameters back to the SVPWM modulation strategy for voltage regulation, achieving effective balance control of the neutral-point voltage of the multi-level converter.
The beneficial effects of the invention are as follows:
Fast voltage regulation and a good balancing effect; good real-time performance, since the control algorithm continuously interacts with the environment and iterates until the optimal voltage regulation behavior is obtained; good robustness, since new training samples can be fed to the reinforcement learning algorithm at any time to update the network parameters in real time and refine its state, so the controller can adjust and respond as the environment changes and effectively correct neutral-point voltage imbalance caused by load changes or power disturbances; flexible, low-cost software implementation, since the algorithm can run in the DSP module or the man-machine interaction module without additional hardware; decoupling of the current network and the target network, which avoids voltage fluctuations caused by over-regulation; effective reduction of the harmonic content of the output voltage and current and of the switching losses; and good generality and transferability, making the algorithm applicable to multi-level converters of different level counts and topologies, with its advantages becoming more pronounced for neutral-point voltage control of converters with high level counts and complex topologies.
While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes may be made and equivalents substituted without departing from the scope of the invention. Therefore, the scope of protection of the invention is defined by the claims.

Claims (9)

1. A method for a reinforcement-learning-based multi-level converter neutral-point voltage balance system, characterized in that the system comprises an input module, a multi-level converter module, a voltage and current detection module, a main control module, a man-machine interaction module and an auxiliary module; the auxiliary module is connected to the main control module; the main control module comprises an FPGA module, an IGBT driving module and a DSP module, the IGBT driving module and the DSP module being connected to the voltage and current detection module and the man-machine interaction module, respectively; and the multi-level converter module is connected to the input module, the IGBT driving module and the voltage and current detection module, respectively;
a neutral-point voltage controller module DRL-NPVC based on a deep reinforcement learning algorithm is deployed on the DSP module or the man-machine interaction module;
the method comprises the following steps:
S1: According to actual conditions, preliminarily determine the neutral-point voltage balance requirements of the reinforcement-learning-based multi-level converter; the multilevel converter uses SVPWM modulation, in which the DC-side neutral-point voltage balance control is based on the optimal voltage regulation factor fed back in real time by the deep reinforcement learning controller, and the charge flowing through the neutral point is regulated by changing the duty ratios of the switching sequence, thereby balancing the DC-side capacitor voltages;
S2: Establish a Markov decision process (MDP) for the deep-reinforcement-learning-based multi-level neutral-point voltage balance controller: map the adjustment of the voltage regulation factor in each sampling period of the SVPWM modulation process onto a reinforcement learning process based on iterative action-value updates; model the neutral-point voltage balance control problem mathematically with the MDP and the Bellman equations; determine the algorithm's control target, environment states and immediate rewards; and abstract and simplify the problem into one of solving for the optimal value function and action-value function:

$$v_\pi(s) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s\right]$$

$$q_\pi(s, a) = \mathbb{E}_\pi\left[R_{t+1} + \gamma\, q_\pi(S_{t+1}, A_{t+1}) \mid S_t = s, A_t = a\right]$$

where $v_\pi(s)$ is the state-value function, defined as an expectation $\mathbb{E}_\pi$; π is the agent's policy; s is the environment state; $S_t$ is the environment state at time t; γ ∈ [0, 1] is the reward discount factor; a is the action taken; $q_\pi(s, a)$ is the action-value function; $R_t$ is the environment reward at time t; and $A_t$ is the action taken at time t;
S3: Establish a Q-network model and select the deep reinforcement learning DDQN algorithm to compute the network parameters; DDQN is an off-policy algorithm in which the target policy and the behavior policy are separated, which avoids over-regulation of the neutral-point voltage; since the actions are discrete variables, as in Q-learning, a deep neural network is constructed to approximate the target Q value as Q(s, a); the network input is the environment state $S_t$, i.e. the circuit-parameter feature vector, and the network output is the action value $q_\pi(s, a)$ for each action in the action set $A_t$;
S4: Initialize the established DDQN network parameters, then iterate from 1 to T to solve for the optimal network parameters that achieve the expected targets, and determine the optimal voltage regulation factor that meets the system requirements;
S5: Feed the optimal voltage regulation factor back to the SVPWM (space vector pulse width modulation) strategy, achieving effective balance control of the neutral-point voltage of the multi-level converter.
2. The method of claim 1, wherein the input module comprises an AC input terminal and an uncontrolled rectifier; the AC input terminal is connected to the AC power grid, and its supply is converted by the uncontrolled rectifier into an independent DC power source connected to the multilevel converter module.
3. The method of claim 1, wherein the multilevel converter module comprises a neutral-point-clamped multilevel converter of any level class; two capacitors are connected in series on the DC side of the multilevel converter, and the multiple levels are generated by clamping to the neutral-point potential of the capacitors; each phase contains a plurality of IGBTs with antiparallel diodes; and the AC output of the multilevel converter is connected to the load through a filter.
4. The method of claim 1, wherein the voltage and current detection module is connected to the DSP module to obtain real-time circuit signals, including an output current, an output voltage, and a dc-side capacitor voltage;
the FPGA module is connected with the IGBT driving module and is used for sending out PWM control signals;
the IGBT driving module is connected with the multi-level converter module and is used for realizing the on-off control of the IGBT.
5. The method of claim 1, wherein the DSP module is configured to run the multilevel converter space vector pulse width modulation strategy SVPWM.
6. The method of claim 1, wherein the deep reinforcement learning algorithm based midpoint voltage controller module DRL-NPVC comprises an input sub-module, a judgment sub-module, and an output sub-module;
the input sub-module acquires the DC-side capacitor voltage, the load-end current and voltage, and other circuit parameter values through the voltage and current detection module; the judging sub-module judges whether the current voltage control strategy is optimal: if so, it executes the strategy and updates the network parameters; if not, it continues iterating until the optimal strategy is selected; the output sub-module outputs the currently selected optimal strategy and parameters, which are used to optimize the SVPWM modulation strategy of the multi-level converter and achieve neutral-point voltage balance control.
7. The method of claim 1, wherein, if the DRL-NPVC module is deployed in the man-machine interaction module, a corresponding communication port is further required for communication with the DSP module.
8. The method of claim 1, wherein the man-machine interaction module comprises an upper computer and a display screen, and is connected with the DSP module for realizing man-machine interaction.
9. The method of claim 1, wherein the auxiliary module comprises an auxiliary power supply and peripheral equipment; the auxiliary power supply is an uninterruptible power supply connected to the FPGA module, the DSP module and the IGBT driving module, respectively, to keep the system voltage stable and meet the power supply requirements.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201653.2A CN113839578B (en) 2021-10-15 2021-10-15 Multi-level converter neutral point voltage balance system and method based on reinforcement learning


Publications (2)

Publication Number Publication Date
CN113839578A CN113839578A (en) 2021-12-24
CN113839578B true CN113839578B (en) 2024-03-01

Family

ID=78969073


Country Status (1)

Country Link
CN (1) CN113839578B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111416540A (en) * 2020-04-27 2020-07-14 山东大学 Multi-level converter midpoint potential rapid balance control system and method
CN112187074A (en) * 2020-09-15 2021-01-05 电子科技大学 Inverter controller based on deep reinforcement learning
CN113254197A (en) * 2021-04-30 2021-08-13 西安电子科技大学 Network resource scheduling method and system based on deep reinforcement learning
CN113437889A (en) * 2021-07-26 2021-09-24 沈阳工业大学 Three-phase three-level high-power-factor rectifying device and method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A Dynamic-Segment-Alternating SVPWM for a Five-Level NNPP Converter With Neutral-Point Voltage Control;Shu Ye等;《IEEE TRANSACTIONS ON POWER ELECTRONICS》;第36卷(第09期);10612-10626 *
A New Model-Free Space Vector Modulation Technique for Multilevel Inverters Based On Deep Reinforcement Learning;Pouria Qashqai等;《IECON 2020 The 46th Annual Conference of the IEEE Industrial Electronics Society》;2407-2411 *
深度强化学习研究综述;赵星宇 等;《计算机科学》;第45卷(第07期);1-6 *


Similar Documents

Publication Publication Date Title
Al-Saedi et al. Power flow control in grid-connected microgrid operation using Particle Swarm Optimization under variable load conditions
Valderrama et al. Reactive power and imbalance compensation using STATCOM with dissipativity-based control
CN109802584B (en) Three-phase VSR unified MPC method capable of realizing AC-DC side performance consideration
Rathika et al. Fuzzy logic–based approach for adaptive hysteresis band and dc voltage control in shunt active filter
Bouzidi et al. Hybrid direct power/current control using feedback linearization of three-level four-leg voltage source shunt active power filter
CN109067217B (en) Design method of linear active disturbance rejection controller of three-phase voltage type PWM rectifier
Cortajarena et al. Sliding mode control of an active power filter with photovoltaic maximum power tracking
Chen et al. State-space modeling, analysis, and implementation of paralleled inverters for microgrid applications
Mohanraj et al. A unified power quality conditioner for power quality improvement in distributed generation network using adaptive distributed power balanced control (ADPBC)
Aboelsaud et al. Voltage control of autonomous power supply systems based on PID controller under unbalanced and nonlinear load conditions
Ghanbarian et al. Design and implementation of a new modified sliding mode controller for grid-connected inverter to controlling the voltage and frequency
CN113839578B (en) Multi-level converter neutral point voltage balance system and method based on reinforcement learning
Boukezata et al. Implementation of predictive current control for shunt active power filter
CN108631624B (en) Cascaded H-bridge rectifier based on three-dimensional modulation and control method thereof
Wang et al. Simulation of three-phase voltage source PWM rectifier based on direct current control
CN107634657B (en) Predictive control method and device for matrix converter
CN111525551B (en) Target control method and system for rectifier under unbalanced power grid voltage
CN111756261B (en) PWM rectifier control method and device
CN111952993B (en) Modular cascade power electronic transformer balance control system and method
CN114928261A (en) Model prediction and zero sequence voltage balance control method of three-phase five-level PWM rectifier
Perez et al. FPGA-based predictive current control of a three-phase active front end rectifier
Marzouki et al. Sensorless nonlinear control for a three-phase PWM AC-DC converter
CN112787350A (en) Low-frequency oscillation circulating current suppression method and system for modular multilevel converter
CN110855157A (en) Airplane ground static variable power supply direct-current bus control method based on active rectification
Xing et al. Research on VIENNA rectifier based on DSP under unbalanced power grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant