CN117634320B

CN117634320B - Multi-objective optimization design method for three-phase high-frequency transformer based on deep reinforcement learning

Info

Publication number: CN117634320B
Application number: CN202410096904.2A
Authority: CN
Inventors: 王佳宁; 王开鹏
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2024-01-24
Filing date: 2024-01-24
Publication date: 2024-04-09
Anticipated expiration: 2044-01-24
Also published as: CN117634320A

Abstract

The invention provides a multi-objective optimization design method for a three-phase high-frequency transformer based on deep reinforcement learning, in which the multi-objective optimization is carried out with the DDPG (deep deterministic policy gradient) algorithm. An artificial neural network replaces the complex leakage inductance formula, so the leakage inductance can be obtained quickly and accurately. Combining this model with the DDPG algorithm saves computing resources, improves the power density, efficiency and reliability of the transformer, allows parasitic parameters to be obtained quickly and controlled accurately, and supports subsequent engineering design.

Description

Multi-objective optimization design method for three-phase high-frequency transformer based on deep reinforcement learning

Technical Field

The invention belongs to the field of high-frequency transformer design, and particularly relates to a multi-objective optimization design method for a three-phase high-frequency transformer based on deep reinforcement learning.

Background

Power electronic transformers have attracted wide attention because they are small, lightweight and free of insulating oil. A power electronic transformer consists of a high-frequency high-power transformer and a power electronic converter.

Raising the operating frequency increases the power density of a high-frequency high-power transformer, but it also increases losses and reduces efficiency. Miniaturization reduces the heat conduction area, which makes heat dissipation difficult, and a three-phase high-frequency high-power transformer is usually encapsulated in insulating material to withstand the high-voltage environment, which degrades heat dissipation further. The design therefore has to weigh several optimization objectives together, such as power density, efficiency, heat dissipation capacity and parasitic parameters. These objectives often conflict and cannot all be optimized at the same time, so the design of a three-phase high-frequency high-power transformer is a multi-objective optimization problem.

Conventional transformer design methods, such as the AP (area product) method and the geometric parameter method, generally calculate parameters such as the AP value from experience and select a commercial magnetic core accordingly. Such methods generally cannot choose to optimize power density and efficiency, and cannot design specifically for insulation requirements, so they have clear shortcomings.

The three-phase LLC resonant converter has the advantages of good soft switching characteristic, simple control, high efficiency and the like, and is widely applied to power electronic converters. Therefore, a fast and accurate three-phase high-frequency high-power transformer optimization design method suitable for the three-phase LLC resonant converter is needed.

At present, the optimal design of high-frequency high-power transformers has become a research hot spot. Research mainly pursues transformer models with low computational cost and high accuracy, optimization methods with high efficiency and high accuracy, transformer designs with low cost, high power density, high efficiency and high reliability, and parasitic parameter calculation that is fast, accurate and controllable. These topics have been analyzed in depth in academic papers, and engineering methods are also in practical use. For example, Chinese patent application CN112052562A, published on December 8, 2020, provides a design method for a high-frequency high-power three-phase transformer applied to a three-phase dual-active-bridge converter.

However, that method has the following disadvantages: (1) The free-parameter scanning method requires a huge amount of computation when there are many independent variables with wide ranges, which is unfavorable for engineering application. (2) The flat copper strip and the laminated magnetic core are costly, are unsuitable for windings with many turns, and are difficult to manufacture.

Chinese invention patent application CN 113283073A, published on August 20, 2021, discloses a multi-objective optimization design method for a three-phase high-frequency high-power transformer suited to a three-phase LLC resonant converter. That scheme can control the leakage inductance and improve the power density and efficiency of the high-frequency high-power three-phase transformer.

However, that scheme has the following disadvantages: (1) Because the NSGA-II algorithm is adopted, the complex and time-consuming optimization has to be solved again whenever the system state changes. This consumes computing resources, cannot quickly provide action values after a state change, and limits both the optimization process and the range of application. (2) The leakage inductance formula is complex, so its calculation takes a long time. The formulas differ between application scenarios and vary in complexity, which introduces error, makes the result insufficiently accurate, and hinders accurate control of the leakage inductance.

Disclosure of Invention

To address the defects of the prior art, the invention provides a multi-objective optimization design method for a three-phase high-frequency transformer based on deep reinforcement learning, in which the multi-objective optimization is carried out with the DDPG algorithm. An artificial neural network replaces the complex leakage inductance formula, so the leakage inductance can be obtained quickly and accurately. Combining this model with the DDPG algorithm saves computing resources, improves the power density, efficiency and reliability of the transformer, allows parasitic parameters to be obtained quickly and controlled accurately, and supports subsequent engineering design.

The technical scheme of the invention is as follows:

A multi-objective optimization design method for a three-phase high-frequency high-power transformer based on deep reinforcement learning, wherein the three-phase high-frequency high-power transformer is applied to a three-phase LLC resonant converter and comprises three identical single-phase transformers, an upper yoke, a lower yoke and an insulation structure; the insulation structure comprises a main insulation structure and a secondary insulation structure;

Any one single-phase transformer of the three-phase high-frequency high-power transformer is denoted as a phase transformer, indexed by its phase. From inside to outside, the phase transformer consists of a magnetic core column with a rectangular cross section, a primary winding and a secondary winding. The primary winding and the secondary winding follow the shape of the magnetic core column, and the three parts are concentric. The space between the magnetic core column and the primary winding is filled with the secondary insulation structure, and the space between the primary winding and the secondary winding is filled with the main insulation structure. The height of the magnetic core column is denoted as the window height, the width of its cross section as the core cross-section width, and the length of its cross section as the core cross-section length;

The upper yoke and the lower yoke are cuboids whose height equals the core cross-section width and whose width equals the core cross-section length. The three identical single-phase transformers are arranged side by side in sequence, at equal spacing, between the upper yoke and the lower yoke, with a certain space reserved between the three single-phase transformers and the upper yoke and between them and the lower yoke; the equal spacing is denoted as the window length. Non-magnetic material of uniform thickness is laid in the space where the three magnetic core columns face the yoke, forming an air gap layer. The spaces between the three secondary windings of the three transformers and the upper yoke, and between them and the lower yoke, are filled with the secondary insulation structure.

The primary winding and the secondary winding are both wound with circular multi-strand stranded wire (litz wire).

The magnetic core column, the upper yoke and the lower yoke are all made of ferrite material with an initial permeability greater than 2500, and the single-turn wire diameter of the circular stranded wire is made smaller than the skin depth $\delta$ of the electromagnetic signal at the operating frequency of the three-phase high-frequency high-power transformer. The skin depth is given by:

$$\delta = \sqrt{\frac{\rho}{\pi f \mu}},$$

where $\rho$ is the resistivity of the conductive material in the circular stranded wire; $f$ is the operating frequency of the three-phase high-frequency transformer; and $\mu$ is the permeability of the conductive material in the circular stranded wire.
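The wire-diameter condition can be checked numerically. The following is a minimal sketch that assumes the standard skin-depth formula, sqrt(rho / (pi * f * mu)); the copper resistivity, the 100 kHz operating frequency and the 0.1 mm wire diameter are illustrative assumptions and do not come from the patent.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m (copper is non-magnetic, so mu is close to mu_0)


def skin_depth(resistivity, frequency, permeability=MU_0):
    """Skin depth in metres: sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * frequency * permeability))


def wire_is_thin_enough(wire_diameter, resistivity, frequency):
    """True when the wire diameter is smaller than the skin depth."""
    return wire_diameter < skin_depth(resistivity, frequency)


# Assumed example values (hypothetical, for illustration only).
RHO_COPPER = 1.68e-8  # ohm*m, approximate room-temperature copper resistivity
F_OP = 100e3          # Hz, assumed operating frequency
D_WIRE = 0.1e-3       # m, assumed single-wire diameter

delta = skin_depth(RHO_COPPER, F_OP)
ok = wire_is_thin_enough(D_WIRE, RHO_COPPER, F_OP)
```

With these assumed values the skin depth is roughly 0.2 mm, so a 0.1 mm wire satisfies the condition while a 1 mm wire would not.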

The multi-objective optimization design method comprises the following steps:

Step 1, determining the design requirements and selecting parameters;

Denote the three-phase high-frequency high-power transformer as the system, and sort out its design requirements, including the rated power, the voltage across the primary winding, the operating frequency, the current through the primary winding, the current through the secondary winding, the number of turns of the primary winding, the turns ratio and the output voltage level;

According to the design requirements, the following parameters of the three-phase high-frequency high-power transformer are selected: the magnetic core grade and its first, second and third loss parameters; the single-turn wire diameter of the circular stranded wire and its effective area coefficient; the thickness of the main insulation structure and the thickness of the secondary insulation structure; and so on;

Step 2, establishing the leakage inductance optimization model with a neural network, and establishing the power density optimization model and the loss optimization model with analytical formulas;

Step 2.1, establishing the leakage inductance optimization model with a back-propagation neural network;

Step 2.1.1, determining the input variables and the output variable of the neural network 1 (ANN1) model;

The neural network 1 (ANN1) model has 5 input variables: the window height, the core cross-section width, the core cross-section length, the wire diameter of the primary winding and the wire diameter of the secondary winding. It has 1 output variable, the leakage inductance of the transformer;

Step 2.1.2, acquiring a sample data set for constructing the neural network by using simulation software;

The sample data required to construct the neural network model comprise a number of groups of input data and the same number of groups of corresponding simulation output values. The input data of each group are the transformer window height, the core cross-section width, the core cross-section length, the wire diameter of the primary winding and the wire diameter of the secondary winding; the simulation output value of each group is the leakage inductance of the transformer. Each group is identified by its serial number;

Step 2.1.3, determining the network structure of the neural network 1 (ANN1) model: the input layer contains 5 neurons, the hidden layer contains 11 neurons, and the output layer contains 1 neuron;

step 2.1.4, grouping the sample data;

The sample data obtained in step 2.1.2 are divided into a training subset and a test set, each containing a fixed number of groups of sample data;

Step 2.1.5, constructing a neural network 1-ANN1 model;

Initialize the weight and bias parameters of the neural network 1 (ANN1) model, randomly draw a group of input data from the training subset obtained in step 2.1.4, feed it into the model, and obtain the output corresponding to that input. Then calculate the error between the simulation output value and the actual output value of the neural network:

，

Based on the obtained error, the weights and thresholds of the neural network 1 (ANN1) model are updated with a gradient descent algorithm to obtain the updated neural network 1 (ANN1) model;

Then the groups of input data in the test set obtained in step 2.1.4 are fed one by one into the updated neural network 1, giving the corresponding groups of outputs of neural network 1;

The root-mean-square error is defined by the expression:

，

Given a first target error, the following judgment is made:

If the root-mean-square error is within the first target error, the construction of the neural network 1 (ANN1) model is complete; otherwise, return to step 2.1.5;

The leakage inductance optimization model is established through the above training steps;
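Steps 2.1.3 to 2.1.5 can be sketched as a small training loop. The example below is only an illustration under stated assumptions: the data come from a hypothetical stand-in function `fake_simulation` rather than from simulation software, the 160/40 split, the learning rate and the target error of 0.05 are assumed, and full-batch gradient descent is used for brevity instead of drawing one sample group at a time.

```python
import numpy as np

rng = np.random.default_rng(0)


def fake_simulation(x):
    """Hypothetical stand-in for the simulated leakage inductance (not from the patent)."""
    return 0.5 * x[:, 0] + 0.3 * x[:, 1] * x[:, 2] + 0.2 * x[:, 3] - 0.1 * x[:, 4]


# 5 inputs per group (window height, core width, core length, two wire diameters),
# scaled to [0, 1] for this sketch.
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = fake_simulation(X).reshape(-1, 1)

# Training subset and test set (sizes are assumptions).
X_train, y_train = X[:160], y[:160]
X_test, y_test = X[160:], y[160:]

# 5-11-1 network: randomly initialised weights, zero biases (thresholds).
W1 = rng.normal(0.0, 0.5, size=(5, 11))
b1 = np.zeros((1, 11))
W2 = rng.normal(0.0, 0.5, size=(11, 1))
b2 = np.zeros((1, 1))


def forward(x):
    hidden = np.tanh(x @ W1 + b1)
    return hidden, hidden @ W2 + b2


def rmse(x, target):
    _, pred = forward(x)
    return float(np.sqrt(np.mean((target - pred) ** 2)))


TARGET_RMSE = 0.05   # assumed first target error
LEARNING_RATE = 0.1
MAX_EPOCHS = 5000

for epoch in range(MAX_EPOCHS):
    hidden, pred = forward(X_train)
    err = pred - y_train  # output error on the training subset
    # Back-propagate the mean squared error through both layers.
    grad_W2 = hidden.T @ err / len(X_train)
    grad_b2 = err.mean(axis=0, keepdims=True)
    d_hidden = (err @ W2.T) * (1.0 - hidden ** 2)
    grad_W1 = X_train.T @ d_hidden / len(X_train)
    grad_b1 = d_hidden.mean(axis=0, keepdims=True)
    # Gradient-descent update of weights and thresholds.
    W2 -= LEARNING_RATE * grad_W2
    b2 -= LEARNING_RATE * grad_b2
    W1 -= LEARNING_RATE * grad_W1
    b1 -= LEARNING_RATE * grad_b1
    # Stop once the test-set RMSE is within the target error.
    if rmse(X_test, y_test) <= TARGET_RMSE:
        break

final_rmse = rmse(X_test, y_test)
```

In a real design flow the arrays `X` and `y` would be replaced by the simulated sample groups, and the inputs would normally be scaled to comparable ranges before training.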

step 2.2, establishing a power density optimization model;

Taking the power density of the system as the objective, a power density optimization model is established, with the expression:

，

where the quantities involved are: the volume of the three-phase high-frequency high-power transformer; the wire diameter of the primary winding; the wire diameter of the secondary winding; the window length, window height, core cross-section width and core cross-section length of the three-phase high-frequency high-power transformer; the thickness of the main insulation structure; and the thickness of the secondary insulation structure.

Step 2.3, establishing the loss optimization model of the three-phase high-frequency high-power transformer and calculating the efficiency and the loss per unit area, with the expressions:

，

where the quantities involved are the surface area, the core loss and the winding loss of the three-phase high-frequency high-power transformer. The core loss and the winding loss are given respectively by:

，

where the quantities involved are: the magnetic core volume; the first, second and third loss parameters; the operating frequency of the three-phase high-frequency high-power transformer; the peak magnetic flux density of the magnetic core; the currents in the primary winding and the secondary winding; the resistances of the primary winding and the secondary winding; the numbers of circular stranded wire strands used to wind the primary winding and the secondary winding; and the relative values of the single-turn wire diameter of the circular stranded wire in the high-frequency Dowell model for the primary winding and the secondary winding;
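The loss model can be illustrated with a short sketch. It assumes that the core loss takes the common Steinmetz form k * f^alpha * B^beta * V, with the three loss parameters mapped to k, alpha and beta, that the winding loss is the per-phase I^2 * R loss multiplied by three phases, and that efficiency is (rated power - total loss) / rated power. These forms, the mapping of the parameters and all function names are assumptions for illustration; the winding resistances are taken as given inputs rather than derived from the Dowell model.

```python
def steinmetz_core_loss(volume, k, alpha, beta, frequency, b_peak):
    """Core loss assuming the Steinmetz form k * f**alpha * B**beta * V."""
    return k * frequency ** alpha * b_peak ** beta * volume


def winding_loss(i_primary, r_primary, i_secondary, r_secondary, phases=3):
    """Winding loss assuming I^2 * R per winding, summed over all phases."""
    return phases * (i_primary ** 2 * r_primary + i_secondary ** 2 * r_secondary)


def efficiency_and_loss_density(rated_power, core_loss, wind_loss, surface_area):
    """Return (efficiency, loss per unit area) under the assumed definitions."""
    total_loss = core_loss + wind_loss
    efficiency = (rated_power - total_loss) / rated_power
    return efficiency, total_loss / surface_area


# Hypothetical numbers chosen only to make the arithmetic easy to follow.
p_core = steinmetz_core_loss(volume=2.0, k=3.0, alpha=1.0, beta=2.0,
                             frequency=10.0, b_peak=0.5)
p_wind = winding_loss(i_primary=2.0, r_primary=0.5,
                      i_secondary=1.0, r_secondary=1.0)
eta, loss_density = efficiency_and_loss_density(1000.0, p_core, p_wind, 2.0)
```

The loss per unit area serves as a simple proxy for the heat dissipation burden: the same total loss spread over a smaller surface is harder to remove.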

Step 3, determining the state set, the action set and the reward function according to the established leakage inductance optimization model, power density optimization model and loss optimization model of the three-phase high-frequency high-power transformer;

Step 3.1, determining the state set;

Record the current moment of the system, which runs up to the moment of the system's termination state. The state of the system at the current moment is recorded as the state, whose components are the rated power, the operating frequency, the leakage inductance parameter, the power density, the efficiency and the loss per unit area of the three-phase high-frequency high-power transformer;

The state set is the set of all such states;

Step 3.2, determining the action set;

The action space of the three-phase high-frequency high-power transformer mainly consists of increasing and decreasing the dimensions of the magnetic core and the windings. The action taken by the system at the current moment is recorded as the action, and the action set is the set of all such actions;

Step 3.3, determining the reward function;

Step 3.3.1, carrying out normalization treatment on the multi-target model;

The values of the leakage inductance optimization model, the loss optimization model and the power density optimization model of the system are not of the same order of magnitude, so they are normalized so that the values of the four optimization objectives all lie between 0 and 1;

The leakage inductance in the leakage inductance optimization model, the efficiency and the loss per unit area in the loss optimization model, and the power density in the power density optimization model are each taken as an optimization objective;

Each optimization objective is normalized to obtain the corresponding normalized optimization objective, with the expression:

，

where the expression uses the minimum value and the maximum value of the optimization objective;
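A normalization that uses only the minimum and maximum of an objective and maps it into the range 0 to 1 is commonly written as min-max scaling. The sketch below assumes that form; the function name is hypothetical.

```python
def normalize(value, v_min, v_max):
    """Min-max scaling of one optimization objective into [0, 1] (assumed form)."""
    if v_max == v_min:
        raise ValueError("v_max and v_min must differ")
    return (value - v_min) / (v_max - v_min)


# Example with made-up bounds: an efficiency of 0.985 within [0.97, 0.99].
normalized_efficiency = normalize(0.985, 0.97, 0.99)
```

After this step the four objectives are dimensionless and comparable, which is what makes a weighted sum of them meaningful.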

Step 3.3.2, weighting the four optimization objectives and setting the reward function;

The reward function is the weighted sum of the reward values generated by all actions of the system from the current state to the termination state, expressed as follows:

，

where the single-step reward value is the reward obtained by the system after taking an action in its state at a given moment, and the discount factor indicates how strongly the elapsed time affects the reward value. The single-step reward value is defined with a penalty factor and weight coefficients;
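The reward structure described above can be sketched as follows. The single-step reward is assumed here to be a weighted sum of the normalized objectives minus a penalty term, and the reward function is assumed to be the usual discounted sum of single-step rewards; both forms and both function names are assumptions made for illustration.

```python
def single_step_reward(normalized_objectives, weights, penalty=0.0):
    """Weighted sum of normalized objectives minus a penalty (assumed form)."""
    if len(normalized_objectives) != len(weights):
        raise ValueError("one weight is needed per objective")
    weighted = sum(w * g for w, g in zip(weights, normalized_objectives))
    return weighted - penalty


def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k over the remaining steps, evaluated backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Example with made-up values.
r = single_step_reward([0.5, 1.0], [0.4, 0.6], penalty=0.1)
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

A discount factor below 1 makes rewards that arrive later count less, which is the time effect the text attributes to the discount factor.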

Step 4, obtaining the optimal strategy by offline learning with the DDPG algorithm and outputting the optimal action. A number of states are arbitrarily drawn from the state set to form the training data set for offline learning. According to the state set, the action set and the reward function obtained in step 3, offline learning is performed with the DDPG algorithm of deep reinforcement learning to obtain the optimal strategy;

The DDPG algorithm comprises 4 neural networks: an online policy network, a target policy network, an online evaluation network and a target evaluation network. The parameters of the online policy network are recorded as the first neural network parameters, those of the target policy network as the second neural network parameters, those of the online evaluation network as the third neural network parameters, and those of the target evaluation network as the fourth neural network parameters;

A training step index and a maximum step are given, together with a training round index and a maximum number of training rounds; that is, each training round comprises the maximum number of training steps, and the maximum number of training rounds is carried out in total;

The average of the reward function over each training round is defined and recorded as the average reward. In each training round, the first, second, third and fourth neural network parameters are all updated in the direction that maximizes the average reward, finally yielding the optimal strategy;

The expression of the optimal strategy is as follows:

，

where the state value is the input of the online policy network corresponding to the optimal strategy, and the action value output by that online policy network is recorded as the optimal action;

Outputting the optimal action;

Step 5, according to the optimal action, the system can achieve optimization of the leakage inductance, the efficiency, the heat dissipation loss per unit area and the power density under any state of the state set and any weights;

Step 5.1, first selecting from the state set S the states outside the training data set to form an application data set, then randomly drawing a number of states from it and redefining them as application states;

Step 5.2, substituting the optimal action output in step 4 into the application states to obtain the optimal application action output under each application state;

Step 5.3, substituting each application state and its optimal application action into the leakage inductance optimization model, the loss optimization model and the power density optimization model established in step 2, to obtain the optimal leakage inductance of the transformer in the system, the optimal efficiency of the system, the optimal loss per unit area of the system and the optimal power density of the system;

Step 6, determining a suitable excitation inductance value;

A suitable excitation inductance value is determined according to the turn-off current, the gain trend and the soft-switching characteristic of the three-phase LLC resonant converter, and the required excitation inductance value is obtained by adjusting the thickness of the air gap layer;

Preferably, the specific steps of the offline learning with the DDPG algorithm of deep reinforcement learning in step 4 to obtain the optimal strategy are as follows:

Step 4.1, initializing the first, second, third and fourth neural network parameters; initializing the capacity of the experience replay pool P; initializing the learning rate of the online evaluation network, the learning rate of the online policy network and the moving-average update parameters. The output of the online policy network is an action value, which corresponds to an individual in the action set; the input of the online policy network is a state value, which corresponds to an individual in the state set; the policy is obtained from the first neural network parameters of the online policy network and the input state value;

Step 4.2, inputting the state of the system at the current moment into the online policy network to obtain the output of the online policy network, and adding noise to obtain the finally output action, with the specific expression:

，

Step 4.3, the system executes the action in its current state, transitions to a new state, and obtains the single-step reward value for executing the action. The resulting record is called a state transition sequence and is stored in the experience replay pool; the system then enters the state of the next moment;

Steps 4.2 to 4.3 are executed in a loop while the number of state transition sequences in the experience replay pool is recorded; when the condition on that number is met, step 4.4 is entered, otherwise the process returns to step 4.2;
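The exploration and storage loop of steps 4.2 and 4.3 can be sketched as below. Several choices are assumptions made only for illustration: Gaussian exploration noise, clipping of the action to a fixed range, a transition stored as (state, action, reward, next state), and "pool is full" as the condition for moving on to sampling. The class and function names are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayPool:
    """Fixed-capacity experience replay pool; the oldest records are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size):
        """Draw a random mini-batch of stored transitions."""
        return random.sample(list(self.buffer), batch_size)


def noisy_action(policy, state, noise_std, rng, low=-1.0, high=1.0):
    """Policy output plus Gaussian exploration noise, clipped to [low, high]."""
    action = np.asarray(policy(state), dtype=float)
    noise = rng.normal(0.0, noise_std, size=action.shape)
    return np.clip(action + noise, low, high)


# Usage with a placeholder policy: 6 state components, 5 size adjustments.
rng = np.random.default_rng(0)
pool = ReplayPool(capacity=3)
state = np.zeros(6)
for _ in range(4):
    action = noisy_action(lambda s: np.zeros(5), state, 0.1, rng)
    next_state, reward = state, 0.0  # placeholder environment response
    pool.store(state, action, reward, next_state)
    state = next_state
```

Sampling uniformly from the pool breaks the correlation between consecutive transitions, which is the usual reason for using experience replay.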

Step 4.4, randomly drawing a number of state transition sequences from the experience replay pool as small-batch data for training the online policy network and the online evaluation network;

Step 4.5, from the small-batch data obtained in step 4.4, calculating the cumulative reward and the error function, with the specific expressions:

，

where the target evaluation network outputs a scoring value whose inputs are the action value output by the target policy network and the state value fed to the target evaluation network and the target policy network; the online evaluation network outputs a scoring value whose inputs are the state value and the action value fed to the online evaluation network;

Step 4.6, the online evaluation network updates the third neural network parameters by minimizing the error function, the online policy network updates the first neural network parameters through the deterministic policy gradient, and the target evaluation network and the target policy network update the fourth and second neural network parameters by the moving-average method, with the specific expressions:

，

where the expressions use the partial derivative symbol and a moving-average constant. The deterministic policy gradient combines two partial derivatives: the derivative of the scoring value output by the online evaluation network with respect to the action value, evaluated at the state fed to the online evaluation network, and the derivative of the action value output by the online policy network with respect to the first neural network parameters, evaluated at the state fed to the online policy network. The error function is differentiated with respect to the third neural network parameters. The results are the updated third, first, fourth and second neural network parameters;
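The numerical core of steps 4.5 and 4.6 can be sketched in their commonly used DDPG form: a target value built from the single-step reward and the target networks, a mean squared error for the online evaluation network, and a moving-average (soft) update of the target parameters. The function names and the exact forms are assumptions based on the standard algorithm; the networks themselves are left out, and their outputs are passed in as arrays.

```python
import numpy as np


def td_targets(rewards, next_q_target, gamma):
    """Target value: r_i + gamma * Q_target(s_next, policy_target(s_next))."""
    return rewards + gamma * next_q_target


def critic_loss(targets, q_online):
    """Mean squared error between the targets and the online evaluation output."""
    return float(np.mean((targets - q_online) ** 2))


def soft_update(target_params, online_params, tau):
    """Moving-average update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Example with made-up mini-batch values.
y_batch = td_targets(np.array([1.0, 2.0]), np.array([10.0, 20.0]), gamma=0.9)
loss = critic_loss(y_batch, np.array([9.0, 22.0]))
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

A small moving-average constant makes the target networks change slowly, which keeps the target values stable while the online networks are trained.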

Step 4.7, each time steps 4.4-4.6 are completed, the training process of one step is completed; steps 4.4-4.6 are repeated until the maximum step is reached, at which point the training process of one round is completed. The training process of the next round starts again from step 4.2 and runs to step 4.6; steps 4.2-4.6 are repeated until the maximum number of training rounds is reached, at which point the learning process of the DDPG algorithm ends;

Step 4.8, the training algorithm ends and the optimal strategy is saved; the average reward of each training round is recorded;

Over the training rounds, the first, second, third and fourth neural network parameters are updated in the direction that maximizes the average reward, finally yielding the optimal strategy.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention models the leakage inductance of the three-phase high-frequency transformer with an artificial neural network in place of a complex mathematical formula, which addresses the complicated calculation and large error of the leakage inductance formula.

(2) Compared with conventional manual optimization and with multi-objective optimization using the NSGA-II algorithm, the method can markedly reduce the optimization time of the three-phase high-frequency transformer and improve its design efficiency.

(3) The invention uses the DDPG algorithm for multi-objective optimization of the three-phase high-frequency transformer. It can handle complex, high-dimensional design variables, avoids design failures of the three-phase high-frequency transformer, finds an optimal scheme that meets the optimization objectives, and fully improves the performance of the three-phase high-frequency transformer.

(4) With the optimal strategy provided by the invention, the optimal design variable values can be obtained directly under dynamically changing normal operating conditions of the three-phase high-frequency high-power transformer and under different weights assigned to the four objectives, so that the efficiency, the heat dissipation loss per unit area, the power density and the leakage inductance are optimized. The complex and time-consuming optimization does not have to be solved again, which is simple and fast and saves computing resources.

Drawings

FIG. 1 is a schematic diagram of a three-phase high-frequency high-power transformer in a three-dimensional structure according to an embodiment of the present invention;

FIG. 2 is a front view of a three-phase high-frequency high-power transformer according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a three-phase LLC resonant converter;

FIG. 4 is a block diagram of a three-phase high frequency high power transformer multi-objective optimization method;

FIG. 5 is a flow chart of a multi-objective optimization method for a three-phase high frequency high power transformer;

FIG. 6 is a flow chart of modeling the leakage inductance of the three-phase high-frequency high-power transformer with a neural network;

FIG. 7 is a neural network structure diagram of a leakage inductance optimization model in the multi-objective optimization method of the three-phase high-frequency high-power transformer;

FIG. 8 is a graph of convergence effects of average rewards in an embodiment of the invention;

FIGS. 9(a)-9(e) are training effect diagrams of the action variables in the embodiment of the present invention: FIG. 9(a) is the training effect diagram of the window height of the three-phase high-power transformer, FIG. 9(b) of the core cross-section length, FIG. 9(c) of the core cross-section width, FIG. 9(d) of the wire diameter of the primary winding, and FIG. 9(e) of the wire diameter of the secondary winding.

Detailed Description

The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings and embodiments. FIG. 1 is a schematic perspective view of a three-phase high-frequency high-power transformer according to an embodiment of the invention, and FIG. 2 is a front view of the same transformer. As can be seen from FIGS. 1 and 2, the three-phase high-frequency high-power transformer of the invention is applied to a three-phase LLC resonant converter and comprises three identical single-phase transformers, an upper yoke, a lower yoke and an insulation structure; the insulation structure comprises a main insulation structure and a secondary insulation structure;

Any one of the single-phase transformers of the three-phase high-frequency high-power transformer is referred to as a phase transformer, indexed by its phase. From inside to outside, each phase transformer consists of a core leg with a rectangular cross section, a primary winding and a secondary winding. The primary winding and the secondary winding have the same shape as the core leg, and the three parts are concentric. The space between the core leg and the primary winding is filled with the secondary insulation structure, and the space between the primary winding and the secondary winding is filled with the main insulation structure. The height of the core leg is denoted as the window height, the width of its cross section as the core cross-sectional width, and the length of its cross section as the core cross-sectional length;

The upper yoke and the lower yoke are identical cuboids whose height equals the core cross-sectional width and whose width equals the core cross-sectional length. The three identical single-phase transformers are arranged side by side in sequence, at equal spacing, between the upper yoke and the lower yoke, with a certain space reserved between the three single-phase transformers and the upper yoke and between them and the lower yoke; the equal spacing is denoted as the window length. Non-magnetic material of the same thickness is laid in the spaces where the three core legs face the yoke, and this non-magnetic material forms an air gap layer. The spaces between the three secondary windings of the three transformers and the upper yoke, and between them and the lower yoke, are filled with the secondary insulation structure.

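The core legs, the upper yoke and the lower yoke are all made of ferrite material with an initial permeability greater than 2500, and the single-strand diameter of the round stranded wire is chosen to be smaller than the skin depth of the electromagnetic signal at the operating frequency of the three-phase high-frequency high-power transformer. The skin depth is expressed as:

$$\delta = \sqrt{\frac{\rho}{\pi f \mu}},$$

where $\rho$ is the resistivity of the conductive material in the round stranded wire, $f$ is the operating frequency of the three-phase high-frequency transformer, and $\mu$ is the magnetic permeability of the conductive material in the round stranded wire.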
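As an illustration of the strand-diameter criterion, the following minimal sketch evaluates the standard skin-depth formula, sqrt(rho / (pi * f * mu)), and checks the 0.15 mm strand used in the embodiment at 30 kHz. The copper resistivity and the use of the free-space permeability are assumptions made here for illustration; the source does not state the conductor material constants.

```python
import math

def skin_depth(resistivity, frequency, permeability):
    """Skin depth in metres: sqrt(rho / (pi * f * mu))."""
    return math.sqrt(resistivity / (math.pi * frequency * permeability))

RHO_CU = 1.72e-8           # ohm*m, assumed copper resistivity (not from the source)
MU_0 = 4e-7 * math.pi      # H/m, free-space permeability, assumed for a non-magnetic conductor

delta = skin_depth(RHO_CU, 30e3, MU_0)   # 30 kHz operating frequency from the embodiment
strand_diameter = 0.15e-3                # 0.15 mm single-strand diameter from the embodiment
strand_ok = strand_diameter < delta      # criterion: strand diameter below skin depth

print(f"skin depth = {delta * 1e3:.3f} mm, strand criterion met: {strand_ok}")
```

Under these assumed constants the strand diameter is well below the skin depth, which is the condition the text requires.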

The three-phase high-frequency high-power transformer is applied to a three-phase LLC resonant converter. The topology of the three-phase LLC resonant converter in this embodiment is shown in FIG. 3; the converter comprises a DC power supply F, a three-phase full-bridge inverter IN3, a resonant capacitor CR3, a transformer T3, a three-phase uncontrolled rectifier REC3, a filter capacitor Co and a load resistor R.

FIG. 4 is a block diagram of a multi-objective optimization method for a three-phase high-frequency high-power transformer in an embodiment of the invention; fig. 5 is a flow chart of a multi-objective optimization design method of a three-phase high-frequency high-power transformer according to an embodiment of the invention, and as can be seen from fig. 4 and 5, the multi-objective optimization design method includes the following steps:

Step 1, design requirements and parameter selection.

The following parameters of the three-phase high-frequency high-power transformer are selected according to the design requirements: the core material grade and its first, second and third loss parameters; the single-strand diameter of the round stranded wire and its effective area coefficient; and the thickness of the main insulation structure and the thickness of the secondary insulation structure.

In the embodiment of the invention, the design requirements of the three-phase high-frequency high-power transformer are shown in table 1.

TABLE 1

，

In the embodiment of the invention, the core leg Zi, the upper yoke S and the lower yoke X are made of PC95 ferrite material with an initial permeability of 3300, whose first loss parameter is 0.94, second loss parameter is 1.453 and third loss parameter is 2.325. The primary winding and the secondary winding are wound from round stranded wire with a single-strand diameter of 0.15 mm. The thickness of the main insulation structure is 15 mm, and the thickness of the secondary insulation structure is 5 mm.

Step 2, building a leakage inductance optimization model by using a neural network, and building a loss optimization model and a power density optimization model by using an analytic formula;

Step 2.1, establishing the leakage inductance optimization model using a back-propagation neural network; FIG. 6 is a flow chart of leakage inductance optimization modeling of the three-phase high-frequency high-power transformer using a neural network, as can be seen from FIG. 6;

The neural network 1-ANN1 model has 5 input variables: the window height, the core cross-sectional width, the core cross-sectional length, the wire diameter of the primary winding and the wire diameter of the secondary winding. The neural network has 1 output variable, namely the transformer leakage inductance.

The sample data required to construct the neural network model comprise a number of groups of input data and the same number of corresponding groups of simulated output values. The input data of each group are the transformer window height, the core cross-sectional width, the core cross-sectional length, the wire diameter of the primary winding and the wire diameter of the secondary winding; the simulated output value of each group is the transformer leakage inductance. Each group is identified by its serial number;

Step 2.1.3, determining a network structure of a neural network 1-ANN1 model, wherein FIG. 7 is a neural network structure diagram of a leakage inductance optimization model in the multi-objective optimization method of the three-phase high-frequency high-power transformer, and as can be seen from FIG. 7, in the neural network structure, an input layer contains 5 neurons, a hidden layer contains 11 neurons, and an output layer contains 1 neuron;
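The 5-11-1 structure described above can be sketched as a single forward pass. This is a minimal illustration only: the tanh hidden activation, the random initialization, and the sample input values are assumptions, since the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 inputs -> 11 hidden neurons -> 1 output, as stated in step 2.1.3
W1 = rng.normal(scale=0.5, size=(5, 11))
b1 = np.zeros(11)
W2 = rng.normal(scale=0.5, size=(11, 1))
b2 = np.zeros(1)

def ann1_forward(x):
    """Forward pass of the leakage inductance surrogate (tanh is an assumed activation)."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2

# Hypothetical input: window height, core width, core length, primary and secondary wire diameter (m)
x = np.array([[0.10, 0.03, 0.05, 0.004, 0.004]])
y = ann1_forward(x)
print(y.shape)
```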

step 2.1.4, grouping the sample data;

The sample data obtained in step 2.1.2 are divided into a training subset and a test set, each containing a specified number of groups of sample data;

Step 2.1.5, constructing a neural network 1-ANN1 model;

The weight and bias parameters of the neural network 1-ANN1 model are initialized. A group of input data is randomly extracted from the training subset obtained in step 2.1.4 and input into the neural network 1-ANN1 model to obtain the corresponding output, and the error between the simulated output value and the actual output value of the neural network is calculated:

，

Based on the obtained error, the weights and thresholds of the neural network 1-ANN1 model are updated using a gradient descent algorithm to obtain an updated neural network 1-ANN1 model;

The groups of input data in the test set obtained in step 2.1.4 are then input one by one into the updated neural network 1-ANN1 model to obtain the corresponding groups of outputs of the neural network 1-ANN1;

The root mean square error is defined by the following expression:

，

A target error is given and the following judgment is made:

If the root mean square error meets the given target error, the construction of the neural network 1-ANN1 model is complete; otherwise, return to step 2.1.5;

The leakage inductance optimization model is established through the above training steps.
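The train-then-test loop of steps 2.1.4 and 2.1.5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the data are synthetic stand-ins for the simulated leakage inductance samples, the split sizes, learning rate and target error are hypothetical, the hidden activation is tanh, and full-batch gradient descent replaces the per-sample random extraction described in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for simulation samples: 5 geometry inputs, 1 leakage inductance output
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = (X @ np.array([0.5, -0.2, 0.3, 0.1, -0.4]))[:, None] + 0.5
X_train, y_train = X[:160], y[:160]   # training subset (size is an assumption)
X_test, y_test = X[160:], y[160:]     # test set

W1 = rng.normal(scale=0.5, size=(5, 11)); b1 = np.zeros(11)
W2 = rng.normal(scale=0.5, size=(11, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

def rmse(x, target):
    """Root mean square error between network output and target."""
    return float(np.sqrt(np.mean((forward(x)[1] - target) ** 2)))

target_error, lr = 0.02, 0.05          # hypothetical target error and learning rate
rmse_start = rmse(X_test, y_test)

for epoch in range(5000):
    h, out = forward(X_train)
    grad_out = 2.0 * (out - y_train) / len(X_train)    # d(MSE)/d(out)
    grad_W2, grad_b2 = h.T @ grad_out, grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (1.0 - h ** 2)        # back-propagate through tanh
    grad_W1, grad_b1 = X_train.T @ grad_h, grad_h.sum(axis=0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1             # gradient descent update
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    if rmse(X_test, y_test) <= target_error:           # stop once the test RMSE meets the target
        break

rmse_end = rmse(X_test, y_test)
print(f"test RMSE: {rmse_start:.4f} -> {rmse_end:.4f}")
```

The stopping test mirrors the judgment in the text: training repeats until the test-set root mean square error meets the target error, here with an added epoch cap so the loop always terminates.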

Step 2.2, establishing a power density optimization model;

，

where the quantities in the expression are: the volume of the three-phase high-frequency high-power transformer; the wire diameter of the primary winding; the wire diameter of the secondary winding; the window length, the window height, the core cross-sectional width and the core cross-sectional length of the three-phase high-frequency high-power transformer; the thickness of the main insulation structure; and the thickness of the secondary insulation structure.

Step 2.3, establishing a loss model of the three-phase high-frequency high-power transformer, and calculating the efficiency and the loss per unit area, with the following expressions:

，/>

，

where the quantities in the expressions are: the core volume; the first, second and third loss parameters; the operating frequency of the three-phase high-frequency high-power transformer; the peak magnetic flux density of the core; the currents of the primary winding and the secondary winding; the resistances of the primary winding and the secondary winding; the numbers of strands of round stranded wire used to wind the primary winding and the secondary winding; and the relative values of the single-strand diameter of the round stranded wire in the high-frequency Dowell model for the primary winding and the secondary winding.
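Because the loss expressions themselves are not reproduced in this text, the following is only a hedged sketch of how such a loss model is commonly organized. It assumes a Steinmetz-type core loss in which the three loss parameters act as a coefficient and two exponents, takes the high-frequency winding resistances as given inputs rather than deriving them from the Dowell model, and assumes efficiency as rated power over rated power plus losses. All function names and numeric inputs other than the three loss parameters, the 30 kHz frequency and the 200 kW rating are hypothetical, and the units of the loss coefficient are not given in the source, so the numbers are illustrative only.

```python
def core_loss(k, alpha, beta, frequency, b_peak, core_volume):
    """Steinmetz-type core loss (assumed form): k * f^alpha * B^beta * V."""
    return k * frequency ** alpha * b_peak ** beta * core_volume

def winding_loss(i_primary, r_primary, i_secondary, r_secondary):
    """Ohmic winding loss with AC resistances supplied by the caller."""
    return i_primary ** 2 * r_primary + i_secondary ** 2 * r_secondary

def efficiency(p_rated, p_core, p_winding):
    """Assumed definition: output power over output power plus losses."""
    return p_rated / (p_rated + p_core + p_winding)

def loss_per_unit_area(p_core, p_winding, surface_area):
    """Total loss divided by the heat-dissipating surface area."""
    return (p_core + p_winding) / surface_area

# Loss parameters from the embodiment; remaining inputs are hypothetical
p_c = core_loss(0.94, 1.453, 2.325, 30e3, 0.2, 1e-3)
p_c_double = core_loss(0.94, 1.453, 2.325, 30e3, 0.2, 2e-3)
p_w = winding_loss(100.0, 0.01, 100.0, 0.01)
eta = efficiency(200e3, p_c, p_w)
q = loss_per_unit_area(p_c, p_w, 0.5)
print(f"efficiency = {eta:.4f}, loss per unit area = {q:.1f}")
```

Separating the four functions keeps each objective (efficiency, loss per unit area) computable from the same intermediate losses, which is how the later reward function consumes them.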

Step 3, determining the state set, the action set and the reward function according to the established leakage inductance optimization model, power density optimization model and loss optimization model of the three-phase high-frequency high-power transformer;

Step 3.1, determining the state set;

The current time of the system is recorded, running up to the time of the system's termination state. The state of the system at the current time is recorded as a state comprising the rated power, the operating frequency, the leakage inductance parameter, the power density, the efficiency and the loss per unit area of the three-phase high-frequency high-power transformer.

The state set is the set of all such states;

Step 3.2, determining the action set;

The action space of the three-phase high-frequency high-power transformer mainly consists of increases, decreases and changes in the core and winding dimensions, so the action taken by the system at the current time is recorded as an action. The action set is the set of all such actions.

Step 3.3, determining the reward function;

Step 3.3.1, carrying out normalization treatment on the multi-target model;

In this embodiment, the leakage inductance in the leakage inductance optimization model, the efficiency and the loss per unit area in the loss optimization model, and the power density in the power density optimization model are each taken as an optimization objective;

The optimization objectives are normalized to obtain the normalized optimization objectives, with the following expressions:

，

In this embodiment, four optimization objectives are considered, each with its own weight. The reward function is the weighted sum of the reward values generated by all actions of the system from the current state to the termination state, expressed as follows:

，
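The normalization and weighting described above can be sketched as follows. This is a minimal sketch, assuming min-max normalization to the range 0 to 1, equal weights, and a sign convention in which leakage inductance and loss per unit area are minimized while power density and efficiency are maximized; the bounds, weights and objective values are hypothetical, and the penalty coefficient mentioned in the claims is omitted.

```python
def normalize(value, lower, upper):
    """Min-max normalization to [0, 1] (assumed normalization form)."""
    return (value - lower) / (upper - lower)

def step_reward(objectives, bounds, weights, maximize):
    """Weighted sum of normalized objectives; minimized objectives are flipped."""
    total = 0.0
    for name, weight in weights.items():
        n = normalize(objectives[name], *bounds[name])
        total += weight * (n if maximize[name] else 1.0 - n)
    return total

def discounted_return(rewards, gamma):
    """Sum of single-step rewards from the current state to the end state, discounted by gamma."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical values for the four objectives
objectives = {"leakage": 5.0, "density": 7.5, "efficiency": 0.985, "area_loss": 150.0}
bounds = {"leakage": (0.0, 10.0), "density": (5.0, 10.0),
          "efficiency": (0.97, 1.0), "area_loss": (100.0, 200.0)}
weights = {"leakage": 0.25, "density": 0.25, "efficiency": 0.25, "area_loss": 0.25}
maximize = {"leakage": False, "density": True, "efficiency": True, "area_loss": False}

reward = step_reward(objectives, bounds, weights, maximize)
total = discounted_return([1.0, 1.0, 1.0], 0.5)
print(reward, total)
```

Normalizing before weighting keeps objectives of very different magnitudes from dominating the sum, which is the stated purpose of step 3.3.1.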

Step 4, performing offline learning with the DDPG algorithm to obtain the optimal strategy and output the optimal action;

A number of states are arbitrarily extracted from the state set to form the training data set for offline learning. According to the state set, the action set and the reward function obtained in step 3, offline learning is performed using the deep reinforcement learning DDPG algorithm to obtain the optimal strategy;

A training step and a maximum step are given, together with a training round number and a maximum number of training rounds; that is, each training round contains the maximum number of training steps, and the maximum number of training rounds is carried out in total.

The average of the reward function over each training round is defined and recorded as the average reward. Over the training rounds, the first, second, third and fourth neural network parameters are all updated in the direction that maximizes the average reward, finally yielding the optimal strategy.

The optimal strategy in this embodiment is expressed as follows:

，

where the input is the state value fed to the online policy network corresponding to the optimal strategy, and the output is the action value produced by the online policy network corresponding to the optimal strategy, recorded as the optimal action;

Outputting the optimal action；

Step 5, according to the optimal action, the system can optimize the leakage inductance, efficiency, loss per unit area and power density in any state of the state set and under any weights;

Step 5.1, the states of the state set S that are not in the training data set are first formed into an application data set; a number of states are then randomly extracted from it and redefined as application states;

Step 5.2, the optimal action output in step 4 is substituted into each application state to obtain the optimal application action output under the different application states;

Step 5.3, the application states and the optimal application actions are substituted into the leakage inductance optimization model, the loss optimization model and the power density optimization model established in step 2 to obtain the optimal transformer leakage inductance, the optimal efficiency, the optimal loss per unit area and the optimal power density of the system.

Step 6: determining a suitable excitation inductance value;

A suitable excitation inductance value is determined according to the turn-off current, gain trend and soft-switching characteristics of the three-phase LLC resonant converter, and the required excitation inductance value is obtained by adjusting the thickness of the air gap layer.

Step 4.1, initializing the first, second, third and fourth neural network parameters; initializing the capacity of the experience replay pool P; initializing the learning rate of the online evaluation network, the learning rate of the online policy network, and the moving-average update parameters. The output of the online policy network is the action value, which corresponds to an individual in the action set; the input of the online policy network is the state value, which corresponds to an individual in the state set; the policy is obtained from the first neural network parameter of the online policy network and the input state value;

，

in the present embodiment, takeTaking->Taking->Taking out noise。

Step 4.3, according to its current state, the system executes an action and transitions to a new state, obtaining a single-step reward value for the executed action. The tuple of state, action, single-step reward value and new state is called a state transition sequence and is stored in the experience replay pool; the system then proceeds to the state at the next time;
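The experience replay pool of step 4.3 can be sketched as a fixed-capacity buffer of state transition sequences. The class name, the capacity and the uniform mini-batch sampling are assumptions for illustration; the source states only that transitions are stored in a pool of initialized capacity.

```python
import random
from collections import deque

class ReplayPool:
    """Fixed-capacity pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random mini-batch (assumed sampling scheme)
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

pool = ReplayPool(capacity=3)            # hypothetical capacity
for t in range(5):
    pool.store(state=t, action=0.1 * t, reward=1.0, next_state=t + 1)

batch = pool.sample(2)
oldest_state = pool.buffer[0][0]         # transitions 0 and 1 have been evicted
print(len(pool), oldest_state)
```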

In the present embodiment, takeTaking->。

，

where the first quantity is the scoring value output by the target evaluation network, whose inputs are the action value output by the target policy network and the state value input to the target evaluation network and the target policy network; the second quantity is the scoring value output by the online evaluation network, whose inputs are the state value and the action value fed to the online evaluation network;

Step 4.6, the online evaluation network updates the third neural network parameter by minimizing the error function; the online policy network updates the first neural network parameter through the deterministic policy gradient; the target evaluation network and the target policy network update the fourth neural network parameter and the second neural network parameter by the moving-average method. The specific expressions are as follows:

，

where the partial derivative symbol and the time constant of the moving-average update appear, and the terms denote, respectively: the partial derivative of the policy with respect to the first neural network parameter; the partial derivative, with respect to the action value, of the scoring value output by the online evaluation network for the given input; the partial derivative, with respect to the first neural network parameter, of the action value output by the online policy network for the given input; and the partial derivative of the error function with respect to the third neural network parameter. The remaining quantities are the updated third neural network parameter, the updated first neural network parameter, the updated fourth neural network parameter and the updated second neural network parameter;
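The three numerical ingredients of step 4.6 can be sketched in their standard DDPG form. This is a hedged illustration of the general algorithm rather than the patent's exact expressions, which are not reproduced in this text; the function names, the discount factor and the moving-average parameter value are hypothetical.

```python
import numpy as np

def td_target(reward, gamma, target_q_next):
    """Target value: single-step reward plus the discounted target-network score of the next state."""
    return reward + gamma * target_q_next

def critic_loss(targets, q_values):
    """Mean squared error minimized by the online evaluation network."""
    return float(np.mean((targets - q_values) ** 2))

def soft_update(target_params, online_params, tau):
    """Moving-average update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]

y_target = td_target(reward=1.0, gamma=0.9, target_q_next=2.0)
loss = critic_loss(np.array([1.0, 2.0]), np.array([0.0, 0.0]))
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
print(y_target, loss, new_target[0])
```

A small moving-average parameter makes the target networks trail the online networks slowly, which is what stabilizes the target value used in the error function.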

Over the training rounds, the first, second, third and fourth neural network parameters are updated in the direction that maximizes the average reward, finally yielding the optimal strategy.

In the present embodiment, the excitation inductance is adjusted to 145 μH.

In order to demonstrate the beneficial effects of the invention, a simulation was carried out.

FIG. 8 shows the convergence of the average reward in the embodiment of the invention. The abscissa in FIG. 8 is the training round number and the ordinate is the average reward. As can be seen from FIG. 8, in the early stage of training the agent explores by executing random actions, interacting with the environment and collecting experience data; the policy network and evaluation network parameters are not yet updated, so the average cumulative reward is small. After that, the policy network and the evaluation network are updated continuously and the average reward begins to converge. Because the maximum reward attainable differs between states, the average reward still oscillates in the later stage of learning; at this point the training effect has reached its optimum, the four neural network parameters have been updated, and the optimal strategy is obtained.

In the present embodiment, with a rated power of 200 kW and a frequency of 30 kHz, the action variables in the action set are trained. FIG. 9(a) is the training effect diagram of the window height of the three-phase high-power transformer; FIG. 9(b) is the training effect diagram of the core cross-sectional length; FIG. 9(c) is the training effect diagram of the core cross-sectional width; FIG. 9(d) is the training effect diagram of the wire diameter of the primary winding; and FIG. 9(e) is the training effect diagram of the wire diameter of the secondary winding. The abscissa in FIG. 9 is the training round number, and the ordinates are, respectively, the window height, the core cross-sectional length, the core cross-sectional width, the wire diameter of the primary winding and the wire diameter of the secondary winding. As can be seen, each action variable converges as the number of training rounds increases; once training has converged, the optimal action variable values are obtained and the optimal targets of the system are calculated, as shown in Table 2. Table 2 shows the DDPG optimization results under the design requirements.

TABLE 2

，

Table 3 shows the NSGA-II optimization results.

TABLE 3

，

In this embodiment, the time consumption and the optimization results of the NSGA-II and DDPG algorithms are compared at a rated power of 200 kW and a frequency of 30 kHz (for illustration only). Taking the NSGA-II result as the reference, the error in the optimized design parameters is within 2.5% and the difference in the optimized performance results is within 4%. The optimization results of DDPG and NSGA-II are therefore similar, while DDPG obtains its result in markedly less time and thus responds faster.

Claims

1. The multi-objective optimization design method for the three-phase high-frequency transformer based on deep reinforcement learning is characterized by comprising the following steps of:

step 1, designing requirements and selecting parameters;

The following parameters of the three-phase high-frequency high-power transformer are selected according to the design requirements: the core material grade and its first, second and third loss parameters; the single-strand diameter of the round stranded wire and its effective area coefficient; and the thickness of the main insulation structure and the thickness of the secondary insulation structure;

Step 2, establishing a leakage inductance optimization model of the three-phase high-frequency high-power transformer using a neural network, and establishing a power density optimization model and a loss optimization model using analytic formulas;

Step 3, determining a state set, an action set and a reward function according to the established leakage inductance optimization model, power density optimization model and loss optimization model of the three-phase high-frequency high-power transformer;

Step 4, performing offline learning with the DDPG algorithm to obtain the optimal strategy and output the optimal action;

Step 6, determining a suitable excitation inductance value;

the three-phase high-frequency high-power transformer is applied to a three-phase LLC resonant converter and comprises three identical single-phase transformers, an upper yoke, a lower yoke and an insulation structure; the insulation structure comprises a main insulation structure and a secondary insulation structure;

any one of the single-phase transformers of the three-phase high-frequency high-power transformer is referred to as a phase transformer, indexed by its phase; from inside to outside, each phase transformer consists of a core leg with a rectangular cross section, a primary winding and a secondary winding; the primary winding and the secondary winding have the same shape as the core leg, and the three parts are concentric; the space between the core leg and the primary winding is filled with the secondary insulation structure, and the space between the primary winding and the secondary winding is filled with the main insulation structure; the height of the core leg is denoted as the window height, the width of its cross section as the core cross-sectional width, and the length of its cross section as the core cross-sectional length;

the upper yoke and the lower yoke are identical cuboids whose height equals the core cross-sectional width and whose width equals the core cross-sectional length; the three identical single-phase transformers are arranged side by side in sequence, at equal spacing, between the upper yoke and the lower yoke, with a certain space reserved between the three single-phase transformers and the upper yoke and between them and the lower yoke; the equal spacing is denoted as the window length; non-magnetic material of the same thickness is laid in the spaces where the three core legs face the yoke, and this non-magnetic material forms an air gap layer; the spaces between the three secondary windings of the three transformers and the upper yoke, and between them and the lower yoke, are filled with the secondary insulation structure;

the primary winding and the secondary winding are each wound from multiple strands of round stranded wire;

the core legs, the upper yoke and the lower yoke are all made of ferrite material with an initial permeability greater than 2500, and the single-strand diameter of the round stranded wire is smaller than the skin depth of the electromagnetic signal at the operating frequency of the three-phase high-frequency high-power transformer, the skin depth being expressed as:

$$\delta = \sqrt{\frac{\rho}{\pi f \mu}},$$

where $\rho$ is the resistivity of the conductive material in the round stranded wire; $f$ is the operating frequency of the three-phase high-frequency transformer; and $\mu$ is the magnetic permeability of the conductive material in the round stranded wire;

the specific implementation method of the step 2 is as follows;

the neural network 1-ANN1 model has 5 input variables, which represent: the window height, the core cross-sectional width, the core cross-sectional length, the wire diameter of the primary winding and the wire diameter of the secondary winding; the neural network 1-ANN1 model has 1 output variable, which represents the transformer leakage inductance;

the sample data required to construct the neural network model comprise a number of groups of input data and the same number of corresponding groups of simulated output values; the input data of each group are the transformer window height, the core cross-sectional width, the core cross-sectional length, the wire diameter of the primary winding and the wire diameter of the secondary winding, and the simulated output value of each group is the transformer leakage inductance; each group is identified by its serial number;

2.1.3, determining a network structure of a neural network 1-ANN1 model, wherein in the network structure, an input layer contains 5 neurons, a hidden layer contains 11 neurons, and an output layer contains 1 neuron;

step 2.1.4, grouping the sample data;

dividing the sample data obtained in step 2.1.2 into a training subset and a test set, each containing a specified number of groups of sample data;

Step 2.1.5, constructing a neural network 1-ANN1 model;

,

the groups of input data in the test set obtained in step 2.1.4 are then input one by one into the updated neural network 1-ANN1 model to obtain the corresponding groups of outputs of the neural network 1-ANN1;

the root mean square error is defined by the following expression:

,

a target error is given and the root mean square error is judged against it;

the leakage inductance optimization model is established through the above training steps;

step 2.2, establishing a power density optimization model;

，

where the quantities in the expression are: the volume of the three-phase high-frequency high-power transformer; the wire diameter of the primary winding; the wire diameter of the secondary winding; the window length, the window height, the core cross-sectional width and the core cross-sectional length of the three-phase high-frequency high-power transformer; the thickness of the main insulation structure; the thickness of the secondary insulation structure; and the rated power of the three-phase high-frequency high-power transformer;

step 2.3, establishing a loss optimization model of the three-phase high-frequency high-power transformer, and calculating the efficiency and the loss per unit area, with the following expressions:

，

where the quantities in the expressions are the surface area, the core loss, the winding loss and the rated power of the three-phase high-frequency high-power transformer; the core loss and the winding loss of the three-phase high-frequency high-power transformer are expressed, respectively, as:

,

where the quantities in the expressions are: the core volume; the first, second and third loss parameters; the operating frequency of the three-phase high-frequency high-power transformer; the peak magnetic flux density of the core; the currents of the primary winding and the secondary winding; the resistances of the primary winding and the secondary winding; the numbers of strands of round stranded wire used to wind the primary winding and the secondary winding; and the relative values of the single-strand diameter of the round stranded wire in the high-frequency Dowell model for the primary winding and the secondary winding;

the specific implementation method of the step 3 is as follows;

step 3.1, determining a State set；

the current time of the system is recorded, running up to the time of the system's termination state; the state of the system at the current time is recorded as a state comprising the rated power, the operating frequency, the leakage inductance parameter, the power density, the efficiency and the loss per unit area of the three-phase high-frequency high-power transformer;

the state set is the set of all such states;

step 3.2, determining an action set；

the action space of the three-phase high-frequency high-power transformer consists of increases, decreases and changes in the core and winding dimensions, so the action taken by the system at the current time is recorded as an action, and the action set is the set of all such actions;

Step 3.3 determining a reward function；

Step 3.3.1, normalizing the multi-objective model;

the values of the leakage inductance optimization model, the loss optimization model and the power density optimization model of the system are not of the same order of magnitude, so normalization is performed so that the values of the three optimization models lie between 0 and 1;

the leakage inductance in the leakage inductance optimization model, the efficiency and the loss per unit area in the loss optimization model, and the power density in the power density optimization model are each taken as an optimization objective;

the optimization objectives are normalized to obtain the normalized optimization objectives, with the following expressions:

,/>，

step 3.3.2, assigning weights to the 4 optimization objectives and setting the reward function;

，

where the quantities in the expression are: the single-step reward value obtained after the system takes an action in its state at a given time; the discount factor, which indicates the extent to which the length of time affects the reward value; the penalty coefficient; and the weight coefficients.

2. The method according to claim 1, wherein step 4 is specifically implemented as follows:

A number of states are arbitrarily extracted from the state set to form the training data set for offline learning; according to the state set, the action set, and the reward function obtained in step 3, offline learning is performed with the DDPG algorithm of deep reinforcement learning to obtain the optimal policy;

The DDPG algorithm comprises four neural networks, namely an online policy network, a target policy network, an online evaluation network, and a target evaluation network. The neural network parameters of the online policy network are recorded as the first neural network parameters, those of the target policy network as the second neural network parameters, those of the online evaluation network as the third neural network parameters, and those of the target evaluation network as the fourth neural network parameters;

A training step counter and a maximum step count are given, together with a training round counter and a maximum number of training rounds; that is, each training round runs up to the maximum step count, and training runs up to the maximum number of training rounds in total;

The average of the reward function over the training steps of each training round is defined and recorded as the average reward; over the training rounds, the first, second, third, and fourth neural network parameters are all updated in the direction that maximizes the average reward, finally yielding the optimal policy;
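The round and step structure described above can be sketched as two nested loops that record one average reward per training round. The function `step_fn`, which stands in for one interaction and parameter update and returns a single-step reward, is a hypothetical placeholder, as are the argument names.

```python
import random


def run_training(num_rounds, steps_per_round, step_fn, seed=0):
    """Run `num_rounds` rounds of `steps_per_round` steps each.

    Returns the list of average rewards, one per training round.
    """
    rng = random.Random(seed)  # passed to step_fn for reproducible exploration
    average_rewards = []
    for _ in range(num_rounds):
        total = 0.0
        for _ in range(steps_per_round):
            total += step_fn(rng)
        average_rewards.append(total / steps_per_round)
    return average_rewards
```

Plotting the returned list against the round index is a usual way to check whether the average reward is increasing and levelling off.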

The expression of the optimal policy is as follows:

，

where the state value input to the online policy network corresponding to the optimal policy belongs to the state set, and the action value output by that online policy network, which belongs to the action set, is recorded as the optimal action;

The optimal action is output.

3. The method according to claim 2, wherein step 5 is specifically implemented as follows:

Step 5.3, substituting the application state and the optimal application action into the leakage inductance optimization model, the loss optimization model, and the power density optimization model established in step 2, respectively, to obtain the optimal leakage inductance, the optimal efficiency, the optimal loss per unit area, and the optimal power density of the transformer in the system.

4. The method according to claim 3, wherein step 6 is specifically implemented as follows:

5. The method of claim 4, further characterized by the specific steps of step 4 as follows:

Step 4.1, initializing the first neural network parameters, the second neural network parameters, the third neural network parameters, and the fourth neural network parameters; initializing the capacity of the experience replay pool P; initializing the learning rate of the online evaluation network, the learning rate of the online policy network, and the moving-average update parameter. The output of the online policy network is the action value, which corresponds to an individual in the action set; its input is the state value, which corresponds to an individual in the state set; and the policy is obtained from the first neural network parameters of the online policy network and the input state value;

，

Step 4.3, the system executes the action in its current state and transitions to a new state, obtaining at the same time the single-step reward value for executing the action; the state, the action, the single-step reward value, and the new state together are called a state transition sequence, which is stored in the experience replay pool P; the system then proceeds to the state at the next moment;

Steps 4.2 to 4.3 are executed cyclically, and the number of state transition sequences in the experience replay pool is recorded; once this number satisfies the preset condition, step 4.4 is entered; otherwise the process returns to step 4.2;
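A minimal sketch of an experience replay pool with a fixed capacity is given below. The class name `ReplayPool`, the method names, the use of "pool is full" as the condition for moving on, and uniform random sampling of a batch are assumptions of this sketch; the exact condition and sampling rule are not legible in this text.

```python
import random
from collections import deque


class ReplayPool:
    """Fixed-capacity store of state transition sequences."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest entry once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def is_full(self):
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size, rng=random):
        """Draw `batch_size` distinct transitions uniformly at random."""
        return rng.sample(list(self.buffer), batch_size)
```

In a training loop, transitions would be stored after every step, and the network updates would start only once `is_full()` (or whichever condition applies) holds.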

，

Step 4.6, the online evaluation network updates the third neural network parameters by minimizing an error function; the online policy network updates the first neural network parameters through the deterministic policy gradient; and the target evaluation network and the target policy network update the fourth and second neural network parameters, respectively, by a moving-average method. The specific expressions are as follows:

，

where $\partial$ is the partial-derivative symbol and a time constant appears in the expressions. The derivative terms denote, respectively: the partial derivative of the policy with respect to the first neural network parameters; the partial derivative of the score output by the online evaluation network, for its given input, with respect to the action value; the partial derivative of the action value output by the online policy network, for its given input, with respect to the first neural network parameters; and the partial derivative of the error function with respect to the third neural network parameters. The remaining symbols denote the updated third, first, fourth, and second neural network parameters, respectively;
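The three kinds of update named in step 4.6 can be illustrated with the standard DDPG update rules applied to deliberately tiny linear models. This is a sketch under stated assumptions: the scalar state and action, the linear evaluation network `Q(s, a) = w[0]*s + w[1]*a`, the linear policy `mu(s) = theta*s`, and the names `lr` and `tau` are hypothetical and do not reproduce the claimed expressions.

```python
def critic_step(w, batch, targets, lr):
    """Gradient-descent step on the mean squared error of a linear critic.

    Q(s, a) = w[0]*s + w[1]*a; `batch` holds (s, a) pairs, `targets` the
    target values y for each pair.
    """
    n = len(batch)
    grad = [0.0, 0.0]
    for (s, a), y in zip(batch, targets):
        err = (w[0] * s + w[1] * a) - y
        grad[0] += 2.0 * err * s / n
        grad[1] += 2.0 * err * a / n
    return [w[0] - lr * grad[0], w[1] - lr * grad[1]]


def actor_step(theta, w, states, lr):
    """Gradient-ascent step along the deterministic policy gradient.

    For mu(s) = theta*s: dQ/da = w[1] and dmu/dtheta = s, so the
    gradient estimate is the batch mean of w[1]*s.
    """
    grad = sum(w[1] * s for s in states) / len(states)
    return theta + lr * grad


def soft_update(target, online, tau):
    """Moving-average update: target <- tau*online + (1 - tau)*target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]
```

A small `tau` makes the target networks follow the online networks slowly, which is the usual reason for keeping separate target networks in DDPG.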

Step 4.8, ending the training algorithm and saving the optimal policy; the average reward of one training round is recorded;

Over the training rounds, the first neural network parameters, the second neural network parameters, the third neural network parameters, and the fourth neural network parameters are updated in the direction that maximizes the average reward, finally yielding the optimal policy.