CN117313560A - Multi-objective optimization method for IGBT module packaging based on machine learning - Google Patents

Multi-objective optimization method for IGBT module packaging based on machine learning

Info

Publication number
CN117313560A
CN117313560A
Authority
CN
China
Prior art keywords
network
neural network
state
output
online
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311617795.6A
Other languages
Chinese (zh)
Other versions
CN117313560B (en)
Inventor
王佳宁 (Wang Jianing)
孙菲双 (Sun Feishuang)
王睿源 (Wang Ruiyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311617795.6A priority Critical patent/CN117313560B/en
Publication of CN117313560A publication Critical patent/CN117313560A/en
Application granted granted Critical
Publication of CN117313560B publication Critical patent/CN117313560B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Abstract

The invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, and belongs to the technical field of power electronics. The method comprises the steps of establishing a three-objective optimization model using a neural network; determining a state set, an action set and a reward function; performing offline learning with the DDPG algorithm of machine learning to obtain an optimal strategy; and substituting the optimal strategy into the three-objective optimization model for application. The optimization method enables the IGBT module to optimize solder layer stress, stray inductance and chip junction temperature under any state and any weight coefficients. The invention can handle the high-dimensional design variables of complex IGBT modules, avoids the limitation that traditional manual design cannot optimize them, and, by adopting the DDPG algorithm, avoids the time-consuming re-optimization that a traditional genetic algorithm requires whenever the design requirements change, thereby greatly saving computing resources and improving the design efficiency of the IGBT module.

Description

Multi-objective optimization method for IGBT module packaging based on machine learning
Technical Field
The invention belongs to the technical field of power electronics, and particularly relates to a multi-objective optimization method for IGBT module packaging.
Background
The IGBT module is one of the main failure-prone components in a photovoltaic inverter, and its failure directly affects the safe operation of the system. Within the IGBT module, large stray inductance causes voltage overshoot and oscillation, reducing reliability. Meanwhile, the IGBT module in a photovoltaic inverter is exposed year-round to varying illumination intensity and ambient temperature, so the junction temperature inside the device fluctuates readily. Because the thermal expansion coefficients of the internal materials are mismatched, junction temperature fluctuations cause extrusion deformation between the device layers, producing thermal stress. Such cyclic thermal stress accelerates the life consumption of the device and ultimately leads to its thermal fatigue failure.
Therefore, optimizing the IGBT module packaging for low stray inductance, low junction temperature and low stress has great engineering application value and research significance. The main drawback of the traditional manual packaging design method is that the mathematical model of the power module and the optimization process require repeated manual iteration, so the overall design cycle is long and the time cost is high. Many experts and scholars have proposed different solutions to this:
The master's thesis "Electro-thermal-force multi-physics simulation design study of multi-chip parallel SiC MOSFET modules" (master's dissertation, Tianjin University, 2022, DOI: 10.27356/d.cnki.gtjdu.2020.002346) adopts a response-surface optimization method for structural optimization based on the thermal performance of the power module. However, this solution has the following drawbacks:
1) Owing to the limitations of the finite element software, the optimization targets cover only thermal and mechanical performance indexes, without considering the parasitic stray inductance;
2) The current level in this method is a fixed value; when the system design requirements change, iterative optimization must be carried out again, which is time-consuming;
The paper "Automatic layout design for power module" (IEEE Transactions on Power Electronics, 2013, 28(1): 481-487) proposes automatically designing the module layout with a genetic algorithm: layout position information, chip placement information and so on are encoded as genetic characters, and the layout design is iterated with the genetic algorithm. However, this solution has the following drawbacks:
(1) If the number of parallel chips in the module is large, the oversized population can cause premature convergence of the genetic algorithm and reduce the accuracy of the layout result;
(2) The method lacks accurate mathematical models for the parasitic inductance, temperature, etc. of the module, and factors such as mutual inductance and thermal coupling are often ignored in the design process.
Disclosure of Invention
The technical problem to be solved by the invention is that existing multi-objective optimization methods for IGBT modules cannot provide an optimal solution under varying states and suffer from long computation times. To address these defects, the invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, which uses the deep deterministic policy gradient (DDPG) algorithm of machine learning to provide, with fast response, the optimal solution under any state, avoiding the limitation of metaheuristic algorithms, which must repeat their iterative optimization whenever conditions change.
In order to achieve the aim of the invention, the invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC substrate, a solder layer and a bonding wire; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, wherein the thicknesses of the upper copper layer and the lower copper layer are the same; the method comprises the following steps:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F of a solder layer, the stray inductance L and the junction temperature T of a chip of the system are used j Establishing a three-target optimization model based on a neural network as a target;
the input variables of the neural network are 6, and the neural network is divided into two groups, wherein the first group is a current level I, and the second group is an IGBT module size, and the neural network comprises: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
Step 2, determining a state set S and an action set A according to the three-objective optimization model obtained in the step 1 0 And a reward function R, and calculates an average reward
Step 3, according to the state set S and the action set A obtained in the step 2 0 And a reward function R, performing offline learning by using a DDPG algorithm of machine learning to obtain an optimal strategy pi(s) y );
The DDPG algorithm comprises 4 neural networks, namely an online strategy network, a target strategy network, an online evaluation network and a target evaluation network, wherein the neural network parameters of the online strategy network are recorded as theta μ The neural network parameters of the target policy network are noted as θ μ’ The neural network parameter of the on-line evaluation network is marked as theta Q The neural network parameters of the target evaluation network are marked as theta Q’
The optimal strategy pi (s y ) The expression of (2) is as follows:
wherein s is y For the state of inputValue of a y For the state passing through the optimal strategy pi (s y ) Output action value s y =(I y ,F y ,L y ,T jy ) y Wherein I y For current level in any one of the set of states S, F y 、L y 、T jy Respectively the initial module size and the current level I y Corresponding stray inductance, chip junction temperature and solder layer stress; a, a y =(d 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y Wherein (d) 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y To be at current level I y In a state, outputting the optimal IGBT module size corresponding to the lowest stray inductance, the lowest chip junction temperature and the lowest solder layer stress after the optimal strategy;
step 4, the optimal strategy pi (s y ) Substituting the three-objective optimization model based on the neural network established in the step 1, and collecting the system in a stateAdopts an optimal strategy pi(s) y ) All can realize average rewards->Maximization.
Preferably, the implementation process of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquiring sample data required for constructing a neural network by using simulation software, and establishing a sample data set, wherein the sample data set comprises E groups of sample data, each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data, and the three groups of data are respectively recorded as input T and simulation output gamma, T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network 1 consists of an input layer, an output layer and an hidden layer, wherein the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, inputting the input data into the neural network A to obtain outputs corresponding to the input data, and respectively recording the outputs as solder layer stress network output values F F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T Jf2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
Preferably, the implementation process of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
Preferably, in step 3, offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y); the specific implementation process is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
Step 3.2, inputting the state s_t of the system at time t into the online policy network to obtain the online policy network's output μ_t(s_t|θ_μt), and adding noise δ_t to obtain the finally output action a_t; specifically:

a_t = μ_t(s_t|θ_μt) + δ_t

Step 3.3, the system executes action a_t from state s_t:
The three-objective optimization model is loaded into the machine learning algorithm and denoted the environment model; I_t, d_1t, d_2t, d_3t, h_1t, h_2t are taken as the input variables of the environment model, and the output variables are obtained and denoted F_{t+1}, L_{t+1}, T_j(t+1); a normal distribution function with I_t as the mean is built, a standard deviation is given, and I_{t+1} is obtained by random sampling;
The system transitions to the new state s_{t+1} = (I_{t+1}, F_{t+1}, L_{t+1}, T_j(t+1)) and simultaneously obtains the single-step reward value r_t for executing action a_t; (s_t, a_t, r_t, s_{t+1}) is called a state transition sequence and is stored in the experience replay pool P; at the next moment the system enters state s_{t+1};
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
Step 3.4, randomly extracting n state transition sequences from the experience replay pool P, where n < D, as the minibatch data for training the online policy network and the online evaluation network; the k-th state transition sequence in the minibatch is denoted (s_k, a_k, r_k, s_{k+1}), where n is the minibatch sampling factor and k = 1, 2, ..., n;
Step 3.5, based on the minibatch data (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, ..., n, obtained in step 3.4, calculating the cumulative reward y_k and the error function L(θ_Q); specifically:

y_k = r_k + γ · Q'(s_{k+1}, μ'(s_{k+1}|θ_μ') | θ_Q')

L(θ_Q) = (1/n) · Σ_{k=1}^{n} [ y_k − Q(s_k, a_k|θ_Q) ]²

in which Q'(s_{k+1}, μ'(s_{k+1}|θ_μ') | θ_Q') is the scoring value output by the target evaluation network, μ'(s_{k+1}|θ_μ') is the action value output by the target policy network and s_{k+1} is the state value input to the target evaluation network and the target policy network; Q(s_k, a_k|θ_Q) is the scoring value output by the online evaluation network, and s_k and a_k are the state value and action value input to the online evaluation network;
Step 3.6, the online evaluation network updates θ_Q by minimizing the error function L(θ_Q), the online policy network updates θ_μ through the deterministic policy gradient ∇_θμ J, and the target evaluation network and the target policy network update θ_Q' and θ_μ' by the moving-average method; specifically:

∇_θμ J = (1/n) · Σ_{k=1}^{n} [ ∇_a Q(s, a|θ_Q)|_{s=s_k, a=μ(s_k)} · ∇_θμ μ(s|θ_μ)|_{s=s_k} ]

θ_Q* = θ_Q − α_Q · ∇_θQ L(θ_Q)

θ_μ* = θ_μ + α_μ · ∇_θμ J

θ_Q'* = τ · θ_Q* + (1 − τ) · θ_Q'

θ_μ'* = τ · θ_μ* + (1 − τ) · θ_μ'

wherein ∇ is the partial derivative symbol; ∇_θμ J denotes the partial derivative of the policy objective J with respect to θ_μ; ∇_a Q(s, a|θ_Q)|_{s=s_k, a=μ(s_k)} denotes the partial derivative, with respect to the action value a, of the scoring value output by the online evaluation network when its input is s = s_k, a = μ(s_k); ∇_θμ μ(s|θ_μ)|_{s=s_k} denotes the partial derivative, with respect to θ_μ, of the action value output by the online policy network when its input is s = s_k; ∇_θQ L(θ_Q) denotes the partial derivative of the error function L(θ_Q) with respect to θ_Q;
α_Q is the learning rate of the online evaluation network, α_μ is the learning rate of the online policy network and τ is the moving-average update parameter, with 0 < α_Q < 1, 0 < α_μ < 1 and 0 < τ < 1; θ_Q* is the updated neural network parameter of the online evaluation network, θ_μ* that of the online policy network, θ_Q'* that of the target evaluation network and θ_μ'* that of the target policy network;
Step 3.7, given a step counter step, a maximum number of steps step_max, a training round counter m and a maximum number of training rounds M, with step = 1, 2, ..., step_max and m = 1, 2, ..., M: each completion of steps 3.4 to 3.6 completes the training process of one step; steps 3.4 to 3.6 are executed repeatedly, and when step_max steps have been completed, the training process of one round is complete; the training process of the next round starts again from step 3.2 through step 3.6; steps 3.2-3.6 are executed repeatedly, and when M rounds of training have been completed, the learning process of the DDPG algorithm ends;
The neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network are updated in the direction that maximizes the average reward r̄, finally yielding the optimal strategy π(s_y).
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention uses a neural network to construct the nonlinear mapping between the module dimensions and the optimization targets. Exploiting the ANN's ability to learn the distributed features of data, the complex finite element simulation is converted into the matrix expressions of the neural network, so the optimization targets of the IGBT module can be obtained quickly by inputting the corresponding module dimensions. This avoids the time-consuming multi-physics finite element simulation and the difficulty of software interaction, greatly reducing the computing resources required;
(2) The optimal strategy π(s_y) provided by the invention directly yields the optimal design variable values under the design requirements of different IGBT module current levels so as to maximize efficiency, without repeating a complex and time-consuming optimization-seeking process; it is simple, convenient and fast, and saves computing resources.
Drawings
Fig. 1 is a three-objective optimization model structure based on a neural network, which is built in an embodiment of the present invention.
FIG. 2 is a convergence chart of the neural network's training iterations versus error according to the present invention.
FIG. 3 is a block diagram of a multi-objective optimization method of the present invention.
FIG. 4 is a flow chart of the multi-objective optimization method of the present invention.
FIG. 5 is a chart showing the convergence effect of average rewards in an embodiment of the invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a structural diagram of the neural network in the neural-network-based three-objective optimization model established in an embodiment of the present invention. As can be seen, the input layer contains 6 neurons, the output layer contains 3 neurons and the hidden layer contains 11 neurons; given a group of inputs (I, d_1, d_2, d_3, h_1, h_2), a group of outputs (F, L, T_j) is obtained. In addition, Fig. 1 shows the connections of the 11 hidden-layer neurons to the 6 input-layer neurons and the 3 output-layer neurons respectively.
Fig. 2 is the convergence graph of the root mean square error during neural network training in this example. After the data set and the structure of the neural network are determined, the neural network is trained with the training commands. After about 10000 iterations the root mean square error is significantly reduced, and the neural network training is complete.
Fig. 3 is a block diagram of the IGBT module multi-objective optimization method of the present invention, and fig. 4 is a flowchart of the IGBT module multi-objective optimization method of the present invention. As can be seen from fig. 3 and 4, the IGBT module multi-objective optimization method optimizes the IGBT module stress, stray inductance, and junction temperature based on machine learning.
The invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC (direct bonded copper) ceramic substrate, a solder layer and bonding wires; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, and the upper and lower copper layers have the same thickness. Specifically, the method comprises the following steps:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F of a solder layer, the stray inductance L and the junction temperature T of a chip of the system are used j Establishing a three-target optimization model based on a neural network as a target;
the nerveThe input variable of network is 6, divides into two groups, and first group is electric current class I, and the second group is IGBT module size, includes: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
In this embodiment, the implementation procedure of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquiring sample data required for constructing a neural network by using simulation software, and establishing a sample data set, wherein the sample data set comprises E groups of sample data, each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data, and the three groups of data are respectively recorded as input T and simulation output gamma, T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network 1 consists of an input layer, an output layer and an hidden layer, wherein the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, inputting the input data into the neural network A to obtain outputs corresponding to the input data, and respectively recording the outputs as solder layer stress network output values F F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T Jf2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
Specifically, in this example the stray inductance of the IGBT module is first extracted with the Q3D finite element simulation software; the DC positive input side is set as the source and the DC negative input side as the sink, and the module dimensions in the half-bridge module, namely the lateral distance d_1 between chips of the same bridge arm, the lateral distance d_2 between the upper and lower bridge arm chips, the longitudinal distance d_3 between the upper and lower bridge arm chips, the copper layer thickness h_1 and the ceramic layer thickness h_2, are parameterized. d_1 ranges from 1 to 11 mm with a step of 2 mm; d_2 from 5 to 20 mm with a step of 3 mm; d_3 from 5 to 25 mm with a step of 5 mm; h_1 from 0.1 to 0.2 mm with a step of 0.05 mm; h_2 from 0.2 to 0.4 mm with a step of 0.1 mm. A total of 5^4×3 = 1875 groups of data are obtained, denoted data A.
In this example, electro-thermal-mechanical coupling simulation of the IGBT module is carried out with the COMSOL software to extract the chip junction temperature and the solder layer stress. The module dimensions and the current level in the half-bridge module are parameterized; the current level ranges from 230 to 250 A with a step of 4, and the value ranges and step lengths of the module dimensions are the same as above. A total of 5^5×3 = 9375 groups of data are obtained, denoted data B.
In this example, the input data of data A and data B are in an inclusion relationship and can be integrated into one set of data; after normalization and random shuffling, a sample data set comprising 5^5×3 = 9375 groups of sample data is obtained, i.e. E = 9375.
In this example, the sample data set is divided into a training subset and a verification subset. Following the classical 80%/20% division, 80% of the data is used as the training set and 20% as the test set; that is, E1 = 7500 and E2 = 1875.
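As an illustration of steps 1.3 to 1.7, the following is a minimal sketch of the 6-11-3 surrogate network in PyTorch. The layer sizes and the RMSE stopping rule follow the description above; the hidden activation, optimizer, learning rate and target error eps are assumptions, since the embodiment does not specify them.

```python
import torch
import torch.nn as nn

class SurrogateNet(nn.Module):
    """6-11-3 network: inputs (I, d1, d2, d3, h1, h2) -> outputs (F, L, Tj)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 11),
            nn.Tanh(),          # assumed hidden activation
            nn.Linear(11, 3),
        )

    def forward(self, x):
        return self.net(x)

def train_surrogate(x_train, y_train, x_val, y_val, eps=1e-3, max_iter=10000):
    """Back-propagation training (step 1.5) with the sigma < eps stop of step 1.7."""
    model = SurrogateNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate
    mse = nn.MSELoss()
    for _ in range(max_iter):
        opt.zero_grad()
        loss = mse(model(x_train), y_train)
        loss.backward()
        opt.step()
        with torch.no_grad():
            sigma = torch.sqrt(mse(model(x_val), y_val))  # validation RMSE
        if sigma < eps:
            break               # model construction complete
    return model
```

In practice the 9375 normalized samples above would be split 80/20 into (x_train, y_train) and (x_val, y_val) before calling train_surrogate.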
Step 2, determining the state set S, the action set A_0 and the reward function R according to the three-objective optimization model obtained in step 1, and calculating the average reward r̄;
In this embodiment, the implementation procedure of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
Specifically, ψ = 1000, η_1 = η_2 = η_3 = 1 and γ = 0.9 are taken.
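For illustration, the reward computation with these coefficients can be sketched in Python as below. The discounted sum matches the reward function R defined above; the exact published form of the single-step reward r_t is not reproduced here, so the negative weighted sum with a range-violation penalty is an assumption that follows the description only in structure (ψ as penalty coefficient, η_1 to η_3 as weights).

```python
def single_step_reward(F, L, Tj, action_in_range, psi=1000.0, eta=(1.0, 1.0, 1.0)):
    # Assumed structure: the reward falls as the three targets rise, and the
    # penalty coefficient psi is applied when the module size leaves its
    # allowed range. The embodiment's exact expression for r_t is not given.
    r = -(eta[0] * F + eta[1] * L + eta[2] * Tj)
    if not action_in_range:
        r -= psi
    return r

def discounted_return(rewards, gamma=0.9):
    # Reward function R = sum over t of gamma^(t-1) * r_t, t = 1..T.
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```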
Step 3, performing offline learning with the DDPG algorithm of machine learning according to the state set S, the action set A_0 and the reward function R obtained in step 2, to obtain the optimal strategy π(s_y);
The DDPG algorithm comprises 4 neural networks: an online policy network, a target policy network, an online evaluation network and a target evaluation network; the neural network parameters of the online policy network are denoted θ_μ, those of the target policy network θ_μ', those of the online evaluation network θ_Q and those of the target evaluation network θ_Q';
The expression of the optimal strategy π(s_y) is as follows:

a_y = π(s_y)

wherein s_y is the input state value and a_y is the action value output by the optimal strategy π(s_y) for that state; s_y = (I_y, F_y, L_y, T_jy), where I_y is the current level in any state of the state set S, and F_y, L_y and T_jy are the solder layer stress, stray inductance and chip junction temperature corresponding to the initial module dimensions and the current level I_y; a_y = (d_1y, d_2y, d_3y, h_1y, h_2y), where (d_1y, d_2y, d_3y, h_1y, h_2y) is the optimal IGBT module size output by the optimal strategy in the state with current level I_y, corresponding to the lowest solder layer stress, lowest stray inductance and lowest chip junction temperature.
Step 4, substituting the optimal strategy π(s_y) into the neural-network-based three-objective optimization model established in step 1; for any state in the state set S, adopting the optimal strategy π(s_y) maximizes the average reward r̄.
In this embodiment, the specific implementation process of step 3, in which offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y), is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
step 3.2, the system is arranged inState s at time t t Inputting the online policy network to obtain the output mu of the online policy network t (s tμt ) And adding noise delta t Action a of obtaining final output t The specific expression is as follows:
step 3.3, the system is based on the state s t Executing action a t
Loading the three-objective optimization model into a machine learning algorithm, and recording the three-objective optimization model as an environment model; will I t ,d 1t ,d 2t ,d 3t ,h 1t ,h 2t As input variables of the environmental model, output variables are obtained and denoted as F t+1 ,L t+1 ,T jt+1 The method comprises the steps of carrying out a first treatment on the surface of the Build I t For normal distribution function of mean value, giving standard deviation, randomly sampling to obtain I t+1
Transition to a new state s t+1 =(I t+1 ,F t+1 ,L t+1 ,T jt+1 ) t+1 At the same time get the execution action a t The single step prize value r t Will(s) t ,a t ,r t ,s t+1 ) Called a state transition sequence, and stored in the experience playback pool P, the system enters a state s of t+1 at the next time t+1
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
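Steps 3.2 and 3.3 can be sketched as below, reusing SurrogateNet and Actor from the earlier sketches. The exploration-noise scale noise_std and the standard deviation current_std of the normal distribution around I_t are assumptions; the embodiment gives neither value.

```python
import collections
import random
import torch

replay = collections.deque(maxlen=10000)   # experience replay pool P, capacity D = 10000

def select_action(actor, s_t, noise_std=0.1):
    """Step 3.2: a_t = mu_t(s_t | theta_mu_t) + delta_t (Gaussian noise assumed)."""
    with torch.no_grad():
        return actor(s_t) + noise_std * torch.randn(ACTION_DIM)

def env_step(surrogate, s_t, a_t, current_std=5.0):
    """Step 3.3: the trained surrogate acts as the environment model, mapping
    (I_t, d1t..h2t) to (F, L, Tj); I_{t+1} is sampled around the mean I_t."""
    x = torch.cat([s_t[:1], a_t])           # (I_t, d1t, d2t, d3t, h1t, h2t)
    with torch.no_grad():
        F, L, Tj = surrogate(x)
    I_next = s_t[0] + current_std * torch.randn(())
    return torch.stack([I_next, F, L, Tj])  # s_{t+1} = (I_{t+1}, F_{t+1}, L_{t+1}, Tj_{t+1})

def sample_batch(n=32):
    """Step 3.4: draw a minibatch of n state transition sequences from P."""
    return random.sample(list(replay), n)
```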
step 3.4, randomly extracting n state transition sequences from the experience playback pool P, wherein n is less than D, taking the n state transition sequences as small batch data for training an online strategy network and an online evaluation network, and recording the kth state transition sequence in the small batch data as(s) k ,a k ,r k ,s k+1 ) N is a small batch sampling factor, k=1, 2,;
step 3.5, based on the small batch data(s) obtained in step 3.4 k ,a k ,r k ,s k+1 ),k=1,2, 3..n, calculated as jackpot y k And error function L (θ) Q ) The specific expression is as follows:
in which Q (s k+1 ,u (s k+1μ’Q’ ) Scoring value output for target evaluation network, wherein u (s k+1μ’ )|θ Q’ Action value s output for target strategy network k+1 The state values input for the target evaluation network and the target strategy network; q(s) k ,a kQ ) For on-line evaluation of the scoring value output by the network s k And a k The method comprises the steps of evaluating a state value and an action value input by a network on line;
step 3.6, on-line evaluation network is performed by minimizing the error function L (θ Q ) To update theta Q The online strategy network passes through a deterministic strategy gradient V θμ J update θ μ The target evaluation network and the target policy network update theta by a moving average method Q’ And theta μ’ The specific expression is as follows:
wherein, V is a partial guide symbol, wherein V θμ J represents policy J vs. theta μ The deviation is calculated and guided, and the deviation is calculated,input representing online evaluation network is s=s k ,a=μ(s k ) When in use, the scoring value output by the network is evaluated onlineDeviation of the action value a is determined, +.>Input representing online policy network is s=s k On-line policy at the timeAction value of network output ∈>For theta μ Deviation-inducing and->Representing an error function L (θ) Q ) For theta Q Obtaining a deflection guide;
α Q to evaluate the learning rate of a network on line, alpha μ Learning rate of online strategy network, tau is a running average update parameter, and 0 < alpha Q <1,0<α μ <1,0<τ<1,Neural network parameters for an online evaluation network after updating, +.>For the neural network parameters of the online policy network after updating, +.>To update the neural network parameters of the target evaluation network after,neural network parameters of the target strategy network after updating;
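Steps 3.5 and 3.6 together form one gradient update; a PyTorch sketch follows, using the Actor/Critic instances from the earlier sketch. The learning rate 0.001 matches α_Q = α_μ in this embodiment; the value of τ and the choice of the Adam optimizer are assumptions (the update rule above is plain gradient descent/ascent with a moving-average target update).

```python
import torch
import torch.nn.functional as Fnn   # aliased to avoid clashing with the stress symbol F

GAMMA, TAU, LR = 0.9, 0.005, 1e-3   # gamma = 0.9 from step 2; TAU is an assumption
critic_opt = torch.optim.Adam(critic.parameters(), lr=LR)
actor_opt = torch.optim.Adam(actor.parameters(), lr=LR)

def ddpg_update(batch):
    """One update over a minibatch (s_k, a_k, r_k, s_{k+1}), k = 1..n."""
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    # Cumulative reward y_k = r_k + gamma * Q'(s_{k+1}, mu'(s_{k+1}))
    with torch.no_grad():
        y = r.unsqueeze(-1) + GAMMA * critic_tgt(s2, actor_tgt(s2))
    # Online evaluation network: minimize L(theta_Q) = mean (y_k - Q(s_k, a_k))^2
    critic_loss = Fnn.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Online policy network: ascend the deterministic policy gradient,
    # i.e. maximize mean Q(s_k, mu(s_k)) by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Target networks: moving-average (soft) update with parameter tau.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```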
step 3.7, giving a step size, a maximum step size max Training round number M and maximum training round number M, step=1, 2,..step max M=1, 2, M, when the steps 3.4 to 3.6 are completed once, the training process of one step is completed, the steps 3.4 to 3.6 are repeatedly executed, and when step is finished max When the training process of each step length is completed, the training process of one round is completed; starting the training process of the next round from the step 3.2 to the step 3.6, repeatedly executing the steps 3.2-3.6, and ending the learning process of the DDPG algorithm when the training processes of M rounds are completed;
on-line policy network, target policy network, on-line evaluation network, and neural network parameter θ of target evaluation network μ 、θ μ’ 、θ Q 、θ Q’ Toward maximizationIs updated in the direction of the (c) to finally obtain the optimal strategy pi (s y )。
In this embodiment, D is taken as 10000, M as 300, α_Q = α_μ = 0.001, and the minibatch sampling factor n as 32.
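Assembled into a training loop, steps 3.2 to 3.7 with this embodiment's hyperparameters (D = 10000, M = 300, n = 32) might look as follows. step_max and the initial-state sampler sample_initial_state are hypothetical placeholders; select_action, env_step, single_step_reward, ddpg_update and sample_batch come from the earlier sketches.

```python
import torch

STEP_MAX, M_ROUNDS, BATCH = 200, 300, 32    # STEP_MAX is an assumption

for m in range(M_ROUNDS):
    s = sample_initial_state()              # hypothetical: draw s_1 = (I_1, F_1, L_1, Tj_1)
    for step in range(STEP_MAX):
        a = select_action(actor, s)                  # step 3.2
        s_next = env_step(surrogate, s, a)           # step 3.3
        F, L, Tj = s_next[1:].tolist()
        r = single_step_reward(F, L, Tj, action_in_range=True)
        replay.append((s, a, torch.tensor(r), s_next))
        if len(replay) == replay.maxlen:             # N == D: pool is full
            ddpg_update(sample_batch(BATCH))         # steps 3.4-3.6
        s = s_next
```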
FIG. 5 is a convergence chart of the average reward in the embodiment of the present invention; the abscissa of FIG. 5 is the training round number m and the ordinate is the average reward r̄. As can be seen from FIG. 5, as the number of training rounds m increases, the average reward r̄ first oscillates up and down, then gradually increases and finally remains within a stable band; when m = 300 the training effect is optimal, the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network have been updated, and the optimal strategy π(s_y) is obtained.

Claims (4)

1. A multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC substrate, a solder layer and a bonding wire; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, wherein the thicknesses of the upper copper layer and the lower copper layer are the same; the method is characterized by comprising the following steps of:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F, the stray inductance L and the stray inductance L of the solder layer of the system are used forChip junction temperature T j Establishing a three-target optimization model based on a neural network as a target;
the input variables of the neural network are 6, and the neural network is divided into two groups, wherein the first group is a current level I, and the second group is an IGBT module size, and the neural network comprises: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
Step 2, determining a state set S and an action set A according to the three-objective optimization model obtained in the step 1 0 And a reward function R, and calculates an average reward
Step 3, according to the state set S and the action set A obtained in the step 2 0 And a reward function R, performing offline learning by using a DDPG algorithm of machine learning to obtain an optimal strategy pi(s) y );
The DDPG algorithm comprises 4 neural networks, namely an online strategy network, a target strategy network, an online evaluation network and a target evaluation network, wherein the neural network parameters of the online strategy network are recorded as theta μ The neural network parameters of the target policy network are noted as θ μ’ The neural network parameter of the on-line evaluation network is marked as theta Q The neural network parameters of the target evaluation network are marked as theta Q’
The optimal strategy pi (s y ) The expression of (2) is as follows:
wherein s is y For the entered state value, a y For the state passing through the optimal strategy pi (s y ) Output action value s y =(I y ,F y ,L y ,T jy ) y Wherein I y For current level in any one of the set of states S, F y 、L y 、T jy Respectively the initial module size and the current level I y Corresponding stray inductance, chip junction temperature and solder layer stress; a, a y =(d 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y Wherein (d) 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y To be at current level I y In a state, outputting the optimal IGBT module size corresponding to the lowest stray inductance, the lowest chip junction temperature and the lowest solder layer stress after the optimal strategy;
step 4, the optimal strategy pi (s y ) Substituting the three-objective optimization model based on the neural network established in the step 1, and adopting an optimal strategy pi (S) by the system under any state in the state set S y ) All can realize average rewardsMaximization.
2. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein the implementation process of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquisition and construction using simulation softwareSample data required by the neural network is established, a sample data set is established, and E groups of sample data are included in the sample data set, wherein each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data and are respectively recorded as input T and simulation output y, and T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network A consists of an input layer, an output layer and an hidden layer, the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, marking as F1 group, inputting into the neural network A, obtaining outputs corresponding to the input data, and marking as solder layer stress network output values F respectively F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T jF2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
3. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein the implementation process of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
4. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein in step 3 offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y), and the specific implementation process is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0 in claim 1, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S in claim 1, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
step 3.2, state s of the system at time t t Inputting the online policy network to obtain the output mu of the online policy network t (s tμt ) And adding noise delta t Action a of obtaining final output t The specific expression is as follows:
step 3.3, the system is based on the state s t Executing action a t
Loading the three-objective optimization model into a machine learning algorithm, and recording the three-objective optimization model as an environment model; will I t ,d 1t ,d 2t ,d 3t ,h 1t ,h 2t As input variables of the environmental model, output variables are obtained and denoted as F t+1 ,L t+1 ,T jt+1 The method comprises the steps of carrying out a first treatment on the surface of the Build I t For normal distribution function of mean value, giving standard deviation, randomly sampling to obtain I t+1
Transition to a new state s t+1 =(I t+1 ,F t+1 ,L t+1 ,T jt+1 ) t+1 At the same time get the execution action a t The single step prize value r t Will(s) t ,a t ,r t ,s t+1 ) Called a state transition sequence, and stored in the experience playback pool P, the system enters a state s of t+1 at the next time t+1
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
step 3.4, randomly extracting n state transition sequences from the experience playback pool P, wherein n is less than D, taking the n state transition sequences as small batch data for training an online strategy network and an online evaluation network, and recording the kth state transition sequence in the small batch data as(s) k ,a k ,r k ,s k+1 ) N is a small batch sampling factor, k=1, 2,;
step 3.5, based on the small batch data(s) obtained in step 3.4 k ,a k ,r k ,s k+1 ) K=1, 2,3,..n, n, calculated as the jackpot y k And error function L (θ) Q ) The specific expression is as follows:
in which Q (s k+1 ,u (s k+1μ’Q’ ) Scoring value output for target evaluation network, wherein u (s k+1μ’ )|θ Q’ Action value s output for target strategy network k+1 The state values input for the target evaluation network and the target strategy network; q(s) k ,a kQ ) For on-line evaluation of the scoring value output by the network s k And a k The method comprises the steps of evaluating a state value and an action value input by a network on line;
step 3.6, on-line evaluation network is performed by minimizing the error function L (θ Q ) To update theta Q The online strategy network passes through a deterministic strategy gradient V θμ J update θ μ Target evaluation network and targetPolicy network updating θ by a moving average method Q’ And theta μ’ The specific expression is as follows:
wherein, V is a partial guide symbol, wherein V θμ J represents policy J vs. theta μ The deviation is calculated and guided, and the deviation is calculated,input representing online evaluation network is s=s k ,a=μ(s k ) When in use, the scoring value output by the network is evaluated onlineDeviation of the action value a is determined, +.>Input representing online policy network is s=s k When the online policy network outputs action value +.>For theta μ Deviation-inducing and->Representing an error function L (θ) Q ) For theta Q Obtaining a deflection guide;
$\alpha_Q$ is the learning rate of the online evaluation network, $\alpha_\mu$ is the learning rate of the online policy network, and $\tau$ is the moving-average update parameter, with $0 < \alpha_Q < 1$, $0 < \alpha_\mu < 1$, and $0 < \tau < 1$; $\theta^{Q*}$ is the neural network parameter of the online evaluation network after updating, $\theta^{\mu*}$ is the neural network parameter of the online policy network after updating, $\theta^{Q'*}$ is the neural network parameter of the target evaluation network after updating, and $\theta^{\mu'*}$ is the neural network parameter of the target policy network after updating.
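The three updates of step 3.6 might be realized as below; expressing the policy update as minimizing $-Q(s_k, \mu(s_k))$ lets autograd apply the chain-rule product $\nabla_a Q \cdot \nabla_{\theta^{\mu}} \mu$ of the deterministic policy gradient, and the learning rates and $\tau$ are illustrative values:

```python
import torch

def update_networks(actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch

    # Online evaluation network: update theta_Q by minimizing L(theta_Q).
    loss_q = critic_loss(critic, target_critic, target_actor,
                         s, a, r, s_next, gamma)
    critic_opt.zero_grad()
    loss_q.backward()
    critic_opt.step()

    # Online policy network: update theta_mu along the deterministic policy
    # gradient by descending -Q(s, mu(s)).
    loss_pi = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    loss_pi.backward()
    actor_opt.step()

    # Target networks: moving-average update theta' <- tau*theta + (1-tau)*theta'.
    with torch.no_grad():
        for p, tp in zip(critic.parameters(), target_critic.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
        for p, tp in zip(actor.parameters(), target_actor.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
```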
step 3.7, specify a step counter $step$ with maximum value $step_{\max}$ and a training-round counter $m$ with maximum value $M$, where $step = 1, 2, \ldots, step_{\max}$ and $m = 1, 2, \ldots, M$. Each execution of steps 3.4 to 3.6 completes the training process of one step; steps 3.4 to 3.6 are repeated, and once $step_{\max}$ steps have been completed, the training process of one round is complete. The training process of the next round then starts again from steps 3.2 to 3.6; steps 3.2 to 3.6 are repeated, and once $M$ rounds of training have been completed, the learning process of the DDPG algorithm ends.
Throughout this process, the neural network parameters $\theta^{\mu}$, $\theta^{\mu'}$, $\theta^{Q}$, and $\theta^{Q'}$ of the online policy network, target policy network, online evaluation network, and target evaluation network are updated in the direction that maximizes the cumulative reward, finally yielding the optimal policy $\pi(s_y)$.
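Putting the sketches together, the step/round structure of step 3.7 corresponds to a training skeleton like the following; `reset_state` and all hyperparameter values are placeholders:

```python
import numpy as np
import torch

def train(actor, critic, target_actor, target_critic,
          actor_opt, critic_opt, surrogate_model, reward_fn, reset_state,
          n=64, step_max=200, M=500):
    for m in range(M):                            # training rounds m = 1..M
        s = reset_state()                         # initial state (I, F, L, Tj)
        for step in range(step_max):              # steps 1..step_max per round
            s_t = torch.tensor(s, dtype=torch.float32)
            a = select_action(actor, s_t).numpy()                 # step 3.2
            s, _ = environment_step(tuple(s), tuple(a),
                                    surrogate_model, reward_fn)   # step 3.3
            if len(replay_pool) >= D:             # pool filled: steps 3.4-3.6
                s_b, a_b, r_b, sn_b = sample_minibatch(replay_pool, n)
                batch = tuple(
                    torch.tensor(np.asarray(x), dtype=torch.float32)
                    for x in (s_b, a_b, r_b, sn_b)
                )
                update_networks(actor, critic, target_actor, target_critic,
                                actor_opt, critic_opt, batch)
```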
CN202311617795.6A 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning Active CN117313560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311617795.6A CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311617795.6A CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Publications (2)

Publication Number Publication Date
CN117313560A true CN117313560A (en) 2023-12-29
CN117313560B (en) 2024-02-09

Family

ID=89281586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311617795.6A Active CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Country Status (1)

Country Link
CN (1) CN117313560B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210117770A1 (en) * 2019-10-18 2021-04-22 Wuhan University Power electronic circuit troubleshoot method based on beetle antennae optimized deep belief network algorithm
CN114172403A (en) * 2021-12-07 2022-03-11 合肥工业大学 Inverter efficiency optimization method based on deep reinforcement learning
DE102022108379A1 (en) * 2022-04-07 2023-10-12 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for the optimized construction and/or design of a technical component
CN115021325A (en) * 2022-06-22 2022-09-06 合肥工业大学 Photovoltaic inverter multi-objective optimization method based on DDPG algorithm
CN115765396A (en) * 2022-11-23 2023-03-07 天地(常州)自动化股份有限公司 Coordination optimization method for IGBT spike voltage suppression
CN117057229A (en) * 2023-08-10 2023-11-14 合肥工业大学 Multi-objective optimization method based on deep reinforcement learning power module
CN117057228A (en) * 2023-08-10 2023-11-14 合肥工业大学 Inverter multi-objective optimization method based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANING WANG et al.: "Co-Reduction of Common Mode Noise and Loop Current of Three-Level Active Neutral Point Clamped Inverters", IEEE *
WANG Cunle et al.: "Review of Research on Thermal Network Parameter Extraction for Power IGBT Modules", Electrotechnics Electric (电工电气) *
LUO Xu; WANG Xuemei; WU Haiping: "Selection of IGBT and Switching Frequency for Electric Vehicle Converters Based on Multi-Objective Optimization", Transactions of China Electrotechnical Society, no. 10 *

Also Published As

Publication number Publication date
CN117313560B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN112149316B (en) Aero-engine residual life prediction method based on improved CNN model
CN110175386B (en) Method for predicting temperature of electrical equipment of transformer substation
CN110084221B (en) Serialized human face key point detection method with relay supervision based on deep learning
WO2021109644A1 (en) Hybrid vehicle working condition prediction method based on meta-learning
CN103970965A (en) Test run method for accelerated life test of gas turbine engine
CN109754122A (en) A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction
CN110245390B (en) Automobile engine oil consumption prediction method based on RS-BP neural network
CN113313306A (en) Elastic neural network load prediction method based on improved wolf optimization algorithm
CN117236278B (en) Chip production simulation method and system based on digital twin technology
CN110991737A (en) Ultra-short-term wind power prediction method based on deep belief network
CN111832839B (en) Energy consumption prediction method based on sufficient incremental learning
CN114021483A (en) Ultra-short-term wind power prediction method based on time domain characteristics and XGboost
CN117057229A (en) Multi-objective optimization method based on deep reinforcement learning power module
CN116484495A (en) Pneumatic data fusion modeling method based on test design
CN112947080B (en) Scene parameter transformation-based intelligent decision model performance evaluation system
CN117313560B (en) Multi-objective optimization method for IGBT module packaging based on machine learning
CN112731098B (en) Radio frequency low-noise discharge circuit fault diagnosis method, system, medium and application
CN110276478B (en) Short-term wind power prediction method based on segmented ant colony algorithm optimization SVM
CN111553400A (en) Accurate diagnosis method for vibration fault of wind generating set
CN116488151A (en) Short-term wind power prediction method based on condition generation countermeasure network
CN113111588B (en) NO of gas turbine X Emission concentration prediction method and device
CN114091392A (en) Boolean satisfiability judgment method based on linear programming
CN114202106A (en) Air conditioning system load prediction method based on deep learning
CN109992587B (en) Blast furnace molten iron silicon content prediction key attribute judgment method based on big data
Guo et al. Research on Improved GA-BP Pattern Recognition Algorithm for Quality Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant