CN117313560A - Multi-objective optimization method for IGBT module packaging based on machine learning - Google Patents

Multi-objective optimization method for IGBT module packaging based on machine learning

Info

Publication number
CN117313560A
CN117313560A
Authority
CN
China
Prior art keywords
network
neural network
state
output
online
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311617795.6A
Other languages
Chinese (zh)
Other versions
CN117313560B (en)
Inventor
王佳宁 (Wang Jianing)
孙菲双 (Sun Feishuang)
王睿源 (Wang Ruiyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311617795.6A priority Critical patent/CN117313560B/en
Publication of CN117313560A publication Critical patent/CN117313560A/en
Application granted granted Critical
Publication of CN117313560B publication Critical patent/CN117313560B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/23 Design optimisation, verification or simulation using finite element methods [FEM] or finite difference methods [FDM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/06 Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/14 Force analysis or force optimisation, e.g. static or dynamic forces

Abstract

The invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, and belongs to the technical field of power electronics. The method comprises the steps of establishing a three-objective optimization model using a neural network; determining a state set, an action set and a reward function; performing offline learning with the DDPG algorithm of machine learning to obtain an optimal strategy; and substituting the optimal strategy into the three-objective optimization model for application. The optimization method enables the IGBT module to optimize solder layer stress, stray inductance and chip junction temperature under any state and any weight coefficients. The invention can handle the high-dimensional design variables of complex IGBT modules, avoids the limitation that traditional manual design cannot optimize them, and, by adopting the DDPG algorithm, avoids the time-consuming re-optimization that a traditional genetic algorithm requires whenever the design requirements change, thereby greatly saving computing resources and improving the design efficiency of the IGBT module.

Description

Multi-objective optimization method for IGBT module packaging based on machine learning
Technical Field
The invention belongs to the technical field of power electronics, and particularly relates to a multi-objective optimization method for IGBT module packaging.
Background
The IGBT module is one of the main failure-prone components in a photovoltaic inverter, and its failure directly affects the safe operation of the system. Within the IGBT module, large stray inductance causes voltage overshoot and oscillation, reducing reliability. Meanwhile, the IGBT module in a photovoltaic inverter is exposed year-round to varying illumination intensity and ambient temperature, so the junction temperature inside the device fluctuates readily. Because the thermal expansion coefficients of the internal materials are mismatched, junction temperature fluctuations cause extrusion deformation between the device layers, producing thermal stress. Such cyclic thermal stress accelerates the life consumption of the device and ultimately leads to its thermal fatigue failure.
Therefore, optimizing the IGBT module packaging for low stray inductance, low junction temperature and low stress has great engineering application value and research significance. The main drawback of the traditional manual packaging design method is that the mathematical model of the power module and the optimization process require repeated manual iteration, so the overall design cycle is long and the time cost is high. Many experts and scholars have proposed different solutions to this:
The master's thesis "Electro-thermal-force multi-physics simulation design study of multi-chip parallel SiC MOSFET modules" (master's dissertation, Tianjin University, 2022, DOI: 10.27356/d.cnki.gtjdu.2020.002346) adopts a response-surface optimization method for structural optimization based on the thermal performance of the power module. However, this solution has the following drawbacks:
1) Owing to the limitations of the finite element software, the optimization targets cover only thermal and mechanical performance indexes, without considering the parasitic stray inductance;
2) The current level in this method is a fixed value; when the system design requirements change, iterative optimization must be carried out again, which is time-consuming;
The paper "Automatic layout design for power module" (IEEE Transactions on Power Electronics, 2013, 28(1): 481-487) proposes automatically designing the module layout with a genetic algorithm: layout position information, chip placement information and so on are encoded as genetic characters, and the layout design is iterated with the genetic algorithm. However, this solution has the following drawbacks:
(1) If the number of parallel chips in the module is large, the oversized population can cause premature convergence of the genetic algorithm and reduce the accuracy of the layout result;
(2) The method lacks accurate mathematical models for the parasitic inductance, temperature, etc. of the module, and factors such as mutual inductance and thermal coupling are often ignored in the design process.
Disclosure of Invention
The technical problem to be solved by the invention is that existing multi-objective optimization methods for IGBT modules cannot provide an optimal solution under varying states and suffer from long computation times. To address these defects, the invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, which uses the deep deterministic policy gradient (DDPG) algorithm of machine learning to provide, with fast response, the optimal solution under any state, avoiding the limitation of metaheuristic algorithms, which must repeat their iterative optimization whenever conditions change.
In order to achieve the aim of the invention, the invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC substrate, a solder layer and a bonding wire; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, wherein the thicknesses of the upper copper layer and the lower copper layer are the same; the method comprises the following steps:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F of a solder layer, the stray inductance L and the junction temperature T of a chip of the system are used j Establishing a three-target optimization model based on a neural network as a target;
the input variables of the neural network are 6, and the neural network is divided into two groups, wherein the first group is a current level I, and the second group is an IGBT module size, and the neural network comprises: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
Step 2, determining a state set S and an action set A according to the three-objective optimization model obtained in the step 1 0 And a reward function R, and calculates an average reward
Step 3, according to the state set S and the action set A obtained in the step 2 0 And a reward function R, performing offline learning by using a DDPG algorithm of machine learning to obtain an optimal strategy pi(s) y );
The DDPG algorithm comprises 4 neural networks, namely an online strategy network, a target strategy network, an online evaluation network and a target evaluation network, wherein the neural network parameters of the online strategy network are recorded as theta μ The neural network parameters of the target policy network are noted as θ μ’ The neural network parameter of the on-line evaluation network is marked as theta Q The neural network parameters of the target evaluation network are marked as theta Q’
The optimal strategy pi (s y ) The expression of (2) is as follows:
wherein s is y For the state of inputValue of a y For the state passing through the optimal strategy pi (s y ) Output action value s y =(I y ,F y ,L y ,T jy ) y Wherein I y For current level in any one of the set of states S, F y 、L y 、T jy Respectively the initial module size and the current level I y Corresponding stray inductance, chip junction temperature and solder layer stress; a, a y =(d 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y Wherein (d) 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y To be at current level I y In a state, outputting the optimal IGBT module size corresponding to the lowest stray inductance, the lowest chip junction temperature and the lowest solder layer stress after the optimal strategy;
step 4, the optimal strategy pi (s y ) Substituting the three-objective optimization model based on the neural network established in the step 1, and collecting the system in a stateAdopts an optimal strategy pi(s) y ) All can realize average rewards->Maximization.
Preferably, the implementation process of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquiring sample data required for constructing a neural network by using simulation software, and establishing a sample data set, wherein the sample data set comprises E groups of sample data, each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data, and the three groups of data are respectively recorded as input T and simulation output gamma, T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network 1 consists of an input layer, an output layer and an hidden layer, wherein the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, inputting the input data into the neural network A to obtain outputs corresponding to the input data, and respectively recording the outputs as solder layer stress network output values F F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T Jf2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
Preferably, the implementation process of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
Preferably, in step 3, offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y); the specific implementation process is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
Step 3.2, inputting the state s_t of the system at time t into the online policy network to obtain the online policy network's output μ_t(s_t|θ_μt), and adding noise δ_t to obtain the finally output action a_t; specifically:

a_t = μ_t(s_t|θ_μt) + δ_t

Step 3.3, the system executes action a_t from state s_t:
The three-objective optimization model is loaded into the machine learning algorithm and denoted the environment model; I_t, d_1t, d_2t, d_3t, h_1t, h_2t are taken as the input variables of the environment model, and the output variables are obtained and denoted F_{t+1}, L_{t+1}, T_j(t+1); a normal distribution function with I_t as the mean is built, a standard deviation is given, and I_{t+1} is obtained by random sampling;
The system transitions to the new state s_{t+1} = (I_{t+1}, F_{t+1}, L_{t+1}, T_j(t+1)) and simultaneously obtains the single-step reward value r_t for executing action a_t; (s_t, a_t, r_t, s_{t+1}) is called a state transition sequence and is stored in the experience replay pool P; at the next moment the system enters state s_{t+1};
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
Step 3.4, randomly extracting n state transition sequences from the experience replay pool P, where n < D, as the minibatch data for training the online policy network and the online evaluation network; the k-th state transition sequence in the minibatch is denoted (s_k, a_k, r_k, s_{k+1}), where n is the minibatch sampling factor and k = 1, 2, ..., n;
Step 3.5, based on the minibatch data (s_k, a_k, r_k, s_{k+1}), k = 1, 2, 3, ..., n, obtained in step 3.4, calculating the cumulative reward y_k and the error function L(θ_Q); specifically:

y_k = r_k + γ · Q'(s_{k+1}, μ'(s_{k+1}|θ_μ') | θ_Q')

L(θ_Q) = (1/n) · Σ_{k=1}^{n} [ y_k − Q(s_k, a_k|θ_Q) ]²

in which Q'(s_{k+1}, μ'(s_{k+1}|θ_μ') | θ_Q') is the scoring value output by the target evaluation network, μ'(s_{k+1}|θ_μ') is the action value output by the target policy network and s_{k+1} is the state value input to the target evaluation network and the target policy network; Q(s_k, a_k|θ_Q) is the scoring value output by the online evaluation network, and s_k and a_k are the state value and action value input to the online evaluation network;
Step 3.6, the online evaluation network updates θ_Q by minimizing the error function L(θ_Q), the online policy network updates θ_μ through the deterministic policy gradient ∇_θμ J, and the target evaluation network and the target policy network update θ_Q' and θ_μ' by the moving-average method; specifically:

∇_θμ J = (1/n) · Σ_{k=1}^{n} [ ∇_a Q(s, a|θ_Q)|_{s=s_k, a=μ(s_k)} · ∇_θμ μ(s|θ_μ)|_{s=s_k} ]

θ_Q* = θ_Q − α_Q · ∇_θQ L(θ_Q)

θ_μ* = θ_μ + α_μ · ∇_θμ J

θ_Q'* = τ · θ_Q* + (1 − τ) · θ_Q'

θ_μ'* = τ · θ_μ* + (1 − τ) · θ_μ'

wherein ∇ is the partial derivative symbol; ∇_θμ J denotes the partial derivative of the policy objective J with respect to θ_μ; ∇_a Q(s, a|θ_Q)|_{s=s_k, a=μ(s_k)} denotes the partial derivative, with respect to the action value a, of the scoring value output by the online evaluation network when its input is s = s_k, a = μ(s_k); ∇_θμ μ(s|θ_μ)|_{s=s_k} denotes the partial derivative, with respect to θ_μ, of the action value output by the online policy network when its input is s = s_k; ∇_θQ L(θ_Q) denotes the partial derivative of the error function L(θ_Q) with respect to θ_Q;
α_Q is the learning rate of the online evaluation network, α_μ is the learning rate of the online policy network and τ is the moving-average update parameter, with 0 < α_Q < 1, 0 < α_μ < 1 and 0 < τ < 1; θ_Q* is the updated neural network parameter of the online evaluation network, θ_μ* that of the online policy network, θ_Q'* that of the target evaluation network and θ_μ'* that of the target policy network;
Step 3.7, given a step counter step, a maximum number of steps step_max, a training round counter m and a maximum number of training rounds M, with step = 1, 2, ..., step_max and m = 1, 2, ..., M: each completion of steps 3.4 to 3.6 completes the training process of one step; steps 3.4 to 3.6 are executed repeatedly, and when step_max steps have been completed, the training process of one round is complete; the training process of the next round starts again from step 3.2 through step 3.6; steps 3.2-3.6 are executed repeatedly, and when M rounds of training have been completed, the learning process of the DDPG algorithm ends;
The neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network are updated in the direction that maximizes the average reward r̄, finally yielding the optimal strategy π(s_y).
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention uses a neural network to construct the nonlinear mapping between the module dimensions and the optimization targets. Exploiting the ANN's ability to learn the distributed features of data, the complex finite element simulation is converted into the matrix expressions of the neural network, so the optimization targets of the IGBT module can be obtained quickly by inputting the corresponding module dimensions. This avoids the time-consuming multi-physics finite element simulation and the difficulty of software interaction, greatly reducing the computing resources required;
(2) The optimal strategy π(s_y) provided by the invention directly yields the optimal design variable values under the design requirements of different IGBT module current levels so as to maximize efficiency, without repeating a complex and time-consuming optimization-seeking process; it is simple, convenient and fast, and saves computing resources.
Drawings
Fig. 1 is a three-objective optimization model structure based on a neural network, which is built in an embodiment of the present invention.
FIG. 2 is a convergence chart of the neural network's training iterations versus error according to the present invention.
FIG. 3 is a block diagram of a multi-objective optimization method of the present invention.
FIG. 4 is a flow chart of the multi-objective optimization method of the present invention.
FIG. 5 is a chart showing the convergence effect of average rewards in an embodiment of the invention.
Detailed Description
The present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a structural diagram of the neural network in the neural-network-based three-objective optimization model established in an embodiment of the present invention. As can be seen, the input layer contains 6 neurons, the output layer contains 3 neurons and the hidden layer contains 11 neurons; given a group of inputs (I, d_1, d_2, d_3, h_1, h_2), a group of outputs (F, L, T_j) is obtained. In addition, Fig. 1 shows the connections of the 11 hidden-layer neurons to the 6 input-layer neurons and the 3 output-layer neurons respectively.
Fig. 2 is the convergence graph of the root mean square error during neural network training in this example. After the data set and the structure of the neural network are determined, the neural network is trained with the training commands. After about 10000 iterations the root mean square error is significantly reduced, and the neural network training is complete.
Fig. 3 is a block diagram of the IGBT module multi-objective optimization method of the present invention, and fig. 4 is a flowchart of the IGBT module multi-objective optimization method of the present invention. As can be seen from fig. 3 and 4, the IGBT module multi-objective optimization method optimizes the IGBT module stress, stray inductance, and junction temperature based on machine learning.
The invention provides a multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC (direct bonded copper) ceramic substrate, a solder layer and bonding wires; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, and the upper and lower copper layers have the same thickness. Specifically, the method comprises the following steps:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F of a solder layer, the stray inductance L and the junction temperature T of a chip of the system are used j Establishing a three-target optimization model based on a neural network as a target;
the nerveThe input variable of network is 6, divides into two groups, and first group is electric current class I, and the second group is IGBT module size, includes: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
In this embodiment, the implementation procedure of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquiring sample data required for constructing a neural network by using simulation software, and establishing a sample data set, wherein the sample data set comprises E groups of sample data, each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data, and the three groups of data are respectively recorded as input T and simulation output gamma, T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network 1 consists of an input layer, an output layer and an hidden layer, wherein the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, inputting the input data into the neural network A to obtain outputs corresponding to the input data, and respectively recording the outputs as solder layer stress network output values F F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T Jf2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
Specifically, in this example the stray inductance of the IGBT module is first extracted with the Q3D finite element simulation software; the DC positive input side is set as the source and the DC negative input side as the sink, and the module dimensions in the half-bridge module, namely the lateral distance d_1 between chips of the same bridge arm, the lateral distance d_2 between the upper and lower bridge arm chips, the longitudinal distance d_3 between the upper and lower bridge arm chips, the copper layer thickness h_1 and the ceramic layer thickness h_2, are parameterized. d_1 ranges from 1 to 11 mm with a step of 2 mm; d_2 from 5 to 20 mm with a step of 3 mm; d_3 from 5 to 25 mm with a step of 5 mm; h_1 from 0.1 to 0.2 mm with a step of 0.05 mm; h_2 from 0.2 to 0.4 mm with a step of 0.1 mm. A total of 5^4×3 = 1875 groups of data are obtained, denoted data A.
In this example, electro-thermal-mechanical coupling simulation of the IGBT module is carried out with the COMSOL software to extract the chip junction temperature and the solder layer stress. The module dimensions and the current level in the half-bridge module are parameterized; the current level ranges from 230 to 250 A with a step of 4, and the value ranges and step lengths of the module dimensions are the same as above. A total of 5^5×3 = 9375 groups of data are obtained, denoted data B.
In this example, the input data of data A and data B are in an inclusion relationship and can be integrated into one set of data; after normalization and random shuffling, a sample data set comprising 5^5×3 = 9375 groups of sample data is obtained, i.e. E = 9375.
In this example, the sample data set is divided into a training subset and a verification subset. Following the classical 80%/20% division, 80% of the data is used as the training set and 20% as the test set; that is, E1 = 7500 and E2 = 1875.
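As an illustration of steps 1.3 to 1.7, the following is a minimal sketch of the 6-11-3 surrogate network in PyTorch. The layer sizes and the RMSE stopping rule follow the description above; the hidden activation, optimizer, learning rate and target error eps are assumptions, since the embodiment does not specify them.

```python
import torch
import torch.nn as nn

class SurrogateNet(nn.Module):
    """6-11-3 network: inputs (I, d1, d2, d3, h1, h2) -> outputs (F, L, Tj)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 11),
            nn.Tanh(),          # assumed hidden activation
            nn.Linear(11, 3),
        )

    def forward(self, x):
        return self.net(x)

def train_surrogate(x_train, y_train, x_val, y_val, eps=1e-3, max_iter=10000):
    """Back-propagation training (step 1.5) with the sigma < eps stop of step 1.7."""
    model = SurrogateNet()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate
    mse = nn.MSELoss()
    for _ in range(max_iter):
        opt.zero_grad()
        loss = mse(model(x_train), y_train)
        loss.backward()
        opt.step()
        with torch.no_grad():
            sigma = torch.sqrt(mse(model(x_val), y_val))  # validation RMSE
        if sigma < eps:
            break               # model construction complete
    return model
```

In practice the 9375 normalized samples above would be split 80/20 into (x_train, y_train) and (x_val, y_val) before calling train_surrogate.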
Step 2, determining the state set S, the action set A_0 and the reward function R according to the three-objective optimization model obtained in step 1, and calculating the average reward r̄;
In this embodiment, the implementation procedure of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
Specifically, ψ = 1000, η_1 = η_2 = η_3 = 1 and γ = 0.9 are taken.
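For illustration, the reward computation with these coefficients can be sketched in Python as below. The discounted sum matches the reward function R defined above; the exact published form of the single-step reward r_t is not reproduced here, so the negative weighted sum with a range-violation penalty is an assumption that follows the description only in structure (ψ as penalty coefficient, η_1 to η_3 as weights).

```python
def single_step_reward(F, L, Tj, action_in_range, psi=1000.0, eta=(1.0, 1.0, 1.0)):
    # Assumed structure: the reward falls as the three targets rise, and the
    # penalty coefficient psi is applied when the module size leaves its
    # allowed range. The embodiment's exact expression for r_t is not given.
    r = -(eta[0] * F + eta[1] * L + eta[2] * Tj)
    if not action_in_range:
        r -= psi
    return r

def discounted_return(rewards, gamma=0.9):
    # Reward function R = sum over t of gamma^(t-1) * r_t, t = 1..T.
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```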
Step 3, performing offline learning with the DDPG algorithm of machine learning according to the state set S, the action set A_0 and the reward function R obtained in step 2, to obtain the optimal strategy π(s_y);
The DDPG algorithm comprises 4 neural networks: an online policy network, a target policy network, an online evaluation network and a target evaluation network; the neural network parameters of the online policy network are denoted θ_μ, those of the target policy network θ_μ', those of the online evaluation network θ_Q and those of the target evaluation network θ_Q';
The expression of the optimal strategy π(s_y) is as follows:

a_y = π(s_y)

wherein s_y is the input state value and a_y is the action value output by the optimal strategy π(s_y) for that state; s_y = (I_y, F_y, L_y, T_jy), where I_y is the current level in any state of the state set S, and F_y, L_y and T_jy are the solder layer stress, stray inductance and chip junction temperature corresponding to the initial module dimensions and the current level I_y; a_y = (d_1y, d_2y, d_3y, h_1y, h_2y), where (d_1y, d_2y, d_3y, h_1y, h_2y) is the optimal IGBT module size output by the optimal strategy in the state with current level I_y, corresponding to the lowest solder layer stress, lowest stray inductance and lowest chip junction temperature.
Step 4, substituting the optimal strategy π(s_y) into the neural-network-based three-objective optimization model established in step 1; for any state in the state set S, adopting the optimal strategy π(s_y) maximizes the average reward r̄.
In this embodiment, the specific implementation process of step 3, in which offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y), is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
step 3.2, the system is arranged inState s at time t t Inputting the online policy network to obtain the output mu of the online policy network t (s tμt ) And adding noise delta t Action a of obtaining final output t The specific expression is as follows:
step 3.3, the system is based on the state s t Executing action a t
Loading the three-objective optimization model into a machine learning algorithm, and recording the three-objective optimization model as an environment model; will I t ,d 1t ,d 2t ,d 3t ,h 1t ,h 2t As input variables of the environmental model, output variables are obtained and denoted as F t+1 ,L t+1 ,T jt+1 The method comprises the steps of carrying out a first treatment on the surface of the Build I t For normal distribution function of mean value, giving standard deviation, randomly sampling to obtain I t+1
Transition to a new state s t+1 =(I t+1 ,F t+1 ,L t+1 ,T jt+1 ) t+1 At the same time get the execution action a t The single step prize value r t Will(s) t ,a t ,r t ,s t+1 ) Called a state transition sequence, and stored in the experience playback pool P, the system enters a state s of t+1 at the next time t+1
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
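Steps 3.2 and 3.3 can be sketched as below, reusing SurrogateNet and Actor from the earlier sketches. The exploration-noise scale noise_std and the standard deviation current_std of the normal distribution around I_t are assumptions; the embodiment gives neither value.

```python
import collections
import random
import torch

replay = collections.deque(maxlen=10000)   # experience replay pool P, capacity D = 10000

def select_action(actor, s_t, noise_std=0.1):
    """Step 3.2: a_t = mu_t(s_t | theta_mu_t) + delta_t (Gaussian noise assumed)."""
    with torch.no_grad():
        return actor(s_t) + noise_std * torch.randn(ACTION_DIM)

def env_step(surrogate, s_t, a_t, current_std=5.0):
    """Step 3.3: the trained surrogate acts as the environment model, mapping
    (I_t, d1t..h2t) to (F, L, Tj); I_{t+1} is sampled around the mean I_t."""
    x = torch.cat([s_t[:1], a_t])           # (I_t, d1t, d2t, d3t, h1t, h2t)
    with torch.no_grad():
        F, L, Tj = surrogate(x)
    I_next = s_t[0] + current_std * torch.randn(())
    return torch.stack([I_next, F, L, Tj])  # s_{t+1} = (I_{t+1}, F_{t+1}, L_{t+1}, Tj_{t+1})

def sample_batch(n=32):
    """Step 3.4: draw a minibatch of n state transition sequences from P."""
    return random.sample(list(replay), n)
```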
step 3.4, randomly extracting n state transition sequences from the experience playback pool P, wherein n is less than D, taking the n state transition sequences as small batch data for training an online strategy network and an online evaluation network, and recording the kth state transition sequence in the small batch data as(s) k ,a k ,r k ,s k+1 ) N is a small batch sampling factor, k=1, 2,;
step 3.5, based on the small batch data(s) obtained in step 3.4 k ,a k ,r k ,s k+1 ),k=1,2, 3..n, calculated as jackpot y k And error function L (θ) Q ) The specific expression is as follows:
in which Q (s k+1 ,u (s k+1μ’Q’ ) Scoring value output for target evaluation network, wherein u (s k+1μ’ )|θ Q’ Action value s output for target strategy network k+1 The state values input for the target evaluation network and the target strategy network; q(s) k ,a kQ ) For on-line evaluation of the scoring value output by the network s k And a k The method comprises the steps of evaluating a state value and an action value input by a network on line;
step 3.6, on-line evaluation network is performed by minimizing the error function L (θ Q ) To update theta Q The online strategy network passes through a deterministic strategy gradient V θμ J update θ μ The target evaluation network and the target policy network update theta by a moving average method Q’ And theta μ’ The specific expression is as follows:
wherein, V is a partial guide symbol, wherein V θμ J represents policy J vs. theta μ The deviation is calculated and guided, and the deviation is calculated,input representing online evaluation network is s=s k ,a=μ(s k ) When in use, the scoring value output by the network is evaluated onlineDeviation of the action value a is determined, +.>Input representing online policy network is s=s k On-line policy at the timeAction value of network output ∈>For theta μ Deviation-inducing and->Representing an error function L (θ) Q ) For theta Q Obtaining a deflection guide;
α Q to evaluate the learning rate of a network on line, alpha μ Learning rate of online strategy network, tau is a running average update parameter, and 0 < alpha Q <1,0<α μ <1,0<τ<1,Neural network parameters for an online evaluation network after updating, +.>For the neural network parameters of the online policy network after updating, +.>To update the neural network parameters of the target evaluation network after,neural network parameters of the target strategy network after updating;
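Steps 3.5 and 3.6 together form one gradient update; a PyTorch sketch follows, using the Actor/Critic instances from the earlier sketch. The learning rate 0.001 matches α_Q = α_μ in this embodiment; the value of τ and the choice of the Adam optimizer are assumptions (the update rule above is plain gradient descent/ascent with a moving-average target update).

```python
import torch
import torch.nn.functional as Fnn   # aliased to avoid clashing with the stress symbol F

GAMMA, TAU, LR = 0.9, 0.005, 1e-3   # gamma = 0.9 from step 2; TAU is an assumption
critic_opt = torch.optim.Adam(critic.parameters(), lr=LR)
actor_opt = torch.optim.Adam(actor.parameters(), lr=LR)

def ddpg_update(batch):
    """One update over a minibatch (s_k, a_k, r_k, s_{k+1}), k = 1..n."""
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    # Cumulative reward y_k = r_k + gamma * Q'(s_{k+1}, mu'(s_{k+1}))
    with torch.no_grad():
        y = r.unsqueeze(-1) + GAMMA * critic_tgt(s2, actor_tgt(s2))
    # Online evaluation network: minimize L(theta_Q) = mean (y_k - Q(s_k, a_k))^2
    critic_loss = Fnn.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Online policy network: ascend the deterministic policy gradient,
    # i.e. maximize mean Q(s_k, mu(s_k)) by minimizing its negative.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Target networks: moving-average (soft) update with parameter tau.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```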
step 3.7, giving a step size, a maximum step size max Training round number M and maximum training round number M, step=1, 2,..step max M=1, 2, M, when the steps 3.4 to 3.6 are completed once, the training process of one step is completed, the steps 3.4 to 3.6 are repeatedly executed, and when step is finished max When the training process of each step length is completed, the training process of one round is completed; starting the training process of the next round from the step 3.2 to the step 3.6, repeatedly executing the steps 3.2-3.6, and ending the learning process of the DDPG algorithm when the training processes of M rounds are completed;
on-line policy network, target policy network, on-line evaluation network, and neural network parameter θ of target evaluation network μ 、θ μ’ 、θ Q 、θ Q’ Toward maximizationIs updated in the direction of the (c) to finally obtain the optimal strategy pi (s y )。
In this embodiment, D is taken as 10000, M as 300, α_Q = α_μ = 0.001, and the minibatch sampling factor n as 32.
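Assembled into a training loop, steps 3.2 to 3.7 with this embodiment's hyperparameters (D = 10000, M = 300, n = 32) might look as follows. step_max and the initial-state sampler sample_initial_state are hypothetical placeholders; select_action, env_step, single_step_reward, ddpg_update and sample_batch come from the earlier sketches.

```python
import torch

STEP_MAX, M_ROUNDS, BATCH = 200, 300, 32    # STEP_MAX is an assumption

for m in range(M_ROUNDS):
    s = sample_initial_state()              # hypothetical: draw s_1 = (I_1, F_1, L_1, Tj_1)
    for step in range(STEP_MAX):
        a = select_action(actor, s)                  # step 3.2
        s_next = env_step(surrogate, s, a)           # step 3.3
        F, L, Tj = s_next[1:].tolist()
        r = single_step_reward(F, L, Tj, action_in_range=True)
        replay.append((s, a, torch.tensor(r), s_next))
        if len(replay) == replay.maxlen:             # N == D: pool is full
            ddpg_update(sample_batch(BATCH))         # steps 3.4-3.6
        s = s_next
```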
FIG. 5 is a convergence chart of the average reward in the embodiment of the present invention; the abscissa of FIG. 5 is the training round number m and the ordinate is the average reward r̄. As can be seen from FIG. 5, as the number of training rounds m increases, the average reward r̄ first oscillates up and down, then gradually increases and finally remains within a stable band; when m = 300 the training effect is optimal, the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network have been updated, and the optimal strategy π(s_y) is obtained.

Claims (4)

1. A multi-objective optimization method for IGBT module packaging based on machine learning, wherein the IGBT module comprises an upper bridge arm chip, a lower bridge arm chip, a DBC substrate, a solder layer and a bonding wire; the DBC substrate comprises an upper copper layer, a ceramic layer and a lower copper layer, wherein the thicknesses of the upper copper layer and the lower copper layer are the same; the method is characterized by comprising the following steps of:
step 1, constructing a three-objective optimization model based on a neural network;
the IGBT module is recorded as a system, and the stress F, the stray inductance L and the stray inductance L of the solder layer of the system are used forChip junction temperature T j Establishing a three-target optimization model based on a neural network as a target;
the input variables of the neural network are 6, and the neural network is divided into two groups, wherein the first group is a current level I, and the second group is an IGBT module size, and the neural network comprises: lateral distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2
The output variables of the neural network are 3, and the output variables are respectively: solder layer stress F, stray inductance L and chip junction temperature T j
Step 2, determining a state set S and an action set A according to the three-objective optimization model obtained in the step 1 0 And a reward function R, and calculates an average reward
Step 3, according to the state set S and the action set A obtained in the step 2 0 And a reward function R, performing offline learning by using a DDPG algorithm of machine learning to obtain an optimal strategy pi(s) y );
The DDPG algorithm comprises 4 neural networks, namely an online strategy network, a target strategy network, an online evaluation network and a target evaluation network, wherein the neural network parameters of the online strategy network are recorded as theta μ The neural network parameters of the target policy network are noted as θ μ’ The neural network parameter of the on-line evaluation network is marked as theta Q The neural network parameters of the target evaluation network are marked as theta Q’
The optimal strategy pi (s y ) The expression of (2) is as follows:
wherein s is y For the entered state value, a y For the state passing through the optimal strategy pi (s y ) Output action value s y =(I y ,F y ,L y ,T jy ) y Wherein I y For current level in any one of the set of states S, F y 、L y 、T jy Respectively the initial module size and the current level I y Corresponding stray inductance, chip junction temperature and solder layer stress; a, a y =(d 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y Wherein (d) 1y ,d 2y ,d 3y ,h 1y ,h 2y ) y To be at current level I y In a state, outputting the optimal IGBT module size corresponding to the lowest stray inductance, the lowest chip junction temperature and the lowest solder layer stress after the optimal strategy;
step 4, the optimal strategy pi (s y ) Substituting the three-objective optimization model based on the neural network established in the step 1, and adopting an optimal strategy pi (S) by the system under any state in the state set S y ) All can realize average rewardsMaximization.
2. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein the implementation process of step 1 is as follows:
step 1.1, determining input variables and output variables of a neural network;
the input variables of the neural network are 6, namely the current level I of the system and the transverse distance d of the same bridge arm chip 1 Lateral distance d between upper bridge arm chip and lower bridge arm chip 2 Longitudinal distance d between upper bridge arm chip and lower bridge arm chip 3 Copper layer thickness h 1 And ceramic layer thickness h 2 The method comprises the steps of carrying out a first treatment on the surface of the The output variables of the neural network are 3, namely the stress F of the solder layer, the stray inductance L and the junction temperature T of the chip j
Step 1.2, acquiring a sample data set required for constructing a neural network by using simulation software;
acquisition and construction using simulation softwareSample data required by the neural network is established, a sample data set is established, and E groups of sample data are included in the sample data set, wherein each group of data comprises 6 pieces of neural network input data and 3 pieces of neural network simulation output values corresponding to the 6 pieces of neural network input data and are respectively recorded as input T and simulation output y, and T= (I) F ,d 1F ,d 2F ,d 3F ,h 1F ,h 2F ),Wherein->For the stress simulation output value of the solder layer, +.>Simulation output value for stray inductance, < >>The output values were simulated for the chip junction temperature, where f=1, 2, E;
dividing the sample data set into a training subset and a verification subset, wherein the training subset comprises E1 group sample data, and the verification subset comprises E2 group sample data, and E1+E2=E;
step 1.3, constructing a neural network A;
the method comprises the steps of constructing a neural network A, wherein the neural network A consists of an input layer, an output layer and an hidden layer, the input layer contains 6 neurons, the hidden layer contains 11 neurons, and the output layer contains 3 neurons;
step 1.4, randomly extracting a group of input data from the training subset obtained in step 1.2, marking as F1 group, inputting into the neural network A, obtaining outputs corresponding to the input data, and marking as solder layer stress network output values F respectively F1 Stray inductance network output value L F1 And chip junction temperature network output value T jF1 Wherein f1=1, 2, E1;
step 1.5, carrying out parameter updating on the neural network A by adopting an error back propagation gradient descent algorithm to obtain an updated neural network B;
step 1.6, respectively inputting the E2 group input data of the verification subset obtained in the step 1.2 into a neural network B to obtain E2 group output corresponding to the E2 group input data, and marking any one of the E2 group input data as F2 group, wherein the F2 group comprises a solder layer stress network output value F F2 Stray inductance network output value L F2 And chip junction temperature network output value T jF2 Wherein f2=e1+1, e1+2, E;
step 1.7, defining a root mean square error sigma, wherein the expression is as follows:
comparing the root mean square error sigma with a preset target error epsilon, and making the following judgment:
if sigma < epsilon, the neural network model is constructed; otherwise, returning to the step 1.4;
and marking the constructed neural network model as a three-objective optimization model.
3. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein the implementation process of step 2 is as follows:
the state set S is defined as follows:
defining action set A 0 The following are provided:
recording a certain moment of the system as T, and a moment of the system termination state as T, t=1, 2. The state of the system at the time t is recorded as s t The action taken by the system at time t is denoted as a t The specific expression is as follows:
let s be the state at the time t+1 next to the time t t+1 The operation at time t+1 is denoted as a t+1 The specific expression is as follows:
the bonus function R represents a weighted sum of the bonus values generated by all actions of the system from the current state to the end state, expressed as follows:
wherein, gamma is a discount factor and represents the influence degree of the time length on the rewarding value t-1 For accumulation of discount factors at time t, r t For the state s of the system at time t t Take action a t The single step prize value obtained after that has the expression:
wherein, psi is penalty coefficient, eta 1 As the first weight coefficient, eta 2 As the second weight coefficient, eta 3 Is a third weight coefficient;
will single step prize value r t Record as average rewards
4. The multi-objective optimization method for IGBT module packaging based on machine learning according to claim 1, wherein in step 3 offline learning is performed with the DDPG algorithm of machine learning to obtain the optimal strategy π(s_y), and the specific implementation process is as follows:
Step 3.1, initializing the neural network parameters θ_μ, θ_μ', θ_Q and θ_Q' of the online policy network, target policy network, online evaluation network and target evaluation network, letting θ_μ' = θ_μ and θ_Q' = θ_Q; initializing the capacity of the experience replay pool P as D;
The output of the online policy network is denoted a, a = μ(s|θ_μ), where a is the action value output by the online policy network; a corresponds to an individual in the action set A_0 in claim 1, and a = (d_1, d_2, d_3, h_1, h_2); s is the state value input to the online policy network; s corresponds to an individual in the state set S in claim 1, and s = (I, F, L, T_j); μ is the policy derived from the online policy network's neural network parameters θ_μ and the input state value s;
step 3.2, state s of the system at time t t Inputting the online policy network to obtain the output mu of the online policy network t (s tμt ) And adding noise delta t Action a of obtaining final output t The specific expression is as follows:
step 3.3, the system is based on the state s t Executing action a t
Loading the three-objective optimization model into a machine learning algorithm, and recording the three-objective optimization model as an environment model; will I t ,d 1t ,d 2t ,d 3t ,h 1t ,h 2t As input variables of the environmental model, output variables are obtained and denoted as F t+1 ,L t+1 ,T jt+1 The method comprises the steps of carrying out a first treatment on the surface of the Build I t For normal distribution function of mean value, giving standard deviation, randomly sampling to obtain I t+1
Transition to a new state s t+1 =(I t+1 ,F t+1 ,L t+1 ,T jt+1 ) t+1 At the same time get the execution action a t The single step prize value r t Will(s) t ,a t ,r t ,s t+1 ) Called a state transition sequence, and stored in the experience playback pool P, the system enters a state s of t+1 at the next time t+1
Steps 3.2-3.3 are executed in a loop, and the number of state transition sequences in the experience replay pool P is denoted N; if N = D, proceed to step 3.4, otherwise return to step 3.2;
step 3.4, randomly extracting n state transition sequences from the experience playback pool P, wherein n is less than D, taking the n state transition sequences as small batch data for training an online strategy network and an online evaluation network, and recording the kth state transition sequence in the small batch data as(s) k ,a k ,r k ,s k+1 ) N is a small batch sampling factor, k=1, 2,;
step 3.5, based on the small batch data(s) obtained in step 3.4 k ,a k ,r k ,s k+1 ) K=1, 2,3,..n, n, calculated as the jackpot y k And error function L (θ) Q ) The specific expression is as follows:
in which Q (s k+1 ,u (s k+1μ’Q’ ) Scoring value output for target evaluation network, wherein u (s k+1μ’ )|θ Q’ Action value s output for target strategy network k+1 The state values input for the target evaluation network and the target strategy network; q(s) k ,a kQ ) For on-line evaluation of the scoring value output by the network s k And a k The method comprises the steps of evaluating a state value and an action value input by a network on line;
step 3.6, on-line evaluation network is performed by minimizing the error function L (θ Q ) To update theta Q The online strategy network passes through a deterministic strategy gradient V θμ J update θ μ Target evaluation network and targetPolicy network updating θ by a moving average method Q’ And theta μ’ The specific expression is as follows:
wherein, V is a partial guide symbol, wherein V θμ J represents policy J vs. theta μ The deviation is calculated and guided, and the deviation is calculated,input representing online evaluation network is s=s k ,a=μ(s k ) When in use, the scoring value output by the network is evaluated onlineDeviation of the action value a is determined, +.>Input representing online policy network is s=s k When the online policy network outputs action value +.>For theta μ Deviation-inducing and->Representing an error function L (θ) Q ) For theta Q Obtaining a deflection guide;
$\alpha_Q$ is the learning rate of the online evaluation network, $\alpha_\mu$ is the learning rate of the online policy network, and $\tau$ is the moving-average update parameter, with $0 < \alpha_Q < 1$, $0 < \alpha_\mu < 1$, and $0 < \tau < 1$; $\theta^{Q*}$ is the neural network parameter of the online evaluation network after updating, $\theta^{\mu*}$ is the neural network parameter of the online policy network after updating, $\theta^{Q'*}$ is the neural network parameter of the target evaluation network after updating, and $\theta^{\mu'*}$ is the neural network parameter of the target policy network after updating.
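The three updates of step 3.6 might be realized as below; expressing the policy update as minimizing $-Q(s_k, \mu(s_k))$ lets autograd apply the chain-rule product $\nabla_a Q \cdot \nabla_{\theta^{\mu}} \mu$ of the deterministic policy gradient, and the learning rates and $\tau$ are illustrative values:

```python
import torch

def update_networks(actor, critic, target_actor, target_critic,
                    actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch

    # Online evaluation network: update theta_Q by minimizing L(theta_Q).
    loss_q = critic_loss(critic, target_critic, target_actor,
                         s, a, r, s_next, gamma)
    critic_opt.zero_grad()
    loss_q.backward()
    critic_opt.step()

    # Online policy network: update theta_mu along the deterministic policy
    # gradient by descending -Q(s, mu(s)).
    loss_pi = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    loss_pi.backward()
    actor_opt.step()

    # Target networks: moving-average update theta' <- tau*theta + (1-tau)*theta'.
    with torch.no_grad():
        for p, tp in zip(critic.parameters(), target_critic.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
        for p, tp in zip(actor.parameters(), target_actor.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
```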
step 3.7, specify a step counter $step$ with maximum value $step_{\max}$ and a training-round counter $m$ with maximum value $M$, where $step = 1, 2, \ldots, step_{\max}$ and $m = 1, 2, \ldots, M$. Each execution of steps 3.4 to 3.6 completes the training process of one step; steps 3.4 to 3.6 are repeated, and once $step_{\max}$ steps have been completed, the training process of one round is complete. The training process of the next round then starts again from steps 3.2 to 3.6; steps 3.2 to 3.6 are repeated, and once $M$ rounds of training have been completed, the learning process of the DDPG algorithm ends.
Throughout this process, the neural network parameters $\theta^{\mu}$, $\theta^{\mu'}$, $\theta^{Q}$, and $\theta^{Q'}$ of the online policy network, target policy network, online evaluation network, and target evaluation network are updated in the direction that maximizes the cumulative reward, finally yielding the optimal policy $\pi(s_y)$.
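Putting the sketches together, the step/round structure of step 3.7 corresponds to a training skeleton like the following; `reset_state` and all hyperparameter values are placeholders:

```python
import numpy as np
import torch

def train(actor, critic, target_actor, target_critic,
          actor_opt, critic_opt, surrogate_model, reward_fn, reset_state,
          n=64, step_max=200, M=500):
    for m in range(M):                            # training rounds m = 1..M
        s = reset_state()                         # initial state (I, F, L, Tj)
        for step in range(step_max):              # steps 1..step_max per round
            s_t = torch.tensor(s, dtype=torch.float32)
            a = select_action(actor, s_t).numpy()                 # step 3.2
            s, _ = environment_step(tuple(s), tuple(a),
                                    surrogate_model, reward_fn)   # step 3.3
            if len(replay_pool) >= D:             # pool filled: steps 3.4-3.6
                s_b, a_b, r_b, sn_b = sample_minibatch(replay_pool, n)
                batch = tuple(
                    torch.tensor(np.asarray(x), dtype=torch.float32)
                    for x in (s_b, a_b, r_b, sn_b)
                )
                update_networks(actor, critic, target_actor, target_critic,
                                actor_opt, critic_opt, batch)
```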
CN202311617795.6A 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning Active CN117313560B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311617795.6A CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311617795.6A CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Publications (2)

Publication Number Publication Date
CN117313560A true CN117313560A (en) 2023-12-29
CN117313560B (en) 2024-02-09

Family

ID=89281586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311617795.6A Active CN117313560B (en) 2023-11-30 2023-11-30 Multi-objective optimization method for IGBT module packaging based on machine learning

Country Status (1)

Country Link
CN (1) CN117313560B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210117770A1 (en) * 2019-10-18 2021-04-22 Wuhan University Power electronic circuit troubleshoot method based on beetle antennae optimized deep belief network algorithm
CN114172403A (en) * 2021-12-07 2022-03-11 合肥工业大学 Inverter efficiency optimization method based on deep reinforcement learning
DE102022108379A1 (en) * 2022-04-07 2023-10-12 Dr. Ing. H.C. F. Porsche Aktiengesellschaft Method, system and computer program product for the optimized construction and/or design of a technical component
CN115021325A (en) * 2022-06-22 2022-09-06 合肥工业大学 Photovoltaic inverter multi-objective optimization method based on DDPG algorithm
CN115765396A (en) * 2022-11-23 2023-03-07 天地(常州)自动化股份有限公司 Coordination optimization method for IGBT spike voltage suppression
CN117057229A (en) * 2023-08-10 2023-11-14 合肥工业大学 Multi-objective optimization method based on deep reinforcement learning power module
CN117057228A (en) * 2023-08-10 2023-11-14 合肥工业大学 Inverter multi-objective optimization method based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JIANING WANG et al.: "Co-Reduction of Common Mode Noise and Loop Current of Three-Level Active Neutral Point Clamped Inverters", IEEE *
WANG Cunle et al.: "Review of Research on Thermal Network Parameter Extraction for Power IGBT Modules", Electrotechnics Electric (电工电气) *
LUO Xu; WANG Xuemei; WU Haiping: "Selection of IGBT and Switching Frequency for Electric Vehicle Converters Based on Multi-Objective Optimization", Transactions of China Electrotechnical Society, no. 10 *

Also Published As

Publication number Publication date
CN117313560B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN112149316B (en) Aero-engine residual life prediction method based on improved CNN model
CN110175386B (en) Method for predicting temperature of electrical equipment of transformer substation
CN110084221B (en) Serialized human face key point detection method with relay supervision based on deep learning
WO2021109644A1 (en) Hybrid vehicle working condition prediction method based on meta-learning
CN103970965A (en) Test run method for accelerated life test of gas turbine engine
CN109754122A (en) A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction
CN110245390B (en) Automobile engine oil consumption prediction method based on RS-BP neural network
CN113313306A (en) Elastic neural network load prediction method based on improved wolf optimization algorithm
CN117236278B (en) Chip production simulation method and system based on digital twin technology
CN110991737A (en) Ultra-short-term wind power prediction method based on deep belief network
CN111832839B (en) Energy consumption prediction method based on sufficient incremental learning
CN114021483A (en) Ultra-short-term wind power prediction method based on time domain characteristics and XGboost
CN117057229A (en) Multi-objective optimization method based on deep reinforcement learning power module
CN116484495A (en) Pneumatic data fusion modeling method based on test design
CN112947080B (en) Scene parameter transformation-based intelligent decision model performance evaluation system
CN117313560B (en) Multi-objective optimization method for IGBT module packaging based on machine learning
CN112731098B (en) Radio frequency low-noise discharge circuit fault diagnosis method, system, medium and application
CN110276478B (en) Short-term wind power prediction method based on segmented ant colony algorithm optimization SVM
CN111553400A (en) Accurate diagnosis method for vibration fault of wind generating set
CN116488151A (en) Short-term wind power prediction method based on condition generation countermeasure network
CN113111588B (en) NO of gas turbine X Emission concentration prediction method and device
CN114091392A (en) Boolean satisfiability judgment method based on linear programming
CN114202106A (en) Air conditioning system load prediction method based on deep learning
CN109992587B (en) Blast furnace molten iron silicon content prediction key attribute judgment method based on big data
Guo et al. Research on Improved GA-BP Pattern Recognition Algorithm for Quality Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant