CN111857081B - Chip packaging test production line performance control method based on Q-learning reinforcement learning - Google Patents
- Publication number
- CN111857081B (CN202010797879.2A)
- Authority
- CN
- China
- Prior art keywords
- production line
- production
- performance
- station
- work
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
- G05B19/41885—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/32—Operator till task planning
- G05B2219/32339—Object oriented modeling, design, analysis, implementation, simulation language
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Manufacturing & Machinery (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- General Factory Administration (AREA)
Abstract
The invention relates to the control and optimization of production line performance in semiconductor chip packaging and testing, and in particular to a method for controlling the performance of a chip packaging test production line based on Q-learning reinforcement learning. The invention establishes a more accurate performance prediction model of the semiconductor packaging test series-parallel production line and combines the Morris screening method with Arena simulation to carry out a global quantitative sensitivity analysis. This identifies the factors that influence production line performance most strongly, together with the way they influence it, and avoids the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. On the basis of the performance prediction and the sensitivity analysis, the invention controls the variability factors of the production line and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima while offering better flexibility and real-time capability.
Description
Technical Field
The invention relates to the field of performance control and optimization of semiconductor chip packaging test production lines, and in particular to a performance control method for a semiconductor chip packaging test production line that combines sensitivity analysis with a Q-learning reinforcement learning algorithm.
Background
The semiconductor manufacturing industry is of great strategic value to the national economy. To sustain the healthy development of China's semiconductor manufacturing industry, it is necessary not only to expand production scale but also to pay attention to the production efficiency of the manufacturing system and to strengthen production management and control technology. Semiconductor manufacturing systems are characterized by highly re-entrant process routes, highly complex production processes, long manufacturing cycles, very large system scale and high uncertainty, which makes controlling production line performance difficult. The production performance of a manufacturing system is strongly affected by variability factors such as buffer capacity, sudden equipment failures, preventive equipment maintenance and product rework. These reduce production efficiency, lengthen the production cycle and disrupt the normal execution of the production plan.
Research on intelligent, comprehensive and dynamic control of production line performance is still limited. Most of it addresses only one aspect of production line variability and cannot consider the various variability factors on the line globally. The performance prediction models established so far for semiconductor series-parallel production lines deviate from actual production conditions and lack accuracy. Traditional performance control and optimization methods have difficulty responding in real time to changes in the variability factors of the production line and are not flexible enough.
Disclosure of Invention
To address the shortcomings of existing performance control models and strategies for semiconductor chip packaging test production lines, the invention provides a chip packaging test production line performance control method based on Q-learning reinforcement learning. To address problems such as slow response to changes in variability factors, incomplete consideration of variability factors and conflicts between control strategies, the method combines sensitivity analysis with a Q-learning reinforcement learning algorithm to control the manufacturing performance of the semiconductor chip packaging test production line intelligently.
A chip packaging test production line performance control method based on Q-learning reinforcement learning comprises the following steps:
Step 1: construct an abstract model of the semiconductor chip packaging test series-parallel production line;
Step 2: based on the abstract production line model constructed in step 1, establish a prediction model of the performance of the semiconductor chip packaging test series-parallel production line;
Step 3: based on the abstract production line model constructed in step 1, obtain the mechanism by which the key variability factors influence production line performance, using qualitative analysis with the Morris screening method and quantitative analysis with Arena simulation;
Step 4: based on the performance prediction model established in step 2 and the key variability factors obtained in step 3, establish a performance control model based on the Q-learning reinforcement learning algorithm, and solve it iteratively with the benefit index of the production line as the performance control target to obtain a globally optimal performance control strategy.
Step 1 specifically comprises the following:
Abstraction of the semiconductor chip packaging test production line model: the back-end process of the semiconductor production line, namely the chip packaging test production line, is taken as the object of study. Assuming that a finite buffer exists between stations and that the queuing rule is first come, first served, the line is abstracted into a multi-station series-parallel queuing production line model with re-entry (rework).
Step 2 specifically comprises the following:
Step 2.1: variability calculation: calculate the arrival variability c_a and the processing time variability c_e.
Step 2.2: determine the basic performance prediction indexes.
The average queuing time CT_q of a workpiece and its effective processing time t_e give the average time CT that the workpiece spends at the station (the production cycle). The average work-in-process level WIP at the station is then calculated. The production rate TH, the production cycle CT and the work-in-process level WIP are used as the basic indexes for predicting production line performance.
CT = CT_q + t_e
WIP = CT × TH
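The two relations in step 2.2 can be checked with a few lines of Python. This is a minimal sketch; the function name `station_metrics` and the sample numbers are illustrative assumptions, not values from the patent.

```python
def station_metrics(ct_q: float, t_e: float, th: float) -> tuple[float, float]:
    """Return (CT, WIP) for one station.

    ct_q: average queuing time, t_e: effective processing time (same time unit),
    th:   production rate in pieces per that time unit.
    """
    ct = ct_q + t_e   # CT = CT_q + t_e
    wip = ct * th     # WIP = CT x TH (Little's law)
    return ct, wip


# Illustrative numbers only: 12 min in queue, 3 min processing, 0.25 pieces/min.
ct, wip = station_metrics(ct_q=12.0, t_e=3.0, th=0.25)
print(ct, wip)  # 15.0 3.75
```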
Step 2.3: establish the production line performance prediction model.
Step 2.3.1: calculate the queuing time CT_q^ij of product j at station i:
where c_a^ij and c_e^ij are the arrival variability and the processing time variability of product j at station i, u_ij is the utilization of station i, m_ij is the number of parallel machines at station i, and t_e^ij is the effective processing time of product j at station i.
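The queuing-time formula of step 2.3.1 is not reproduced in this text. A widely used approximation that takes exactly the inputs listed above is the G/G/m queue-time formula from the Factory Physics literature, CT_q = ((c_a² + c_e²)/2) · (u^(√(2(m+1)) − 1) / (m(1 − u))) · t_e. The sketch below implements that approximation as an assumption; the patent's own expression may differ, and the function name `queue_time` is hypothetical.

```python
import math


def queue_time(c_a: float, c_e: float, u: float, m: int, t_e: float) -> float:
    """Approximate queuing time at a station with m parallel machines.

    Assumed form (standard G/G/m approximation, not confirmed by the source):
    variability term x utilization term x effective processing time.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e


# With c_a = c_e = 1 and m = 1 this reduces to the M/M/1 result u / (1 - u) * t_e.
print(queue_time(c_a=1.0, c_e=1.0, u=0.5, m=1, t_e=2.0))
```

Under this approximation, adding a parallel machine at the same utilization shortens the queue, which is consistent with the negative sensitivity of the performance index to m reported later in the embodiment.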
Step 2.3.2: calculate the production rate TH of the workpieces.
Let station i have m_ij parallel machines (b > m > 1), let b be the capacity of the buffer in front of station i, and let k be the number of workpieces being processed at station i, with 0 ≤ k ≤ b. The probability p_0 that no workpiece j (0 < j < r, where r is the number of products processed together on the production line) is waiting in front of station i is:
The blocking probability of workpiece j when the buffer capacity is b is:
Let q_hj be the defective rate of workpiece j at station h and Q_ij the defective rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line. The probability Q_ij that a defective workpiece j is detected and removed at station i is:
The production rate TH_ij of workpiece j at station i is:
The station with the highest utilization, station I, is the bottleneck station for product J, and its production rate is r_b^IJ = max(u_ij).
Step 2.3.3: calculate the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of the production line.
Calculate the average wait-to-batch time WTBT of the workpieces:
where r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of product j at station i. The formula for CT_q^ij is then rewritten as:
Calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
This gives the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of product j over the whole series-parallel production line:
Step 2.4: evaluate production line performance with the performance prediction model.
Step 2.4.1: calculate the production line performance index F.
As shown in FIG. 3, the WIP-CT and WIP-TH curves for the best case, the worst case and the practical worst case of the production line are used as benchmarks to define the "good zone" and the "bad zone" in the performance quadrant, which together form the performance evaluation chart of the production line.
The performance evaluation index F is the ratio of the distance between the actual performance point and the benchmark to the distance between the best-case and practical-worst-case benchmarks:
where w is the given actual work-in-process level, T is the actual production cycle, T_0 is the theoretical processing time of the production line, with T_0 = CT, and r_b is the bottleneck rate of the production line, with r_b = TH_ij if and only if u_ij = u_max.
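The expression for F is not reproduced in this text, so it is not reconstructed here. The benchmark curves that step 2.4.1 refers to do have a standard form in the Factory Physics literature, and the sketch below computes them from w, r_b and T_0. It assumes those standard definitions are the ones drawn in FIG. 3; the function name `benchmark_curves` is hypothetical.

```python
def benchmark_curves(w: float, r_b: float, t_0: float) -> dict:
    """Return (CT, TH) for the best, worst and practical worst cases.

    w:   work-in-process level (pieces), r_b: bottleneck rate (pieces per time unit),
    t_0: theoretical (raw) processing time of the line.
    Standard Factory Physics benchmarks, assumed to match FIG. 3.
    """
    if w < 1 or r_b <= 0 or t_0 <= 0:
        raise ValueError("require w >= 1, r_b > 0 and t_0 > 0")
    w_0 = r_b * t_0  # critical WIP level
    return {
        "best": (max(t_0, w / r_b), min(w / t_0, r_b)),
        "worst": (w * t_0, 1.0 / t_0),
        "practical_worst": (t_0 + (w - 1) / r_b, w / (w_0 + w - 1) * r_b),
    }


curves = benchmark_curves(w=10, r_b=0.5, t_0=8.0)
for case, (ct, th) in curves.items():
    print(f"{case}: CT = {ct:.2f}, TH = {th:.4f}")
```

An actual operating point (w, T) that lies between the best-case and practical-worst-case curves falls in the "good zone"; a point beyond the practical worst case falls in the "bad zone".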
Step 2.4.2: calculate the production line benefit index Bf.
Taking production cost into account, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
where C is the cost factor, c_1 is the cost per unit of equipment, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.
Step 3 specifically comprises the following:
Step 3.1: qualitative sensitivity analysis with the Morris screening method.
Select a parameter x in the production line performance prediction model, preset a fixed step size C and a maximum amplitude M, perturb x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
where Y_0 is the performance evaluation index F at the initial value of parameter x; Y_g and Y_(g+1) are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_(g+1) are the percentage changes of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs.
According to the sensitivity grading criteria in Table 1, parameters graded "more sensitive" or "highly sensitive" are identified as the factors with a greater influence on the performance of the semiconductor packaging test production line.
TABLE 1 Sensitivity grading criteria
Absolute value of sensitivity coefficient | Sensitivity grade |
0.00 ≤ |S| < 0.05 | Insensitive |
0.05 ≤ |S| < 0.20 | Moderately sensitive |
0.20 ≤ |S| < 1.00 | More sensitive |
|S| ≥ 1.00 | Highly sensitive |
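The formula for S in step 3.1 is not reproduced in this text. The variable definitions given there match the commonly used modified Morris screening coefficient, in which S is the mean over successive perturbations of ((Y_(g+1) − Y_g) / Y_0) / ((P_(g+1) − P_g) / 100). The sketch below implements that form as an assumption. The function name, the toy models, and the choice to sweep the parameter from −M to +M percent are all illustrative.

```python
from typing import Callable


def sensitivity_coefficient(model: Callable[[float], float], x0: float,
                            step_pct: float, max_pct: float) -> float:
    """Average relative change of the model output per relative change of x.

    model:    maps a parameter value to the performance index F (hypothetical stand-in)
    step_pct: fixed perturbation step in percent of x0
    max_pct:  maximum perturbation amplitude in percent of x0
    """
    y0 = model(x0)
    if y0 == 0:
        raise ValueError("baseline output Y_0 must be non-zero")
    n_steps = int(round(2 * max_pct / step_pct))
    if n_steps < 1:
        raise ValueError("max_pct must be at least half of step_pct")
    ps = [-max_pct + g * step_pct for g in range(n_steps + 1)]  # percent changes P_g
    ys = [model(x0 * (1 + p / 100.0)) for p in ps]              # outputs Y_g
    ratios = [((ys[g + 1] - ys[g]) / y0) / ((ps[g + 1] - ps[g]) / 100.0)
              for g in range(n_steps)]
    return sum(ratios) / n_steps


# A model proportional to x has S close to 1; a decreasing model has S < 0.
print(sensitivity_coefficient(lambda x: 2.0 * x, x0=10.0, step_pct=5.0, max_pct=20.0))
print(sensitivity_coefficient(lambda x: 100.0 - x, x0=10.0, step_pct=5.0, max_pct=20.0))
```

The sign of S shows the direction of the effect and its absolute value is what the grading criteria in Table 1 are applied to.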
Step 3.2: arena simulation sensitivity quantitative analysis.
And establishing a semiconductor chip packaging test serial-parallel production line model in Arena software. Each device has an independent random process time, failure time and maintenance time.
The work piece arrival rate, the work station equipment processing rate and the average time before failure m on the production line f Average repairComplex time m p And respectively obeying negative exponential distribution and normal distribution, wherein the processing batch size k, the buffer capacity b and the parallel equipment number m are fixed positive integers, b is more than m and more than 1, and the simulation experiment preheating time setting, the running total time and the experiment repetition times are set.
Experiments have resulted in a profile of overall line performance, production cycle CT, production rate TH, and WIP at work-in-process level with respect to key factors affecting line performance.
Step 4 specifically comprises the following:
Step 4.1: take the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition, and, using a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy, establish the reinforcement-learning-based performance control model of the semiconductor chip packaging test production line shown in FIG. 5.
Step 4.2: initialize Q(s, a) for all s ∈ S and a ∈ A(s), where the Q value reflects the long-term return, S is the set of system states, and A(s) is the set of action strategies for the key factors obtained in step 3. Given the learning rate α and the discount factor γ, determine the return function r.
Step 4.3: given a starting state s, select action a in state s according to the ε-greedy strategy. The improved ε is defined as a function of p and M, where p is the number of steps the algorithm has executed so far and M is the total number of iteration steps, so that ε gradually decreases from its initial value of 0.2 as the number of executed steps increases.
Step 4.4: take the action a selected in state s by the ε-greedy strategy (b being the index of the selected action), observe the return r and the next state s_next, and, with a_next denoting the next action, update the Q value:
s = s_next, a = a_next
Step 4.5: return to step 4.4 until the system tends to a steady state, that is, until it converges.
Step 4.6: repeat steps 4.2 to 4.5 until the learning period (the preset number of times that steps 4.2 to 4.5 are repeated) ends, then stop iterating.
Step 4.7: output the final policy and obtain the optimized production line performance indexes.
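The exact function used for ε in step 4.3 is not reproduced in this text. Only its properties are stated: it depends on the current step p and the total number of steps M, and it decreases from 0.2 as p grows. The sketch below uses a linear decay, ε(p) = 0.2 · (1 − p / M), purely as an assumption that satisfies those properties, and pairs it with a plain ε-greedy selection. The names `epsilon` and `epsilon_greedy` are hypothetical.

```python
import random


def epsilon(p: int, total_steps: int, eps0: float = 0.2) -> float:
    """Exploration rate after p of total_steps steps (assumed linear decay from eps0 to 0)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(p / total_steps, 0.0), 1.0)
    return eps0 * (1.0 - frac)


def epsilon_greedy(q_row: list, eps: float, rng: random.Random) -> int:
    """Pick a random action with probability eps, otherwise the action with the largest Q value."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


rng = random.Random(0)
for p in (0, 50, 100):
    eps = epsilon(p, total_steps=100)
    print(p, eps, epsilon_greedy([0.1, 0.9, 0.3], eps, rng))
```

A decaying ε explores more in the early steps, when the Q table is still uninformative, and exploits the learned values more towards the end of training.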
The invention establishes a more accurate performance prediction model of the semiconductor packaging test series-parallel production line and combines the Morris screening method with Arena simulation to carry out a global quantitative sensitivity analysis. This identifies the factors that influence production line performance most strongly, together with the way they influence it, and avoids the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. The invention provides a production line performance control model based on the Q-learning algorithm, which controls the variability factors of the production line on the basis of performance prediction and sensitivity analysis and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima, while the performance control method offers better flexibility and real-time capability.
Drawings
FIG. 1 is a flow chart of the invention;
FIG. 2 is the abstract model of the semiconductor chip packaging test production line;
FIG. 3 is a diagram of the performance evaluation method based on the three Factory Physics benchmarks;
FIG. 4 is a schematic diagram of the logical structure of the production line simulation model;
FIG. 5 is the reinforcement-learning-based production line performance control model of the embodiment;
FIG. 6 shows production line performance as a function of the variabilities c_a and c_e;
FIG. 7 shows the change in the production line performance indexes before and after performance control at different levels of variability CV1;
FIG. 8 shows the change in the production line performance indexes before and after performance control at different levels of variability CV2.
Detailed Description
The invention is described in further detail below with reference to the drawings and an embodiment. The embodiment is based on the technical solution of the invention and gives a detailed implementation and a specific operating procedure (FIG. 1), but the scope of protection of the invention is not limited to the following embodiment.
The embodiment can be divided into the following main steps:
Step 1: abstraction of the semiconductor chip packaging test production line model: the chip packaging test production line is taken as the object of study. Assuming that a buffer of finite size exists between stations and that the queuing rule is first come, first served, the line is abstracted into a multi-station series-parallel queuing production line model with re-entry (rework) (FIG. 2).
Step 2:
Step 2.1: variability calculation.
Calculate the arrival variability c_a and the processing time variability c_e.
Step 2.2: determine the basic performance prediction indexes.
The average queuing time CT_q of a workpiece and its effective processing time t_e give the average time CT that the workpiece spends at the station (the production cycle). The average work-in-process level WIP at the station is then calculated. The production rate TH, the production cycle CT and the work-in-process level WIP are used as the basic indexes for predicting production line performance.
CT = CT_q + t_e
WIP = CT × TH
Step 2.3: establish the production line performance prediction model.
Step 2.3.1: calculate the queuing time CT_q^ij of product j at station i:
where c_a^ij and c_e^ij are the arrival variability and the processing time variability of product j at station i, u_ij is the utilization of station i, m_ij is the number of parallel machines at station i, and t_e^ij is the effective processing time of product j at station i.
Step 2.3.2: calculate the production rate TH of the workpieces.
Let station i have m_ij parallel machines (b > m > 1), let b be the capacity of the buffer in front of station i, and let k be the number of workpieces being processed at station i, with 0 ≤ k ≤ b. The probability p_0 that no workpiece j (0 < j < r, where r is the number of products processed together on the production line) is waiting in front of station i is:
Let q_hj be the defective rate of workpiece j at station h and Q_ij the defective rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the series-parallel production line. The probability Q_ij that a defective workpiece j is detected and removed at station i is:
The production rate TH_ij of workpiece j at station i is:
The production rate of the bottleneck station I for product J is denoted r_b^IJ = max(u_ij).
Step 2.3.3: calculate the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of the production line.
Calculate the average wait-to-batch time WTBT of the workpieces:
where r_a is the rate at which workpieces arrive at the station and k_ij is the processing batch size of product j at station i. The formula for CT_q^ij is then rewritten as:
Calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
This gives the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of product j over the whole series-parallel production line:
Step 2.4: evaluate production line performance with the performance prediction model.
Step 2.4.1: calculate the production line performance index F.
As shown in FIG. 3, the WIP-CT and WIP-TH curves for the best case, the worst case and the practical worst case of the production line are used as benchmarks to define the "good zone" and the "bad zone" in the performance quadrant, which together form the performance evaluation chart of the production line.
The performance evaluation index F is the ratio of the distance between the actual performance point and the benchmark to the distance between the best-case and practical-worst-case benchmarks:
where w is the given actual work-in-process level, T is the actual production cycle, T_0 is the theoretical processing time of the production line, with T_0 = CT, and r_b is the bottleneck rate of the production line, with r_b = TH_ij if and only if u_ij = u_max.
Step 2.4.2: calculate the production line benefit index Bf.
Taking production cost into account, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
where C is the cost factor, c_1 is the cost per unit of equipment, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.
Step 3:
Step 3.1: qualitative sensitivity analysis with the Morris screening method.
Select a parameter x in the production line performance prediction model, preset a fixed step size C and a maximum amplitude M, perturb x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
where Y_0 is the performance evaluation index F at the initial value of parameter x; Y_g and Y_(g+1) are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_(g+1) are the percentage changes of the parameter value relative to its initial value after the g-th and (g+1)-th perturbations; and n is the number of runs.
Table 1 shows the sensitivity coefficients of the performance evaluation index F with respect to different parameters, obtained by the Morris screening method.
TABLE 1 Sensitivity coefficient S of index F
Parameter | Unit | Meaning | Sensitivity coefficient S |
u | % | Utilization | 1.242 |
r_0 | pieces/min | Feed rate | -0.163 |
r_a | pieces/min | Production rate | 0.622 |
k | pieces | Processing batch size | 0.478 |
c_a | / | Workpiece arrival time variability | 0.350 |
c_e | / | Processing time variability | 0.457 |
m | units | Number of parallel machines | -1.134 |
A | % | Equipment availability | -0.104 |
b | pieces | Buffer capacity | 0.581 |
Q | % | Workpiece defective rate | -0.029 |
Based on the sensitivity grades in Table 2 and the relationships among the parameters, the number of parallel machines m, the processing batch size k, the workpiece arrival time variability c_a, the processing time variability c_e and the buffer capacity b are identified as the factors with a greater influence on the performance of the semiconductor packaging test production line.
TABLE 2 Sensitivity grading criteria
Absolute value of sensitivity coefficient | Sensitivity grade |
0.00 ≤ |S| < 0.05 | Insensitive |
0.05 ≤ |S| < 0.20 | Moderately sensitive |
0.20 ≤ |S| < 1.00 | More sensitive |
|S| ≥ 1.00 | Highly sensitive |
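Applying the grading thresholds of Table 2 to the coefficients of Table 1 is mechanical, and the short sketch below does it. The data are copied from Table 1; the function and variable names are illustrative.

```python
# Sensitivity coefficients S of index F, copied from Table 1.
TABLE_1 = {
    "u": 1.242, "r_0": -0.163, "r_a": 0.622, "k": 0.478, "c_a": 0.350,
    "c_e": 0.457, "m": -1.134, "A": -0.104, "b": 0.581, "Q": -0.029,
}


def grade(s: float) -> str:
    """Grade a sensitivity coefficient by its absolute value, using the Table 2 thresholds."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "more sensitive"
    return "highly sensitive"


graded = {name: grade(s) for name, s in TABLE_1.items()}
candidates = [name for name, g in graded.items()
              if g in ("more sensitive", "highly sensitive")]
print(candidates)  # ['u', 'r_a', 'k', 'c_a', 'c_e', 'm', 'b']
```

The thresholds alone also flag u and r_a. The embodiment keeps m, k, c_a, c_e and b as the control factors, which it attributes to the relationships among the parameters in addition to the sensitivity grades.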
Step 3.2: arena simulation sensitivity quantitative analysis.
A semiconductor chip package test serial-parallel production line model is built in Arena software as shown in fig. 4. Each device has an independent random process time, failure time and maintenance time.
The work piece arrival rate, the work station equipment processing rate and the average time before failure m on the production line f Average repair time m p And respectively obeying negative exponential distribution and normal distribution, wherein the processing batch size k, the buffer capacity b and the parallel equipment number m are fixed positive integers, b is more than m and more than 1, the preheating time of a simulation experiment is set to 600 minutes, the total operation time is set to 1200 minutes, and the test is repeated for 3 times.
Experiments have resulted in a profile of overall line performance, production cycle CT, production rate TH, and WIP at work-in-process level with respect to key factors affecting line performance. As shown in fig. 6, the production line performance is related to time variability c a And processing variability c e Is a variation graph of (a).
Step 4:
step 4.1: the method is characterized in that a production line performance prediction model is used as an reinforcement learning external environment, the change of the variability of the production line is used as a trigger condition, and a semiconductor chip packaging test production line performance control model based on reinforcement learning as shown in fig. 5 is established based on a dynamic control method combining an event trigger strategy and a periodic trigger strategy.
Step 4.2: the initialization of the values of Q (s, a),a.epsilon.A(s), which isThe medium Q value is a reflection of long term consideration and S is a system state set. The division is shown in table 3:
Table 3. Division of the system state set S
System state | Division basis | System state | Division basis |
---|---|---|---|
s1 | 0≤Bf≤0.1 | s2 | 0.1<Bf≤0.2 |
s3 | 0.2<Bf≤0.3 | s4 | 0.3<Bf≤0.4 |
s5 | 0.4<Bf≤0.5 | s6 | 0.5<Bf≤0.6 |
s7 | 0.6<Bf≤0.7 | s8 | 0.7<Bf≤0.8 |
s9 | 0.8<Bf≤0.9 | s10 | 0.9<Bf≤1.0 |
s11 | Bf≥1.0 | | |
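The state division in Table 3 can be illustrated with a short lookup routine. This is a minimal sketch, not part of the patent: the function name `state_index` is hypothetical, and because the table lists Bf = 1.0 under both s10 and s11, the sketch assumes that exactly 1.0 belongs to s10 and only larger values fall into s11.

```python
from bisect import bisect_left

# Upper bounds of states s1..s10 as listed in Table 3.
UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def state_index(bf: float) -> int:
    """Return k such that the benefit index bf falls into state s_k (1..11).

    Assumption (not stated in the source): Bf == 1.0 maps to s10,
    and only Bf > 1.0 maps to s11.
    """
    if bf < 0:
        raise ValueError("Bf is expected to be non-negative")
    # bisect_left returns the position of the first bound >= bf, so every
    # interval is closed on the right; values above 1.0 fall through to s11.
    return bisect_left(UPPER_BOUNDS, bf) + 1
```

For example, `state_index(0.15)` falls in the interval 0.1<Bf≤0.2 and therefore yields state s2.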
A(s) is the action strategy set, A(s): { a1+1, a2:1, a3:1+1, a4:1, a5:1+1, a6:1 }. The learning rate factor α is set to 0.1 and the discount factor γ to 0.9, and the return function r is defined as follows, where Bf_pre denotes the benefit index after the previous optimization of the production line:
Step 4.3: Given a starting state s, select an action a in state s according to the ε-greedy strategy.
Step 4.4: Execute the action a selected in state s according to the ε-greedy strategy, where b is the selection index of a, to obtain the return r and the next state s_next; a_next denotes the next action. The Q value is updated as:
s = s_next, a = a_next
Step 4.5: Return to Step 4.4 until the system tends to a steady state, i.e. a converged state.
Step 4.6: Repeat Steps 4.2 to 4.5 until the learning period (the number of repetitions of Steps 4.2 to 4.5 preset by the algorithm) ends, then stop iterating.
Step 4.7: Output the final policy and obtain the optimized production line performance indices. Figs. 7 and 8 show the production line performance indices before and after performance control at the variability levels CV1 and CV2, respectively.
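The loop described in Steps 4.2 to 4.7 can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the update rule is the standard tabular Q-learning update, since the patent's own update formula and return function are not reproduced in this text; `env_step` is a hypothetical stand-in for the production line performance prediction model; ε is held constant instead of following the improved schedule; and a fixed number of steps replaces the convergence test of Step 4.5. Only α = 0.1, γ = 0.9 and the six actions come from the text.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate factor from Step 4.2
GAMMA = 0.9    # discount factor from Step 4.2
N_ACTIONS = 6  # actions a1..a6, indexed 0..5 here


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q, s, a, r, s_next):
    """Standard Q-learning update (assumed form, not taken from the patent)."""
    target = r + GAMMA * max(q[s_next])
    q[s][a] += ALPHA * (target - q[s][a])


def train(env_step, start_state, episodes, steps, epsilon=0.1, seed=0):
    """Run `episodes` learning periods of `steps` transitions each.

    env_step(s, a) -> (r, s_next) is a hypothetical environment callback.
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_ACTIONS)  # Step 4.2: Q initialised to 0
    for _ in range(episodes):                   # Step 4.6: learning periods
        s = start_state                         # Step 4.3: starting state
        for _ in range(steps):                  # Steps 4.4 and 4.5
            a = epsilon_greedy(q, s, epsilon, rng)
            r, s_next = env_step(s, a)
            q_update(q, s, a, r, s_next)
            s = s_next
    return q                                    # Step 4.7: greedy policy = argmax over q[s]
```

The final policy of Step 4.7 is read off the returned table by taking, for each state, the action with the largest Q value.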
In summary, the invention establishes a more accurate performance prediction model for the semiconductor packaging test serial-parallel production line. It combines the Morris screening method with Arena simulation to carry out a global quantitative sensitivity analysis, identifying the factors with the greatest influence on production line performance together with their influence rules, and avoiding the situation in which the Markov state space of the equipment is so large that traditional mathematical model analysis is not applicable. In addition, the way the parameter ε is set is improved, so that the algorithm converges faster, avoids local optima, and achieves better flexibility and real-time performance.
Claims (1)
1. The chip packaging test production line performance control method based on Q-learning reinforcement learning comprises the following steps:
step 1: constructing an abstract model of a semiconductor chip packaging test serial-parallel production line;
step 2: based on the production line abstract model constructed in the step 1, a prediction model of the performance of the semiconductor chip packaging test serial-parallel production line is established;
step 3: based on the production line abstract model constructed in the step 1, obtaining the influence mechanism of key variability factors on the performance of the production line through qualitative analysis by the Morris screening method and quantitative analysis by Arena simulation;
step 4: based on the prediction model established in the step 2 and the key variability analysis obtained in the step 3, establishing a performance control model based on a Q-learning reinforcement learning algorithm, and carrying out iterative solution by taking the optimal benefit index of the production line as a performance control target to obtain a global optimal performance control strategy;
the step 1 specifically comprises the following steps: taking the back-end process of a semiconductor production line, namely the chip packaging test production line, as the research object, assuming that a finite buffer exists between stations and that the queuing rule is first-come, first-served, and abstracting the line into a multi-station serial-parallel queuing production line model containing reentrant flows;
the step 2 specifically comprises the following steps:
step 2.1: variability calculation: calculating the arrival variability c_a and the processing time variability c_e;
Step 2.2: determining the basic performance prediction indices;
from the average queuing time CT_q of workpieces and the effective processing time t_e, obtaining the average time CT spent at a workstation, namely the production cycle; further calculating the average work-in-process level WIP at the workstation, and taking the work-in-process level WIP, the production rate TH and the production cycle CT as the basic production line performance prediction indices;
CT = CT_q + t_e
WIP = CT × TH
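The two relations in Step 2.2 can be checked with a small worked example. The sketch below is only illustrative: the function names and the numeric values (12 and 8 minutes, 0.5 workpieces per minute) are invented here and do not come from the patent.

```python
def cycle_time(ct_q: float, t_e: float) -> float:
    """Production cycle at a workstation: CT = CT_q + t_e."""
    return ct_q + t_e


def wip_level(ct: float, th: float) -> float:
    """Average work-in-process level: WIP = CT x TH."""
    return ct * th


# Illustrative numbers only: 12 min queuing, 8 min effective processing,
# and a production rate of 0.5 workpieces per minute.
ct = cycle_time(12.0, 8.0)   # 20 minutes at the workstation
wip = wip_level(ct, 0.5)     # 10 workpieces in process on average
```

The units must agree: if CT is in minutes, TH must be in workpieces per minute for WIP to come out as a count of workpieces.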
step 2.3: establishing a production line performance prediction model;
step 2.3.1: calculating queuing time of the product j at the workstation i:
wherein c_a^ij and c_e^ij are the arrival variability and the processing time variability of product j at station i, u_ij is the utilization of station i, m_ij is the number of parallel devices at station i, and t_e^ij is the effective processing time of product j at station i;
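The queuing-time formula itself is not reproduced in this text. As a hedged illustration of how the listed symbols typically combine, the sketch below uses the well-known variability-utilization-time approximation for a station with m parallel machines from queueing practice; treating it as the formula of Step 2.3.1 is an assumption, and the function name `queue_time` is hypothetical.

```python
import math


def queue_time(c_a: float, c_e: float, u: float, m: int, t_e: float) -> float:
    """Approximate queuing time at a station with m parallel machines.

    Assumed form (standard G/G/m approximation, not taken from the patent):
        CT_q = ((c_a^2 + c_e^2) / 2) * (u^(sqrt(2(m+1)) - 1) / (m (1 - u))) * t_e
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e
```

With c_a = c_e = 1 and m = 1 the expression reduces to the M/M/1 result u/(1-u) × t_e, which is a quick sanity check on the form. Queuing time grows sharply as u approaches 1, which is why the bottleneck station dominates the production cycle.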
step 2.3.2: calculating the production rate TH of the workpiece;
station i has m_ij parallel devices, b is the capacity of the buffer before station i, and k is the number of workpieces being processed at station i, with b > m > 1; if 0 ≤ k ≤ b, the probability p_0 that workpiece j is processed at station i without waiting is as follows, where 0 < j < r and r denotes the number of products processed together on the production line:
let q_hj be the defective rate of workpiece j at station h and Q_ij the defective rate monitored at station i, with 0 < i ≤ s, where s denotes the number of stations in the serial-parallel production line; the probability Q_ij that a defective workpiece j is detected and removed at station i is:
the production rate TH_ij of workpiece j at station i is:
when the utilization of a station is the maximum, that station I is the bottleneck station of product J, and its production rate is r_b^IJ = max(u_ij);
Step 2.3.3: calculating the production cycle CT_j and the work-in-process level WIP_j of the production line;
calculating the average wait-to-batch time WTBT of the workpieces:
wherein r_a denotes the rate at which workpieces arrive at the workstation and k_ij denotes the processing batch size of product j at station i; in this case the formula for CT_q^ij is rewritten as follows:
calculating the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
thereby obtaining the production cycle CT_j and the work-in-process level WIP_j of product j over the whole serial-parallel production line:
Step 2.4: evaluating the performance of the production line performance prediction model;
step 2.4.1: calculating a performance index F of the production line;
the WIP-CT and WIP-TH curves of the production line under the best case, the worst case and the practical worst case are used as benchmarks to delimit good and bad regions in the performance quadrant, forming a performance evaluation chart of the production line;
taking the ratio of the distance of the actual performance point to the distance between the best-case and practical-worst-case benchmarks as the performance evaluation index, denoted F:
wherein w denotes the given actual work-in-process level, T denotes the actual production cycle, and T_0 denotes the theoretical processing time of the production line, where T_0 = CT; r_b denotes the bottleneck rate of the production line, where r_b = TH_ij if and only if u_ij = u_max;
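The three benchmark curves named in Step 2.4.1 can be sketched numerically. The formulas below are the standard best-case, worst-case and practical-worst-case laws from production-line analysis, expressed with the symbols w, T_0 and r_b defined above; that the patent uses exactly these forms is an assumption, and the function names are hypothetical. The index F itself is not computed here because its formula is not reproduced in this text.

```python
def best_case(w: float, t0: float, r_b: float) -> tuple[float, float]:
    """Return (CT, TH) for the best case at work-in-process level w."""
    w0 = r_b * t0  # critical WIP level
    if w <= w0:
        return t0, w / t0
    return w / r_b, r_b


def worst_case(w: float, t0: float, r_b: float) -> tuple[float, float]:
    """Return (CT, TH) for the worst case; r_b is unused but kept for symmetry."""
    return w * t0, 1.0 / t0


def practical_worst_case(w: float, t0: float, r_b: float) -> tuple[float, float]:
    """Return (CT, TH) for the practical worst case."""
    w0 = r_b * t0
    return t0 + (w - 1.0) / r_b, w / (w0 + w - 1.0) * r_b
```

Each pair satisfies WIP = CT × TH, so plotting CT and TH against w for the three cases gives the benchmark curves against which an actual operating point (w, T) can be located.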
Step 2.4.2: calculating a benefit index Bf of the production line;
taking the production cost into account, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
wherein C is the cost factor, c_1 is the unit equipment cost, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel devices and the current buffer capacity, respectively, and m_0 and b_0 are the initial number of parallel devices and the initial buffer capacity, respectively;
the step 3 specifically comprises the following steps:
step 3.1: qualitative sensitivity analysis by the Morris screening method;
selecting at random a parameter x in the production line performance prediction model, presetting a fixed step length C and a maximum amplitude M, perturbing the parameter x with the step length C, and taking the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
wherein Y_0 is the performance evaluation index F corresponding to the initial value of the parameter x; Y_g and Y_(g+1) are the performance evaluation index F after the g-th and (g+1)-th perturbation of the parameter x; P_g and P_(g+1) are the rates of change of the parameter value relative to its initial value after the g-th and (g+1)-th perturbation; and n is the number of runs;
according to the sensitivity grading standard, parameters graded as sensitive or highly sensitive are determined to be the factors with a larger influence on the performance of the semiconductor packaging test production line; the grading standard, based on the absolute value of the sensitivity coefficient, is: insensitive, 0.00 ≤ |S| < 0.05; moderately sensitive, 0.05 ≤ |S| < 0.20; sensitive, 0.20 ≤ |S| < 1.00; highly sensitive, |S| ≥ 1.00;
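The grading in Step 3.1 can be sketched in code. The thresholds 0.05, 0.20 and 1.00 come from the text; everything else is an assumption made for illustration: the function names, the English grade labels, and in particular the form of the sensitivity coefficient, which is taken here as the mean of the relative change in F divided by the change in the parameter's relative perturbation over consecutive runs, since the patent's own formula is not reproduced in this text.

```python
def sensitivity_coefficient(y: list[float], p: list[float]) -> float:
    """Average rate of change of F over consecutive perturbations (assumed form).

    y[g] is the index F after the g-th perturbation, with y[0] the initial value.
    p[g] is the relative change of the parameter versus its initial value,
    with p[0] = 0 and all p[g] distinct between consecutive runs.
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("need at least two matching (y, p) samples")
    y0 = y[0]
    terms = [
        ((y[g + 1] - y[g]) / y0) / (p[g + 1] - p[g])
        for g in range(len(y) - 1)
    ]
    return sum(terms) / len(terms)


def sensitivity_grade(s: float) -> str:
    """Grade a sensitivity coefficient by its absolute value."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "sensitive"
    return "highly sensitive"
```

Under this sketch, a parameter whose 10% increase raises F by 10% has a coefficient of about 1 and is graded highly sensitive, so it would be kept as a key factor for Step 3.2.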
step 3.2: arena simulation sensitivity quantitative analysis;
establishing a semiconductor chip packaging test serial-parallel production line model in Arena software, wherein each device has an independent random processing time, failure time and maintenance time;
the workpiece arrival rate, the workstation equipment processing rate, the mean time before failure m_f and the mean repair time m_p on the production line obey negative exponential and normal distributions, respectively, wherein the processing batch size k, the buffer capacity b and the number of parallel devices m are fixed positive integers with b > m > 1, and the warm-up time, the total run time and the number of repetitions of the simulation experiment are set;
the variation curves of the overall production line performance, the production cycle CT, the production rate TH and the work-in-process level WIP with respect to the key factors affecting production line performance are obtained through experiments;
the step 4 specifically comprises the following steps:
step 4.1: taking the production line performance prediction model as the external environment for reinforcement learning, taking a change in production line variability as the trigger condition, and establishing a reinforcement-learning-based performance control model for the semiconductor chip packaging test production line based on a dynamic control method combining an event-triggered strategy and a periodic-triggered strategy;
step 4.2: initializing the values Q(s, a) for all s ∈ S and a ∈ A(s), wherein the Q value reflects the long-term reward, S is the system state set, and A(s) is the action strategy set of key factors obtained in the step 4.2; setting the learning rate factor α and the discount factor γ, and determining the return function r;
step 4.3: given a starting state s, selecting an action a in state s according to the ε-greedy strategy; the improved ε value is set as a function: wherein p is the number of steps the algorithm has currently executed, and M is the total number of iteration steps of the algorithm;
step 4.4: executing the action a selected in state s according to the ε-greedy strategy, where b is the selection index of a, to obtain the return r and the next state s_next, a_next denoting the next action; the Q value is updated as:
s = s_next, a = a_next
step 4.5: returning to step 4.4 until the system tends to a steady state, i.e. a converged state;
step 4.6: repeating the steps 4.2 to 4.5 until the learning period, namely the number of repetitions of the steps 4.2 to 4.5 preset by the algorithm, ends, and then stopping the iteration;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010797879.2A CN111857081B (en) | 2020-08-10 | 2020-08-10 | Chip packaging test production linear energy control method based on Q-learning reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111857081A CN111857081A (en) | 2020-10-30 |
CN111857081B true CN111857081B (en) | 2023-05-05 |
Family
ID=72971238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010797879.2A Active CN111857081B (en) | 2020-08-10 | 2020-08-10 | Chip packaging test production linear energy control method based on Q-learning reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111857081B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112631216B (en) * | 2020-12-11 | 2023-07-21 | 江苏晶度半导体科技有限公司 | Semiconductor test packaging production linear energy prediction control system based on DQN and DNN twin neural network algorithm |
CN113033815A (en) * | 2021-02-07 | 2021-06-25 | 广州杰赛科技股份有限公司 | Intelligent valve cooperation control method, device, equipment and storage medium |
CN113962470B (en) * | 2021-10-29 | 2022-06-24 | 上海新科乾物联技术有限公司 | Optimized scheduling method and system based on disturbance prediction |
CN115933412B (en) * | 2023-01-12 | 2023-07-14 | 中国航发湖南动力机械研究所 | Aeroengine control method and device based on event-triggered predictive control |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103676881A (en) * | 2013-12-16 | 2014-03-26 | 北京化工大学 | Dynamic bottleneck analytical method of semiconductor production line |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4334843B2 (en) * | 2002-10-07 | 2009-09-30 | パナソニック株式会社 | Production plan creation method |
CN108646684B (en) * | 2018-05-30 | 2020-10-23 | 电子科技大学 | Multi-product production line production period prediction method based on variability measurement |
CN109270904A (en) * | 2018-10-22 | 2019-01-25 | 中车青岛四方机车车辆股份有限公司 | A kind of flexible job shop batch dynamic dispatching optimization method |
CN110378439B (en) * | 2019-08-09 | 2021-03-30 | 重庆理工大学 | Single robot path planning method based on Q-Learning algorithm |
CN110517002B (en) * | 2019-08-29 | 2022-11-15 | 烟台大学 | Production control method based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||