CN111857081B - Chip packaging and test production line performance control method based on Q-learning reinforcement learning - Google Patents

Publication number
CN111857081B
Authority
CN
China
Prior art keywords
production line
production
performance
station
work
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010797879.2A
Other languages
Chinese (zh)
Other versions
CN111857081A (en)
Inventor
李波
冯益铭
钱鑫森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010797879.2A
Publication of CN111857081A
Application granted
Publication of CN111857081B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/41885 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32339 Object oriented modeling, design, analysis, implementation, simulation language
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • General Factory Administration (AREA)

Abstract

The invention relates to the control and optimization of semiconductor chip packaging and test production line performance, and in particular to a method for controlling the performance of a chip packaging and test production line based on Q-learning reinforcement learning. The invention establishes a more accurate performance prediction model for the serial-parallel semiconductor packaging and test production line, and combines the Morris screening method with Arena simulation to carry out a global quantitative sensitivity analysis. This yields the factors that most strongly influence production line performance, together with their influence patterns, and avoids the situation in which the Markov state space of the equipment becomes too large for traditional analytical models to be applicable. On the basis of the performance prediction and the sensitivity analysis, the invention controls the variability factors of the production line and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima, while also offering better flexibility and real-time responsiveness.

Description

Chip packaging and test production line performance control method based on Q-learning reinforcement learning
Technical Field
The invention relates to the control and optimization of semiconductor chip packaging and test production line performance, and in particular to a performance control method for a semiconductor chip packaging and test production line that combines sensitivity analysis with a Q-learning reinforcement learning algorithm.
Background
The semiconductor manufacturing industry is of great strategic value to the national economy. To sustain the healthy development of semiconductor manufacturing in China, it is necessary not only to expand production scale but also to focus on the production efficiency of the manufacturing system and to strengthen production management and control technology. A semiconductor manufacturing system is characterized by highly re-entrant process routes, a highly complex production process, long manufacturing cycles, very large system scale and high uncertainty, so controlling the performance of the production line is difficult. The performance of the manufacturing system is strongly affected by variability factors such as buffer capacity, sudden equipment failures, preventive maintenance and product rework, which reduce production efficiency, lengthen the production cycle and disrupt the normal execution of the production plan.
Existing research on intelligent, comprehensive and dynamic control of production line performance is limited. Most of it addresses only one aspect of production line variability and cannot consider the various variability factors on the line globally. The performance prediction models established so far for serial-parallel semiconductor production lines deviate from actual production conditions and lack accuracy. Traditional performance control and optimization methods have difficulty responding in real time to changes in the variability factors of the production line, and are not flexible enough.
Disclosure of Invention
To address the shortcomings of existing performance control models and strategies for semiconductor chip packaging and test production lines, the invention provides a chip packaging and test production line performance control method based on Q-learning reinforcement learning. To address problems such as slow response to variability factors, incomplete consideration of variability factors and conflicting control strategies, the method combines sensitivity analysis with a Q-learning reinforcement learning algorithm to intelligently control the manufacturing performance of the semiconductor chip packaging and test production line.
A chip packaging and test production line performance control method based on Q-learning reinforcement learning comprises the following steps:
Step 1: construct an abstract model of the serial-parallel semiconductor chip packaging and test production line;
Step 2: based on the abstract production line model constructed in step 1, establish a performance prediction model for the serial-parallel semiconductor chip packaging and test production line;
Step 3: based on the abstract production line model constructed in step 1, obtain the mechanism by which the key variability factors influence production line performance, using qualitative analysis by the Morris screening method and quantitative analysis by Arena simulation;
Step 4: based on the performance prediction model established in step 2 and the key variability analysis obtained in step 3, establish a performance control model based on the Q-learning reinforcement learning algorithm, and solve it iteratively with the benefit index of the production line as the performance control objective to obtain a globally optimal performance control strategy.
Step 1 specifically comprises the following:
Abstraction of the semiconductor chip packaging and test production line model: the back-end process of the semiconductor production line, that is, the chip packaging and test production line, is taken as the object of study. Assuming that finite buffers exist between stations and that the queuing rule is first come, first served, the line is abstracted into a multi-station serial-parallel queuing production line model with re-entry (rework).
Step 2 specifically comprises the following steps:
Step 2.1: variability calculation: calculate the arrival variability c_a and the processing time variability c_e.
Step 2.2: determine the basic performance prediction indices.
The average time CT that a workpiece spends at a station (the production cycle) is obtained from the average queuing time CT_q of workpieces at the station and the effective processing time t_e; the average work-in-process level WIP at the station is then calculated. The production rate TH, the production cycle CT and the work-in-process level WIP are used as the basic indices for predicting production line performance.
CT = CT_q + t_e
WIP = CT × TH
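As a quick numeric illustration of the two relations above, the following minimal sketch computes CT and WIP for one station. The function names and the sample values are hypothetical and chosen only for the example.

```python
def cycle_time(ct_q: float, t_e: float) -> float:
    """Station cycle time: queuing time plus effective processing time."""
    return ct_q + t_e


def wip_level(ct: float, th: float) -> float:
    """Work-in-process level from the relation WIP = CT * TH."""
    return ct * th


# Hypothetical station: 12 min in queue, 3 min processing, 0.2 pieces/min.
ct = cycle_time(ct_q=12.0, t_e=3.0)
wip = wip_level(ct, th=0.2)
print(f"CT = {ct} min, WIP = {wip} pieces")
```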
Step 2.3: establish the production line performance prediction model.
Step 2.3.1: calculate the queuing time of product j at station i:
Figure BDA0002626324590000021
where c_a^{ij} and c_e^{ij} are the arrival variability and the processing time variability of product j at station i, u_{ij} is the utilization of station i, m_{ij} is the number of parallel machines at station i, and t_e^{ij} is the effective processing time of product j at station i.
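The equation image for this step is not reproduced in the text. Purely as an illustration, the sketch below uses the widely cited G/G/m queue-time approximation from queueing-based factory models, which takes the same inputs named above (c_a, c_e, u, m, t_e). Whether the patent's formula has exactly this form is an assumption, and the function name and sample numbers are hypothetical.

```python
import math


def queue_time_ggm(c_a: float, c_e: float, u: float, m: int, t_e: float) -> float:
    """Approximate queuing time at a station with m parallel machines.

    ASSUMPTION: standard G/G/m approximation, not confirmed as the patent's formula.
    c_a, c_e are coefficients of variation; u is utilization in [0, 1).
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("utilization must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("need at least one machine")
    variability = (c_a ** 2 + c_e ** 2) / 2.0
    utilization = u ** (math.sqrt(2.0 * (m + 1)) - 1.0) / (m * (1.0 - u))
    return variability * utilization * t_e


# Hypothetical station: moderate variability, 85% utilization, 3 machines.
print(queue_time_ggm(c_a=1.0, c_e=0.8, u=0.85, m=3, t_e=4.0))
```

With m = 1 the utilization term reduces to u / (1 - u), the familiar single-server form, which is a convenient sanity check on the implementation.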
Step 2.3.2: calculate the production rate TH of the workpiece.
Suppose station i has m_{ij} parallel machines (b > m > 1), b is the capacity of the buffer in front of station i, and k is the number of workpieces being processed at station i. If 0 ≤ k ≤ b, the probability p_0 that no workpiece j is waiting to be processed in front of station i (0 < j < r, where r is the number of products processed together on the production line) is:
Figure BDA0002626324590000022
The blocking probability of workpiece j when the buffer capacity is b,
Figure BDA0002626324590000023
is:
Figure BDA0002626324590000024
Let q_{hj} be the defect rate of workpiece j at station h, and let Q_{ij} be the defect rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the serial-parallel production line. The probability Q_{ij} that a defective workpiece j is detected and removed at station i is:
Figure BDA0002626324590000031
Figure BDA0002626324590000039
denotes the set of the numbers of all defect-inspection stations in the production line.
The production rate TH_{ij} of workpiece j at station i is:
Figure BDA0002626324590000032
The station with the highest utilization is the bottleneck station; if station I is the bottleneck station of product J, its production rate is denoted r_b^{IJ} = max(u_{ij}).
Step 2.3.3: calculate the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of the production line.
Calculate the average wait-to-batch time WTBT of the workpiece:
Figure BDA0002626324590000033
where r_a is the rate at which workpieces arrive at the station and k_{ij} is the processing batch size of product j at station i. In this case
Figure BDA0002626324590000034
and then
Figure BDA0002626324590000035
The formula for CT_q^{ij} is rewritten as:
Figure BDA0002626324590000036
Calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
Figure BDA0002626324590000037
Figure BDA0002626324590000038
This gives the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of product j over the whole serial-parallel production line:
Figure BDA0002626324590000041
Figure BDA0002626324590000042
Step 2.4: performance evaluation with the production line performance prediction model.
Step 2.4.1: calculate the performance index F of the production line.
As shown in FIG. 3, the WIP-CT and WIP-TH curves for the best case, the worst case and the practical worst case of the production line are used as benchmarks to delimit a "good zone" and a "bad zone" in the performance quadrant; together these form the performance evaluation chart of the production line.
The performance evaluation index, denoted F, is taken as the ratio between the distance of the actual performance point and the distance between the best-case and practical-worst-case benchmarks:
Figure BDA0002626324590000043
where w is the given actual work-in-process level, T is the actual production cycle, T_0 is the theoretical processing time of the production line, with T_0 = CT, and r_b is the bottleneck rate of the production line, with r_b = TH_{ij} if and only if u_{ij} = u_max.
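The formula for F is an image that is not reproduced here, so the sketch below does not compute F. It only illustrates the three benchmark curves this step refers to, using the standard best-case, worst-case and practical-worst-case laws from the Factory Physics literature, written in the symbols w, T_0 and r_b defined above. Treating the patent's benchmarks as exactly these laws is an assumption, and the function names are hypothetical.

```python
def best_case(w: float, t0: float, rb: float) -> tuple[float, float]:
    """Return (CT, TH) for the best case at WIP level w."""
    w0 = rb * t0  # critical WIP level
    if w <= w0:
        return t0, w / t0
    return w / rb, rb


def worst_case(w: float, t0: float, rb: float) -> tuple[float, float]:
    """Return (CT, TH) for the worst case at WIP level w (rb is unused)."""
    return w * t0, 1.0 / t0


def practical_worst_case(w: float, t0: float, rb: float) -> tuple[float, float]:
    """Return (CT, TH) for the practical worst case at WIP level w."""
    w0 = rb * t0
    ct = t0 + (w - 1.0) / rb
    th = w / (w0 + w - 1.0) * rb
    return ct, th


# Hypothetical line: T0 = 8 min raw processing time, rb = 0.5 pieces/min.
for w in (1, 4, 8):
    print(w, best_case(w, 8.0, 0.5), practical_worst_case(w, 8.0, 0.5), worst_case(w, 8.0, 0.5))
```

An actual operating point (w, T) that lies between the best-case and practical-worst-case curves falls in the "good zone" of the evaluation chart; one that lies beyond the practical worst case falls in the "bad zone".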
Step 2.4.2: calculate the benefit index Bf of the production line.
Taking production cost into account, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
Figure BDA0002626324590000044
where C is the cost factor, c_1 is the unit equipment cost, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.
Step 3 specifically comprises the following steps:
Step 3.1: qualitative sensitivity analysis by the Morris screening method.
Select a parameter x in the production line performance prediction model, preset a fixed step size C and a maximum amplitude M, perturb the parameter x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
Figure BDA0002626324590000045
where Y_0 is the value of the performance evaluation index F at the initial value of the parameter x; Y_g and Y_{g+1} are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_{g+1} are the rates of change of the parameter value, relative to its initial value, after the g-th and (g+1)-th perturbations; and n is the number of runs.
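The formula image for S is not reproduced in the text. The following is a minimal sketch, assuming the modified Morris screening form commonly used for one-at-a-time perturbation, in which S averages the relative change in F divided by the change in the parameter's relative perturbation between consecutive runs. Here P is expressed as a fraction of the initial value rather than a percentage, and the average is taken over the consecutive differences; both conventions, and the function name, are assumptions.

```python
def morris_sensitivity(y: list[float], p: list[float]) -> float:
    """Sensitivity coefficient S from a sequence of perturbation runs.

    y[0] is Y_0 (index F at the initial parameter value); y[g] is F after
    the g-th perturbation. p[g] is the fractional change of the parameter
    relative to its initial value at run g (so p[0] == 0.0).
    ASSUMPTION: modified Morris form; not confirmed as the patent's formula.
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("need matching sequences with at least two runs")
    y0 = y[0]
    n_diffs = len(y) - 1
    total = 0.0
    for g in range(n_diffs):
        relative_output_change = (y[g + 1] - y[g]) / y0
        parameter_change = p[g + 1] - p[g]
        total += relative_output_change / parameter_change
    return total / n_diffs


# Hypothetical runs: parameter raised in 10% steps, F rising proportionally.
print(morris_sensitivity([1.0, 1.1, 1.2], [0.0, 0.1, 0.2]))
```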
According to the sensitivity grading criteria in Table 1, parameters graded as more sensitive or highly sensitive are identified as the factors that have a greater influence on the performance of the semiconductor packaging and test production line.
Table 1. Sensitivity grading criteria
Absolute value of sensitivity coefficient | Sensitivity grade
0.00 ≤ |S| < 0.05 | Insensitive
0.05 ≤ |S| < 0.20 | Moderately sensitive
0.20 ≤ |S| < 1.00 | More sensitive
|S| ≥ 1.00 | Highly sensitive
Step 3.2: quantitative sensitivity analysis by Arena simulation.
A model of the serial-parallel semiconductor chip packaging and test production line is built in the Arena software. Each machine has its own independent random processing time, failure time and repair time.
The workpiece arrival rate, the station processing rate, the mean time to failure m_f and the mean time to repair m_p on the production line follow negative exponential and normal distributions, respectively. The processing batch size k, the buffer capacity b and the number of parallel machines m are fixed positive integers with b > m > 1. The warm-up time, the total run time and the number of replications of the simulation experiment are then set.
The experiments yield curves showing how overall line performance, the production cycle CT, the production rate TH and the work-in-process level WIP vary with the key factors that affect production line performance.
Step 4 specifically comprises the following steps:
Step 4.1: with the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition, a reinforcement-learning-based performance control model for the semiconductor chip packaging and test production line, shown in FIG. 5, is established using a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy.
Step 4.2: initialize the values of Q(s, a),
Figure BDA0002626324590000051
a ∈ A(s), where the Q value reflects the long-term return, S is the set of system states, and A(s) is the set of action strategies for the key factors obtained in step 3. Given the learning rate α and the discount factor γ, determine the reward function r.
Step 4.3: given an initial state s, select action a in state s according to the ε-greedy strategy. The improved value of ε is set by the function:
Figure BDA0002626324590000052
where p is the number of steps the algorithm has executed so far and M is the total number of iteration steps of the algorithm, so that ε decreases gradually from its initial value of 0.2 as the number of executed steps increases.
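The function that defines ε is an image not reproduced in the text; the text only states that ε starts at 0.2 and decreases as the step count p approaches the total M. The sketch below assumes a linear decay purely for illustration (the actual functional form is unknown) and pairs it with a plain ε-greedy selection over one row of a Q table. The function names are hypothetical.

```python
import random


def epsilon(p: int, total_steps: int, eps0: float = 0.2) -> float:
    """Exploration rate after p of total_steps steps.

    ASSUMPTION: linear decay from eps0 to 0; the patent's schedule is not recoverable.
    """
    fraction = min(max(p / total_steps, 0.0), 1.0)
    return eps0 * (1.0 - fraction)


def epsilon_greedy(q_row: list[float], eps: float, rng=random) -> int:
    """Pick a random action index with probability eps, else the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


rng = random.Random(0)
for step in (0, 500, 1000):
    eps = epsilon(step, 1000)
    print(step, round(eps, 3), epsilon_greedy([0.1, 0.9, 0.3], eps, rng))
```

A decaying ε explores more in early steps and exploits the learned Q values later, which matches the text's stated aim of faster convergence while avoiding local optima.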
Step 4.4: in state s, select action a according to the ε-greedy strategy, where b is the selection index of a; obtain the reward r and the next state s_next, with a_next denoting the next action, and update the Q value:
Figure BDA0002626324590000061
s = s_next, a = a_next
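The update formula itself is an image not reproduced in the text. A minimal sketch of the standard tabular Q-learning update, which is assumed here to be what the step refers to, is shown below. The default α = 0.1 and γ = 0.9 are the values given in the embodiment; the function name and the table layout (a list of rows indexed by state, then action) are hypothetical.

```python
def q_update(q: list[list[float]], s: int, a: int, r: float, s_next: int,
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning update in place and return the new Q(s, a).

    ASSUMPTION: standard off-policy form
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s_next,a') - Q(s,a)).
    """
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


# Hypothetical two-state, two-action table.
q_table = [[0.0, 0.0], [1.0, 2.0]]
print(q_update(q_table, s=0, a=1, r=1.0, s_next=1))
```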
step 4.5: the process goes to step 4.4 until the system goes towards a steady state, i.e. a converging state.
Step 4.6: and repeatedly executing the steps 4.2 to 4.5 until the learning period (the number of times that the steps 4.2 to 4.5 are repeatedly executed, which are preset by the algorithm) is ended, and stopping iteration.
Step 4.7: outputting the final policy
Figure BDA0002626324590000062
And obtaining the index optimization condition of the production line performance.
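The expression for the final policy is an image not reproduced in the text. A common way to read a policy off a learned Q table is to take the highest-valued action in each state; the sketch below assumes that reading. The state and action labels and the Q values are hypothetical.

```python
def greedy_policy(q: dict[str, dict[str, float]]) -> dict[str, str]:
    """Map each state to the action with the largest Q value.

    ASSUMPTION: the final policy is the greedy policy over the learned Q table.
    """
    return {state: max(actions, key=actions.get) for state, actions in q.items()}


# Hypothetical learned values for two states and two actions.
learned_q = {
    "s1": {"a1": 0.2, "a2": 0.5},
    "s2": {"a1": 0.7, "a2": 0.1},
}
print(greedy_policy(learned_q))
```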
The invention establishes a more accurate performance prediction model for the serial-parallel semiconductor packaging and test production line, and combines the Morris screening method with Arena simulation to carry out a global quantitative sensitivity analysis. This yields the factors that most strongly influence production line performance, together with their influence patterns, and avoids the situation in which the Markov state space of the equipment becomes too large for traditional analytical models to be applicable. The invention provides a production line performance control model based on the Q-learning algorithm, which controls the variability factors of the production line on the basis of performance prediction and sensitivity analysis and improves the way the parameter ε is chosen, so that the algorithm converges faster and avoids local optima; the performance control method also offers better flexibility and real-time responsiveness.
Drawings
FIG. 1 is a flow chart of the invention;
FIG. 2 is the abstract model of the semiconductor chip packaging and test production line;
FIG. 3 is a diagram of the performance evaluation method based on the three Factory Physics benchmarks;
FIG. 4 is a schematic diagram of the logical structure of the production line simulation model;
FIG. 5 is the reinforcement-learning-based production line performance control model of the embodiment;
FIG. 6 is a graph of production line performance versus the variabilities c_a and c_e;
FIG. 7 shows the change in the production line performance indices before and after performance control at variability level CV1;
FIG. 8 shows the change in the production line performance indices before and after performance control at variability level CV2.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and an embodiment. The embodiment gives a detailed implementation and a specific operating procedure (FIG. 1) based on the technical scheme of the invention, but the scope of protection of the invention is not limited to the following embodiment.
The embodiment can be divided into the following main steps:
Step 1: abstraction of the semiconductor chip packaging and test production line model: the chip packaging and test production line is taken as the object of study. Assuming that finite buffers exist between stations and that the queuing rule is first come, first served, the line is abstracted into a multi-station serial-parallel queuing production line model with re-entry (rework) (FIG. 2).
Step 2:
Step 2.1: variability calculation.
Calculate the arrival variability c_a and the processing time variability c_e.
Step 2.2: determine the basic performance prediction indices.
The average time CT that a workpiece spends at a station (the production cycle) is obtained from the average queuing time CT_q of workpieces at the station and the effective processing time t_e; the average work-in-process level WIP at the station is then calculated. The production rate TH, the production cycle CT and the work-in-process level WIP are used as the basic indices for predicting production line performance.
CT = CT_q + t_e
WIP = CT × TH
Step 2.3: establish the production line performance prediction model.
Step 2.3.1: calculate the queuing time of product j at station i:
Figure BDA0002626324590000071
where c_a^{ij} and c_e^{ij} are the arrival variability and the processing time variability of product j at station i, u_{ij} is the utilization of station i, m_{ij} is the number of parallel machines at station i, and t_e^{ij} is the effective processing time of product j at station i.
Step 2.3.2: calculate the production rate TH of the workpiece.
Suppose station i has m_{ij} parallel machines (b > m > 1), b is the capacity of the buffer in front of station i, and k is the number of workpieces being processed at station i. If 0 ≤ k ≤ b, the probability p_0 that no workpiece j is waiting to be processed in front of station i (0 < j < r, where r is the number of products processed together on the production line) is:
Figure BDA0002626324590000072
The loss rate of workpiece j at station i,
Figure BDA0002626324590000073
is:
Figure BDA0002626324590000074
Let q_{hj} be the defect rate of workpiece j at station h, and let Q_{ij} be the defect rate detected at station i, with 0 < h < i ≤ s, where s is the number of stations in the serial-parallel production line. The probability Q_{ij} that a defective workpiece j is detected and removed at station i is:
Figure BDA0002626324590000081
Figure BDA0002626324590000082
denotes the set of the numbers of all defect-inspection stations in the production line.
The production rate TH_{ij} of workpiece j at station i is:
Figure BDA0002626324590000083
The production rate of the bottleneck station I of product J is denoted r_b^{IJ} = max(u_{ij}).
Step 2.3.3: calculate the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of the production line.
Calculate the average wait-to-batch time WTBT of the workpiece:
Figure BDA0002626324590000084
where r_a is the rate at which workpieces arrive at the station and k_{ij} is the processing batch size of product j at station i. In this case
Figure BDA0002626324590000085
and then
Figure BDA0002626324590000086
The formula for CT_q^{ij} is rewritten as:
Figure BDA0002626324590000087
Calculate the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
Figure BDA0002626324590000088
Figure BDA0002626324590000089
This gives the production cycle (logical production cycle) CT_j and the work-in-process level WIP_j of product j over the whole serial-parallel production line:
Figure BDA00026263245900000810
Figure BDA00026263245900000811
Step 2.4: performance evaluation with the production line performance prediction model.
Step 2.4.1: calculate the performance index F of the production line.
As shown in FIG. 3, the WIP-CT and WIP-TH curves for the best case, the worst case and the practical worst case of the production line are used as benchmarks to delimit a "good zone" and a "bad zone" in the performance quadrant; together these form the performance evaluation chart of the production line.
The performance evaluation index, denoted F, is taken as the ratio between the distance of the actual performance point and the distance between the best-case and practical-worst-case benchmarks:
Figure BDA0002626324590000091
where w is the given actual work-in-process level, T is the actual production cycle, T_0 is the theoretical processing time of the production line, with T_0 = CT, and r_b is the bottleneck rate of the production line, with r_b = TH_{ij} if and only if u_{ij} = u_max.
Step 2.4.2: calculate the benefit index Bf of the production line.
Taking production cost into account, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
Figure BDA0002626324590000092
where C is the cost factor, c_1 is the unit equipment cost, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity.
Step 3:
Step 3.1: qualitative sensitivity analysis by the Morris screening method.
Select a parameter x in the production line performance prediction model, preset a fixed step size C and a maximum amplitude M, perturb the parameter x in steps of C, and take the average rate of change of the performance evaluation index F as the sensitivity coefficient S:
Figure BDA0002626324590000093
where Y_0 is the value of the performance evaluation index F at the initial value of the parameter x; Y_g and Y_{g+1} are the values of F after the g-th and (g+1)-th perturbations of x; P_g and P_{g+1} are the rates of change of the parameter value, relative to its initial value, after the g-th and (g+1)-th perturbations; and n is the number of runs.
Table 1 lists the sensitivity coefficients of the performance evaluation index F obtained by the Morris screening method for the different parameters.
Table 1. Sensitivity coefficient S of index F
Parameter | Unit | Meaning | Sensitivity coefficient S
u | | Utilization | 1.242
r_0 | pieces/min | Feed rate | -0.163
r_a | pieces/min | Production rate | 0.622
k | pieces | Processing batch size | 0.478
c_a | / | Workpiece arrival time variability | 0.350
c_e | / | Processing time variability | 0.457
m | machines | Number of parallel machines | -1.134
A | | Equipment availability | -0.104
b | pieces | Buffer capacity | 0.581
Q | | Workpiece defect rate | -0.029
Based on the sensitivity grades in Table 2 and the relationships between the parameters, the number of parallel machines m, the processing batch size k, the workpiece arrival time variability c_a, the processing time variability c_e and the buffer capacity b are identified as the factors that have a greater influence on the performance of the semiconductor packaging and test production line.
Table 2. Sensitivity grading criteria
Absolute value of sensitivity coefficient | Sensitivity grade
0.00 ≤ |S| < 0.05 | Insensitive
0.05 ≤ |S| < 0.20 | Moderately sensitive
0.20 ≤ |S| < 1.00 | More sensitive
|S| ≥ 1.00 | Highly sensitive
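The grading criteria can be applied mechanically to the coefficients listed for index F. The sketch below encodes the four grade bands and the ten coefficients exactly as given in the two tables; the function and variable names are hypothetical. Note that a filter on grade alone also returns u and r_a, whereas the text attributes its final selection of m, k, c_a, c_e and b to both the grades and the relationships between the parameters.

```python
def grade(s: float) -> str:
    """Sensitivity grade for a coefficient S, using the |S| bands in the table."""
    magnitude = abs(s)
    if magnitude < 0.05:
        return "insensitive"
    if magnitude < 0.20:
        return "moderately sensitive"
    if magnitude < 1.00:
        return "more sensitive"
    return "highly sensitive"


# Sensitivity coefficients of index F as listed in the embodiment.
coefficients = {
    "u": 1.242, "r_0": -0.163, "r_a": 0.622, "k": 0.478, "c_a": 0.350,
    "c_e": 0.457, "m": -1.134, "A": -0.104, "b": 0.581, "Q": -0.029,
}

grades = {name: grade(value) for name, value in coefficients.items()}
candidates = sorted(
    name for name, g in grades.items()
    if g in ("more sensitive", "highly sensitive")
)
print(grades)
print(candidates)
```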
Step 3.2: quantitative sensitivity analysis by Arena simulation.
A model of the serial-parallel semiconductor chip packaging and test production line is built in the Arena software, as shown in FIG. 4. Each machine has its own independent random processing time, failure time and repair time.
The workpiece arrival rate, the station processing rate, the mean time to failure m_f and the mean time to repair m_p on the production line follow negative exponential and normal distributions, respectively. The processing batch size k, the buffer capacity b and the number of parallel machines m are fixed positive integers with b > m > 1. The warm-up time of the simulation experiment is set to 600 minutes, the total run time to 1200 minutes, and the experiment is repeated 3 times.
The experiments yield curves showing how overall line performance, the production cycle CT, the production rate TH and the work-in-process level WIP vary with the key factors that affect production line performance. FIG. 6 shows how production line performance varies with the arrival time variability c_a and the processing time variability c_e.
Step 4:
Step 4.1: with the production line performance prediction model as the external environment for reinforcement learning and changes in production line variability as the trigger condition, a reinforcement-learning-based performance control model for the semiconductor chip packaging and test production line, shown in FIG. 5, is established using a dynamic control method that combines an event-triggered strategy with a periodically triggered strategy.
Step 4.2: initialize the values of Q(s, a),
Figure BDA0002626324590000113
a ∈ A(s), where the Q value reflects the long-term return and S is the set of system states. The partition of S is shown in Table 3:
Table 3. Partition of the system state set S
System state | Partition criterion | System state | Partition criterion
s1 | 0 ≤ Bf ≤ 0.1 | s2 | 0.1 < Bf ≤ 0.2
s3 | 0.2 < Bf ≤ 0.3 | s4 | 0.3 < Bf ≤ 0.4
s5 | 0.4 < Bf ≤ 0.5 | s6 | 0.5 < Bf ≤ 0.6
s7 | 0.6 < Bf ≤ 0.7 | s8 | 0.7 < Bf ≤ 0.8
s9 | 0.8 < Bf ≤ 0.9 | s10 | 0.9 < Bf ≤ 1.0
s11 | Bf ≥ 1.0 | |
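The state partition amounts to binning the benefit index Bf into eleven intervals. A minimal sketch of that lookup follows; the function name is hypothetical. The table assigns Bf = 1.0 to both s10 and s11, so the sketch assumes that the boundary value belongs to s10 and that s11 covers Bf > 1.0.

```python
from bisect import bisect_left

# Upper bounds of states s1..s10 as listed in the partition table.
UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]


def state_of(bf: float) -> str:
    """Return the state label s1..s11 for a benefit index Bf.

    ASSUMPTION: Bf == 1.0 maps to s10; anything above 1.0 maps to s11.
    """
    if bf < 0.0:
        raise ValueError("Bf must be non-negative")
    # bisect_left finds the first upper bound that is >= bf, so each
    # interval is open on the left and closed on the right.
    return f"s{bisect_left(UPPER_BOUNDS, bf) + 1}"


for value in (0.0, 0.15, 0.5, 1.0, 1.3):
    print(value, state_of(value))
```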
A(s) is the set of action strategies, A(s): { a1+1, a2:1, a3:1+1, a4:1, a5:1+1, a6:1 }. The learning rate α is set to 0.1 and the discount factor γ to 0.9. The reward function r is defined as follows, where Bf_pre is the benefit index after the previous optimization of the production line:
Figure BDA0002626324590000111
Step 4.3: Given a starting state s, select an action a in state s according to the ε-greedy strategy.
Step 4.4: Execute the action a selected in state s according to the ε-greedy strategy, where b is the selection sequence number of a, and obtain the return r and the next state s_next; a_next denotes the next action. Update the Q value:
Figure BDA0002626324590000112
s = s_next, a = a_next
Step 4.5: Return to Step 4.4 until the system approaches a steady state, i.e. a convergent state.
Step 4.6: Repeat Steps 4.2 to 4.5 until the learning period (the preset number of times Steps 4.2 to 4.5 are repeated) ends, then stop iterating.
Step 4.7: Output the final policy
Figure BDA0002626324590000121
and obtain the optimized production line performance indices. Figs. 7 and 8 show the production line performance indices before and after performance control at the variability levels CV1 and CV2, respectively.
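Steps 4.2 to 4.7 have the shape of a tabular Q-learning loop, which can be sketched as follows. The update formula and the return function appear here only as figure references, so this sketch assumes the standard Q-learning update and uses a hypothetical toy environment (`toy_env`) in place of the production line performance prediction model. Only α = 0.1, γ = 0.9, the 11 states and the 6 actions are taken from the text; the fixed ε is also an assumption.

```python
import random

ALPHA, GAMMA = 0.1, 0.9        # learning rate and discount factor from Step 4.2
N_STATES, N_ACTIONS = 11, 6    # |S| from Table 3 and |A(s)| = 6


def toy_env(state, action):
    """Hypothetical stand-in for the prediction model: even actions move up."""
    step = 1 if action % 2 == 0 else -1
    nxt = min(max(state + step, 0), N_STATES - 1)
    reward = 1.0 if nxt > state else -1.0   # assumed reward, not the patent's r
    return reward, nxt


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def train(episodes=200, steps=50, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]      # Step 4.2
    for _ in range(episodes):                             # Step 4.6
        s = rng.randrange(N_STATES)                       # Step 4.3
        for _ in range(steps):                            # Steps 4.4 and 4.5
            a = epsilon_greedy(q[s], epsilon, rng)
            r, s_next = toy_env(s, a)
            # Standard Q-learning update (assumed form of the elided formula).
            q[s][a] += ALPHA * (r + GAMMA * max(q[s_next]) - q[s][a])
            s = s_next
    return q


# Step 4.7: the greedy policy read off the learned table.
policy = [max(range(N_ACTIONS), key=row.__getitem__) for row in train()]
```

In this sketch an inner loop of fixed length stands in for the convergence test of Step 4.5; a real implementation would stop once the Q values change by less than a chosen tolerance.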
In summary, the invention establishes a more accurate performance prediction model for the semiconductor packaging test serial-parallel production line. It combines the Morris screening method with Arena simulation for quantitative global sensitivity analysis, identifying the factors with the greatest influence on production line performance and their influence patterns, and thereby avoids the situations in which the equipment Markov state space is huge and traditional mathematical model analysis is not applicable. It also improves the way the parameter ε is chosen, so that the algorithm converges faster, avoids local optima, and offers better flexibility and real-time performance.

Claims (1)

1. The chip packaging test production line performance control method based on Q-learning reinforcement learning comprises the following steps:
step 1: constructing an abstract model of a semiconductor chip packaging test serial-parallel production line;
step 2: based on the production line abstract model constructed in the step 1, a prediction model of the performance of the semiconductor chip packaging test serial-parallel production line is established;
step 3: based on the production line abstract model constructed in the step 1, obtaining an influence mechanism of key variability factors on the performance of the production line through qualitative analysis with the Morris screening method and quantitative analysis with Arena simulation;
step 4: based on the prediction model established in the step 2 and the key variability analysis obtained in the step 3, establishing a performance control model based on a Q-learning reinforcement learning algorithm, and carrying out iterative solution by taking the optimal benefit index of the production line as a performance control target to obtain a global optimal performance control strategy;
the step 1 specifically comprises the following steps: taking the back-end process of a semiconductor production line, namely the chip packaging test production line, as the research object, assuming that a finite buffer exists between stations and that the queuing rule is first come, first served, and abstracting the line into a multi-station serial-parallel queuing production line model with re-entrant flow;
the step 2 specifically comprises the following steps:
step 2.1: variability calculation: calculating the arrival variability c_a and the processing time variability c_e;
step 2.2: determining the basic performance prediction indices;
from the average queuing time CT_q of workpieces and the effective processing time t_e, obtaining the average residence time CT at a workstation, namely the production cycle; further calculating the average work-in-process level WIP at the workstation, and taking the work-in-process level WIP, the production rate TH and the production cycle CT as the basic production line performance prediction indices;
CT = CT_q + t_e
WIP = CT × TH
step 2.3: establishing a production line performance prediction model;
step 2.3.1: calculating queuing time of the product j at the workstation i:
Figure FDA0004123693180000011
wherein c_{a,ij} and c_{e,ij} are the arrival variability and the processing time variability of product j at station i, u_ij is the utilization of station i, m_ij is the number of parallel machines at station i, and t_{e,ij} is the effective processing time of product j at station i;
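The queuing-time formula of step 2.3.1 appears here only as a figure reference. Its listed inputs (arrival and processing variability, utilization, number of parallel machines, effective processing time) are also the inputs of the widely used G/G/m queue-time approximation from the Factory Physics literature. The sketch below uses that approximation as an assumed stand-in, which may differ from the claimed formula, and then applies CT = CT_q + t_e and WIP = CT × TH from step 2.2. All function names are hypothetical.

```python
import math


def queue_time(c_a, c_e, u, m, t_e):
    """Approximate queuing time CT_q at one station (assumed G/G/m formula)."""
    if not 0 <= u < 1:
        raise ValueError("utilization u must satisfy 0 <= u < 1")
    if m < 1:
        raise ValueError("m must be at least 1")
    variability = (c_a ** 2 + c_e ** 2) / 2
    utilization = u ** (math.sqrt(2 * (m + 1)) - 1) / (m * (1 - u))
    return variability * utilization * t_e


def cycle_time(ct_q, t_e):
    """CT = CT_q + t_e (step 2.2)."""
    return ct_q + t_e


def wip_level(ct, th):
    """WIP = CT x TH (step 2.2)."""
    return ct * th


# Single-machine check case (m = 1 is used only because the result is easy
# to verify by hand; the claim itself assumes m > 1).
ct_q = queue_time(c_a=1.0, c_e=1.0, u=0.5, m=1, t_e=10.0)
ct = cycle_time(ct_q, t_e=10.0)
wip = wip_level(ct, th=0.05)
```

With c_a = c_e = 1 and m = 1 the approximation reduces to the M/M/1 result u/(1 − u) · t_e, which gives a quick sanity check on the implementation.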
step 2.3.2: calculating the production rate TH of the workpiece;
station i has m_ij parallel machines, b is the capacity of the buffer before station i, and k is the number of workpieces being processed at station i, with b > m > 1; if 0 ≤ k ≤ b, the probability p_0 that workpiece j is processed at station i without waiting is as follows, where 0 < j < r and r is the number of product types processed together on the production line:
Figure FDA0004123693180000021
the blocking probability of workpiece j when the buffer capacity is b,
Figure FDA0004123693180000022
is:
Figure FDA0004123693180000023
let q_hj be the defective rate of workpiece j at station h and Q_ij the defective rate monitored at station i, with 0 < i ≤ s, where s is the number of stations in the serial-parallel production line; the probability Q_ij that a defective workpiece j is detected and removed at station i is:
Figure FDA0004123693180000024
Figure FDA00041236931800000210
denotes the set of all station numbers at which defective-product detection is performed in the production line;
the production rate TH_ij of workpiece j at station i is:
Figure FDA0004123693180000025
when the utilization of a station is the maximum, that station I is the bottleneck station of product J, and the production rate is r_b^{IJ} = max(u_ij);
step 2.3.3: calculating the production cycle CT_j and the work-in-process level WIP_j of the production line;
calculating the average wait-to-batch time WTBT of workpieces:
Figure FDA0004123693180000026
wherein r is a Representing the rate at which the workpiece arrives at the workstation, where k ij Indicating the product j processing lot size at station i, at this time
Figure FDA0004123693180000027
Then->
Figure FDA0004123693180000028
Rewriting CT q ij The calculation formula is as follows:
Figure FDA0004123693180000029
calculating the production cycle CT_j and the work-in-process level WIP_j of product j at station i:
Figure FDA0004123693180000031
Figure FDA0004123693180000032
thereby obtaining the production cycle CT_j and the work-in-process level WIP_j of product j over the whole serial-parallel production line:
Figure FDA0004123693180000033
Figure FDA0004123693180000034
Step 2.4: evaluating the performance of the production line performance prediction model;
step 2.4.1: calculating a performance index F of the production line;
the WIP-CT and WIP-TH curves of the production line under the best case, the worst case and the practical worst case are used as benchmarks to delimit good and bad regions in the performance quadrant, forming a performance evaluation chart of the production line;
the ratio of the distance of the actual performance point to the distance between the best-case and practical-worst-case benchmarks is taken as the performance evaluation index, denoted F:
Figure FDA0004123693180000035
wherein w is the given actual work-in-process level, T is the actual production cycle, and T_0 is the theoretical processing time of the production line, here T_0 = CT; r_b is the bottleneck rate of the production line, here r_b = TH_ij if and only if u_ij = u_max;
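The benchmark curves named in step 2.4.1 can be sketched as follows, assuming they are the standard best-case, worst-case and practical-worst-case laws from the Factory Physics literature, written in terms of w, T_0 and r_b. The formula for F itself appears here only as a figure reference and is not reproduced. The function name `benchmarks` is hypothetical.

```python
def benchmarks(w, t0, r_b):
    """Return (CT, TH) pairs for the three benchmark cases at WIP level w."""
    if w < 1 or t0 <= 0 or r_b <= 0:
        raise ValueError("expected w >= 1, t0 > 0 and r_b > 0")
    w0 = r_b * t0  # critical WIP level
    return {
        # Best case: no variability, so CT stays at T_0 until the bottleneck saturates.
        "best": (max(t0, w / r_b), min(w / t0, r_b)),
        # Worst case: every workpiece waits behind all the others.
        "worst": (w * t0, 1 / t0),
        # Practical worst case: maximum-randomness balanced line.
        "practical_worst": (t0 + (w - 1) / r_b, w / (w0 + w - 1) * r_b),
    }


curves = benchmarks(w=4, t0=8.0, r_b=0.5)
```

Each pair satisfies WIP = CT × TH, so an actual operating point (w, T) can be placed on the same WIP-CT chart and compared against the three benchmarks.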
Step 2.4.2: calculating a benefit index Bf of the production line;
considering the production cost, the production line performance index F is rewritten as the benefit index Bf:
Bf = C × F
Figure FDA0004123693180000036
wherein C is the cost factor, c_1 is the unit equipment cost, c_2 is the cost per unit of buffer capacity, c_3 is the remaining fixed cost, m_1 and b_1 are the current number of parallel machines and the current buffer capacity, and m_0 and b_0 are the initial number of parallel machines and the initial buffer capacity;
the step 3 specifically comprises the following steps:
step 3.1: qualitative analysis of sensitivity of Morris screening method;
selecting a random parameter x in a production line performance prediction model, presetting a fixed step length C and a maximum amplitude M, carrying out disturbance change on the parameter x by the step length C, and taking the average change rate of a performance evaluation index F as a sensitivity coefficient S:
Figure FDA0004123693180000041
wherein Y_0 is the performance evaluation index F corresponding to the initial value of parameter x; Y_g and Y_{g+1} are the performance evaluation indices F after the g-th and (g+1)-th perturbations of parameter x; P_g and P_{g+1} are the rates of change of the parameter value, relative to its initial value, after the g-th and (g+1)-th perturbations; and n is the number of runs;
according to the sensitivity grading standard, parameters graded as more sensitive or highly sensitive are taken as the factors with a larger influence on semiconductor packaging test production line performance; the grading by the absolute value of the sensitivity coefficient is: insensitive, 0.00 ≤ |S| < 0.05; moderately sensitive, 0.05 ≤ |S| < 0.20; more sensitive, 0.20 ≤ |S| < 1.00; highly sensitive, |S| ≥ 1.00;
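The screening in step 3.1 can be sketched as below. The formula for S appears here only as a figure reference, so the sketch assumes the common form of the modified Morris method: the mean, over consecutive perturbations, of the relative change in F divided by the change in P. It further assumes that P is expressed as a fraction (0.05 for a 5% change) and that the grade boundaries are the usual 0.05, 0.20 and 1.00 thresholds. The function names and grade labels are hypothetical.

```python
def sensitivity(y, p):
    """Mean relative change of F per unit relative change of the parameter.

    y[g] is the index F after the g-th perturbation (y[0] is the initial value);
    p[g] is the fractional change of the parameter relative to its initial value.
    """
    if len(y) != len(p) or len(y) < 2:
        raise ValueError("y and p need the same length, at least 2")
    y0 = y[0]
    terms = [
        ((y[g + 1] - y[g]) / y0) / (p[g + 1] - p[g])
        for g in range(len(y) - 1)
    ]
    return sum(terms) / len(terms)


def grade(s):
    """Classify a sensitivity coefficient by its absolute value."""
    a = abs(s)
    if a < 0.05:
        return "insensitive"
    if a < 0.20:
        return "moderately sensitive"
    if a < 1.00:
        return "more sensitive"
    return "highly sensitive"


s_example = sensitivity(y=[1.0, 1.1, 1.2], p=[0.0, 0.1, 0.2])
```

Under this rule only parameters graded "more sensitive" or "highly sensitive" would be carried forward to the Arena experiments of step 3.2.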
step 3.2: arena simulation sensitivity quantitative analysis;
establishing a semiconductor chip packaging test serial-parallel production line model in Arena software, wherein each device has independent random processing time, failure time and maintenance time;
the workpiece arrival rate, the station equipment processing rate, the mean time before failure m_f and the mean repair time m_p on the production line obey negative exponential and normal distributions, respectively; the processing batch size k, the buffer capacity b and the number of parallel machines m are fixed positive integers with b > m > 1; the warm-up time, the total run time and the number of repetitions of the simulation experiment are set;
the variation curves of the overall production line performance, the production cycle CT, the production rate TH and the work-in-process level WIP with respect to the key factors affecting production line performance are obtained through experiments;
the step 4 specifically comprises the following steps:
step 4.1: taking the production line performance prediction model as the external environment for reinforcement learning and a change in production line variability as the trigger condition, establishing a reinforcement-learning-based performance control model of the semiconductor chip packaging test production line using a dynamic control method that combines an event-triggered strategy with a periodic-triggered strategy;
step 4.2: initializing Q(s, a) for all s ∈ S and a ∈ A(s), wherein the Q value reflects the long-term reward, S is the system state set, and A(s) is the action strategy set of the key factors obtained in the step 4.2; setting the learning rate factor α and the discount factor γ, and determining the return function r;
step 4.3: giving a starting state s, and selecting an action a in the state s according to the ε-greedy strategy; the improved value of ε is set as the following function:
Figure FDA0004123693180000043
wherein p is the current execution step number of the algorithm, and M is the total number of iteration steps of the algorithm;
step 4.4: executing the action a selected in state s according to the ε-greedy strategy, where b is the selection sequence number of a, and obtaining the return r and the next state s_next; a_next denotes the next action; updating the Q value:
Figure FDA0004123693180000051
s = s_next, a = a_next;
step 4.5: turning to step 4.4 until the system goes towards a steady state, i.e. a converging state;
step 4.6: repeatedly executing the steps 4.2 to 4.5 until the learning period, namely the repeated execution times of the steps 4.2 to 4.5 preset by the algorithm, is ended, and stopping iteration;
step 4.7: outputting the final policy
Figure FDA0004123693180000052
and obtaining the optimized production line performance indices.
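Step 4.3 makes ε a function of the current step p and the total number of steps M, but the function itself appears here only as a figure reference. Purely to illustrate the general idea of shrinking exploration as learning proceeds, the sketch below uses a linear decay with a floor. This is an assumed stand-in and not the claimed function; the name `epsilon_at` and the floor value are hypothetical.

```python
def epsilon_at(p, total_steps, floor=0.05):
    """Exploration rate at step p of total_steps (assumed linear decay)."""
    if total_steps <= 0 or not 0 <= p <= total_steps:
        raise ValueError("expected 0 <= p <= total_steps and total_steps > 0")
    # Start fully exploratory and decay linearly, never dropping below the floor.
    return max(floor, 1.0 - p / total_steps)
```

A schedule of this kind explores widely in early steps and becomes nearly greedy near step M, which is the behavior the description associates with faster convergence and fewer local optima.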
CN202010797879.2A 2020-08-10 2020-08-10 Chip packaging test production linear energy control method based on Q-learning reinforcement learning Active CN111857081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010797879.2A CN111857081B (en) 2020-08-10 2020-08-10 Chip packaging test production linear energy control method based on Q-learning reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010797879.2A CN111857081B (en) 2020-08-10 2020-08-10 Chip packaging test production linear energy control method based on Q-learning reinforcement learning

Publications (2)

Publication Number Publication Date
CN111857081A CN111857081A (en) 2020-10-30
CN111857081B true CN111857081B (en) 2023-05-05

Family

ID=72971238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010797879.2A Active CN111857081B (en) 2020-08-10 2020-08-10 Chip packaging test production linear energy control method based on Q-learning reinforcement learning

Country Status (1)

Country Link
CN (1) CN111857081B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631216B (en) * 2020-12-11 2023-07-21 江苏晶度半导体科技有限公司 Semiconductor test packaging production linear energy prediction control system based on DQN and DNN twin neural network algorithm
CN113033815A (en) * 2021-02-07 2021-06-25 广州杰赛科技股份有限公司 Intelligent valve cooperation control method, device, equipment and storage medium
CN113962470B (en) * 2021-10-29 2022-06-24 上海新科乾物联技术有限公司 Optimized scheduling method and system based on disturbance prediction
CN115933412B (en) * 2023-01-12 2023-07-14 中国航发湖南动力机械研究所 Aeroengine control method and device based on event-triggered predictive control

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103676881A (en) * 2013-12-16 2014-03-26 北京化工大学 Dynamic bottleneck analytical method of semiconductor production line

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4334843B2 (en) * 2002-10-07 2009-09-30 パナソニック株式会社 Production plan creation method
CN108646684B (en) * 2018-05-30 2020-10-23 电子科技大学 Multi-product production line production period prediction method based on variability measurement
CN109270904A (en) * 2018-10-22 2019-01-25 中车青岛四方机车车辆股份有限公司 A kind of flexible job shop batch dynamic dispatching optimization method
CN110378439B (en) * 2019-08-09 2021-03-30 重庆理工大学 Single robot path planning method based on Q-Learning algorithm
CN110517002B (en) * 2019-08-29 2022-11-15 烟台大学 Production control method based on reinforcement learning

Also Published As

Publication number Publication date
CN111857081A (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant