CN117520741A

CN117520741A - Method for predicting and improving yield of semiconductor factory based on big data

Info

Publication number: CN117520741A
Application number: CN202311313568.4A
Authority: CN
Inventors: 陈一宁; 郭庞; 高大为; 陈鼎崴
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2023-10-11
Filing date: 2023-10-11
Publication date: 2024-02-06

Abstract

A big-data-based method for predicting and improving the yield of a semiconductor factory. The method preprocesses the WAT data and CP data collected in the factory to address class imbalance, performs dimensionality reduction on the high-dimensional data to enhance the robustness and interpretability of the model, uses a machine learning model to correlate the big data with wafer yield, and analyzes the test factors that cause yield loss. Modeling and analysis are performed on the processed WAT and CP data, and root cause analysis of the yield is carried out based on the model results. This can greatly improve the efficiency of root cause analysis and the economic benefit of the manufacturing plant, and the method is comprehensive, accurate in prediction, and fast and reliable in processing WAT and CP data.

Description

Method for predicting and improving yield of semiconductor factory based on big data

Technical Field

The invention relates to the technical field of integrated circuits, in particular to a method for predicting and improving yield of a semiconductor factory based on big data.

Background

In integrated circuit manufacturing, yield directly determines the profitability of the production unit. The manufacturing process generates a large volume of test data, including inline data (mainly fault detection and classification (FDC) data), defect data, wafer acceptance test (WAT) data, and wafer probe test (CP) data. WAT data is mainly used to test the electrical performance of the wafer and to monitor process fluctuations and stability during production; a wafer that passes the WAT test then undergoes the CP test. WAT data typically contains tens to hundreds of test variables, such as transistor voltages, resistances, and capacitances, so the data volume is very large. Conventional analysis uses t-tests, analysis of variance, and mean comparisons to determine whether a test variable is abnormal. These methods have several limitations. First, they cannot relate the test variables to wafer yield. Second, because the data are very high-dimensional, engineers must analyze them manually, which is time-consuming and labor-intensive and can even lead to erroneous results. Third, when the process parameters change, the analysis method must change as well, and continuing to rely on the old analysis wastes production resources.

Disclosure of Invention

In view of the problems and technical requirements of the prior art, the invention aims to provide a big-data-based method for predicting and improving the yield of a semiconductor factory. The method preprocesses the WAT data and CP data collected in the factory to address class imbalance, performs dimensionality reduction on the high-dimensional data to enhance the robustness and interpretability of the model, and finally uses a machine learning model to correlate the big data with wafer yield and analyze the test factors that lead to yield loss.

In order to achieve the above purpose, the invention is realized by adopting the following technical scheme:

A method for predicting and improving the yield of a semiconductor factory based on big data comprises the following steps:

step 1: collecting wafer acceptance test (WAT) data and wafer probe test (CP) data in the semiconductor factory, forming a data set from them as raw data, and storing the raw data in a storage system;

step 2: preprocessing the collected WAT data and CP data, the preprocessing comprising outlier processing, missing value processing, and data normalization;

step 3: merging the WAT data and the CP data, and dividing the resulting data set into a training set, a test set, and a validation set;

step 4: performing sample enhancement on the training set of the WAT data to generate a balanced sample set;

step 5: performing dimensionality reduction on the balanced sample set, and selecting, through feature variable screening, the parameter set with the best predictive performance;

step 6: building a wafer yield prediction model from the processed WAT data and CP data;

step 7: calculating the importance of each feature in the wafer yield prediction model, outputting the model importance of each feature, and ranking the features in descending order of importance;

step 8: calculating the SHAP values of the features, and outputting the SHAP values and the analysis plot of each feature;

step 9: performing root cause analysis of yield loss according to the ranking of feature importance, the SHAP values, and the analysis plots.
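Steps 7 and 8 can be illustrated with a minimal NumPy sketch. The method does not say how feature importance is computed (tree models such as CatBoost or XGBoost also expose built-in importances), so permutation importance is used here as an assumed, model-agnostic stand-in. SHAP values are normally computed with the `shap` library (for example `shap.TreeExplainer` for tree ensembles); to stay self-contained, this sketch instead enumerates every feature subset to obtain exact Shapley values, filling absent features with the background mean. That enumeration is exponential in the number of features, so it only suits a handful of features. `permutation_importance`, `rank_features`, and `exact_shapley` are hypothetical names.

```python
import itertools
import math

import numpy as np


def permutation_importance(predict, X, y, seed=0):
    """Importance of feature j = rise in mean squared error after shuffling column j."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores[j] = np.mean((predict(X_perm) - y) ** 2) - base_mse
    return scores


def rank_features(names, scores):
    """Step 7: (feature, importance) pairs from largest to smallest importance."""
    order = np.argsort(scores)[::-1]
    return [(names[i], float(scores[i])) for i in order]


def exact_shapley(predict, x, background):
    """Step 8 (sketch): exact Shapley values of one sample x.

    The value of a coalition S is the prediction when features in S take
    their values from x and all other features take the background mean.
    """
    n = len(x)
    ref = background.mean(axis=0)

    def value(S):
        z = ref.copy()
        z[list(S)] = x[list(S)]
        return predict(z[None, :])[0]

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for r in range(n):
            weight = math.factorial(r) * math.factorial(n - r - 1) / math.factorial(n)
            for S in itertools.combinations(others, r):
                phi[j] += weight * (value(S + (j,)) - value(S))
    return phi
```

For a linear model the Shapley value of feature j reduces to its coefficient times the deviation of x_j from the background mean, which is a convenient sanity check. With a fitted tree model, the `shap` package computes these values far more efficiently, and its summary plots could serve as the per-feature analysis plots of step 8.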

The WAT data collected in step 1 include the resistance, capacitance, and inductance of the transistor, the threshold voltage, the saturation current, the subthreshold current, and the capacitance, resistance, and inductance of the metal interconnect layer.

The CP data collected in step 1 include the electrical test data of each die on the wafer and the yield data of the wafer.

The preprocessing in step 2 comprises the following steps:

step 21: performing outlier processing on the WAT data and CP data to remove anomalies that arose during testing;

step 22: performing missing value processing on the WAT data and CP data after outlier processing, to handle values that were not saved because of errors during testing;

step 23: normalizing the WAT data and CP data after missing value processing, to remove inconsistencies in scale among the test parameters.

Outlier processing uses the box-plot method, the Z-score method, and the mean square error analysis method.

Missing value processing uses missing value removal, interpolation, and mean imputation.

Data normalization uses Min-Max normalization, mean-variance normalization, and batch normalization.

The sample enhancement in step 4 is performed with a generative adversarial network (GAN).

The feature screening in step 5 uses an algorithm that combines the Boruta algorithm with a genetic algorithm.

In step 6, the wafer yield prediction model is built using a CatBoost model, a random forest model, a decision tree model, an XGBoost model, and a support vector machine model.

Compared with the prior art, the invention has the following beneficial effects:

In the big-data-based method for predicting and improving the yield of a semiconductor factory, effective data enhancement and dimensionality reduction applied to the collected WAT and CP data improve data quality and the robustness and interpretability of the model, reduce the difficulty of analysis, shorten analysis time, and provide a reliable basis for subsequent analysis. In addition, modeling and analysis are performed on the processed WAT and CP data, and root cause analysis of the yield is carried out according to the model results, which can greatly improve the efficiency of root cause analysis and the economic benefit of the manufacturing plant. The method is comprehensive, accurate in prediction, and fast and reliable in processing WAT and CP data.

The foregoing is only an overview of the present invention. To make the invention more clearly understood, it is described in more detail below with reference to the accompanying drawings.

The above and other objects, features and advantages of the present invention will become more apparent to those skilled in the art from the following detailed description of the specific embodiments of the present invention taken in conjunction with the accompanying drawings, which are not to be construed as limiting the invention.

Drawings

FIG. 1 is a framework diagram of the present invention.

FIG. 2 is a framework diagram of the sample enhancement of the present invention.

Detailed Description

The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The following provides a detailed description of embodiments of the present invention.

As shown in FIG. 1, a method for predicting and improving yield of a semiconductor factory based on big data comprises the following steps:

The WAT data collected in step 1 include the resistance, capacitance, and inductance of the transistor, the threshold voltage, the saturation current, the subthreshold current, and the capacitance, resistance, and inductance of the metal interconnect layer.

The probe test data collected in step 1 include the electrical test data of each die on the wafer and the yield data of the wafer.

The wafer acceptance test typically comprises tens to hundreds of test items, and the probe test data typically cover tens of test categories, each containing different electrical function tests.

The preprocessing in step 2 comprises the following steps:

step 21: performing outlier processing on the WAT data and CP data to remove anomalies that arose during testing, such as poor contact between the probe and the wafer or errors in saving test values;

Outliers are processed using the box-plot method, the Z-score method, the mean square error analysis method, and similar methods.
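A minimal NumPy sketch of two of the listed outlier rules, assuming the data form a wafers-by-parameters float matrix. The box-plot rule flags values more than 1.5 × IQR beyond the quartiles, and the Z-score rule flags values more than three standard deviations from the mean; both thresholds are conventional defaults, not values stated in the method. The mean square error analysis method is not shown because the text does not define it. Setting flagged entries to NaN so that the missing-value step can handle them is a design assumption, and `mask_outliers` is a hypothetical name.

```python
import numpy as np


def iqr_outliers(x, k=1.5):
    """Box-plot rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x, thresh=3.0):
    """Z-score rule: flag values more than `thresh` standard deviations from the mean."""
    sd = np.nanstd(x)
    if sd == 0:
        return np.zeros(x.shape, dtype=bool)  # constant column: nothing to flag
    return np.abs((x - np.nanmean(x)) / sd) > thresh


def mask_outliers(X):
    """Return a copy of X with flagged entries replaced by NaN, column by column."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]  # view into X, so assignments update X
        col[iqr_outliers(col) | zscore_outliers(col)] = np.nan
    return X
```

On small samples the Z-score rule is weak (with n values the largest attainable |z| is (n - 1)/√n), which is one reason to combine it with the box-plot rule.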

Missing value processing uses missing value removal, interpolation, mean imputation, and similar methods.
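The three missing-value strategies can be sketched in NumPy as follows, with NaN marking a missing entry. Interpolating along the row order assumes the rows are meaningfully ordered (for example, by test time), which the method does not state, and each column is assumed to have at least one observed value. The function names are hypothetical.

```python
import numpy as np


def drop_incomplete_rows(X, y):
    """Missing value removal: discard any wafer with at least one missing parameter."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep]


def mean_fill(X):
    """Mean imputation: replace each NaN with the mean of its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.nonzero(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X


def interpolate_fill(X):
    """Linear interpolation along the row order; edge gaps take the nearest observed value."""
    X = X.copy()
    idx = np.arange(len(X))
    for j in range(X.shape[1]):
        col = X[:, j]  # view into X, so edits update X in place
        missing = np.isnan(col)
        if missing.any():
            col[missing] = np.interp(idx[missing], idx[~missing], col[~missing])
    return X
```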

Data normalization uses Min-Max normalization, mean-variance normalization, batch normalization, and similar methods.
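The normalization options can be sketched per column in NumPy. Batch normalization is normally a layer inside a neural network rather than a one-off preprocessing step, so the last function shows only its training-mode forward pass with a learnable scale `gamma` and shift `beta`. Computing statistics over the whole table follows the order of the method, where normalization (step 2) precedes the split (step 3). The function names are hypothetical.

```python
import numpy as np


def min_max(X, eps=1e-12):
    """Min-Max normalization: rescale every column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.maximum(hi - lo, eps)  # eps guards constant columns


def mean_variance(X, eps=1e-12):
    """Mean-variance normalization: zero mean and unit standard deviation per column."""
    return (X - X.mean(axis=0)) / np.maximum(X.std(axis=0), eps)


def batch_norm(X_batch, gamma, beta, eps=1e-5):
    """Batch normalization (training mode): normalize with mini-batch statistics,
    then apply the learnable scale gamma and shift beta."""
    mu = X_batch.mean(axis=0)
    var = X_batch.var(axis=0)
    return gamma * (X_batch - mu) / np.sqrt(var + eps) + beta
```

Either of the first two removes the unit mismatch between, say, a threshold voltage in volts and a sheet resistance in ohms, so no single parameter dominates distance- or margin-based models such as the support vector machine.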

In step 3, after the data are merged, the data set is divided into a training set, a test set, and a validation set in a ratio of 7:2:1.
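A minimal sketch of the merge and the 7:2:1 split in step 3, using only the Python standard library. It assumes that WAT and CP records can be joined on a shared wafer identifier and that a plain random shuffle is acceptable; the method specifies neither, and `merge_wat_cp` and `split_721` are hypothetical names.

```python
import random


def merge_wat_cp(wat, cp):
    """Inner-join WAT parameters and CP results on wafer ID (assumed join key).

    wat: {wafer_id: WAT parameter vector}; cp: {wafer_id: CP yield}.
    """
    shared_ids = sorted(wat.keys() & cp.keys())
    return [(wid, wat[wid], cp[wid]) for wid in shared_ids]


def split_721(rows, seed=0):
    """Shuffle, then split into training, test and validation sets in a 7:2:1 ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(0.7 * len(rows))
    n_test = round(0.2 * len(rows))
    return rows[:n_train], rows[n_train:n_train + n_test], rows[n_train + n_test:]
```

Wafers that appear in only one of the two sources are dropped by the inner join, since they cannot be paired with a yield label.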

The sample enhancement in step 4 is performed with a generative adversarial network (GAN).

In step 4, the distribution of wafer yield generally depends on the product's stage of development at the semiconductor factory. During the development stage, high-yield wafers are usually far fewer than low-yield wafers; in the mature mass production stage, high-yield wafers far outnumber low-yield wafers. For these different yield distributions, data enhancement is required to improve the reliability of the subsequent machine learning or deep learning model.

As shown in FIG. 2, the sample enhancement method feeds the minority-class samples into the model for learning and generates new minority-class samples until their number matches that of the majority-class samples. Typically, the generative adversarial network comprises a generator, which generates minority-class samples, and a discriminator, which distinguishes the generated minority-class samples from the original samples. A balanced data set is obtained through adversarial training.

The GAN procedure is as follows: noise is first applied to the minority-class samples and fed into the generator; the discriminator is then trained to distinguish the generated minority-class samples from the original samples; with a suitable loss function, training continues until the discriminator can no longer tell the generated minority-class samples from the original samples; finally, a balanced data set is generated.
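The adversarial loop can be illustrated with a deliberately tiny NumPy sketch. The method gives no architectures or loss, so everything here is an assumption: an affine generator maps Gaussian noise to a sample, a logistic-regression discriminator scores samples, the discriminator minimizes binary cross-entropy, and the generator uses the common non-saturating loss -log D(G(z)). Unlike the procedure described above, this generator draws from noise alone rather than from noise applied to minority samples. A linear discriminator can only detect differences in the mean, so a practical implementation would use multilayer networks in a framework such as PyTorch; the sketch only shows how minority-class samples are generated until the classes are balanced. `train_toy_gan` and `balance_with_gan` are hypothetical names.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def train_toy_gan(x_minor, noise_dim=4, steps=2000, lr=0.05, batch=32, seed=0):
    """Alternating GAN updates with hand-derived gradients.

    Generator:     G(z) = z @ A + b
    Discriminator: D(x) = sigmoid(x @ w + c)
    """
    rng = np.random.default_rng(seed)
    dim = x_minor.shape[1]
    A = rng.normal(scale=0.1, size=(noise_dim, dim))
    b = np.zeros(dim)
    w = np.zeros(dim)
    c = 0.0
    for _ in range(steps):
        # Discriminator step: minimize -[log D(real) + log(1 - D(fake))].
        real = x_minor[rng.integers(0, len(x_minor), size=batch)]
        fake = rng.normal(size=(batch, noise_dim)) @ A + b
        g_real = sigmoid(real @ w + c) - 1.0  # d loss / d logit, real samples
        g_fake = sigmoid(fake @ w + c)        # d loss / d logit, fake samples
        w -= lr * (real.T @ g_real + fake.T @ g_fake) / batch
        c -= lr * (g_real.sum() + g_fake.sum()) / batch
        # Generator step: minimize -log D(G(z)) (non-saturating loss).
        z = rng.normal(size=(batch, noise_dim))
        fake = z @ A + b
        g_logit = sigmoid(fake @ w + c) - 1.0
        g_x = np.outer(g_logit, w)            # d loss / d generated sample
        A -= lr * (z.T @ g_x) / batch
        b -= lr * g_x.sum(axis=0) / batch
    return A, b


def balance_with_gan(x_major, x_minor, seed=1, **gan_kwargs):
    """Append synthetic minority samples until both classes have the same size."""
    A, b = train_toy_gan(x_minor, **gan_kwargs)
    n_new = max(len(x_major) - len(x_minor), 0)
    z = np.random.default_rng(seed).normal(size=(n_new, A.shape[0]))
    return np.vstack([x_minor, z @ A + b])
```

In the yield setting, `x_minor` would hold the feature vectors of the rarer yield class (high-yield wafers during development, low-yield wafers in mature mass production), and the enlarged minority block is then combined with the majority class to form the balanced training set.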

In step 5, feature screening uses an algorithm that combines the Boruta algorithm with a genetic algorithm: the Boruta algorithm first screens out the group of feature variables most relevant to the dependent variable, and the genetic algorithm then selects from that group the feature variables that perform best in the model. The feature variables include parameters such as the resistance, capacitance, and inductance of the transistor.
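A simplified two-stage sketch in NumPy. The first stage follows Boruta's core idea: create shuffled "shadow" copies of every feature and keep the real features whose importance beats the best shadow in most rounds. Real Boruta uses random-forest importances and a statistical test over many rounds; here absolute Pearson correlation and a simple majority vote are swapped in so the sketch stays self-contained. The second stage is a basic genetic algorithm over boolean feature masks (elitist truncation selection, one-point crossover, bit-flip mutation) whose fitness is the validation R² of a least-squares model minus a small penalty per feature. The fitness function, the penalty, all parameter values, and the function names are assumptions.

```python
import numpy as np


def abs_corr(M, y):
    """|Pearson r| between each column of M and y (stand-in for RF importance)."""
    return np.array([abs(np.corrcoef(M[:, j], y)[0, 1]) for j in range(M.shape[1])])


def shadow_filter(X, y, n_rounds=20, seed=0):
    """Boruta-style stage: keep features that beat the best shadow feature in most rounds."""
    rng = np.random.default_rng(seed)
    real_imp = abs_corr(X, y)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        shadows = np.apply_along_axis(rng.permutation, 0, X)  # shuffle each column
        hits += real_imp > abs_corr(shadows, y).max()
    return np.flatnonzero(hits > n_rounds // 2)


def fitness(mask, X_tr, y_tr, X_va, y_va, penalty=0.01):
    """Validation R^2 of a least-squares fit on the selected columns, minus a size penalty."""
    if not mask.any():
        return -np.inf

    def design(M):
        return np.c_[M[:, mask], np.ones(len(M))]  # selected columns plus intercept

    coef, *_ = np.linalg.lstsq(design(X_tr), y_tr, rcond=None)
    resid = y_va - design(X_va) @ coef
    r2 = 1.0 - resid @ resid / np.sum((y_va - y_va.mean()) ** 2)
    return r2 - penalty * mask.sum()


def genetic_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Genetic-algorithm stage over boolean feature masks."""
    rng = np.random.default_rng(seed)
    n = X_tr.shape[1]

    def score(m):
        return fitness(m, X_tr, y_tr, X_va, y_va)

    pop = rng.random((pop_size, n)) < 0.5
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = np.array(ranked[: pop_size // 2])  # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n) if n > 1 else n
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            child ^= rng.random(n) < p_mut              # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    return np.flatnonzero(max(pop, key=score))


def boruta_ga_select(X_tr, y_tr, X_va, y_va):
    """Stage 1 narrows the candidates; stage 2 searches subsets of the survivors."""
    keep = shadow_filter(X_tr, y_tr)
    chosen = genetic_select(X_tr[:, keep], y_tr, X_va[:, keep], y_va)
    return keep[chosen]
```

Running the cheap filter first keeps the genetic search small: the GA explores 2^k masks over the k surviving parameters instead of over the full set of tens to hundreds of WAT parameters.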

In step 6, the wafer yield prediction model is built using a CatBoost model, a random forest model, a decision tree model, an XGBoost model, and a support vector machine model.

The wafer yield prediction model is used to predict the wafer yield, failing test items, failure categories, and the like.
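Step 6 names five model families but not how they are compared. A minimal sketch, assuming the candidates are ranked by root-mean-square error on the test set: any object with scikit-learn-style `fit`/`predict` methods can be passed in, such as `RandomForestRegressor`, `DecisionTreeRegressor`, and `SVR` from scikit-learn, `XGBRegressor` from xgboost, or `CatBoostRegressor` from catboost. So that the sketch runs without those packages, two NumPy stand-ins (`MeanBaseline` and `LeastSquares`, hypothetical names) are used instead. Treating yield as a continuous target is itself an assumption; predicting failing test items or failure categories would call for classifiers and a classification metric.

```python
import numpy as np


class MeanBaseline:
    """Stand-in model: always predicts the training-set mean yield."""

    def fit(self, X, y):
        self.mu_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mu_)


class LeastSquares:
    """Stand-in model: ordinary least squares with an intercept."""

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
        return self

    def predict(self, X):
        return np.c_[X, np.ones(len(X))] @ self.coef_


def compare_models(models, X_tr, y_tr, X_te, y_te):
    """Fit each candidate on the training set and rank by test-set RMSE (lower is better).

    models: {name: zero-argument factory returning an object with fit/predict}.
    """
    results = {}
    for name, make in models.items():
        model = make().fit(X_tr, y_tr)
        results[name] = float(np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2)))
    return sorted(results.items(), key=lambda kv: kv[1])
```

With scikit-learn installed, a dictionary such as `{"RandomForest": RandomForestRegressor, "SVM": SVR}` could be passed as `models`, since both classes can be constructed without arguments and their `fit` returns the estimator.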

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The above embodiments represent only some implementations of the invention. Although they are described in considerable detail, they are not to be construed as limiting the scope of the invention. Those skilled in the art may make several variations and insubstantial modifications without departing from the spirit of the invention, and these fall within the scope of the invention. Accordingly, the scope of protection of the invention shall be determined by the appended claims.

Claims

1. A method for predicting and improving the yield of a semiconductor factory based on big data, characterized by comprising the following steps:

2. The method according to claim 1, wherein the WAT data collected in step 1 include the resistance, capacitance, and inductance of the transistor, the threshold voltage, the saturation current, the subthreshold current, and the capacitance, resistance, and inductance of the metal interconnect layer.

3. The method according to claim 1, wherein the probe test data collected in step 1 include the electrical test data of each die on the wafer and the yield data of the wafer.

4. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 1, wherein the preprocessing in step 2 comprises the following steps:

5. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 4, wherein outliers are processed by a box-plot method, a Z-score method, and a mean square error analysis method.

6. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 4, wherein the missing value processing uses missing value removal, interpolation, and mean imputation.

7. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 4, wherein the data normalization uses Min-Max normalization, mean-variance normalization, and batch normalization.

8. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 1, wherein the sample enhancement in step 4 uses a generative adversarial network (GAN).

9. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 1, wherein the feature screening method in step 5 is an algorithm combining the Boruta algorithm with a genetic algorithm.

10. The method for predicting and improving the yield of a semiconductor factory based on big data according to claim 1, wherein the wafer yield prediction modeling in step 6 uses a CatBoost model, a random forest model, a decision tree model, an XGBoost model, and a support vector machine model.