CN112257337A

CN112257337A - Method for predicting the wafer CMP (chemical mechanical polishing) material removal rate using a GMDH (group method of data handling) neural network

Info

Publication number: CN112257337A
Application number: CN202011094499.9A
Authority: CN
Inventors: 贾花; 宋万清
Original assignee: Shanghai University of Engineering Science
Current assignee: Shanghai University of Engineering Science
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-01-22
Anticipated expiration: 2040-10-14
Also published as: CN112257337B

Abstract

The invention relates to a method for predicting the wafer CMP material removal rate (MRR) using a GMDH neural network, comprising the following steps: (1) acquiring a polishing sample data set from which abnormal values have been removed; (2) analyzing the samples in the polishing sample data set to determine b effective process variables; (3) extracting the mean, standard deviation, skewness and kurtosis of each effective process variable to obtain 4 × b feature vectors; (4) screening the 4 × b feature vectors by their correlation with the corresponding MRR values, and determining m feature vectors as the input feature vectors of the GMDH neural network model; (5) normalizing the data set formed by the m feature vectors to obtain a training feature set; (6) obtaining a trained GMDH network model by adopting a two-variable quadratic Volterra polynomial regression model, with the input feature values in the training feature set as the input layer and the corresponding MRR values as the output layer; (7) inputting the m feature values of a sample to be predicted into the trained GMDH network model and outputting the predicted MRR value.

Description

Method for predicting the wafer CMP (chemical mechanical polishing) material removal rate using a GMDH (group method of data handling) neural network

Technical Field

The invention belongs to the technical field of semiconductor material prediction methods, and relates to a wafer CMP material removal rate prediction method of a GMDH neural network.

Background

Chemical Mechanical Polishing (CMP) is a mainstream process at the back end of wafer fabrication; its purpose is to overcome the problems posed by multilayer metallization of the wafer. CMP planarizes the wafer surface by passivating and etching the wafer material with the slurry chemistry while the wafer is pressed down and its surface slides over the slurry particles. Wafer CMP processes are very complex and involve a variety of chemical and mechanical phenomena, such as surface dynamics, electrochemical interfaces, contact mechanics, stress mechanics, fluid dynamics, and tribochemistry.

In wafer CMP, the material removal rate (MRR) is an important index of process performance. However, controlling the quality of every wafer by measurement requires expensive metrology tools and production cycle time, and rigorous experimental procedures are required in the laboratory. The main factors influencing the MRR are the down pressure, the rotation speeds and temperatures of the polishing platen and the polishing head, and the type and flow rate of the polishing slurry. Several models have been proposed to describe the physical mechanism of CMP. A typical one is the Preston equation, which describes the MRR as a function of the pressure P and the relative velocity V between the pad and the wafer; the prediction accuracy of this physical model is a mean square error (MSE) of 870.25. Improvements on the Preston equation add the slurry flow rate, contact stress, and chemical reaction rate to the original equation. A Deep Belief Network (DBN) that considers only the general process parameters reaches a test-set MSE of 7.29. Research efforts have focused on both physics-based and data-driven predictive modeling techniques for CMP.
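For reference, the Preston relation mentioned above is conventionally written in the following form. The symbol $K_p$ (the empirically fitted Preston coefficient) is the conventional name and is an assumption here, since the source only names the pressure and the relative velocity V:

$$\mathrm{MRR} = K_p \, P \, V$$

where $P$ is the applied down pressure and $V$ is the relative velocity between the pad and the wafer. The relation is linear in both variables, which is one reason it cannot by itself capture consumable wear effects.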

Beyond the factors covered by the models above, the wafer MRR is affected by many others, including the rotational speeds and temperatures of the polishing platen and polishing head and the type of slurry. The above methods are limited to general process parameters. Based on the known influence factors of the MRR in existing CMP processes, the optimal values of these general process parameters have largely been determined through research and experiment, and parameters such as pressure, rotational speed, and slurry flow rate are all tightly controlled. In actual polishing, however, the wear (i.e., consumption) of consumable components such as the polishing pad and the dresser also has an irreversible effect on the MRR value over time. The existing methods therefore have the defect of considering only a few general process variables and neglecting the critical influence of important consumption variables on the MRR value.

In a CMP process system, a large number of process variables are collected for process control purposes. Thus, the input dimensions of a typical modeling approach are very high, sometimes involving hundreds of input variables, where the presence of redundant features can significantly impact the performance of the virtual metrology model.

Therefore, developing a model that incorporates the important consumption variables, simplifies feature selection and model selection, and achieves high prediction accuracy is of great significance.

Disclosure of Invention

In order to solve the problem that the material removal rate in the CMP process cannot be accurately obtained in the prior art, the invention provides a method for predicting the wafer CMP material removal rate using a GMDH neural network, which is a complex-system modeling method combining physical knowledge with statistics. The adaptive modeling approach of the GMDH neural network is used to predict the MRR value accurately, so that the MRR value can be kept within its normal range and the precision of the removal rate is improved. For example, in the wafer rough polishing mode and the wafer fine polishing mode, the MRR value is controlled within 140-170 nm/min and 50-110 nm/min respectively; if the predicted value falls outside the range, the process parameters are adjusted in time, for example by promptly replacing worn consumables such as the dresser and the polishing pad. Accurate prediction by the model provides a decision basis for evaluating the performance and health state of each component of the CMP equipment.
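The range check described above can be sketched as follows. This is a minimal illustration only: the dictionary `MRR_RANGES` and the function `mrr_in_range` are hypothetical names, and only the two numeric windows come from the text.

```python
# Normal MRR windows in nm/min for the two polishing modes, as quoted in the text.
MRR_RANGES = {"rough": (140.0, 170.0), "fine": (50.0, 110.0)}


def mrr_in_range(predicted_mrr: float, mode: str) -> bool:
    """Return True if the predicted MRR lies inside the window for this mode."""
    low, high = MRR_RANGES[mode]
    return low <= predicted_mrr <= high
```

A prediction outside the window would be the trigger for adjusting process parameters or replacing worn consumables.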

In order to achieve the purpose, the invention adopts the following scheme:

A method for predicting the wafer CMP material removal rate using a GMDH neural network comprises the following steps:

(1) acquiring a polishing sample data set from which abnormal values have been removed; the number of samples is n, and each sample contains $a$ process variables and the corresponding MRR value (i.e., material removal rate);

(2) performing statistical analysis on main process variables generated by polishing a plurality of wafers in a polishing sample data set, comprehensively considering the distribution uniformity and distribution range of each variable, and determining b effective process variables which are uniformly distributed and have wide ranges; wherein b < a;

(3) extracting the mean value, standard deviation, skewness and kurtosis of each effective process variable to obtain 4 × b characteristic vectors;

(4) screening the 4 x b characteristic vectors and corresponding MRR values by adopting a regression correlation analysis method, and determining m characteristic vectors as input characteristic vectors of the GMDH neural network model after setting a correlation coefficient threshold; wherein m < (4 × b);

the input dimension of the GMDH neural network model is m, and y is the corresponding output MRR value (the input feature dimension m scales with the total number of samples n, and its specific value is chosen to improve training accuracy: more features generally give higher accuracy, but the accuracy levels off as the feature dimension grows and may even decrease);

(5) normalizing the data set A formed by the m feature vectors to obtain a training feature set A′ and a test feature set, where the sample size of the training feature set is $n_1$, the sample size of the test feature set is $n_2$, and $n = n_1 + n_2$. Normalization reduces the influence of inconsistent units and dimensions of the input data on the prediction accuracy.

(6) Adopting a two-variable quadratic Volterra polynomial regression model, and training the GMDH neural network model with the m feature vectors of the training feature set A′ (sample size $n_1$) as the input layer and the corresponding MRR values in the training feature set as the output layer; that is, the model is trained on the input-output pairs

$$\left(x'_{a,1}, x'_{a,2}, \ldots, x'_{a,m};\; y_a\right), \quad a = 1, 2, \ldots, n_1,$$

where $(x'_{1,b}, x'_{2,b}, \ldots, x'_{n_1,b})^T$ is the b-th feature vector in the training feature set A′, $x'_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A′, $y_a$ is the actual MRR value corresponding to the a-th sample in A′, $n_1$ is the training sample size, $a \in \{1, 2, \ldots, n_1\}$, and $b \in \{1, 2, \ldots, m\}$;

(7) Inputting m characteristic values serving as input in a test characteristic set (a sample set to be tested) into a trained GMDH network model, and outputting a predicted MRR value;

(8) comparing the predicted MRR values with the MRR values corresponding to the m input feature values in the test feature set to obtain the accuracy of the model prediction.

In the rough polishing working mode, the accuracy of the prediction result is: mean square error (MSE) 3.95, root mean square error (RMSE) 1.99;

in the fine polishing working mode, the accuracy of the prediction result is: mean square error (MSE) 9.82, root mean square error (RMSE) 3.31.
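The two error measures used to report accuracy can be computed as in the following minimal sketch. The function names `mse` and `rmse` are illustrative and not taken from the source; the definitions are the standard ones, with RMSE being the square root of MSE.

```python
import math


def mse(actual, predicted):
    """Mean square error between actual and predicted MRR values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))
```

In step (8) these would be evaluated on the $n_2$ test samples, comparing the model output with the measured MRR values.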

As a preferred technical scheme:

In the method for predicting the wafer CMP material removal rate using a GMDH neural network as described above, in step (1) the polishing sample data set is collected by sensors on the CMP equipment; when the MRR value is 140-170 nm/min, the polishing sample data set is a rough polishing sample data set; when the MRR value is 50-110 nm/min, the polishing sample data set is a fine polishing sample data set;

in step (1), a is 25, and the process variables include chamber pressure, main and external pressure, center pressure, retaining ring pressure, ripple pressure, edge pressure, dresser rotation speed, wafer rotation speed, polishing table rotation speed, A-type slurry flow rate, B-type slurry flow rate, C-type slurry flow rate, consumption of the polishing table backing film, consumption of the polishing pad, consumption of the wafer carrier flexible plate, consumption of the partition film, consumption of the dresser table, dressing liquid state, the chamber used for wafer processing, process stage, wafer identifier, timestamp, wafer ring position identifier, and polishing machine identifier. The first 18 variables, the main process variables, are selected for statistical distribution analysis. The remaining identifier variables have little impact on the target output and are therefore not analyzed as predictors.

In step (1), abnormal values are removed as follows: the Grubbs test is used to detect abnormal values in order to improve the prediction accuracy, where an abnormal value is an extreme maximum or minimum caused by a sensor measurement failure or by random errors in the process parameters.
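A minimal sketch of iterative outlier removal with the Grubbs statistic is given below. The source does not state a significance level, so the critical value is passed in as a caller-supplied function of the sample size (`critical_value`, a hypothetical parameter); in practice it would come from a Grubbs table or the Student t distribution.

```python
import math


def grubbs_statistic(values):
    """Return (G, index): G = max|x - mean| / s and the index of that point."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if s == 0:
        return 0.0, 0  # all values identical, so nothing can be an outlier
    idx = max(range(n), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / s, idx


def remove_outliers(values, critical_value):
    """Repeatedly drop the most extreme point while G exceeds the critical value.

    critical_value: function n -> threshold for a sample of size n (assumed
    to be supplied by the caller, e.g. looked up from a Grubbs table).
    """
    data, removed = list(values), []
    while len(data) >= 3:
        g, idx = grubbs_statistic(data)
        if g <= critical_value(len(data)):
            break
        removed.append(data.pop(idx))
    return data, removed
```

Because the most extreme point is always the sample maximum or minimum, this matches the description of abnormal values as extreme maxima or minima.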

In the method for predicting the wafer CMP material removal rate using a GMDH neural network as described above, in step (2) an effective process variable is a variable whose statistical width is in the range 0.12 to 11 (the width is the difference between the maximum and minimum values of the variable). Narrowly distributed variables do not improve the accuracy of the predicted MRR values: the rotational speed, pressure, and flow rate data are distributed within a narrow range throughout the CMP process, because pressure, rotational speed, and slurry flow rate are all tightly controlled in actual wafer CMP. These process parameters are essentially fixed values that ensure the accuracy and uniformity of wafer material removal. From a physics perspective, pressure and rotational speed are key factors affecting the MRR, but from a computational perspective, including them in the GMDH neural network model does not increase the MRR prediction accuracy.

When the polishing sample data set is a rough polishing sample data set, the effective process variables are: backing film consumption, polishing pad consumption, partition film consumption, and flexible plate consumption; the remaining variables are narrowly distributed and essentially constant, and therefore cannot be used as predictors for the model;

or, when the polishing sample data set is a fine polishing sample data set, the effective process variables are: backing film consumption, polishing pad consumption, and partition film consumption.
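The width-based selection of effective process variables can be sketched as follows. The function name `select_effective_variables` and the dictionary layout are assumptions for illustration; only the bounds 0.12 and 11 and the definition of width (maximum minus minimum) come from the text.

```python
def select_effective_variables(samples, min_width=0.12, max_width=11.0):
    """Keep variables whose width (max - min) lies in [min_width, max_width].

    samples: dict mapping a variable name to its list of recorded values.
    Returns a dict mapping each selected variable name to its width.
    """
    selected = {}
    for name, values in samples.items():
        width = max(values) - min(values)
        if min_width <= width <= max_width:
            selected[name] = width
    return selected
```

Tightly controlled variables such as pressure or rotational speed have a width close to zero and are filtered out, while consumption variables that drift as parts wear are kept.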

In the method for predicting the wafer CMP material removal rate using a GMDH neural network as described above, in step (4), when the polishing sample data set is the rough polishing sample data set, m is 8, and the process variable features corresponding to the input feature vectors are: the mean of the backing film consumption, the skewness of the backing film consumption, the mean of the polishing pad consumption, the standard deviation of the polishing pad consumption, the skewness of the polishing pad consumption, the mean of the partition film consumption, the skewness of the partition film consumption, and the mean of the flexible plate consumption;

or, when the polishing sample data set is the fine polishing sample data set, m is 8, and the process variable features corresponding to the input feature vectors are: the skewness of the backing film consumption, the mean of the polishing pad consumption, the standard deviation of the polishing pad consumption, the skewness of the polishing pad consumption, the kurtosis of the polishing pad consumption, the mean of the partition film consumption, the standard deviation of the partition film consumption, and the skewness of the partition film consumption.

The manner of obtaining the characteristic values is well known.
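As an illustration of steps (3) and (4), the sketch below computes the four statistics for one variable and screens candidate features by their correlation with the MRR. Several choices are assumptions not stated in the source: population (biased) moments are used, the Pearson coefficient stands in for the "regression correlation analysis", and a feature is kept when the absolute correlation reaches the threshold. All function names are hypothetical.

```python
import math


def summary_features(values):
    """Return (mean, standard deviation, skewness, kurtosis) of one variable."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(m2)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # constant trace: higher moments undefined
    return mean, std, m3 / std ** 3, m4 / std ** 4


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen_features(features, mrr, threshold):
    """Names of feature columns whose |correlation| with MRR is >= threshold."""
    return [name for name, col in features.items()
            if abs(pearson(col, mrr)) >= threshold]
```

With b effective variables, `summary_features` yields the 4 × b candidate features per wafer, and `screen_features` reduces them to the m inputs of the network.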

In the method for predicting the wafer CMP material removal rate using a GMDH neural network as described above, in step (5), the data set A formed by the m feature vectors is:

$$A = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{bmatrix}$$

where $(x_{1,b}, x_{2,b}, \ldots, x_{a,b}, \ldots, x_{n,b})^T$ is the b-th feature vector in data set A, $x_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A, $m = 8$, $a \in \{1, 2, \ldots, n\}$, and $b \in \{1, 2, \ldots, m\}$;

the normalization processing normalizes the feature vectors $(x_{1,b}, x_{2,b}, \ldots, x_{n,b})^T$ of data set A one by one to obtain a normalized data set, using the formula:

$$x_{normalized} = \frac{x_{actual} - x_{min}}{x_{max} - x_{min}}$$

where $x_{normalized}$ is the normalized feature value in the b-th feature vector, $x_{actual}$ is the feature value in the b-th feature vector, $x_{max}$ is the largest feature value in the b-th feature vector, $x_{min}$ is the smallest feature value in the b-th feature vector, and $b \in \{1, 2, \ldots, m\}$;

from the normalized data set, $n_1$ samples are randomly selected as the training feature set A′, denoted:

$$A' = \begin{bmatrix} x'_{1,1} & x'_{1,2} & \cdots & x'_{1,m} \\ x'_{2,1} & x'_{2,2} & \cdots & x'_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x'_{n_1,1} & x'_{n_1,2} & \cdots & x'_{n_1,m} \end{bmatrix}$$

where $(x'_{1,b}, x'_{2,b}, \ldots, x'_{n_1,b})^T$ is the b-th feature vector in the training feature set A′, $x'_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A′, $n_1$ is the training sample size, $m = 8$, $a \in \{1, 2, \ldots, n_1\}$, and $b \in \{1, 2, \ldots, m\}$.
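The min-max normalization and random train/test split of step (5) can be sketched as follows. The function names and the fixed `seed` argument are illustrative assumptions; the source specifies only that each feature vector is scaled by its own minimum and maximum and that the training samples are chosen at random from the normalized data set.

```python
import random


def min_max_normalize(column):
    """Scale one feature vector to [0, 1] using its own min and max."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # constant column: map everything to 0
    return [(x - lo) / (hi - lo) for x in column]


def normalize_dataset(rows):
    """Normalize a list of samples (rows) feature by feature (column-wise)."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def split_train_test(rows, targets, n_train, seed=0):
    """Randomly pick n_train samples for training; the rest form the test set."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    train, test = idx[:n_train], idx[n_train:]
    return ([rows[i] for i in train], [targets[i] for i in train],
            [rows[i] for i in test], [targets[i] for i in test])
```

Here `n_train` plays the role of $n_1$, and the remaining $n_2 = n - n_1$ samples form the test feature set.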

In the method for predicting the wafer CMP material removal rate of the GMDH neural network, in the step (6), the step of training the GMDH neural network model specifically includes:

(61) establishing a 1 st hidden layer:

(611) arbitrarily taking two feature vectors $X_i$ and $X_j$ from the 8 feature vectors in the training feature set A′ and creating $G_1$ second-order polynomial equations as basic neurons, so that the total number of basic neurons in the 1st hidden layer is

$$G_1 = \binom{m}{2} = \frac{m(m-1)}{2}, \quad G_1 > P,$$

where $m = 8$ (so $G_1 = 28$), and P is the threshold on the maximum total number of neurons in each hidden layer;

(612) computing, by Volterra quadratic polynomial regression, the model corresponding to each basic neuron in the 1st hidden layer:

$$\hat{y}^{(1)}_{a,r} = w_0 + w_1 x'_{a,i} + w_2 x'_{a,j} + w_3 x'_{a,i} x'_{a,j} + w_4 {x'_{a,i}}^2 + w_5 {x'_{a,j}}^2,$$

where $\hat{Y}^{(1)}_r = (\hat{y}^{(1)}_{1,r}, \hat{y}^{(1)}_{2,r}, \ldots, \hat{y}^{(1)}_{n_1,r})^T$ is the predicted target output vector of the quadratic polynomial equation, $\hat{y}^{(1)}_{a,r}$ is the predicted target output of the a-th sample in the r-th basic neuron model of the 1st hidden layer, $x'_{a,i}$ and $x'_{a,j}$ are the i-th and j-th input feature values of the a-th sample, selected from the inputs as the elements connected to form the r-th basic neuron model of the 1st hidden layer, $a \in \{1, 2, \ldots, n_1\}$, and $i, j \in \{1, 2, \ldots, m\}$;

$\{w_0, w_1, w_2, w_3, w_4, w_5\}$ are the coefficients of the quadratic polynomial equation, obtained by the least squares method with the objective of minimizing the difference between $\hat{Y}^{(1)}_r$ and $Y$, where $Y = (y_1, y_2, \ldots, y_{n_1})^T$ is the vector of actual MRR values of the $n_1$ training samples in the training feature set A′;

(613) computing the output root mean square error (RMSE) of each neuron in the 1st hidden layer, namely

$$\mathrm{RMSE}^{(1)}_r = \sqrt{\frac{1}{n_1} \sum_{a=1}^{n_1} \left(\hat{y}^{(1)}_{a,r} - y_a\right)^2},$$

where $\hat{y}^{(1)}_{a,r}$ is the predicted target output of the a-th sample in the r-th basic neuron model of the 1st hidden layer, and $y_a$ is the actual target output of the a-th sample;

(614) sorting all neurons in the 1st hidden layer by output RMSE in ascending order and keeping the first P neurons as effective neurons to form the 1st hidden layer;

(615) taking each output of the P neurons in the 1st hidden layer as an input feature vector of the 2nd hidden layer; the 2nd hidden layer forms $G_2$ basic neurons, with

$$G_2 = \binom{P}{2} = \frac{P(P-1)}{2}, \quad G_2 > P;$$

repeating steps (611) to (614) to obtain a 2nd hidden layer containing P neurons, where $U_1$ is the number of effective neurons in the 1st hidden layer that participate in the combinations and connections forming the 2nd hidden layer;

(616) computing the mean $E_1$ of the output RMSE of the $U_1$ effective neurons in the 1st hidden layer that participate in forming the 2nd hidden layer, i.e.

$$E_1 = \frac{1}{U_1} \sum_{r=1}^{U_1} \mathrm{RMSE}^{(1)}_r,$$

where $\mathrm{RMSE}^{(1)}_r$ is the output RMSE of the r-th of these $U_1$ effective neurons;

(62) Establishing the intermediate hidden layers: taking each output of the P neurons in the (k-1)-th hidden layer (which plays the role of the 1st hidden layer) as an input feature vector of the k-th hidden layer (which plays the role of the 2nd hidden layer), and repeating steps (611) to (616) to obtain the intermediate hidden layers;

where the total number of basic neurons in the k-th hidden layer is

$$G_k = \binom{P}{2} = \frac{P(P-1)}{2}, \quad k \ge 2;$$

and, in the same way as $E_1$ in step (616), the mean $E_k$ of the output RMSE of the $U_k$ effective neurons of the k-th hidden layer that participate in the combinations and connections forming the following hidden layer is

$$E_k = \frac{1}{U_k} \sum_{r=1}^{U_k} \mathrm{RMSE}^{(k)}_r;$$

(63) Establishing the output layer: when $E_{k-1} - E_k \le 0.3$ (i.e., $E_k$ of the k-th hidden layer no longer decreases significantly as the number of hidden layers increases), training stops; the 2 neurons with the smallest output RMSE in the k-th hidden layer are taken as new input vectors and, together with the corresponding target output MRR vector, used to construct a quadratic polynomial equation, whose output is the final output prediction of the GMDH neural network model.

The method for predicting the wafer CMP material removal rate of the GMDH neural network as described above, where P is 12.
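The construction of one hidden layer, steps (611) to (614), can be sketched with the standard library as follows. This is an illustration under stated assumptions: the ordering of the six polynomial terms, the use of the normal equations solved by Gaussian elimination, and all function and key names are choices made here and are not taken from the source. No regularization is applied, so nearly collinear inputs raise an error.

```python
import math
from itertools import combinations


def quad_terms(xi, xj):
    """Terms of the two-variable quadratic polynomial (assumed ordering)."""
    return [1.0, xi, xj, xi * xj, xi * xi, xj * xj]


def solve(a, b):
    """Solve a.w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        if abs(m[c][c]) < 1e-12:
            raise ValueError("singular system: inputs are (nearly) collinear")
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][k] * w[k] for k in range(r + 1, n))) / m[r][r]
    return w


def fit_neuron(col_i, col_j, y):
    """Least-squares fit of one basic neuron; returns (weights, outputs, RMSE)."""
    rows = [quad_terms(a, b) for a, b in zip(col_i, col_j)]
    # Normal equations: (Phi^T Phi) w = Phi^T y
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(6)] for p in range(6)]
    aty = [sum(r[p] * t for r, t in zip(rows, y)) for p in range(6)]
    w = solve(ata, aty)
    pred = [sum(wk * tk for wk, tk in zip(w, r)) for r in rows]
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))
    return w, pred, rmse


def build_layer(columns, y, p_max):
    """Fit every pair of input columns and keep the p_max lowest-RMSE neurons."""
    neurons = []
    for i, j in combinations(range(len(columns)), 2):
        w, pred, rmse = fit_neuron(columns[i], columns[j], y)
        neurons.append({"inputs": (i, j), "weights": w,
                        "output": pred, "rmse": rmse})
    neurons.sort(key=lambda nrn: nrn["rmse"])
    return neurons[:p_max]
```

With 8 input columns, `build_layer` evaluates 28 candidate neurons and keeps `p_max` of them (12 in the text); the `output` lists of the kept neurons would then serve as the input columns of the next layer.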

Advantageous effects

(1) According to the wafer CMP material removal rate virtual prediction method of the GMDH neural network, the advantages of single-cycle prediction capability and short operation time of the GMDH network are utilized, and the optimal feature set is self-organized and selected to establish a wafer MRR prediction model. The combination of comprehensive consideration of physics knowledge and statistics overcomes the defects that the traditional prediction method only considers few variables and neglects the influence of important consumption variables on MRR. The problem that the grinding removal rate in the wafer CMP process cannot be rapidly and accurately obtained is solved.

(2) According to the virtual prediction method for the removal rate of the CMP material of the wafer of the GMDH neural network, error sources such as drift in the polishing process and difference among wafer products in different batches are comprehensively considered, and in order to improve the accuracy of a prediction model, two modes of rough polishing and fine polishing of the wafer are respectively modeled.

(3) According to the virtual prediction method for the removal rate of the CMP material of the wafer of the GMDH neural network, the average value of consumption variables of a polishing pad, a backing film and a flexible plate is selected as a most effective prediction factor from a rough polishing training network structure. The most effective variable widely used in the input layer polynomial equation at this time is the polishing pad consumption average.

(4) According to the virtual prediction method for the removal rate of the wafer CMP material of the GMDH neural network, the mean value, the standard deviation and the skewness of the consumption of the polishing pad, the skewness of the backing film and the skewness of the consumption variable of the partition film are selected as the most effective prediction factors from the structure of fine polishing training, and the most effective variable widely used in the polynomial equation of the input layer is the standard deviation of the consumption of the polishing pad.

(5) With the virtual prediction method for the wafer CMP material removal rate using a GMDH neural network, an accurate prediction of the MRR value is obtained quickly from the network model. In both the fine polishing mode and the rough polishing mode, the wear consumption of the polishing pad has the largest influence on the target output MRR. Therefore, if the predicted value is outside the MRR range, the process parameters are adjusted in time, for example by promptly replacing worn consumables such as the polishing pad and the dresser; the polishing pad in particular is treated as a key maintenance component of the wafer CMP process. Based on the 2016 PHM Data Challenge data and the computed experimental prediction error indices, the proposed MRR prediction method is shown to outperform the physical model and a traditional neural network. The method is suitable for modeling nonlinear, complex CMP processes.

Drawings

FIGS. 1 and 2 are flow charts of wafer removal rate prediction according to the present invention;

FIG. 3(a) shows the detection results of 4 MRR abnormal values according to the present invention;

FIG. 3(b) is a graph of MRR value distribution for two modes of operation in a CMP process of the present invention;

FIG. 4 is a schematic diagram of a GMDH network structure trained during a wafer rough polishing stage according to the present invention;

FIGS. 5 and 6 are predicted results of the rough polishing stage and the fine polishing stage, respectively, of the present invention;

FIG. 7 is a graph of statistical analysis of process variables in the rough polishing mode according to the present invention.

Detailed Description

The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

A method for predicting a wafer CMP material removal rate of a GMDH neural network, the flow diagram of which is shown in FIGS. 1-2, comprises the following steps:

(1) acquiring a polishing sample data set after removing the abnormal value; where the number of samples is n, each sample contains 25 process variables and corresponding MRR values (i.e., material removal rates);

the process variables include chamber pressure, main and external pressure, center pressure, retaining ring pressure, ripple pressure, edge pressure, dresser rotation speed, wafer rotation speed, polishing table rotation speed, A-type slurry flow rate, B-type slurry flow rate, C-type slurry flow rate, consumption of the polishing table backing film, consumption of the polishing pad, consumption of the wafer carrier flexible plate, consumption of the partition film, consumption of the dresser table, dressing liquid state, the chamber used for wafer processing, process stage, wafer identifier, timestamp, wafer ring position identifier, and polishing machine identifier. The first 18 variables, the main process variables, are selected for statistical distribution analysis. The remaining identifier variables have little impact on the target output and are therefore not analyzed as predictors.

Abnormal values are removed as follows: the Grubbs test is used to detect abnormal values in order to improve the prediction accuracy, where an abnormal value is an extreme maximum or minimum caused by a sensor measurement failure or by random errors in the process parameters.

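(2) Performing statistical analysis on the main process variables generated by polishing a plurality of wafers in the polishing sample data set, and determining b effective process variables whose statistical width is in the range 0.12 to 11 (the width is the difference between the maximum and minimum values of the variable); wherein b < a;

(3) extracting the mean, standard deviation, skewness and kurtosis of each effective process variable to obtain 4 × b feature vectors;

(4) screening the 4 × b feature vectors against the corresponding MRR values by a regression correlation analysis method and, after setting a correlation coefficient threshold, determining m feature vectors as the input feature vectors of the GMDH neural network model; wherein m = 8 and m < (4 × b);

(5) normalizing the data set A formed by the m feature vectors to obtain a training feature set A′ and a test feature set, where the sample size of the training feature set is $n_1$, the sample size of the test feature set is $n_2$, and $n = n_1 + n_2$; the specific process is as follows:

the data set A formed by the m feature vectors is:

$$A = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,m} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,m} \end{bmatrix}$$

where $x_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A, $m = 8$, $a \in \{1, 2, \ldots, n\}$, and $b \in \{1, 2, \ldots, m\}$; the feature vectors of A are normalized one by one using the formula

$$x_{normalized} = \frac{x_{actual} - x_{min}}{x_{max} - x_{min}}$$

where $x_{normalized}$ is the normalized feature value in the b-th feature vector, $x_{actual}$ is the feature value in the b-th feature vector, $x_{max}$ is the largest feature value in the b-th feature vector, $x_{min}$ is the smallest feature value in the b-th feature vector, and $b \in \{1, 2, \ldots, m\}$;

from the normalized data set, $n_1$ samples are randomly selected as the training feature set A′, denoted:

$$A' = \begin{bmatrix} x'_{1,1} & x'_{1,2} & \cdots & x'_{1,m} \\ x'_{2,1} & x'_{2,2} & \cdots & x'_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ x'_{n_1,1} & x'_{n_1,2} & \cdots & x'_{n_1,m} \end{bmatrix}$$

where $(x'_{1,b}, x'_{2,b}, \ldots, x'_{n_1,b})^T$ is the b-th feature vector in the training feature set A′, $x'_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A′, $n_1$ is the training sample size, $m = 8$, $a \in \{1, 2, \ldots, n_1\}$, and $b \in \{1, 2, \ldots, m\}$;

(6) adopting a two-variable quadratic Volterra polynomial regression model, and training the GMDH neural network model with the m feature vectors of the training feature set A′ as the input layer and the corresponding MRR values as the output layer, i.e. on the input-output pairs

$$\left(x'_{a,1}, x'_{a,2}, \ldots, x'_{a,m};\; y_a\right), \quad a = 1, 2, \ldots, n_1,$$

where $x'_{a,b}$ is the feature value of the a-th sample in the b-th feature vector of A′, $y_a$ is the actual MRR value corresponding to the a-th sample in A′, $n_1$ is the training sample size, $a \in \{1, 2, \ldots, n_1\}$, and $b \in \{1, 2, \ldots, m\}$;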

The specific steps for training the GMDH neural network model are as follows:

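(61) establishing the 1st hidden layer:

(611) arbitrarily taking two feature vectors $X_i$ and $X_j$ from the 8 feature vectors in the training feature set A′ and creating $G_1$ second-order polynomial equations as basic neurons, so that the total number of basic neurons in the 1st hidden layer is

$$G_1 = \binom{m}{2} = \frac{m(m-1)}{2}, \quad G_1 > P,$$

where $m = 8$, and P is the threshold on the maximum total number of neurons in each hidden layer; $P = 12$;

(612) computing, by Volterra quadratic polynomial regression, the model corresponding to each basic neuron in the 1st hidden layer:

$$\hat{y}^{(1)}_{a,r} = w_0 + w_1 x'_{a,i} + w_2 x'_{a,j} + w_3 x'_{a,i} x'_{a,j} + w_4 {x'_{a,i}}^2 + w_5 {x'_{a,j}}^2,$$

where $\hat{Y}^{(1)}_r = (\hat{y}^{(1)}_{1,r}, \ldots, \hat{y}^{(1)}_{n_1,r})^T$ is the predicted target output vector of the quadratic polynomial equation, $\hat{y}^{(1)}_{a,r}$ is the predicted target output of the a-th sample in the r-th basic neuron model of the 1st hidden layer, r is the serial number of the basic neuron in the 1st hidden layer, $x'_{a,i}$ and $x'_{a,j}$ are the i-th and j-th input feature values of the a-th sample, selected from the inputs as the elements connected to form the r-th basic neuron model of the 1st hidden layer, $a \in \{1, 2, \ldots, n_1\}$, and $i, j \in \{1, 2, \ldots, m\}$;

$\{w_0, w_1, w_2, w_3, w_4, w_5\}$ are the coefficients of the quadratic polynomial equation, obtained by the least squares method with the objective of minimizing the difference between $\hat{Y}^{(1)}_r$ and $Y$, where $Y = (y_1, y_2, \ldots, y_{n_1})^T$ is the vector of actual MRR values of the $n_1$ training samples in A′;

(613) computing the output root mean square error (RMSE) of each neuron in the 1st hidden layer, namely

$$\mathrm{RMSE}^{(1)}_r = \sqrt{\frac{1}{n_1} \sum_{a=1}^{n_1} \left(\hat{y}^{(1)}_{a,r} - y_a\right)^2},$$

where $y_a$ is the actual target output of the a-th sample and r is the serial number of the basic neuron in the 1st hidden layer;

(614) sorting all neurons in the 1st hidden layer by output RMSE in ascending order and keeping the first P neurons as effective neurons to form the 1st hidden layer;

(615) taking each output of the P neurons in the 1st hidden layer as an input feature vector of the 2nd hidden layer, where $P = 12$; the 2nd hidden layer forms $G_2$ basic neurons, with

$$G_2 = \binom{P}{2} = \frac{P(P-1)}{2}, \quad G_2 > P;$$

repeating steps (611) to (614) to obtain a 2nd hidden layer containing P neurons, where $P = 12$ and $U_1$ is the number of effective neurons in the 1st hidden layer that participate in the combinations and connections forming the 2nd hidden layer;

(616) computing the mean $E_1$ of the output RMSE of the $U_1$ effective neurons in the 1st hidden layer that participate in forming the 2nd hidden layer, i.e.

$$E_1 = \frac{1}{U_1} \sum_{r=1}^{U_1} \mathrm{RMSE}^{(1)}_r,$$

where r is the serial number of each of the $U_1$ effective neurons in the 1st hidden layer;

(62) establishing the intermediate hidden layers: taking each output of the P neurons in the (k-1)-th hidden layer as an input feature vector of the k-th hidden layer, and repeating steps (611) to (616) to obtain the intermediate hidden layers;

where the total number of basic neurons in the k-th hidden layer is

$$G_k = \binom{P}{2} = \frac{P(P-1)}{2}, \quad k \ge 2;$$

and, in the same way as $E_1$ in step (616), the mean $E_k$ of the output RMSE of the $U_k$ effective neurons of the k-th hidden layer that participate in forming the following hidden layer is

$$E_k = \frac{1}{U_k} \sum_{r=1}^{U_k} \mathrm{RMSE}^{(k)}_r;$$

(63) establishing the output layer: when $E_{k-1} - E_k \le 0.3$, training stops; the 2 neurons with the smallest output RMSE in the k-th hidden layer are taken as new input vectors and, together with the corresponding target output MRR vector, used to construct a quadratic polynomial equation, whose output is the final output prediction of the GMDH neural network model.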

The above method for predicting the wafer CMP material removal rate using a GMDH neural network was applied to a rough polishing sample data set collected by sensors on CMP equipment, with MRR values of 140-170 nm/min (the MRR value distribution is shown in FIG. 3(b)). The data come from the 2016 PHM Data Challenge. The Grubbs test was used to detect abnormal values, and 4 MRR values far greater than 170 nm/min were identified as abnormal (the detection result is shown in FIG. 3(a)); removing them gave the rough polishing sample data set. The number of samples in this data set is n = 102, and each sample contains 25 process variables and the corresponding MRR value (i.e., material removal rate);

in the rough polishing sample data set after removal of the abnormal values, five wafer polishing samples were randomly selected for statistical analysis of the process variables (as shown in FIG. 7), and 4 effective process variables were determined (variables with a data width of 0.12 to 11, namely backing film consumption, polishing pad consumption, partition film consumption, and flexible plate consumption); the distribution range of the backing film consumption is 10.83, that of the polishing pad consumption is 9.63, that of the partition film consumption is 3.25, and that of the flexible plate consumption is 0.12; the remaining variables are narrowly distributed and essentially constant, and therefore cannot be used as predictors for the model;

extracting the mean value, standard deviation, skewness and kurtosis of each effective process variable in the rough polishing sample data set to obtain 16 feature vectors;
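A minimal sketch of this feature extraction step follows. It assumes each effective process variable is available as a one-dimensional series per wafer; the sample standard deviation (`ddof=1`) and SciPy's default excess kurtosis are choices of this sketch, since the patent does not state which estimators are used, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

def wafer_features(series_by_variable):
    """Map {variable name: 1-D series} to a dict with 4 statistics per variable."""
    features = {}
    for name, series in series_by_variable.items():
        x = np.asarray(series, dtype=float)
        features[f"{name}_mean"] = float(x.mean())
        features[f"{name}_std"] = float(x.std(ddof=1))       # sample standard deviation
        features[f"{name}_skew"] = float(stats.skew(x))
        features[f"{name}_kurt"] = float(stats.kurtosis(x))  # excess (Fisher) kurtosis
    return features

# Four effective variables give 4 x 4 = 16 features; the values are illustrative.
wafer = {
    "backing_film": [1.0, 2.0, 3.0, 4.0],
    "polishing_pad": [2.0, 2.5, 2.7, 3.9],
    "partition_film": [0.1, 0.4, 0.2, 0.9],
    "flexible_board": [0.01, 0.02, 0.05, 0.03],
}
feats = wafer_features(wafer)
```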

screening the correlation between the 16 feature vectors and the corresponding MRR values in the rough polishing sample data set by a regression correlation analysis method with a correlation coefficient threshold of 0.65, whereby 8 strongly correlated feature vectors are determined as the input feature vectors of the GMDH neural network model, as shown in Table 1:

TABLE 1
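The screening step can be sketched as a threshold on a correlation coefficient. The patent only names a "regression correlation analysis method", so the use of the Pearson coefficient and of its absolute value are assumptions of this sketch, as are the function name and the toy data.

```python
import numpy as np

def screen_features(X, y, names, threshold=0.65):
    """Keep the features whose |Pearson r| with the MRR reaches the threshold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r) >= threshold:
            kept.append(name)
    return kept

# Toy data: feature "a" is linear in y, feature "b" merely alternates in sign.
y = np.arange(10.0)
X = np.column_stack([2.0 * y + 1.0, np.where(np.arange(10) % 2 == 0, 1.0, -1.0)])
selected = screen_features(X, y, ["a", "b"])
```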

A schematic diagram of the trained GMDH network structure is shown in FIG. 4. It shows that the input vector most widely used by the network from the input layer onward is X₃ (i.e., the mean polishing pad consumption in Table 1), which is the most effective predictor of the MRR. In the 1st hidden layer, the neurons

are eliminated: according to the model evaluation criterion, the new next-layer neurons formed by combining them have a larger output RMSE, which indicates that their correlation with the output MRR is weak, so they do not participate in the network connections of the next layer. The neurons

are then the effective neurons of the 1st hidden layer. When the 4th hidden layer is constructed, the mean value of the RMSE output by all effective neurons no longer decreases appreciably as the number of hidden layers increases, so training is stopped and a GMDH neural network topology containing 4 hidden layers is obtained, i.e., the GMDH network model.
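How one hidden layer is built and when training stops can be sketched as follows. This is an illustration under assumptions rather than the patented implementation: each basic neuron is taken to be a two-input quadratic polynomial fitted by ordinary least squares, the model evaluation criterion is taken to be the RMSE on a held-out subset, and the helper names (`design`, `fit_neuron`, `build_layer`, `should_stop`) are hypothetical. The 0.3 tolerance is the value stated in step (63) of the claims.

```python
import itertools
import numpy as np

def design(xi, xj):
    """Terms of the two-input quadratic polynomial: 1, xi, xj, xi*xj, xi^2, xj^2."""
    return np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi**2, xj**2])

def fit_neuron(xi, xj, y):
    """Least-squares estimate of the six coefficients w0..w5 of one basic neuron."""
    w, *_ = np.linalg.lstsq(design(xi, xj), y, rcond=None)
    return w

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def build_layer(X_fit, y_fit, X_val, y_val, keep):
    """Fit one neuron per input pair; keep the `keep` neurons with the lowest RMSE.

    Scoring on a held-out subset is an assumption of this sketch.
    """
    neurons = []
    for i, j in itertools.combinations(range(X_fit.shape[1]), 2):
        w = fit_neuron(X_fit[:, i], X_fit[:, j], y_fit)
        err = rmse(y_val, design(X_val[:, i], X_val[:, j]) @ w)
        neurons.append((err, i, j, w))
    neurons.sort(key=lambda item: item[0])
    return neurons[:keep]

def should_stop(prev_mean_rmse, cur_mean_rmse, tol=0.3):
    """Stop when the mean RMSE improves by no more than tol between layers."""
    return prev_mean_rmse - cur_mean_rmse <= tol

# Synthetic check: y depends only on inputs 0 and 1, so that pair should win.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1]
layer = build_layer(X[:30], y[:30], X[30:], y[30:], keep=2)
```

With 8 input feature vectors a layer has C(8,2) = 28 candidate neurons, one per pair of inputs.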

inputting the 8 input feature values of the corresponding test feature set into the trained GMDH network model, and outputting the predicted MRR values;

comparing the predicted MRR values with the MRR values corresponding to the 8 input feature values in the test feature set to obtain the prediction accuracy of the model; a schematic diagram of the prediction result is shown in FIG. 5, and in the rough polishing working mode the accuracy of the prediction result is as follows: the mean square error MSE is 3.95 and the root mean square error RMSE is 1.99.

The above method for predicting the wafer CMP material removal rate with a GMDH neural network is likewise applied to a fine polishing sample data set whose MRR values, acquired by sensors on the CMP equipment, lie in the range 50-110 nm/min (the MRR value distribution is shown in FIG. 3(b)); the data come from the 2016 PHM Data Challenge; the Grubbs test is used to detect abnormal values (the detection result is shown in FIG. 3(a)), and the fine polishing sample data set with the abnormal values removed is obtained; the number of samples is n = 105, and each sample contains 25 process variables and the corresponding MRR value (i.e., material removal rate);

analyzing the samples generated by a single wafer in the fine polishing sample data set after removal of the abnormal values, and determining 3 effective process variables, namely the variables whose data width lies in the range 0.12-11: backing film consumption, polishing pad consumption and partition film consumption;

extracting the mean value, standard deviation, skewness and kurtosis of each effective process variable to obtain 12 feature vectors;

screening the correlation between the 12 feature vectors and the corresponding MRR values by a regression correlation analysis method with a correlation coefficient threshold of 0.7, whereby 8 feature vectors are determined as the input feature vectors of the GMDH neural network model, as shown in Table 2:

TABLE 2

inputting the 8 input feature values of the test feature set into the trained GMDH network model, and outputting the predicted MRR values;

comparing the predicted MRR values with the MRR values corresponding to the 8 input feature values in the test feature set to obtain the prediction accuracy of the model; a schematic diagram of the prediction result is shown in FIG. 6, and in the fine polishing working mode the accuracy of the prediction result is as follows: the mean square error MSE is 9.82 and the root mean square error RMSE is 3.13.

Table 1 shows the detailed predicted results of the training samples and the test samples under two different working modes

The model fitted to the training samples in Table 1 also deviates from the true values; this deviation is called the training error.

The prediction results show that the MRR values predicted by the GMDH network agree well with the measured values. When the network topology is established, a balance is struck between the fitting accuracy on the training samples and the prediction accuracy on the test set, so that even with small samples or high noise the algorithm reflects the true internal relationship of the system (the nonlinear relationship between each consumption feature and the MRR value) as fully as possible, which in turn ensures the optimality and generalization of the established model. The model can effectively monitor real-time changes of the MRR during the wafer CMP process. The mean square error (MSE) and root mean square error (RMSE) are used as model performance evaluation indicators; the smaller the RMSE, the higher the prediction accuracy of the model.
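The two evaluation indicators are related by RMSE = sqrt(MSE). The short sketch below defines both and checks that the figures reported for the two working modes are mutually consistent; the function names are illustrative and the small arrays are not data from the patent.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean square error between measured and predicted MRR values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean square error, i.e. the square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))

# Consistency of the reported figures: RMSE should equal sqrt(MSE).
rough_rmse = round(3.95 ** 0.5, 2)   # 1.99, rough polishing mode
fine_rmse = round(9.82 ** 0.5, 2)    # 3.13, fine polishing mode
```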

Claims

1. A method for predicting the wafer CMP material removal rate with a GMDH neural network, characterized in that the method comprises the following steps:

(1) acquiring a polishing sample data set after removing the abnormal value; wherein the number of samples is n, each sample contains a process variables and corresponding MRR values;

(2) performing statistical analysis on main process variables generated by polishing a plurality of wafers in the polishing sample data set to determine b effective process variables; wherein b < a;

(3) extracting the mean value, standard deviation, skewness and kurtosis of each effective process variable to obtain 4 x b characteristic vectors;

(4) screening the 4 x b characteristic vectors and corresponding MRR values by adopting a regression correlation analysis method, and determining m characteristic vectors as input characteristic vectors of the GMDH neural network model after setting a correlation coefficient threshold;

(5) carrying out normalization processing on a data set A formed by the m feature vectors to obtain a training feature set A', wherein the sample size of the training feature set A' is n₁, and n₁ < n;

(6) adopting a binary quadratic Volterra polynomial regression model, taking the m feature vectors in the training feature set A' of sample size n₁ as the input layer and the corresponding MRR values in the training feature set as the output layer, and training to obtain the GMDH neural network model, namely:

wherein

(7) inputting the m input feature values of the sample to be predicted into the trained GMDH network model, and outputting the predicted MRR value.

2. The method according to claim 1, wherein in the step (1), the polishing sample data set is acquired by a sensor on a CMP device, and when the MRR value is 140-170 nm/min, the polishing sample data set is a rough polishing sample data set; and when the MRR value is 50-110 nm/min, the polishing sample data set is a fine polishing sample data set.

3. The method of claim 2, wherein in step (2), the effective process variable is a variable with a statistical width in the range of 0.12-11.

4. The method of claim 3, wherein when the polishing sample data set is a rough polishing sample data set, the effective process variables are: backing film consumption, polishing pad consumption, partition film consumption and flexible board consumption;

5. The method as claimed in claim 4, wherein in the step (4), when the polishing sample data set is a rough polishing sample data set, m is 8, and the input process variable features corresponding to the input feature vectors are respectively: the mean of the backing film consumption, the skewness of the backing film consumption, the mean of the polishing pad consumption, the standard deviation of the polishing pad consumption, the skewness of the polishing pad consumption, the mean of the partition film consumption, the skewness of the partition film consumption and the mean of the flexible board consumption;

6. The method of claim 5, wherein in step (5),

the data set formed by the m feature vectors is A, as follows:

in the normalized data set, n₁ samples are randomly selected as the training feature set A', denoted as:

wherein

7. The method of claim 6, wherein the step (6) of training the GMDH neural network model comprises the following steps:

(61) establishing the 1st hidden layer:

(612) calculating, by Volterra quadratic polynomial regression, the model corresponding to each basic neuron in the 1st hidden layer

wherein

is the predicted target output vector of the quadratic polynomial equation,

{w₀, w₁, w₂, w₃, w₄, w₅} are the coefficients of the quadratic polynomial equation

And

A value; namely, it is

wherein

(615) the outputs of the P neurons in the 1st hidden layer are used as the input feature vectors of the 2nd hidden layer; the 2nd hidden layer forms G₂ basic neurons, and

steps (611) to (614) are repeated to obtain a 2nd hidden layer containing P neurons, wherein the U₁ neurons of the 1st hidden layer that participate in the combinations and connections forming the 2nd hidden layer are the effective neurons;

the mean value E₁, i.e.

wherein G_k denotes the total number of basic neurons in the k-th hidden layer, and E_k denotes the mean value of the RMSE output by all effective neurons in the k-th hidden layer;

(63) establishing the output layer: when E_{k-1} - E_k ≤ 0.3, training is stopped; the 2 neurons with the smallest output RMSE in the k-th hidden layer are taken as the new input vectors, a quadratic polynomial equation is constructed from these new input vectors and the corresponding target output MRR vector, and the output of this equation is used as the final output prediction value of the GMDH neural network model.

8. The method of claim 7, wherein P = 12.