CN110766660A

CN110766660A - Integrated circuit defect image recognition and classification system based on fusion deep learning model

Info

Publication number: CN110766660A
Application number: CN201910910060.XA
Authority: CN
Inventors: 林义征
Original assignee: Shanghai Zhongyi Cloud Computing Technology Co ltd
Current assignee: Shanghai Zhongyi Cloud Computing Technology Co ltd
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2020-02-07

Abstract

The invention discloses an integrated circuit defect image recognition and classification system based on a fusion deep learning model. The system automatically recognizes and classifies wafer defect images online using a fusion model built on deep convolutional neural networks (CNNs), so that changes in the number of each type of wafer defect can be detected in time. The core mechanism is a defect image feature extraction method built from two deep learning architectures into which a learning mechanism is integrated. The deep CNN fusion model, Combined3, is a defect image classification model constructed from the SE_Inception_V4 and SE_Inception_ResNet_V2 architectures. A sequential model-based optimization (SMBO) algorithm optimizes the hyperparameters of the fused deep CNN recognition model, which improves its recognition accuracy. The system raises the level of automation. It also lowers recognition cost, because the AI model replaces manual inspection by engineers and greatly improves work efficiency. Based on the real-time recognition and classification results, engineers can compile defect statistics promptly, investigate root causes, adjust process parameters and improve yield.

Description

Integrated circuit defect image recognition and classification system based on fusion deep learning model

Technical Field

The invention relates to the field of image recognition and classification systems, in particular to an integrated circuit defect image recognition and classification system based on a fusion deep learning model.

Background

Integrated circuit wafer fabrication builds circuits and electronic devices (such as transistors, capacitors and logic switches) on a wafer. It involves a process sequence that depends on the product type and the technology used. The sequence generally includes slicing, grinding, polishing, chemical vapor deposition, photolithography, etching, ion implantation and chemical mechanical polishing, which together complete the fabrication of circuits and devices on the wafer. Semiconductor manufacturers need to monitor abnormal defect characteristics and respond quickly to process problems. To do so, an online metrology tool inspects the wafer after certain process steps, before it enters the next production stage, and flags abnormalities on the dies. Images of the defects are then captured at the detected abnormal positions, and the images are analyzed either manually or with a Machine Vision Algorithm (MVA). Conventional MVAs include geometric transformation, semantic segmentation, filtering, pattern and texture matching, and advanced morphological methods. Combining these algorithms enhances certain features in the microscopic image, yields the size and shape parameters of the defect, and identifies the defect class [1].

In recent years, vision systems have improved significantly in performance by combining surface defect inspection with machine learning algorithms. In this approach, traditional MVA methods extract key image features, and machine learning methods then learn the patterns of these features to identify defects [1].
- Di Li et al. [2] combined MVA techniques such as binarization and edge detection with principal component analysis. They built an automatic detection system for five typical surface defects on mobile phone cover glass.
- Yunwon Park et al. [3] applied filtering to images of surface shape defects on display panel modules. They selected important features with a wrapper-based feature selection method that used a random forest as the learning algorithm, then classified the defects. This effectively resolved the ambiguous classification of surface defects.
- Matthias Demant et al. [4] extracted key defect characteristics with feature extraction techniques. They then combined linear regression, support vector machine regression and elastic net regression to predict the quality of polycrystalline silicon wafers.
- Kwon et al. [5] fed a simple variance distribution of defect-image pixel intensities into a random forest algorithm. This detected defects on various surface types and effectively reduced false detections.
- Naoaki et al. [6] combined image enhancement, feature extraction and machine learning into an automatic defect classification system for wafer defect images. It improved the accuracy and efficiency of defect identification.
- Chung-Feng Jeffrey Kuo et al. [7] combined traditional MVA, feature extraction and a neural network model into a high-precision automatic detection and classification system for polarizing film defects.

All of these surface defect detection studies consist of an image feature extraction stage and a defect classification stage. In such feature-based modeling, however, the feature extraction algorithm is critical. The discriminative power of the extracted features determines the reliability of the defect detection system, and model performance may be limited by how well the best features are selected [8].

The Convolutional Neural Network (CNN) [9] is a neural network with a deep structure that performs feature extraction and recognition jointly, which greatly reduces the cost of feature engineering. CNNs raise the level of abstraction by building features from lower to higher levels. Lower layers extract detailed features, and higher layers combine them to characterize the image more completely. During training, the network automatically learns the features best suited to classification, which improves performance. In surface defect recognition, deep CNNs are often used to overcome the shortcomings of conventional machine learning.
- Lavrika et al. [10] used a CNN to classify defect images. For defects such as scratches, short circuits and metal line cracks, their recognition and classification rate was significantly higher than that of conventional MVA methods.
- Zhou Ying et al. [11] used a multichannel convolutional neural network to extract and fuse image features from photovoltaic cell defect images, then classified them with a Random Forest (RF) classifier. Their results show that the method accurately identifies both the presence and the category of defects.
- Je-Kang Park et al. [8] applied a CNN to classifying dirt, scratches, burrs and abrasion on part surfaces. In their experiments the CNN outperformed PSO-ICA, Gabor-filter and VOV + RF methods.
- Yu-Shan Deng et al. [12] classified killer and non-killer defects with a deep neural network and markedly improved identification accuracy.
- Hui Lin et al. [13] applied CNNs to LED chip defect detection.
- Nakazawa et al. [14] proposed a CNN-based method for wafer map defect pattern classification and image retrieval.

The CNN architectures used in these studies are relatively simple. Since 2014, deeper and wider CNNs have been proposed to further improve image recognition rates, including VGG [15], GoogLeNet [16], ResNet [17] and SENet [18]. These networks have shown superior performance in image recognition, in some cases exceeding human-level performance.

Existing wafer inspection tools recognize defect image categories poorly, and manual identification is inefficient. To solve these problems, the present invention provides an integrated circuit defect image recognition and classification system based on a fusion deep learning model.

Disclosure of Invention

The object of the invention is to provide an integrated circuit defect image recognition and classification system based on a fusion deep learning model that solves the problems described in the Background.

To achieve this object, the invention provides the following technical solution:

An integrated circuit defect image recognition and classification system based on a fusion deep learning model is provided. It automatically recognizes and classifies wafer defect images online using a fusion model based on deep convolutional neural networks (CNNs), and detects changes in the number of each type of wafer defect in time. The core mechanism is a defect image feature extraction method built from two deep learning architectures into which a learning mechanism is integrated. The deep CNN fusion model, Combined3, is a defect image classification model constructed from the SE_Inception_V4 and SE_Inception_ResNet_V2 architectures. A sequential model-based optimization (SMBO) algorithm optimizes the hyperparameters of the fused deep CNN recognition model, which improves its recognition accuracy.

The fusion deep learning image defect recognition method comprises the following steps:

The first step: connect to the semiconductor fab's database, identify the database tables that hold the data required by the fusion deep learning image defect recognition system, and establish the connection;

The second step: extract the required data from the database and store it in a local database or as local data files;

The third step: preprocess the extracted images with image enhancement operations such as denoising, cropping, rotation and resizing;

The fourth step: extract features from the preprocessed images with the fusion deep learning model, whose hyperparameters are optimized with the SMBO algorithm during training;

The fifth step: reduce the dimensionality of the features extracted in the fourth step and feed them into the fully connected classification layer;

The sixth step: classify the extracted feature data in the fully connected layer, whose hyperparameters are also optimized with the SMBO algorithm during training, yielding an optimized classification model and the image classification results;

The seventh step: evaluate the classification results of the image defect recognition system.
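The preprocessing in the third step can be sketched as follows. This is a minimal illustration, not the system's actual implementation, and it rests on several assumptions. Images are plain Python lists. Grayscale conversion uses the ITU-R BT.601 luma weights, resizing is nearest-neighbour, and denoising is a 3 × 3 median filter. The text does not say which resize or denoising method was used, so these are stand-ins, and all function names are hypothetical.

```python
import statistics


def to_gray(rgb):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale with BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def median_denoise(img, k=3):
    """k x k median filter; at the borders the window is clipped to the image."""
    h, w, r = len(img), len(img[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [img[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1))]
            row.append(statistics.median(window))
        out.append(row)
    return out


def preprocess(rgb, size):
    """Hypothetical third-step pipeline: grayscale, resize to size x size, then denoise."""
    return median_denoise(resize_nearest(to_gray(rgb), size, size))
```

A median filter removes isolated dark or bright pixels (salt-and-pepper noise) while preserving edges better than a mean filter. That property matters when the defect itself is a thin line such as a scratch.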

Data augmentation is the most common way to reduce overfitting in CNN training. It perturbs the original images without changing their class, thereby enlarging the data set. Only cropping is used for augmentation: crops are taken at five positions (the four corners and the center), which enlarges the data five-fold. The training and test sets contain 2000 and 600 wafer defect images respectively. Both have imbalanced class distributions (data skew), and each image is 480 × 480 pixels. For the training set, images are converted to grayscale, resized and denoised with standard image processing techniques. The processed images are then cropped, enlarging the training data to five times its original size. The operating system is CentOS 7, the deep learning framework is TensorFlow, the development language is Python 2.7, and the GPU is an NVIDIA Tesla P4.

Compared with the prior art, the invention has the beneficial effects that:

1. The level of automation is increased.

2. Work efficiency is improved. Unlike human inspectors, the AI model does not tire, and it processes far more images per unit time.

3. Recognition cost is reduced, because the AI model replaces engineers for this task and greatly improves work efficiency.

4. Based on real-time recognition and classification results, engineers can compile defect statistics promptly, investigate root causes, adjust process parameters and improve yield.

Drawings

FIG. 1 is a schematic diagram of an etching process defect in an integrated circuit defect image recognition and classification system based on a fusion deep learning model.

FIG. 2 is a structural diagram of the fused deep CNN image defect recognition system in the integrated circuit defect image recognition and classification system based on a fusion deep learning model.

FIG. 3 is a schematic structural diagram of the fusion deep learning image defect recognition system in the integrated circuit defect image recognition and classification system based on a fusion deep learning model.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art obtains from these embodiments without inventive effort fall within the scope of protection of the invention.

Referring to FIGS. 1 to 3, in an embodiment of the present invention, the integrated circuit defect image recognition and classification system based on a fusion deep learning model uses a convolutional neural network. This is a neural network that mimics how the human brain processes visual images. It consists of convolutional layers, pooling layers and fully connected layers. Convolutional and pooling layers are paired into several convolution groups that extract features layer by layer. The convolutional layers extract different input features based on the concept of a local receptive field, and the pooling layers reduce the data dimension by downsampling.

CNNs have proven effective for a wide range of visual tasks. Each convolutional layer learns a set of filters across the input channels that express local spatial connectivity patterns. In other words, a convolution filter fuses spatial and channel information within its local receptive field. By stacking convolutional layers interleaved with nonlinearities and downsampling, a CNN captures hierarchical patterns with a global receptive field, which makes it a powerful image descriptor. Although CNNs have achieved good results in image classification, they have rarely been studied in industrial applications such as defect detection and surface inspection systems.
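One convolution group on a single-channel image can be sketched as follows, to make the local receptive field and downsampling ideas concrete. The sketch has three parts:
- a "valid" 2-D convolution, computed as cross-correlation as deep learning frameworks do;
- a ReLU nonlinearity;
- 2 × 2 max pooling.

The kernel and the function names are illustrative assumptions. Real layers learn many filters across many channels.

```python
def conv2d_valid(img, kernel):
    """Single-channel 2-D cross-correlation, no padding, stride 1.

    Each output value depends only on a kh x kw patch of the input: its local receptive field.
    """
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def relu(fmap):
    """Element-wise nonlinearity applied after the convolution."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (downsampling); a trailing odd row/column is dropped."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]
```

For example, the kernel [[-1, 1]] responds only where intensity rises from left to right. Applied to an image with a vertical edge, it produces a single column of ones at the edge, and pooling then halves the resolution while keeping that response.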

Studies have shown that the performance of deep networks can be improved by incorporating learning mechanisms that help capture spatial correlations. Deep learning methods based on the Inception module architecture [16,19] show two things. Embedding multi-scale convolutions within the network's modules improves accuracy, and bottleneck layers reduce computation. However, such networks suffer from gradient attenuation (vanishing gradients) as depth increases.

ResNet is a large-scale convolutional neural network built from residual blocks. Its key contribution is the residual block, built with skip connections across hidden layers, which solves the gradient attenuation problem in deep network training. Based on identity mapping theory, ResNet lets the overall structure converge toward an identity mapping, which ensures that the final error rate does not get worse as depth increases. Because Inception networks are typically very deep, combining residual blocks with the Inception architecture lets them overcome gradient attenuation.

To further improve performance, SENet strengthens the network's representational power by explicitly modeling the interdependencies between convolutional feature channels. It introduces a feature recalibration mechanism. Through this mechanism, the network learns to use global information to selectively emphasize informative features and suppress less useful ones.
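The SENet feature recalibration described above can be sketched for one stack of channel maps. It has three stages: squeeze, excitation and scale. This minimal illustration assumes plain Python lists and hand-chosen weights. In SE_Inception_V4 and SE_Inception_ResNet_V2, the weights w1 and w2 are learned, the first FC layer shrinks the channel count by a reduction ratio r, and biases are usually present. The function name se_block is hypothetical.

```python
import math


def se_block(fmaps, w1, w2):
    """Squeeze-and-Excitation recalibration of C channel maps.

    fmaps: list of C feature maps, each a 2-D list.
    w1:    (C/r) x C weights of the reduction FC layer (illustrative values).
    w2:    C x (C/r) weights of the expansion FC layer (illustrative values).
    Returns the recalibrated maps and the per-channel weights s.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, f)) / (len(f) * len(f[0])) for f in fmaps]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives channel weights in (0, 1).
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value of channel c by s[c].
    scaled = [[[v * sc for v in row] for row in f] for f, sc in zip(fmaps, s)]
    return scaled, s
```

Because s is computed from global averages of all channels, each channel's weight depends on image-wide context. This is the "global information" the text refers to: informative channels get s close to 1, and less useful ones are pushed toward 0.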

The Inception_v4 [19], Inception_ResNet_v2 [19] and SENet networks have proven very effective at learning deep representations. The defect image recognition model uses one SE_Inception_V4 and two SE_Inception_ResNet_V2 networks as base models to extract image features. The flattened outputs of the three models' global average pooling layers are concatenated to form the input of the classification model.

This design reflects the fact that feature extraction is the key step of image recognition. When the image background is simple and the defect features are prominent, traditional feature extraction [20] can achieve a reasonably good recognition rate using color, geometric or shape features. Wafer defects, however, come in many types and have a uniform color. The same defect can also look very different depending on shooting angle, distance, illumination and shadow. These problems make it difficult for conventional feature extraction to meet the requirements of wafer defect image recognition. In contrast, a deep CNN operates directly on the raw data and extracts features automatically, layer by layer. The resulting features become more abstract and expressive as the number of layers increases.
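The feature fusion step concatenates the flattened global average pooling outputs of the three base models. It can be sketched as follows, assuming each backbone's final feature maps are available as lists of 2-D channel maps. The toy inputs stand in for the outputs of one SE_Inception_V4 and two SE_Inception_ResNet_V2 networks, and the function names are hypothetical.

```python
def global_average_pool(fmaps):
    """Flattened global average pooling: one value per channel."""
    return [sum(map(sum, f)) / (len(f) * len(f[0])) for f in fmaps]


def fuse_features(backbone_outputs):
    """Concatenate the GAP vectors of several backbones into one feature vector.

    backbone_outputs: one list of channel maps per backbone, in a fixed order
    (e.g. SE_Inception_V4 first, then the two SE_Inception_ResNet_V2 models).
    """
    fused = []
    for fmaps in backbone_outputs:
        fused.extend(global_average_pool(fmaps))
    return fused
```

The fused vector's length is the sum of the backbones' channel counts. The backbone order must stay fixed between training and inference, because the downstream classifier treats each position as a distinct feature.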

The two deep CNN architectures have a very large number of parameters to learn, so a large amount of training data is needed to improve the model's generalization and robustness. In practice, however, defect data are scarcer than one might expect. Wafer circuits are also complex: besides the key defect, an acquired image often contains many distracting objects, such as surrounding circuit lines, other defects and line patterns that resemble the defect. These strongly affect wafer defect recognition.

There are two ways to obtain more data. The first is to collect new data. However, the defect rate in wafer manufacturing is very low and accumulating defect images is very time-consuming, so this approach is impractical for wafer defect image recognition. The second is to increase the data volume through data augmentation [21], that is, generating additional training data with image processing operations such as flipping, translation or rotation.

To further improve the algorithm's performance, we adopted the SMBO (sequential model-based optimization) method [22] to optimize the hyperparameters of the fusion model. SMBO is a general-purpose stochastic optimization algorithm that handles both categorical and continuous hyperparameters, and it performs better than manual tuning or random search. The structure of the fused deep CNN image defect recognition system (FIG. 3) comprises the following steps:

The fourth step: extract features from the preprocessed images with the fusion deep learning model, whose hyperparameters are optimized with the SMBO algorithm during training.

The fifth step: reduce the dimensionality of the features extracted in the fourth step and feed them into the fully connected classification layer.

The sixth step: classify the extracted feature data in the fully connected layer, whose hyperparameters are also optimized with the SMBO algorithm during training, yielding an optimized classification model and the image classification results.
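The text does not say which SMBO variant was used. The sketch below implements one concrete member of the SMBO family: a minimal one-dimensional Tree-structured Parzen Estimator (TPE). It works in three moves:
1. Split past trials into a "good" set and a "bad" set.
2. Fit a Parzen density to each set.
3. Choose as the next trial the candidate that maximizes the ratio l(x)/g(x).

The searched variable is assumed to be log10 of the learning rate over (-4, -2.5). The toy objective is a hypothetical stand-in for "train the fused CNN and return the validation error".

```python
import math
import random


def _parzen_logpdf(x, centers, bw):
    """Log density of an equal-weight Gaussian mixture (Parzen estimator) at x."""
    dens = sum(math.exp(-0.5 * ((x - c) / bw) ** 2) for c in centers)
    norm = len(centers) * bw * math.sqrt(2.0 * math.pi)
    return math.log(dens / norm + 1e-300)  # tiny floor avoids log(0)


def tpe_minimize(objective, low, high, n_init=5, n_iter=20,
                 n_candidates=24, gamma=0.25, seed=0):
    """Minimal 1-D TPE, one instance of sequential model-based optimization."""
    assert n_init >= 2, "need at least one 'good' and one 'bad' trial"
    rng = random.Random(seed)
    xs = [rng.uniform(low, high) for _ in range(n_init)]  # random warm-up trials
    ys = [objective(x) for x in xs]
    bw = (high - low) / 10.0  # fixed kernel bandwidth, for simplicity
    for _ in range(n_iter):
        # 1. Split the history into the best gamma fraction ("good") and the rest ("bad").
        order = sorted(range(len(xs)), key=lambda i: ys[i])
        n_good = max(1, int(math.ceil(gamma * len(xs))))
        good = [xs[i] for i in order[:n_good]]
        bad = [xs[i] for i in order[n_good:]]
        # 2. Draw candidates from the "good" density l(x), clipped to the search range.
        cands = [min(high, max(low, rng.gauss(rng.choice(good), bw)))
                 for _ in range(n_candidates)]
        # 3. Pick the largest l(x) / g(x); under TPE's model this maximizes expected improvement.
        nxt = max(cands, key=lambda x: _parzen_logpdf(x, good, bw) - _parzen_logpdf(x, bad, bw))
        # 4. Evaluate the expensive objective and add the result to the history.
        xs.append(nxt)
        ys.append(objective(nxt))
    best = min(range(len(ys)), key=lambda i: ys[i])
    return xs[best], ys[best]
```

Each iteration spends one expensive evaluation where the surrogate model predicts improvement, which is why SMBO usually beats random search at the same budget. For real multi-dimensional searches, an existing library is preferable. For example, hyperopt implements TPE through `fmin` with `algo=tpe.suggest`.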

CMP (chemical mechanical polishing) is a technique that achieves both global and local planarization of a wafer. It removes excess material from the wafer surface through the chemical action of the chemicals in the slurry and the mechanical action of the abrasive particles in it, so that the surface meets a required flatness.
- In the chemical action, the slurry reacts with the polished material to form water-soluble products.
- In the mechanical action, the slurry removes these products from the surface under a given pressure, and the flowing slurry carries them away, exposing fresh material on the wafer surface.

The chemical and mechanical actions reinforce each other so that the reaction can continue, and together they achieve low damage and high flatness of the wafer surface. However, the CMP process often introduces defects into the material surface, such as dishing, metal damage, scratches and various contamination residues. In this study, we subdivided the defects into 12 classes, as shown in FIG. 2.

Data augmentation is the most common way to reduce overfitting in CNN training. It perturbs the original images without changing their class, thereby enlarging the data set. During wafer CMP, defects vary mainly in two morphological respects: orientation and shape.
- Scratches are a typical defect that can appear in any orientation; rotating or flipping them does not change their defining characteristics.
- Residue defects are another typical defect. Their shapes vary widely and their appearance is strongly affected by lighting. Rotating or flipping them changes their characteristics, so image rotation and flipping are unsuitable for them.

Therefore, to avoid altering the characteristics of any defect class, only cropping is used for augmentation. Crops are taken at five positions (the four corners and the center), which enlarges the data five-fold. The training and test sets contain 2000 and 600 wafer defect images respectively. Both have imbalanced class distributions (data skew), and each image is 480 × 480 pixels. For the training set, images are converted to grayscale, resized and denoised with standard image processing techniques. The processed images are then cropped, enlarging the training data to five times its original size. The operating system is CentOS 7, the deep learning framework is TensorFlow, the development language is Python 2.7, and the GPU is an NVIDIA Tesla P4.
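The five-position cropping augmentation can be sketched as follows. The text does not state the crop size, so it is a parameter here. Images are plain 2-D lists, and the function names are hypothetical. Cropping neither rotates nor flips the image, so each crop keeps its parent's label. This is the property the text relies on for orientation-sensitive residue defects.

```python
def five_crop(img, crop_h, crop_w):
    """Return the top-left, top-right, bottom-left, bottom-right and centre crops of a 2-D image."""
    h, w = len(img), len(img[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop larger than image")

    def crop(top, left):
        return [row[left:left + crop_w] for row in img[top:top + crop_h]]

    top_c, left_c = (h - crop_h) // 2, (w - crop_w) // 2
    return [crop(0, 0), crop(0, w - crop_w), crop(h - crop_h, 0),
            crop(h - crop_h, w - crop_w), crop(top_c, left_c)]


def augment(dataset, crop_h, crop_w):
    """Expand a list of (image, label) pairs five-fold; labels are copied unchanged."""
    return [(c, label) for img, label in dataset for c in five_crop(img, crop_h, crop_w)]
```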

To evaluate the proposed defect image feature extraction method (Combined3), three methods were compared:
1. a classification method based on a single deep CNN model;
2. a classification method based on a deep CNN combining SE_Inception_V4 and SE_Inception_ResNet_V2 (Combined2);
3. a classification method based on a deep CNN combining SE_Inception_V4 and two SE_Inception_ResNet_V2 networks (Combined3).

To evaluate the proposed defect image classification method, inference proceeds in three stages:
1. Generate the five-position crop set of each test image, using the same procedure as in data augmentation.
2. Let the classifier give the posterior probability for the i-th crop of the test image.
3. Fuse the posterior probabilities of the five crops to determine the class of the image.

On top of the deep CNN feature extraction, the different combinations of deep learning models are compared. The hyperparameter ranges searched by SMBO were chosen according to algorithm performance and server capacity.
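The inference-time fusion over the five crops can be sketched as below. The text does not specify the fusion rule. This sketch assumes the five posterior vectors are averaged and the class with the largest average is chosen. The function name is hypothetical.

```python
def fuse_crop_posteriors(crop_probs):
    """Average per-crop posterior vectors and return (predicted class index, fused posterior).

    crop_probs: one probability vector per crop, e.g. the 5 crops of a test image.
    """
    n = len(crop_probs)
    fused = [sum(p[k] for p in crop_probs) / n for k in range(len(crop_probs[0]))]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return predicted, fused
```

Averaging keeps the fused vector a valid probability distribution. It also lets several moderately confident crops outvote a single crop that happens to miss the defect.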

Each deep CNN architecture needs memory to store its filter weights, so the batch size depends on the memory capacity of the training hardware. We set the batch size to 16, a power of 2, which makes the best use of our GPU's 8 GB of memory.

Higher learning rates tend to cause the network to overfit, whereas lower learning rates change the error only slightly per epoch (i.e., learning is slow). We set the learning rate range to $(10^{-4}, 10^{-2.5})$, with a decay rate of 0.9 every 30 epochs.

Momentum controls fluctuations of the network weights by carrying a proportion of the previous iteration's update into the current one. Higher momentum values therefore reduce oscillation, because they push the weights to change in a direction similar to the last iteration. This makes convergence to the optimal weights smoother and faster. We sample momentum from a uniform distribution over (0.1, 0.95).

Weight decay prevents the network weights from growing too large and acts as a regularization term in gradient descent, which is also important for avoiding overfitting. We sample it from a uniform distribution over $(10^{-4}, 10^{-2})$.

Dropout is a neural network regularization method that randomly drops some neurons, adding randomness to the model; it is a common way to reduce overfitting. We sample the dropout rate from a uniform distribution over (0.2, 0.8).
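The search space above and the learning-rate schedule can be written down as follows. This sketch rests on three assumptions:
- the learning rate is sampled uniformly in log10 space, since the text gives only the endpoints;
- the 0.9 decay is applied as a step every 30 epochs;
- the function names are hypothetical.

An SMBO optimizer would replace these random draws with proposals from its surrogate model.

```python
import random


def sample_cnn_hyperparameters(rng):
    """Draw one configuration from the deep CNN search space described above."""
    return {
        "batch_size": 16,                                 # fixed by GPU memory
        "learning_rate": 10 ** rng.uniform(-4.0, -2.5),   # log-uniform (assumed)
        "momentum": rng.uniform(0.1, 0.95),
        "weight_decay": rng.uniform(1e-4, 1e-2),
        "dropout_rate": rng.uniform(0.2, 0.8),
    }


def step_decay(base_lr, epoch, decay_rate=0.9, decay_every=30):
    """Staircase schedule: multiply the learning rate by decay_rate every decay_every epochs."""
    return base_lr * decay_rate ** (epoch // decay_every)
```

For example, a base rate of 0.01 stays at 0.01 for epochs 0 to 29 and drops to about 0.0081 from epoch 60. In TensorFlow, the same staircase decay can be expressed with `tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate, decay_steps, 0.9, staircase=True)`, with `decay_steps` set to the number of training steps in 30 epochs.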

To analyze the overall recognition ability of the different combinations of deep learning models, we computed the Accuracy of each model's predictions on the data set (Table 1). Table 1 shows that the Combined3 model clearly outperforms the other deep CNN models.

TABLE 1 Performance of the deep CNN models

For the overall performance of the Combined3 model, we used Top-1 accuracy. To analyze its ability to recognize individual classes in an imbalanced data set, we computed Precision, Recall and Accuracy for each defect class (Table 2).

TABLE 2 Overall performance of the Combined3 model
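Per-class Precision and Recall, and the overall Top-1 accuracy, can be computed from the predictions as sketched below. The sketch reads the per-class Accuracy column as one-vs-rest accuracy, (TP + TN) / N, which is an assumption. The function name is hypothetical, and the code targets Python 3, where `/` is true division.

```python
def per_class_metrics(y_true, y_pred, n_classes):
    """Per-class precision, recall and one-vs-rest accuracy, plus overall Top-1 accuracy."""
    n = len(y_true)
    metrics = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        tn = n - tp - fp - fn
        metrics.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,  # 0.0 if class never predicted
            "recall": tp / (tp + fn) if tp + fn else 0.0,     # 0.0 if class absent
            "accuracy": (tp + tn) / n,
        })
    top1 = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    return metrics, top1
```

On an imbalanced data set, one-vs-rest accuracy is dominated by true negatives for rare classes, so Precision and Recall are the more informative per-class columns.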

To further examine the advantages of the proposed model, 1913 features were extracted with traditional image feature extraction techniques such as HOG, LBP, Haar, GLCM and SIFT. They include texture, grayscale, shape and spatial features, and their dimensionality was reduced with a feature screening method. Three representative classifiers were used: a Support Vector Machine (SVM), a Random Forest (RF) and a Multilayer Perceptron (MLP). The hyperparameters of RF, MLP and SVM were each optimized with SMBO, using the following value ranges:

The number of estimators (n_estimators) in the random forest sets the number of trees in the forest. In general, a larger value gives better performance. Beyond a certain point, however, adding trees no longer improves the model noticeably and only slows computation. Its range is set to (100, 2000).

The maximum depth (max_depth) limits the depth of each tree. With many samples and features, overly deep trees may overfit. Its range is set to (3, 30).

The maximum number of features (max_features) sets the number or fraction of features considered when searching for the best split. Generally, a larger value gives better performance. It is drawn from a uniform distribution over (0.1, 1).

The hyperparameters tuned for the MLP classifier are the dropout rate and the learning rate, with the same ranges as for the deep CNN.

The penalty parameter C of the error term in the SVM controls the trade-off between margin size and classification accuracy. As C tends to infinity, the model tolerates no misclassified samples and overfits. As C tends to 0, the model ignores classification correctness and only maximizes the margin, so it underfits. C is drawn from a uniform distribution over (0, 200).

The kernel parameter specifies the kernel function that maps low-dimensional feature vectors into a high-dimensional space. This lets data that are not linearly separable in the original space become linearly separable there. Its candidate values are 'rbf', 'sigmoid' and 'linear'.

The kernel coefficient gamma is a parameter of the kernel function that implicitly determines the distribution of the data in the new feature space. The larger gamma is, the fewer support vectors there are. Gamma is drawn from a uniform distribution over (0, 200).
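The baseline search spaces can be written down as a sampler, sketched below, under these assumptions:
- The dictionary keys mirror the parameter names of scikit-learn's `RandomForestClassifier` and `SVC`, but the text does not state which library was used.
- Integer ranges are inclusive.
- The open interval (0, 200) for C and gamma is enforced by rejecting 0.
- The MLP learning rate is log-uniform.

Function names are hypothetical.

```python
import random


def _open_uniform(rng, low, high):
    """Uniform draw that rejects the lower endpoint, since C and gamma must be > 0."""
    x = rng.uniform(low, high)
    while x <= low:
        x = rng.uniform(low, high)
    return x


def sample_baseline_hyperparameters(model, rng):
    """Draw one configuration for the 'rf', 'svm' or 'mlp' baseline."""
    if model == "rf":
        return {"n_estimators": rng.randint(100, 2000),   # number of trees
                "max_depth": rng.randint(3, 30),
                "max_features": rng.uniform(0.1, 1.0)}    # fraction of features per split
    if model == "svm":
        return {"C": _open_uniform(rng, 0.0, 200.0),
                "kernel": rng.choice(["rbf", "sigmoid", "linear"]),
                "gamma": _open_uniform(rng, 0.0, 200.0)}
    if model == "mlp":
        # Same ranges as the deep CNN; the log-uniform learning rate is an assumption.
        return {"learning_rate": 10 ** rng.uniform(-4.0, -2.5),
                "dropout_rate": rng.uniform(0.2, 0.8)}
    raise ValueError("unknown model: " + model)
```

In scikit-learn, gamma is ignored when kernel='linear', so an SMBO search could treat it as a conditional parameter that exists only for the 'rbf' and 'sigmoid' kernels.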

Table 3 compares the fused deep CNN image defect recognition system with the non-deep-CNN recognition models. It shows that the proposed method achieves the highest recognition accuracy. The proposed method outperforms the others because it uses multiple deep CNNs to extract features of the defect images from multiple perspectives, making full use of the characteristics of the samples.

TABLE 3 Performance comparison of deep CNN and non-deep CNN models

The above describes only preferred embodiments of the present invention, and the scope of protection of the invention is not limited to them. Any equivalent substitution or modification of the technical solution and inventive concept of the present invention that a person skilled in the art could make within the disclosed technical scope shall fall within the scope of protection of the invention.

Claims

1. An integrated circuit defect image recognition and classification system based on a fusion deep learning model, characterized in that wafer defect images are automatically recognized and classified online using a fusion model based on deep convolutional neural networks (CNNs), and changes in the number of each type of wafer defect are detected in time;

the core mechanism is a defect image feature extraction method built from two deep learning architectures into which a learning mechanism is integrated; the deep CNN fusion model, Combined3, is a defect image classification model constructed from the SE_Inception_V4 and SE_Inception_ResNet_V2 architectures, and a sequential model-based optimization (SMBO) algorithm is used to optimize the hyperparameters of the fused deep CNN recognition model, improving its recognition accuracy.

2. The integrated circuit defect image recognition and classification system based on a fusion deep learning model according to claim 1, wherein the fusion deep learning image defect recognition system is implemented by the following steps:

3. The integrated circuit defect image recognition and classification system based on a fusion deep learning model according to claim 1, wherein data augmentation, the most common way to reduce overfitting in CNN training, perturbs the original images without changing their class, thereby enlarging the data set;

only cropping is used for augmentation, taking crops at five positions (the four corners and the center), which enlarges the data five-fold;

the training and test sets contain 2000 and 600 wafer defect images respectively, both with imbalanced class distributions (data skew), and each image is 480 × 480 pixels;

for the training set, images are converted to grayscale, resized and denoised with image processing techniques, and the processed images are then cropped, enlarging the training data to five times its original size;

the operating system is CentOS 7, the deep learning framework is TensorFlow, the development language is Python 2.7, and the GPU is an NVIDIA Tesla P4.