CN116740002A - Wafer defect detection method based on neural network architecture search - Google Patents

Wafer defect detection method based on neural network architecture search

Info

Publication number
CN116740002A
CN116740002A (application number CN202310630271.4A)
Authority
CN
China
Prior art keywords
wafer
darts
defect
network
space unit
Prior art date
Legal status
Pending
Application number
CN202310630271.4A
Other languages
Chinese (zh)
Inventor
代梦航
何子潇
王欢
刘志亮
左明健
Current Assignee
Qingdao Mingsiwei Technology Co ltd
Original Assignee
Qingdao Mingsiwei Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Qingdao Mingsiwei Technology Co ltd filed Critical Qingdao Mingsiwei Technology Co ltd
Priority to CN202310630271.4A priority Critical patent/CN116740002A/en
Publication of CN116740002A publication Critical patent/CN116740002A/en
Pending legal-status Critical Current


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Testing Or Measuring Of Semiconductors Or The Like (AREA)

Abstract

The invention discloses a wafer defect detection method based on neural network architecture search. An input wafer picture is first converted into a feature map by a convolution operation, and the feature map is used as the input of the first and second space units. Within a space unit, each node is connected to the preceding nodes through a set of connection operations; the input feature map is processed by every set connection operation and the results are weighted and summed according to the set initial weights. The weights are then updated with a cross-entropy loss function to obtain an optimal network architecture for the wafer defect task. Finally, the output optimal network is retrained to obtain a model for the wafer defect detection task, thereby improving wafer defect detection performance.

Description

Wafer defect detection method based on neural network architecture search
Technical Field
The invention belongs to the technical field of defect detection, and particularly relates to a wafer defect detection method based on neural network architecture search.
Background
In recent years, the importance of the semiconductor industry has increased. Semiconductor manufacturing involves four processes: production, testing, assembly, and final testing. In the wafer test stage, the function of each die on the wafer is tested and the test results are plotted as a wafer map. The wafer test stage is critical to semiconductor wafer fabrication systems (SWFS) because wafer maps can be used for defect pattern recognition (DPR), which facilitates locating problems in production and improving the yield of subsequent products.
The die density on wafer maps has increased greatly as wafer fabrication technology has advanced under the direction of Moore's law. As the dies on a wafer become denser, multiple defect patterns are more likely to appear on a single wafer map. FIG. 1 shows example images with mixed-type defect modes; from left to right, (a)-(d) are: a single defect class, a double mixed defect class, a triple mixed defect class, and a quadruple mixed defect class. Because the defect patterns are complex and varied, mixed wafer defect detection is significantly more difficult.
Some deep learning methods are designed to be more complex in order to cope with the complexity of actual operating conditions. In addition, designing a neural network architecture requires a great deal of expertise and experimentation, which is a time-consuming and labor-intensive process. Fortunately, neural network architecture search (NAS) is rapidly evolving as an automated machine learning (AutoML) method in which the most appropriate neural network architecture can be found automatically from candidate operations and data sets. NAS is generally defined by three basic elements: the search space, the search strategy, and performance evaluation. Defining rich search operations and diversified search strategies in the search space helps to find network architectures with higher complexity and better performance. An efficient search space also helps to improve search efficiency and optimize the search strategy.
However, despite the wide application of deep learning methods and attention mechanisms in many areas, challenges remain in the area of wafer defect detection:
(1) Conventional deep learning methods are mainly used to identify a single type of wafer defect. However, in real industrial processes, wafers often exhibit multiple types of defects at the same time. Such mixed defect pattern recognition is significantly more difficult because of the complexity and diversity of the defect patterns.
(2) Traditional deep learning methods used in wafer defect detection require a manually constructed neural network architecture, which demands a great deal of expertise and experimentation and is time-consuming and labor-intensive. With the advent of NAS methods, the design of network architectures is increasingly automated. However, although NAS methods are developing rapidly, they have not yet been effectively applied to the field of wafer DPR with good performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a wafer defect detection method based on neural network architecture search, which detects wafer defects by means of neural network architecture search and thereby improves wafer defect detection performance.
In order to achieve the above object, the present invention provides a wafer defect detection method based on neural network architecture search, which is characterized by comprising the following steps:
(1) Acquiring a wafer defect data set;
collecting a plurality of wafer pictures under m different defect conditions, and then generating additional wafer pictures under the different defect conditions with a generative adversarial network (GAN); the j-th wafer picture under the i-th defect condition is denoted X_ij, i = 1, 2, …, m, j = 1, 2, …, n, where n is the number of wafer pictures under the i-th defect;
marking each wafer picture under each defect condition with the fault label F_i corresponding to its defect type;
Forming a wafer defect data set by all wafer pictures under different defect conditions and corresponding fault labels;
(2) Building a neural network architecture search network DA-DARTS based on a dual-attention mechanism;
(2.1) building a dual-attention module DAM;
the dual-attention module DAM comprises a channel attention module CA and a spatial attention module SA; the CA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a two-layer convolutional neural network, and a sigmoid activation function; the SA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a convolution layer, and a sigmoid activation function;
let the height, width, and channel number of the DAM input feature map be H×W×C; first, the input feature map is fed into the CA, where maxpool and avgpool over the feature map width and height produce two 1×1×C feature maps; these two feature maps are then passed separately through a shared two-layer convolutional network, in which each convolution layer uses a leaky ReLU activation function and a 1×1 convolution kernel; next, the features output by the shared network for the two branches are added and passed through a sigmoid activation to obtain the CA output feature map; the CA output feature map is then multiplied by the CA input feature map to generate the input features required by the SA; in the SA, channel-wise maxpool and avgpool of the input feature map yield two H×W×1 feature maps, which are concatenated along the channel dimension; a convolution operation with a 7×7 kernel then reduces the result to one channel, i.e. H×W×1, a sigmoid generates the SA output feature map, and finally this feature map is multiplied by the SA input feature map to obtain the DAM output feature map;
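By way of illustration, the following PyTorch sketch shows how a dual-attention module of the kind described above (channel attention followed by spatial attention) can be written. It is a minimal sketch under stated assumptions: the class names, the channel-reduction ratio, and the exact placement of the leaky ReLU activations are illustrative choices and are not taken from the patent text.
```python
# Minimal sketch of a dual-attention module (channel attention then spatial attention).
# The reduction ratio and module names are illustrative assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # shared two-layer 1x1-conv network applied to both pooled descriptors
        self.shared = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.LeakyReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_desc = torch.amax(x, dim=(2, 3), keepdim=True)   # max pool over H, W -> 1x1xC
        avg_desc = torch.mean(x, dim=(2, 3), keepdim=True)   # average pool over H, W -> 1x1xC
        attn = torch.sigmoid(self.shared(max_desc) + self.shared(avg_desc))
        return x * attn                                      # CA-weighted feature map

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        max_map, _ = torch.max(x, dim=1, keepdim=True)       # channel-wise max -> HxWx1
        avg_map = torch.mean(x, dim=1, keepdim=True)         # channel-wise mean -> HxWx1
        attn = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return x * attn                                      # SA-weighted feature map

class DAM(nn.Module):
    """Dual-attention module: channel attention followed by spatial attention."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))
```
For example, `DAM(16)(torch.randn(1, 16, 32, 32))` returns a tensor of the same shape, re-weighted by channel and spatial attention.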
(2.2) adding the DAM to the conventional differentiable architecture search network DARTS to obtain the DA-DARTS network architecture;
setting the connection operations of DA-DARTS, comprising: a maximum pooling operation, an average pooling operation, a convolution with a 3×3 kernel, a convolution with a 5×5 kernel, a separable convolution with a 5×5 kernel, a dilated (hole) convolution with a 5×5 kernel, a skip connection, and the DAM operation;
assigning an initial weight o_p to each connection operation;
the conventional differentiable architecture search network DARTS is formed by cascading a plurality of identical space units C_k; within each space unit C_k, every node is connected to all of its preceding nodes, and each connection uses any one of the set connection operations;
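The weighted combination of candidate connection operations on a node-to-node edge can be sketched as follows. This is a simplified illustration in the spirit of DARTS, not a definitive implementation of DA-DARTS: the operation list is a reduced subset of the one set above (plain convolutions stand in for the separable convolution, and the DAM is omitted for brevity), and the softmax normalisation of the weights is an assumption.
```python
# Minimal sketch of one mixed edge inside a space unit: every candidate
# operation is applied and the results are weighted and summed.
import torch
import torch.nn as nn
import torch.nn.functional as F

def candidate_ops(channels: int) -> nn.ModuleList:
    # illustrative subset of the candidate connection operations
    return nn.ModuleList([
        nn.MaxPool2d(3, stride=1, padding=1),
        nn.AvgPool2d(3, stride=1, padding=1),
        nn.Conv2d(channels, channels, 3, padding=1),
        nn.Conv2d(channels, channels, 5, padding=2),
        nn.Conv2d(channels, channels, 5, padding=4, dilation=2),  # dilated ("hole") conv
        nn.Identity(),                                            # skip connection
    ])

class MixedEdge(nn.Module):
    """Weighted sum of all candidate operations on one node-to-node connection."""
    def __init__(self, channels: int):
        super().__init__()
        self.ops = candidate_ops(channels)
        # one learnable weight o_p per candidate operation
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```
For example, `MixedEdge(16)(torch.randn(1, 16, 32, 32))` returns a tensor of the same shape.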
(3) Training a DA-DARTS network architecture;
(3.1) setting the maximum number of iterations to EPOCH and initializing the current iteration number epoch = 1; setting the expected model training error to τ;
(3.2) in the epoch-th iteration, M wafer pictures are extracted from the wafer defect data set and recorded as a batch; this batch is fed as a single input to the DA-DARTS network, and the corresponding fault labels are taken as the desired output of the network;
(3.3) after the batch of wafer pictures enters the network, it serves as the input of the first space unit C_1 and the second space unit C_2; from the third space unit C_3 onward, the input of each subsequent space unit is the output of the two preceding space units;
specifically, space units C_1 and C_2 first convert the pictures into feature maps through a convolution operation; the feature map on each connected pair of nodes is processed by all of the set connection operations, and the results are weighted and summed according to the set initial weights o_p. Space unit C_3 and each subsequent space unit directly take the output feature maps of the two preceding space units as input and process them in the same way as space units C_1 and C_2, and so on; finally, the last space unit predicts the fault label corresponding to the defect type;
(3.4) calculating a cross-entropy loss value loss from the desired fault label and the predicted fault label; then judging whether the current iteration number epoch = EPOCH or loss < τ; if so, stopping the iterative training to obtain the trained DA-DARTS network architecture; otherwise, updating the connection operation weights o_p of DA-DARTS with the cross-entropy loss value loss through the back-propagation algorithm and then performing the next round of training;
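A minimal training-loop sketch corresponding to step (3) is given below. It follows the simple scheme described above (a single cross-entropy loss, back-propagation, and stopping at EPOCH iterations or when the loss falls below τ); the optimizer, the learning rate, and the names `da_darts` and `loader` are illustrative assumptions rather than values from the patent.
```python
# Minimal sketch of the search/training loop: cross-entropy loss, backprop
# update of the connection weights, early stop on EPOCH or loss < tau.
import torch
import torch.nn as nn

def train_search(da_darts: nn.Module, loader, EPOCH: int = 100, tau: float = 0.05):
    optimizer = torch.optim.SGD(da_darts.parameters(), lr=0.025, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(1, EPOCH + 1):
        for images, labels in loader:          # M wafer pictures and their fault labels
            optimizer.zero_grad()
            loss = criterion(da_darts(images), labels)
            loss.backward()                     # updates connection weights o_p via backprop
            optimizer.step()
        if loss.item() < tau:                   # loss of the last batch vs. expected error tau
            break
    return da_darts
```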
(4) Retraining the DA-DARTS network;
loading the trained DA-DARTS network architecture, then re-inputting the wafer image and the corresponding fault label in the wafer defect data set, and re-training according to the training mode of the DA-DARTS network architecture in the step (3), thereby obtaining the DA-DARTS network for wafer defect detection;
(5) Detecting defects of the wafer picture;
and (3) collecting wafer pictures of unknown defect types, and inputting the wafer pictures into the DA-DARTS network after retraining in the step (4), so as to judge the defect types corresponding to the wafer pictures.
The object of the invention is achieved as follows:
the invention relates to a wafer defect detection method based on neural network architecture search, which comprises the steps of converting an input wafer into a feature map through a convolution operation, and taking the feature map as the input of a first space unit and a second space unit; in the space unit, each node is connected with the previous node by a set connection mode, the input feature map is processed according to all the set connection modes, and weighted summation is carried out according to the set initial weight. Then updating the weight by using the cross entropy loss function to obtain an optimal network architecture for realizing the wafer defect task; and finally, performing network retraining by utilizing the output optimal network to obtain a model for wafer defect detection tasks so as to improve the wafer defect detection performance.
Meanwhile, the wafer defect detection method based on the neural network architecture search has the following beneficial effects:
(1) The invention provides a network named DA-DARTS, which effectively improves the accuracy of mixed wafer defect pattern recognition on the basis of the existing DARTS algorithm.
(2) The invention optimizes the search space of the original DARTS algorithm by incorporating attention mechanisms, which increases the diversity of the search space and enhances the complexity of the searched architecture.
(3) DA-DARTS has been extensively tested on the MixedWM38 dataset. The experimental results show that the proposed method outperforms the comparison algorithms on all performance indicators, and ablation experiments show that the proposed DAM improves all aspects of the DARTS algorithm's performance.
Drawings
FIG. 1 is a diagram of four defect types common to wafers;
FIG. 2 is a flow chart of a wafer defect detection method based on neural network architecture search according to the present invention;
FIG. 3 is a network architecture diagram of a dual-attention module DAM in accordance with the present invention;
FIG. 4 is an overall block diagram of the DA-DARTS proposed by the present invention.
FIG. 5 shows the loss and accuracy curves for the search and retraining stages in an embodiment.
FIG. 6 shows the test results of the proposed DA-DARTS algorithm for each defect type.
Detailed Description
The following description of embodiments of the invention is presented in conjunction with the accompanying drawings so that those skilled in the art can better understand the invention. It should be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Examples
FIG. 2 is a flow chart of a wafer defect detection method based on neural network architecture search according to the present invention.
In this embodiment, as shown in fig. 2, the present invention provides a wafer defect detection method based on neural network architecture search, which includes the following steps:
s1, acquiring a wafer defect data set;
collecting a plurality of wafer pictures under m different defect conditions, and then generating additional wafer pictures under the different defect conditions with a generative adversarial network (GAN); the j-th wafer picture under the i-th defect condition is denoted X_ij, i = 1, 2, …, m, j = 1, 2, …, n, where n is the number of wafer pictures under the i-th defect;
marking each wafer picture under each defect condition with the fault label F_i corresponding to its defect type;
Forming a wafer defect data set by all wafer pictures under different defect conditions and corresponding fault labels;
s2, building a neural network architecture search network DA-DARTS based on a dual-attention mechanism;
s2.1, building a dual-attention module DAM;
as shown in fig. 3, the dual-attention module DAM comprises a channel attention module CA and a spatial attention module SA; the CA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a two-layer convolutional neural network, and a sigmoid activation function; the SA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a convolution layer, and a sigmoid activation function;
let the height, width, and channel number of the DAM input feature map be H×W×C; first, the input feature map is fed into the CA, where maxpool and avgpool over the feature map width and height produce two 1×1×C feature maps; these two feature maps are then passed separately through a shared two-layer convolutional network, in which each convolution layer uses a leaky ReLU activation function and a 1×1 convolution kernel; next, the features output by the shared network for the two branches are added and passed through a sigmoid activation to obtain the CA output feature map; the CA output feature map is then multiplied by the CA input feature map to generate the input features required by the SA; in the SA, channel-wise maxpool and avgpool of the input feature map yield two H×W×1 feature maps, which are concatenated along the channel dimension; a convolution operation with a 7×7 kernel then reduces the result to one channel, i.e. H×W×1, a sigmoid generates the SA output feature map, and finally this feature map is multiplied by the SA input feature map to obtain the DAM output feature map;
s2.2, adding the DAM into a traditional micro-architecture search network DARTS to obtain a DA-DARTS network architecture;
setting the connection operations of DA-DARTS, comprising: a maximum pooling operation, an average pooling operation, a convolution with a 3×3 kernel, a convolution with a 5×5 kernel, a separable convolution with a 5×5 kernel, a dilated (hole) convolution with a 5×5 kernel, a skip connection, and the DAM operation;
assigning an initial weight o_p to each connection operation;
the conventional differentiable architecture search network DARTS is formed by cascading a plurality of identical space units C_k; in this embodiment, a total of 8 space units are cascaded, as shown in FIG. 4; within each space unit C_k, every node is connected to all of its preceding nodes, and each connection uses any one of the set connection operations;
the connection mode between every two nodes is completely consistent, and the connection mode from 0 node to 3 nodes is selected for explanation without losing generality. The characteristic diagram of the 0 node is multiplied by a predefined weight o after being processed by various predefined connection modes p Weighting the output characteristic diagram of the p-th predefined connection mode after weighting is y p Then
f p (x p )=0.5(tanh(y p )+1)
Therefore, the output of node 0 to node 3 is expressed as:
each node represents a feature map obtained by processing the output of all the previous nodes, the processing algorithm is identical, the 3 nodes are used for description, and the input of the 3 nodes isAnd carrying out average pooling operation on the three inputs to obtain an output characteristic diagram. And performing channel splicing on the output characteristic diagram, and respectively summing the result and the input of the 3 nodes to serve as the characteristic diagram represented by the 3 nodes.
S3, training a DA-DARTS network architecture;
S3.1, setting the maximum number of iterations to EPOCH and initializing the current iteration number epoch = 1; setting the expected model training error to τ;
S3.2, in the epoch-th iteration, M wafer pictures are extracted from the wafer defect data set and recorded as a batch; this batch is fed as a single input to the DA-DARTS network, and the corresponding fault labels are taken as the desired output of the network;
S3.3, after the batch of wafer pictures enters the network, it serves as the input of the first space unit C_1 and the second space unit C_2; from the third space unit C_3 onward, the input of each subsequent space unit is the output of the two preceding space units;
specifically, space units C_1 and C_2 first convert the pictures into feature maps through a convolution operation; the feature map on each connected pair of nodes is processed by all of the set connection operations, and the results are weighted and summed according to the set initial weights o_p. Space unit C_3 and each subsequent space unit directly take the output feature maps of the two preceding space units as input and process them in the same way as space units C_1 and C_2, and so on; finally, the last space unit predicts the fault label corresponding to the defect type;
s3.4, calculating a cross entropy loss value loss according to the expected output fault label and the predicted fault label, then judging that the current iteration number epoch=EPOH or loss is smaller than tau, and stopping iterative training if the current iteration number epoch=EPOH or loss is smaller than tau, so as to obtain a trained DA-DARTS network architecture; otherwise, updating the connection operation weight o of the DA-DARTS by the cross entropy loss value loss through a back propagation algorithm p Then, performing the next round of training;
s4, retraining the DA-DARTS network;
loading the trained DA-DARTS network architecture, then re-inputting the wafer image and the corresponding fault label in the wafer defect data set, and re-training according to the training mode of the DA-DARTS network architecture in the step S3, so as to obtain the DA-DARTS network for wafer defect detection;
s5, detecting defects of the wafer picture;
and (3) collecting wafer pictures of unknown defect types, and inputting the wafer pictures into the DA-DARTS network after retraining in the step S4 so as to judge the defect types corresponding to the wafer pictures.
Instance verification
In this example, the invention is verified on the MixedWM38 dataset, which contains wafer defect pictures of four mixed-type categories, about 38,000 pictures in total.
Since the NAS algorithm uses data from both the training set and the validation set to search for the best architecture, the MixedWM38 dataset is divided into training, validation, and test sets at a ratio of 4:4:2. All images in the dataset are resized to 32×32 to reduce GPU memory usage and then normalized to speed up the search. The hyper-parameter settings used in this example are shown in Table 1.
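The preprocessing described above (resizing to 32×32, normalisation, and a 4:4:2 split) can be sketched as follows; the dataset object, the normalisation statistics (single-channel wafer maps are assumed), and the random seed are illustrative assumptions.
```python
# Minimal sketch of the data preparation: resize, normalise, and split 4:4:2.
import torch
from torch.utils.data import random_split
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((32, 32)),                  # reduce GPU memory usage
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),  # normalisation to speed up the search
])

def split_4_4_2(dataset):
    n = len(dataset)
    n_train, n_val = int(0.4 * n), int(0.4 * n)
    n_test = n - n_train - n_val
    return random_split(dataset, [n_train, n_val, n_test],
                        generator=torch.Generator().manual_seed(0))
```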
Table 1 experimental parameter settings
The loss and M-precision curves for the search and retraining stages of this example are shown in FIG. 5. After 15 epochs the loss drops below 0.05, which means that the proposed network has satisfactory fitting capacity. The M-precision curve does not decrease over the last epochs, indicating that no overfitting occurs for the selected search space and hyper-parameters.
Fig. 6 shows the test results of the proposed method for each defect type. Regions 1 to 9 represent the detection performance of the single defect classes, regions 10 to 22 the double mixed defect classes, regions 23 to 34 the triple mixed defect classes, and regions 35 to 38 the quadruple mixed defect classes. As can be seen from the figure, the S-accuracy of all defect types is close to 100%, because the numbers of positive and negative samples used in this calculation are unequal. The accuracy for single defects is slightly higher than for mixed defects, which confirms the difficulty of detecting mixed defects. However, the recall value is stable for all types and close to the precision value, which means that the proposed method has a high correct recognition rate.
To demonstrate the superiority of the proposed method, it is compared with several methods from related work and with state-of-the-art computer vision algorithms, including ResNet-18, VGG-16, DenseNet-121, and DC-Net. The comparison methods were trained on the same dataset for 100 epochs and then evaluated on four indices. The test results are shown in Table 2.
Table 2 experimental results
Model S-Accuracy Precision Recall
DA-DARTS 99.77% 95.60% 95.45%
ResNet-18 99.62% 92.56% 92.64%
VGG-16 99.66% 93.58% 93.23%
DenseNet-121 99.64% 93.31% 93.19%
DC-Net 98.58% 92.20% 93.47%
The experimental results show that the proposed method is superior to the comparison methods on the four indices. The ResNet-18, VGG-16, and DenseNet-121 models rely on their optimized modules for computer vision tasks and on their feature learning capabilities, but they ignore the relationships between different types of defect patterns and therefore fail to achieve high performance. Furthermore, DenseNet-121 has outstanding feature learning capability because of its network depth, but it suffers from overfitting, which degrades its performance; the proposed network avoids overfitting by adjusting the network architecture during the search phase. DC-Net has deformable convolution layers that can exploit the shape and position information of defect patterns, but its network architecture is designed manually and may not be the best architecture for this task. The proposed model contains a specially designed DAM that helps extract the spatial and channel attention of wafer defect patterns, and the NAS-based search algorithm ensures that the network architecture is suited to the task. Thus, the performance of DA-DARTS is significantly better than that of the other methods.
To verify the effectiveness of the dual-attention module, this example compares the DA-DARTS method, which contains the dual-attention module, with the DARTS method, which does not. Table 3 defines the search space for this experiment.
Table 3 experiment predefined search space
The test results are shown in Table 4. Even with the DAM removed from the search space, the network still achieves good results, which shows that NAS algorithms are powerful and can find the best network architecture for different search spaces. Nevertheless, DARTS without the DAM performs worse than the full DA-DARTS, which demonstrates the effectiveness of the DAM.
Table 4 method comparison results
Model S-Accuracy Precision Recall
DA-DARTS 99.77% 95.60% 95.45%
DARTS 99.72% 94.63% 94.57%
While the foregoing describes illustrative embodiments of the present invention to facilitate understanding of the invention by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments; various changes that remain within the spirit and scope of the invention as defined by the appended claims are to be regarded as protected by those claims.

Claims (1)

1. The wafer defect detection method based on the neural network architecture search is characterized by comprising the following steps of:
(1) Acquiring a wafer defect data set;
collecting a plurality of wafer pictures under m different defect conditions, and then generating additional wafer pictures under the different defect conditions with a generative adversarial network (GAN); the j-th wafer picture under the i-th defect condition is denoted X_ij, i = 1, 2, …, m, j = 1, 2, …, n, where n is the number of wafer pictures under the i-th defect;
marking each wafer picture under each defect condition with the fault label F_i corresponding to its defect type;
Forming a wafer defect data set by all wafer pictures under different defect conditions and corresponding fault labels;
(2) Building a neural network architecture search network DA-DARTS based on a dual-attention mechanism;
(2.1) building a dual-attention module DAM;
the dual-attention module DAM comprises a channel attention module CA and a spatial attention module SA; the CA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a two-layer convolutional neural network, and a sigmoid activation function; the SA comprises a maximum pooling layer maxpool, an average pooling layer avgpool, a convolution layer, and a sigmoid activation function;
let the height, width, and channel number of the DAM input feature map be H×W×C; first, the input feature map is fed into the CA, where maxpool and avgpool over the feature map width and height produce two 1×1×C feature maps; these two feature maps are then passed separately through a shared two-layer convolutional network, in which each convolution layer uses a leaky ReLU activation function and a 1×1 convolution kernel; next, the features output by the shared network for the two branches are added and passed through a sigmoid activation to obtain the CA output feature map; the CA output feature map is then multiplied by the CA input feature map to generate the input features required by the SA; in the SA, channel-wise maxpool and avgpool of the input feature map yield two H×W×1 feature maps, which are concatenated along the channel dimension; a convolution operation with a 7×7 kernel then reduces the result to one channel, i.e. H×W×1, a sigmoid generates the SA output feature map, and finally this feature map is multiplied by the SA input feature map to obtain the DAM output feature map;
(2.2) adding the DAM to the conventional differentiable architecture search network DARTS to obtain the DA-DARTS network architecture;
setting the connection operations of DA-DARTS, comprising: a maximum pooling operation, an average pooling operation, a convolution with a 3×3 kernel, a convolution with a 5×5 kernel, a separable convolution with a 5×5 kernel, a dilated (hole) convolution with a 5×5 kernel, a skip connection, and the DAM operation;
assigning an initial weight o_p to each connection operation;
the conventional differentiable architecture search network DARTS is formed by cascading a plurality of identical space units C_k; within each space unit C_k, every node is connected to all of its preceding nodes, and each connection uses any one of the set connection operations;
(3) Training a DA-DARTS network architecture;
(3.1) setting the maximum number of iterations to EPOCH and initializing the current iteration number epoch = 1; setting the expected model training error to τ;
(3.2) in the epoch-th iteration, M wafer pictures are extracted from the wafer defect data set and recorded as a batch; this batch is fed as a single input to the DA-DARTS network, and the corresponding fault labels are taken as the desired output of the network;
(3.3) after the batch of wafer pictures enters the network, it serves as the input of the first space unit C_1 and the second space unit C_2; from the third space unit C_3 onward, the input of each subsequent space unit is the output of the two preceding space units;
specifically, space units C_1 and C_2 first convert the pictures into feature maps through a convolution operation; the feature map on each connected pair of nodes is processed by all of the set connection operations, and the results are weighted and summed according to the set initial weights o_p. Space unit C_3 and each subsequent space unit directly take the output feature maps of the two preceding space units as input and process them in the same way as space units C_1 and C_2, and so on; finally, the last space unit predicts the fault label corresponding to the defect type;
(3.4) calculating a cross-entropy loss value loss from the desired fault label and the predicted fault label; then judging whether the current iteration number epoch = EPOCH or loss < τ; if so, stopping the iterative training to obtain the trained DA-DARTS network architecture; otherwise, updating the connection operation weights o_p of DA-DARTS with the cross-entropy loss value loss through the back-propagation algorithm and then performing the next round of training;
(4) Retraining the DA-DARTS network;
loading the trained DA-DARTS network architecture, then re-inputting the wafer image and the corresponding fault label in the wafer defect data set, and re-training according to the training mode of the DA-DARTS network architecture in the step (3), thereby obtaining the DA-DARTS network for wafer defect detection;
(5) Detecting defects of the wafer picture;
and (3) collecting wafer pictures of unknown defect types, and inputting the wafer pictures into the DA-DARTS network after retraining in the step (4), so as to judge the defect types corresponding to the wafer pictures.
CN202310630271.4A 2023-05-30 2023-05-30 Wafer defect detection method based on neural network architecture search Pending CN116740002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310630271.4A CN116740002A (en) 2023-05-30 2023-05-30 Wafer defect detection method based on neural network architecture search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310630271.4A CN116740002A (en) 2023-05-30 2023-05-30 Wafer defect detection method based on neural network architecture search

Publications (1)

Publication Number Publication Date
CN116740002A true CN116740002A (en) 2023-09-12

Family

ID=87903681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310630271.4A Pending CN116740002A (en) 2023-05-30 2023-05-30 Wafer defect detection method based on neural network architecture search

Country Status (1)

Country Link
CN (1) CN116740002A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination