CN116128839A

CN116128839A - Wafer defect identification method, device, electronic equipment and storage medium

Info

Publication number: CN116128839A
Application number: CN202310062961.4A
Authority: CN
Inventors: 姜辉; 邵康鹏; 陆叶
Original assignee: Hangzhou Guangli Microelectronics Co ltd
Current assignee: Hangzhou Guangli Microelectronics Co ltd
Priority date: 2023-01-20
Filing date: 2023-01-20
Publication date: 2023-05-16

Abstract

The application relates to a wafer defect identification method, a wafer defect identification device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a wafer map to be detected; inputting the wafer map to be detected into a target wafer defect recognition model, and outputting defect information of the wafer map to be detected through the target wafer defect recognition model; wherein the target wafer defect recognition model comprises at least one multi-scale depth separable network, each channel of the multi-scale depth separable network comprising a plurality of convolution branches with different receptive fields. The wafer defect recognition model provided by the embodiment of the application extracts more comprehensive and richer feature information, which improves the accuracy of classification and recognition and the robustness of the model. In addition, using a depth separable network instead of a conventional convolutional network reduces the number of parameters while deepening the network, thereby saving computing resources.

Description

Wafer defect identification method, device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer vision, and in particular, to a wafer defect identification method, a device, an electronic apparatus, and a storage medium.

Background

During wafer processing, defects arise from imperfect manufacturing processes, poorly maintained equipment, contaminated manufacturing materials, and similar problems, which reduces wafer yield. The size of a defect affects the function of the chips in the area where it is located. Because defects have many sources, they also come in many types, and test engineers must classify them manually, which raises labor cost, takes considerable time and effort, and results in low classification efficiency.

In the related art, one approach extracts features from the wafer map manually and sends them to a classifier, but this method is time-consuming and labor-intensive and generalizes poorly. Another approach uses a convolutional neural network instead of manual feature extraction and outputs the classification result through a fully connected layer with softmax, but the performance of the model does not improve as the depth of the convolutional neural network increases. In addition, because defect sizes differ widely, an ordinary convolutional neural network cannot accurately extract the feature information of all defects, so the classification accuracy and robustness of the model are not high.

Therefore, there is a need in the art for a high-accuracy wafer defect identification method.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a wafer defect recognition method, apparatus, electronic device, and storage medium that can improve the accuracy of wafer defect recognition.

In a first aspect, an embodiment of the present application provides a method for identifying a wafer defect, where the method includes:

acquiring a wafer map to be detected;

inputting the wafer map to be detected into a target wafer defect recognition model, and outputting defect information of the wafer map to be detected through the target wafer defect recognition model; wherein the target wafer defect recognition model comprises at least one multi-scale depth separable network, each channel of the multi-scale depth separable network comprising a plurality of convolution branches with different receptive fields.

The wafer defect identification method provided by the embodiment of the application can analyze and process the input wafer map to be detected by using the target wafer defect identification model, and determine the defect type of the wafer map to be detected. The target wafer defect identification model comprises a multi-scale depth separable network, and each channel of the multi-scale depth separable network comprises a plurality of convolution branches with different receptive fields. Convolution branches with different receptive fields can extract image feature information at different scales, so the extracted feature information is more comprehensive and rich, which improves the accuracy of classification and identification and the robustness of the model. In addition, using a depth separable network instead of a conventional convolutional network reduces the number of parameters while deepening the network, thereby saving computing resources.

Optionally, in an embodiment of the present application, before inputting the wafer map to be detected into the target wafer defect recognition model and outputting defect information of the wafer map to be detected via the target wafer defect recognition model, the method includes:

determining rough defect information of the wafer map to be detected;

selecting a target wafer defect recognition model matched with the rough defect information from a plurality of candidate wafer defect recognition models; wherein the size and/or number of the convolution branches of the multi-scale depth separable network differ among the candidate wafer defect recognition models.

Optionally, in an embodiment of the present application, the output result of the channel is determined according to fusion information of feature information output by the multiple convolution branches.

Optionally, in an embodiment of the present application, the defect information includes a defect type, and the target wafer defect identification model is set to be trained according to the following manner:

obtaining a plurality of wafer map samples, wherein the wafer map samples are marked with defect types;

constructing an initial wafer defect identification model, wherein model parameters are set in the wafer defect identification model;

respectively inputting the plurality of wafer map samples into the wafer defect recognition model to generate a prediction result;

and iteratively adjusting the model parameters based on the difference between the prediction result and the defect type until the difference meets a preset requirement.

Optionally, in an embodiment of the present application, the difference between the prediction result and the defect type is determined according to a loss function, and the loss function is further provided with a first adjustment coefficient and a second adjustment coefficient, where the first adjustment coefficient is used for adjusting the loss contribution of the positive and negative wafer map samples, and the second adjustment coefficient is used for adjusting the loss contribution of the easily distinguishable wafer map samples.
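
A loss with two such adjustment coefficients resembles the widely used focal loss. The text above does not name the loss or give its formula, so the following is only a minimal sketch under that assumption; the function name `weighted_loss`, the parameter names `alpha` (first adjustment coefficient) and `gamma` (second adjustment coefficient), and their default values are hypothetical.

```python
import math

def weighted_loss(p_true, is_positive, alpha=0.25, gamma=2.0):
    """Focal-loss-style loss for one sample (assumed form, not from the source).

    p_true:      predicted probability assigned to the sample's correct label
    is_positive: whether the sample is a positive sample
    alpha:       first coefficient, balances positive vs. negative samples
    gamma:       second coefficient, down-weights easily classified samples
    """
    class_weight = alpha if is_positive else 1.0 - alpha
    easy_weight = (1.0 - p_true) ** gamma  # close to 0 when the sample is easy
    return -class_weight * easy_weight * math.log(p_true)

# An easily classified sample contributes far less loss than a hard one.
easy = weighted_loss(0.95, True)
hard = weighted_loss(0.30, True)
```

With `gamma=0` the second coefficient has no effect and the expression reduces to a class-weighted cross-entropy.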

Optionally, in an embodiment of the present application, before the inputting the wafer map to be detected into the target wafer defect identification model, the method further includes:

and carrying out binarization processing on the wafer map to be detected to obtain a binarized wafer map to be detected.

Optionally, in an embodiment of the present application, the target wafer defect identification model further includes at least one single-scale depth separable network, and the first number of the single-scale depth separable networks and/or the second number of the multi-scale depth separable networks are determined according to feature information of the wafer map to be detected.

Optionally, in an embodiment of the present application, the acquiring a plurality of wafer map samples, where the wafer map samples are labeled with defect types includes:

acquiring a wafer map data set, wherein each wafer map sample in the wafer map data set is marked with a defect type;

respectively calculating the proportion of the wafer map samples of each defect type in the wafer map data set, and selecting the wafer map samples of a defect type whose proportion exceeds a preset proportion threshold as target wafer map samples;

and downsampling the target wafer map samples until the difference between the number of the target wafer map samples and the number of the wafer map samples of other defect types is smaller than the preset proportion threshold value.
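
The sample-balancing steps above can be sketched as follows. This is an illustration only: the function name `balance_by_downsampling`, the default threshold of 0.5, and the stopping rule (downsample a dominant defect type to the size of the largest remaining type) are assumptions, since the text only requires the sample counts to become comparable.

```python
import random
from collections import Counter

def balance_by_downsampling(samples, ratio_threshold=0.5, seed=0):
    """samples: list of (wafer_map, defect_type) pairs.

    Defect types whose share of the data set exceeds ratio_threshold are
    randomly downsampled to the size of the largest remaining type
    (assumed stopping rule).
    """
    counts = Counter(label for _, label in samples)
    total = len(samples)
    dominant = {t for t, c in counts.items() if c / total > ratio_threshold}
    others = [c for t, c in counts.items() if t not in dominant]
    if not dominant or not others:
        return list(samples)  # nothing to balance
    target_size = max(others)
    rng = random.Random(seed)
    balanced = [s for s in samples if s[1] not in dominant]
    for t in dominant:
        group = [s for s in samples if s[1] == t]
        balanced.extend(rng.sample(group, target_size))
    return balanced

# Toy data set: the "None" type makes up 80% of the samples.
data = [(None, "None")] * 80 + [(None, "Center")] * 12 + [(None, "Scratch")] * 8
balanced = balance_by_downsampling(data)
balanced_counts = Counter(label for _, label in balanced)
```

In this toy data set only the "None" type exceeds the threshold, so it alone is reduced while the other types are kept unchanged.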

Optionally, in an embodiment of the present application, the target wafer defect recognition model includes a feature extraction layer, a multi-scale depth separable network layer, a pooling layer, and a full-connection layer connected in sequence;

the feature extraction layer comprises a plurality of convolution layers which are connected in sequence and is used for carrying out preliminary feature extraction on a wafer map to be detected to obtain preliminary feature information of the wafer map to be detected;

the multi-scale depth separable network layer comprises a first point-by-point convolution layer, a depth-by-depth convolution layer and a second point-by-point convolution layer which are sequentially connected, and is used for deeply extracting the feature information of the wafer map to be detected; the first point-by-point convolution layer is used for changing the input preliminary feature information from M channels to N channels; the depth-by-depth convolution layer comprises N channels, and each channel convolves the input preliminary feature information to obtain depth feature information; each channel comprises a plurality of convolution branches, and each convolution branch comprises at least one convolution kernel for convolving the preliminary feature information; the second point-by-point convolution layer performs fusion splicing on the depth feature information output by each channel of the depth-by-depth convolution layer;

the pooling layer is used for reducing the dimension of the depth feature information;

and the full-connection layer is used for processing the depth characteristic information after dimension reduction and outputting the defect information of the wafer map to be detected.

In a second aspect, an embodiment of the present application further provides a wafer defect identifying apparatus, where the apparatus includes:

the acquisition module is used for acquiring a wafer map to be detected;

the identification module is used for inputting the wafer map to be detected into a target wafer defect identification model, and outputting defect information of the wafer map to be detected through the target wafer defect identification model; wherein the target wafer defect identification model comprises at least one multi-scale depth separable network, each channel of the multi-scale depth separable network comprising a plurality of convolution branches with different receptive fields.

In a third aspect, embodiments of the present application further provide an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the methods described in the foregoing embodiments when the processor executes the computer program.

In a fourth aspect, embodiments of the present application further provide a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the methods described in the respective embodiments above.

In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described in the respective embodiments above.

Drawings

FIG. 1 is a flow chart of a method for identifying wafer defects according to one embodiment of the present disclosure;

FIG. 2 is a schematic diagram of wafer defect types according to one embodiment of the present application;

FIG. 3 is a schematic diagram of a target wafer defect recognition model according to one embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of a multi-scale depth separable network layer according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a single-scale depth separable network layer according to an embodiment of the present disclosure;

FIG. 6 is a schematic diagram of a confusion matrix provided in one embodiment of the present application;

FIG. 7 is a diagram of a normalized confusion matrix provided in one embodiment of the present application;

FIG. 8 is a schematic diagram of a normalized confusion matrix provided in accordance with another embodiment of the present application;

FIG. 9 is a schematic block diagram of a wafer defect identifying apparatus according to an embodiment of the present application;

FIG. 10 is a schematic block diagram of an electronic device according to an embodiment of the present application;

FIG. 11 is a conceptual partial view of a computer program product provided by embodiments of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

In addition, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present application. It will be understood by those skilled in the art that the present application may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits have not been described in detail as not to unnecessarily obscure the present application.

In the embodiments of the present application, "/" may indicate an "or" relationship between the associated objects; for example, A/B may indicate A or B. "And/or" describes three possible relationships between the associated objects; for example, A and/or B may represent three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. To facilitate description of the technical solutions of the embodiments of the present application, the words "first", "second", etc. may be used to distinguish between technical features that are the same or similar in function. The terms "first", "second", and the like do not limit quantity or order of execution. In the embodiments of the present application, the terms "exemplary" or "such as" are used to denote examples, illustrations, or descriptions, and any embodiment or design described as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. The use of the word "exemplary" or "such as" is intended to present the relevant concepts in a concrete fashion to facilitate understanding.

In the embodiments of the present application, technical features of the same kind are distinguished by "first", "second", "third", "A", "B", "C", and "D", and the technical features so described have no order of sequence or magnitude.

In the prior art, the wafer defect classification model is generally a conventional classification model, such as a decision tree model, a naive bayes model, a support vector machine model, and the like. However, due to the specificity of the technical field of semiconductor defect detection and identification, for example, the sizes of defects corresponding to different defect types are different, the characteristic information extracted by the conventional wafer defect classification model is not comprehensive.

Based on technical requirements similar to the above, the embodiments of the present application provide a wafer defect identification method. The method provides an improved wafer defect recognition model, which can analyze and process an input wafer map to be detected and determine the defect type of the wafer map to be detected. The wafer defect recognition model comprises a multi-scale depth separable network, and each channel of the multi-scale depth separable network comprises a plurality of convolution branches with different receptive fields. Convolution branches with different receptive fields can extract image feature information at different scales, so the extracted feature information is more comprehensive and rich, which improves the accuracy of classification and identification and the robustness of the model. In addition, using a depth separable network instead of a conventional convolutional network reduces the number of parameters while deepening the network, thereby saving computing resources.

The wafer defect recognition method described in the application is described in detail below with reference to the accompanying drawings. Although the present application provides the method operation steps illustrated in the following embodiments or figures, the method may include more or fewer operation steps based on routine or non-inventive effort. For steps that have no logically necessary causal relationship, the execution order is not limited to the order provided in the embodiments of the present application. When the method is performed in an actual wafer defect identification process or by an apparatus, the steps may be executed sequentially or in parallel (e.g., in a parallel-processor or multithreaded environment) according to the method shown in the embodiments or figures.

Specifically, as shown in fig. 1, an embodiment of a method for identifying a wafer defect provided in the present application may include:

S101: acquiring a wafer map to be detected;

S103: inputting the wafer map to be detected into a target wafer defect recognition model, and outputting defect information of the wafer map to be detected through the target wafer defect recognition model; wherein the target wafer defect recognition model comprises at least one multi-scale depth separable network, each channel of the multi-scale depth separable network comprising a plurality of convolution branches with different receptive fields.

In the embodiment of the present application, electrical tests need to be performed on a wafer during the wafer manufacturing process, and a large amount of wafer test data is generated. Specifically, during an electrical test, a plurality of dies in the wafer may be tested die by die, and wafer test data under the corresponding electrical test item may be generated. The wafer may be a thin slice of monocrystalline silicon cut from a silicon ingot, and a die is a circuit unit obtained by cutting the wafer along the scribe lines. The purpose of wafer testing is to ensure that each die in a complete wafer substantially meets the device characteristics or design specifications, so as to verify the process level of the wafer fab. Specifically, the electrical test may include a chip probing test (Chip Probing, CP), a wafer acceptance test (Wafer Acceptance Test, WAT), a final test (Final Test, FT), and the like. Correspondingly, the generated test data may include CP test parameters, WAT test parameters, FT test parameters, and the like. After the test data of the wafer are acquired, a wafer map (Wafer Map) to be detected can be determined according to the coordinate information of each die and the test parameter value corresponding to each die. The wafer map is a display map established in a computer device and used for displaying information such as the shape, size, orientation, and observation points of a target wafer. In the actual production process, if the test parameter value corresponding to a die does not meet the test requirement, the die is a failed die; otherwise, if the test parameter value corresponding to the die meets the test requirement, the die is a qualified die.
On this basis, different distributions of qualified dies and failed dies in the wafer map to be detected form different defect patterns, which correspond to different defect types. In one embodiment of the present application, the defects may be divided into a plurality of types. For example, as shown in FIG. 2, there may be nine defect types, such as a center defect (Center), a ring defect (Donut), an edge location defect (Edge-Loc), an edge ring defect (Edge-Ring), a location defect (Loc), a near-full defect (Near Full), a scratch defect (Scratch), and defect-free (None). In one embodiment of the application, after an electrical test is performed on the wafer, whether each die fails is judged according to the electrical test result to obtain the wafer map to be detected; in other embodiments of the present application, the wafer map to be detected may also be obtained from a defect inspection apparatus. The defect inspection apparatus is used to scan or inspect individual wafers and may include, but is not limited to, automated optical inspection (Automated Optical Inspection, AOI), X-ray inspection, scanning electron microscopy (Scanning Electron Microscope, SEM), and the like.
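
As a rough illustration of how a wafer map may be assembled from die coordinates and per-die test parameter values, consider the sketch below. The function name `build_wafer_map`, the record layout `(x, y, test_value)`, and the use of `None` for positions that hold no die are assumptions made for this example.

```python
def build_wafer_map(die_records, width, height):
    """Place each die's test value at its (x, y) grid position.

    die_records: iterable of (x, y, test_value) tuples; positions with no
    die stay None and are treated as background (assumed convention).
    """
    grid = [[None] * width for _ in range(height)]
    for x, y, value in die_records:
        grid[y][x] = value
    return grid

# Five dies arranged in a cross on a 3 x 3 grid (corner positions hold no die).
records = [(1, 0, 0.8), (0, 1, 1.3), (1, 1, 0.9), (2, 1, 0.7), (1, 2, 1.1)]
wafer_map = build_wafer_map(records, width=3, height=3)
```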

In an embodiment of the present application, in order to facilitate extraction of feature information of a wafer map to be detected, so as to improve efficiency of identifying wafer defects, the wafer map to be detected may be preprocessed, for example, may be subjected to binarization processing. Specifically, before the inputting the wafer map to be detected into the wafer defect recognition model, the method further includes:

S201: carrying out binarization processing on the wafer map to be detected to obtain a binarized wafer map to be detected.

In this embodiment of the present application, binarization processing may first be performed on the wafer map to be detected. Specifically, through the binarization processing, each die of the wafer map to be detected may be set to one of two values, where one value represents a qualified die and the other value represents a failed die and a background area; the two values may be, for example, 0 and 1, or 1 and 2. The binarization processing may determine whether a die of the wafer map to be detected is a qualified die or a failed die, namely 0 or 1, according to a preset test parameter value: in response to the test parameter value of a target die of the wafer map to be detected being greater than the preset test parameter value, the target die is set to 1; in response to the test parameter value of the target die being smaller than the preset test parameter value, the target die is set to 0. In this process, the background area in the wafer map to be detected may be set to 0. Alternatively, 1 and 2 may represent the qualified die and the failed die, respectively, and 0 may represent the background area; the present application does not limit the binarized wafer map to being represented by only two values. In an embodiment of the present application, after the binarized wafer map to be detected is determined, it may be further standardized, for example, by unifying the size of the wafer map to be detected to 96×96×1, so as to further improve the accuracy of wafer defect classification.
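
The threshold rule described above can be sketched as follows. The function name `binarize_wafer_map` and the use of `None` for background cells are assumptions; a value exactly equal to the threshold is mapped to 0 here, a case the text leaves unspecified.

```python
def binarize_wafer_map(wafer_map, threshold):
    """Map each die to 1 if its test value exceeds threshold, else 0.

    None marks background cells, which are set to 0 as described in the text.
    """
    return [
        [1 if value is not None and value > threshold else 0 for value in row]
        for row in wafer_map
    ]

raw = [[None, 0.8, None], [1.3, 0.9, 0.7], [None, 1.1, None]]
binary = binarize_wafer_map(raw, threshold=1.0)
```

Resizing the result to a uniform size such as 96×96×1 would be a separate step and is not shown.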

In this embodiment of the present application, after the wafer map to be detected is obtained, it may be input into a target wafer defect recognition model, and defect information of the wafer map to be detected may be output after analysis and processing by the target wafer defect recognition model. In an embodiment of the present application, the defect information may include a defect type of the wafer map to be detected; for example, the defect type of the wafer map to be detected is a center defect. Of course, the defect information may also include defect positions, defect shapes, defect sizes, and the like. In one embodiment of the present application, the target wafer defect recognition model may be trained using a plurality of wafer map samples, where the plurality of wafer map samples are labeled with defect types. The target wafer defect recognition model may include a model trained using machine learning. The machine learning approach may include a deep learning approach, and the model may include a convolutional neural network (Convolutional Neural Networks, CNN) model, a residual network (ResNet) model, and the like, which is not limited herein. Specifically, in one embodiment of the present application, the target wafer defect recognition model may include a feature extraction layer, a multi-scale depth separable network layer, a pooling layer, and a full-connection layer connected in sequence:

S301: the feature extraction layer comprises a plurality of convolution layers which are connected in sequence and is used for carrying out preliminary feature extraction on a wafer map to be detected to obtain preliminary feature information of the wafer map to be detected;

S303: the multi-scale depth separable network layer comprises a first point-by-point convolution layer, a depth-by-depth convolution layer and a second point-by-point convolution layer which are sequentially connected, and is used for deeply extracting the feature information of the wafer map to be detected; the first point-by-point convolution layer is used for changing the input preliminary feature information from M channels to N channels; the depth-by-depth convolution layer comprises N channels, and each channel convolves the input preliminary feature information to obtain depth feature information; each channel comprises a plurality of convolution branches, and each convolution branch comprises at least one convolution kernel for convolving the preliminary feature information; the second point-by-point convolution layer performs fusion splicing on the depth feature information output by each channel of the depth-by-depth convolution layer;

S305: the pooling layer is used for reducing the dimension of the depth feature information;

S307: the full-connection layer is used for processing the depth feature information after dimension reduction and outputting the defect information of the wafer map to be detected.

In this embodiment, as shown in FIG. 3, the target wafer defect recognition model 300 may include a feature extraction layer 301, a multi-scale depth separable network layer 303, a pooling layer 305, a full connection layer 307, and so on, which are sequentially connected. The feature extraction layer 301 may perform preliminary feature extraction on the wafer map to be detected to obtain preliminary feature information of the wafer map to be detected. The preliminary feature information may include information such as gray scale, edge, texture, color, and gradient histogram of the dies in the wafer map to be detected. In one embodiment of the present application, the feature extraction layer 301 may be formed of a plurality of convolution layers, where the size and number of the convolution layers may be determined according to the actual identification requirement; for example, the feature extraction layer 301 may be two sequentially connected convolution layers whose convolution kernels are 3×3 in size. In one embodiment of the application, each convolution layer may be followed by a batch normalization layer (BatchNorm) and a rectified linear unit (ReLU) activation function, which effectively increases the nonlinearity of the model and enhances its generalization capability.

In an embodiment of the present application, the multi-scale depth separable network layer 303 may be configured to further extract feature information of the wafer map to be detected, so as to obtain depth feature information of the wafer map to be detected. In one embodiment of the present application, as shown in FIG. 4, the multi-scale depth separable network layer 303 may include a first point-by-point convolution layer (Pointwise Convolution) 3031, a depth-by-depth convolution layer (Depthwise Convolution) 3033, and a second point-by-point convolution layer (Pointwise Convolution) 3035. The first point-by-point convolution layer 3031 may include a 1×1 convolution kernel and may change the input preliminary feature information from M channels to N channels, so that the number of channels can be transformed flexibly. The depth-by-depth convolution layer 3033 may include a plurality of channels, for example, N channels. Each channel can convolve the input preliminary feature information to obtain depth feature information. In one embodiment of the present application, each channel may contain a plurality of convolution branches, and each convolution branch may include at least one convolution kernel for convolving the preliminary feature information. On this basis, the depth-by-depth convolution layer 3033 may output depth feature information of N channels. In one embodiment of the present application, because defect types are diverse, the corresponding defect sizes and shapes also vary. To make the depth feature information extracted by the multi-scale depth separable network layer 303 more comprehensive, that is, to ensure that the extracted feature information of the wafer map to be detected is comprehensive rather than local, the plurality of convolution branches may be configured to have different receptive fields.
Specifically, the receptive field may be the size of the region on the input picture to which a pixel on the feature map (feature map) output by each layer of the convolutional neural network is mapped. In one embodiment of the present application, the size of the receptive field of a convolution branch may be determined according to the size and stride of the convolution kernels included in the convolution branch. For example, the receptive field after a convolution operation with two layers of 3×3 convolution kernels is 5×5, and the receptive field after a convolution operation with three layers of 3×3 convolution kernels is 7×7. In one embodiment of the present application, convolution branch 1 of channel 1 of the multi-scale depth separable network layer 303 comprises one layer of 3×3 convolution kernels, and its receptive field is 3; convolution branch 2 comprises two layers of 3×3 convolution kernels, and its receptive field is 5. It should be noted that, in an embodiment of the present application, two stacked 3×3 convolution kernels may replace one 5×5 convolution kernel; the receptive field is 5 in both cases, so that the network parameters are reduced and the depth of the network is increased while the receptive field stays the same. In one embodiment of the present application, since the channel includes a plurality of convolution branches and each convolution branch can extract feature information, the output result of the channel may be determined according to the fusion information of the feature information output by the plurality of convolution branches.
In an embodiment of the present application, in order to combine the depth feature information output by each channel into the final depth feature information, a second point-wise convolutional layer 3035 may be added after the depth-wise convolutional layer 3033, so as to perform fusion splicing (Concat) on the depth feature information output by each channel and obtain the depth feature information of the wafer map to be detected. In this way, the dimension expansion and the number of parameters can be controlled, and the 2×N channels are reduced to N channels.
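
The receptive field figures quoted above (3 for one 3×3 layer, 5 for two, 7 for three) and the parameter saving of two stacked 3×3 kernels over one 5×5 kernel can be checked with a short calculation. The following is a minimal sketch, assuming undilated convolutions and ignoring bias terms; the function names are illustrative and do not appear in the application.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field of a stack of convolution layers (no dilation)."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= s              # a stride enlarges the step of all later layers
    return rf


def depthwise_weights(kernel_sizes):
    """Weights per channel for a stack of depth-wise kernels (bias ignored)."""
    return sum(k * k for k in kernel_sizes)


branch_1 = [3]       # one 3x3 layer, as in convolution branch 1
branch_2 = [3, 3]    # two stacked 3x3 layers, as in convolution branch 2

print(receptive_field(branch_1))        # 3
print(receptive_field(branch_2))        # 5
print(receptive_field([3, 3, 3]))       # 7
print(depthwise_weights(branch_2), depthwise_weights([5]))  # 18 25
```

Two stacked 3×3 kernels cover the same 5×5 region with 18 weights per channel instead of 25, which is the saving the paragraph above refers to.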

In this embodiment, as shown in fig. 3, in order to prevent degradation of the multi-scale depth separable network during training, which would affect the recognition accuracy of the model, a skip connection may be made through a residual module. Because the residual module passes the input information directly to the output through a bypass, the subsequent convolution layer or pooling layer can directly learn residual features, which mitigates the vanishing gradient problem of the multi-scale depth separable network during backpropagation, protects the integrity of the output result, and reduces the learning difficulty. In one embodiment of the present application, the pooling layer 305 may include a global average pooling layer (Global Average Pooling), which connects the convolution layers to the fully-connected layer and performs a pooling downsampling operation to further reduce the input dimensionality and the amount of computation, effectively reducing network parameters while preserving the feature information extracted by the preceding convolution layers. The fully-connected layer 307 is configured to classify the depth feature information, for example by converting the output into a probability distribution over the sample classes through softmax, and to output the recognition classification result of the model, that is, the defect type.
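
The three operations named above (skip connection, global average pooling, softmax) can be illustrated on tiny hand-made values. This is a simplified sketch in plain Python, not the network of fig. 3: the fully-connected layer is omitted, the pooled values are fed to softmax directly, and all numbers are made up for illustration.

```python
import math


def residual_add(x, fx):
    """Skip connection: output = F(x) + x, element-wise over flat lists."""
    return [a + b for a, b in zip(x, fx)]


def global_average_pool(feature_maps):
    """Reduce each H x W channel to its mean, giving one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def softmax(logits):
    """Turn raw scores into a probability distribution over classes."""
    shift = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical 2x2 feature maps (one per channel).
maps = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 4.0]]]
pooled = global_average_pool(maps)            # [2.5, 1.0]
probs = softmax(pooled)
print(pooled, probs, probs.index(max(probs)))
print(residual_add([1.0, 2.0], [0.5, -1.0]))  # [1.5, 1.0]
```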

The target wafer defect identification model provided by the embodiment of the application can be used to analyze and process the input wafer map to be detected and determine its defect type. The target wafer defect recognition model comprises a multi-scale depth separable network, and each channel of the multi-scale depth separable network comprises a plurality of convolution branches with different receptive fields. Convolution branches with different receptive fields can extract image feature information at different scales, so the extracted feature information is more comprehensive and rich, which improves the accuracy of classification and recognition and the robustness of the model. In addition, using a depth separable network instead of a conventional convolutional network can reduce the number of parameters while deepening the network, thereby saving computing resources.
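
The parameter saving of a depth separable convolution over a conventional convolution follows from the usual weight counts: a conventional K×K convolution from M to N channels has K·K·M·N weights, while a depth-wise K×K convolution followed by a 1×1 point-wise convolution has K·K·M + M·N. The sketch below works one case; the channel counts (32 and 64) are hypothetical and bias terms are ignored.

```python
def standard_conv_params(k, m, n):
    """Weights of a conventional k x k convolution from m to n channels."""
    return k * k * m * n


def depthwise_separable_params(k, m, n):
    """Depth-wise k x k convolution (one kernel per input channel)
    followed by a 1 x 1 point-wise convolution from m to n channels."""
    return k * k * m + m * n


k, m, n = 3, 32, 64   # hypothetical sizes, chosen only for illustration
print(standard_conv_params(k, m, n))        # 18432
print(depthwise_separable_params(k, m, n))  # 2336
```

For these sizes the depth separable form needs roughly one eighth of the weights of the conventional convolution.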

In practical applications, the defect information contained in wafers of different batches may be different, such as different sizes, positions, shapes, etc. of defects, due to different wafer factories or different testing processes. Wafer defect models adapted to different defect information are also different. In order to improve the defect recognition speed of the model, different wafer defect recognition models can be selected according to the difference of rough defect information of the wafer map to be detected. Specifically, in one embodiment of the present application, before inputting the wafer map to be detected to a wafer defect recognition model and outputting defect information of the wafer map to be detected via the wafer defect recognition model, the method includes:

S401: determining rough defect information of the wafer map to be detected;

S403: selecting a target wafer defect recognition model matched with the rough defect information from a plurality of candidate wafer defect recognition models; the size and/or number of convolution branches of the multi-scale depth separable network differ among the plurality of candidate wafer defect recognition models.

In this embodiment of the present application, before performing defect recognition on the wafer map to be detected, an appropriate target wafer defect recognition model may be selected from a plurality of candidate wafer defect recognition models, so as to improve efficiency and accuracy of defect recognition. In one embodiment of the present application, a plurality of candidate wafer defect recognition models may be obtained through training in advance according to the recognition requirements. For example, a plurality of candidate wafer defect recognition models can be trained according to wafer map samples with different defect sizes. The sizes and/or the numbers of convolution branches of the multi-scale depth separable network in the plurality of candidate wafer defect recognition models are different, namely, the capability of extracting characteristic information of each candidate wafer defect recognition model is different, and the recognition precision and the recognition efficiency are also different. The wafer map samples may be wafer map samples corresponding to different types of wafers. Wherein the different categories may include different production lots, different production times, different test dimensions, and so forth. In an embodiment of the present application, if the defect size of a lot of wafer map samples is larger, in order to ensure that the feature information of the wafer samples extracted by the candidate wafer defect recognition model is comprehensive, it is necessary to set the size of the convolution kernel included in the convolution branch of the multi-scale depth separable network of the candidate wafer defect recognition model to be larger, or the number of the convolution kernels included in the convolution branch to be larger, so that the receptive field of the convolution branch is larger. 
Based on this, in an embodiment of the present application, after the plurality of candidate wafer defect recognition models are obtained through training, an association relationship between the candidate wafer defect recognition models and defect sizes may be established. The association relationship may take the form of an association table, an association model, and the like. In one embodiment of the present application, the rough defect information may be determined by a user (engineer) based on experience, and may be a specific defect size or a defect size level, which is not limited herein. On this basis, after the rough defect information is acquired, the defect size corresponding to the rough defect information, or the defect size range in which it falls, can be determined. The association relationship is then used to select, from the plurality of candidate wafer defect recognition models, the target wafer defect recognition model that matches the rough defect information, namely the defect size.
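
One simple form of the association relationship is a table that maps defect size ranges to candidate models. The following is a minimal sketch under that assumption; the size boundaries, the unit (dies), and the model names are all hypothetical and are not taken from the application.

```python
from bisect import bisect_right

# Hypothetical association table. A defect size below 5 maps to the first
# model, a size from 5 up to (but not including) 20 to the second, and
# anything larger to the third.
SIZE_BOUNDARIES = [5, 20]
CANDIDATE_MODELS = ["model_small", "model_medium", "model_large"]


def select_target_model(defect_size):
    """Return the candidate model whose size range contains defect_size."""
    return CANDIDATE_MODELS[bisect_right(SIZE_BOUNDARIES, defect_size)]


print(select_target_model(3))    # model_small
print(select_target_model(12))   # model_medium
print(select_target_model(40))   # model_large
```

A defect size level supplied by the engineer could index `CANDIDATE_MODELS` directly instead of going through the boundaries.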

In one embodiment of the present application, in order to balance the calculation cost and the recognition efficiency, the target wafer defect recognition model may further include a single-scale depth separable network, where the number ratio of the single-scale depth separable network to the multi-scale depth separable network may be determined according to specific application requirements. Specifically, the wafer defect recognition model further includes at least one single-scale depth separable network, and the first number of the single-scale depth separable networks and/or the second number of the multi-scale depth separable networks are determined according to the feature information of the wafer to be detected.

In this embodiment of the present application, the feature information of the wafer to be detected may include the number and the position information of the failed dies in the wafer to be detected. The location information may include location coordinates of the failed die, which may be two-dimensional location coordinates, such as (3, 4), (5, 6), and so on. In one embodiment of the present application, the single-scale depth separable network may be a conventional depth separable network. For example, fig. 5 shows the structure of the single-scale depth separable network, unlike the multi-scale depth separable network, where the convolution branches of the individual channels of the single-scale depth separable network are single branches, such as a 3×3 convolution kernel, which may be a single layer. In an embodiment of the present application, in the case where the number of failed dies is smaller and more concentrated, the first number of the single-scale depth separable network layers may be set to be greater than the second number of the multi-scale depth separable network layers, so that the number of parameters of the model may be reduced while the identification accuracy of the target wafer defect identification model is ensured, so as to save computing resources, and the identification speed of the target wafer defect identification model may be further improved. Of course, in the case that the number of the failed dies is large and/or the failed dies are scattered, the first number of the single-scale depth separable network layers can be set smaller than the second number of the multi-scale depth separable network layers, so that the characteristic information extracted by the target wafer defect identification model can be guaranteed to be comprehensive but not local, and the identification accuracy of the model can be guaranteed. 
In some embodiments of the present application, the positions of the single-scale depth separable network and the multi-scale depth separable network in the target wafer defect model may be determined according to specific application requirements.
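
The rule described above (more single-scale layers when failed dies are few and concentrated, more multi-scale layers otherwise) can be sketched as a small heuristic. The application does not give thresholds or a measure of concentration, so the spread measure (mean distance to the centroid), both thresholds, and the total layer count below are assumptions made only for illustration.

```python
def spread(coords):
    """Mean distance of the failed dies from their centroid (0.0 if empty)."""
    if not coords:
        return 0.0
    n = len(coords)
    cx = sum(x for x, _ in coords) / n
    cy = sum(y for _, y in coords) / n
    return sum(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5 for x, y in coords) / n


def layer_counts(failed_dies, total_layers=4,
                 count_threshold=50, spread_threshold=10.0):
    """Return (first number of single-scale layers,
    second number of multi-scale layers).
    All thresholds are hypothetical tuning values."""
    few_and_concentrated = (len(failed_dies) < count_threshold
                            and spread(failed_dies) < spread_threshold)
    n_single = total_layers - 1 if few_and_concentrated else 1
    return n_single, total_layers - n_single


print(layer_counts([(3, 4), (5, 6)]))              # (3, 1)
print(layer_counts([(i, 0) for i in range(60)]))   # (1, 3)
```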

Further, in an embodiment of the present application, the defect information includes a defect type, and the target wafer defect recognition model is set to be trained in the following manner:

S501: obtaining a plurality of wafer map samples, wherein the wafer map samples are marked with defect types;

S503: constructing an initial wafer defect identification model, wherein model parameters are set in the wafer defect identification model;

S505: respectively inputting the plurality of wafer map samples into the wafer defect recognition model to generate a prediction result;

S507: iteratively adjusting the model parameters based on the difference between the prediction result and the defect type until the difference meets a preset requirement.

In the embodiment of the application, a plurality of wafer map samples may first be obtained in the training process, where the wafer map samples are labeled with defect types. For example, the plurality of wafer map samples may be obtained from the WM-811K wafer dataset. In one embodiment of the present application, a portion of the wafer map samples in the WM-811K wafer dataset, such as 75%, may be selected as a training set, and another portion, such as 25%, may be used as a test set to test the trained target defect identification model and determine whether it meets the identification requirement. In one embodiment of the present application, different types of defects are coded with different symbols to facilitate training of the wafer defect identification model. The symbols used for coding may be numbers, letters, graphics, Greek or Latin characters, other special symbols, and the like. In one embodiment of the present application, in order to make the trained model more accurate, the wafer map samples may be preprocessed, for example by enhancement processing. The enhancement processing may include rotation, translation, scaling, cropping, filling, and the like. In one embodiment of the present application, since wafers differ greatly in size from product to product, the plurality of wafer map samples need to be processed to a uniform size. For example, the plurality of wafer map samples may be unified to a size of 96×96×1.
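
The 75%/25% split and the resizing to 96×96 can be sketched in plain Python. The application does not say how the resizing is done; nearest-neighbour sampling is assumed here because it keeps the discrete die codes intact. The wafer map is represented as a list of rows of integer codes, and the function names and the fixed seed are illustrative.

```python
import random


def resize_nearest(wafer_map, out_h=96, out_w=96):
    """Nearest-neighbour resize of a 2-D grid of die codes."""
    in_h, in_w = len(wafer_map), len(wafer_map[0])
    return [
        [wafer_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def split_dataset(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


tiny_map = [[0, 1],
            [2, 0]]                      # hypothetical 2x2 wafer map
resized = resize_nearest(tiny_map)
print(len(resized), len(resized[0]))     # 96 96

train, test = split_dataset(list(range(100)))
print(len(train), len(test))             # 75 25
```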

In practical applications, most wafer maps are defect-free, and during model training, there may be a serious number imbalance in the class of wafer map samples, thereby affecting the training results. Based on this, in one embodiment of the present application, in order to solve the defect of uneven classification performance caused by the problem of unbalanced number of categories, the obtaining a plurality of wafer map samples, where the wafer map samples are marked with defect types, includes:

S601: acquiring a wafer map data set, wherein each wafer map sample in the wafer map data set is marked with a defect type;

S603: respectively calculating the proportion of the wafer map samples of each defect type in the wafer map data set, and selecting the wafer map samples of a defect type whose proportion exceeds a preset proportion threshold as target wafer map samples;

S605: and downsampling the target wafer map samples until the difference between the number of the target wafer map samples and the number of the wafer map samples of other defect types is smaller than the preset proportion threshold value.

In the embodiment of the present application, wafer map samples whose defect type is defect-free are numerous and generally far outnumber wafer map samples of other defect types, such as center defects, so the input wafer map samples are unbalanced across types. Downsampling an unbalanced data set means making the numbers of samples of the different classes the same, using the number of samples of the smaller class as the reference. Specifically, the downsampling may include decimating the defect-free wafer map samples so that their number differs from the number of wafer map samples of other defect types by less than the preset proportion threshold. Of course, in other embodiments of the present application, the number of defect-free wafer map samples may be equal to the number of wafer map samples of other defect types. The decimation may include random extraction, equidistant extraction, and the like. The preset proportion threshold may be determined by a user based on actual application requirements. The smaller the preset proportion threshold, the higher the recognition accuracy of the trained wafer defect recognition model.
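
Steps S601 to S605 can be sketched as follows. This is a simplified reading: any defect type whose share of the data set exceeds the threshold is randomly decimated down to the size of the largest remaining type, which corresponds to the "equal number" variant mentioned above rather than to a specific difference threshold. The sample format `(sample_id, defect_type)`, the threshold value, and the class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def downsample_majority(samples, ratio_threshold=0.5, seed=0):
    """samples: list of (sample_id, defect_type) pairs."""
    by_type = defaultdict(list)
    for sample in samples:
        by_type[sample[1]].append(sample)
    total = len(samples)
    # S603: defect types whose share exceeds the preset proportion threshold.
    over = {t for t, group in by_type.items()
            if len(group) / total > ratio_threshold}
    others = [len(group) for t, group in by_type.items() if t not in over]
    if not over or not others:
        return list(samples)
    target = max(others)          # reference size taken from the other types
    rng = random.Random(seed)
    balanced = []
    for t, group in by_type.items():
        # S605: random extraction for over-represented types only.
        balanced.extend(rng.sample(group, target) if t in over else group)
    return balanced


data = ([(i, "none") for i in range(80)]
        + [(i, "center") for i in range(80, 92)]
        + [(i, "edge") for i in range(92, 100)])
balanced = downsample_majority(data)
print(Counter(t for _, t in balanced))   # none: 12, center: 12, edge: 8
```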

It should be noted that the defect type of the target wafer map sample may be other defect types, and it is only necessary to determine that the duty ratio of the number of the target wafer map samples in the wafer map data set is greater than the preset ratio threshold.

In the embodiment of the application, an initial wafer defect identification model may be constructed, where the initial wafer defect identification model may include a feature extraction layer, a multi-scale depth separable network layer, a pooling layer, a full-connection layer, and the like. Then, the wafer map sample may be input into the initial wafer defect recognition model, and a prediction result of the wafer map sample may be output through the initial wafer defect recognition model. The prediction result may include a defect type of the wafer map sample. And then, iteratively adjusting model parameters of the initial wafer defect recognition model based on the difference between the prediction result and the defect types marked in the wafer map sample until the initial wafer defect recognition model meets the preset requirement. The preset requirements may include, for example, that the difference is smaller than or equal to a preset threshold, or that the number of iterative adjustments is greater than or equal to a preset number of times threshold, which is not limited herein.

Further, in one embodiment of the present application, the difference between the predicted result and the defect type is determined according to a loss function, the loss function including an adjustment coefficient for adjusting a loss contribution of the easily distinguishable wafer map sample.

In the embodiment of the application, the loss function may include a Focal-loss function, a QFocal-loss function, and the like. The Focal-loss function may be a variant of the cross entropy loss function, specifically expressed by the following formula:

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \times CE(p_t)$$

where γ is a modulation parameter and the factor $(1 - p_t)$, raised to the power γ, may be a second adjustment coefficient, which adjusts the weight in the total loss of the loss generated by easily distinguishable wafer map samples, that is, the contribution of those samples to the total loss. For example, when $p_t$ tends to 1, i.e., the wafer map sample is easily distinguishable, the second adjustment coefficient tends to 0, so the loss contribution of that sample is reduced. $\alpha_t$ may be a first adjustment coefficient, typically less than 1, which can be used to control the share of the total loss contributed by positive and negative wafer map samples. In one embodiment of the present application, since the Focal-loss function is applicable only to classification models, the loss function may also be a QFocal-loss function in order to increase the generalization capability of the model. In the case where the loss function is a QFocal-loss function, a label smoothing (Label Smoothing) operation may also be performed on the labels of the wafer map samples. Label smoothing is a regularization method in the machine learning field that prevents the model from predicting labels too confidently during training and mitigates poor generalization. The label smoothing formula is as follows: $y_i = y_{hot}(1 - \alpha) + \alpha / K$.
Here K is the total number of classes and α is a small hyperparameter (generally 0.1). Because some wafer map samples may carry wrong labels, the smoothed label distribution is equivalent to treating each label as inaccurate with a small probability, which prevents the model from being overconfident in the labeled class, avoids overfitting, and improves the generalization capability of the model. On this basis, the QFocal-loss function can be expressed by the following formula:

$$QFL(\sigma) = -\alpha_t \, |y - \sigma|^{\beta} \, \left[(1 - y)\log(1 - \sigma) + y\log(\sigma)\right]$$

where $-[(1 - y)\log(1 - \sigma) + y\log(\sigma)]$ is the cross entropy loss, y is the smoothed label (between 0 and 1), and σ is the prediction output of the model. QFocal-loss is equivalent to adding two terms on the basis of the cross entropy loss. The first term, $\alpha_t$, is the first adjustment coefficient, which balances the weights of the positive and negative wafer map samples, i.e., the share of the total loss generated by the positive and negative wafer map samples. The other term, $|y - \sigma|^{\beta}$ (the absolute difference between label and prediction raised to the power β), is the second adjustment coefficient, which balances the loss contribution of wafer map samples that are difficult to classify; β is a preset hyperparameter, generally greater than 1. For wafer map samples that are difficult to classify, $|y - \sigma|$ is larger, so after modulation these samples have a larger influence on the final loss than samples that are easy to classify, which lets the loss function treat easy and hard wafer map samples differently. Thus, the QFocal-loss function can adjust the weights of the positive and negative wafer map samples and control the weights of samples that are easy or difficult to classify, which helps alleviate the problem of unbalanced data distribution.
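
The label smoothing formula and the two loss functions can be evaluated numerically to see how the adjustment coefficients behave. The sketch below makes two assumptions that go beyond the text: the cross entropy term of the Focal-loss is taken as log(p_t) with the leading minus sign, as in the commonly used focal loss form, and the modulating factor of the QFocal-loss is taken as |y − σ|^β. The values of `alpha_t`, `gamma`, `beta`, and the class count of 4 are hypothetical.

```python
import math


def smooth_label(y_hot, alpha, num_classes):
    """y_i = y_hot * (1 - alpha) + alpha / K for each entry of a one-hot label."""
    return [y * (1 - alpha) + alpha / num_classes for y in y_hot]


def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for one sample; p_t is the predicted probability
    of the true class (0 < p_t <= 1)."""
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)


def qfocal_loss(sigma, y, alpha_t=0.25, beta=2.0):
    """QFocal loss for one output; sigma in (0, 1) is the prediction,
    y in [0, 1] is the (smoothed) label."""
    cross_entropy = -((1 - y) * math.log(1 - sigma) + y * math.log(sigma))
    return alpha_t * abs(y - sigma) ** beta * cross_entropy


smoothed = smooth_label([0, 1, 0, 0], alpha=0.1, num_classes=4)
print(smoothed)   # the true class gets about 0.925, the others about 0.025

# An easy sample (confident, correct) contributes far less than a hard one.
print(focal_loss(0.9), focal_loss(0.5))
print(qfocal_loss(0.9, smoothed[1]), qfocal_loss(0.2, smoothed[1]))
```

With these settings the easy sample's loss is orders of magnitude smaller than the hard sample's, which is the down-weighting effect the second adjustment coefficient is meant to provide.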

In an embodiment of the present application, in order to determine the performance of the trained target wafer defect recognition model, a certain number of wafer map samples, for example 25% of the samples, may be selected from the WM-811K wafer data set as test samples. The test samples are input into the trained target wafer defect recognition model, and the model is evaluated by calculating performance evaluation indexes. Specifically, the performance evaluation indexes may include accuracy (Accuracy), precision (Precision), and recall (Recall), calculated as follows: Accuracy = (TP + TN) / (TP + TN + FP + FN), Precision = TP / (TP + FP), Recall = TP / (TP + FN). Here, for a given defect class, TP (true positives) is the number of samples of that class correctly identified as that class, TN (true negatives) is the number of samples of other classes correctly not assigned to that class, FP (false positives) is the number of samples of other classes wrongly identified as that class, and FN (false negatives) is the number of samples of that class wrongly identified as another class.
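
The three formulas can be applied directly to the four counts for one defect class. The counts in the example below are made up for illustration and are not experimental results from the application; a zero denominator is mapped to 0.0 as a convention of this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) for one class from its counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for one defect class.
accuracy, precision, recall = classification_metrics(tp=40, tn=45, fp=10, fn=5)
print(accuracy, precision, recall)   # 0.85, 0.8, and 40/45 (about 0.889)
```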

The beneficial effects of the target wafer defect recognition model are described below through a specific experimental procedure. In experiment 1, the test samples are input to a model employing standard depth separable convolution; in experiment 2, the test samples are input to a model employing the multi-scale depth separable convolution described in the above embodiments; in experiment 3, Label Smoothing and the QFocal-loss function are used on the basis of experiment 2. The overall accuracies on the test samples in the three experiments are 94.78%, 96.70%, and 97.78%, respectively. Replacing the standard depth separable convolution with the designed multi-scale depth separable convolution network improves the accuracy by about 2 percentage points, and using Label Smoothing and QFocal-loss on the basis of experiment 2 improves it by about 1 further percentage point. Fig. 6 shows confusion matrix 1 of experiment 1, confusion matrix 2 of experiment 2, and confusion matrix 3 of experiment 3. As can be seen from fig. 6, experiment 2 improves noticeably on experiment 1 in every category, and experiment 3 improves on experiment 2 most in the categories with less data, which meets the design purpose. The abscissa of the confusion matrix represents the predicted class and the ordinate represents the labeled class; entry (i, j) represents the probability that an object of class i is classified into class j, and the larger the values on the diagonal, the better. Of course, confusion matrices 1, 2, and 3 may also be normalized; for example, they may be normalized according to the accuracy, and the resulting normalized confusion matrices are shown in fig. 7. In other examples, confusion matrices 1, 2, and 3 may be normalized according to the recall, and the resulting normalized confusion matrices are shown in fig. 8.
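
Normalizing a confusion matrix can be sketched as follows. The application does not spell out the normalization procedure, so this sketch assumes the two common variants: dividing each row (labeled class) by its sum, which puts per-class recall on the diagonal, and dividing each column (predicted class) by its sum, which puts per-class precision on the diagonal. The 2×2 counts are hypothetical and are not taken from figs. 6 to 8.

```python
def normalize_rows(cm):
    """Divide each row by its sum; the diagonal then holds per-class recall."""
    result = []
    for row in cm:
        row_sum = sum(row)
        result.append([v / row_sum if row_sum else 0.0 for v in row])
    return result


def normalize_cols(cm):
    """Divide each column by its sum; the diagonal then holds per-class precision."""
    col_sums = [sum(col) for col in zip(*cm)]
    return [[v / s if s else 0.0 for v, s in zip(row, col_sums)] for row in cm]


def overall_accuracy(cm):
    """Correct predictions (diagonal) over all predictions."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total if total else 0.0


# Rows are labeled classes, columns are predicted classes (made-up counts).
cm = [[8, 2],
      [1, 9]]
print(normalize_rows(cm))      # diagonal: 0.8 and 0.9
print(normalize_cols(cm))      # diagonal: 8/9 and 9/11
print(overall_accuracy(cm))    # 0.85
```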

Based on the same inventive concept, the embodiment of the application also provides a wafer defect recognition device for realizing the above related wafer defect recognition method. The implementation of the solution provided by the apparatus is similar to that described in the above method, so the specific limitation of one or more embodiments of the wafer defect recognition apparatus provided below may be referred to the limitation of the wafer defect recognition method hereinabove, and will not be repeated herein.

In one embodiment, as shown in fig. 9, there is provided a wafer defect recognition apparatus 900, the apparatus comprising:

an acquiring module 901, configured to acquire a wafer map to be detected;

the recognition module 903 is configured to input the wafer map to be detected to a target wafer defect recognition model, and output defect information of the wafer map to be detected through the wafer defect recognition model; wherein the wafer defect identification model comprises at least one multi-scale depth separable network, each channel of the multi-scale depth separable network comprising a plurality of convolved branches of different receptive fields.

Optionally, in an embodiment of the present application, before inputting the wafer map to be detected to a wafer defect recognition model, outputting defect information of the wafer map to be detected via the wafer defect recognition model includes:

Determining rough defect information of the wafer map to be detected;

Optionally, in an embodiment of the present application, the defect information includes a defect type, and the wafer defect identification model is set to be trained according to the following manner:

Optionally, in an embodiment of the present application, before the inputting the wafer map to be detected into the wafer defect identification model, the apparatus further includes:

Optionally, in an embodiment of the present application, the wafer defect identification model further includes at least one single-scale depth separable network, and the first number of the single-scale depth separable networks and/or the second number of the multi-scale depth separable networks are determined according to feature information of the wafer to be detected.

It should be further noted that the embodiments described above are merely illustrative, and that the modules described as separate components may or may not be physically separate, and that components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection therebetween, and can be specifically implemented as one or more communication buses or signal lines.

As shown in fig. 10, embodiments of the present application further provide an electronic device 1000, where the electronic device 1000 includes: a processor and a memory for storing processor-executable instructions; wherein the processor is configured to implement the above-described method when executing the instructions. The electronic device 1000 comprises a memory 1001, a processor 1003, a bus 1005 and a communication interface 1007. The memory 1001, the processor 1003, and the communication interface 1007 communicate over the bus 1005. The bus 1005 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean that there is only one bus or one type of bus. The communication interface 1007 is used for communication with the outside. The processor 1003 may be a central processing unit (CPU). The memory 1001 may include volatile memory, such as random access memory (RAM). The memory 1001 may also include non-volatile memory, such as a read-only memory (ROM), a flash memory, an HDD, or an SSD. The memory 1001 stores executable code that the processor 1003 executes to perform the methods described in the foregoing embodiments.

Embodiments of the present application provide a non-transitory computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.

Embodiments of the present application provide a computer program product comprising a computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.

In some embodiments, the disclosed methods may be implemented as computer program instructions encoded on a computer-readable storage medium in a machine-readable format or encoded on other non-transitory media or articles of manufacture. FIG. 11 schematically illustrates a conceptual partial view of an example computer program product comprising a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein. In one embodiment, the example computer program product 1100 is provided using a signal bearing medium 1101. The signal bearing medium 1101 may include one or more program instructions 1102 that when executed by one or more processors may provide the functionality or portions of the functionality described above with respect to fig. 1. Further, program instructions 1102 in FIG. 11 also describe example instructions.

In some examples, the signal bearing medium 1101 may comprise a computer readable medium 1103 such as, but not limited to, a hard disk drive, compact disk (CD), digital video disk (DVD), digital tape, memory, read-only memory (ROM), or random access memory (RAM), among others. In some implementations, the signal bearing medium 1101 may comprise a computer recordable medium 1104, such as, but not limited to, memory, read/write (R/W) CD, R/W DVD, and the like. In some implementations, the signal bearing medium 1101 may include a communication medium 1105, such as, but not limited to, a digital and/or analog communication medium (e.g., fiber optic cable, waveguide, wired communications link, wireless communications link, etc.). Thus, for example, the signal bearing medium 1101 may be conveyed by a communication medium 1105 in wireless form (e.g., a wireless communication medium that complies with the IEEE 802.11 standard or other transmission protocol). The one or more program instructions 1102 may be, for example, computer-executable instructions or logic-implemented instructions. In some examples, a computing device, such as the electronic device described with respect to fig. 10, may be configured to provide various operations, functions, or actions in response to program instructions 1102 conveyed to the computing device through one or more of the computer readable medium 1103, the computer recordable medium 1104, and/or the communication medium 1105. It should be understood that the arrangement described herein is for illustrative purposes only. Thus, those skilled in the art will appreciate that other arrangements and other elements (e.g., machines, interfaces, functions, orders, and groupings of functions, etc.) can be used instead, and some elements may be omitted altogether depending on the desired results.
In addition, many of the elements described are functional entities that may be implemented as discrete or distributed components, or in any suitable combination and location in conjunction with other components. The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, systems, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by hardware (e.g., circuits or application-specific integrated circuits (ASICs)) that performs the corresponding functions or acts, or by a combination of hardware and software, such as firmware.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above embodiments represent only a few implementations of the present application. They are described specifically and in detail, but they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method for identifying wafer defects, the method comprising:

acquiring a wafer map to be detected;

2. The method according to claim 1, wherein before the inputting of the wafer map to be detected into the target wafer defect recognition model and the outputting of the defect information of the wafer map to be detected via the target wafer defect recognition model, the method further comprises:

determining rough defect information of the wafer map to be detected;

3. The method of claim 1, wherein the output result of each channel is determined based on fused information of the feature information of the plurality of convolution branches.
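
As an illustration of the branch fusion described in claim 3, the following is a minimal NumPy sketch and not the applicant's implementation. It assumes that each branch is a per-channel (depthwise) convolution with an odd kernel size, that zero padding keeps the spatial size unchanged, and that fusion is an element-wise sum; the claim does not specify the fusion operation. The function names `depthwise_conv2d` and `multi_scale_depthwise` are hypothetical, and the pointwise (1 x 1) stage of a depth separable (depthwise separable) network is omitted.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Apply one k x k kernel per channel with zero 'same' padding.

    x: feature map of shape (C, H, W); kernels: shape (C, k, k), k odd.
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, w), dtype=float)
    for i in range(k):
        for j in range(k):
            # Shifted window multiplied by the matching kernel tap, per channel.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out

def multi_scale_depthwise(x, branch_kernels):
    """Pass every channel through branches with different receptive fields
    and fuse the branch outputs by element-wise summation (assumed fusion)."""
    return sum(depthwise_conv2d(x, k) for k in branch_kernels)

# Usage: three branches with 1x1, 3x3 and 5x5 receptive fields on random data.
rng = np.random.default_rng(0)
x = rng.random((4, 8, 8))
branches = [rng.random((4, k, k)) for k in (1, 3, 5)]
print(multi_scale_depthwise(x, branches).shape)  # (4, 8, 8)
```

Summation keeps the channel count unchanged; concatenating the branch outputs along the channel axis would be another plausible fusion choice.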

4. The method according to any one of claims 1 to 3, wherein the defect information comprises a defect type, and the target wafer defect recognition model is trained as follows:

5. The method of claim 4, wherein the difference between the predicted result and the defect type is determined from a loss function, the loss function being further provided with a first adjustment factor for adjusting the loss contributions of positive and negative wafer map samples, and a second adjustment factor for adjusting the loss contribution of easily distinguishable wafer map samples.
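
One well-known loss with the two-factor structure described in claim 5 is the binary focal loss. The sketch below assumes that form; the claim itself does not confirm that this exact formula is used. Here `alpha` stands in for the first adjustment factor (balancing positive and negative samples) and `gamma` for the second (down-weighting easily distinguishable samples). The function name `focal_loss` and the default values are illustrative assumptions.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Mean binary focal loss (assumed form, not taken from the claims).

    p: predicted probability of the positive class; y: labels in {0, 1}.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y)
    # Probability the model assigns to the true class of each sample.
    p_t = np.where(y == 1, p, 1.0 - p)
    # First factor: weight positives by alpha and negatives by 1 - alpha.
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # Second factor: (1 - p_t)^gamma shrinks the loss of easy samples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Usage: a confidently correct sample contributes far less than a hard one.
easy = focal_loss([0.9], [1])
hard = focal_loss([0.1], [1])
print(easy < hard)  # True
```

With `gamma = 0` the second factor disappears and the expression reduces to a class-weighted cross-entropy.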

6. The method of claim 1, wherein before the inputting of the wafer map to be detected into the target wafer defect recognition model, the method further comprises:

7. The method of claim 1, wherein the target wafer defect recognition model further comprises at least one single-scale depth separable network, a first number of the single-scale depth separable networks and/or a second number of the multi-scale depth separable networks being determined based on feature information of the wafer to be inspected.

8. The method of claim 4, wherein the acquiring of a plurality of wafer map samples, the wafer map samples being labeled with defect types, comprises:

9. The method of claim 1, wherein the target wafer defect recognition model comprises a feature extraction layer, a multi-scale depth separable network layer, a pooling layer, and a fully connected layer connected in sequence;

10. A wafer defect identification apparatus, the apparatus comprising:

the acquisition module is used for acquiring a wafer map to be detected;

11. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 9 when the computer program is executed.

12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.

13. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 9.