CN117934420A - Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement - Google Patents

Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement Download PDF

Info

Publication number
CN117934420A
CN117934420A (Application No. CN202410104874.5A)
Authority
CN
China
Prior art keywords
detr
swinir
defect
data set
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410104874.5A
Other languages
Chinese (zh)
Inventor
宁永新
徐立君
明锦翼
许胤
徐艺桐
赵珈妤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Lantuo Technology Co ltd
Original Assignee
Changchun Lantuo Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun Lantuo Technology Co ltd filed Critical Changchun Lantuo Technology Co ltd
Priority to CN202410104874.5A priority Critical patent/CN117934420A/en
Publication of CN117934420A publication Critical patent/CN117934420A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

A chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement, relating to the field of chip defect detection. The method solves the problems that, when a conventional Transformer network is used for target detection, the detection effect on tiny targets is poor, the accuracy is low, and the samples in practical chip defect detection tasks are unbalanced. The method comprises the following steps: collecting image information of a chip, labeling defect positions and categories in the image information, and constructing a data set from the image information; expanding the data set with a GAN network; constructing a SwinIR network model to perform super-resolution reconstruction on the data in the expanded data set and building a super-resolution data set; constructing an improved RT-DETR network model and training it with the expanded data set and the super-resolution data set; and detecting and classifying the image to be detected to obtain a detection result. The method is applied to the field of automatic semiconductor inspection.

Description

Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement
Technical Field
The invention relates to the field of chip defect detection, in particular to a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement.
Background
With the rapid development of the semiconductor industry, automatic inspection technology for integrated circuits has become one of the research hotspots. There are three common methods for appearance inspection: (1) traditional manual inspection; (2) chip appearance inspection using laser measurement techniques; (3) machine vision-based inspection. At present, the machine vision method has become a research hotspot because of its advantages of low cost, high speed, high efficiency and non-contact operation.
With the rapid development of deep learning and artificial intelligence in recent years, machine vision technology has advanced accordingly, and more and more deep learning algorithms are being applied in the detection field. Compared with traditional machine vision methods, deep learning methods offer high accuracy, fast running speed and strong generalization capability.
In the chip defect detection task, multiple defects may exist on a single chip sample, which makes the detection task highly complex. A traditional detection algorithm can only detect one kind of defect on a chip; when several kinds of defects must be detected, several different algorithms or several detection passes are required, so such algorithms cannot handle this complex task well. The growing number of defect categories further complicates the detection task. Target detection based on deep learning can better solve these complex problems.
With the advent of the Transformer-based Vision Transformer (hereinafter ViT) and the Swin Transformer (hereinafter SwinT) in the field of target detection, many neural network models adopt ViT or SwinT as the backbone network to achieve higher performance, including the target detection model DETR. Although DETR achieves end-to-end detection and reduces algorithm complexity to some extent, it still has a very high computational cost, which directly prevents it from being deployed in practical applications.
In general, when a Transformer network is used for target detection, the detection effect on tiny targets is poor and the accuracy is low. This is because Transformer-based target detection relies heavily on the features extracted by the backbone network. In deep learning, as the number of convolution layers increases, the level of feature abstraction rises and low-level information is discarded, yet the detection of tiny objects depends precisely on these low-level features. This is especially true in the field of chip defect detection, where many kinds of tiny defects rely on low-level information for detection.
Furthermore, there is a problem of sample imbalance in practical chip defect detection tasks. Sample imbalance means that among the classification categories, some categories have many samples while others have very few. For example, in chip defect detection, scratches on the chip surface are common, whereas missing leads are rare. A training set collected in actual production therefore contains many scratch samples and few missing-lead samples, and the trained network is poorly suited to classifying such rare categories.
Disclosure of Invention
Aiming at the prior-art problems that, when a Transformer network is used for target detection, the detection effect on tiny targets is poor, the accuracy is low, and sample imbalance exists in practical chip defect detection tasks, the invention provides a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement, which comprises the following steps:
A chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement, the method comprising:
s1: collecting image information of a chip, marking defect positions and defect types in the image information, and constructing a data set according to the image information;
s2: expanding the data set by adopting a GAN network to obtain an expanded data set;
s3: constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
s4: constructing an improved RT-DETR network model, and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
s5: and detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining a detection result.
Further, there is also provided a preferred mode, wherein the marking the defect position and defect type in the image information in the step S1 includes:
Marking the defect position by using a rectangular frame and recording parameters of the rectangular frame;
marking the defect categories, which comprise five classes: over-long, over-short, offset, missing, and scratch.
Further, there is also provided a preferred manner, wherein constructing the SwinIR network model in step S3 includes:
The SwinIR network model is constructed by a shallow layer feature extraction module, a deep layer feature extraction module and a high-quality image reconstruction module;
The shallow feature extraction module is a lightweight feature extraction network MobileNet;
the deep feature extraction module consists of a plurality of residual Swin Transformer blocks and convolution blocks;
The high-quality image reconstruction module consists of a convolution layer and a pixel reorganization PixelShuffle layer.
Further, there is provided a preferred embodiment in which the convolution layer converts the low-resolution image into a multi-channel feature map using a 1×1 convolution kernel, and the pixel reorganization layer reorganizes the multi-channel feature map of shape (N×(C×r²)×W×H) obtained by the convolution layer into a high-resolution image of shape (N×C×(W×r)×(H×r)).
Further, a preferred manner is also provided, wherein the improved RT-DETR network model in step S4 includes a backbone network BackBone, an encoder Efficient Hybrid Encoder, an IoU-aware query selection layer, and a Decoder & Head;
The backbone network BackBone is a lightweight feature extraction network MobileNet;
The encoder Efficient Hybrid Encoder consists of an intra-scale feature interaction (AIFI) module and a cross-scale feature fusion module (CCFM), wherein the AIFI module adopts a single standard Transformer encoder layer, comprising standard MHSA and FFN;
the cross-scale feature fusion module CCFM consists of three fusion layers and three convolution layers.
Further, a preferred manner is also provided, wherein the fusion layer in the cross-scale feature fusion module CCFM consists of two 1×1 convolution layers.
Further, there is also provided a preferred manner, wherein the loss function of the improved RT-DETR network model is:
L(ŷ, y) = L_box(b̂, b) + L_cls(ĉ, c)
wherein L(ŷ, y) is the total loss function, ŷ is the predicted value, y is the true value, L_box is the bounding-box regression loss function, b̂ is the predicted bounding-box coordinates, b is the true bounding-box coordinates, L_cls is the classification loss function, ĉ is the predicted category, and c is the true category.
Based on the same inventive concept, the invention also provides a chip multi-defect detection system based on SwinIR and RT-DETR fusion improvement, wherein the system comprises:
the data set construction unit is used for collecting image information of the chip, labeling defect positions and defect types in the image information and constructing a data set according to the image information;
The expansion unit is used for expanding the data set by adopting the GAN network to obtain an expanded data set;
the super-resolution data set construction unit is used for constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
The training unit is used for constructing an improved RT-DETR network model and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
and the detection unit is used for detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining detection results.
Based on the same inventive concept, the invention also provides a computer readable storage medium for storing a computer program, wherein the computer program executes a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement.
Based on the same inventive concept, the invention also provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to any one of the above.
The invention has the advantages that:
The invention solves the problems that, when a conventional Transformer network is used for target detection, the detection effect on tiny targets is poor, the accuracy is low, and sample imbalance exists in practical chip defect detection tasks.
In the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement, SwinIR is introduced, so the method handles small targets better than a traditional Transformer network. The Swin Transformer is a hierarchical Transformer architecture, which helps capture detailed information in the image and improves the detection capability for tiny targets. The GAN network is used to expand the original data set, improving the generalization capability of the model: the expanded data set can contain more diversified images, so that the model adapts better to different practical scenarios. The SwinIR network model is used for super-resolution reconstruction of the data set, which improves the sharpness and detail of the images and helps the RT-DETR model capture and classify defects more accurately. The RT-DETR network model is improved for detection and is trained on both the expanded data set and the super-resolution data set. This comprehensive training process makes the model better suited to various defect situations and improves detection accuracy.
In the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement, a labeled data set is constructed by collecting chip image information and marking defect positions and defect types. The data set is augmented with a Generative Adversarial Network (GAN) to generate more diverse data and increase the robustness and generalization capability of the model. Super-resolution reconstruction is performed on the expanded data set using the SwinIR network model to generate high-resolution images, which helps improve image quality. An improved RT-DETR network model is built and trained using the expanded data set and the super-resolution data set, which ensures that the model adapts well to both the expanded and the high-resolution data. The trained RT-DETR network model is then used to detect and classify the images to be detected, yielding the final defect detection result. The introduction of SwinIR, together with the comprehensive improvements to the data set, is expected to significantly improve the detection accuracy for tiny targets. The GAN-based data expansion and the comprehensive training process help cope with sample imbalance and improve the model's ability to detect various defects. The combination of data expansion, super-resolution reconstruction and the improvements to RT-DETR enhances the generalization of the model, making it better suited to chip defect detection tasks in real scenarios.
The invention is applied to the field of semiconductor automatic detection.
Drawings
FIG. 1 is a flow chart of a method for detecting multiple defects of a chip based on SwinIR and RT-DETR fusion improvement according to the first embodiment;
Fig. 2 is a schematic diagram of the improved SwinIR network model according to the third embodiment;
Fig. 3 is a schematic diagram of an improved RT-DETR network model according to the fifth embodiment;
Fig. 4 is a schematic overall structure of a converged network according to an eleventh embodiment.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments.
Embodiment one, this embodiment will be described with reference to fig. 1. The chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement of the embodiment comprises the following steps:
s1: collecting image information of a chip, marking defect positions and defect types in the image information, and constructing a data set according to the image information;
s2: expanding the data set by adopting a GAN network to obtain an expanded data set;
s3: constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
s4: constructing an improved RT-DETR network model, and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
s5: and detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining a detection result.
By introducing SwinIR, this embodiment handles small targets better than a conventional Transformer network. The Swin Transformer is a hierarchical Transformer architecture, which helps capture detailed information in the image and improves the detection capability for tiny targets. The GAN network is used to expand the original data set, improving the generalization capability of the model: the expanded data set can contain more diversified images, so that the model adapts better to different practical scenarios. The SwinIR network model is used for super-resolution reconstruction of the data set, which improves the sharpness and detail of the images and helps the RT-DETR model capture and classify defects more accurately. The RT-DETR network model is improved for detection and is trained on both the expanded data set and the super-resolution data set. This comprehensive training process makes the model better suited to various defect situations and improves detection accuracy.
In this embodiment, a labeled data set is constructed by collecting chip image information and labeling defect positions and defect types. The data set is augmented with a Generative Adversarial Network (GAN) to generate more diverse data and increase the robustness and generalization capability of the model. Super-resolution reconstruction is performed on the expanded data set using the SwinIR network model to generate high-resolution images, which helps improve image quality. An improved RT-DETR network model is built and trained using the expanded data set and the super-resolution data set, which ensures that the model adapts well to both the expanded and the high-resolution data. The trained RT-DETR network model is then used to detect and classify the images to be detected, yielding the final defect detection result. The introduction of SwinIR, together with the comprehensive improvements to the data set, is expected to significantly improve the detection accuracy for tiny targets. The GAN-based data expansion and the comprehensive training process help cope with sample imbalance and improve the model's ability to detect various defects. The combination of data expansion, super-resolution reconstruction and the improvements to RT-DETR enhances the generalization of the model, making it better suited to chip defect detection tasks in real scenarios.
In a second embodiment, the present embodiment is further defined by the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement in the first embodiment, wherein the marking the defect position and defect type in the image information in the step S1 includes:
Marking the defect position by using a rectangular frame and recording parameters of the rectangular frame;
marking the defect categories, which comprise five classes: over-long, over-short, offset, missing, and scratch.
The defect positions are marked by the rectangular frame and the parameters of the rectangular frame are recorded, so that high-precision defect position information can be provided, and the model is facilitated to accurately capture the specific positions of the defects.
In this embodiment, compared with labeling of more complex shapes, such as polygonal or pixel-level labeling, rectangular-frame labeling is easier to perform manually, which simplifies the labeling process and improves labeling efficiency. Labeling the five defect categories, over-long, over-short, offset, missing and scratch, provides detailed identification of different types of defects and helps the model learn and distinguish them. A rectangular frame gives a compact representation of the defect location: the recorded frame parameters (e.g., coordinates, width and height) explicitly indicate the specific position of the defect in the image. The five defect classes provide a multi-angle classification of defects, each representing the characteristics of a specific defect type, so that the model can learn and identify these characteristics.
Accurate recording of the rectangular-frame parameters helps the model locate defects precisely, which is an important step towards accurate defect detection and localization. Labels covering the five defect categories provide a more detailed defect description, so that the model can better learn and distinguish different types of defects, improving the accuracy and robustness of multi-category defect detection. This labeling scheme therefore offers clear advantages in accuracy, simplicity and the detail it provides for multi-class defects; it supplies better label information for training and helps improve the model's performance in chip multi-defect detection tasks.
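As an illustration of such an annotation record, the following minimal Python sketch stores one rectangular-frame label. The (x, y, w, h) parameterization, the class-name spellings and the DefectAnnotation/DEFECT_CLASSES identifiers are assumptions introduced for illustration only; the patent itself only requires recording the frame parameters and one of the five categories.

```python
from dataclasses import dataclass

# The five defect categories used for labeling (names follow the patent description).
DEFECT_CLASSES = ("over_long", "over_short", "offset", "missing", "scratch")

@dataclass
class DefectAnnotation:
    """One rectangular-frame label: top-left corner, width, height, and class index."""
    x: float          # x coordinate of the top-left corner (pixels)
    y: float          # y coordinate of the top-left corner (pixels)
    w: float          # frame width (pixels)
    h: float          # frame height (pixels)
    class_id: int     # index into DEFECT_CLASSES

# Example: a scratch defect marked by a 24x12 pixel frame whose top-left corner is at (105, 40).
label = DefectAnnotation(x=105, y=40, w=24, h=12,
                         class_id=DEFECT_CLASSES.index("scratch"))
print(label)
```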
Embodiment three, this embodiment will be described with reference to fig. 2. The present embodiment is a further limitation of the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement of the first embodiment, wherein constructing the SwinIR network model in step S3 includes:
The SwinIR network model is constructed by a shallow layer feature extraction module, a deep layer feature extraction module and a high-quality image reconstruction module;
The shallow feature extraction module is a lightweight feature extraction network MobileNet;
the deep feature extraction module consists of a plurality of residual Swin Transformer blocks and convolution blocks;
The high-quality image reconstruction module consists of a convolution layer and a pixel reorganization PixelShuffle layer.
This embodiment uses MobileNet as the shallow feature extraction module; its lightweight structure suits the super-resolution task and reduces the computation and storage cost of the model. MobileNet adopts depthwise separable convolution, separating depthwise and pointwise convolution, which effectively reduces the parameter count and computational complexity while maintaining good feature extraction capability. The deep feature extraction module adopts residual Swin Transformer blocks and convolution blocks, combining the classical residual structure of deep learning, so it can capture more abstract and complex features and improve the representation of the image. The Swin Transformer incorporates an attention mechanism that captures long-range dependencies, which helps process global information in the image, while the convolution blocks further extract local features. The high-quality image reconstruction module adopts a convolution layer and a pixel reorganization (PixelShuffle) layer, which effectively map the learned features into a high-resolution image and provide a clearer, more realistic reconstruction. The convolution layer learns the feature representation, and the PixelShuffle layer upsamples the low-resolution feature map to a high-resolution image; this structure reduces artifacts while preserving detail.
The SwinIR network model in this embodiment combines lightweight, deep, global and local feature extraction modules, so the model learns image features more comprehensively, improving its ability to detect complex defects and to perform super-resolution reconstruction. The lightweight structure of MobileNet, combined with the attention mechanism of the Swin Transformer, balances computational efficiency and model performance, so the model runs more efficiently in chip multi-defect detection. The high-quality image reconstruction module built from a convolution layer and a PixelShuffle layer lets the model restore image details more effectively and obtain clearer, more realistic super-resolution reconstruction results.
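A minimal PyTorch sketch of this three-stage layout is given below. It assumes a MobileNet-style depthwise separable block as the shallow extractor and plain residual convolution blocks standing in for the residual Swin Transformer blocks; all class names, channel counts and the upscale factor are illustrative assumptions, not the concrete SwinIR implementation of the patent.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style building block: depthwise conv followed by a 1x1 pointwise conv."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class ResidualBlock(nn.Module):
    """Stand-in for a residual Swin Transformer block (RSTB)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)      # residual connection X = F(X) + X

class SwinIRLikeSR(nn.Module):
    """Three-stage structure: shallow extraction, deep extraction, HQ reconstruction."""
    def __init__(self, in_ch: int = 3, feat_ch: int = 64, upscale: int = 2):
        super().__init__()
        self.shallow = DepthwiseSeparableConv(in_ch, feat_ch)     # lightweight shallow features
        self.deep = nn.Sequential(*[ResidualBlock(feat_ch) for _ in range(6)])
        self.reconstruct = nn.Sequential(                         # 1x1 conv + PixelShuffle upsampling
            nn.Conv2d(feat_ch, in_ch * upscale ** 2, kernel_size=1),
            nn.PixelShuffle(upscale),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.shallow(x)
        f = f + self.deep(f)          # global residual around the deep extractor
        return self.reconstruct(f)

# Usage: a 64x64 low-resolution patch becomes a 128x128 super-resolved patch.
sr = SwinIRLikeSR()
print(sr(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 3, 128, 128])
```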
In the fourth embodiment, the present embodiment is a further limitation of the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement described in the third embodiment. In the high-quality image reconstruction module, the convolution layer converts the low-resolution image into a multi-channel feature map using a 1×1 convolution kernel, and the pixel reorganization layer reorganizes the multi-channel feature map of shape (N×(C×r²)×W×H) obtained by the convolution layer into a high-resolution image of shape (N×C×(W×r)×(H×r)).
This embodiment uses a 1×1 convolution kernel to map the low-resolution image to a multi-channel feature map, which helps extract richer feature information. The 1×1 convolution kernel adjusts the number of channels without changing the spatial resolution of the feature map, so the model can better capture correlations between channels and improve the expressiveness of the features. The pixel reorganization layer then reorganizes the multi-channel feature map into a high-resolution image, providing a clearer and more realistic restoration. The pixel reorganization operation uses PixelShuffle to map the low-resolution feature map to the high-resolution image by rearranging the pixels in the feature map, which helps reduce artifacts and improves the quality of the reconstructed image.
In this embodiment, using the 1×1 convolution kernel to map the channel dimension optimizes the relationships between channels in the feature map, improves the extraction and expression of image features, and improves the reconstruction effect. Dimension mapping with a 1×1 convolution kernel also helps preserve more information from the low-resolution image, which reduces information loss and improves the accuracy of image restoration. The multi-channel feature map is restored to a high-resolution image through the pixel reorganization layer, which significantly improves image quality, makes the reconstructed image closer to the real scene, and facilitates more accurate multi-defect detection. The high-quality image reconstruction module based on the 1×1 convolution kernel and pixel reorganization can therefore effectively improve the performance of the chip multi-defect detection method, making the image restoration more realistic and clear and the defect detection more accurate.
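The shape bookkeeping of this reconstruction step can be reproduced with standard PyTorch operators, as in the following sketch; the channel counts and the upscale factor r used here are arbitrary example values, not values prescribed by the patent.

```python
import torch
import torch.nn as nn

N, C, W, H, r = 1, 3, 32, 32, 4                 # batch, output channels, width, height, upscale factor
feat = torch.randn(N, 64, W, H)                 # deep feature map from the extractor

# 1x1 convolution maps the features to C*r^2 channels without changing the spatial size.
to_subpixels = nn.Conv2d(64, C * r * r, kernel_size=1)
multi_channel = to_subpixels(feat)              # shape (N, C*r^2, W, H)

# PixelShuffle rearranges channels into space: (N, C*r^2, W, H) -> (N, C, W*r, H*r).
shuffle = nn.PixelShuffle(upscale_factor=r)
high_res = shuffle(multi_channel)
print(high_res.shape)                           # torch.Size([1, 3, 128, 128])
```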
Embodiment five, this embodiment will be described with reference to fig. 3. The present embodiment is a further limitation of the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to the first embodiment, where the improved RT-DETR network model in step S4 includes four parts: a backbone network BackBone, an encoder Efficient Hybrid Encoder, an IoU-aware query selection layer, and a Decoder & Head;
The backbone network BackBone is a lightweight feature extraction network MobileNet;
The encoder Efficient Hybrid Encoder consists of an intra-scale feature interaction (AIFI) module and a cross-scale feature fusion module (CCFM), wherein the AIFI module adopts a single standard Transformer encoder layer, comprising standard MHSA and FFN;
the cross-scale feature fusion module CCFM consists of three fusion layers and three convolution layers.
The backbone network of this embodiment uses MobileNet, which is lighter than heavier networks and helps reduce the computation and storage overhead of the model while maintaining performance. MobileNet adopts lightweight structures such as depthwise separable convolution, which keeps relatively good feature extraction capability while reducing the number of parameters. The encoder Efficient Hybrid Encoder combines intra-scale feature interaction and cross-scale feature fusion, which helps capture feature information in the image more efficiently. The AIFI module realizes intra-scale feature interaction through a Transformer encoder, so the model can better process local feature information; the CCFM module realizes effective fusion of cross-scale features through fusion layers and convolution layers, enhancing the model's perception of global features. The IoU-aware query selection layer helps estimate the bounding box of the target more accurately and improves detection accuracy: by performing IoU-aware query selection, it helps the model better understand the position and shape of the target.
In this embodiment, MobileNet is adopted as the backbone network, so the overall model is lighter and computation efficiency is improved, which suits resource-limited scenarios. The AIFI module and the CCFM module in the Efficient Hybrid Encoder effectively combine intra-scale and cross-scale features, so the model can better perceive and understand the global and local information of the image. The introduction of the IoU-aware query selection layer helps improve the accuracy of target detection, making the model more accurate when predicting target locations and bounding boxes.
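A minimal sketch of the AIFI step is shown below: the lowest-resolution feature map S5 is flattened into tokens, passed through a single standard Transformer encoder layer (MHSA plus FFN) and reshaped back to the two-dimensional F5 used by the CCFM. The channel count, spatial size and head count are illustrative assumptions.

```python
import torch
import torch.nn as nn

# S5: the lowest-resolution feature map from the backbone, e.g. 256 channels at 20x20.
S5 = torch.randn(1, 256, 20, 20)

# AIFI: one standard Transformer encoder layer (multi-head self-attention + FFN).
aifi = nn.TransformerEncoderLayer(d_model=256, nhead=8, dim_feedforward=1024, batch_first=True)

tokens = S5.flatten(2).transpose(1, 2)               # (1, 256, 20, 20) -> (1, 400, 256): one token per position
tokens = aifi(tokens)                                # intra-scale feature interaction
F5 = tokens.transpose(1, 2).reshape(1, 256, 20, 20)  # back to 2-D for cross-scale fusion (CCFM)
print(F5.shape)                                      # torch.Size([1, 256, 20, 20])
```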
In a sixth embodiment, the present embodiment is further defined by a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement as described in the fifth embodiment, where the fusion layer in the cross-scale feature fusion module CCFM is composed of two 1×1 convolution layers.
In the embodiment, the combination of the two convolution layers can fully fuse the characteristic information under different scales, so that the characteristic expression capability is enhanced. Through multi-layer convolution, the model can learn the features more deeply, and capture and combine information at different levels, so that the final feature representation is richer and meaningful. Stacking of two convolution layers can increase the nonlinearity of the network, helping the model learn more complex feature representations. The nonlinear activation function in the convolution layer can introduce nonlinearity, so that the model can be better adapted to complex characteristics of data, and the expression capability of the model is enhanced. Two convolution layers add a certain number of learnable parameters, helping the model to fit the data better. More parameters can provide more flexibility, so that the model can be better adapted to data characteristics, and modeling capacity of complex characteristics is improved.
In the embodiment, the fusion of the two convolution layers can effectively integrate the characteristic information of different scales, and provide stronger characteristic expression capability for the model, so that the understanding and prediction capability of multi-defect detection are improved. The combination of the multi-layer convolution enhances the extraction of nonlinear features, so that the model can better capture abstract and complex features in the image, and the detection accuracy and the robustness can be improved. Through the fusion of the two convolution layers, the model can more comprehensively integrate the characteristic information under different scales, improves the perception and recognition capability of the model on complex scenes and defects, and improves the effect of multi-defect detection.
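The following PyTorch sketch shows one plausible reading of such a fusion layer: each of the two inputs is projected by its own 1×1 convolution and the results are added element-wise, consistent with the element-wise addition described in the eleventh embodiment. The exact wiring inside the patented CCFM fusion layer is not fixed by this text, so the FusionLayer class below is only an assumption.

```python
import torch
import torch.nn as nn

class FusionLayer(nn.Module):
    """Sketch of a CCFM fusion layer: two 1x1 convolutions whose outputs are added element-wise."""
    def __init__(self, ch: int):
        super().__init__()
        self.branch_a = nn.Conv2d(ch, ch, kernel_size=1)
        self.branch_b = nn.Conv2d(ch, ch, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # Each input (e.g. an upsampled deeper feature and a shallower feature of the same
        # spatial size) is projected by its own 1x1 convolution, then fused by addition.
        return self.branch_a(feat_a) + self.branch_b(feat_b)

fuse = FusionLayer(ch=256)
out = fuse(torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40))
print(out.shape)                                     # torch.Size([1, 256, 40, 40])
```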
Embodiment seven, this embodiment is a further limitation of the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement described in embodiment one, where the loss function of the improved RT-DETR network model is:
L(ŷ, y) = L_box(b̂, b) + L_cls(ĉ, c)
wherein L(ŷ, y) is the total loss function, ŷ is the predicted value, y is the true value, L_box is the bounding-box regression loss function, b̂ is the predicted bounding-box coordinates, b is the true bounding-box coordinates, L_cls is the classification loss function, ĉ is the predicted category, and c is the true category.
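A hedged sketch of such a combined loss for matched prediction/target pairs is given below. The patent does not fix the concrete form of L_box and L_cls, so the choice of L1 plus generalized IoU for the box term and cross-entropy for the classification term is only an illustrative assumption.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import generalized_box_iou_loss

def detection_loss(pred_boxes, true_boxes, pred_logits, true_classes):
    """Combined loss L = L_box + L_cls for already-matched prediction/target pairs."""
    l_box = F.l1_loss(pred_boxes, true_boxes) + \
            generalized_box_iou_loss(pred_boxes, true_boxes, reduction="mean")
    l_cls = F.cross_entropy(pred_logits, true_classes)
    return l_box + l_cls

# Boxes in (x1, y1, x2, y2) format, logits over the five defect classes.
pred_b = torch.tensor([[10., 10., 40., 30.]], requires_grad=True)
true_b = torch.tensor([[12., 11., 38., 29.]])
pred_c = torch.randn(1, 5, requires_grad=True)
true_c = torch.tensor([4])          # e.g. the "scratch" class
print(detection_loss(pred_b, true_b, pred_c, true_c))
```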
An eighth embodiment is a chip multi-defect detection system based on SwinIR and RT-DETR fusion improvement according to the present embodiment, the system comprising:
the data set construction unit is used for collecting image information of the chip, labeling defect positions and defect types in the image information and constructing a data set according to the image information;
The expansion unit is used for expanding the data set by adopting the GAN network to obtain an expanded data set;
the super-resolution data set construction unit is used for constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
The training unit is used for constructing an improved RT-DETR network model and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
and the detection unit is used for detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining detection results.
The computer readable storage medium according to the ninth embodiment is used for storing a computer program, and the computer program executes the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to any one of the first to seventh embodiments.
Embodiment ten is a computer device comprising a memory and a processor, wherein the memory stores a computer program, and when the processor runs the computer program stored in the memory, the processor executes the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to any one of the first to seventh embodiments.
Embodiment eleven, this embodiment will be described with reference to fig. 1 to 4. The present embodiment provides a specific example of the chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement in the first embodiment, and is also used for explaining the second to seventh embodiments, specifically:
As shown in fig. 1, the overall flow of the present embodiment includes five steps S1 to S5.
Steps S1-S2: acquiring image information of an industrial production front-end chip, marking defect positions and defect types, expanding a data set by using a GAN network, and increasing the diversity of the data set by using image enhancement methods such as overturning, cutting and the like on the expanded data set so as to improve the robustness of the network;
Step S3: the construction of an improved SwinIR network model is shown in figure (2), the structure of the network model is divided into a shallow layer feature extraction module (Shallow Feature Extraction), a deep layer feature extraction module (RSTB) and a high-quality image reconstruction module (HQ Image Reconstruction), and the image obtained in the first step is firstly processed by
The shallow layer feature extraction module (Shallow Feature Extraction) performs downsampling, the obtained feature image is subjected to depth feature extraction through the deep layer feature extraction module (RSTB), and finally the extracted depth feature image is subjected to upsampling through the high-quality image reconstruction module (HQ Image Reconstruction) to obtain a super-resolution reconstructed image.
As shown in fig. 2, in the present embodiment the shallow feature extraction module (Shallow Feature Extraction) is the lightweight feature extraction network MobileNet. The high-quality image reconstruction module (HQ Image Reconstruction) consists of a convolution layer and a pixel reorganization (PixelShuffle) layer: the convolution layer uses a 1×1 convolution kernel to convert the low-resolution image into a multi-channel feature map, and PixelShuffle reorganizes the multi-channel feature map of shape (N×(C×r²)×W×H) obtained by the convolution layer into a high-resolution image of shape (N×C×(W×r)×(H×r)); during this process the model continuously adjusts the reorganization rule through its weights to optimize the quality of the reconstructed image. The deep feature extraction module (RSTB) is composed of six residual SwinT blocks (Residual Swin Transformer Block) and one convolution layer; each residual SwinT block is composed of six SwinT layers (Swin Transformer Layer) and one convolution layer; and each SwinT layer consists of a multi-head self-attention layer and a fully connected layer (multi-layer perceptron) interleaved with two normalization layers. Each block has a residual connection: X = F(X) + X, where F denotes the module described above.
The multi-head self-attention layer (multi-head self-attention) can be calculated by the formula:
Attention(Q, K, V) = Softmax(QK^T/√d)·V
where Q = X·P_Q, K = X·P_K, V = X·P_V, the matrices P_Q, P_K, P_V are learnable parameters, and d is the dimension of the key vectors. This computation is performed several times in parallel, which constitutes multi-head self-attention.
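The following PyTorch sketch implements one such attention head with learnable projections P_Q, P_K and P_V. The embedding and head dimensions are illustrative assumptions, and a full multi-head layer would run several heads in parallel and concatenate their outputs.

```python
import math
import torch
import torch.nn as nn

class SingleHeadSelfAttention(nn.Module):
    """One attention head: Q = X·P_Q, K = X·P_K, V = X·P_V,
    Attention(Q, K, V) = Softmax(Q·K^T / sqrt(d))·V."""
    def __init__(self, dim: int, head_dim: int):
        super().__init__()
        self.p_q = nn.Linear(dim, head_dim, bias=False)   # learnable P_Q
        self.p_k = nn.Linear(dim, head_dim, bias=False)   # learnable P_K
        self.p_v = nn.Linear(dim, head_dim, bias=False)   # learnable P_V
        self.scale = 1.0 / math.sqrt(head_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, tokens, dim)
        q, k, v = self.p_q(x), self.p_k(x), self.p_v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

head = SingleHeadSelfAttention(dim=96, head_dim=32)
print(head(torch.randn(1, 49, 96)).shape)   # torch.Size([1, 49, 32]) for a 7x7 window of tokens
```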
Step S4: the improved RT-DETR network model is built as shown in fig. 3, and includes four parts, namely a Backbone network (BackBone), encoders (EFFICIENT HYBRID ENCODER), ioU-Aware Query Selection and decoders (decoders & heads), the high resolution image obtained in S3 is first uniformly cut into four parts, the four parts are taken as input to the Backbone network (Backbone), the feature maps S3, S4, S5 of the last 3 stages are taken as input to the encoder (EFFICIENT HYBRID ENCODER), then IoU-Aware query selection is used to select a fixed number of image features from the encoder output sequence, as initial target query of the decoders, and finally the Decoder with auxiliary pre-header (decoders & heads) iteratively optimizes the object query to generate the frame and confidence score.
The backbone network (BackBone) is still a lightweight feature extraction network MobileNet.
The encoder (Efficient Hybrid Encoder) consists of two parts, intra-scale feature interaction (AIFI) and the cross-scale feature fusion module (CCFM), and converts the multi-scale features into an image feature sequence. The intra-scale feature interaction (AIFI) module adopts only a single standard Transformer encoder layer, comprising standard MHSA (or Deformable Attention) and FFN: the two-dimensional S5 features are flattened into vectors, processed by the AIFI module, and the output is reshaped back to two dimensions and denoted F5 for the subsequent cross-scale feature fusion (CCFM). The cross-scale feature fusion module (CCFM) consists of three fusion layers and three convolution layers; the features of the three scales are then concatenated using a multi-scale Transformer Encoder (MSE) and processed by the subsequent network.
The Fusion layer (Fusion) in the cross-scale feature fusion module (CCFM) consists of two 1×1 convolution layers, and the feature maps obtained after convolution are added element-wise to obtain the fused feature map.
IoU-Aware Query Selection in RT-DETR differs from DETR in that, during training, the model is constrained to produce high classification scores for features with high IoU scores and low classification scores for features with low IoU scores. Thus the prediction boxes corresponding to the Top-K encoder features selected by the model according to the classification score have both high classification scores and high IoU scores, and the loss function is restated as:
L(ŷ, y) = L_box(b̂, b) + L_cls(ĉ, c, IoU)
that is, the IoU between the predicted box b̂ and the true box b is introduced into the classification loss.
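At inference time this selection reduces to a Top-K over the (IoU-aware) classification scores, as in the following sketch; the feature dimension, the number of candidate features and k = 300 are illustrative assumptions rather than values fixed by the patent.

```python
import torch

def select_topk_queries(encoder_features: torch.Tensor,
                        class_scores: torch.Tensor,
                        k: int = 300) -> torch.Tensor:
    """Pick the Top-K encoder features by their (IoU-aware) classification score and use
    them as initial decoder queries. Shapes: encoder_features (B, N, D), class_scores (B, N, C)."""
    top_scores = class_scores.max(dim=-1).values               # best class score per feature
    topk = top_scores.topk(k, dim=1).indices                   # indices of the K best features
    batch_idx = torch.arange(encoder_features.size(0)).unsqueeze(-1)
    return encoder_features[batch_idx, topk]                   # (B, K, D) initial object queries

feats = torch.randn(2, 1200, 256)
scores = torch.rand(2, 1200, 5)
print(select_topk_queries(feats, scores, k=300).shape)         # torch.Size([2, 300, 256])
```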
When training the neural network, the network is first trained on the fake sample data generated by the GAN network in steps S1-S2; then part of the parameters are frozen and the network is fine-tuned on real sample data; after the network converges, the network model is saved.
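The freeze-then-fine-tune stage can be sketched as follows, with a toy two-part model standing in for the improved RT-DETR. Which parameters are frozen (here, the backbone part) and the optimizer settings are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Toy detector standing in for the improved RT-DETR: a "backbone" followed by a "head".
model = nn.Sequential(
    nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),   # index 0: backbone part
    nn.Sequential(nn.Conv2d(16, 5, kernel_size=1)),              # index 1: prediction head
)

# Stage 1 (conceptually): train the whole model on GAN-generated fake samples.
# Stage 2: freeze part of the parameters (here, the backbone) and fine-tune on real samples.
for param in model[0].parameters():
    param.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
print(sum(p.numel() for p in model.parameters() if p.requires_grad), "trainable parameters")
```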
Step S5: transmitting the chip image to be detected into the trained model to obtain all defect positions in the image, together with the defect type corresponding to each position.
In this embodiment, the overall network structure based on SwinIR and RT-DETR fusion improvement is shown in fig. 4. The problems of a small data set and unbalanced samples are effectively alleviated by image enhancement and GAN-based data expansion. Fusing the SwinIR super-resolution reconstruction network model with the target detection model RT-DETR solves the problems of poor detection of tiny chip defects and the low detection speed of the DETR target detection model.
Existing target detection models achieve very high precision in various detection tasks, but their detection of some tiny targets is still unsatisfactory, especially for tiny defects on small chips. To alleviate this problem, the super-resolution reconstruction network model is fused with the target detection model: before defect detection, super-resolution reconstruction not only makes the image clearer but also enlarges the defects, so that the target detection model can recognize them better.
The scarcity of data sets is one of the difficult problems in the field of chip defect detection, because the size and quality of the data set directly affect the detection accuracy of the model. The invention therefore adopts a GAN network and image enhancement methods to enlarge the data set and increase its diversity. The network is trained with a "large-data pre-training, small-data fine-tuning" strategy: the network is first trained on fake data samples generated by the GAN network, then part of the parameters are frozen and the network is fine-tuned on real data to obtain the final network model.
This embodiment provides a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement. The data set is first enlarged using a GAN network and image enhancement; the chip image is then super-resolution reconstructed with the super-resolution network SwinIR; the resulting high-resolution image is cut evenly into four parts that are fed separately into the improved RT-DETR for chip defect detection; finally the four images are merged and the cutting is reversed to obtain the positions and types of the defects in the original chip image. The invention ultimately achieves low latency, high precision and simultaneous detection of multiple targets.
While the preferred embodiments of the present disclosure have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the disclosure.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present disclosure without departing from the spirit or scope of the disclosure. Thus, the present disclosure is intended to include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical aspects of the present disclosure and not for limiting the scope thereof, and although the present disclosure has been described in detail with reference to the above embodiments, it should be understood by those of ordinary skill in the art that: various alterations, modifications, and equivalents may be suggested to the specific embodiments of the invention, which would occur to persons skilled in the art upon reading the disclosure, are intended to be within the scope of the appended claims.

Claims (10)

1. A chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement, the method comprising:
s1: collecting image information of a chip, marking defect positions and defect types in the image information, and constructing a data set according to the image information;
s2: expanding the data set by adopting a GAN network to obtain an expanded data set;
s3: constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
s4: constructing an improved RT-DETR network model, and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
s5: and detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining a detection result.
2. The chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to claim 1, wherein marking defect positions and defect categories in the image information in step S1 comprises:
Marking the defect position by using a rectangular frame and recording parameters of the rectangular frame;
marking the defect categories, which comprise five classes: over-long, over-short, offset, missing, and scratch.
3. The method for chip multi-defect detection based on SwinIR and RT-DETR fusion improvement according to claim 1, wherein constructing the SwinIR network model in step S3 comprises:
The SwinIR network model is constructed by a shallow layer feature extraction module, a deep layer feature extraction module and a high-quality image reconstruction module;
The shallow feature extraction module is a lightweight feature extraction network MobileNet;
the deep feature extraction module consists of a plurality of residual Swin Transformer blocks and convolution blocks;
The high-quality image reconstruction module consists of a convolution layer and a pixel reorganization PixelShuffle layer.
4. The chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to claim 3, wherein in the high-quality image reconstruction module the convolution layer converts the low-resolution image into a multi-channel feature map using a 1×1 convolution kernel, and the pixel reorganization layer reorganizes the multi-channel feature map of shape (N×(C×r²)×W×H) obtained by the convolution layer into a high-resolution image of shape (N×C×(W×r)×(H×r)).
5. The method for chip multi-defect detection based on SwinIR and RT-DETR fusion improvement according to claim 1, wherein the improved RT-DETR network model in step S4 comprises four parts: a backbone network BackBone, an encoder Efficient Hybrid Encoder, an IoU-aware query selection layer, and a Decoder & Head;
The backbone network BackBone is a lightweight feature extraction network MobileNet;
The encoder Efficient Hybrid Encoder consists of an intra-scale feature interaction (AIFI) module and a cross-scale feature fusion module (CCFM), wherein the AIFI module adopts a single standard Transformer encoder layer, comprising standard MHSA and FFN;
the cross-scale feature fusion module CCFM consists of three fusion layers and three convolution layers.
6. The improved chip multi-defect detection method based on SwinIR and RT-DETR fusion of claim 5, wherein the fusion layer in the cross-scale feature fusion module CCFM consists of two 1 x 1 convolution layers.
7. The improved chip multi-defect detection method based on SwinIR and RT-DETR fusion of claim 1, wherein the loss function of the improved RT-DETR network model is:
L(ŷ, y) = L_box(b̂, b) + L_cls(ĉ, c)
wherein L(ŷ, y) is the total loss function, ŷ is the predicted value, y is the true value, L_box is the bounding-box regression loss function, b̂ is the predicted bounding-box coordinates, b is the true bounding-box coordinates, L_cls is the classification loss function, ĉ is the predicted category, and c is the true category.
8. A chip multi-defect detection system based on SwinIR and RT-DETR fusion improvement, the system comprising:
the data set construction unit is used for collecting image information of the chip, labeling defect positions and defect types in the image information and constructing a data set according to the image information;
The expansion unit is used for expanding the data set by adopting the GAN network to obtain an expanded data set;
the super-resolution data set construction unit is used for constructing a SwinIR network model to carry out super-resolution reconstruction on the data in the expanded data set, and constructing a super-resolution data set according to the reconstructed data;
The training unit is used for constructing an improved RT-DETR network model and training the RT-DETR network model by adopting the expanded data set and the super-resolution data set;
and the detection unit is used for detecting and classifying the images to be detected according to the trained RT-DETR network model, and obtaining detection results.
9. A computer readable storage medium for storing a computer program for performing a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement according to any one of claims 1-7.
10. A computer device, characterized by: comprising a memory and a processor, said memory having stored therein a computer program, which when executed by said processor performs a chip multi-defect detection method based on SwinIR and RT-DETR fusion improvement as claimed in any one of claims 1-7.
CN202410104874.5A 2024-01-25 2024-01-25 Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement Pending CN117934420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410104874.5A CN117934420A (en) 2024-01-25 2024-01-25 Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410104874.5A CN117934420A (en) 2024-01-25 2024-01-25 Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement

Publications (1)

Publication Number Publication Date
CN117934420A true CN117934420A (en) 2024-04-26

Family

ID=90766135

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410104874.5A Pending CN117934420A (en) 2024-01-25 2024-01-25 Chip multi-defect detection method and system based on SwinIR and RT-DETR fusion improvement

Country Status (1)

Country Link
CN (1) CN117934420A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination