CN116309429A - A Chip Defect Detection Method Based on Deep Learning - Google Patents

Info

Publication number
CN116309429A
Authority
CN
China
Prior art keywords
convolution
feature
chip
network
resnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310240534.0A
Other languages
Chinese (zh)
Inventor
郭永安
齐帅
余德泉
孙洪波
张申
龚雪亮
何清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Youxin Technology Co ltd
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Youxin Technology Co ltd
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Youxin Technology Co ltd, Nanjing University of Posts and Telecommunications filed Critical Nanjing Youxin Technology Co ltd
Priority to CN202310240534.0A priority Critical patent/CN116309429A/en
Publication of CN116309429A publication Critical patent/CN116309429A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a chip defect detection method based on deep learning, comprising the following steps: collecting images of different types of chip surface defects with a high-speed industrial camera; labeling the position and type of each chip surface defect with an image annotation tool, constructing a defect dataset, and dividing the defect dataset into a training dataset D and a test dataset T; and, based on an improved Faster R-CNN framework, building a deep learning network model for chip defect detection. The method improves the Faster R-CNN algorithm through a deformable-convolution reconstruction of the feature extraction network, multi-scale feature fusion, region-of-interest alignment (ROI Align), and soft non-maximum suppression, addressing the small size and complex shapes of chip defects and improving the accuracy of defect detection.

Description

A Chip Defect Detection Method Based on Deep Learning

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a chip defect detection method based on deep learning.

Background

After a chip is powered on, it generates start-up instructions to transmit signals and data, and it can interconnect household appliances intelligently; chips are the core foundation of high-end manufacturing. However, the chip manufacturing process is very complex, and structures that deviate from expectations easily arise during production, causing defects. In recent years, deep learning has developed rapidly, and owing to its strong learning and feature extraction abilities on large amounts of data, many researchers have applied deep learning to product defect detection, greatly improving detection efficiency. Small-target detection, however, remains one of the difficulties in surface defect detection for industrial products, so improving the defect detection rate for small targets such as chip defects is an urgent problem.

Summary of the Invention

The purpose of the present invention is to address the deficiencies of the prior art by providing a chip defect detection method based on deep learning that solves the problems raised in the background section.

To achieve the above object, the present invention provides the following technical solution, comprising the steps of:

Step 1: collect images of different types of chip surface defects with a high-speed industrial camera;

Step 2: label the position and type of chip surface defects with an image annotation tool, construct a defect dataset, and divide the defect dataset into a training dataset D and a test dataset T;
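A minimal sketch of the dataset split in Step 2. The record layout (image path, defect boxes, defect type labels), the file names, and the 80/20 ratio are illustrative assumptions; the patent does not specify them.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Shuffle annotated samples and split into training set D and test set T."""
    rng = random.Random(seed)           # fixed seed for a reproducible split
    shuffled = samples[:]               # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical annotation records: (image file, defect boxes, defect type labels).
samples = [(f"chip_{i:04d}.png", ((10, 10, 50, 50),), ("scratch",))
           for i in range(100)]
D, T = split_dataset(samples)
print(len(D), len(T))  # 80 20
```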

Step 3: based on the improved Faster R-CNN framework, build a deep learning network model for chip defect detection:

1) Select the feature extraction network. Classic feature extraction networks include VggNet, ResNet, and GoogleNet. The advantage of the ResNet network lies in the residual units of its structure, which alleviate the vanishing-gradient problem and raise network accuracy, so ResNet-50 is selected as the feature extraction network. The ResNet-50 network contains 49 convolutional layers and one fully connected layer, and its structure can be divided into seven parts: the first part performs convolution, normalization, activation, and max pooling on the input data and contains no residual blocks, while the second, third, fourth, and fifth parts all contain residual blocks. Each residual block has three convolutional layers, giving 49 convolutional layers in total, which together with the final fully connected layer makes 50 layers;
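The 49 + 1 layer count can be checked from the standard ResNet-50 stage layout (3, 4, 6, 3 bottleneck blocks of three convolutions each, plus the stem convolution) — an assumption consistent with, though more specific than, the text above:

```python
# ResNet-50 stage layout: stem conv, then four residual stages with
# (3, 4, 6, 3) bottleneck blocks of 3 conv layers each, then one fc layer.
blocks_per_stage = [3, 4, 6, 3]
conv_layers = 1 + sum(3 * b for b in blocks_per_stage)  # stem + bottleneck convs
total_layers = conv_layers + 1                          # + final fully connected layer
print(conv_layers, total_layers)  # 49 50
```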

2) Reconstruct the feature extraction network with deformable convolution. In a deformable convolution, each element of the convolution kernel carries an additional direction (offset) parameter, so the kernel's sampling locations can spread over a large range during training. Chip surface defects vary in shape and have no fixed geometry; therefore, for ResNet-50, which adapts poorly to unknown variation and generalizes weakly, the idea of deformable convolution is introduced to improve the neural network's ability to recognize irregular targets. The deformable convolution is implemented as follows:

Sub-step 1: resize the collected chip images to the input size and apply preprocessing.

Sub-step 2: extract a feature map from the input image with a conventional convolution kernel.

Sub-step 3: obtain the deformation offsets of the deformable convolution by taking the obtained feature map as input and applying an additional convolution layer to it.

Sub-step 4: the offset layer has 2N channels (2 for the x and y directions, N for the convolution kernel size). Since only translation in the two-dimensional plane is required, only the x and y values need to change, where x and y represent the offsets of the image pixels in the x and y directions, respectively.

During training, the convolution kernel that generates the output features and the convolution kernel that generates the offsets are learned jointly.
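The sampling step of the sub-steps above can be sketched at a single output position. Note an interpretation assumption: the text reads N as the kernel size, while common deformable-convolution implementations predict 2·k·k offset channels for a k×k kernel; the sketch follows the latter convention and is illustrative, not the patent's implementation.

```python
import numpy as np

def bilinear(img, y, x):
    """Sample a single-channel image at fractional (y, x) by bilinear interpolation."""
    h, w = img.shape
    y0 = min(max(int(np.floor(y)), 0), h - 1)
    x0 = min(max(int(np.floor(x)), 0), w - 1)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * img[y0, x0] + (1 - wy) * wx * img[y0, x1]
            + wy * (1 - wx) * img[y1, x0] + wy * wx * img[y1, x1])

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """One output position of a 3x3 deformable convolution.
    offsets has shape (3, 3, 2): a learned (dy, dx) per kernel element.
    A real offset branch predicts 2*k*k such channels ("2N") at every position."""
    k = kernel.shape[0]
    out = 0.0
    for i in range(k):
        for j in range(k):
            dy, dx = offsets[i, j]
            # shift each kernel sampling point by its learned offset
            out += kernel[i, j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                 # averaging kernel
zero_offsets = np.zeros((3, 3, 2))
# With all offsets zero this reduces to an ordinary 3x3 convolution:
print(deformable_conv_at(img, kernel, zero_offsets, 2, 2))  # ≈ 12.0
```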

3) Multi-scale feature fusion. Since chip defect detection is inherently a small-target detection task, a feature pyramid network (FPN) is introduced for feature fusion. Feature fusion merges deep semantic information into shallow feature maps, exploiting the rich semantics of deep features together with the detail of shallow features. The FPN is introduced into Faster R-CNN, integrating the features of all levels into the feature pyramid, with the feature activations of the residual blocks of the second through fifth parts of the ResNet-50 network as input. A 1x1 convolution reduces the channel count of C2-C5 to 256, yielding M2-M5 (taking C2 and M2 as examples, C2 denotes the feature matrix of the series of residual structures corresponding to Conv2, and M2 denotes the feature matrix obtained from C2 by the 1x1 convolution). Shallow and deep feature maps of the same size are added after upsampling to obtain the FPN outputs P2-P5, to which a 3x3 convolution is then applied. Feature map P6 is obtained by max-pooling downsampling of the FPN output P5, producing the multi-scale fused feature set. The fused features are used in the RPN to generate target candidate boxes, and detection results are obtained through classification;
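A shape-level sketch of the fusion just described, assuming a 256x256 input and the standard ResNet-50 channel widths (256/512/1024/2048 for C2-C5). The learned 1x1 lateral convolutions are stood in for by random projections (shapes only), and the 3x3 smoothing convolution is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, out_ch):
    """Stand-in for a learned 1x1 lateral conv: a random channel projection."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.tensordot(w, x, axes=(1, 0))      # (out_ch, H, W)

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """Stride-2 subsampling, standing in for the pooling that produces P6."""
    return x[:, ::2, ::2]

# Backbone outputs C2..C5 for a 256x256 input, as (channels, height, width).
C = {2: rng.standard_normal((256, 64, 64)),
     3: rng.standard_normal((512, 32, 32)),
     4: rng.standard_normal((1024, 16, 16)),
     5: rng.standard_normal((2048, 8, 8))}

M = {5: conv1x1(C[5], 256)}
for lvl in (4, 3, 2):                 # top-down: upsample deeper map, add lateral
    M[lvl] = conv1x1(C[lvl], 256) + upsample2x(M[lvl + 1])

P = dict(M)                           # a real FPN smooths each level with a 3x3 conv
P[6] = downsample2x(P[5])
print({lvl: P[lvl].shape for lvl in sorted(P)})
```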

4) Region-of-interest alignment (ROI Align). ROI Align differs from ROI pooling in that, instead of simply quantizing and then pooling, it uses a region feature aggregation method that makes the operation continuous. The region-of-interest alignment is implemented in the following steps:

Sub-step 1: iterate over all candidate regions, keeping the mapped floating-point coordinates of each candidate region unquantized.

Sub-step 2: divide each candidate region into z×z cells, again without quantizing the cells.

Sub-step 3: determine the sampling points in each cell, compute their values at floating-point coordinates with bilinear interpolation, and obtain a fixed-dimension ROI output.
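The three sub-steps above can be sketched as follows. The box format, z = 2, the per-cell sample count, and the averaging of samples are illustrative choices, not specified by the patent:

```python
import numpy as np

def bilinear(fm, y, x):
    """Sample a feature map at fractional (y, x) by bilinear interpolation."""
    h, w = fm.shape
    y0 = min(max(int(np.floor(y)), 0), h - 1)
    x0 = min(max(int(np.floor(x)), 0), w - 1)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * fm[y0, x0] + (1 - wy) * wx * fm[y0, x1]
            + wy * (1 - wx) * fm[y1, x0] + wy * wx * fm[y1, x1])

def roi_align(fm, roi, z=2, samples=2):
    """ROI Align: keep float ROI coordinates, split the ROI into z x z cells,
    and average `samples` x `samples` bilinear samples per cell."""
    y1, x1, y2, x2 = roi                       # floating-point, never rounded
    ch, cw = (y2 - y1) / z, (x2 - x1) / z      # cell height and width
    out = np.empty((z, z))
    for i in range(z):
        for j in range(z):
            vals = [bilinear(fm,
                             y1 + (i + (a + 0.5) / samples) * ch,
                             x1 + (j + (b + 0.5) / samples) * cw)
                    for a in range(samples) for b in range(samples)]
            out[i, j] = np.mean(vals)
    return out

fm = np.arange(36, dtype=float).reshape(6, 6)   # fm[y, x] = 6*y + x
print(roi_align(fm, (0.5, 0.5, 4.5, 4.5)))      # [[10.5 12.5] [22.5 24.5]]
```

Because the test map is linear in y and x, bilinear sampling is exact and each cell's average equals the value at the cell centre.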

5) Soft non-maximum suppression (Soft-NMS). Soft non-maximum suppression re-scores all candidate boxes: lower-quality boxes are retained but their scores are suppressed, so Soft-NMS modifies the box scores rather than discarding boxes outright and thus preserves a higher recall. Soft-NMS can be expressed by the following formula:

    Si = Si,                     if IoU(M, bi) < Nt
    Si = Si · (1 − IoU(M, bi)),  if IoU(M, bi) ≥ Nt

As a preferred technical solution of the present invention, in the Soft-NMS formula Si is the score of detection box i, M is the highest-scoring box, bi is the set of detection boxes, and Nt is the configured threshold. Standard NMS sets the score of any detection box whose IoU exceeds the threshold to 0, whereas Soft-NMS only decays such scores, which alleviates missed and false detections. The Soft-NMS IoU threshold is 0.5 and the minimum score is 0.05.
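The linear-decay Soft-NMS described above translates into the following minimal sketch; the Nt = 0.5 threshold and 0.05 score floor come from the text, while the box format and the example detections are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, Nt=0.5, score_min=0.05):
    """Linear Soft-NMS: decay (rather than zero) the scores of boxes
    whose IoU with the current best box M exceeds Nt."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        m = int(np.argmax(scores))
        M, sM = boxes.pop(m), scores.pop(m)
        keep.append((M, sM))
        for i, b in enumerate(boxes):
            o = iou(M, b)
            if o > Nt:
                scores[i] *= (1.0 - o)   # linear decay instead of hard removal
        # drop boxes whose decayed score fell below the floor
        boxes = [b for b, s in zip(boxes, scores) if s >= score_min]
        scores = [s for s in scores if s >= score_min]
    return keep

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)]
kept = soft_nms([b for b, _ in dets], [s for _, s in dets])
print([round(s, 3) for _, s in kept])  # [0.9, 0.7, 0.255]
```

The overlapping second box survives with a decayed score instead of being removed, which is exactly how Soft-NMS preserves recall.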

Compared with the prior art, the present invention provides a chip defect detection method based on deep learning with the following beneficial effects:

Aimed at the characteristics and requirements of chip defect detection, the invention takes Faster R-CNN, composed of a feature extraction network, a region proposal network, and a detection network, as the basic framework, and proposes a deep-learning-based chip defect detection algorithm. The algorithm improves Faster R-CNN through a deformable-convolution reconstruction of the feature extraction network, multi-scale feature fusion, region-of-interest alignment (ROI Align), and soft non-maximum suppression, solving the problems of small defect size and complex defect shape and improving the accuracy of defect detection.

Brief Description of the Drawings

Fig. 1 is a structural diagram of the ResNet-50 network of the present invention;

Fig. 2 is a schematic diagram of the deformable convolution of the present invention;

Fig. 3 is a structural diagram of the multi-scale feature fusion network of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Referring to Figs. 1-3, this embodiment comprises steps one through three exactly as set out above: image acquisition with a high-speed industrial camera, construction and splitting of the annotated defect dataset, and construction of the improved Faster R-CNN detection model with the deformable-convolution feature extraction network, multi-scale feature fusion, ROI Align, and Soft-NMS, with the Soft-NMS IoU threshold set to 0.5 and the minimum score to 0.05.

Finally, it should be noted that the above is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiment or replace some of their technical features with equivalents. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (2)

1. A chip defect detection method based on deep learning is characterized in that: the method comprises the following steps:
step one: collecting different types of chip surface defect images through a high-speed industrial camera;
step two: marking positions and types of image defects on the surface of the chip through an image marking tool, constructing a defect data set, and dividing the defect data set into a training data set D and a test data set T;
step three: based on the improved Faster R-CNN framework, a deep learning network model for chip defect detection is constructed:
1) The characteristic extraction network is selected, the classical characteristic extraction network comprises VggNet, resNet, googleNet and the like, wherein the ResNet network has the advantages of residual units of the structure, the gradient disappearance problem is solved, the accuracy of the network is improved, and the ResNet-50 is selected as the characteristic extraction network. The ResNet-50 network comprises 49 convolution layers and one full connection layer, and the ResNet-50 network structure can be divided into seven parts, wherein the first part is mainly responsible for calculating input data convolution, regularization, activation functions and maximization pools. Wherein the first part does not contain residual blocks, and the second, third, fourth and fifth part structures all contain residual blocks. In the ResNet-50 network structure, the residual blocks all have three convolutions, and the total of 49 convolution layers in the network and the total of the last full connection layer is 50 layers;
2) The deformable convolution reconstructing the feature extraction network means that a convolution kernel is additionally added with a parameter direction parameter on each element, so that the convolution kernel can be expanded to a large range in the training process. There is no fixed geometry due to the different shape of the chip surface defects. Therefore, the concept of deformable convolution is introduced aiming at ResNet-50 with poor unknown change capability and weak generalization capability, and the recognition capability of the neural network on irregular targets is improved. The realization process of the deformable convolution is as follows:
sub-step 1: and adjusting the input size of the acquired chip image and performing preprocessing operation.
Sub-step 2: from the input image, a feature map is extracted using a conventional convolution kernel.
Sub-step 3: the offset of the deformation resulting in the deformable convolution: and taking the obtained characteristic diagram as input, and adding a convolution layer to the characteristic diagram.
Sub-step 4: the offset layer is 2N (where 2 refers to both x and y directions and N refers to the convolution kernel size) and only the x and y values, representing the offsets in the x and y directions of the image data pixels, respectively, need to be changed since translation is required in the two-dimensional plane.
Wherein during training, the convolution kernels used to generate the output features and the convolution kernels used to generate the offsets are synchronously learned.
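The sub-steps above can be sketched as follows; bilinear interpolation is the mechanism that lets the kernel read feature values at fractionally offset positions. The offsets here are fixed stand-ins for the values that the added convolution layer would learn:

```python
import numpy as np

def bilinear_sample(fmap, y, x):
    """Sample fmap at floating-point (y, x) via bilinear interpolation."""
    h, w = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (fmap[y0, x0] * (1 - dy) * (1 - dx) +
            fmap[y0, x1] * (1 - dy) * dx +
            fmap[y1, x0] * dy * (1 - dx) +
            fmap[y1, x1] * dy * dx)

k = 3                      # kernel size
N = k * k                  # sampling points per kernel
offset_channels = 2 * N    # one x and one y offset per point -> 18 for 3x3

# One deformable-convolution output position: the k*k regular grid
# around (cy, cx) is displaced by the offsets before sampling.
fmap = np.arange(25, dtype=float).reshape(5, 5)
weights = np.ones((k, k)) / N        # stand-in for learned kernel weights
offsets = np.full((N, 2), 0.5)       # stand-in for learned offsets
cy, cx = 2, 2
out = 0.0
for i, (dy_k, dx_k) in enumerate((a, b) for a in (-1, 0, 1) for b in (-1, 0, 1)):
    oy, ox = offsets[i]
    out += weights.flat[i] * bilinear_sample(fmap, cy + dy_k + oy, cx + dx_k + ox)
print(offset_channels, out)
```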
3) Multi-scale feature fusion. A feature pyramid network (FPN) is introduced because chip defect detection is characteristically a small-target detection problem. Feature fusion merges deep semantic information into the shallow feature maps, enriching their semantics with the deep features while retaining the spatial detail of the shallow features. The feature pyramid FPN is introduced into Faster R-CNN, and the features of all layers are integrated into a feature pyramid, taking the feature activation outputs of the second, third, fourth and fifth residual stages of the ResNet-50 network as input. The number of channels of C2-C5 is reduced to 256 by 1x1 convolution, yielding M2-M5 (taking C2 and M2 as examples: C2 denotes the feature matrix of the series of residual structures corresponding to Conv2, and M2 the feature matrix obtained from C2 after the 1x1 convolution). Through upsampling, each shallow feature map is added to the deep feature map of the same size to obtain the FPN outputs P2-P5, to which a 3x3 convolution is then applied. The feature map P6 is obtained by stride-2 max-pooling downsampling of the FPN output P5, producing the multi-scale fused feature combination. Target candidate boxes are generated from the fused features in the RPN, and the detection result is obtained through classification;
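A minimal sketch of the top-down fusion described above, with single-channel arrays standing in for the 256-channel maps, nearest-neighbour upsampling standing in for the FPN upsample, and the 1x1 and 3x3 convolutions omitted (their stand-ins are labelled):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling (stand-in for FPN's upsample)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# Stand-ins for C2..C5: backbone maps at strides 4, 8, 16, 32 of a 64x64 input
sizes = [16, 8, 4, 2]
C = [np.random.rand(s, s) for s in sizes]   # channel dimension omitted
M = [c.copy() for c in C]                   # after the 1x1 conv (simulated)

# Top-down pathway: start from M5, then upsample and add each lateral map
P = [None] * 4
P[3] = M[3]                                 # P5
for i in (2, 1, 0):                         # P4, P3, P2
    P[i] = M[i] + upsample2x(P[i + 1])
# P6: stride-2 downsampling of P5 (simple subsample stand-in for max pooling)
P6 = P[3][::2, ::2]

print([p.shape for p in P], P6.shape)
```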
4) Region of interest alignment (ROI Align). ROI Align differs from ROI pooling in that it does not rely on simple quantization before pooling; instead it uses a region feature aggregation method (bilinear interpolation) to convert the operation into a continuous one. The region of interest alignment is implemented as follows:
sub-step 1: all candidate regions are accessed in a loop and the mapped candidate region floating point coordinates are kept unquantized.
Sub-step 2: the candidate region is divided into zxz cells and each cell is also not quantized.
Sub-step 3: floating point coordinates of the sampling points in each unit are determined, floating point coordinates of the sampling points are calculated using a bilinear interpolation method, and then ROI output of a fixed dimension can be obtained.
5) Soft non-maximum suppression (Soft NMS). Soft non-maximum suppression re-scores all candidate boxes: instead of discarding lower-scoring overlapping boxes outright, it retains them with decayed scores. The net effect is that Soft NMS modifies the box scores rather than deleting boxes, preserving a higher recall. Soft NMS can be expressed by the following formula:
s_i = \begin{cases} s_i, & \mathrm{IoU}(M, b_i) < N_t \\ s_i \left(1 - \mathrm{IoU}(M, b_i)\right), & \mathrm{IoU}(M, b_i) \geq N_t \end{cases}
2. The chip defect detection method based on deep learning as claimed in claim 1, wherein: in step three, s_i is the score of detection box i, M is the highest-scoring box, b_i is detection box i in the set of detection boxes, and N_t is the set threshold. A conventional NMS sets the score of any detection box whose IoU with M exceeds the threshold to 0, whereas Soft NMS only attenuates such scores, alleviating missed and false detections of targets. The IoU threshold of the Soft NMS is 0.5 and the minimum retained score is 0.05.
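A sketch of the linear-decay Soft NMS described in claim 2, using the hypothetical helper names iou and soft_nms, with the claim's N_t = 0.5 threshold and 0.05 minimum score as defaults:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, nt=0.5, score_min=0.05):
    """Linear Soft NMS: decay (rather than zero) the scores of boxes whose
    IoU with the current best box is >= nt; drop boxes below score_min."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        m = int(np.argmax(scores))
        best_box, best_score = boxes.pop(m), scores.pop(m)
        keep.append((best_box, best_score))
        for i, b in enumerate(boxes):
            o = iou(best_box, b)
            if o >= nt:
                scores[i] *= (1.0 - o)   # linear decay, not hard zeroing
        boxes = [b for b, s in zip(boxes, scores) if s >= score_min]
        scores = [s for s in scores if s >= score_min]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
result = soft_nms(boxes, scores, nt=0.5)
print(result)
```

The second box overlaps the first with IoU = 81/119 > 0.5, so its score is decayed to 0.8 * (1 - 81/119) instead of being removed, while the distant third box is untouched.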
CN202310240534.0A 2023-03-14 2023-03-14 A Chip Defect Detection Method Based on Deep Learning Pending CN116309429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310240534.0A CN116309429A (en) 2023-03-14 2023-03-14 A Chip Defect Detection Method Based on Deep Learning


Publications (1)

Publication Number Publication Date
CN116309429A true CN116309429A (en) 2023-06-23

Family

ID=86833799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310240534.0A Pending CN116309429A (en) 2023-03-14 2023-03-14 A Chip Defect Detection Method Based on Deep Learning

Country Status (1)

Country Link
CN (1) CN116309429A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237683A (en) * 2023-11-10 2023-12-15 闽都创新实验室 Chip defect intelligent detection system based on improved neural network
CN117237683B (en) * 2023-11-10 2024-03-26 闽都创新实验室 Intelligent chip defect detection system based on improved neural network
CN118261877A (en) * 2024-03-29 2024-06-28 泰山科技学院 Method for evaluating and maintaining service life of printed circuit electronic component
CN118261877B (en) * 2024-03-29 2024-10-11 泰山科技学院 Method for evaluating and maintaining service life of printed circuit electronic component
CN118840337A (en) * 2024-07-03 2024-10-25 江苏省特种设备安全监督检验研究院 Crane track defect identification method based on convolutional neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination