CN115527072A - Chip surface defect detection method based on sparse space perception and meta-learning - Google Patents
Chip surface defect detection method based on sparse space perception and meta-learning
- Publication number
- CN115527072A CN115527072A CN202211386361.5A CN202211386361A CN115527072A CN 115527072 A CN115527072 A CN 115527072A CN 202211386361 A CN202211386361 A CN 202211386361A CN 115527072 A CN115527072 A CN 115527072A
- Authority
- CN
- China
- Prior art keywords
- learning
- training
- network
- model
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30148—Semiconductor; IC; Wafer
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a chip surface defect detection method based on sparse space perception and meta-learning. First, data are acquired and image preprocessing is performed; second, the images are enhanced with a similarity contrastive learning enhancement network algorithm, and a transfer learning module is added before the enhanced image features are fed into the cross-transformed sparse spatial alignment network, so that the model can more easily identify fine-grained intra-class feature information and converges faster. Finally, the model is trained and tested with an N-way K-shot task scheme, ultimately detecting chip defects. The invention greatly reduces the amount of computation the model needs during learning, achieving a lightweight effect; introducing meta-learning improves the generalization ability of the model, and enhancing the neural network with a small data set lets it learn information outside the image label classes, improving the accuracy of chip surface defect detection.
Description
Technical Field
The invention relates to the technical field of chip surface defect detection, in particular to a chip surface defect detection method based on sparse space perception and meta-learning.
Background
Chips play an irreplaceable role in daily life, but a chip with surface defects directly affects the performance and service life of the electronic product that uses it. During chip production, the manufactured chips must be inspected for surface defects such as scratches, bump defects (raised, dislocated, or missing bumps), metallic contaminants, and etching-solution stain residues. As the manufacturing industry keeps advancing, the quality of the produced chips becomes better and better, so only a very limited data set of chips with surface defects can be collected.
Few-shot (small-sample) learning is designed specifically for the problem of limited data sets, but a few-shot network model has difficulty extracting the feature information of surface defects while training a defect detection task, and with a single network structure important feature information of the surface defects can be lost, so the defect features in new samples are not learned; second, complex models often require large amounts of computation and time. Aiming at these two difficulties, this patent proposes combining a cross-transformation-based sparse space with meta-learning, so that the model is lightweight and runs faster, in order to complete defect detection on chips with surface defects.
Chip surface defect detection is one of the key links in the production line, in which common surface defect features are used to classify chip surface defects accurately. Commonly used surface defect detection methods include the following.

Traditional (manual) classification: for the problem of deciding whether a product surface is defective, the traditional approach is to inspect the product surface manually. Some defects cause visual fatigue, inspection is easily disturbed by external factors, and detection efficiency is hard to guarantee, leading to many misjudgments. Manual inspection is convenient to set up, but it suffers from low efficiency, non-uniform standards, high cost, and similar drawbacks.

Machine learning classification: the defect types on the chip surface are classified mainly from the different defect features reflected in the input defect image information. These surface defect detection algorithms mainly combine feature selection based on hand-designed features with a pattern recognition classification algorithm; traditional classification algorithms for image classification include the K-nearest-neighbor algorithm, the BP neural network algorithm, the Bayesian algorithm, and so on.

Deep learning classification: the successful application of deep learning models, represented by convolutional neural networks, in computer vision provides a new direction for defect detection. Classification by deep learning can achieve remarkable results and discover important intrinsic feature information of chip surface defect images. In recent years, with the rapid development of computer technology and artificial intelligence, deep learning has been applied widely, touching many aspects of daily life, and it also achieves surprising accuracy in industrial product classification. Deep-learning object detection network models currently fall into two groups, depending on whether candidate regions are extracted independently. Models that do not extract candidate regions independently generate target boxes directly for detection; they are fast but less accurate, with representative algorithms such as the YOLO series and SSD. Models that extract candidate regions independently first generate candidate boxes and then select the potential targets among them as the final candidates; they are more accurate but slower, with representative algorithms such as R-CNN and Faster R-CNN. All of the network models above require large data sets and are difficult to apply in product manufacturing.
Commonly used deep learning network models include: LeNet, AlexNet, the VGG series, the ResNet series, the Inception series, the DenseNet series, GoogLeNet, NASNet, Xception, and SENet.

Lightweight network models in deep learning include: MobileNet v1/v2, ShuffleNet v1/v2, and SqueezeNet.

The lightweight models above are used more often in practical projects, and they have different advantages and disadvantages:

Advantages: (1) the parameter model is small and convenient to deploy; (2) the amount of computation is small and the speed is high.

Disadvantages: (1) in accuracy, the lightweight models do not match the ResNet series, Inception series, DenseNet series, or SENet.

All of the computer-based classification algorithms above require a large number of samples to obtain high classification accuracy.

Few-shot (small-sample) learning: the model is trained through a large number of tasks, each composed of pictures randomly selected from the chip data set and all different from one another, so that the model can then learn quickly from a small number of new, unseen chip samples. Meta-learning, i.e. learning how to learn, is currently mainly used to solve the few-shot learning problem.
The main directions of meta-learning in recent years are:

1: Metric-based meta-learning: the trained model does not need to be adjusted for the test task, but when the classes of the test and training task sets differ greatly, the effect is poor.

2: Model-based meta-learning: thanks to the flexibility of its internal dynamics, it is more widely applicable than most metric-based meta-learning; however, its performance on many supervised tasks is lower than metric learning, the effect worsens as the amount of data increases, and when the class differences between tasks are large its results are inferior to optimization-based meta-learning.

3: Optimization-based meta-learning: compared with model-based meta-learning, it can obtain better performance when tasks are more widely distributed, but optimization-based techniques must learn a base learner for each task optimization, which makes the computation expensive.
An existing method applies a spatial alignment network based on cross transformation to small samples: data enhancement gives the small samples richer intra-class and out-of-class information, which yields better generalization on new tasks, and the enhanced samples are then passed into the cross-transformed spatial alignment network, which has good classification accuracy.

For the manual classification method: efficiency is low, labor intensity is high, and the human factor is the largest source of error.

For the machine learning classification method: its drawback is that, for defective chips, a classification algorithm must be designed according to different prior knowledge, and features must be extracted and selected specifically for the characteristics of the chip surface defects, so the robustness of the algorithm is low and classification under complex tasks is difficult to complete. Meanwhile, classification algorithms based on machine learning place high requirements on the images: all images need a uniform background, and the characteristic part must be at a fixed position in a normal image. For chips of different sizes, the captured backgrounds differ and the defect positions on the chip surface differ, so the classification accuracy is relatively low; in addition, a machine learning classification algorithm alone generally has difficulty obtaining good image feature information for detecting whether the chip surface has defects, and the classification result is easily disturbed by the material and other factors. Traditional machine vision therefore has difficulty extracting defect features sufficiently and effectively, is inefficient, and cannot accurately distinguish whether a chip surface has defects; for electronic equipment fitted with such a chip this leaves potential safety hazards, so the method is not suitable for high-precision detection of chip quality defects.

For deep learning: deep learning is better suited to accurately classifying whether a chip is defective. The biggest difference between deep learning and traditional classification algorithms is that deep learning extracts image features with a convolutional neural network, computing inner products between the data of windows of different sizes and a convolution kernel. However, deep learning has poor robustness and generalization and depends heavily on massive data; a very large amount of data is needed to achieve an effective learning result. If the amount of data is insufficient, the training result of the deep learning network may be poor, and even under-fitting may occur, making the model hard to converge. Therefore, a general object detection algorithm is not suitable for direct application to the chip surface defect detection task, and a new solution is needed.

Meta-learning can effectively solve the classification problem for a small number of defective chip samples and can learn specific tasks from only a small amount of picture data, unlike a general neural network algorithm. However, because only a small number of samples are learned, less feature information is extracted during the meta-learning training tasks, and with a single network structure some information that is not needed during training is easily lost, so information needed in a new task or a new domain is missing; thus, when the defect position, defect type, or defect appearance of the chip changes, the accuracy may drop sharply. Second, complex models often require large amounts of computation and time and cannot classify in real time, so they hardly meet the requirements of actual production. Improving the running speed of few-shot classification is therefore equally important.

The spatial alignment network based on cross transformation of small samples has a huge number of parameters, so its computation time is too large to meet the real-time requirement.
Disclosure of Invention
Purpose of the invention: aiming at the problems of insufficient out-of-class information and huge computation in few-shot learning, the invention provides a chip surface defect detection method based on sparse space perception and meta-learning. The cross-transformed sparse spatial alignment network greatly reduces the amount of computation the model needs during learning, achieving a lightweight effect and thus suiting the industrial requirement of detecting defects in real time; on the other hand, introducing meta-learning improves the generalization ability of the model, and enhancing the neural network with a small data set lets it learn information outside the image label classes, improving the accuracy of chip surface defect detection.
The technical scheme adopted by the invention is as follows. A chip surface defect detection method based on sparse space perception and meta-learning fuses a cross-transformed sparse spatial alignment network with meta-learning to detect chip surface defects: first, data acquisition and image preprocessing are carried out; second, a similarity contrastive learning enhancement network algorithm is selected to enhance the images, and a transfer learning module is added before the enhanced image features are fed into the cross-transformed sparse spatial alignment network, so that the model can more easily identify fine-grained intra-class feature information and converges faster; finally, the model is trained and tested with an N-way K-shot task scheme, ultimately detecting the chip defects. The specific steps are as follows:

Step one, data collection and processing: first, prepare a chip training data set by collecting chip data with defects, and divide the data set into a training set, a validation set, and a test set according to the model training scheme; sample from the chip data set to form a number of disjoint tasks, each set consisting of several tasks, and each task comprising a support set and a query set, where the support set carries class labels and the query set does not; when training with the similarity contrastive learning enhancement method for images, a support set and a new query set are used, and the new query set randomly draws some data from the support set so that it covers the same number of classes as the new query set images, completing the division of the data sets.

Step two, model pre-training: the similarity contrastive learning enhancement network used in this patent applies enhancing transformations to the data, mainly for unsupervised learning, and at the same time improves the feature information of the base model and of the embedding, which greatly increases the information available to the model during transfer learning. Training the similarity contrastive learning enhancement network yields better image embeddings that are unaffected by different transformations of images of the same class. By applying random data augmentation to the input pictures during training, the network can learn more image information without having to learn the color of a picture or the position of the target within it. Therefore, when the picture embedding is pre-trained, the images are randomly enhanced, which makes the task harder for the network model and gives it better generalization ability later.

Step three, model selection: the chip surface defects are detected with the cross-transformed spatially sparse network; the model is designed specifically for classifying small targets while reducing the parameter computation and training time of the network. Because the defects on the chip surface are small, the network model converts the picture features into a three-dimensional feature space through a self-attention head to obtain more feature information, and the attention values are obtained through the self-attention mechanism, where a larger value indicates richer semantic information and a smaller value indicates little semantic information. To reduce the time and information redundancy of traversing and computing every pixel, a sparse semantic alignment network module is added: computation is carried out only for the more significant semantic correlations, and positions with small attention values are skipped. Finally, metric computation is performed between the obtained semantically aligned feature map and the images in the query set.

Step four, transfer learning: normally the training parameters are randomly initialized, and a large number of pictures must be trained to obtain good parameters, yet in a small-sample setting the feature-extraction parameters account for a large share of the model. To make up for the small number of samples, a transfer learning module is added to the meta-learning process: first, the divided training data are put into the similarity contrastive enhancement network for training to obtain the network training weights; then, during meta-learning training, the previously trained model weights are loaded for transfer learning, which strengthens feature extraction for the support-set pictures in the meta-learning test set, reduces the number of iterations of the model, and speeds up its convergence.

Step five, meta-learning: in the meta-training stage, each task randomly adopts the N-way K-shot task classification scheme, where N is the number of randomly selected categories and K is the number of pictures in each selected category.

For the meta-training set, the method adopts a 5-way 1-shot classification scheme and puts the data into the network for training;

for the meta-test set, the 5-way 1-shot classification scheme is likewise adopted to put the data into the network for testing.
First, the support set of the meta-training data is input into the similarity contrastive learning enhancement network, and two different random enhancement transformations of the input image are performed. The enhanced images are each passed through a residual network for feature extraction, two embedding vectors are then obtained with a nonlinear fully connected layer based on a multilayer perceptron, and the cosine similarity between the two enhanced views of the image is calculated. All remaining images in each batch are then regarded as dissimilar-class images (i.e., treated as negative samples), the positions between the two batches are interchanged, and the losses of all pairs are summed and averaged as the loss function, where l(i, j) is the loss between the two enhanced picture features and i and j are the two enhanced picture features of the original picture.
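The patent refers to a pairwise contrastive loss whose formula is not reproduced above; its description (cosine similarity between two augmented views of the same image, all other images in the batch treated as negatives, losses of all pairs summed and averaged) matches the standard SimCLR-style NT-Xent loss, shown here as an assumed reconstruction in which the temperature parameter τ is an assumption not stated in the text:

```latex
\ell(i,j) = -\log \frac{\exp\big(\operatorname{sim}(z_i, z_j)/\tau\big)}
                       {\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp\big(\operatorname{sim}(z_i, z_k)/\tau\big)},
\qquad
\mathcal{L} = \frac{1}{2N} \sum_{n=1}^{N} \big[\, \ell(2n-1,\, 2n) + \ell(2n,\, 2n-1) \,\big]
```

Here z_i and z_j are the embedding vectors of the two enhanced views, sim(u, v) = uᵀv / (‖u‖ ‖v‖) is the cosine similarity, and 2N is the batch size after augmentation.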
After the similarity contrastive learning enhancement network model is trained, the trained network only needs to be applied through transfer learning to obtain the features in the support set, and transfer learning is likewise applied to the query set. Denote the c-th class within the support set by s_c, let |s_c| be the number of pictures in class c, let x be an original picture, and let Φ(x) be the feature vector obtained through transfer learning; the feature of class c is then computed from these transferred features as the class mean p_c = (1/|s_c|) Σ_{x∈s_c} Φ(x).
Then the obtained support-set and query-set images are converted from two-dimensional form into three-dimensional tensor features. In an N-way K-shot task, two independent linear projections applied to the support-set features generate the keys K_s and values V_s through a key projection head K(·) and a value projection head V(·), which transform the feature dimensions; similarly, a linear projection applied to the query-set features generates the queries Q_q through a query projection head Q(·), again transforming the feature dimensions. After the feature spaces of the support set and the query set are obtained, point-wise multiplication between the corresponding points of the respective dimensions yields a series of semantic relation matrices between the query image and each support class.

If the semantic distance between a point in the query set and the corresponding spatial point in the support set is small, that is, the attention value between the support-set spatial point and the corresponding query-set point is large, they are likely to have similar local features; otherwise the semantic relationship between them is relatively weak. First, the semantic relation matrix between the query image and the spatially corresponding points on each support class is computed, giving R_n. Each row of R_n represents the semantic similarity of one point in the query image to all points of all images in the support set. A sparse spatial cross-attention algorithm is then applied to find the task-related point features in the query image.

After all the attention points related to the task are collected, a mask m = [m_1; …; m_k] is applied to keep the features with large attention values and discard those with small ones: a threshold is set in advance, and m_i equals 1 if the value in the semantic relation matrix is larger than the threshold and 0 otherwise, where the threshold is set to 0.5. Multiplying the mask m with the semantic relation matrix R_n gives the sparse attention

a_n = m * R_n,

which is used to semantically align the support-set values V_s with the spatial locations corresponding to the query image set, producing a task-specific prototype vector t.

The query-set features likewise need to be passed through the value projection head V(·) for a feature-dimension transformation into the same size as the prototype vector t, after which the metric calculation is carried out, where H' and W' are the height and width of the original image and W_p denotes the query-set feature obtained after transformation by the projection head V(·). If the distances are close, they belong to the same category; otherwise they do not.
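As an illustrative sketch and not the patent's reference implementation, the thresholded sparse cross-attention and metric step described above might look as follows in PyTorch; the tensor shapes, the cosine normalisation of the relation matrix (added so that the 0.5 threshold is meaningful), and the squared-Euclidean metric are assumptions consistent with, but not dictated by, the description:

```python
import torch
import torch.nn.functional as F

def sparse_cross_attention(q_feat, s_keys, s_values, threshold=0.5):
    """Build a task-specific prototype via thresholded (sparse) cross-attention.

    q_feat:   (Lq, d) query-image features after the query projection head
    s_keys:   (Ls, d) support-set features after the key projection head
    s_values: (Ls, d) support-set features after the value projection head
    """
    # Semantic relation matrix R_n: similarity of every query point to every support point.
    relation = F.normalize(q_feat, dim=-1) @ F.normalize(s_keys, dim=-1).t()  # (Lq, Ls)

    # Mask m: keep only attention values above the threshold (sparse attention a_n = m * R_n).
    mask = (relation > threshold).float()
    sparse_attn = mask * relation

    # Semantically align the support values to the query's spatial locations -> prototype t.
    prototype = sparse_attn @ s_values  # (Lq, d)
    return prototype

def metric_score(prototype, w_p):
    """Squared Euclidean distance between the prototype t and the projected query feature W_p."""
    return ((prototype - w_p) ** 2).sum()
```

A query image would then be assigned to the support class whose prototype yields the smallest metric_score.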
Beneficial effects: the invention uses the cross-transformed sparse spatial perception alignment network as the meta-learning model of the framework, together with a classification method that combines the similarity contrastive learning enhancement network with meta-learning. This improves the feasibility of few-shot classification, reduces the number of training samples and training iterations, shortens the training iteration cycle, and greatly reduces the amount of parameter computation. A neural network that applies enhancing transformations to the data set learns information outside the label classes: a small data set is trained through the network, image enhancement transformations are applied to it, and the enhanced images are used for feature extraction, so that fine-grained and coarse-grained class information can both be obtained well during learning. Data information outside the labeled classes can be learned even with small samples, solving the problem of obtaining good accuracy when the test data differ from the training classes. After learning the information outside the label classes, semantic alignment is then performed through the sparse spatial perception network. Together, these two network models recover more feature information in few-shot learning and improve image classification accuracy with a small number of samples. When data are difficult to obtain in quantity, compared with existing classification methods, the classification method of the invention can effectively solve the few-shot classification problem and has a certain adaptability.
Drawings
FIG. 1 is a flowchart of a chip surface defect detection method based on sparse spatial awareness and meta-learning.
FIG. 2 is a schematic diagram of the sparse spatial perceptual classification model of the present invention.
Detailed Description
The technical solutions in the embodiments are described in detail below with reference to the drawings and the detailed description of the invention.
As shown in FIG. 1, in the chip surface defect detection method based on sparse spatial perception and meta-learning, data collection and image preprocessing are performed first. Second, a similarity contrastive learning enhancement network algorithm is selected for image enhancement, which improves the feature extraction capability and speeds up the convergence of the model. During meta-training, the data are input into the trained similarity contrastive learning enhancement network, and the obtained features are passed through the transfer learning module into the cross-transformed sparse alignment network, where they are transformed into three-dimensional spatial features carrying richer feature information; faced with an unseen chip data set that may or may not contain defects, the model can learn more feature information from it, which improves its classification accuracy. In training and testing, the N-way K-shot task data classification scheme is adopted to train and test the network model, finally classifying accurately whether a small set of chips has surface defects.
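A minimal sketch of the weight transfer mentioned here (feeding the meta-learner with a backbone pre-trained by the contrastive network), assuming a PyTorch ResNet-34 backbone and a hypothetical checkpoint name that is not specified in the patent:

```python
import torch
import torchvision

# Contrastive pre-training phase: train a ResNet-34 encoder inside the similarity
# contrastive learning enhancement network, then keep only the backbone weights.
pretrain_backbone = torchvision.models.resnet34(weights=None)
# ... contrastive pre-training of `pretrain_backbone` would happen here ...
torch.save(pretrain_backbone.state_dict(), "contrastive_backbone.pt")  # assumed file name

# Meta-learning phase: initialise the feature extractor from those weights instead of
# randomly, so fewer iterations are needed and convergence is faster.
meta_backbone = torchvision.models.resnet34(weights=None)
meta_backbone.load_state_dict(torch.load("contrastive_backbone.pt"))
meta_backbone.fc = torch.nn.Identity()  # use the network purely as a feature extractor
```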
The specific process comprises the following steps:

Step 1: collection of the experimental data set

Step 2: data partitioning

Step 3: data enhancement

Step 4: spatially sparse semantic alignment network

Step 5: meta-learning

Step 6: metric learning
The invention produces a TFRecord-format data set that can be read by TensorFlow, and uses the LabelImg tool to label the pictures in the chip data set, generating XML files. A data set of common defect categories including the labeling information is collected. The common data sets are divided by category: one part is used as data for transfer learning training, and the other part is divided into a training set and a test set. Local defect images are cropped from the original chip images according to the calibrated defect positions and divided into a chip meta-training set and a chip meta-test set. Image preprocessing is applied to all collected images: they are scaled to the same size, image enhancement is applied, and the images are randomly flipped and horizontally rotated.
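A minimal sketch of this preprocessing and augmentation pipeline, assuming TensorFlow (which the TFRecord format above implies); the target size and the choice of 90-degree rotation are assumptions:

```python
import tensorflow as tf

IMG_SIZE = 224  # assumed target size; the text only says "the same size"

def preprocess(image):
    """Scale to a fixed size, then apply random flips and a random 90-degree rotation."""
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    image = tf.image.random_flip_left_right(image)   # random horizontal flip
    image = tf.image.random_flip_up_down(image)      # random vertical flip
    k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
    image = tf.image.rot90(image, k=k)               # random rotation by a multiple of 90 degrees
    return tf.cast(image, tf.float32) / 255.0        # normalise pixel values
```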
Division of the data set: the meta-training set contains many tasks, each composed of a support set and a query set; the tasks differ from one another and are each randomly drawn from the chip data set. The meta-test set is likewise made up of many different tasks, and the tasks in the meta-test set are disjoint from the tasks in the meta-training set. Then n tasks are randomly drawn from the processed data set as the meta-training data set and sent in turn to the whole network model for training and parameter updating, and the updated parameters are finally saved.
Model selection: this patent constructs the model by combining the cross-transformed spatial alignment network with a sparse space. The similarity contrastive learning enhancement technique is an unsupervised learning model: by randomly enhancing the data set it learns intra-class feature information well, and it adapts quickly to other new tasks. The sparse spatial perception network model is shown in FIG. 2. The backbone of the similarity contrastive learning enhancement network extracts a feature map through a ResNet-34 residual network and compares similarity with the cosine function. Because the defect regions of chips are generally small, little class-irrelevant information is lost when the similarity contrastive learning enhancement network learns the intra-class information. The enhanced and transformed features are then passed into the sparse spatial perception network for training.

In this network model, the two enhanced and transformed features obtained by transfer learning are combined by a weighted summation to obtain the feature map of the class; the two-dimensional features are then transformed into three-dimensional feature dimensions, and the query set is multiplied with the three-dimensional feature vectors in the support set to obtain the spatially corresponding points, giving a semantic relation matrix between each image in the query set and all images in each support class, i.e. the spatial-perception attention values. Because this may take much time in the three-dimensional tensor space, to keep the model lightweight the first n largest attention values are selected from the attention map obtained from the support and query sets, representing the pixels of the support images most strongly associated with the query image. If the semantic distance between a query-set point and the corresponding spatial point in the support set is small, that is, the value association between them is large, they are likely to have similar local features; otherwise their semantic relationship is weak, which allows class information irrelevant to the query set to be discarded. The obtained semantic feature matrix is then semantically aligned with the images of the support set to obtain the image features on the support set; the resulting semantically aligned feature map is multiplied with the support-set features transformed by the key-value projection V to obtain the aligned network feature map on the query set, the query set is transformed to the same size as this feature map, and the metric operation is carried out. If the distances are close, they belong to the same category; otherwise they do not.
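The "first n largest attention values" selection mentioned here is a top-n variant of the thresholded mask sketched earlier; a possible form is shown below, where top_n is an assumed hyperparameter:

```python
import torch

def topn_sparse_attention(relation, top_n=32):
    """Keep only the top-n attention values for each query point; zero out the rest.

    relation: (Lq, Ls) semantic relation matrix between query and support points.
    """
    top_n = min(top_n, relation.shape[-1])
    values, indices = relation.topk(top_n, dim=-1)
    sparse = torch.zeros_like(relation)
    sparse.scatter_(-1, indices, values)  # retain only the n largest entries per row
    return sparse
```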
Meta-learning: in the meta-training stage, each task randomly adopts the N-way K-shot task classification scheme, where N is the number of classes contained in each task and K is the number of images contained in each class. During meta-training, the data set is divided with the 5-way 1-shot scheme and the data are put into the network for training, and the query set within the training set is then used for the test inside meta-training. Similarly, for the meta-test set, the 5-way 1-shot classification scheme is also applied to put the chip data into the network for testing.
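As an illustrative sketch (the function name and the number of queries per class are assumptions, not taken from the patent), the N-way K-shot episode construction described above could be implemented as follows:

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, q_queries=5, seed=None):
    """Sample one N-way K-shot task: a labelled support set and a query set.

    `dataset` is assumed to be a list of (image, label) pairs from the chip data.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)

    classes = rng.sample(sorted(by_class), n_way)  # N randomly selected categories
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = rng.sample(by_class[cls], k_shot + q_queries)
        support += [(img, episode_label) for img in images[:k_shot]]  # labelled examples
        query += [(img, episode_label) for img in images[k_shot:]]    # labels hidden at test time
    rng.shuffle(query)
    return support, query
```

Meta-training then iterates over many such episodes drawn from the training classes, while meta-testing draws its episodes from classes that are disjoint from those seen during meta-training.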
Metric learning learns a distance function for a particular task so that the distance function achieves better performance between classes. It is a common method that makes similar objects lie relatively close to each other in the embedding space while different objects lie relatively far apart.
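A brief sketch of how such a metric is used to classify a query embedding against class prototypes; the squared Euclidean distance here is an illustrative choice rather than one mandated by the patent:

```python
import torch

def nearest_prototype(query_embedding, prototypes):
    """Assign a query to the class whose prototype is closest in the embedding space.

    query_embedding: (d,)   embedding of one query image
    prototypes:      (N, d) one prototype per class in the episode
    """
    distances = ((prototypes - query_embedding) ** 2).sum(dim=-1)  # squared Euclidean distance
    return torch.argmin(distances).item()                          # index of the predicted class
```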
It should be noted that modifications and adaptations made by those skilled in the art without departing from the principles of the present invention shall also be considered within the protection scope of the present invention.
Claims (2)
1. A chip surface defect detection method based on sparse spatial perception and meta-learning, characterized by comprising: first, acquiring data and performing image preprocessing; second, selecting a similarity contrastive learning enhancement network algorithm to enhance the pictures, and adding a transfer learning module before the enhanced image features are input into the cross-transformed sparse spatial alignment network, so that the model can more easily identify fine-grained intra-class feature information and converges faster; finally, training and testing the model with an N-way K-shot task scheme, finally detecting the chip surface defects; the specific steps are as follows:

step one, data collection and processing: first, preparing a chip training data set by collecting chip data with defects, and dividing the data set into a training set, a validation set, and a test set according to the model training scheme; sampling from the chip data set to form a number of disjoint tasks, each set consisting of several tasks, and each task comprising a support set and a query set, where the support set carries class labels and the query set does not; when training with the similarity contrastive learning enhancement method for images, a support set and a new query set are used, and the new query set randomly draws some data from the support set so that it covers the same number of classes as the new query set images, completing the division of the data sets;

step two, model pre-training: the similarity contrastive learning enhancement network applies enhancing transformations to the data for unsupervised learning and at the same time improves the feature information of the base model and of the embedding, which increases the information available to the model during transfer learning; training the similarity contrastive learning enhancement network yields better image embeddings that are unaffected by different transformations of images of the same class; by applying random data augmentation to the input pictures during training, the network can learn more image information without having to learn the color of a picture or the position of the target within it; therefore, when the picture embedding is pre-trained, the images are randomly enhanced, which makes the task harder for the network model and gives it better generalization ability later;

step three, model selection: the chip surface defects are detected with the cross-transformed spatially sparse network, the model being designed specifically for classifying small targets while reducing the parameter computation and training time of the network; because the defects on the chip surface are small, the network model converts the picture features into a three-dimensional feature space through a self-attention head to obtain more feature information, and the attention values are obtained through the self-attention mechanism, where a larger value indicates richer semantic information and a smaller value indicates little semantic information; to reduce the time and information redundancy of traversing and computing every pixel, a sparse semantic alignment network module is added, computation is carried out only for the more significant semantic correlations, and positions with small attention values are skipped; finally, metric computation is performed between the obtained semantically aligned feature map and the images in the query set;

step four, transfer learning: normally the training parameters are randomly initialized and a large number of pictures must be trained to obtain good parameters, yet in a small-sample setting the feature-extraction parameters account for a large share of the model; to make up for the small number of samples, a transfer learning module is added to the meta-learning process; first, the divided training data are put into the similarity contrastive enhancement network for training to obtain the network training weights; then, during meta-learning training, the previously trained model weights are loaded for transfer learning, which strengthens feature extraction for the support-set pictures in the meta-learning test set, reduces the number of iterations of the model, and speeds up its convergence;

step five, meta-learning: in the meta-training stage, each task randomly adopts the N-way K-shot task classification scheme, where N is the number of randomly selected categories and K is the number of pictures in each selected category;

for the meta-training set, a 5-way 1-shot classification scheme is adopted to put the data into the network for training;

and for the meta-test set, a 5-way 1-shot classification scheme is adopted to put the data into the network for testing.
2. The chip surface defect detection method based on sparse spatial perception and meta-learning according to claim 1, wherein in the fifth step, the specific steps are as follows:

first, the support set of the meta-training data is input into the similarity contrastive learning enhancement network and two different random enhancement transformations are applied to the input image; the enhanced images are respectively passed through a residual network to extract features, two embedding vectors are then obtained with a nonlinear fully connected layer based on a multilayer perceptron, and the cosine similarity between the two enhanced views of the image is calculated, where ideally the similarity between differently enhanced images of the same class is very high and the similarity between images of different classes is very low; all remaining images in each batch are then regarded as dissimilar-class images, the positions of the two batches are exchanged, and the losses of all pairs are summed and averaged as the loss function, where l(i, j) is the loss between the two enhanced picture features and i and j are the two enhanced picture features of the original picture;

after the similarity contrastive learning enhancement network model is trained, the trained network only needs to be applied through transfer learning to obtain the features in the support set, and transfer learning is likewise applied to the query set; the c-th class within the support set is denoted s_c, |s_c| represents the number of pictures in class c, x represents an original picture, and Φ(x) represents the feature vector obtained through transfer learning, the class feature being computed from the transferred feature vectors of the |s_c| pictures of class c;

then the obtained support-set and query-set images are converted from two-dimensional form into three-dimensional tensor features; in an N-way K-shot task, two independent linear projections applied to the support-set features generate the keys K_s and values V_s through a key projection head and a value projection head, which transform the feature dimensions; similarly, a linear projection applied to the query-set features generates the queries Q_q through a query projection head, again transforming the feature dimensions; after the feature spaces of the support set and the query set are respectively obtained, point-wise multiplication between the corresponding points of the respective dimensions yields a series of semantic relation matrices between the query images and the support classes;

if the semantic distance between a point in the query set and the corresponding spatial point in the support set is small, that is, the attention value between the corresponding points in the support-set space and the query-set space is large, the corresponding points are likely to have similar local features, otherwise the semantic relationship between them is relatively weak; first, the semantic relation matrix between the query image and the spatially corresponding points on each support class is computed to obtain R_n, each row of which represents the semantic similarity of one point in the query image to all points of all images in the support set; a sparse spatial cross-attention algorithm is applied to find the point features related to the task in the query image;

after all the attention points related to the task are collected, a mask m = [m_1; …; m_k] is applied to keep the features with large attention values and discard the features whose attention values are small: a threshold is set in advance, and m_i equals 1 if the value in the semantic relation matrix is larger than the threshold and 0 otherwise, where the threshold is set to 0.5; multiplying the mask m with the semantic relation matrix R_n gives the sparse attention map a_n = m * R_n, which is used to semantically align the support-set values V_s with the spatial locations corresponding to the query image set, producing a task-specific prototype vector t;

the query set likewise needs to be passed through the value projection head for a feature-dimension transformation into the same size as the prototype vector t, after which the metric calculation is carried out.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211386361.5A CN115527072A (en) | 2022-11-07 | 2022-11-07 | Chip surface defect detection method based on sparse space perception and meta-learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211386361.5A CN115527072A (en) | 2022-11-07 | 2022-11-07 | Chip surface defect detection method based on sparse space perception and meta-learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115527072A true CN115527072A (en) | 2022-12-27 |
Family
ID=84705207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211386361.5A Withdrawn CN115527072A (en) | 2022-11-07 | 2022-11-07 | Chip surface defect detection method based on sparse space perception and meta-learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115527072A (en) |
-
2022
- 2022-11-07 CN CN202211386361.5A patent/CN115527072A/en not_active Withdrawn
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116309567A (en) * | 2023-05-17 | 2023-06-23 | 西南石油大学 | Shale electron microscope pore intelligent recognition method for small sample |
CN116824271A (en) * | 2023-08-02 | 2023-09-29 | 上海互觉科技有限公司 | SMT chip defect detection system and method based on tri-modal vector space alignment |
CN116824271B (en) * | 2023-08-02 | 2024-02-09 | 上海互觉科技有限公司 | SMT chip defect detection system and method based on tri-modal vector space alignment |
CN117474928A (en) * | 2023-12-28 | 2024-01-30 | 东北大学 | Ceramic package substrate surface defect detection method based on meta-learning model |
CN117474928B (en) * | 2023-12-28 | 2024-03-19 | 东北大学 | Ceramic package substrate surface defect detection method based on meta-learning model |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20221227