CN115527072A - Chip surface defect detection method based on sparse space perception and meta-learning - Google Patents

Chip surface defect detection method based on sparse space perception and meta-learning

Info

Publication number
CN115527072A
CN115527072A (application CN202211386361.5A)
Authority
CN
China
Prior art keywords: learning, training, network, model, image
Prior art date
Legal status
Withdrawn
Application number
CN202211386361.5A
Other languages
Chinese (zh)
Inventor
黄晓华
李阳
邵秀燕
赵群
俞佳豪
Current Assignee
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanjing Institute of Technology
Priority to CN202211386361.5A
Publication of CN115527072A
Legal status: Withdrawn


Classifications

    • G06V 10/764 — Image or video recognition using pattern recognition or machine learning; classification, e.g. of video objects
    • G06T 7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06V 10/44 — Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/761 — Image or video pattern matching; proximity, similarity or dissimilarity measures in feature spaces
    • G06V 10/774 — Processing image or video features in feature spaces; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82 — Image or video recognition using neural networks
    • G06V 20/70 — Scenes; labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/10004 — Image acquisition modality: still image; photographic image
    • G06T 2207/30108 — Subject of image: industrial image inspection
    • G06T 2207/30148 — Subject of image: semiconductor; IC; wafer

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a chip surface defect detection method based on sparse spatial perception and meta-learning. First, data are acquired and image preprocessing is performed. Second, a similarity contrastive learning enhancement network is selected to augment the images, and a transfer learning module is added before the augmented image features are fed into the cross-transformed sparse spatial alignment network, so that the model identifies fine-grained intra-class feature information more easily and converges faster. Finally, the model is trained and tested with an N-way K-shot task protocol, realizing chip defect detection. The invention greatly reduces the computation the model requires during learning, achieving a lightweight effect; introducing meta-learning improves the model's generalization, and augmenting the neural network with a small dataset lets it learn information beyond the image label classes, improving the accuracy of chip surface defect detection.

Description

Chip surface defect detection method based on sparse space perception and meta-learning
Technical Field
The invention relates to the technical field of chip surface defect detection, in particular to a chip surface defect detection method based on sparse space perception and meta-learning.
Background
Chips play an irreplaceable role in daily life, but a chip with surface defects directly affects the performance and service life of the electronic product containing it. During production, the manufactured chips must undergo surface defect inspection for faults such as scratches, bump element defects (protrusions, dislocations, or missing bumps), metallic contaminants, and etching-solution stain residues. As manufacturing quality keeps improving, defective chips become rarer, so datasets of chips with surface defects are very limited.
Small-sample (few-shot) learning is designed for scarce data, but few-shot network models struggle to extract surface-defect feature information while training a defect detection task; with a single network structure, important defect features can be lost and the defect characteristics of new samples go unlearned. Moreover, complex models often demand heavy computation and time cost. Addressing these two difficulties, this patent proposes combining a cross-transformed sparse space with meta-learning, so that the model achieves a lightweight effect with faster operation, completing defect detection for chips with surface defects.
Chip surface defect detection is a key link in the production line, in which common surface-defect features are used to classify defects accurately. The currently common surface defect detection methods include:
and (4) traditional classification: to the classification problem whether the surface of a product has defects, the traditional surface detection mode has the defects that the surface of the product is manually detected, but some defects can cause visual fatigue to people, the external interference is easily received, the detection efficiency is difficult to ensure, so that a plurality of misjudgments are caused, the surface defect detection mode is convenient and fast to operate, but the defects of low efficiency, non-uniform standard, high cost and the like are overcome.
Machine learning classification: the defect types on the chip surface are classified mainly through the different defect characteristics reflected in the input defect images. Such surface defect detection algorithms combine hand-designed feature selection with pattern-recognition classification; traditional classification algorithms include the K-nearest-neighbour algorithm, the BP neural network algorithm, and Bayesian classifiers for image classification.
Deep learning classification: the success of deep models, represented by convolutional neural networks, in computer vision offers a new direction for defect detection. Deep learning classification can achieve remarkable results and discover important intrinsic features of chip surface defect images. In recent years, with the rapid development of computing and artificial intelligence, deep learning has been widely applied across daily life and has reached surprising accuracy in industrial product classification. Deep object detection models currently divide into those without and with an independent candidate-region stage. Models without an independent candidate-region stage generate target boxes directly; they detect quickly but less precisely, with representative algorithms such as the YOLO series and SSD. Models with an independent candidate-region stage first generate candidate boxes and then select the potential targets among them as final boxes; they are more accurate but slower, with representatives such as R-CNN and Faster R-CNN. All these network models require large datasets and are hard to apply directly in product production.
The currently common deep learning network models include: LeNet, AlexNet, the VGG series, the ResNet series, the Inception series, the DenseNet series, GoogLeNet, NASNet, Xception, and SENet.
the lightweight network model in deep learning includes: mobilene v1, v2, shufflenet v1, v2, squeezenet
The lightweight models described above are used frequently in project applications; their trade-offs are:
Advantages: (1) the parameter model is small and convenient to deploy; (2) the computation is small and inference is fast.
Disadvantages: (1) the lightweight models do not reach the accuracy of the ResNet, Inception, DenseNet, or SENet families.
The above algorithms for classification by using a computer all require a large number of samples to obtain high classification accuracy.
Small-sample (few-shot) learning: the model is trained through a large number of tasks, each composed of pictures randomly selected from the chip dataset, with all tasks distinct; the model can then learn quickly on a small number of unseen chip samples. Meta-learning is "learning how to learn" and is currently the main approach to the few-shot learning problem.
The main developments in meta-learning in recent years are:
1: Metric-based meta-learning: the trained model needs no adjustment for the test task, but when the categories of the test and training task sets differ greatly, the effect is poor.
2: Model-based meta-learning: owing to the flexibility of its internal dynamics, it is more widely applicable than most metric-based meta-learning; however, its performance is below metric learning on many supervised tasks, the effect worsens as the amount of data increases, and when the category difference between tasks is large, it is inferior to optimization-based meta-learning.
3: Optimization-based meta-learning: compared with model-based meta-learning, it achieves better performance when tasks are more widely distributed, but it requires learning in a base learner for each task optimization, which makes computation expensive.
An existing method uses a few-shot spatial alignment network based on cross transformation: data augmentation of the small samples yields richer intra-class and out-of-class information, giving better generalization on new tasks; the augmented samples are then passed into the cross-transformed spatial alignment network, which has good classification accuracy.
For the manual classification method: efficiency is low, labor intensity is high, and human factors dominate.
For the machine learning classification method: for defective chips, a classification algorithm must be designed from different prior knowledge, with features extracted and selected specifically for the characteristics of chip surface defects, so the algorithm's robustness is low and it struggles to complete classification under complex tasks. Machine-learning classification also places strong demands on the images: all images need a uniform background, and the feature of interest occupies only a fixed position in a normal image. For chips of different sizes the captured backgrounds differ and the defect positions on the chip surface vary, so classification accuracy is relatively low; in addition, a machine learning classifier alone generally struggles to capture an image's informative features well enough to detect whether the chip surface is defective, and the result is easily disturbed by material and other factors. Conventional machine vision therefore cannot extract defect features sufficiently and efficiently, nor accurately distinguish whether a chip surface is defective; for electronic equipment fitted with such chips this leaves potential safety hazards, so the method is unsuitable for high-precision chip defect inspection.
Deep learning is better suited to classifying accurately whether a chip is defective. Its biggest difference from traditional classification algorithms is that deep learning extracts image features with a convolutional neural network, taking inner products between the data in windows of different sizes and convolution kernels. However, deep learning has limited robustness and generalization and depends heavily on massive data: a very large amount of data is needed for an effective learning effect. With insufficient data, the deep network's training result may be poor, under-fitting may occur, and the model becomes hard to converge. Hence a general object detection algorithm is unsuitable for direct application to the chip surface defect detection task, and a new solution is needed.
Meta-learning can effectively solve the classification of a small number of defective chip samples, learning specific tasks from only a few pictures, unlike general neural network algorithms. However, because only a few samples are learned, less feature information is extracted during meta-training tasks, and with a single network structure, information unneeded during training is easily discarded, losing information needed for a new task or new field; thus when the chip's defect position, defect type, or defect appearance changes, accuracy may drop sharply. Moreover, complex models demand heavy computation and time cost and cannot classify in real time, failing the requirements of actual production. Improving the operation speed of few-shot classification is therefore equally important.
The few-shot cross-transformed spatial alignment network has an excessive computation-time cost due to its huge parameter count, and can hardly meet real-time requirements.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems of insufficient out-of-class information and huge computation in few-shot learning, the invention provides a chip surface defect detection method based on sparse spatial perception and meta-learning. The cross-transformed sparse spatial alignment network greatly reduces the computation the model needs during learning, achieving a lightweight effect suitable for real-time industrial defect detection; meanwhile, introducing meta-learning improves the model's generalization, and augmenting the neural network with a small dataset lets it learn information beyond the image label classes, improving the accuracy of chip surface defect detection.
The technical scheme adopted by the invention is as follows: a chip surface defect detection method based on sparse spatial perception and meta-learning, fusing a cross-transformed sparse spatial alignment network with meta-learning to detect chip surface defects. First, data acquisition and image preprocessing are performed; second, a similarity contrastive learning enhancement network algorithm is selected to augment the images, and a transfer learning module is added before the augmented image features are fed into the cross-transformed sparse spatial alignment network, so the model more easily identifies fine-grained intra-class features and converges faster. Finally, the model is trained and tested with the N-way K-shot task protocol, realizing chip defect detection. The specific steps are as follows:
step one, data collection and processing: firstly, preparing a chip training data set, collecting the chip data set with defects, and dividing the data set into a training set, a verification set and a test set according to a model training mode; sampling from a chip data set to form a plurality of tasks which are not intersected, wherein each set consists of a plurality of tasks, and each task comprises a support set and a query set, wherein the support set is a label with a category, and the query set is not provided with the label; when the method for enhancing the similarity contrast learning of the images is used for training, a support set and a new query set are used, the new query set randomly extracts some data from the support set, so that the classes with the same number as the images of the new query set are obtained, and the data sets are divided;
step two, pre-training the model: the used similarity contrast learning enhancement network of this patent comes the transform of carrying out the reinforcing to data, and mainly used unsupervised study can also improve the characteristic information of basic model and embedding simultaneously, and improvement that like this can be great obtains the required information of model when carrying out the migration study. The similar contrast learning enhancement network is used for training to obtain better image embedding, so that the similar contrast learning enhancement network is not influenced by different image transformation of the same class. The network can learn more image information by performing random data enhancement on the input picture during training, and does not need to learn the color of the picture or the position information of the target in the picture. Therefore, when the picture is embedded for pre-training, the images are randomly enhanced, so that the network model is difficult to learn, and the network model has better generalization capability in the future.
Step three, model selection: chip surface defects are detected with the cross-transformed spatially sparse network. The model is designed specifically for classifying small targets while also reducing the network's parameter computation and training time. Because defects on a chip surface are small, the network model converts the picture features into a three-dimensional feature space through a self-attention head to obtain more feature information; attention values computed by the self-attention mechanism indicate higher semantic information when large and little semantic information when small. To reduce the time and information redundancy of traversing all pixels, a sparse semantic alignment network module is added: computation is performed only on the more significant semantic correlations and skipped where the attention value is small, and finally the resulting semantically aligned feature map is metrically compared with the images in the query set.
Step four, transfer learning: normally, training parameters are initialized randomly, and obtaining good parameters requires training on a large number of pictures, yet in the few-shot setting the feature-extraction parameters account for a large share. To compensate for the small sample count, a transfer learning module is added to the meta-learning process. First, the divided training data are fed into the similarity contrastive enhancement network for training to obtain network weights; then, during meta-learning training, the previously trained model weights are loaded for transfer learning, strengthening feature extraction for the support-set pictures in the meta-learning test set, reducing the model's iteration count, and accelerating convergence.
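A minimal sketch of the weight transfer described in this step, assuming a toy single-layer encoder as a stand-in for the contrastively pretrained backbone; the class and all names are hypothetical.

```python
import numpy as np

class Encoder:
    """Toy feature extractor whose weights can be copied from a pretrained model."""
    def __init__(self, d_in, d_out, rng):
        self.W = rng.normal(size=(d_in, d_out))
    def __call__(self, x):
        return np.maximum(x @ self.W, 0.0)   # ReLU feature map

rng = np.random.default_rng(4)
pretrained = Encoder(16, 8, rng)             # weights from contrastive pre-training
meta_model = Encoder(16, 8, rng)             # randomly initialized meta-learner backbone
meta_model.W = pretrained.W.copy()           # transfer: reuse the pretrained weights
feats = meta_model(rng.normal(size=(3, 16))) # support-set features via transferred encoder
```

Copying (rather than sharing) the weight array lets the meta-learner fine-tune without disturbing the stored pretrained model.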
Step five, meta-learning: in the meta-training stage, each task is formed randomly with an N-way K-shot classification scheme, where N is the number of randomly selected categories and K is the number of pictures drawn from each selected category.
For a meta training set, the method adopts a 5-way 1-shot classification method to put data into a network for training;
for the meta test set, the 5-way 1-shot classification method is adopted to put data into a network for testing.
First, the support set of the meta-training data is input into the similarity contrastive learning enhancement network. Two different random augmentations of each input image are produced, the augmented images are each passed through a residual network for feature extraction, two embedding vectors are then obtained with a nonlinear fully connected layer based on a multilayer perceptron, and cosine similarity is applied to compute the similarity between the two augmented versions of an image. All remaining images in each batch are treated as dissimilar-class images (i.e., as negative samples); the positions within each pair are interchanged, and all pairwise losses are summed and averaged as the loss function. The formula is as follows:
L = \frac{1}{2N}\sum_{k=1}^{N}\left[\ell(2k-1,\,2k)+\ell(2k,\,2k-1)\right]

wherein,

\ell(i,j) = -\log\frac{\exp\left(\mathrm{sim}(z_i,z_j)/\tau\right)}{\sum_{k=1,\,k\neq i}^{2N}\exp\left(\mathrm{sim}(z_i,z_k)/\tau\right)}

ℓ(i, j) is the loss between the two enhanced picture features; z_i and z_j are the embeddings of the two enhanced versions of the original picture, sim(·, ·) is the cosine similarity, and τ is a temperature parameter.
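The pairwise cosine-similarity loss described above matches the standard normalized-temperature cross-entropy (NT-Xent) form; assuming that form, here is a NumPy sketch, with the batch laid out so adjacent rows are the two views of one image and τ = 0.5 chosen for illustration.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over 2N embeddings; rows 2k and 2k+1 are the two augmented
    views of image k. A sketch of the averaged pairwise loss described above."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)        # cosine sim via dot products
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                          # exclude self-similarity
    # row-wise log-softmax against the other 2N-1 embeddings (negatives)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n2 = len(z)
    partner = np.arange(n2) ^ 1                             # 0<->1, 2<->3, ... pair index
    return -log_prob[np.arange(n2), partner].mean()

rng = np.random.default_rng(0)
views = rng.normal(size=(8, 16))                            # 2N = 8 embeddings
loss = nt_xent_loss(views)
```

When the two views of each image embed identically, the partner term dominates the softmax and the loss drops, which is the behaviour the contrastive pre-training exploits.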
After the similarity contrastive learning enhancement network model is trained, the features of the support set are obtained simply by applying transfer learning to the trained network, and transfer learning is likewise applied to the query set. The formula is as follows:
s_c = \frac{1}{|s_c|}\sum_{x \in c}\Phi(x)

where s_c denotes the prototype of the c-th class within the support set, |s_c| represents the number of pictures in category c, x represents an original picture, and Φ(x) represents the feature vector obtained through transfer learning.
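The per-class averaging in this formula can be sketched directly; the array shapes and values are illustrative, and the features stand in for the Φ(x) vectors produced by the transferred embedding network.

```python
import numpy as np

def class_prototypes(features, labels, n_classes):
    """Average the embedded support features per class: s_c = (1/|s_c|) * sum Phi(x).

    features: (n, d) array of Phi(x) vectors; labels: class index per row."""
    labels = np.asarray(labels)
    return np.stack([features[labels == c].mean(axis=0) for c in range(n_classes)])

feats = np.array([[0.0, 2.0], [2.0, 0.0], [4.0, 4.0]])
protos = class_prototypes(feats, [0, 0, 1], n_classes=2)   # -> [[1, 1], [4, 4]]
```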
Then, the obtained support-set and query-set images are converted from two-dimensional form into three-dimensional tensor feature form. In an N-way K-shot task, two independent linear projections — a key projection head K and a value projection head V — are applied to the support-set features X_s to generate the keys K_s and values V_s, transforming the feature dimensions. Similarly, a linear query projection head Q is applied to the query-set features X_q to generate the query features Q_q with the transformed feature dimension. After the feature spaces of the support set and the query set are obtained, dot products are taken between corresponding points of the respective dimensions, yielding a series of semantic relation matrices between the query image and each support class.
If the semantic distance between corresponding spatial points of the query set and the support set is small — that is, the attention value between a support-set spatial point and the corresponding query-set point is large — they are likely to share similar local features; otherwise their semantic relationship is relatively weak. First, the semantic relation matrix between the query image and the spatially corresponding points on each support class is computed to obtain R_n:

R_n = Q_q K_s^{\top}

Each row of R_n represents the semantic similarity of one point in the query image to all points in all images of the support set. A sparse spatial cross-attention algorithm is applied to find the task-related point features in the query image.
After all the task-related attention points are collected, a mask m = [m_1; …; m_k] is applied to keep the features with large attention values and discard those whose attention value is small: a threshold is set in advance, and m_i equals 1 if the corresponding value in the semantic relation matrix is larger than the threshold, otherwise 0; here the threshold is set to 0.5. Multiplying the mask m with the semantic relation matrix R_n gives the sparse attention a_n, which is used to semantically align the support-set values V_s to the spatial locations corresponding to the query image set, producing the task-specific prototype vector t, calculated as:

a_n = m * R_n

t = a_n V_s
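A compact sketch of the masking and alignment steps above; the row-softmax normalization of R_n (so its entries are comparable to the 0.5 threshold) and all shapes are assumptions made for illustration.

```python
import numpy as np

def sparse_align(x_q, x_s, v_s, threshold=0.5):
    """Build the semantic relation matrix R_n, zero out weakly related points
    with a 0/1 mask m, and align the support values V_s to the query layout."""
    d = x_q.shape[1]
    R = x_q @ x_s.T / np.sqrt(d)                            # semantic relation matrix
    R = np.exp(R) / np.exp(R).sum(axis=1, keepdims=True)    # row softmax (assumed)
    m = (R > threshold).astype(float)                       # mask m_i = 1 where relation is strong
    a = m * R                                               # sparse attention a_n = m * R_n
    return a @ v_s                                          # task-specific prototype t

rng = np.random.default_rng(2)
x_q = rng.normal(size=(4, 8))    # 4 query spatial points, 8-dim projected features
x_s = rng.normal(size=(5, 8))    # 5 support spatial points (keys)
v_s = rng.normal(size=(5, 8))    # support values V_s
t = sparse_align(x_q, x_s, v_s)
```

Raising the threshold prunes more point pairs, which is exactly how the module trades computation for sparsity.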
also for query setsManufacturing a projection head V:
Figure BDA0003929975730000076
and (3) performing characteristic dimension transformation, transforming the characteristic dimension into the same size as the prototype vector t, and performing measurement calculation:
Figure BDA0003929975730000077
where H 'and W' are the height and width of the original image, respectively, and W p Expressed as a query set feature
Figure BDA00039299757300000711
Passing through the projection head V:
Figure BDA0003929975730000078
and (4) obtaining the result through transformation. If the distances are close, the same category is obtained, otherwise, the same category is not obtained.
Beneficial effects: the invention uses the cross-transformed sparse spatial perception alignment network as the meta-learning framework model, combined with a classification method that fuses the similarity contrastive learning enhancement network with meta-learning. This improves the feasibility of few-shot classification, reduces the number of training samples and training iterations, shortens the training cycle, and greatly reduces the parameter computation. A neural network that applies augmentation transforms to the dataset learns information outside the label classes: a small dataset is trained through the network, the dataset undergoes image augmentation, and the augmented images are used for feature extraction, so both fine-grained and coarse-grained class information is obtained during learning. Unlabeled class information can thus be learned in the few-shot setting, solving the problem of retaining good accuracy when the test data differ from the training classes. After learning out-of-class information, semantic alignment is performed through the sparse spatially aware network. Together the two network models enrich the feature information available in few-shot learning and improve image classification accuracy with few samples. When data are hard to obtain in quantity, compared with existing classification methods, the proposed method effectively solves the few-shot classification problem and has a degree of adaptability.
Drawings
FIG. 1 is a flowchart of a chip surface defect detection method based on sparse spatial awareness and meta-learning.
FIG. 2 is a schematic diagram of the sparse spatial perceptual classification model of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described in detail below with reference to the drawings and the detailed description.
As shown in FIG. 1, in the chip surface defect detection method based on sparse spatial perception and meta-learning, data collection and image preprocessing are performed first. Next, the similarity contrast learning enhancement network algorithm is selected for image enhancement, which improves the feature extraction capability and accelerates model convergence. During meta-training, the data are input into the trained similarity contrast learning enhancement network; the obtained features are passed through a transfer learning module into the cross-transformed sparse alignment network and transformed into three-dimensional spatial features, which carry richer feature information. When facing an unknown chip data set that may or may not contain defects, the model can thus learn more feature information from the data set, improving its classification accuracy. For training and testing, the N-way K-shot task classification method is adopted to train and test the network model, and finally the presence or absence of surface defects in a small chip data set is accurately classified.
The specific process comprises the following steps:
Step 1: collection of the experimental data set
Step 2: data partitioning
Step 3: data enhancement
Step 4: spatially sparse semantic alignment network
Step 5: meta-learning
Step 6: metric learning
The invention produces a TFRecord-format data set that TensorFlow can read, and labels the pictures in the chip data set with the LabelImg tool to generate XML files. A data set of common defect categories with labeling information is collected. The common data set is divided by category: one part serves as data for transfer-learning training, and the other part is divided into a training set and a test set. Local defect images are cropped from the original chip images according to the calibrated defect positions and split into a chip meta-training set and a chip meta-test set. All collected images undergo image preprocessing: they are scaled to the same size, and image enhancement is applied by randomly inverting the images and flipping them horizontally.
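As an illustrative sketch only (the function names and the nearest-neighbor resampling are assumptions, not from the patent), the preprocessing above, scaling all images to one size and randomly inverting or mirroring them, can be written as:

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Scale an H x W image to a common size with nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

def random_flip(img, rng):
    """Randomly invert (vertical flip) and horizontally mirror the image."""
    if rng.random() < 0.5:
        img = img[::-1]        # vertical inversion
    if rng.random() < 0.5:
        img = img[:, ::-1]     # horizontal mirror
    return img

rng = np.random.default_rng(0)
img = np.arange(24).reshape(4, 6)
out = random_flip(resize_nearest(img, 8, 8), rng)
print(out.shape)  # (8, 8)
```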
Division of the data set: the meta-training set contains many tasks; each task consists of a support set and a query set, each task differs from the others, and all are randomly sampled from the chip data set. The meta-test set likewise consists of many different tasks, and its tasks are disjoint from those in the meta-training set. Then n tasks are randomly extracted from the processed data set as the meta-training data set, sent in turn to the whole network model for training and parameter updating, and the updated parameters are finally saved.
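The task sampling described above, drawing a support set and a disjoint query set per task, can be sketched as follows (a minimal numpy sketch; the sampler name and the toy label layout are illustrative, not from the patent):

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, q_query, rng):
    """Randomly draw an N-way K-shot task: a support set and a disjoint query set."""
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))
        support.extend(idx[:k_shot])                    # K labeled shots per class
        query.extend(idx[k_shot:k_shot + q_query])      # unlabeled queries per class
    return np.array(support), np.array(query)

labels = np.repeat(np.arange(10), 20)                   # toy set: 10 classes, 20 images each
rng = np.random.default_rng(0)
s, q = sample_episode(labels, n_way=5, k_shot=1, q_query=5, rng=rng)
print(len(s), len(q))  # 5 25
```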
Model selection: this patent constructs the model by combining a cross-transformed spatial alignment network with a sparse space. The similarity contrast learning enhancement technique is an unsupervised learning model; by randomly enhancing the data set, it learns intra-class feature information well and adapts quickly to other new tasks. The sparse spatial perception network model is shown in FIG. 2. The backbone of the similarity contrast learning enhancement network extracts feature maps through the residual network ResNet-34 and compares similarity with the cosine function. Because chip defect regions are generally small, little class-irrelevant information is lost when the similarity contrast learning enhancement network learns intra-class information. The enhancement-transformed features are passed into the sparse spatial perception network for training. In this network model, a weighted summation is performed on the two enhancement-transformed features obtained by transfer learning to produce the class feature map; the two-dimensional features are then transformed into three-dimensional feature dimensions, and the query set is multiplied with the three-dimensional feature vectors of the support set to obtain a series of spatially corresponding points, yielding a semantic relation matrix between each query-set image and all images in each support set, and hence the spatial perception attention values. Because computation in the three-dimensional tensor space can be time-consuming, to lighten the model the first n largest attention values are selected from the attention map between the support set and the query set; these represent the query-image pixels most strongly associated with the support images.
If the semantic distance between a point in the query set and its spatially corresponding point in the support set is small, that is, the attention value between the support-set point and the corresponding query-set point is large, the two points are likely to share similar local features; otherwise, their semantic relationship is weak. Category information irrelevant to the query set can thus be discarded. The obtained semantic relation matrix is then semantically aligned with the support-set images to obtain support-set image features; the resulting semantic alignment feature map is multiplied with the support-set features transformed by the value head V to obtain the alignment network feature map of the query set; the query set is transformed to the same size as this feature map, and the metric operation is performed. If the distances are close, the images belong to the same category; otherwise, they do not.
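The top-n attention selection described above can be sketched as follows (a minimal numpy illustration; the function name and the flattened attention-map layout are assumptions):

```python
import numpy as np

def top_n_sparsify(attn, n):
    """Keep only the n largest attention values (the query pixels most strongly
    associated with support-image pixels); zero out all the rest."""
    flat = attn.ravel()
    keep = np.argsort(flat)[-n:]      # indices of the n largest values
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return (flat * mask).reshape(attn.shape)

rng = np.random.default_rng(1)
attn = rng.random((4, 4))             # toy query x support attention map
sparse = top_n_sparsify(attn, n=3)
print(int((sparse > 0).sum()))        # 3
```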
Meta-learning: in the meta-training stage, each task randomly adopts the N-way K-shot task classification method, where N is the number of categories contained in each task and K is the number of images per category. During meta-training, the data set is divided by the 5-way 1-shot method and fed into the network for training; the query set within the training set is then used for testing during meta-training. Similarly, the 5-way 1-shot classification method is applied to the meta-test set to feed the chip data into the network for testing.
Metric learning learns a distance function for a particular task so that the function separates classes well. It is a common method that makes similar objects close in the embedding space and dissimilar objects far apart.
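A minimal numeric illustration of this idea, using Euclidean distance and cosine similarity on toy embeddings (the vectors are invented for illustration, not taken from the patent):

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embeddings of two images of the same class and one image of a different class.
same_1, same_2 = np.array([1.0, 0.1]), np.array([0.9, 0.2])
other = np.array([-1.0, 0.8])
d_same = euclidean(same_1, same_2)
d_other = euclidean(same_1, other)
print(d_same < d_other)  # True: same-class embeddings lie closer
```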
It should be noted that modifications and adaptations can be made by those skilled in the art without departing from the principles of the present invention and should be considered within the scope of the present invention.

Claims (2)

1. A chip surface defect detection method based on sparse spatial perception and meta-learning, characterized by comprising the following steps: firstly, data are acquired and image preprocessing is performed; secondly, the similarity contrast learning enhancement network algorithm is selected to enhance the pictures, and before the enhancement-transformed image features are input into the cross-transformed sparse spatial alignment network, a transfer learning module is added so that the model can more easily identify fine-grained intra-class feature information, accelerating model convergence; finally, the model is trained and tested with the N-way K-shot task detection method, and chip surface defects are finally detected; the method comprises the following specific steps:
step one, data collection and processing: firstly, a chip training data set is prepared; a chip data set with defects is collected, and according to the model training mode the data set is divided into a training set and a verification set within the meta-test set; multiple mutually disjoint tasks are sampled from the chip data set, each set consisting of multiple tasks, and each task comprising a support set and a query set, wherein the support set carries category labels and the query set does not; when images are trained with the similarity contrast learning enhancement method, the support set and new query sets are used, and each new query set randomly extracts some data from the support set so that the number of categories matches that of the new query-set images, thereby completing the division of the data sets;
step two, pre-training the model: the similarity contrast learning enhancement network performs enhancement transformations on the data for unsupervised learning, while improving the base model and the embedded feature information, so that more of the information the model needs is available when transfer learning is performed; the similarity contrast learning enhancement network is trained to obtain better image embeddings, avoiding the influence of different transformations of images from the same class; during training, applying random data enhancement to the input pictures lets the network learn more image information without depending on picture color or on the position of the target within the picture; therefore, when pre-training the picture embeddings, the images are randomly enhanced so that the learning task becomes harder and the network model later has better generalization capability;
step three, model selection: the chip surface defects are detected with a cross-transformed spatially sparse network; the model is designed specifically for classifying small targets while also reducing the network's parameter computation and training time; because defects on the chip surface are small, the network model transforms the picture features into a three-dimensional feature space through a self-attention head to obtain more feature information, and the attention values are obtained through the self-attention mechanism, where a larger value indicates richer semantic information and a smaller value indicates little semantic information; to reduce the time and information redundancy of traversing all pixel points, a sparse semantic alignment network module is added: computation is carried out only for strongly correlated semantic pairs and skipped when the attention value is small, and finally the metric computation is performed between the obtained semantic alignment feature map and the images in the query set;
step four, transfer learning: in ordinary training, the training parameters are initialized randomly, and obtaining good parameters requires training on a large number of pictures, whereas a small sample provides too few images for the many parameters devoted to feature extraction; to compensate for the small sample size, a transfer learning module is added to the meta-learning process; firstly, the divided training data are put into the similarity contrast enhancement network for training to obtain the network training weights; then, during meta-learning training, the previously trained model weights are loaded for transfer learning, strengthening feature extraction for the pictures in the support sets of the meta-learning test set; this reduces the number of model iterations and accelerates convergence;
step five, meta learning: in the meta-training stage, a task randomly adopts an N-way K-shot task classification method, wherein N is the number of randomly selected categories, and K is the number of corresponding pictures in each selected category;
for the meta training set, a 5-way 1-shot classification method is adopted to put data into a network for training;
and for the meta-test set, a 5-way 1-shot classification method is adopted to put the data into the network for testing.
2. The chip surface defect detection method based on sparse spatial perception and meta-learning according to claim 1, wherein in the fifth step, the specific steps are as follows:
firstly, the support set of the meta-training data is input into the similarity contrast learning enhancement network; two different random enhancement transformations are applied to each input image, the enhanced images are separately passed through the residual network to extract features, two embedding vectors are then obtained with a nonlinear fully connected layer based on a multilayer perceptron, and the similarity between the two enhanced versions of an image is computed with cosine similarity; ideally, the similarity between differently enhanced images of the same class is very high and the similarity between images of different classes is very low; all remaining images in each batch are then treated as dissimilar-class images, the positions of each pair are exchanged, the losses of all pairs are summed, and the average is taken as the loss function; the formula is as follows:
$$L = \frac{1}{2N} \sum_{k=1}^{N} \bigl[\, \ell(2k-1,\, 2k) + \ell(2k,\, 2k-1) \,\bigr]$$
wherein
$$\ell(i, j) = -\log \frac{\exp\!\bigl(\mathrm{sim}(z_i, z_j)/\tau\bigr)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\!\bigl(\mathrm{sim}(z_i, z_k)/\tau\bigr)}$$
$\ell(i, j)$ is the loss between the two enhanced picture features, $z_i$ and $z_j$ are the two enhanced picture features of the original picture, $\mathrm{sim}(\cdot, \cdot)$ is the cosine similarity, and $\tau$ is a temperature parameter;
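The publication shows this loss only as an image; the description (cosine similarity between two augmentations, all other batch images treated as negatives, both orderings of each pair summed and averaged) matches the standard normalized-temperature cross-entropy (NT-Xent) contrastive loss. A numpy sketch under that assumption, with the temperature tau as an assumed hyperparameter:

```python
import numpy as np

def nt_xent(z, tau=0.5):
    """z: (2N, d) embeddings; rows 2k and 2k+1 are the two augmentations of image k."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors: dot = cosine sim
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity (k == i)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n2 = len(z)
    pos = np.arange(n2) ^ 1                            # partner index: 0<->1, 2<->3, ...
    return -log_prob[np.arange(n2), pos].mean()        # average over all 2N ordered pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                           # toy batch: 4 images, 2 views each
loss = nt_xent(z)
print(loss > 0)  # True
```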
after the similarity contrast learning enhancement network model is trained, the trained network only needs to obtain the features of the support set through transfer learning, and transfer learning is likewise applied to the query set; the formula is as follows:
$$p_c = \frac{1}{|s_c|} \sum_{x \in s_c} \phi(x)$$
where $s_c$ denotes the $c$-th class within the support set, $|s_c|$ represents the number of pictures in class $c$, $x$ represents an original picture, and $\phi(x)$ represents the feature vector obtained through transfer learning;
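The class representation described in the surrounding text, the mean of the transfer-learned feature vectors phi(x) over the |s_c| pictures of one class, can be sketched as:

```python
import numpy as np

def class_prototype(features):
    """Mean of the feature vectors phi(x) over the |s_c| support images of one class."""
    return features.mean(axis=0)

phi = np.array([[1.0, 2.0],   # phi(x) for each picture in class c (toy 2-D features)
                [3.0, 4.0]])
p_c = class_prototype(phi)
print(p_c)  # [2. 3.]
```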
then, the obtained support set image and the query set image are converted into a three-dimensional tensor characteristic form from a two-dimensional form, and in an N-way K-shot task, two independent linear projections are used as support set characteristics
Figure FDA0003929975720000032
Generation of bond K s Sum value V s Projection head
Figure FDA0003929975720000033
Sum value projection head
Figure FDA0003929975720000034
Carrying out characteristic dimension transformation; similarly, a linear projection is used as a query set feature
Figure FDA0003929975720000035
Generating a characteristic Q q Projection head
Figure FDA0003929975720000036
Carrying out characteristic dimension transformation; after the feature spaces of the support set and the query set are respectively obtained, point multiplication is carried out on the feature spaces between the corresponding points of the respective dimensions to obtain a series of semantic relation matrixes between the query images and the support classes;
if the semantic distance between the corresponding points in the space of the query set and the support set is close, namely the attention value between the corresponding points in the space of the support set and the space of the query set is larger, the corresponding points are likely to have similar local features, otherwise, the semantic relationship between the corresponding points is relatively weaker; firstly, calculating a semantic relation matrix between a query image and a spatial corresponding point on each support class to obtain R n
Figure FDA0003929975720000037
R n Each row in the query image represents semantic similarity of each point in the query image to all points of all images in the support set; a sparse spatial cross attention algorithm is applied and used for finding point features related to tasks in a query image;
after all attention points related to the task are collected, a mask m = [ m ] is applied 1 ;…;m k ]Obtaining the feature with large attention point, deleting the feature when the attention point is small, setting a threshold in advance, and if the value in the semantic relation matrix is larger than the threshold, m is i Equal to 1, otherwise 0, where the threshold is set to 0.5; using a mask m and a semantic relation matrix R n Multiplying to obtain a sparse attention image a n And uses it to match the key value V of each support set s Performing semantic alignment to obtain a spatial location corresponding to the query image set, resulting in a task-specific prototype vector t, calculated as:
a n =m*R n
Figure FDA0003929975720000038
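A numpy sketch of this masking and alignment step (the max-normalization of the relation matrix and the row normalization of the sparse attention are assumptions added so that the 0.5 threshold and the weighted sum behave sensibly; the dimensions are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
P, S, d = 4, 6, 8                      # query positions, support positions, feature dim
Q_q = rng.random((P, d))               # query features after the query projection head
K_s = rng.random((S, d))               # support keys
V_s = rng.random((S, d))               # support values

R_n = Q_q @ K_s.T                      # semantic relation matrix (query x support points)
R_n = R_n / R_n.max()                  # assumed normalization so the 0.5 threshold applies
m = (R_n > 0.5).astype(float)          # mask: m_i = 1 above the threshold, else 0
a_n = m * R_n                          # sparse attention map
a_n = a_n / np.maximum(a_n.sum(axis=1, keepdims=True), 1e-9)  # assumed row normalization
t = a_n @ V_s                          # task-specific prototype per query position
print(t.shape)                         # (4, 8)
```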
also need to make projection head for inquiry set
Figure FDA0003929975720000039
And (3) performing characteristic dimension transformation, transforming the characteristic dimension into the same size as the prototype vector t, and performing measurement calculation:
Figure FDA0003929975720000041
wherein H 'and W' are the height and width of the original image, respectively, and W p Expressed as a query set feature
Figure FDA0003929975720000042
Passes through the projection head
Figure FDA0003929975720000043
Obtained through conversion; if the distances are close, the same category is obtained, otherwise, the same category is not obtained.
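Since the metric formula appears in the publication only as an image, the following is a hedged sketch consistent with the surrounding text (H' and W' as the spatial size, W_p the transformed query feature compared with the prototype t), using an average squared Euclidean distance over all spatial positions:

```python
import numpy as np

def spatial_distance(W_p, t):
    """Average squared Euclidean distance between the query feature W_p and the
    prototype t over all H' x W' spatial positions (both arrays: (H'*W', d))."""
    return float(np.mean(np.sum((W_p - t) ** 2, axis=1)))

rng = np.random.default_rng(0)
Hp, Wp_, d = 3, 3, 8
W_p = rng.random((Hp * Wp_, d))
t_same = W_p + 0.01 * rng.random((Hp * Wp_, d))   # prototype close to the query
t_other = rng.random((Hp * Wp_, d))               # prototype of an unrelated class
print(spatial_distance(W_p, t_same) < spatial_distance(W_p, t_other))  # True
```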
CN202211386361.5A 2022-11-07 2022-11-07 Chip surface defect detection method based on sparse space perception and meta-learning Withdrawn CN115527072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211386361.5A CN115527072A (en) 2022-11-07 2022-11-07 Chip surface defect detection method based on sparse space perception and meta-learning


Publications (1)

Publication Number Publication Date
CN115527072A true CN115527072A (en) 2022-12-27

Family

ID=84705207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211386361.5A Withdrawn CN115527072A (en) 2022-11-07 2022-11-07 Chip surface defect detection method based on sparse space perception and meta-learning

Country Status (1)

Country Link
CN (1) CN115527072A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116309567A (en) * 2023-05-17 2023-06-23 西南石油大学 Shale electron microscope pore intelligent recognition method for small sample
CN116824271A (en) * 2023-08-02 2023-09-29 上海互觉科技有限公司 SMT chip defect detection system and method based on tri-modal vector space alignment
CN117474928A (en) * 2023-12-28 2024-01-30 东北大学 Ceramic package substrate surface defect detection method based on meta-learning model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221227