CN117314900B

CN117314900B - Semi-self-supervision feature matching defect detection method

Info

Publication number: CN117314900B
Application number: CN202311596620.1A
Authority: CN
Inventors: 苏茂才; 林仁辉; 廖峪; 李林宽
Original assignee: Nobicam Artificial Intelligence Technology Chengdu Co ltd
Current assignee: Nobicam Artificial Intelligence Technology Chengdu Co ltd
Priority date: 2023-11-28
Filing date: 2023-11-28
Publication date: 2024-03-01
Anticipated expiration: 2043-11-28
Also published as: CN117314900A

Abstract

The invention discloses a semi-self-supervision feature matching defect detection method, which relates to the technical field of neural networks, and comprises the following steps: step 1: acquiring an image sample set from a defect detection dataset; step 2: performing sparse coding-based image denoising processing on each image sample in the image sample set; step 3: extracting a set of image samples using a feature extractor

Description

Semi-self-supervision feature matching defect detection method

Technical Field

The invention relates to the technical field of neural networks, in particular to a semi-self-supervision feature matching defect detection method.

Background

With the continuous progress of science and technology, the fields of image processing and computer vision have made a great breakthrough. In various applications, automatic detection of defects or anomalies is a critical task. For example, in the manufacturing industry, automatically detecting defects in a product can improve production quality and efficiency; in the medical field, automatically detecting abnormalities in medical images may assist a physician in making a diagnosis. Therefore, the automatic defect detection technology is one of the hot spots of research.

However, automatic defect detection faces many challenges, one of which is how to accurately detect defects in images. Conventional approaches typically require manual design of feature extractors and rely on manually constructed rules and heuristics, which limit their performance and applicability. In addition, the conventional method has limited robustness to different types of defects and image noise, and is often difficult to cope with complex real scenes. Therefore, it is highly desirable to find a more efficient, accurate and robust method of automatic defect detection.

In the field of image processing and computer vision, many automatic defect detection methods already exist. The following are some common methods:

The conventional image processing method comprises the following steps: conventional methods include edge detection, texture analysis, shape analysis, and the like. These methods typically require manual selection and adjustment of feature extractors and thresholds and are therefore poorly adaptable to different types of defects and image variations. In addition, they are sensitive to noise and are prone to false positives and false negatives.

The machine learning method comprises the following steps: machine learning methods such as Support Vector Machines (SVMs), random forests, and deep learning have been applied in defect detection. These methods rely on a large number of labeled training data and attempt to learn features and patterns from the data. However, the large amount of marking data required is often expensive and time consuming, and these methods may suffer from label inaccuracy.

The self-supervision learning method comprises the following steps: self-supervised learning is an emerging approach that attempts to address the problem of tag data starvation. It trains the model by automatically generating labels or tasks, such as rotation or color transformation of the image. The generated label may then be used for defect detection. Despite some advances made in self-supervised learning approaches, they still face challenges in extracting meaningful features and solving image matching problems.

Disclosure of Invention

The invention aims to provide a defect detection method based on semi-self-supervision feature matching, which improves the accuracy, robustness and applicability of defect detection and reduces noise influence, thereby having wide application prospects in the fields of industry, medicine, safety monitoring and the like.

In order to solve the technical problems, the invention provides a semi-self-supervision feature matching defect detection method, which comprises the following steps:

a method for detecting a defect based on semi-self-supervised feature matching, the method comprising:

step 1: acquiring a set of image samples from a defect detection datasetWherein->Representing the number of samples; every image sample->Comprises->Individual pixels, denoted->The method comprises the steps of carrying out a first treatment on the surface of the For each pixel->A representation; at the same time, a binary label set corresponding to the image sample set is definedWherein->，/>Representing pixel +.>Whether or not a defect is contained;

step 2: for image sample setsIs +.>Performing image denoising processing based on sparse coding;

step 3: extracting a set of image samples using a feature extractorIs +.>Obtaining a feature map set;

step 4: constructing a feature matching task of self-supervision learning, and defining a matching graph set Wherein each match is->Comprises->A number of matching points, expressed as，/>Representing pixel +.>Matching degree of (3);

step 5: constructing a memory network to encode the matching graph, and defining the hidden state of the memory network encoder asWherein->Indicate->Image samples->Representing pixel position +.>Representing a time step; the input of the memory network encoder is the matching degree of the current pixel +.>And the hidden state of the previous time step +.>；

Step 6: computing feature matching scores for each pixel locationFor measuring each image sample +.>Is +.>Consistency between; use of feature matching score->Estimating probability of defect for each pixel location；

Step 7: minimizing the overall loss function with random gradient descent，/>For each sample->To train the memory network encoder and the feature extractor;

step 8: after training is completed, performing defect detection on the new image by using the trained model; for each pixel location in the new imageCalculate->If->If a set threshold is exceeded, the pixel is marked as defective.

Further, step 2 specifically includes:

step 2.1: each image sample is processedConsidered as a matrix, the height of the matrix is +. >Width is->The method comprises the steps of carrying out a first treatment on the surface of the Each element in the matrix is a pixel +.>Corresponding pixel value +.>The method comprises the steps of carrying out a first treatment on the surface of the Learning a sparse dictionary->Wherein->Is the number of atoms in the dictionary; wherein (1)>Representing a dimension of a matrix, here representing a sparse dictionary +.>In>In (I)>Representing the real number field, ++>Indicating height, ->Representing the width;

step 2.2: for each pixel valueSolving sparse coding by using a sparse coding method;

step 2.3: estimating each pixel value using local statisticsNoise level +.>The method comprises the steps of carrying out a first treatment on the surface of the For each pixel value +.>Based on the estimated noise level ∈ ->Noise correction is performed.

Further, a sparse dictionary is learnedWhen learning is aimed at minimizing the following optimization problems:

；

wherein the method comprises the steps ofIs a sparse coding matrix, < >>Is a regularization parameter, +.>Representing the 2 nd order Frobenius norm; />Representing the Manhattan norm of order 1, the Manhattan norm also being referred to as the absolute value norm or L1 norm; />Represents the dimension of a matrix, here +.>In>In (I)>Representing the real number field, ++>Indicating height, ->Representing the width.

Further, for each pixel value in step 2.2When solving sparse coding by using a sparse coding method, for each sparse coding in a sparse coding matrix +. >The following formula is used to solve:

；

wherein,is a sparse coding parameter, < >>The larger the value, the higher the sparsity.

Further, for each pixelPixel value +.>The estimated noise level based +.>And (3) performing noise correction:

；

wherein the method comprises the steps ofIs a positive value set to prevent zero denominator ++>For the pixel value of each pixelA result after noise correction is carried out; />The representation is->Or->Is a larger value of (a).

Further, each image sample is processedInputting into a preset convolutional neural network for feature extraction to obtain each image sample +.>Feature map set->Wherein->Representing the number of characteristic channels>Representing a first characteristic map,/->Representing a second feature map and so on up to +.>The method comprises the steps of carrying out a first treatment on the surface of the Constructing a memory network to encode the matching graph, and defining the hidden state of the memory network encoder as +.>Wherein->Indicate->Image samples->Representing pixel position +.>Representing a time step; the input of the memory network encoder is the matching degree of the current pixel +.>And hidden state of previous time stepThe method comprises the steps of carrying out a first treatment on the surface of the Initializing a network-based hidden state +.>And memory cell->Is zero vector; for each time step +.>Calculate- >、/>、/>、/>、/>And->：

；

Wherein,is in a hidden state; />The memory unit is used for memorizing the internal state of the network and storing the previous information; />Matching degree calculated in the feature matching task of self-supervision learning; />For the input unit, the control is made at time step +.>How much new information is written into the memory unit; />Is a Sigmoid function; />Is a forgetting unit; />Is an output unit; />As a new memory cell, this is a candidate new memory cell, which is indicated at time step +.>New information that should be updated into the memory cell; it passes the hyperbolic tangent function->Scaling to produce a candidate memory having a value between-1 and 1; />A weight matrix for the input unit; />A weight matrix for forgetting units; />A weight matrix for the output unit;a weight matrix for the new memory cell; />A bias term for the input unit; />Controlling the opening and closing of the forgetting unit as a deviation item of the forgetting unit; />Controlling the opening and closing of the output unit as a deviation term of the output unit; />The bias term for the new memory cell affects the calculation of the new memory cell.

Further, a feature matching score for each pixel location is calculated using the following formula ：

；

Wherein,represented in the image sample +.>The%>The individual characteristic channels are at pixel locations->Is a value of (2).

Further, the defect probability for each pixel location is estimated using the following formula：

。

Further, the feature extractor is a neural network.

Further, the step 7 specifically includes: initializing parameters of a memory network encoder and a feature extractor; for each image sampleAnd corresponding label->The following training steps were performed: use of feature extractor for image +.>Forward propagation is carried out to obtain a feature map set +.>The method comprises the steps of carrying out a first treatment on the surface of the Matching graph using memory network encoder +.>Forward propagation is carried out, and a coded matching diagram is obtained; according to feature matching score->And tag->Calculating a loss function->The method comprises the steps of carrying out a first treatment on the surface of the Calculating gradients using a back propagation algorithm, updating parameters of the memory network encoder and feature extractor to reduce the loss function +.>The method comprises the steps of carrying out a first treatment on the surface of the Repeating the steps until reaching the preset stopping condition.

The defect detection method based on semi-self-supervision feature matching has the following beneficial effects: the invention improves the accuracy of defect detection. Conventional defect detection methods are often limited by manual feature design and threshold selection, and are therefore prone to false positives and false negatives when dealing with complex image scenes and different types of defects. In contrast, the invention adopts a self-supervision learning method to automatically learn the image characteristics, thereby reducing the dependence on the manual design characteristics. In addition, by introducing a feature matching task, the method can better capture the features of the defect area, and improve the sensitivity of defect detection. The method based on feature matching can more accurately locate and identify the defects, reduces the possibility of misjudgment, and has higher accuracy in practical application. The invention improves the accuracy and the robustness of defect detection through sparse coding denoising. In an actual image, defects may be masked or blurred due to factors such as illumination variation and noise, and the conventional method is difficult to accurately identify. Sparse coding denoising techniques have been widely studied and applied in the field of image processing, which recover an original image by learning a sparse representation of the image, thereby effectively removing noise and redundant information. In the invention, the step of denoising by using sparse coding is helpful for extracting key features in the image, and the image noise is reduced to the minimum, thereby increasing the accuracy of defect detection. The invention constructs the memory network encoder, and enhances the robustness of the method. The memory network encoder functions like a human memory system that is capable of storing and retrieving previous information. In defect detection, this means that the system can remember previous matches, thereby better accommodating complex image scenarios and different defect types. The memory network encoder can capture the matching relation between the features by updating the hidden state and the memory unit, so that the robustness is further improved.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1: a method for detecting a defect based on semi-self-supervised feature matching, the method comprising:

this step first obtains a set of image samples from a defect detection dataset. These image samples typically contain an image of the workpiece or product to be inspected. The quality and diversity of the data set is critical to the performance of the model. At the same time, a set of binary labels corresponding to the set of image samples is defined +. >. These labels are used to indicate whether each image pixel contains a defect, i.e. +.>. Marking is performed by professionals who will scrutinize each image to determine the location of the defect. Combining the image samples in the dataset with the corresponding tags, a set of image samples is constructed +.>And tag set->These samples will be used forTraining and evaluating a semi-self-supervision feature matching defect detection model.

This step provides label information about whether each image pixel is defective. These tags are necessary for supervised learning on the image data and they will be used to train the model to identify defects. Construct a set of image samplesAnd tag set->These data will be input to a training semi-self-supervised feature matching defect detection model. In deep learning, well-prepared training data is critical to training an accurate model. The label information defines the target of the task for the model, i.e. identifying defects in the image. The model will attempt to learn features from the input image that match the label to achieve accurate defect detection.

Sparse coding is a signal processing technique whose principle is to represent the observed signal (in this case an image) as a linear combination of a set of basis functions, where most coefficients are zero and only a few coefficients are non-zero. This means that most of the information in the signal can be represented with a small number of non-zero coefficients. In image processing, noise refers to random or useless pixel values that may interfere with image analysis and feature extraction. By applying sparse coding techniques, noise can be reduced or eliminated to improve image quality.

Various types of noise may be included in the image samples, such as sensor noise, noise due to illumination variations, and the like. The main function of step 2 is to reduce or eliminate these noise so that the subsequent image processing steps are not disturbed by the noise. The denoising process helps to improve the quality of the image, so that the image is clearer and easier to analyze. This is critical to the defect detection task because defects can be very small or ambiguous, and are more difficult to identify if the image quality is poor. In step 3, a feature extractor will be used to extract features of the image. The denoising process helps to improve the performance of feature extraction because the feature extractor more easily recognizes and extracts sharp features without interference from noise. The denoising process may improve the performance of the entire defect detection system. By reducing noise, the model will learn more easily about the defect-related features and perform better in the defect detection task.

feature extraction is a key step in image processing, and its principle is to extract representative features from original image data by mathematical methods and algorithms. These features are typically represented in digital form and capture information of shape, texture, color, etc. in the image. In the field of deep learning, a commonly used feature extractor is a Convolutional Neural Network (CNN). CNN is a neural network with multiple convolutional layers and pooling layers that is capable of automatically learning features in an image. Through convolution operations, the CNN can capture local patterns and structures in the image.

The main function of step 3 is to extract key information about the defect from the input image sample. Such information may include shape, size, texture, etc. characteristics of the defect to facilitate subsequent defect detection tasks. Feature extraction may also convert high-dimensional image data into a low-dimensional feature representation. This helps to reduce computational complexity and improves training efficiency of the model. The feature extractor can automatically learn abstract feature representations in the image through convolution and pooling operations. These representations help the model better understand the image, thereby improving the performance of defect detection.

Step 4: constructing a feature matching task of self-supervision learning, and defining a matching graph setWherein each match is->Comprises->A number of matching points, expressed as，/>Representing pixel +.>Matching degree of (3);

self-supervised learning is an unsupervised learning method in which models are learned by automatically generating labels or tasks. In this case, self-supervised learning is used to build feature matching tasks related to defect detection, rather than relying on external labels. Each image sampleWill construct a matching diagram +.>. Every point in the matching diagram +.>Representing pixel position +.>Is a value between 0 and 1 reflecting the degree of similarity of the pixel to other pixels.

The main function of the step 4 is to construct a feature matching task of self-supervision learning. The goal of this task is to establish associations between different pixels in an image through a matching graph to facilitate subsequent feature matching analysis. This task is innovative in defect detection because it does not rely on external labels. Unlike conventional supervised learning, self-supervised learning does not require external tags. The matching values in the matching graph serve as automatically generated labels for the task, and the model learns the features by maximizing the consistency of these labels. Building a matching graph helps the model learn structural information in the image. The model will try to understand which pixels are interrelated in the image, which helps to improve the accuracy of defect detection, especially for complex defect shapes. The construction of the matching graph and the feature matching task help improve the quality of the feature representation. The model may adjust the feature representation by matching the degree of matching in the graph to better reflect the relevant information in the image.

The memory network encoder is a neural network structure,it is used to encode input information into a hidden state to capture relationships and dependencies between inputs. In this case, the input to the memory network encoder is the matching degree information in the matching graph. Memorizing hidden states of network encoderIs a vector, wherein>Indicate->Image samples->Representing pixel position +.>Representing a time step. Updating of the hidden state depends on the degree of matching of the current pixel +.>And the hidden state of the previous time step +.>。

The main function of step 5 is to encode the matching graph. The matching degree information in the matching graph is converted to a hidden state by the memory network encoder so that subsequent steps can use these encodings to calculate feature matching scores. The memory network encoder captures the relationships and dependencies between pixels in the match graph through hidden states. This helps the model to better understand the information in the matching graph, further improving the accuracy of feature matching. By encoding the matching degree information into a hidden state, the memory network encoder helps model the correlation between different pixels in the image. This is very important for the defect detection task, since defects typically have a certain spatial correlation. The hidden state of the encoded matching graph may be used to calculate feature matching scores that are used to measure the consistency between features. By encoding the matching graph, the matching degree between the features can be evaluated more accurately, so that the feature matching performance is improved.

Feature matching scoreFor measuring each image sample +.>Features of (a) and (b) the corresponding matching diagram>Is a relationship between the degree of matching in the database. A higher match score indicates that features in the image more consistently match the match graph, while a lower match score indicates inconsistency.

The main function of step 6 is to measure the consistency between features in the image and the matching graph by calculating a feature matching score. This helps determine which parts of the features match the match graph better, potentially representing normal parts, and which parts of the features do not match, potentially representing defects. Feature matching scoreCan be used to estimate each pixel in generalPosition ofDefect probability of->. A higher matching score may mean that the pixel location is less likely to contain defects, while a lower matching score may represent a higher probability of defects. The feature matching score is used to determine which pixel locations in the image may contain defects. By setting the appropriate threshold, the matching score can be compared to the threshold, marking the pixel location as defective or non-defective. Step 6 helps to improve the accuracy of defect detection because it takes into account the relationship between the matching map and the image features by means of feature matching. This may make the system more sensitive to detecting defects and reduce false positives.

specifically, random gradient descent (SGD): SGD is an optimization algorithm used to train a deep learning model. The principle is to update the parameters of the model by means of a back-propagation algorithm to minimize the loss function. At each iteration, the SGD randomly selects a small set of training samples to estimate the gradient and update the parameters. Overall loss functionIs the sum of the losses of all samples, usually expressed as +.>Wherein->Is per sample->Is a loss value of (2).

The main role of step 7 is to train the model by using the loss function of the semi-self-supervised feature matching task. In each training iteration, the model continuously adjusts its parameters by minimizing the loss function so that it can better adapt to the task. The SGD optimization algorithm gradually adjusts the weights and deviations of the model through repeated iterations to converge the loss function to a minimum. This helps the model learn better about feature representations related to feature matching and defect detection tasks. During training, the model learns the relationship between the image features and the matching graph by observing the training data. This helps to improve the generalization ability of the model, enabling it to detect defects on new images that have not been seen. In the training process, the super parameters (such as learning rate and the like) of the model can be adjusted so as to further improve the training efficiency and performance.

In step 7, the model has been trained to learn the relationship between the image features and the matching graph and how to detect defects. Step 8 is a process of applying the trained model to the new image. In the defect detection task, it is often necessary to set a threshold value to determine the presence or absence of a defect. The threshold is a parameter that determines whether pixel locations with feature matching scores above or below the threshold are marked as defective.

The main function of step 8 is to detect defects in the new image using the trained model. For each pixel location in the new imageModel calculation feature matching score ++>(calculated in step 6) and then compared with a preset threshold. If the feature matches the score->Above a set threshold value, the pixel position +.>Will be marked as non-defective. If below the threshold, the pixel location may be marked as defective. This step can be used to locate and mark defective areas in the image. Step 8 allows the application of the trained model in real time for defect detection. This is very important for real-time applications such as automated production lines, as it can help to find and handle defects in time. Based on the output of the model, decisions related to defect detection, such as triggering alarms, downtime, recording defect locations, etc., can be made automatically.

Example 2: based on the above embodiment, step 2 specifically includes:

step 2.1: each image sample is processedConsidered as a matrix, the height of the matrix is +.>Width is->The method comprises the steps of carrying out a first treatment on the surface of the Each element in the matrix is a pixel +.>Corresponding pixel value +.>The method comprises the steps of carrying out a first treatment on the surface of the Learning a sparse dictionary->Wherein->Is the number of atoms in the dictionary; wherein (1)>Representing a dimension of a matrix, here representing a sparse dictionary +.>In>In (I)>Representing the real number field, ++>Indicating height, ->Representing the width;

Specifically, sparse dictionary learning is an unsupervised learning method aimed at learning a set of atoms (or dictionary terms) from image data so as to represent an input image sample. Each atom is a small image block of fixed shape, and the entire image can be represented by a linear combination. The purpose of learning a sparse dictionary is to represent the image samples as a linear combination of a set of atoms, which helps reduce noise while preserving information about the image structure. The selection and learning of dictionary items is performed based on input data, and thus has adaptivity.

Sparse coding is the process of representing image samples as linear combinations of dictionary terms. Pixel value for each pixelThe sparse coding method finds the weights that best fit the dictionary terms that represent it, so that this representation has sparsity, i.e., most of the weights are close to zero. Sparse coding converts an image sample from an original pixel space into a sparse representation of dictionary terms by linearly combining each pixel. This helps to remove noise in the image and extract more informative features.

In this step, each pixel is estimated by local statisticsNoise level of pixel values of (2)>. The noise level represents the noise level of each pixel. Then, using the estimated noise level +.>Noise correction is performed for each pixel, typically by using an image denoising filter. The purpose of noise estimation and correction is to reduce noise in the image and improve the quality of the image. By estimating and correcting the noise level, it is possible toThe details and features in the image are better preserved, thereby facilitating feature extraction and defect detection in subsequent steps.

In summary, the three sub-steps of step 2 are commonly used for image denoising and feature enhancement. First, a sparse dictionary is learned for representing an image, which is then represented as a linear combination of dictionary terms using a sparse coding method. Finally, by estimating the noise level and performing noise correction, the image quality is improved, providing better input for subsequent feature extraction and defect detection. These steps help reduce noise interference with defect detection and improve detection performance.

Example 3: based on the above embodiment, the sparse dictionary is learnedWhen learning is aimed at minimizing the following optimization problems:

；

Specifically, the data item is fitted: first itemWeigh through dictionary item->Fitting the input image samples by linear combination of +.>To a degree of (3). The goal is to minimize the fitting error to ensure that the dictionary representation is able to reconstruct the original image.

Sparsity regularization term: second itemIs a sparsity regularization term which encourages sparse coding matrix +.>Has as many zero elements as possible, which can be achieved by +.>Norms. Sparsity helps to ensure that each pixel is associated with only a few dictionary atoms, thereby improving the compactness of the representation.

Regularization parameters: regularization parameter->The weight of the sparsity term is controlled. By adjusting->Can balance the trade-off between data fit and sparsity. Greater- >The values will push the coding matrix more sparse, while smaller +.>The values will then emphasize the data fit even more.

Dictionary learning: by optimizing the problem, the optimal dictionary can be learnedAnd sparse coding matrix->. Dictionary->The columns are learned as the atoms most suitable for representing the data, and +.>The value of (2) represents the relationship of each pixel to the dictionary atom. The dictionary learning method is adaptive and can learn the best representation according to the input data.

Example 4: on the basis of the above embodiment, for each pixel value in step 2.2When solving sparse coding by using a sparse coding method, for each sparse coding in a sparse coding matrix +.>The following formula is used to solve:

；

In particular, the method comprises the steps of,is the pixel value to be encoded. />Is a dictionary matrix that has been learned to represent image features. />Is sparsely encoded, representing pixel values +.>For dictionary->Is a weight of a linear combination of (a) and (b). />Is->Loss terms are used to measure the reconstruction error of the code. It means that the pixel value is to be minimized +.>Sparse coding of->Dictionary->Euclidean distance between them. />Is->Regularization term for promoting sparsity. Regularization parameter- >Controls the degree of sparsity, greater +.>The value will promote +>With more zero elements, thereby increasing sparsity.

Fitting the data item: first itemThe reconstruction capability of sparse coding is measured. The objective of the optimization problem is to minimize the code reconstruction error to ensure that sparse coding can reconstruct the original pixel values efficiently.

Sparsity regularization term: second itemIs->Regularization term encouraging coding matrix +.>With as many zero elements as possible, thereby improving sparsity. Sparsity helps select dictionary atoms that are strongly related to image pixels.

Regularization parameters: regularization parameter->The weight of the sparsity term is controlled. Greater->The value will emphasize sparsity even more, resulting in a code with more zerosPlain, thereby improving sparsity.

Example 5: on the basis of the above embodiment, for each pixelPixel value +.>The estimated noise level based +.>And (3) performing noise correction:

；

Specifically, noise estimation: noise levelIs estimated from the local statistics in step 2.3 above. It represents the noise level of each pixel, typically obtained by statistical methods of the local area of the image.

Noise correction: the purpose of noise correction is to reduce noise in an image to improve image quality. Correction of denominator portion of formulaIs a noise correction factor which is based on the estimated noise level +.>Pixel value +.>. Molecular fraction->Is the original pixel value, and +.>The intensity of the correction is controlled. The larger the noise correction factor, the more obvious the correction effect.

Denominator prevents zero division: in denominator ofPart is used to prevent denominator from being zero. If->Is very small, possibly resulting in a denominator close to zero, thus adding +.>As a small positive value, it is ensured that the denominator is always greater than zero.

Example 6: on the basis of the above embodiment, each image sample is sampledInputting into a preset convolutional neural network for feature extraction to obtain each image sample +.>Feature map set->Wherein->Representing the number of characteristic channels>Representing a first characteristic map,/->Representing a second feature map and so on up to +.>The method comprises the steps of carrying out a first treatment on the surface of the Constructing a memory network to encode the matching graph, and defining the hidden state of the memory network encoder as +.>Wherein->Indicate->The number of image samples is one,representing pixel position +.>Representing a time step; the input of the memory network encoder is the matching degree of the current pixel +. >And the hidden state of the previous time step +.>The method comprises the steps of carrying out a first treatment on the surface of the Initializing a network-based hidden state +.>And memory cell->Is zero vector; for each time step +.>Calculate->、/>、/>、/>、/>And->：

；

Wherein,is in a hidden state; />The memory unit is used for memorizing the internal state of the network and storing the previous information; />Matching degree calculated in the feature matching task of self-supervision learning; />For the input unit, the control is made at time step +.>How much new information is written into the memory unit; />Is a Sigmoid function; />Is a forgetting unit; />Is an output unit; />As a new memory cell, this is a candidate new memory cell, which is indicated at time step +.>New information that should be updated into the memory cell; it passes the hyperbolic tangent function->Scaling to produce a candidate memory having a value between-1 and 1; />A weight matrix for the input unit; />A weight matrix for forgetting units; />A weight matrix for the output unit;a weight matrix for the new memory cell; />A bias term for the input unit; />Controlling the opening and closing of the forgetting unit as a deviation item of the forgetting unit; />Controlling the opening and closing of the output unit as a deviation term of the output unit; / >The bias term for the new memory cell affects the calculation of the new memory cell.

In particular, hidden stateAnd memory cell->: these formulae describe the time step +.>In each pixel position +.>Hidden state +.>And memory cell->Is updated by the update process of (a). The hidden state is used for storing the previous information, and the memory unit is used for storing the previous memory. The effect of these states is to capture timing information in the match graph. Hidden state->Stores the current time step->Information in (a) and memory cell->History information is stored. This helps the memory network encoder understand the evolution process of the matching graph.

Input unitAnd forgetting unit->: input unit->And forgetting unit->Controlling the input of new information and the forgetting of old information. Their calculation is based on a matching graph->And the hidden state of the previous time step +.>. Input unitControlled at time step->How much new information is written into the memory cell->. Forgetting unit->Which old information should be forgotten. This enables the memory network to manage the updating and retention of information to accommodate different data patterns in the timing tasks.

Output unitAnd a new memory cell->: output unit- >Controlled at time step->The memory cell is to be output +.>Which information is to be used in the process. New memory cell->Provides candidate new information by hyperbolic tangent function +.>Scaling is performed. Output unit->Which information should be passed on to subsequent processing steps. New memory cell->Provides candidate new information for updating memory cell +.>. These two elements together affect the information management and output of the memory network.

Weight matrix and bias term: the weight matrix in these formulasAnd deviation term->Are model parameters that are used to adjust the input, forget, output and computation of new memory cells. The weight matrix and the bias term determine the calculation process for each cell. By training these parameters, the model can adapt to different tasks and data to improve the effectiveness of coding and information management.

Firstly, calculating the matching degree through a semi-self-supervision feature matching taskFor representing pixel positionIs a degree of matching of (a). Every image sample->Is input into a preset convolutional neural network for feature extraction to obtain a feature map set +.>Including a plurality of characteristic channels. In this step, the memory network encoder plays a key role. It uses a time sequence model for matching the degree of matching in the matching graph >Encoded as hidden state->And memory cell->. The memory network encoder manages the updating and retention of information through iterations of time steps to capture timing information in the match graph.

The presence of the memory network encoder helps to capture timing information in the match graph. Through iterative computation of hidden states and memory units, the encoder can understand the evolution process of the matching graph, thereby improving the sensitivity to image feature changes. The method integrates information from different time steps and feature channels through feature extraction and memory network coding. This helps enrich the feature representation so that subsequent defect detection is more accurate. The weight matrix and the deviation term in the memory network encoder are trainable parameters, and can be adjusted according to specific tasks and data, so that the model is more self-adaptive, and the generalization performance is improved. After feature extraction, time sequence information capture and feature integration, the method can calculate feature matching scores and defect probabilities for final defect detection. The presence of the memory network encoder helps to improve the accuracy and stability of defect detection.

Example 7: on the basis of the above embodiment, the feature matching score for each pixel position is calculated using the following formula ：

；

Specifically, a feature matching score is used to measure each pixel locationThe degree of matching between the features at that location and the hidden state in the memory network encoder. If the characteristics of a pixel location have a high degree of concealmentConsistency, then the feature matching score will be higher and vice versa. By calculating the feature matching score, the method can emphasize those pixel locations that are associated with defects in the feature representation. This helps to focus on areas that may contain defects, improving the sensitivity of detection. The summing operation in the formula takes into account the information of all the characteristic channels, not just a single characteristic channel. This helps to integrate the information of the different feature channels, providing a more comprehensive representation of the features, thereby enhancing the robustness of defect detection. The feature matching score is a quantitative measure that can be used to determine the degree of matching for each pixel location, which is useful for subsequent probability of defect estimation. When the feature matching score is high, it means that the pixel location is more likely to contain a defect.

Example 8: on the basis of the above embodiment, the defect probability of each pixel position is estimated using the following formula ：

。

By calculating probability of defectThe method enables an estimation of the probability of defects for each pixel location. This is one of the key outputs of defect detection, which indicates whether each pixel location is likely to contain a defect. The cross entropy loss in the formula measures the difference between the model's predictions and the actual labels. When the predictions of the model are close to the actual tags, the losses are lower, whereas the losses are higher. By minimizing the loss function, the model may learn the tuning parameters to improve the accuracy of defect detection. Defect probability->Is a probability value, and the value is between 0 and 1. This enables the model to provide uncertainty information about each pixel location, i.e. the confidence of the model for the existence of defects. Through meterCalculating loss value->And gradient descent optimization is performed, and the method can train parameters of the model to adapt to different defect detection tasks and data sets. This allows the model to adaptively improve detection performance.

Example 9: on the basis of the above embodiment, the feature extractor is a neural network.

Specifically, the feature extractor is a neural network whose purpose is to sample the input imageConversion to a higher level of the characteristic representation +. >. Neural networks typically include components of convolution layers, pooling layers, activation functions, etc., that are capable of learning abstract features in an image, such as texture, shape, edges, etc. The neural network may learn a higher level representation of features in the image. These feature representations may include semantic information, such as the shape and structure of the object, which helps to better understand the content in the image. End-to-end learning can be performed using a neural network as the feature extractor. This means that the parameters of the neural network can be trained along with the entire defect detection model to maximize detection performance.

Example 10: based on the above embodiment, the step 7 specifically includes: initializing parameters of a memory network encoder and a feature extractor; for each image sampleAnd corresponding label->The following training steps were performed: use of feature extractor for image +.>Forward propagation is carried out to obtain a feature map set +.>The method comprises the steps of carrying out a first treatment on the surface of the Matching graph using memory network encoder +.>Forward propagation is carried out, and a coded matching diagram is obtained; according to feature matching score->And tag->Calculating a loss function->The method comprises the steps of carrying out a first treatment on the surface of the Calculating gradients using a back propagation algorithm, updating parameters of the memory network encoder and feature extractor to reduce the loss function +. >The method comprises the steps of carrying out a first treatment on the surface of the Repeating the steps until reaching the preset stopping condition.

Specifically, first, parameters of the feature extractor and the memory network encoder need to be initialized. This step ensures that the initial parameters of the model are reasonable and ready for subsequent training. For each image sampleForward propagation through feature extractor to obtain feature map set +.>. At the same time, use memory network encoder to match the map +.>And carrying out forward propagation to obtain a coded matching graph. According to feature matching score->And tag->Calculating a loss function->. The loss function is typically a cross entropy loss or other loss function suitable for a two-class task. It measures the difference between the model's predictions and the actual labels. Calculating a loss function using a back propagation algorithm>Gradient with respect to model parameters. This step determines how to adjust the model parameters to reduce the loss function. And updating parameters of the feature extractor and the memory network encoder by using the calculated gradient. The parameter updating operation is typically performed using a gradient descent method or a variation thereof. The above steps are repeated until a preset stopping condition is reached, such as a maximum number of iterations or the loss function converges to a stable value. The iterative training process enables the model to be continuously optimized, and defect detection performance is improved.

The present invention has been described in detail above. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims

1. A method for detecting a defect based on semi-self-supervised feature matching, the method comprising:

step 1: acquiring a set of image samples from a defect detection datasetWherein->Representing the number of samples; every image sample->Comprises->Individual pixels, denoted->The method comprises the steps of carrying out a first treatment on the surface of the For each pixel->A representation; at the same time, a binary label set corresponding to the image sample set is defined +.>Wherein->，/>Representing pixel +.>Whether or not a defect is contained;

step 3: extracting a set of image samples using a feature extractorIs +. >Is characterized byA set of graphs;

step 4: constructing a feature matching task of self-supervision learning, and defining a matching graph setWherein each match is->Comprises->The matching points are denoted->，Representing pixel +.>Matching degree of (3);

Step 6: computing feature matching scores for each pixel locationFor measuring each image sample +.>Is +.>Consistency between; use of feature matching score->To estimate the defect probability +/for each pixel position>；

2. The method for detecting defects based on semi-self-supervised feature matching as recited in claim 1, wherein the step 2 specifically comprises:

step 2.1: each image sample is processedConsidered as a matrix, the height of the matrix is +.>Width is->The method comprises the steps of carrying out a first treatment on the surface of the Each element in the matrix is a pixel +.>Corresponding pixel value +.>The method comprises the steps of carrying out a first treatment on the surface of the Learning a sparse dictionary->Wherein->Is the number of atoms in the dictionary; wherein (1)>Representing a dimension of a matrix, here representing a sparse dictionary +.>In>In (I)>Representing the real number field, ++>Representing height; />Representing the width;

step 2.3: estimating each pixel value using local statisticsNoise level +.>The method comprises the steps of carrying out a first treatment on the surface of the For each pixel valueBased on the estimated noise level ∈ ->Noise correction is performed.

3. The semi-self-supervised feature matching imperfection detection method based on claim 2, wherein a sparse dictionary is learnedWhen learning is aimed at minimizing the following optimization problems:

；

wherein the method comprises the steps ofIs a sparse coding matrix, < >>Is a regularization parameter, +.>Representing the 2 nd order Frobenius norm; />Representing the Manhattan norm of order 1, the Manhattan norm also being referred to as the absolute value norm or L1 norm; / >Represents the dimension of a matrix, here +.>In>In (I)>Representing the real number field, ++>Indicating height, ->Representing the width.

4. A semi-self-supervised feature matching flaw detection method as recited in claim 3, wherein for each pixel value in step 2.2When solving sparse coding by using a sparse coding method, for each sparse coding in a sparse coding matrix +.>The following formula is used to solve:

；

5. The semi-self-supervised feature matching flaw detection method as recited in claim 4, wherein for each pixelPixel value +.>The estimated noise level based +.>And (3) performing noise correction:

；

wherein the method comprises the steps ofIs a positive value set to prevent zero denominator ++>For the pixel value of each pixel +.>A result after noise correction is carried out; />The representation is->Or->Is a larger value of (a).

6. The semi-self-supervised feature matching flaw detection method as recited in claim 1, wherein each image sample is processedInputting into a preset convolutional neural network for feature extraction to obtain each image sample +. >Feature map set->Wherein->Representing the number of characteristic channels>Representing a first characteristic map,/->Representing a second feature map and so on up to +.>The method comprises the steps of carrying out a first treatment on the surface of the Building a memory network pair matchCoding the diagram, defining the hidden state of the memory network coder as +.>Wherein->Indicate->Image samples->Representing pixel position +.>Representing a time step; the input of the memory network encoder is the matching degree of the current pixel +.>And the hidden state of the previous time step +.>The method comprises the steps of carrying out a first treatment on the surface of the Initializing network-based hidden statesAnd memory cell->Is zero vector; for each time step +.>Calculate->、/>、/>、/>、/>And->：

；

Wherein,is in a hidden state; />The memory unit is used for memorizing the internal state of the network and storing the previous information; />Matching degree calculated in the feature matching task of self-supervision learning; />For the input unit, the control is made at time step +.>How much new information is written into the memory unit; />Is a Sigmoid function; />Is a forgetting unit; />Is an output unit; />As a new memory cell, this is a candidate new memory cell, which is indicated at time step +.>New information that should be updated into the memory cell; it passes the hyperbolic tangent function- >Scaling to produce a candidate memory having a value between-1 and 1;a weight matrix for the input unit; />A weight matrix for forgetting units; />A weight matrix for the output unit; />A weight matrix for the new memory cell; />A bias term for the input unit; />Controlling the opening and closing of the forgetting unit as a deviation item of the forgetting unit; />Controlling the opening and closing of the output unit as a deviation term of the output unit; />The bias term for the new memory cell affects the calculation of the new memory cell.

7. The semi-self-supervised feature matching flaw detection method as recited in claim 6, wherein the feature matching score for each pixel location is calculated using the following formula：

；

Wherein,represented in the image sample +.>The%>The characteristic channels are in pixel positionsPut->Is a value of (2).

8. The semi-self-supervised feature matching defect detection method as recited in claim 7, wherein the probability of defects for each pixel location is estimated using the following equation：

。

9. The semi-self-supervised feature matching flaw detection method as recited in claim 8, wherein the feature extractor is a neural network.

10. The method for detecting defects based on semi-self-supervised feature matching as recited in claim 9, wherein the step 7 specifically includes: initializing parameters of a memory network encoder and a feature extractor; for each image sample And corresponding label->The following training steps were performed: use of feature extractor for image +.>Forward propagation is carried out to obtain a feature map set +.>The method comprises the steps of carrying out a first treatment on the surface of the Matching graph using memory network encoder +.>Forward propagation is carried out, and a coded matching diagram is obtained; according to feature matching score->And tag->Calculating a loss function->The method comprises the steps of carrying out a first treatment on the surface of the Calculating gradients using a back propagation algorithm, updating parameters of the memory network encoder and feature extractor to reduce the loss function +.>The method comprises the steps of carrying out a first treatment on the surface of the Repeating the steps until reaching the preset stopping condition.