CN114170478A - Defect detection and positioning method and system based on cross-image local feature alignment

Info

Publication number
CN114170478A
Authority
CN
China
Prior art keywords
model
loss function
learning model
local feature
distillation learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111502012.0A
Other languages
Chinese (zh)
Inventor
苏勤亮
胡枭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Sun Yat Sen University
Priority to CN202111502012.0A
Publication of CN114170478A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a defect detection and positioning method and system based on cross-image local feature alignment, and relates to the technical field of object surface defect anomaly detection and positioning.

Description

Defect detection and positioning method and system based on cross-image local feature alignment
Technical Field
The invention relates to the technical field of object surface defect anomaly detection and positioning, in particular to a defect detection and positioning method and system based on cross-image local feature alignment.
Background
With the development of computer vision research, object surface defect detection and localization technologies are widely applied in fields such as industrial visual inspection and medical image lesion screening. The purpose of anomaly detection and localization is to screen out abnormal sample images and locate the abnormal regions within them.
At present, methods for anomaly detection and localization fall into two families: reconstruction-based methods and representation-similarity-based methods. Reconstruction-based methods train an autoencoder, a variational autoencoder or a generative adversarial network to reconstruct normal samples; at test time an abnormal sample cannot be reconstructed well and is thereby identified. The reconstruction error of the whole image is used to judge whether an image is abnormal, and the pixel-level reconstruction error is used to locate the abnormal region. Reconstruction-based methods are intuitive and interpretable, but their performance is often limited by the generative model: abnormal samples are sometimes reconstructed well, particularly when they closely resemble normal samples, so the reconstruction error fails as an indicator. Representation-similarity-based methods use a deep neural network to extract features of the whole image for anomaly detection and features of local image patches for anomaly localization. Most of them achieve better results than reconstruction-based methods but lack interpretability: the anomaly score is derived from the distance between the features of a test image and the features of the normal training samples, so it is hard to know which part of an abnormal image causes a high anomaly score. In addition, patch-based anomaly localization in such methods is computationally expensive.
Among representation-similarity-based methods, approaches built on a distillation learning model offer better interpretability. For example, the prior art discloses a positive-sample industrial defect detection method based on knowledge distillation. An industrial data set is first constructed and preprocessed, so that it comprises a positive sample set and an unlabeled defect sample set. A teacher network model is then pre-trained on this data set with self-supervised contrastive learning, and the trained teacher network model guides the training of a student network model on the positive sample set. Finally, the trained teacher and student network models are used together to detect defects in the image under inspection. Because the student network model only learns to extract features of positive samples, the features it extracts from a defect region differ greatly from those of the teacher network model, and this difference serves as the basis for defect judgment.
In practice, a notable characteristic of current industrial data sets for object surface defect detection is that most images show the same object after registration (the spatial alignment of two or more images of the same target). Cross-image local features are then highly correlated, and cross-image local feature alignment information can ensure the sensitivity of a model to fine-grained, local, pixel-level features. Existing methods based on a distillation learning model do not use this information, so their defect detection and localization performance is limited. How to exploit this information to improve defect detection and localization is the problem to be solved.
Disclosure of Invention
In order to solve the problem that conventional approaches based on a distillation learning model detect and locate defects poorly, the invention provides a defect detection and positioning method and system based on cross-image local feature alignment.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A defect detection and positioning method based on cross-image local feature alignment comprises the following steps:
S1, constructing a distillation learning model, wherein the distillation learning model comprises a teacher model T and a student model S;
S2, determining a loss function of the distillation learning model;
S3, constructing a data set and dividing it into a training set and a test set, wherein the training set contains only normal image samples and the test set contains both normal image samples and abnormal image samples;
S4, constructing a cross-image local feature alignment loss function over a plurality of normal image samples in the same training batch;
S5, integrating the local feature alignment loss function in S4 with the loss function of the distillation learning model in S2 to form a total loss function of the distillation learning model based on cross-image local feature alignment;
S6, inputting the normal image samples in the training set into the teacher model T and the student model S simultaneously, keeping the parameters of the teacher model T fixed, and, with the total loss function as training guidance, using the teacher model T to guide the training of the student model S, so as to obtain a trained distillation learning model;
S7, taking the image samples of the test set as input samples of the trained distillation learning model and, starting from the gradient of the total loss function with respect to the input samples, using the trained distillation learning model to detect and locate defects in the image samples of the test set.
In this technical scheme, a distillation learning model comprising a teacher model T and a student model S is first built and its loss function is determined. A data set is then constructed and divided into a training set and a test set, and the cross-image local feature alignment information of several normal image samples in the same training batch is fused into the loss function of the distillation learning model to form the final loss function. The normal image samples are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the teacher model T guides the training of the student model S so that the student model S only acquires the ability to extract features of normal image samples. The model can thus use the cross-image, pixel-to-pixel local correspondences of the training set to constrain the feature space, and the cross-image feature alignment information ensures its sensitivity to fine-grained, local, pixel-level features, which secures the detection and localization of object surface defects.
Preferably, in the distillation learning model constructed in step S1, the teacher model T adopts a VGG16 network structure loaded with weights pre-trained on ImageNet, and the student model S adopts the same VGG16 network structure as the teacher model T and randomly initializes the weights.
To shorten the training time of the initial distillation learning model, the teacher model T loads weights pre-trained on ImageNet and therefore has good feature extraction capability for both normal and abnormal image samples. The student model S adopts the same VGG16 network structure as the teacher model T, so that the teacher model T can guide the training of the student model S.
Preferably, the VGG16 network structure includes several modules. In step S2, the teacher model T and the student model S each take the last layer of each of the last four modules of their network structure as their key layers, and the loss function of the distillation learning model is:
$$L_1 = L_{val} + \lambda L_{dir}$$
where
$$L_{val} = \sum_{i=1}^{N_{cp}} \frac{1}{N_i} \sum_{j=1}^{N_i} \left( a_t^{CP_i}(j) - a_s^{CP_i}(j) \right)^2$$
$$L_{dir} = 1 - \sum_{i} \frac{vec(a_t^{CP_i})^{T} \cdot vec(a_s^{CP_i})}{\lVert vec(a_t^{CP_i}) \rVert \, \lVert vec(a_s^{CP_i}) \rVert}$$
where $L_1$ denotes the loss function of the distillation learning model; $CP_i$ denotes the output of the i-th key layer of the VGG16 network structure; $CP_0$ denotes the normal image features originally input to the VGG16 network structure; $a_t^{CP_i}$ denotes the activation value of the i-th key layer of the teacher model T, the activation value being the normal image feature output by that key layer; $a_s^{CP_i}$ denotes the activation value of the i-th key layer of the student model S; $N_i$ denotes the number of neurons in $CP_i$; $a_t^{CP_i}(j)$ denotes the activation value of the j-th neuron in the i-th key layer of the teacher model T; $a_s^{CP_i}(j)$ denotes the activation value of the j-th neuron in the i-th key layer of the student model S; $N_{cp}$ denotes the total number of key layers; $L_{val}$ denotes the sum of the Euclidean distances between corresponding activation values in each key layer of the teacher model T and the student model S; $vec(\cdot)$ denotes a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; $L_{dir}$ measures the cosine similarity between the vectors converted from corresponding key layers of the teacher model T and the student model S; $\lambda$ denotes a manually set hyperparameter. This loss constrains both the activation values and the vector directions output by the key layers of the student model S to be similar to those of the teacher model T.
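The value term and the direction term described above can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `distillation_loss`, the default `lam=0.5`, the per-layer mean of squared differences, and the "one minus the summed cosine similarities" form of the direction term are all assumptions chosen to match the verbal definitions of $L_{val}$ and $L_{dir}$.

```python
import numpy as np

def distillation_loss(teacher_acts, student_acts, lam=0.5):
    """Value-plus-direction loss over paired key-layer activations.

    teacher_acts, student_acts: lists of arrays, one array per key layer.
    lam: weight of the direction term (hypothetical default).
    Returns (total, l_val, l_dir).
    """
    l_val = 0.0
    cos_sum = 0.0
    for a_t, a_s in zip(teacher_acts, student_acts):
        t = np.ravel(a_t).astype(float)  # vec(.): flatten to a 1-D vector
        s = np.ravel(a_s).astype(float)
        # Squared differences averaged over the N_i neurons of this layer.
        l_val += np.mean((t - s) ** 2)
        # Cosine similarity of the flattened activations; the small constant
        # only guards against division by zero.
        cos_sum += float(t @ s) / (np.linalg.norm(t) * np.linalg.norm(s) + 1e-12)
    # Assumed form of the direction term: one minus the summed similarities.
    l_dir = 1.0 - cos_sum
    return l_val + lam * l_dir, l_val, l_dir
```

With this assumed form the direction term differs from a per-layer sum of (1 - cosine similarity) only by a constant, so both variants give the same gradients for the student model.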
Preferably, suppose that in step S4 the cross-image local feature alignment loss function is constructed over K normal image samples in the same training batch. During training of the distillation learning model, the alignment loss between the activation value maps that the first, second, ..., K-th normal image sample and each of the other K-1 normal image samples produce at the corresponding key layers of the VGG16 network structures of the teacher model T and the student model S is calculated pixel by pixel in turn. Because every pair of samples is visited twice in this way, the sum is multiplied by 1/2 to remove the repeated terms, which yields the cross-image local feature alignment loss function over the K normal image samples in the same training batch, with the expression:
Figure BDA0003402055040000041
where $L_2$ denotes the cross-image local feature alignment loss function over the K normal image samples in the same training batch. The alignment loss is calculated pixel by pixel across images on the activation value maps output by the key layers of the VGG16 network structure. Each activation value map is obtained by convolving the original input normal image sample with convolution kernels, so one pixel position in the activation value map corresponds to the local feature of at least 3 x 3 pixels in the original input normal image sample.
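The pairwise bookkeeping behind the factor 1/2 can be illustrated with a short NumPy sketch. The exact per-pair term of $L_2$ is not recoverable from the text, so the position-wise mean squared difference used here is purely an assumption, as are the function names and the `(K, C, H, W)` layout of the activation maps.

```python
import numpy as np
from itertools import combinations

def cross_image_alignment_loss(feats):
    """feats: array of shape (K, C, H, W), one key-layer activation map per image."""
    k = feats.shape[0]
    total = 0.0
    for a in range(k):          # each sample in turn ...
        for b in range(k):      # ... against the other K-1 samples
            if a != b:
                # Hypothetical per-pair term: position-wise squared difference.
                total += np.mean((feats[a] - feats[b]) ** 2)
    return 0.5 * total          # every unordered pair was visited twice

def unordered_pair_loss(feats):
    """Same quantity computed once per unordered pair, for comparison."""
    return sum(np.mean((feats[a] - feats[b]) ** 2)
               for a, b in combinations(range(feats.shape[0]), 2))
```

Both functions return the same value, which is the point of the 1/2 factor: looping over every sample against all others counts each pair twice.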
Preferably, the expression of the total loss function of the formed distillation learning model based on cross-image local feature alignment in step S5 is:
$$L_{total} = L_1 + \gamma L_2$$
where $L_{total}$ denotes the total loss function of the distillation learning model based on cross-image local feature alignment, and $\gamma$ denotes a hyperparameter set manually at training time.
Preferably, in step S6, the normal image samples in the training set are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the total loss function serves as the training guidance. When the teacher model T guides the training of the student model S, training proceeds by back-propagation and gradient descent. The output of a key layer of the student model S is the image features that the student model S extracts from the normal image sample, and it is fitted to the output of the corresponding key layer of the teacher model T, so that the student model S only acquires the ability to extract features of normal image samples. During training, when the total loss function $L_{total}$ no longer decreases over 20 training rounds, model training is complete and the trained distillation learning model is obtained.
Here, once the value of the total loss function $L_{total}$ has stabilized and no longer decreases, the cross-image local feature alignment loss function $L_2$ has also converged, which ensures the consistency of local features across images, that is, their "alignment".
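The stopping rule described above (training ends when the total loss has not decreased for 20 rounds) can be sketched in plain Python. The function name `train_until_plateau` and the idea of passing in a sequence of per-round loss values are illustrative assumptions; in a real training loop each value would come from one round of back-propagation and gradient descent on the student model.

```python
def train_until_plateau(loss_per_round, patience=20):
    """Return (stop_round, best_loss) for a sequence of per-round total losses.

    stop_round is the 1-based round at which the loss has failed to improve
    for `patience` consecutive rounds, or None if that never happens.
    """
    best = float("inf")
    rounds_since_best = 0
    for round_idx, loss in enumerate(loss_per_round, start=1):
        if loss < best:
            best = loss               # new minimum: reset the counter
            rounds_since_best = 0
        else:
            rounds_since_best += 1    # no decrease in this round
        if rounds_since_best >= patience:
            return round_idx, best
    return None, best
```

For example, `train_until_plateau([5, 4, 3] + [3] * 25)` stops at round 23, which is 20 rounds after the last decrease at round 3.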
Preferably, in step S7, when the trained distillation learning model is used to detect and locate defects in the image samples of the test set starting from the gradient of the total loss function with respect to the input sample, an input sample from the test set is denoted x, and the gradient map $\Lambda$ of the total loss function with respect to the input sample is
$$\Lambda = \frac{\partial L_{total}}{\partial x}$$
The pixel gradient values of the input sample x are obtained directly by a single back-propagation pass, as in training. The total loss function $L_{total}$ of the distillation learning model based on cross-image local feature alignment comprises the loss function $L_1$ of the distillation learning model and the cross-image local feature alignment loss function $L_2$. When the test set is used in step S7 and the input sample x is a normal image sample, both $L_1$ and $L_2$ are small and so is the gradient; the opposite holds when x is an abnormal image sample. A gradient contribution threshold $\epsilon$ is set: pixel positions of the input sample x whose gradient contribution exceeds $\epsilon$ have large gradient values and correspond to the abnormal defect region. In other words, the gradient of $L_{total}$ is used to find the abnormal defect region that causes the value of $L_{total}$ to increase.
Defect detection and localization with the trained distillation learning model therefore works at two levels. First, in anomaly detection, the value of the total loss function $L_{total}$ of the trained distillation learning model is used to judge whether the input sample is a normal or an abnormal image sample. Second, the gradient of $L_{total}$ with respect to the input sample x is informative: the pixels that contribute the larger gradients, that is, the regions where the key-layer output difference between the teacher model T and the student model S and the local feature alignment loss are large, are the detected anomaly regions.
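The two levels (an image-level decision from the loss value, then a pixel-level mask from the gradient) can be sketched with NumPy. Everything here is a stand-in: `loss_fn` is any scalar function of the image playing the role of the total loss, the gradient is approximated by central finite differences instead of back-propagation, and `score_threshold` and `eps` are hypothetical values that would have to be tuned on real data.

```python
import numpy as np

def gradient_map(loss_fn, x, h=1e-4):
    """Numerical gradient of a scalar loss with respect to every pixel of x."""
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x, dtype=float)
        step[idx] = h
        # Central difference; an autograd framework would do this in one pass.
        grad[idx] = (loss_fn(x + step) - loss_fn(x - step)) / (2 * h)
    return grad

def detect_and_localize(loss_fn, x, score_threshold, eps):
    """Return (is_abnormal, mask) for one input image x."""
    score = loss_fn(x)                          # level 1: loss value as anomaly score
    lam_map = np.abs(gradient_map(loss_fn, x))  # level 2: gradient magnitude per pixel
    return score > score_threshold, lam_map > eps
```

As a toy check, with `loss_fn = lambda img: float(np.sum(img ** 2))` an all-zero image is judged normal with an empty mask, while an image with a single pixel set to 1 is judged abnormal for `score_threshold=0.5` and the mask marks exactly that pixel for `eps=0.5`.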
Preferably, when the gradient map of $L_{total}$ is used for anomaly localization, it is computed in combination with the SmoothGrad algorithm.
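SmoothGrad averages the gradient over several noisy copies of the input to suppress high-frequency noise in the gradient map. A minimal NumPy sketch follows, assuming a caller-supplied `grad_fn` that returns the gradient of the loss with respect to an image; the function name, the sample count and the noise level are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def smoothgrad(grad_fn, x, n_samples=25, noise_std=0.1, seed=0):
    """Average grad_fn over n_samples Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    acc = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        # Perturb the input with zero-mean Gaussian noise, then accumulate
        # the gradient evaluated at the perturbed input.
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        acc += grad_fn(noisy)
    return acc / n_samples
```

In practice `grad_fn` would wrap one back-propagation pass of the total loss through the teacher and student models, and the noise level is often chosen relative to the value range of the input image.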
Preferably, to improve the accuracy of defect detection and localization, the gradient map $\Lambda$ is processed with Gaussian smoothing and a morphological opening operation according to:
$$M = g_{\sigma}(\Lambda)$$
$$L_{map} = (M \ominus B) \oplus B$$
where M denotes the result of Gaussian smoothing of the gradient map $\Lambda$; B denotes an elliptical or circular binary map used as the structuring element; $\ominus$ and $\oplus$ denote morphological erosion and dilation with the structuring element B, whose composition is the morphological opening operation; $L_{map}$ denotes the anomaly localization map obtained after Gaussian smoothing and morphological opening of the gradient map $\Lambda$. This processing also reduces the noise in the gradient map $\Lambda$.
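The post-processing chain (Gaussian smoothing followed by erosion and then dilation with a disk-shaped structuring element) can be sketched in NumPy alone. The function names, the default `sigma` and `radius`, the kernel truncation at three standard deviations and the edge padding are assumptions made for illustration; an image-processing library would normally provide these operations.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian filter with edge padding (output has the input shape)."""
    radius = max(1, int(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def disk(radius):
    """Circular binary structuring element B."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (xx**2 + yy**2) <= radius**2

def _windowed(img, footprint, reducer, pad_value):
    r = footprint.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, footprint.shape)  # (H, W, k, k)
    return reducer(windows[..., footprint], axis=-1)

def erode(img, footprint):
    return _windowed(img, footprint, np.min, np.inf)    # minimum under B

def dilate(img, footprint):
    return _windowed(img, footprint, np.max, -np.inf)   # maximum under B

def localization_map(grad_map, sigma=1.0, radius=1):
    """Gaussian smoothing, then opening (erosion followed by dilation)."""
    m = gaussian_smooth(np.asarray(grad_map, dtype=float), sigma)
    b = disk(radius)
    return dilate(erode(m, b), b)
```

Opening removes responses smaller than the structuring element while keeping larger connected regions, which is why isolated noisy pixels in the gradient map tend to disappear from the final map.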
The invention also provides a defect detection and positioning system based on cross-image local feature alignment, which comprises the following components:
the distillation learning model building module is used for building a distillation learning model, and the distillation learning model comprises a teacher model T and a student model S;
a loss function determination module for determining a loss function of the distillation learning model;
the data set constructing and dividing module is used for constructing a data set and dividing the data set into a training set and a testing set, wherein the training set only contains normal image samples, and the testing set comprises normal image samples and abnormal image samples;
the alignment loss function construction module is used for constructing local feature alignment loss functions of cross images in a plurality of normal image samples in the same training batch;
the total loss function building module integrates the local feature alignment loss function with the loss function of the distillation learning model to form a total loss function of the distillation learning model based on cross-image local feature alignment;
the distillation learning model training module is used for inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, fixing the parameters of the teacher model T unchanged, taking the total loss function as training guidance, and guiding the student model S to train by using the teacher model T, so that the distillation learning model is trained, and the trained distillation learning model is obtained;
and the test module takes the image samples of the test set as input samples of the trained distillation learning model, and performs defect detection and positioning on the image samples of the test set by using the trained distillation learning model from the gradient of the total loss function relative to the input samples.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a defect detection and positioning method and system based on cross-image local feature alignment. A distillation learning model is first built and its loss function is determined; cross-image local feature alignment information is then integrated into that loss function to form a total loss function. With the total loss function as training guidance, the distillation learning model is trained on the training set of the data set. Finally, starting from the gradient of the total loss function with respect to the input samples, the trained distillation learning model is used to detect and locate defects in the image samples of the test set. Throughout this process the distillation learning model can use the cross-image, pixel-to-pixel local correspondences of the training set to constrain the feature space, which ensures its sensitivity to fine-grained, local, pixel-level features and thereby improves the detection and localization of object surface defects.
Drawings
Fig. 1 is a schematic flowchart of a defect detection and positioning method based on cross-image local feature alignment according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram showing a VGG16 network structure adopted by both the student model S and the teacher model T according to embodiment 1 of the present invention;
FIG. 3 is a schematic block diagram of the overall process of defect detection and localization based on cross-image local feature alignment proposed in embodiment 2 of the present invention;
fig. 4 is a structural diagram of a defect detection and localization system based on cross-image local feature alignment according to embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides a defect detection and positioning method based on cross-image local feature alignment in embodiment 1, wherein a flow schematic diagram of the method is shown in fig. 1, and the method specifically comprises the following steps:
S1, constructing a distillation learning model, wherein the distillation learning model comprises a teacher model T and a student model S;
In this embodiment, the teacher model T of the constructed distillation learning model adopts a VGG16 network structure loaded with weights pre-trained on ImageNet, and the student model S adopts the same VGG16 network structure with randomly initialized weights. The network structure of the teacher model T or the student model S is not limited to VGG16 and may be another network structure; the student model S may also adopt a structure that is similar to, but not identical to and more compact than, that of the teacher model T. Once the distillation learning model has been constructed, its loss function is determined, that is, step S2 is performed:
S2, determining a loss function of the distillation learning model;
In this embodiment, as shown in fig. 2, the VGG16 network structure adopted by the teacher model T or the student model S includes several modules, each drawn as a block in fig. 2. In step S2, the teacher model T and the student model S each take the last layer of each of the last four modules of their network structure as their key layers. Fig. 2 shows the structure of either model together with the extraction of its key layers. The loss function of the distillation learning model is:
$$L_1 = L_{val} + \lambda L_{dir}$$
where
$$L_{val} = \sum_{i=1}^{N_{cp}} \frac{1}{N_i} \sum_{j=1}^{N_i} \left( a_t^{CP_i}(j) - a_s^{CP_i}(j) \right)^2$$
$$L_{dir} = 1 - \sum_{i} \frac{vec(a_t^{CP_i})^{T} \cdot vec(a_s^{CP_i})}{\lVert vec(a_t^{CP_i}) \rVert \, \lVert vec(a_s^{CP_i}) \rVert}$$
where $L_1$ denotes the loss function of the distillation learning model; $CP_i$ denotes the output of the i-th key layer of the VGG16 network structure; $CP_0$ denotes the normal image features originally input to the VGG16 network structure; $a_t^{CP_i}$ denotes the activation value of the i-th key layer of the teacher model T, the activation value being the normal image feature output by that key layer; $a_s^{CP_i}$ denotes the activation value of the i-th key layer of the student model S; $N_i$ denotes the number of neurons in $CP_i$; $a_t^{CP_i}(j)$ denotes the activation value of the j-th neuron in the i-th key layer of the teacher model T; $a_s^{CP_i}(j)$ denotes the activation value of the j-th neuron in the i-th key layer of the student model S; $N_{cp}$ denotes the total number of key layers; $L_{val}$ denotes the sum of the Euclidean distances between corresponding activation values in each key layer of the teacher model T and the student model S; $vec(\cdot)$ denotes a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; $L_{dir}$ measures the cosine similarity between the vectors converted from corresponding key layers of the teacher model T and the student model S; $\lambda$ denotes a manually set hyperparameter.
After the loss function of the distillation learning model has been determined, the cross-image local feature alignment information is introduced, starting from the data set, so steps S3 and S4 are executed in turn:
S3, constructing a data set and dividing it into a training set and a test set, wherein the training set contains only normal image samples and the test set contains both normal image samples and abnormal image samples;
In this embodiment, two data sets are used, MVTecAD and Head-CT. MVTecAD is an industrial quality inspection data set covering 15 categories of industrial products. Each category is divided into a training set and a test set: the training set contains only normal image samples (about 300 per category), and the test set contains normal image samples, abnormal image samples of different types (about 30 per type), and binary maps labeling the abnormal regions of the abnormal image samples. Head-CT is a medical data set comprising 100 normal brain CT images and 100 diseased brain CT images.
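The split rule of step S3 (normal samples only in the training set, normal and abnormal samples in the test set) can be sketched for a generic labeled collection. This is a hypothetical helper for a custom data set: the function name, the `(image_id, is_abnormal)` pair format and the 80 percent training fraction are assumptions, and a benchmark that already ships with predefined splits would simply use those.

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=0):
    """samples: list of (image_id, is_abnormal) pairs.

    Returns (train, test), where train holds only normal samples and test
    holds the remaining normal samples plus every abnormal sample.
    """
    normal = [s for s in samples if not s[1]]
    abnormal = [s for s in samples if s[1]]
    rng = random.Random(seed)
    rng.shuffle(normal)                       # shuffle before cutting the split
    n_train = int(len(normal) * train_fraction)
    train = normal[:n_train]
    test = normal[n_train:] + abnormal        # abnormal samples never enter training
    return train, test
```

Keeping abnormal samples out of training is what lets the student model acquire only the ability to extract features of normal images.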
S4, constructing local feature alignment loss functions of cross-images in a plurality of normal image samples in the same training batch;
In step S4, the cross-image local feature alignment loss function is constructed over K normal image samples in the same training batch. During training of the distillation learning model, the alignment loss between the activation value maps that the first, second, ..., K-th normal image sample and each of the other K-1 normal image samples produce at the corresponding key layers of the VGG16 network structures of the teacher model T and the student model S is calculated pixel by pixel in turn. Because every pair of samples is visited twice in this way, the sum is multiplied by 1/2 to remove the repeated terms, which yields the cross-image local feature alignment loss function over the K normal image samples in the same training batch, with the expression:
Figure BDA0003402055040000091
where $L_2$ denotes the cross-image local feature alignment loss function over the K normal image samples in the same training batch. The alignment loss is calculated pixel by pixel across images on the activation value maps output by the key layers of the VGG16 network structure. Each activation value map is obtained by convolving the original input normal image sample with convolution kernels, so one pixel position in the activation value map corresponds to the local feature of at least 3 x 3 pixels in the original input normal image sample.
S5, integrating the local feature alignment loss function in S4 with the loss function of the distillation learning model in S2 to form a total loss function of the distillation learning model based on cross-image local feature alignment;
The expression of the total loss function of the distillation learning model based on cross-image local feature alignment is:
$$L_{total} = L_1 + \gamma L_2$$
where $L_{total}$ denotes the total loss function of the distillation learning model based on cross-image local feature alignment, and $\gamma$ denotes a manually set hyperparameter.
With the architecture of the distillation learning model fixed, once the total loss function of the distillation learning model based on cross-image local feature alignment has been formed, it is used as the training objective for training the distillation learning model. By exploiting cross-image feature alignment information, the model retains its sensitivity to fine-grained, local, pixel-level features. The specific training process is carried out in step S6:
S6, inputting the normal image samples in the training set into the teacher model T and the student model S simultaneously, keeping the parameters of the teacher model T fixed, and, with the total loss function as the training objective, using the teacher model T to guide the training of the student model S, thereby training the distillation learning model and obtaining a trained distillation learning model;
When the normal image samples in the training set are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, the total loss function is used as the training objective, and the teacher model T guides the training of the student model S, training is performed by back propagation and gradient descent. The output of a key layer of the student model S is the image feature that the student model S extracts from a normal image sample, and it is fitted to the output of the corresponding key layer of the teacher model T, so that the student model S acquires the ability to extract features of normal image samples only. Training is complete when the total loss function L_total has not decreased further for 20 training rounds. Once the value of L_total has stabilized and converged in this way, the cross-image local feature alignment loss function L_2 has also converged, which ensures the consistency ("alignment") of local features across images, and the trained distillation learning model is obtained.
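The stopping rule described for S6 can be sketched as a simple patience check. This is a minimal illustration that assumes one recorded L_total value per training round and interprets "no further decrease" as "no new minimum within the last 20 rounds"; the function name `training_finished` is hypothetical.

```python
def training_finished(loss_history, patience=20):
    """Return True once L_total has not reached a new minimum
    during the most recent `patience` training rounds."""
    if len(loss_history) <= patience:
        # Not enough rounds recorded yet to apply the rule.
        return False
    best_before = min(loss_history[:-patience])
    best_recent = min(loss_history[-patience:])
    return best_recent >= best_before


# The loss drops once and then stays flat for 20 rounds: training stops.
plateau = [1.0, 0.5] + [0.5] * 20

# The loss keeps decreasing in every round: training continues.
still_improving = [1.0] + [0.9 - 0.01 * i for i in range(20)]
```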
S7, taking the image samples of the test set as input samples of the trained distillation learning model and, based on the gradient of the total loss function with respect to the input samples, performing defect detection and positioning on the image samples of the test set with the trained distillation learning model.
Overall, a distillation learning model comprising a teacher model T and a student model S is first constructed, and its loss function is determined. A data set is then constructed and divided into a training set and a test set. Cross-image local feature alignment information among multiple normal image samples of the same training batch is incorporated into the loss function of the distillation learning model to form the final loss function. The normal image samples are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the teacher model T guides the training of the student model S so that the student model acquires the ability to extract features of normal image samples only. The model can thus use the local pixel correspondences across images of the training set to constrain the feature space and, by means of the cross-image feature alignment information, retain its sensitivity to fine-grained, local, pixel-level features, thereby ensuring the effectiveness of detecting and positioning surface defects of objects.
In this embodiment, in the final step S7, when defect detection and positioning are performed on the image samples of the test set with the trained distillation learning model based on the gradient of the total loss function with respect to the input samples, an input sample of the test set is denoted x, and the gradient map Λ of the total loss function with respect to the input sample is expressed as:
Λ = ∂L_total / ∂x
The gradient value at each pixel of the input sample x is obtained directly by a single back propagation pass, as in the training process. The total loss function L_total of the distillation learning model based on cross-image local feature alignment comprises the loss function L_1 of the distillation learning model and the cross-image local feature alignment loss function L_2. When the test set is used in step S7 and the input sample x is a normal image sample, both L_1 and L_2 are small and so are the gradients; the opposite holds when the input sample x is an abnormal image sample. A gradient contribution threshold ε is set: pixel positions of the input sample x whose gradient contribution exceeds ε have large gradient values and correspond to abnormal defect regions, that is, the regions responsible for the increase in the value of L_total. Defect detection and positioning on the image samples of the test set with the trained distillation learning model therefore has two levels. First, for anomaly detection, the input sample is judged to be a normal or an abnormal image sample from the value of the total loss function L_total of the trained distillation learning model. Second, the gradient of L_total with respect to the input sample x is significant at abnormal pixels: the pixels that contribute large gradients, namely the regions where the key-layer outputs of the teacher model T and the student model S differ strongly and the local feature alignment loss is large, are the detected abnormal regions.
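The two levels of decision described above can be sketched as follows. This is only an illustration: the image-level threshold `loss_threshold` is a hypothetical parameter not named in the text, and comparing the gradient magnitude (absolute value) against ε is an assumption about how the "gradient contribution" is measured.

```python
def is_abnormal(total_loss_value, loss_threshold):
    """Image-level decision: flag the sample when L_total exceeds
    a threshold (hypothetical parameter, not given in the text)."""
    return total_loss_value > loss_threshold


def defect_mask(gradient_map, epsilon):
    """Pixel-level decision: 1 where the gradient magnitude exceeds
    the gradient contribution threshold epsilon, otherwise 0."""
    return [[1 if abs(g) > epsilon else 0 for g in row] for row in gradient_map]


# Toy 3 x 3 gradient map with a few large-gradient pixels (illustrative values).
gradient_map = [
    [0.01, -0.02, 0.00],
    [0.03, 0.90, -0.70],
    [0.00, 0.80, 0.02],
]
mask = defect_mask(gradient_map, epsilon=0.5)
```

In this toy case the three pixels with large gradient magnitude form the mask, while the near-zero gradients of the remaining pixels are discarded.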
When the gradient map of L_total is used for anomaly positioning, it is combined with the SmoothGrad algorithm, a commonly used algorithm for this purpose, which is not described further here.
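For readers unfamiliar with SmoothGrad, its core idea is to average the gradient over several copies of the input perturbed with Gaussian noise. The sketch below shows that idea on a flat list of values with a toy gradient function; the names `smoothgrad` and `toy_grad`, the noise level, and the sample count are all illustrative assumptions and do not come from the source.

```python
import random


def smoothgrad(grad_fn, x, sigma=0.1, n_samples=50, seed=0):
    """Average grad_fn over n_samples noisy copies of x, where each copy
    adds independent Gaussian noise N(0, sigma^2) to every element."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


def toy_grad(x):
    """Gradient of the toy loss sum(x_i ** 3); stands in for the gradient
    of L_total with respect to the input sample."""
    return [3.0 * xi * xi for xi in x]


x = [1.0, 2.0, 0.0]
smoothed = smoothgrad(toy_grad, x, sigma=0.1, n_samples=50)

# With sigma = 0 the noisy copies equal x, so the plain gradient is recovered.
plain = smoothgrad(toy_grad, x, sigma=0.0, n_samples=50)
```

In a real model, `grad_fn` would be replaced by one back propagation pass that returns the gradient of L_total with respect to the input image.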
To improve the accuracy of defect detection and positioning, the gradient map Λ is processed with Gaussian smoothing and a morphological opening operation, according to:
M = g_σ(Λ)
L_map = (M ⊖ B) ⊕ B
where M denotes the result of Gaussian smoothing of the gradient map Λ; B denotes a binary structuring element in the shape of an ellipse or a disk; ⊖ and ⊕ denote morphological erosion and dilation with the structuring element B, respectively, which applied in this order constitute the morphological opening operation; and L_map denotes the anomaly positioning map obtained after Gaussian smoothing and morphological opening of the gradient map Λ. This processing also reduces the noise in the gradient map Λ.
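The post-processing described above (Gaussian smoothing followed by morphological opening, that is, erosion and then dilation with the structuring element B) can be sketched in plain Python. The sketch makes several assumptions that are not in the source: B is a radius-1 disk (a cross of five pixels), the Gaussian kernel is truncated to a small radius, image borders are handled by clamping or by ignoring out-of-range neighbours, and erosion and dilation are taken as the local minimum and maximum over B.

```python
import math

# Radius-1 disk (cross-shaped) structuring element B as (row, col) offsets.
B = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(grid, sigma=1.0, radius=1):
    """Separable Gaussian smoothing; border indices are clamped."""
    kernel = gaussian_kernel(sigma, radius)
    h, w = len(grid), len(grid[0])
    offsets = range(-radius, radius + 1)

    def clamp(v, size):
        return max(0, min(size - 1, v))

    rows = [[sum(kernel[k + radius] * grid[r][clamp(c + k, w)] for k in offsets)
             for c in range(w)] for r in range(h)]
    return [[sum(kernel[k + radius] * rows[clamp(r + k, h)][c] for k in offsets)
             for c in range(w)] for r in range(h)]


def _neighbourhood(grid, r, c):
    """Values of grid under B centred at (r, c), ignoring out-of-range pixels."""
    h, w = len(grid), len(grid[0])
    return [grid[r + dr][c + dc] for dr, dc in B
            if 0 <= r + dr < h and 0 <= c + dc < w]


def erode(grid):
    """Erosion: local minimum over B."""
    return [[min(_neighbourhood(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def dilate(grid):
    """Dilation: local maximum over B."""
    return [[max(_neighbourhood(grid, r, c)) for c in range(len(grid[0]))]
            for r in range(len(grid))]


def opening(grid):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(grid))


# A cross-shaped blob survives opening; the isolated corner pixel is removed.
binary = [
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
opened = opening(binary)
```

For a gradient map `grad`, the full chain would be `opening(smooth(grad))`. The toy example shows why opening suppresses noise: structures smaller than B disappear under erosion and are not restored by dilation.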
Example 2
This embodiment further illustrates, in the form of a schematic block diagram, the implementation of the defect detection and positioning method described in embodiment 1. Referring to fig. 3, the input samples of a batch are nut image samples, comprising normal image samples and a defective abnormal image sample (the last one shown). The nut image samples are input into the teacher model T and the student model S of the distillation learning model, and the alignment loss is fused with the original key-layer output loss by summation into the total loss function L_total, which comprises the loss function L_1 of the distillation learning model (key-layer output) and the cross-image local feature alignment loss function L_2 (alignment loss). The whole distillation learning model is trained by gradient descent and back propagation with the total loss function as the training objective, and an anomaly positioning gradient heat map and a binary map of the abnormal region are obtained by back-propagating the gradient, thereby completing defect detection and positioning.
Example 3
As shown in fig. 3, in order to implement the methods in embodiments 1 and 2, this embodiment proposes a defect detection and positioning system based on cross-image local feature alignment, including:
the distillation learning model building module 101 is used for building a distillation learning model, and the distillation learning model comprises a teacher model T and a student model S;
a loss function determination module 102 for determining a loss function of the distillation learning model;
the data set constructing and dividing module 103 is used for constructing a data set and dividing the data set into a training set and a test set, wherein the training set only contains normal image samples, and the test set comprises normal image samples and abnormal image samples;
an alignment loss function construction module 104, configured to construct a local feature alignment loss function across images in multiple normal image samples in the same training batch;
a total loss function constructing module 105, which integrates the local feature alignment loss function with the loss function of the distillation learning model to form a total loss function of the distillation learning model based on the cross-image local feature alignment;
the distillation learning model training module 106 is used for inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, fixing the parameters of the teacher model T unchanged, taking the total loss function as training guidance, and guiding the student model S to train by using the teacher model T, so as to train the distillation learning model and obtain a trained distillation learning model;
the testing module 107 takes the image samples of the test set as input samples of the trained distillation learning model, and performs defect detection and location on the image samples of the test set by using the trained distillation learning model based on the gradient of the total loss function with respect to the input samples.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A defect detection and positioning method based on cross-image local feature alignment is characterized by comprising the following steps:
S1, constructing a distillation learning model, wherein the distillation learning model comprises a teacher model T and a student model S;
S2, determining a loss function of the distillation learning model;
S3, constructing a data set and dividing the data set into a training set and a test set, wherein the training set contains only normal image samples and the test set comprises normal image samples and abnormal image samples;
S4, constructing a cross-image local feature alignment loss function over a plurality of normal image samples in the same training batch;
S5, integrating the local feature alignment loss function from S4 with the loss function of the distillation learning model from S2 to form a total loss function of the distillation learning model based on cross-image local feature alignment;
S6, inputting the normal image samples in the training set into the teacher model T and the student model S simultaneously, keeping the parameters of the teacher model T fixed, and, with the total loss function as the training objective, using the teacher model T to guide the training of the student model S, thereby training the distillation learning model and obtaining a trained distillation learning model;
S7, taking the image samples of the test set as input samples of the trained distillation learning model and, based on the gradient of the total loss function with respect to the input samples, performing defect detection and positioning on the image samples of the test set with the trained distillation learning model.
2. The method for defect detection and localization based on cross-image local feature alignment of claim 1, wherein the teacher model T in the distillation learning model constructed in step S1 adopts VGG16 network structure loaded with pre-training weights on ImageNet, and the student model S adopts VGG16 network structure same as that of the teacher model T and randomly initializes the weights.
3. The method of claim 2, wherein the VGG16 network structure includes several modules, in step S2, the teacher model T and the student model S each take the last layer of the last four modules in their respective network structures as their respective key layers, and the loss function of the distillation learning model is:
L_1 = L_val + λ L_dir
wherein,
Figure FDA0003402055030000011
Figure FDA0003402055030000021
wherein L_1 denotes the loss function of the distillation learning model; CP_i denotes the output of the i-th key layer of the VGG16 network structure; CP_0 denotes the normal image feature originally input to the VGG16 network structure;
Figure FDA0003402055030000022
denotes the activation value of the i-th key layer of the teacher model T, the activation value being the normal image feature output by the key layer of the network structure;
Figure FDA0003402055030000023
denotes the activation value of the i-th key layer of the student model S; N_i denotes the number of neurons in CP_i;
Figure FDA0003402055030000024
denotes the activation value of the j-th neuron in the i-th key layer of the teacher model T;
Figure FDA0003402055030000025
denotes the activation value of the j-th neuron in the i-th key layer of the student model S; N_cp denotes the total number of key layers; L_val denotes the sum of the Euclidean distances between corresponding activation values in each key layer of the teacher model T and the student model S; vec() denotes a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; L_dir denotes the cosine similarity of the vectors converted from the corresponding key layers of the teacher model T and the student model S; and λ denotes a manually set hyper-parameter.
4. The method for defect detection and localization based on cross-image local feature alignment of claim 3, wherein the cross-image local feature alignment loss function construction is performed on the same K normal image samples in the training batch in step S4, in the training process of the distillation learning model, the alignment loss of the activation value graph output by the first, second, …, K normal image samples and other K-1 normal image samples in the key layer of the VGG16 network structure corresponding to the teacher model T and the student model S is calculated pixel by pixel in sequence, then 1/2 is taken to eliminate the repeatedly calculated content, and the local feature alignment loss function of the same K normal image sample images in the training batch is obtained, and the expression is:
Figure FDA0003402055030000026
wherein L_2 denotes the cross-image local feature alignment loss function over the K normal image samples in the same training batch; the alignment loss is calculated pixel by pixel across images on the activation maps output by a key layer of the VGG16 network structure, and because an activation map is obtained by convolving the original input normal image sample with convolution kernels, one pixel position in the activation map corresponds to a local feature covering at least 3 × 3 pixels of the original input normal image sample.
5. The method for detecting and locating defects based on cross-image local feature alignment according to claim 4, wherein the expression of the total loss function of the formed distillation learning model based on cross-image local feature alignment in step S5 is as follows:
L_total = L_1 + γ L_2
wherein L_total denotes the total loss function of the distillation learning model based on cross-image local feature alignment, and γ denotes a manually set hyper-parameter.
6. The method for defect detection and positioning based on cross-image local feature alignment according to claim 5, wherein in step S6, when the normal image samples in the training set are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, the total loss function is used as the training objective, and the teacher model T is used to guide the training of the student model S, training is performed by back propagation and gradient descent; the output of a key layer of the student model S is the image feature that the student model S extracts from a normal image sample, and the output of the key layer of the student model S is fitted to the output of the corresponding key layer of the teacher model T, so that the student model S acquires the ability to extract features of normal image samples only; and model training is complete when the total loss function L_total has not decreased further for 20 training rounds, yielding the trained distillation learning model.
7. The method for defect detection and positioning based on cross-image local feature alignment according to claim 6, wherein in step S7, when defect detection and positioning are performed on the image samples of the test set with the trained distillation learning model based on the gradient of the total loss function with respect to the input samples, an input sample of the test set is denoted x and the gradient map Λ of the total loss function with respect to the input sample is obtained, expressed as:
Λ = ∂L_total / ∂x
The gradient value at each pixel of the input sample x is obtained directly by a single back propagation pass, as in the training process; the total loss function L_total of the distillation learning model based on cross-image local feature alignment comprises the loss function L_1 of the distillation learning model and the cross-image local feature alignment loss function L_2; when the test set is used in step S7 and the input sample x is a normal image sample, both L_1 and L_2 are small and so are the gradients, and the opposite holds when the input sample x is an abnormal image sample; a gradient contribution threshold ε is set, and pixel positions of the input sample x whose gradient contribution exceeds ε have large gradient values and correspond to abnormal defect regions, that is, the regions responsible for the increase in the value of L_total.
8. The method for defect detection and positioning based on cross-image local feature alignment according to claim 7, wherein, when the gradient map of L_total is used for anomaly positioning, the positioning is implemented in combination with the SmoothGrad algorithm.
9. The method for defect detection and positioning based on cross-image local feature alignment according to claim 7 or 8, wherein, to improve the accuracy of defect detection and positioning, the gradient map Λ is processed with Gaussian smoothing and a morphological opening operation, according to:
M = g_σ(Λ)
L_map = (M ⊖ B) ⊕ B
wherein M denotes the result of Gaussian smoothing of the gradient map Λ; B denotes a binary structuring element in the shape of an ellipse or a disk; ⊖ and ⊕ denote morphological erosion and dilation with the structuring element B, respectively, which applied in this order constitute the morphological opening operation; and L_map denotes the anomaly positioning map obtained after Gaussian smoothing and morphological opening of the gradient map Λ.
10. A system for defect detection and localization based on cross-image local feature alignment, comprising:
the distillation learning model building module is used for building a distillation learning model, and the distillation learning model comprises a teacher model T and a student model S;
a loss function determination module for determining a loss function of the distillation learning model;
the data set constructing and dividing module is used for constructing a data set and dividing the data set into a training set and a testing set, wherein the training set only contains normal image samples, and the testing set comprises normal image samples and abnormal image samples;
the alignment loss function construction module is used for constructing local feature alignment loss functions of cross images in a plurality of normal image samples in the same training batch;
the total loss function building module integrates the local feature alignment loss function with the loss function of the distillation learning model to form a total loss function of the distillation learning model based on cross-image local feature alignment;
the distillation learning model training module is used for inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, fixing the parameters of the teacher model T unchanged, taking the total loss function as training guidance, and guiding the student model S to train by using the teacher model T, so that the distillation learning model is trained, and the trained distillation learning model is obtained;
and the test module takes the image samples of the test set as input samples of the trained distillation learning model, and performs defect detection and positioning on the image samples of the test set by using the trained distillation learning model from the gradient of the total loss function relative to the input samples.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111502012.0A CN114170478A (en) 2021-12-09 2021-12-09 Defect detection and positioning method and system based on cross-image local feature alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111502012.0A CN114170478A (en) 2021-12-09 2021-12-09 Defect detection and positioning method and system based on cross-image local feature alignment

Publications (1)

Publication Number Publication Date
CN114170478A true CN114170478A (en) 2022-03-11

Family

ID=80485102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111502012.0A Pending CN114170478A (en) 2021-12-09 2021-12-09 Defect detection and positioning method and system based on cross-image local feature alignment

Country Status (1)

Country Link
CN (1) CN114170478A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294407A (en) * 2022-09-30 2022-11-04 山东大学 Model compression method and system based on preview mechanism knowledge distillation
CN116028891A (en) * 2023-02-16 2023-04-28 之江实验室 Industrial anomaly detection model training method and device based on multi-model fusion
CN116028891B (en) * 2023-02-16 2023-07-14 之江实验室 Industrial anomaly detection model training method and device based on multi-model fusion
CN116664576A (en) * 2023-07-31 2023-08-29 厦门微图软件科技有限公司 Method, device and equipment for detecting abnormality of welding bead of battery shell
CN116664576B (en) * 2023-07-31 2023-11-03 厦门微图软件科技有限公司 Method, device and equipment for detecting abnormality of welding bead of battery shell

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination