CN114170478A

CN114170478A - Defect detection and positioning method and system based on cross-image local feature alignment

Info

Publication number: CN114170478A
Application number: CN202111502012.0A
Authority: CN
Inventors: 苏勤亮; 胡枭
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2021-12-09
Filing date: 2021-12-09
Publication date: 2022-03-11

Abstract

The invention provides a defect detection and positioning method and system based on cross-image local feature alignment, and relates to the technical field of anomaly detection and positioning of object surface defects.

Description

Defect detection and positioning method and system based on cross-image local feature alignment

Technical Field

The invention relates to the technical field of object surface defect anomaly detection and positioning, in particular to a defect detection and positioning method and system based on cross-image local feature alignment.

Background

With the development of computer vision research, object surface defect detection and positioning technologies are widely applied in the fields of industrial vision detection, medical image lesion screening and the like, and the purpose of anomaly detection and positioning is to screen an abnormal sample picture and position an abnormal region in the sample.

At present, methods for anomaly detection and localization can be divided into two categories: reconstruction-based methods and methods based on characterization (feature) similarity. Reconstruction-based methods train an autoencoder, a variational autoencoder or a generative adversarial network to reconstruct normal samples; during testing, an abnormal sample cannot be reconstructed well and is thereby identified. The reconstruction error of the whole image is used to judge whether an image is an abnormal sample, and the pixel-level reconstruction error is used to locate the abnormal region. Such methods are intuitive and interpretable, but their performance is often limited by the generative model: an abnormal sample can sometimes be reconstructed well, particularly when it is highly similar to a normal sample, in which case the reconstruction error fails as an indicator. Methods based on feature similarity use a deep neural network to extract features of the whole image for anomaly detection and features of local image blocks for anomaly localization. Most of them obtain better results than reconstruction-based methods but lack interpretability, because the anomaly score is derived from the distance between the features of a test image and the features of the normal samples of the training set, so it is difficult to know which part of the abnormal image leads to the high anomaly score. Moreover, locating anomalies on the basis of image blocks requires a relatively large amount of computation.

Among the methods based on characterization similarity, approaches built on a distillation learning model offer better interpretability. For example, the prior art discloses a positive-sample industrial defect detection method based on knowledge distillation. An industrial data set is first constructed and preprocessed; the preprocessed data set comprises a positive sample set and an unlabeled defect sample set. A teacher network model is then pre-trained on this data set by self-supervised contrastive learning. On the basis of the positive sample set, the trained teacher network model guides the training of a student network model, and finally the trained teacher and student network models are used together to detect defects in the picture to be inspected. Because the student network model only learns to extract positive-sample features, the features it extracts from a defect region differ greatly from those of the teacher network model, and this difference serves as the basis for defect judgment.

In practice, a notable characteristic of current industrial data sets for object surface defect detection is that most images show the same object after registration (alignment of two or more images of the same target in spatial position). In this case cross-image local features are highly correlated, and cross-image local feature alignment information can ensure the sensitivity of a model to fine-grained, local, pixel-level features. Existing methods based on a distillation learning model do not use this information, so their defect detection and positioning effect is unsatisfactory. How to use this information to improve defect detection and positioning has become a problem to be solved.

Disclosure of Invention

In order to solve the problem that conventional approaches based on a distillation learning model detect and position defects poorly, the invention provides a defect detection and positioning method and system based on cross-image local feature alignment.

In order to achieve the technical effects, the technical scheme of the invention is as follows:

A defect detection and positioning method based on cross-image local feature alignment comprises the following steps:

S1, constructing a distillation learning model, wherein the distillation learning model comprises a teacher model T and a student model S;

S2, determining a loss function of the distillation learning model;

S3, constructing a data set and dividing the data set into a training set and a testing set, wherein the training set only contains normal image samples, and the testing set comprises normal image samples and abnormal image samples;

S4, constructing a cross-image local feature alignment loss function over a plurality of normal image samples in the same training batch;

S5, integrating the local feature alignment loss function of step S4 with the loss function of the distillation learning model of step S2 to form a total loss function of the distillation learning model based on cross-image local feature alignment;

S6, inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, keeping the parameters of the teacher model T fixed, taking the total loss function as training guidance, and using the teacher model T to guide the training of the student model S, so as to train the distillation learning model and obtain a trained distillation learning model;

S7, taking the image samples of the test set as input samples of the trained distillation learning model and, starting from the gradient of the total loss function with respect to the input samples, carrying out defect detection and positioning on the image samples of the test set by using the trained distillation learning model.

In this technical scheme, a distillation learning model comprising a teacher model T and a student model S is first built and its loss function is determined. A data set is then built and divided into a training set and a testing set, and cross-image local feature alignment information from a plurality of normal image samples of the same training batch is fused into the loss function of the distillation learning model to form the final loss function. The normal image samples are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the teacher model T guides the training of the student model S so that the student model S only acquires the ability to extract features of normal image samples. The model can thus use the local pixel correspondence across images of the training set to constrain the feature space, and the cross-image feature alignment information ensures the sensitivity of the model to fine-grained, local, pixel-level features, thereby ensuring the effect of anomaly detection and positioning of object surface defects.

Preferably, in the distillation learning model constructed in step S1, the teacher model T adopts a VGG16 network structure loaded with weights pre-trained on ImageNet, and the student model S adopts the same VGG16 network structure as the teacher model T and randomly initializes the weights.

The teacher model T loads weights pre-trained on ImageNet in order to shorten the training time of the initial distillation learning model; with these weights the teacher model T has good feature extraction capability for both normal and abnormal image samples. The student model S adopts the same VGG16 network structure as the teacher model T, so that the teacher model T can guide the training of the student model S.

Preferably, the VGG16 network structure includes several modules;

in step S2, the teacher model T and the student model S each take the last layer of each of the last four modules in their respective network structures as their key layers, and the loss function of the distillation learning model is:

L₁＝L_val+λL_dir

wherein L₁ represents the loss function of the distillation learning model; CP_i represents the output of the i-th key layer of the VGG16 network structure; CP₀ represents the normal image features originally input to the VGG16 network structure; the activation value of the i-th key layer of the teacher model T is the normal image feature output by that key layer of the network structure, and the student model S has a corresponding activation value at its i-th key layer; N_i represents the number of neurons in CP_i; the activation value of the j-th neuron in the i-th key layer is taken in the teacher model T and in the student model S respectively; N_cp represents the total number of key layers; L_val represents the sum of the Euclidean distances between corresponding activation values in each key layer of the teacher model T and the student model S; vec() represents a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; L_dir represents the cosine similarity of the vectors converted from the corresponding key layers of the teacher model T and the student model S; λ represents a manually set hyper-parameter. Establishing the loss function in this way constrains both the similarity of the activation values output by the key layers of the student model S and the teacher model T and the similarity of the directions of the vectors output by those key layers.
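As a hedged illustration of the loss L₁＝L_val+λL_dir, the following NumPy sketch computes a value term and a direction term over a list of key-layer activations. The function name `distillation_loss`, the normalisation of each layer by N_i, the use of squared differences, and writing the direction term as one minus the cosine similarity are assumptions made for this sketch; the text states only that L_val sums Euclidean distances between corresponding activation values and that L_dir is based on the cosine similarity of the vectorized key-layer outputs.

```python
import numpy as np


def distillation_loss(teacher_acts, student_acts, lam=0.5):
    """Return (L1, L_val, L_dir) for two lists of key-layer activation arrays.

    teacher_acts[i] and student_acts[i] are the outputs of the i-th key layer.
    The exact normalisation and the '1 - cosine' form are assumptions.
    """
    l_val = 0.0
    l_dir = 0.0
    for a_t, a_s in zip(teacher_acts, student_acts):
        v_t = np.asarray(a_t, dtype=float).ravel()  # vec(): flatten to 1-D
        v_s = np.asarray(a_s, dtype=float).ravel()
        n_i = v_t.size                              # N_i: neurons in key layer i
        # Value term: squared distance between corresponding activations,
        # averaged over the N_i neurons of this layer (assumed normalisation).
        l_val += np.sum((v_t - v_s) ** 2) / n_i
        # Direction term: 1 - cosine similarity, so equal directions give 0.
        cos = v_t @ v_s / (np.linalg.norm(v_t) * np.linalg.norm(v_s) + 1e-12)
        l_dir += 1.0 - cos
    return l_val + lam * l_dir, l_val, l_dir
```

In this sketch a student whose key-layer outputs are a scaled copy of the teacher's is penalised only by the value term, while the direction term stays near zero, which mirrors the two constraints named in the text.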

Preferably, in step S4 the cross-image local feature alignment loss function is constructed on K normal image samples of the same training batch. During training of the distillation learning model, the alignment loss between the activation value map of each normal image sample (the first, the second, ..., the K-th) and the activation value maps of the other K-1 normal image samples, as output by the key layers of the VGG16 network structure of the teacher model T and the student model S, is calculated pixel by pixel in turn. The sum is then multiplied by 1/2 to eliminate the pairs counted twice, giving the cross-image local feature alignment loss function over the K normal image samples of the same training batch, where the expression is:

wherein L₂ represents the cross-image local feature alignment loss function over the K normal image samples of the same training batch. The alignment loss is calculated pixel by pixel across images on the activation value maps output by the key layers of the VGG16 network structure. Because an activation value map is obtained by convolving the originally input normal image sample with convolution kernels, one pixel position in the activation value map corresponds to a local feature of at least 3 x 3 pixels in the originally input normal image sample.
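The pairwise construction described for L₂ can be sketched as follows. This is a minimal illustration under stated assumptions, not the expression of the invention: the function name `alignment_loss`, the input layout (K, C, H, W) for the activation value maps of one key layer, and the mean squared difference used as the per-pixel alignment distance are all choices made for this sketch.

```python
import numpy as np


def alignment_loss(feats):
    """Cross-image alignment loss over one key layer (illustrative form).

    feats: array of shape (K, C, H, W), one activation value map per
    normal image sample in the training batch.
    """
    feats = np.asarray(feats, dtype=float)
    k = feats.shape[0]
    total = 0.0
    for a in range(k):            # each sample in turn ...
        for b in range(k):        # ... against the other K-1 samples
            if a == b:
                continue
            # Compare the two maps at the same spatial positions, pixel by
            # pixel (assumed distance: mean squared difference).
            total += np.mean((feats[a] - feats[b]) ** 2)
    return 0.5 * total            # every unordered pair was visited twice
```

The factor 1/2 plays the role described in the text: looping over ordered pairs counts each pair of images twice, and halving the sum removes the duplication.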

Preferably, the expression of the total loss function of the formed distillation learning model based on cross-image local feature alignment in step S5 is:

L_total＝L₁+γL₂

wherein L_total represents the total loss function of the distillation learning model based on cross-image local feature alignment; γ denotes a hyper-parameter set manually at training time.

Preferably, in step S6, the normal image samples in the training set are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the total loss function is used as the training guidance. When the teacher model T guides the training of the student model S, training is performed by back propagation and gradient descent. The output of a key layer of the student model S is the image features that the student model S extracts from the normal image samples, and it is fitted to the output of the corresponding key layer of the teacher model T, so that the student model S only acquires the ability to extract features of normal image samples. During training, when the total loss function L_total has not decreased for 20 training rounds, model training is complete and the trained distillation learning model is obtained.

Here, once the value of the total loss function L_total has stabilized and no longer decreases, the cross-image local feature alignment loss function L₂ has also converged, which ensures the consistency of cross-image local features, i.e. "alignment".
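The stopping rule for training (stop once L_total has not decreased for 20 training rounds) can be sketched as a simple patience counter. The function name `train_until_plateau` and the idea of feeding it a sequence of per-round loss values are hypothetical; in a real training loop each value would come from one round of back propagation and gradient descent on the student model S.

```python
def train_until_plateau(round_losses, patience=20):
    """Return (rounds_run, best_loss) under a patience-based stopping rule.

    round_losses: iterable yielding the value of L_total after each round.
    Training stops once the loss has failed to improve on its best value
    for `patience` consecutive rounds.
    """
    best = float("inf")
    rounds_without_improvement = 0
    rounds_run = 0
    for loss in round_losses:
        rounds_run += 1
        if loss < best:
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
            if rounds_without_improvement >= patience:
                break  # L_total has plateaued: training is complete
    return rounds_run, best
```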

Preferably, in step S7, when the trained distillation learning model is used to detect and locate defects in the image samples of the test set on the basis of the gradient of the total loss function with respect to the input sample, an input sample in the test set is denoted x, and the gradient map Λ of the total loss function with respect to the input sample is expressed as:

Λ＝∂L_total/∂x

The pixel gradient values of the input sample x are obtained directly by a single back propagation, as in the training process. Because the total loss function L_total of the distillation learning model based on cross-image local feature alignment includes the loss function L₁ of the distillation learning model and the cross-image local feature alignment loss function L₂, when the test set is used in step S7 and the input sample x is a normal image sample, L₁ and L₂ are both small and the gradient is small; the opposite holds when the input sample x is an abnormal image sample. A gradient contribution threshold ε is set. Pixel positions of the input sample x whose gradient contribution is larger than ε have large gradient values and correspond to the abnormal defect region; in other words, the abnormal defect region that causes the value of L_total to increase is obtained.

Detecting and locating defects in the image samples of the test set with the trained distillation learning model therefore involves two levels. First, for anomaly detection, the input sample is judged to be a normal or an abnormal image sample on the basis of the value of the total loss function L_total of the trained distillation learning model. Second, the gradient of the total loss function L_total with respect to the input sample x is meaningful: the pixels that contribute larger gradients, namely the region where the key-layer output difference between the teacher model T and the student model S and the local feature alignment loss are larger, form the detected abnormal region.
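The two levels can be sketched with a pair of thresholds. In this hedged example the image-level cut-off `score_threshold` is a hypothetical parameter (the text does not name one), `epsilon` stands for the gradient contribution threshold ε, and taking the absolute gradient value as the "gradient contribution" of a pixel is an assumption of the sketch.

```python
import numpy as np


def detect_and_localize(loss_value, grad_map, score_threshold, epsilon):
    """Return (is_abnormal, defect_mask) for one input sample x.

    loss_value: value of L_total for x (image-level anomaly score).
    grad_map:   2-D array holding the gradient of L_total w.r.t. each pixel.
    """
    # Level 1: image-level detection from the value of the total loss.
    is_abnormal = bool(loss_value > score_threshold)
    # Level 2: pixel-level localization; pixels whose gradient contribution
    # exceeds epsilon are marked as belonging to the abnormal region.
    contribution = np.abs(np.asarray(grad_map, dtype=float))
    defect_mask = contribution > epsilon
    return is_abnormal, defect_mask
```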

Preferably, when the gradient map of L_total is used for anomaly localization, this is implemented in combination with the SmoothGrad algorithm.
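SmoothGrad averages the gradient over several noisy copies of the input to obtain a less noisy gradient map. The sketch below illustrates that idea only; `grad_fn` is a hypothetical callable standing in for one back propagation that returns the gradient of L_total with respect to the input, and the sample count and absolute noise level are illustrative defaults rather than values given in the text.

```python
import numpy as np


def smooth_grad(grad_fn, x, n_samples=50, noise_std=0.1, seed=0):
    """Average grad_fn over n_samples Gaussian-perturbed copies of x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    acc = np.zeros_like(x)
    for _ in range(n_samples):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        acc += grad_fn(noisy)   # gradient evaluated at the perturbed input
    return acc / n_samples


# Toy check with L(x) = sum(x**2), whose gradient is 2*x.
x = np.array([[1.0, -2.0], [0.5, 3.0]])
smoothed = smooth_grad(lambda z: 2.0 * z, x, n_samples=200, noise_std=0.1)
```

For the toy loss the averaged gradient stays close to 2x; in practice `grad_fn` would wrap the trained teacher and student models.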

Preferably, in order to improve the accuracy of defect detection and positioning, the gradient map Λ is processed by Gaussian smoothing and a morphological opening operation, according to the following formulas:

M＝g_σ(Λ)

L_map＝(M⊖B)⊕B

wherein M represents the result of Gaussian smoothing of the gradient map Λ; B represents a binary map in the shape of an ellipse or a circle; ⊖ and ⊕ respectively represent the morphological erosion and dilation operations carried out with the structural element B, which together are also called the morphological opening operation; L_map represents the anomaly localization map obtained after Gaussian smoothing and morphological opening of the gradient map Λ. This process also reduces the noise of the gradient map Λ.
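The post-processing of the gradient map (Gaussian smoothing followed by erosion and then dilation with the structural element B) can be sketched in plain NumPy. The helper names, the value of σ, the truncation of the Gaussian kernel at 3σ, the edge padding and the disk radius are all assumptions of this sketch; an image-processing library would normally be used instead of these hand-written helpers.

```python
import numpy as np


def gaussian_smooth(img, sigma=1.0):
    """Separable Gaussian filter g_sigma with edge-replicating padding."""
    radius = max(1, int(3 * sigma))            # truncate kernel at 3 sigma
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(img, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def disk(radius):
    """Circular binary structural element B."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (x ** 2 + y ** 2) <= radius ** 2


def _morph(img, footprint, reduce_fn, pad_value):
    """Min (erosion) or max (dilation) over the footprint neighbourhood."""
    img = np.asarray(img, dtype=float)
    r = footprint.shape[0] // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    shifted = [padded[dy:dy + h, dx:dx + w]
               for dy, dx in zip(*np.nonzero(footprint))]
    return reduce_fn(np.stack(shifted), axis=0)


def opening(img, footprint):
    eroded = _morph(img, footprint, np.min, np.inf)      # M ⊖ B
    return _morph(eroded, footprint, np.max, -np.inf)    # (M ⊖ B) ⊕ B


def localization_map(grad_map, sigma=1.0, radius=1):
    """Gaussian smoothing followed by morphological opening."""
    return opening(gaussian_smooth(grad_map, sigma), disk(radius))
```

Opening removes responses smaller than the structural element while keeping larger connected regions, which is why isolated noisy pixels in the gradient map are suppressed.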

The invention also provides a defect detection and positioning system based on cross-image local feature alignment, which comprises the following components:

the distillation learning model building module is used for building a distillation learning model, and the distillation learning model comprises a teacher model T and a student model S;

a loss function determination module for determining a loss function of the distillation learning model;

the data set constructing and dividing module is used for constructing a data set and dividing the data set into a training set and a testing set, wherein the training set only contains normal image samples, and the testing set comprises normal image samples and abnormal image samples;

the alignment loss function construction module is used for constructing local feature alignment loss functions of cross images in a plurality of normal image samples in the same training batch;

the total loss function building module integrates the local feature alignment loss function with the loss function of the distillation learning model to form a total loss function of the distillation learning model based on cross-image local feature alignment;

the distillation learning model training module is used for inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, fixing the parameters of the teacher model T unchanged, taking the total loss function as training guidance, and guiding the student model S to train by using the teacher model T, so that the distillation learning model is trained, and the trained distillation learning model is obtained;

and the test module takes the image samples of the test set as input samples of the trained distillation learning model, and performs defect detection and positioning on the image samples of the test set by using the trained distillation learning model from the gradient of the total loss function relative to the input samples.

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

The invention provides a defect detection and positioning method and system based on cross-image local feature alignment. A distillation learning model is first built and its loss function is determined; cross-image local feature alignment information is then integrated into that loss function to form a total loss function; the distillation learning model is trained on the training set of the data set with the total loss function as training guidance; and finally the trained distillation learning model performs defect detection and positioning on the image samples of the test set, starting from the gradient of the total loss function with respect to the input samples. In this process the distillation learning model can use the pixel-to-pixel local correspondence across images of the training set to constrain the feature space, which ensures its sensitivity to fine-grained, local, pixel-level features and thereby improves the effect of anomaly detection and positioning of object surface defects.

Drawings

Fig. 1 is a schematic flowchart of a defect detection and positioning method based on cross-image local feature alignment according to embodiment 1 of the present invention;

fig. 2 is a schematic diagram showing a VGG16 network structure adopted by both the student model S and the teacher model T according to embodiment 1 of the present invention;

FIG. 3 is a schematic block diagram of the overall process of defect detection and localization based on cross-image local feature alignment proposed in embodiment 2 of the present invention;

fig. 4 is a structural diagram of a defect detection and localization system based on cross-image local feature alignment according to embodiment 3 of the present invention.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the patent;

for better illustration of the present embodiment, certain parts of the drawings may be omitted, enlarged or reduced, and do not represent actual dimensions;

it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.

The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;

the technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

Example 1

The invention provides a defect detection and positioning method based on cross-image local feature alignment in embodiment 1, wherein a flow schematic diagram of the method is shown in fig. 1, and the method specifically comprises the following steps:

In this embodiment, the teacher model T in the constructed distillation learning model adopts a VGG16 network structure loaded with weights pre-trained on ImageNet, and the student model S adopts the same VGG16 network structure as the teacher model T with randomly initialized weights. The network structure adopted by the teacher model T or the student model S is not limited to VGG16 and may be another network structure; the student model S may also adopt a network structure that is similar to, though not identical to, that of the teacher model T and more compact. Once the distillation learning model has been constructed, the loss function of the distillation learning model is determined, that is, step S2 is performed:

s2, determining a loss function of the distillation learning model;

In this embodiment, as shown in Fig. 2, the VGG16 network structure adopted by the teacher model T or the student model S includes several modules, each block in Fig. 2 representing one module. In step S2, the teacher model T and the student model S each take the last layer of each of the last four modules in their respective network structures as their key layers; Fig. 2 shows the structure of either the teacher model T or the student model S together with the extraction of the key layers. The loss function of the distillation learning model is:

L₁＝L_val+λL_dir

wherein the activation value of the j-th neuron in the i-th key layer is taken in the teacher model T and in the student model S respectively; N_cp represents the total number of key layers; L_val represents the sum of the Euclidean distances between corresponding activation values in each key layer of the teacher model T and the student model S; vec() represents a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; L_dir represents the cosine similarity of the vectors converted from the corresponding key layers of the teacher model T and the student model S; λ represents a manually set hyper-parameter.

After the loss function of the distillation learning model is determined, the cross-image local feature alignment information is introduced starting from the data set, so steps S3 and S4 are executed in turn:

In this embodiment, the constructed data sets are MVTecAD and Head-CT. MVTecAD is an industrial quality inspection data set covering 15 types of industrial products; each type of product is divided into a training set and a test set, the training set containing only normal image samples (about 300 per type) and the test set containing normal image samples, abnormal image samples of different kinds (about 30 per type), and binary maps labeling the abnormal regions of the abnormal image samples. Head-CT is a medical data set comprising 100 normal brain CT images and 100 diseased brain CT images.

In step S4, the cross-image local feature alignment loss function is constructed on K normal image samples of the same training batch. During training of the distillation learning model, the alignment loss between the activation value map of each normal image sample (the first, the second, ..., the K-th) and the activation value maps of the other K-1 normal image samples, as output by the key layers of the VGG16 network structure of the teacher model T and the student model S, is calculated pixel by pixel in turn. The sum is then multiplied by 1/2 to eliminate the pairs counted twice, giving the cross-image local feature alignment loss function over the K normal image samples of the same training batch, where the expression is as follows:

wherein L₂ represents the cross-image local feature alignment loss function over the K normal image samples of the same training batch. The alignment loss is calculated pixel by pixel across images on the activation value maps output by the key layers of the VGG16 network structure. Because an activation value map is obtained by convolving the originally input normal image sample with convolution kernels, one pixel position in the activation value map corresponds to a local feature of at least 3 x 3 pixels in the originally input normal image sample.

the expression of the overall loss function of the formed distillation learning model based on cross-image local feature alignment is as follows:

L_total＝L₁+γL₂

wherein L_total represents the total loss function of the distillation learning model based on cross-image local feature alignment; γ denotes a manually set hyper-parameter.

With the framework of the distillation learning model fixed and the total loss function based on cross-image local feature alignment formed, the distillation learning model is trained with the total loss function as training guidance, so that the model relies on cross-image feature alignment information to ensure its sensitivity to fine-grained, local, pixel-level features. The specific training process executes step S6:

The normal image samples in the training set are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the total loss function is used as the training guidance. When the teacher model T guides the training of the student model S, training is performed by back propagation and gradient descent. The output of a key layer of the student model S is the image features that the student model S extracts from the normal image samples, and it is fitted to the output of the corresponding key layer of the teacher model T, so that the student model S only acquires the ability to extract features of normal image samples. During training, model training is complete when the total loss function L_total has not decreased for 20 training rounds. Once the value of L_total has stabilized and no longer decreases, the cross-image local feature alignment loss function L₂ has also converged, which ensures the consistency of cross-image local features, i.e. "alignment", and the trained distillation learning model is obtained.

On the whole, a distillation learning model comprising a teacher model T and a student model S is first built and its loss function is determined. A data set is then built and divided into a training set and a testing set, and cross-image local feature alignment information from a plurality of normal image samples of the same training batch is fused into the loss function of the distillation learning model to form the final loss function. The normal image samples are input into the teacher model T and the student model S simultaneously, the parameters of the teacher model T are kept fixed, and the teacher model T guides the training of the student model S so that the student model S only acquires the ability to extract features of normal image samples. The model can thus use the local pixel correspondence across images of the training set to constrain the feature space, and the cross-image feature alignment information ensures the sensitivity of the model to fine-grained, local, pixel-level features, thereby ensuring the effect of anomaly detection and positioning of object surface defects.

In this embodiment, in the final step S7, when the trained distillation learning model is used to detect and locate defects in the image samples of the test set on the basis of the gradient of the total loss function with respect to the input sample, an input sample in the test set is denoted x, and the gradient map Λ of the total loss function with respect to the input sample is expressed as:

Λ＝∂L_total/∂x

The pixel gradient values of the input sample x are obtained directly by a single back propagation, as in the training process. Because the total loss function L_total of the distillation learning model based on cross-image local feature alignment includes the loss function L₁ of the distillation learning model and the cross-image local feature alignment loss function L₂, when the test set is used in step S7 and the input sample x is a normal image sample, L₁ and L₂ are both small and the gradient is small; the opposite holds when the input sample x is an abnormal image sample. A gradient contribution threshold ε is set. Pixel positions of the input sample x whose gradient contribution is larger than ε have large gradient values and correspond to the abnormal defect region; in other words, the abnormal defect region that causes the value of L_total to increase is obtained. Detecting and locating defects in the image samples of the test set with the trained distillation learning model therefore involves two levels. First, for anomaly detection, the input sample is judged to be a normal or an abnormal image sample on the basis of the value of the total loss function L_total of the trained distillation learning model. Second, the gradient of the total loss function L_total with respect to the input sample x is meaningful: the pixels that contribute larger gradients, namely the region where the key-layer output difference between the teacher model T and the student model S and the local feature alignment loss are larger, form the detected abnormal region.

When the gradient map of L_total is used for anomaly localization, this is implemented in combination with the SmoothGrad algorithm, which is a commonly used localization algorithm and is not described in detail here.

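In order to improve the accuracy of defect detection and localization, the gradient map Λ is processed by Gaussian smoothing and a morphological opening operation, according to the following formulas:

M = g_σ(Λ)

L_map = (M ⊖ B) ⊕ B

wherein M represents the result of Gaussian smoothing of the gradient map Λ; B represents the structuring element, a binary map in the shape of an ellipse or a circle; ⊖ and ⊕ respectively represent the morphological erosion and dilation operations performed with the structuring element B, which together constitute the morphological opening operation; and L_map represents the anomaly localization map obtained after applying Gaussian smoothing and the morphological opening operation to the gradient map Λ.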
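The post-processing described above (Gaussian smoothing of the gradient map followed by a morphological opening with an elliptical or circular structuring element) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names, the kernel radius of about 3σ, the edge padding and the disk radius are all hypothetical choices, and an image-processing library would normally be used instead of the explicit loops.

```python
import numpy as np

def gaussian_smooth(grad_map, sigma=1.0):
    """M = g_sigma(Lambda): separable Gaussian blur with edge padding."""
    radius = max(1, int(round(3 * sigma)))
    ax = np.arange(-radius, radius + 1)
    kernel = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(grad_map, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def disk(radius):
    """Circular binary structuring element B."""
    ax = np.arange(-radius, radius + 1)
    return ax[:, None] ** 2 + ax[None, :] ** 2 <= radius ** 2

def _filter(img, footprint, reduce_fn, pad_value):
    """Apply reduce_fn over the footprint-shaped neighbourhood of each pixel."""
    img = np.asarray(img, dtype=float)
    size = footprint.shape[0]
    padded = np.pad(img, size // 2, mode="constant", constant_values=pad_value)
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = reduce_fn(padded[i:i + size, j:j + size][footprint])
    return out

def erode(img, footprint):
    return _filter(img, footprint, np.min, np.inf)    # neighbourhood minimum

def dilate(img, footprint):
    return _filter(img, footprint, np.max, -np.inf)   # neighbourhood maximum

def anomaly_map(grad_map, sigma=1.0, radius=1):
    """Opening (erosion, then dilation) of the Gaussian-smoothed gradient map."""
    m = gaussian_smooth(grad_map, sigma)
    b = disk(radius)
    return dilate(erode(m, b), b)
```

The opening removes isolated high-gradient pixels that are smaller than the structuring element while retaining larger connected regions, which is the effect the post-processing is intended to achieve.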

Example 2

In this embodiment, the implementation of the defect detection and localization method described in embodiment 1 is further illustrated in schematic block-diagram form. Referring to fig. 3, the input samples of one batch are nut image samples; as can be seen from fig. 3, they include normal image samples and a defective abnormal image sample (the last one shown). The nut image samples are fed as input into the teacher model T and the student model S of the distillation learning model. The alignment loss and the original key-layer output loss are fused by summation into the total loss function L_total, which thus comprises the loss function L₁ of the distillation learning model (key-layer output loss) and the cross-image local feature alignment loss function L₂ (alignment loss). The whole distillation learning model is trained by gradient descent and back propagation with the total loss function as training guidance, and an anomaly localization gradient heat map and a binary map of the abnormal region are obtained through gradient back propagation, thereby completing defect anomaly detection and localization.

Example 3

As shown in fig. 3, in order to implement the methods in embodiments 1 and 2, this embodiment proposes a defect detection and positioning system based on cross-image local feature alignment, including:

the distillation learning model building module 101 is used for building a distillation learning model, and the distillation learning model comprises a teacher model T and a student model S;

a loss function determination module 102 for determining a loss function of the distillation learning model;

the data set constructing and dividing module 103 is used for constructing a data set and dividing the data set into a training set and a test set, wherein the training set only contains normal image samples, and the test set comprises normal image samples and abnormal image samples;

an alignment loss function construction module 104, configured to construct a local feature alignment loss function across images in multiple normal image samples in the same training batch;

a total loss function constructing module 105, which integrates the local feature alignment loss function with the loss function of the distillation learning model to form a total loss function of the distillation learning model based on the cross-image local feature alignment;

the distillation learning model training module 106 is used for inputting the normal image samples in the training set into the teacher model T and the student model S at the same time, fixing the parameters of the teacher model T unchanged, taking the total loss function as training guidance, and guiding the student model S to train by using the teacher model T, so as to train the distillation learning model and obtain a trained distillation learning model;

the testing module 107 takes the image samples of the test set as input samples of the trained distillation learning model, and performs defect detection and location on the image samples of the test set by using the trained distillation learning model based on the gradient of the total loss function with respect to the input samples.

It should be understood that the above embodiments of the present invention are merely examples given to illustrate the invention clearly and are not intended to limit its implementation. Other variations and modifications in different forms will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A defect detection and positioning method based on cross-image local feature alignment is characterized by comprising the following steps:

S2, determining a loss function of the distillation learning model;

2. The method for defect detection and localization based on cross-image local feature alignment of claim 1, wherein, in the distillation learning model constructed in step S1, the teacher model T adopts a VGG16 network structure loaded with weights pre-trained on ImageNet, and the student model S adopts the same VGG16 network structure as the teacher model T with randomly initialized weights.

3. The method of claim 2, wherein the VGG16 network structure includes several modules; in step S2, the teacher model T and the student model S each take the last layer of each of the last four modules in their respective network structures as their respective key layers, and the loss function of the distillation learning model is:

L₁ = L_val + λL_dir

wherein the student-model activation symbol represents the activation value of the j-th neuron in the i-th key layer of the student model S; N_cp represents the total number of key layers; L_val represents the sum of the Euclidean distances between the corresponding activation values in each key layer of the teacher model T and the student model S; vec(·) represents a vectorization function that converts a matrix of arbitrary dimension into a one-dimensional vector; L_dir represents the cosine similarity of the vectors converted from the corresponding key layers of the teacher model T and the student model S; and λ represents a manually set hyperparameter.
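By way of a hedged illustration only, a loss of the form L₁ = L_val + λL_dir over a list of key-layer activations might be sketched as below. The exact expressions for L_val and L_dir are not reproduced in this text, so the sketch makes its own assumptions: L_val is taken as the per-layer mean squared difference of activation values, L_dir as one minus the cosine similarity of the vectorized activations, and `lam=0.5` is an arbitrary placeholder for λ. The function name `distillation_loss` is hypothetical.

```python
import numpy as np

def distillation_loss(teacher_acts, student_acts, lam=0.5):
    """Sketch of L1 = L_val + lam * L_dir over paired key-layer activations.

    teacher_acts / student_acts: sequences of arrays, one per key layer.
    Assumed forms (not taken from the source): mean squared difference
    for L_val, and 1 - cosine similarity for L_dir.
    """
    l_val = 0.0
    l_dir = 0.0
    for a_t, a_s in zip(teacher_acts, student_acts):
        t = np.ravel(a_t).astype(float)   # vec(.) of the teacher key layer
        s = np.ravel(a_s).astype(float)   # vec(.) of the student key layer
        l_val += np.mean((t - s) ** 2)
        cos = np.dot(t, s) / (np.linalg.norm(t) * np.linalg.norm(s) + 1e-12)
        l_dir += 1.0 - cos
    return l_val + lam * l_dir
```

Under these assumptions the value term penalizes differences in activation magnitude and the direction term penalizes differences in activation direction, with λ balancing the two.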

4. The method for defect detection and localization based on cross-image local feature alignment of claim 3, wherein, in step S4, the cross-image local feature alignment loss function is constructed over K normal image samples in the same training batch: during training of the distillation learning model, the alignment loss between the activation maps output at the key layers of the VGG16 network structures of the teacher model T and the student model S is calculated pixel by pixel, in turn, between the first, second, …, K-th normal image sample and each of the other K-1 normal image samples; the result is then multiplied by 1/2 to cancel the pairs counted twice, giving the cross-image local feature alignment loss function for the K normal image samples in the same training batch, whose expression is:

wherein L₂ represents the cross-image local feature alignment loss function over the K normal image samples in the same training batch. Because the alignment loss is calculated across images, pixel by pixel, on the activation maps output by the key layers of the VGG16 network structure, and each activation map is obtained by convolving the original input normal image sample with convolution kernels, one pixel position in an activation map corresponds to a local feature covering at least 3 × 3 pixels of the original input normal image sample.
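The pairwise structure of such a loss, including the factor 1/2 that cancels double counting, can be illustrated with the following sketch. It is an assumption-laden example rather than the expression of this claim: the per-pixel distance is taken to be a squared difference, the input `acts` is assumed to be a stack of key-layer activation maps of shape (K, C, H, W) from one model, and the name `cross_image_alignment_loss` is hypothetical.

```python
import numpy as np

def cross_image_alignment_loss(acts):
    """Pairwise, pixel-wise alignment loss over K activation maps.

    acts: array of shape (K, C, H, W), one key-layer activation map per
    normal image in the batch. Every ordered pair (a, b) with a != b is
    visited, so each unordered pair is counted twice; the factor 1/2
    removes the duplicate.
    """
    acts = np.asarray(acts, dtype=float)
    k = acts.shape[0]
    total = 0.0
    for a in range(k):
        for b in range(k):
            if a != b:
                total += np.sum((acts[a] - acts[b]) ** 2)
    return 0.5 * total
```

Because each activation-map position summarizes a local patch of the input, comparing maps position by position compares local features across images rather than whole-image features.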

5. The method for detecting and locating defects based on cross-image local feature alignment according to claim 4, wherein the expression of the total loss function of the formed distillation learning model based on cross-image local feature alignment in step S5 is as follows:

L_total = L₁ + γL₂

6. The method for defect detection and localization based on cross-image local feature alignment of claim 5, wherein, in step S6, the normal image samples in the training set are input simultaneously into the teacher model T and the student model S, the parameters of the teacher model T are kept fixed, and the total loss function is used as training guidance; when the teacher model T guides the training of the student model S, training is performed by back propagation and gradient descent; the key-layer outputs of the student model S are the image features that the student model S extracts from the normal image samples and are fitted to the key-layer outputs of the teacher model T, so that the student model S acquires the ability to extract features of normal image samples only; and, during training, when the total loss function L_total has not decreased for 20 training epochs, model training is complete and the trained distillation learning model is obtained.
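The stopping rule stated above (training ends once the total loss has not decreased for 20 epochs) and the frozen-teacher arrangement can be sketched as follows. The class `EarlyStopping`, the function `train_student`, the scalar one-weight "models", the learning rate and the fixed starting weight are all hypothetical simplifications chosen so that the example is self-contained; they are not the VGG16 training procedure itself.

```python
class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs."""

    def __init__(self, patience=20):
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def update(self, loss):
        """Record one epoch's loss; return True when training should stop."""
        if loss < self.best:
            self.best = loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience

def train_student(xs, teacher_w, lr=0.1, max_epochs=1000, patience=20):
    """Fit a one-weight student to a frozen one-weight teacher (toy example)."""
    student_w = 0.0  # fixed start instead of random init, for reproducibility
    stopper = EarlyStopping(patience)
    epochs_run = 0
    for _ in range(max_epochs):
        epochs_run += 1
        # The teacher is frozen: teacher_w is read but never updated.
        residuals = [teacher_w * x - student_w * x for x in xs]
        loss = sum(r * r for r in residuals) / len(xs)
        if stopper.update(loss):
            break
        grad = sum(-2.0 * x * r for x, r in zip(xs, residuals)) / len(xs)
        student_w -= lr * grad  # gradient-descent step on the student only
    return student_w, epochs_run
```

In this toy setting the student weight converges to the teacher weight and the loop exits through the patience rule rather than through the epoch limit.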

7. The method for detecting and locating defects based on cross-image local feature alignment as claimed in claim 6, wherein, in step S7, when the trained distillation learning model is used to detect and localize defects in the image samples of the test set on the basis of the gradient of the total loss function with respect to the input sample, an input sample of the test set is denoted x and the gradient map Λ of the total loss function with respect to the input sample is obtained, expressed as:

Λ = ∂L_total/∂x

The gradient value of each pixel of the input sample x is obtained directly by a single back-propagation pass, as in the training process. Because the total loss function L_total of the distillation learning model based on cross-image local feature alignment comprises the loss function L₁ of the distillation learning model and the cross-image local feature alignment loss function L₂, when the test set is used for testing in step S7, L₁ and L₂ are both small and the gradient is small when the input sample x is a normal image sample, and the opposite holds when the input sample x is an abnormal image sample. A gradient contribution threshold ε is set; pixel positions of the input sample x whose gradient contribution exceeds ε exhibit large gradient values and correspond to the abnormal defect region, that is, the region that causes the value of L_total to increase.

8. The method for defect detection and localization based on cross-image local feature alignment of claim 7, wherein, when the gradient map of L_total is used for anomaly localization, the localization is implemented in combination with the SmoothGrad algorithm.

9. The method for detecting and positioning defects based on cross-image local feature alignment according to claim 7 or 8, wherein, in order to improve the accuracy of defect detection and positioning, the gradient map Λ is processed by Gaussian smoothing and a morphological opening operation, according to the following formulas:

M = g_σ(Λ)

L_map = (M ⊖ B) ⊕ B

wherein ⊖ and ⊕ respectively represent the morphological erosion and dilation operations performed with the structuring element B, which together constitute the morphological opening operation; and L_map represents the anomaly localization map obtained after applying Gaussian smoothing and the morphological opening operation to the gradient map Λ.

10. A system for defect detection and localization based on cross-image local feature alignment, comprising: