WO2023015843A1

WO2023015843A1 - Anomaly detection method and apparatus, electronic device, computer readable storage medium, computer program, and computer program product

Info

Publication number: WO2023015843A1
Application number: PCT/CN2022/071448
Authority: WO
Inventors: Yang Kai (杨凯); You Zhiyuan (尤志远); Cui Lei (崔磊)
Original assignee: Shanghai SenseTime Intelligent Technology Co., Ltd. (上海商汤智能科技有限公司)
Priority date: 2021-08-13
Filing date: 2022-01-11
Publication date: 2023-02-16
Also published as: CN113688889A

Abstract

An anomaly detection method and apparatus, an electronic device, a computer readable storage medium, a computer program, and a computer program product. The method comprises: training an initial detection network using a first sample set, to obtain a detection network, the first sample set being a positive sample set (S101); during a process in which the detection network performs anomaly detection on a plurality of images to be detected, obtaining a second sample set on the basis of a detected normal image set and abnormal image set, the second sample set being an incremental sample set comprising a positive sample and a negative sample, wherein the negative sample is an abnormal image, in which an anomaly exists, among the images (S102); and updating and training the detection network using the second sample set, to obtain an updated detection network (S103). According to the method, the accuracy and flexibility of anomaly detection can be improved.

Description

Anomaly detection method, device, electronic device, computer readable storage medium, computer program and computer program product

Cross-Reference to Related Application

This disclosure is based on and claims priority to Chinese patent application No. 202110932191.5, filed on August 13, 2021 and entitled "Anomaly detection method, device, electronic equipment and computer-readable storage medium", the entire content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to machine vision technology, and in particular to an anomaly detection method, device, electronic equipment, computer-readable storage medium, computer program and computer program product.

Background

In recent years, deep learning algorithms have made great progress in many fields and have been deployed in many industrial visual inspection applications. In the related art, an anomaly detection model is trained with a deep learning algorithm to detect abnormal images. Deep learning algorithms usually require a large number of high-quality training samples. However, methods that learn from normal samples and methods that learn from abnormal samples currently use different model structures and learning strategies, so a given deep learning method is not suited to different types of training samples. This poor model compatibility restricts training to a single type of sample, which reduces both the flexibility and the accuracy of anomaly detection.

Summary

Embodiments of the present disclosure provide an anomaly detection method, device, electronic equipment, computer-readable storage medium, computer program and computer program product, which can improve the accuracy and flexibility of anomaly detection.

The technical solutions of the embodiments of the present disclosure are implemented as follows:

An embodiment of the present disclosure provides an anomaly detection method, including: training an initial detection network with a first sample set to obtain a detection network, the first sample set being a positive sample set; during a process in which the detection network performs anomaly detection on a plurality of images to be detected, obtaining a second sample set based on a detected normal image set and a detected abnormal image set, the second sample set being an incremental sample set including positive samples and negative samples, wherein a negative sample is an abnormal image in which an anomaly exists; and updating and training the detection network with the second sample set to obtain an updated detection network.

An embodiment of the present disclosure provides an anomaly detection device, including: a first training part, configured to train an initial detection network with a first sample set to obtain a detection network, the first sample set being a positive sample set; an acquisition part, configured to obtain, during a process in which the detection network performs anomaly detection on a plurality of images to be detected, a second sample set based on a detected normal image set and a detected abnormal image set, the second sample set being an incremental sample set including positive samples and negative samples, wherein a negative sample is an abnormal image in which an anomaly exists; and a second training part, configured to update and train the detection network with the second sample set to obtain an updated detection network.

An embodiment of the present disclosure provides an electronic device, including: a memory configured to store executable instructions; and a processor configured to implement the above anomaly detection method when executing the executable instructions stored in the memory.

An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above anomaly detection method.

An embodiment of the present disclosure provides a computer program including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes the steps of the above anomaly detection method.

An embodiment of the present disclosure provides a computer program product including computer program instructions that cause a computer to execute the steps of the above anomaly detection method.

Embodiments of the present disclosure have the following beneficial effects:

A positive sample set is used to train the initial detection network to obtain the detection network. While the detection network performs anomaly detection on a plurality of images to be detected, an incremental sample set including positive samples and negative samples is obtained based on the detected normal image set and abnormal image set, where a negative sample is an abnormal image in which an anomaly exists. The incremental sample set is then used to update and train the detection network, yielding the updated detection network. The updated detection network is therefore better adapted to the actual detection scenario and has higher detection accuracy, so the results obtained when it is used to detect anomalies in images to be detected are more accurate. At the same time, because the incremental samples include both positive and negative samples, the detection network can be trained with both kinds of sample and is compatible with either, which improves the versatility and flexibility of anomaly detection.

Description of Drawings

FIG. 1 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of an exemplary negative sample set including 3 negative samples provided by an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic flow diagram of obtaining a multi-scale feature sequence of a target image I provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of the structure of a reconstruction network with an encoder-decoder structure provided by an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of an exemplary process for obtaining image reconstruction features provided by an embodiment of the present disclosure;

FIG. 7 is a schematic diagram of the effect of an abnormality detection image A' obtained by exemplary anomaly detection on a target image A, provided by an embodiment of the present disclosure;

FIG. 8 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 9 is a schematic flowchart of obtaining the training loss corresponding to a positive sample from the sample feature difference sequence between the reconstructed sample feature sequence and the multi-scale sample feature sequence, provided by an embodiment of the present disclosure;

FIG. 10 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 11 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 12 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure;

FIG. 13 is a schematic flowchart of an exemplary process of updating the reconstruction network, provided by an embodiment of the present disclosure;

FIG. 14 is a schematic structural diagram of an abnormality detection device provided by an embodiment of the present disclosure;

FIG. 15 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. All other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

In the following description, "some embodiments" describes subsets of all possible embodiments; it should be understood that "some embodiments" may refer to the same subset or to different subsets of all possible embodiments, and these may be combined with each other where no conflict arises.

In the following description, the terms "first/second/third" are used only to distinguish similar objects and do not denote a specific ordering of those objects. It should be understood that, where permitted, the specific order or sequence denoted by "first/second/third" may be interchanged, so that the embodiments of the present disclosure described herein can be implemented in an order other than that illustrated or described herein.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terms used herein are only for the purpose of describing the embodiments of the present disclosure, and are not intended to limit the present disclosure.

An embodiment of the present disclosure provides an anomaly detection method that can improve the accuracy and flexibility of anomaly detection. The method is applied to an electronic device, which can be implemented as various types of user terminals, such as AR glasses, a notebook computer, a tablet computer, a desktop computer, a set-top box, or a mobile device (for example, a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, or a portable game device), or as a server.

FIG. 1 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure; the method is described with reference to the steps shown in FIG. 1.

S101. Train the initial detection network with the first sample set to obtain the detection network, the first sample set being a positive sample set.

The anomaly detection method of the embodiments of the present disclosure is applicable to scenarios in which anomalies are discovered through image detection. For example, anomaly detection can be performed on images of products on a production line to find abnormal products, such as defective ones; or anomaly detection can be performed on a batch of images to find abnormal images that do not conform to a preset overall specification. The scenario can be selected according to the actual situation and is not limited by the embodiments of the present disclosure.

In the embodiments of the present disclosure, the positive sample set contains at least one positive sample. A positive sample is an image in which every pixel is normal. An image containing an abnormal part is called an abnormal image; the counterpart of a positive sample is a negative sample, which is an abnormal image. Exemplarily, FIG. 2 shows a negative sample set containing three negative samples: negative sample 2-1 is an image of a germinated seed, negative sample 2-2 is an image of a toothbrush with abnormal bristles at the tip of the brush head, and negative sample 2-3 is an image of a broken nut. Regions 2-10, 2-20, and 2-30 in FIG. 2 mark the regions containing anomalies (i.e., abnormal pixels) in negative samples 2-1, 2-2, and 2-3, respectively.
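The positive/negative distinction above can be expressed in terms of a per-pixel anomaly mask. The following is a minimal illustrative sketch, assuming a hypothetical binary mask in which 1 marks an abnormal pixel; the function name `is_positive_sample` is not part of the disclosure.

```python
from typing import List


def is_positive_sample(mask: List[List[int]]) -> bool:
    """Return True when no pixel of the (hypothetical) anomaly mask is abnormal.

    mask[y][x] == 1 marks an abnormal pixel and 0 a normal one. A positive
    sample has no abnormal pixel; a single abnormal pixel makes the image an
    abnormal image, i.e. a negative sample.
    """
    return all(value == 0 for row in mask for value in row)


normal_mask = [[0, 0], [0, 0]]
broken_nut_mask = [[0, 1], [0, 0]]  # one abnormal pixel, e.g. part of a breakage region
print(is_positive_sample(normal_mask))      # True
print(is_positive_sample(broken_nut_mask))  # False
```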

In the embodiment of the present disclosure, the electronic device may use a positive sample set including at least one positive sample to continuously train the initial detection network until a trained detection network is obtained.

In some embodiments, the initial detection network may consist of an initial feature extraction network and an initial reconstruction network. The electronic device may continuously train the initial feature extraction network and the initial reconstruction network, jointly or separately, to obtain a trained feature extraction network and a trained reconstruction network, which together serve as the detection network. In other embodiments, the initial detection network may consist of a pre-trained feature extraction network and an initial reconstruction network; the electronic device then trains only the initial reconstruction network to obtain a trained reconstruction network, and uses the pre-trained feature extraction network together with the trained reconstruction network as the detection network.

In some embodiments, the electronic device may use a deep convolutional neural network (CNN) containing multiple feature extraction layers, such as ResNet-34, as the feature extraction network.

In some embodiments, the reconstruction network may be implemented as a neural network with an encoder-decoder (Transformer) structure. The Transformer is a deep neural network with a global attention mechanism that first appeared in natural language processing; owing to its strong representation capability, it has in recent years been widely applied to computer vision tasks. Transformers have not so far been applied to anomaly detection scenarios, but the embodiments of the present disclosure extend them to this setting: the Transformer captures global attention over the input sequence, deepening the reconstruction network's understanding of semantic information so that feature reconstruction is performed on the basis of an understanding of the image's deep semantic features.

S102. While the detection network performs anomaly detection on a plurality of images to be detected, obtain a second sample set based on the detected normal image set and abnormal image set, the second sample set being an incremental sample set including positive samples and negative samples, where a negative sample is an abnormal image in which an anomaly exists.

In the embodiments of the present disclosure, the image to be detected may be any image, for example an image of a product on the above-mentioned production line; this is not limited in the embodiments of the present disclosure. In some embodiments, the electronic device may capture an image of an item to be detected through an image acquisition device, for example by photographing products on a production line, and perform anomaly detection on that image with the method of the embodiments of the present disclosure to determine whether the item is abnormal. Alternatively, the electronic device may obtain images to be detected directly from other devices. The electronic device may acquire one or more images to be detected at a time, which is not limited in the embodiments of the present disclosure.

In the embodiments of the present disclosure, the electronic device may use the detection network trained with the positive sample set to perform anomaly detection on a plurality of images to be detected, obtaining a normal image set and an abnormal image set, and form the incremental sample set from them; the normal image set contains at least one normal image, and the abnormal image set contains at least one abnormal image.

In some embodiments, the electronic device may use all detected normal images as the normal image set and all detected abnormal images as the abnormal image set. In other embodiments, the electronic device may select some of the detected normal images and abnormal images to form the normal image set and the abnormal image set; this is not limited in the embodiments of the present disclosure.
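As a non-authoritative sketch of how the detected images could be turned into the incremental sample set, the code below assumes that the detection network yields a hypothetical image-level anomaly score per image and that a preset threshold separates normal from abnormal images. The `Sample` type, `build_incremental_set`, and the "most confident first" selection rule for the subset case are illustrative assumptions, not details from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    image_id: str
    label: int  # 0 = positive sample (normal image), 1 = negative sample (abnormal image)


def build_incremental_set(
    image_scores: Dict[str, float],
    threshold: float,
    max_per_class: Optional[int] = None,
) -> List[Sample]:
    """Split detections into normal/abnormal sets and form the incremental sample set.

    image_scores maps a hypothetical image id to an image-level anomaly score.
    Images scoring <= threshold are treated as detected normal images, the rest
    as detected abnormal images. If max_per_class is given, only the most
    confident images of each set are kept (an illustrative selection rule).
    """
    normal = sorted((k for k, s in image_scores.items() if s <= threshold),
                    key=lambda k: image_scores[k])       # lowest score first
    abnormal = sorted((k for k, s in image_scores.items() if s > threshold),
                      key=lambda k: -image_scores[k])    # highest score first
    if max_per_class is not None:
        normal, abnormal = normal[:max_per_class], abnormal[:max_per_class]
    return [Sample(k, 0) for k in normal] + [Sample(k, 1) for k in abnormal]


scores = {"img_a": 0.1, "img_b": 0.9, "img_c": 0.3, "img_d": 0.7}
for sample in build_incremental_set(scores, threshold=0.5):
    print(sample.image_id, sample.label)
```

Passing `max_per_class=None` corresponds to using all detected images; a finite value corresponds to selecting a subset.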

S103. Using the second sample set, update and train the detection network to obtain an updated detection network.

In the embodiments of the present disclosure, the electronic device may continue to update and train the detection network with the incremental sample set including positive samples and negative samples until the loss reaches a preset loss threshold, and use the resulting trained detection network as the updated detection network.
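The stopping rule above (train until the loss reaches a preset threshold) can be sketched as a generic loop. This is a minimal sketch under stated assumptions: `train_step` is a hypothetical callable standing in for one optimisation step on the incremental sample set, and the `max_steps` safeguard is an addition that the disclosure does not mention.

```python
from typing import Callable, List


def update_train(
    train_step: Callable[[int], float],
    loss_threshold: float,
    max_steps: int = 10_000,
) -> List[float]:
    """Repeat update-training steps until the loss reaches the preset threshold.

    train_step is a hypothetical callable that runs one optimisation step on a
    batch drawn from the incremental sample set and returns that step's loss.
    max_steps is an added safeguard against non-convergence, not part of the
    disclosure.
    """
    history: List[float] = []
    for step in range(max_steps):
        loss = train_step(step)
        history.append(loss)
        if loss <= loss_threshold:  # the loss has reached the preset threshold
            break
    return history


# Toy stand-in for a real training step: the loss halves at every step.
losses = update_train(lambda step: 1.0 / 2 ** step, loss_threshold=0.1)
print(len(losses), losses[-1])  # 5 0.0625
```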

In some embodiments, S104 to S107 may also be performed after S103. FIG. 3 is a schematic flowchart of an optional anomaly detection method provided by an embodiment of the present disclosure; the method is described with reference to the steps shown in FIG. 3.

S104. Perform feature processing at different scales on the target image with the updated feature extraction network to obtain a multi-scale feature sequence, where the updated detection network includes an updated feature extraction network and an updated reconstruction network.

In the embodiment of the present disclosure, the target image may be any image, for example, may be the image of the above-mentioned product on the production line, which is not limited in the embodiment of the present disclosure.

In the embodiments of the present disclosure, when the updated detection network includes the updated feature extraction network and the updated reconstruction network, the electronic device can use the updated feature extraction network to extract features of the target image at multiple scales, fuse these features into a multi-scale feature, and reshape the multi-scale feature, converting the multi-dimensional feature information into sequence form, to obtain the multi-scale feature sequence. For example, when ResNet-34 is used as the feature extraction network, the electronic device can use the updated ResNet-34 to extract multi-scale features from the target image: the low-level feature extraction layers of ResNet-34 capture appearance features of the target image, such as color, boundaries, contrast, and brightness, while the high-level layers output high-level semantic information. The electronic device treats the appearance features and the high-level semantic information as features of different scales and combines them to obtain the multi-scale feature.

Exemplarily, as shown in FIG. 4, the electronic device uses the updated CNN as the feature extraction network to extract features of the target image I at four scales, obtaining the features B(I):

$$B(I) = \{f_1, f_2, f_3, f_4\}, \qquad f_i \in \mathbb{R}^{H_i \times W_i \times C_i}$$

Here, $f_i$ is the feature output by the i-th network layer of the CNN. As FIG. 4 shows, the features extracted by different network layers have different sizes. The electronic device can first resize them to a common size, for example H×W, and then fuse the four resized features to obtain the multi-scale feature f, as shown in formula (1):

$$f = \mathrm{Concat}\big(R(f_1), R(f_2), R(f_3), R(f_4)\big) \in \mathbb{R}^{H \times W \times C} \qquad (1)$$

In formula (1), R denotes the resizing operation, Concat denotes concatenation along the channel dimension, and $C = \sum_i C_i$ is the number of channels of the multi-scale feature f. The features obtained from different feature extraction layers have receptive fields of different sizes and are therefore sensitive to abnormal regions of different extents in the target image; performing anomaly detection on the multi-scale feature thus helps improve detection accuracy.
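The resize-and-concatenate fusion described above (a resizing operation R followed by channel concatenation, so that $C = \sum_i C_i$) can be sketched with plain Python lists. This sketch assumes nearest-neighbour resizing and a `[y][x][channel]` layout; the disclosure does not specify the interpolation method, and the function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [y][x][channel]


def resize_nearest(feat: FeatureMap, out_h: int, out_w: int) -> FeatureMap:
    """Resizing operation R: nearest-neighbour resampling to out_h x out_w."""
    in_h, in_w = len(feat), len(feat[0])
    return [
        [list(feat[y * in_h // out_h][x * in_w // out_w]) for x in range(out_w)]
        for y in range(out_h)
    ]


def fuse_multiscale(features: List[FeatureMap], out_h: int, out_w: int) -> FeatureMap:
    """Resize every f_i to out_h x out_w and concatenate along channels (C = sum of C_i)."""
    resized = [resize_nearest(f, out_h, out_w) for f in features]
    return [
        [[c for r in resized for c in r[y][x]] for x in range(out_w)]
        for y in range(out_h)
    ]


f1 = [[[0.0], [1.0]], [[2.0], [3.0]]]  # 2 x 2 x 1, a shallow-layer feature
f2 = [[[5.0, 6.0]]]                    # 1 x 1 x 2, a deep-layer feature
fused = fuse_multiscale([f1, f2], out_h=2, out_w=2)
print(fused[1][0])  # [2.0, 5.0, 6.0]
```

In a real network the deeper, smaller feature maps would typically be upsampled (e.g. bilinearly) rather than resampled by nearest neighbour; the list form is only meant to make the channel bookkeeping visible.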

In the embodiments of the present disclosure, the electronic device can reshape the multi-scale feature into sequence form to serve as the input of the subsequent reconstruction network. In some embodiments, as shown in FIG. 4, for a multi-scale feature f of dimension H×W×C, the electronic device can flatten the H×W spatial dimensions of each channel into one dimension, obtaining H×W C-dimensional multi-scale feature vectors that form the multi-scale feature sequence.
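The reshape described above, and the inverse reshape later used to turn a reconstructed sequence back into a feature map, can be sketched as follows. The row-major token order is an assumption; any fixed order works as long as the forward and inverse reshapes agree. The function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [y][x][channel]
Sequence = List[List[float]]          # [position][channel]


def to_sequence(feat: FeatureMap) -> Sequence:
    """Flatten an H x W x C feature into H*W tokens of dimension C (row-major order)."""
    return [list(vec) for row in feat for vec in row]


def to_feature_map(seq: Sequence, h: int, w: int) -> FeatureMap:
    """Inverse reshape: H*W tokens of dimension C back to H x W x C."""
    if len(seq) != h * w:
        raise ValueError("sequence length must equal h * w")
    return [seq[y * w:(y + 1) * w] for y in range(h)]


f = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # H = W = C = 2
f_s = to_sequence(f)
print(f_s)  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(to_feature_map(f_s, 2, 2) == f)  # True
```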

In some embodiments, the multi-scale feature sequence may be a feature word sequence made up of word vectors of any form, such as a word2vec sequence, a GloVe sequence, or a one-hot encoding sequence; this can be selected according to the actual situation and is not limited in the embodiments of the present disclosure.

S105. Using the updated reconstruction network, reconstruct the multi-scale feature sequence and the preset query word sequence to obtain the reconstructed feature sequence.

In the embodiments of the present disclosure, the electronic device can use a sequence-to-sequence conversion model as the updated reconstruction network: the multi-scale feature sequence serves as guidance information, and the preset query word sequence is reconstructed under this guidance to obtain the reconstructed feature sequence, on the basis of which anomaly detection is then performed.

In some embodiments, the updated reconstruction network may be implemented as a neural network with an encoder-decoder (Transformer) structure. Note that the updated reconstruction network has the same structure as the reconstruction network trained with the positive sample set, but different network parameters. In some embodiments, if the multi-scale feature sequence $f_s$ were reconstructed directly by the Transformer reconstruction network, a single-input, single-output reconstructed feature sequence would be obtained:

$$\hat{f}_s = R(f_s)$$

where R denotes the reconstruction operation. However, this single-input, single-output approach tends to make the reconstruction network learn only a simple identity mapping, and it struggles to capture the task objective from the input samples. The model then merely reproduces its input, whether the input is a normal or an abnormal sample, and never learns to distinguish abnormal images. The embodiments of the present disclosure therefore introduce a learnable query word sequence and feed the query word sequence and the multi-scale feature sequence to the reconstruction network as dual inputs for feature reconstruction, obtaining the reconstructed feature sequence.

In some embodiments, the reconstructed feature sequence obtained in this dual-input, single-output manner can be expressed as

$$\hat{f}_s = R(f_s, q)$$

where q is the preset query word sequence, which has the same dimensions as the multi-scale feature sequence.

In the embodiments of the present disclosure, the query word sequence is a learnable vector sequence. The electronic device can initialize the vectors in the query word sequence at the start of training the reconstruction network, iteratively update them during training, and finally obtain the preset query word sequence.

In the embodiments of the present disclosure, the electronic device uses the encoder part of the Transformer reconstruction network to encode the multi-scale feature sequence into a coding sequence. The coding sequence serves as guidance information for the decoder part of the reconstruction network and guides its decoding process, so that the decoder decodes and reconstructs the preset query word sequence based on the coding sequence to obtain the reconstructed feature sequence.

In some embodiments, the Transformer reconstruction network may be as shown in FIG. 5, including an encoder formed by stacking N encoding modules and a decoder formed by stacking N decoding modules. Each of the N encoding modules includes a multi-head self-attention layer, a feed-forward network (FFN), and residual-and-normalization (Add & Norm) layers. Exemplarily, the feed-forward network may be a fully connected neural network.

Exemplarily, as shown in FIG. 5, the electronic device inputs the multi-scale feature sequence obtained in FIG. 4 into the encoder, where three weight matrices convert it into the Q (query), K (key), and V (value) vectors used for attention. The multi-head self-attention layer computes attention values from the Q, K, and V vectors; an Add & Norm layer then applies a residual connection with the layer input and normalizes the result; the feed-forward network applies linear and nonlinear transformations; and a second Add & Norm layer yields the coding sequence corresponding to the multi-scale feature sequence, which is passed to the decoder. In the decoder, a multi-head self-attention layer and an Add & Norm layer first process the preset query word sequence. The coding sequence output by the encoder is then combined with this processed query sequence and attention is computed again, giving a combined attention value; an Add & Norm layer merges the combined attention value with the processed query sequence, and a feed-forward network followed by a final Add & Norm layer produces the reconstructed feature sequence.
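The core computations in the encoder and decoder described above are scaled dot-product attention and the Add & Norm step. The pure-Python sketch below shows them for a single head only. It assumes identity Q/K/V projections and omits multiple heads, the feed-forward network, the decoder's own self-attention over the query sequence, and positional information, so it is an illustration of the mechanism rather than the disclosed network. All names and toy values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # [position][feature]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add_and_norm(x: Matrix, sublayer_out: Matrix, eps: float = 1e-5) -> Matrix:
    """Residual connection followed by layer normalisation (no learnable gain or bias)."""
    result = []
    for a, b in zip(x, sublayer_out):
        s = [ai + bi for ai, bi in zip(a, b)]
        mean = sum(s) / len(s)
        var = sum((v - mean) ** 2 for v in s) / len(s)
        result.append([(v - mean) / math.sqrt(var + eps) for v in s])
    return result


# Encoder side: self-attention over the multi-scale feature sequence (Q = K = V here).
f_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = add_and_norm(f_s, attention(f_s, f_s, f_s))
# Decoder side: the query word sequence attends to the encoder's coding sequence.
q = [[0.5, -0.5], [0.1, 0.2], [0.0, 0.3]]
reconstructed = add_and_norm(q, attention(q, memory, memory))
print(len(reconstructed), len(reconstructed[0]))  # 3 2
```

The decoder step shows why the dual-input design is not a plain identity mapping: the output is built from the query sequence q, with the multi-scale features entering only through the attention weights over the coding sequence.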

In some embodiments, as shown in FIG. 5, because the Transformer is permutation-invariant, the electronic device can also add position information, for example sinusoidal positional encodings, to each multi-head self-attention layer, further improving the reconstruction network's semantic understanding. This can be chosen according to the actual situation and is not limited in the embodiments of the present disclosure.
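The sinusoidal position information mentioned above can be sketched using the standard Transformer formulation, PE(pos, 2k) = sin(pos / 10000^(2k/d)) and PE(pos, 2k+1) = cos(pos / 10000^(2k/d)). This sketch assumes a 1-D encoding over the flattened H×W token index that is added to the token features; whether the disclosure uses a 1-D or 2-D variant, and exactly where it is added, is not specified. The function names are hypothetical.

```python
import math
from typing import List


def sinusoidal_positions(length: int, dim: int) -> List[List[float]]:
    """Sinusoidal encoding: PE(pos, 2k) = sin(pos / 10000^(2k/dim)), PE(pos, 2k+1) = cos(...)."""
    table = []
    for pos in range(length):
        row = []
        for j in range(dim):
            angle = pos / 10000 ** (2 * (j // 2) / dim)
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(seq: List[List[float]], table: List[List[float]]) -> List[List[float]]:
    """Add the position vector of each token to that token's feature vector."""
    return [[x + p for x, p in zip(tok, pe)] for tok, pe in zip(seq, table)]


pe = sinusoidal_positions(length=3, dim=4)
print(pe[0])  # [0.0, 1.0, 0.0, 1.0]
```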

Here, by using the multi-scale feature sequence extracted from the image as guidance information and performing dual-input feature reconstruction on the learnable query word sequence, the electronic device enables the reconstruction network to incorporate the semantics of the query word sequence and decode on the basis of deep semantic understanding, which improves the accuracy of the reconstructed features and thus of anomaly detection.

The process of S104 and S105 is illustrated in FIG. 6: the electronic device extracts and fuses features of the target image at different scales through the updated feature extraction network to obtain the multi-scale feature; reshapes the multi-scale feature to obtain the multi-scale feature sequence; and reconstructs the multi-scale feature sequence together with the query word sequence through the updated reconstruction network to obtain the reconstructed feature sequence. Further, the electronic device can reshape the reconstructed feature sequence back into spatial form to obtain the reconstructed feature.

S106. Determine the feature difference at each pixel position of the target image according to the reconstructed feature sequence and the multi-scale feature sequence.

In the embodiments of the present disclosure, the multi-scale feature sequence represents the original features of the target image, and the reconstructed feature sequence represents the features reconstructed from them. The electronic device can compare the reconstructed feature sequence with the multi-scale feature sequence to determine the feature difference sequence of the target image.

In the embodiment of the present disclosure, each feature difference in the feature difference sequence may include corresponding pixel position information and a difference value in the target image. The electronic device can determine the difference value at each pixel position in the target image through the feature difference, and use the feature difference to represent the reconstruction effect of the reconstruction network on each pixel of the target image.

In some embodiments, the electronic device may calculate the feature difference between the reconstructed feature and the multi-scale feature. Exemplarily, the feature difference can be calculated by formula (2):

$$d(u, i) = f(u, i) - \hat{f}(u, i) \qquad (2)$$

In formula (2), $f(u, i)$ is the multi-scale feature and $\hat{f}(u, i)$ is the reconstructed feature; u is the spatial position index, for example the coordinates of a position in the H×W feature map of the multi-scale feature or the reconstructed feature, and i is the feature channel index. Using formula (2), the electronic device subtracts the reconstructed feature from the multi-scale feature to obtain the feature difference $d(u, i)$ at each position.
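The per-position, per-channel subtraction described above can be sketched directly on the sequence form, where u indexes the H×W token positions and i indexes channels. The function name and toy values are hypothetical.

```python
from typing import List

Sequence = List[List[float]]  # [position u][channel i]


def feature_difference(f_seq: Sequence, f_hat_seq: Sequence) -> Sequence:
    """d(u, i) = f(u, i) - f_hat(u, i) for every position u and channel i."""
    if len(f_seq) != len(f_hat_seq):
        raise ValueError("feature sequences must have the same length")
    return [[a - b for a, b in zip(fu, gu)] for fu, gu in zip(f_seq, f_hat_seq)]


f_seq = [[1.0, 2.0], [3.0, 4.0]]      # multi-scale features at two positions
f_hat_seq = [[1.0, 1.5], [0.0, 4.0]]  # reconstructed features
print(feature_difference(f_seq, f_hat_seq))  # [[0.0, 0.5], [3.0, 0.0]]
```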

In some embodiments, the electronic device may also perform difference calculation according to the multi-scale feature sequence and the reconstructed feature sequence to obtain a feature difference sequence.
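Formula (2) applied to whole sequences can be sketched as below. The sketch assumes both sequences are stored as lists of per-position channel vectors in the same spatial order (a hypothetical layout), and `feature_difference` is an illustrative name rather than part of the disclosure.

```python
from typing import List


def feature_difference(f: List[List[float]], f_hat: List[List[float]]) -> List[List[float]]:
    """Compute d(u, i) = f(u, i) - f_hat(u, i) for every position u and channel i.

    f holds the multi-scale feature sequence, f_hat the reconstructed feature
    sequence; both are (H*W) x C with matching position order.
    """
    if len(f) != len(f_hat):
        raise ValueError("sequences must have the same number of positions")
    diff = []
    for u, (orig, recon) in enumerate(zip(f, f_hat)):
        if len(orig) != len(recon):
            raise ValueError(f"channel count mismatch at position {u}")
        diff.append([a - b for a, b in zip(orig, recon)])
    return diff
```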

S107. Determine an abnormality score at each pixel position based on the feature difference, and draw an abnormality detection image corresponding to the target image based on the abnormality score.

In the embodiment of the present disclosure, the reconstruction network reconstructs the features of normal images or normal pixels close to the original features, while the reconstructed features of abnormal images or abnormal pixels lie far from the original features. The electronic device can therefore determine the abnormality score at each pixel position from the reconstruction quality that the feature difference represents at that position; the abnormality score indicates whether the corresponding pixel is an abnormal pixel. Based on the abnormality scores, the electronic device draws normal pixels and abnormal pixels differently to obtain the abnormality detection image corresponding to the target image. In this way, abnormal regions are marked automatically in the abnormality detection image.

In some embodiments, the electronic device can compare the abnormality score at each pixel position with a preset scoring interval to determine the target pixel value for that pixel position, and then draw the abnormality detection image from the target pixel values of all pixel positions.

Exemplarily, when the abnormality score at a pixel position is less than or equal to the preset scoring threshold, the electronic device draws the pixel value at that position as a first value; when the abnormality score is greater than the preset scoring threshold, it draws the pixel value as a second value. Once every pixel position of the target image has been drawn, the abnormality detection image is obtained. For example, for a target image A, the electronic device obtains the abnormality scores of target image A and draws an abnormality detection image A' from them, so that A' shows whether target image A is a normal image or an abnormal image. FIG. 7 is a schematic diagram of the abnormality detection image A' obtained after abnormality detection is performed on target image A, provided by an embodiment of the present disclosure. As shown in FIG. 7, target image A is an image of a toothbrush and the drawn abnormality detection image is toothbrush image A'. Because the bristles at the tip of the brush head in image A are abnormal, in the drawn toothbrush image A' the pixel color (pixel value) of region 7-10, where the tip bristles are located, differs from the pixel color of region 7-20, where the other bristles are located, which indicates that target image A is an abnormal image.
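The threshold-based drawing rule can be sketched as follows. The disclosure leaves the first and second values open; 0 and 255 are assumed here only so that the result can be viewed as a grayscale mask, and the function name is hypothetical.

```python
from typing import List


def draw_detection_image(scores: List[List[float]], threshold: float,
                         first_value: int = 0, second_value: int = 255) -> List[List[int]]:
    """Draw an abnormality detection image from a per-pixel score map.

    score <= threshold -> first_value  (normal pixel)
    score >  threshold -> second_value (abnormal pixel)
    """
    return [[first_value if s <= threshold else second_value for s in row]
            for row in scores]
```

A score exactly equal to the threshold is drawn as normal, matching the "less than or equal to" wording above.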

In some embodiments, the above S101 may be implemented through S1011-S1013; FIG. 8 is an optional schematic flow chart of the anomaly detection method provided by an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 8 .

S1011. Use the initial detection network to detect the positive samples in the first sample set, and obtain a first feature difference sequence corresponding to the positive samples.

In the embodiment of the present disclosure, for each positive sample, the electronic device obtains the multi-scale feature sequence of the positive sample through the initial detection network, performs reconstruction according to the multi-scale feature sequence and the preset query word sequence to obtain the reconstructed feature sequence of the positive sample, and determines the feature difference sequence of the positive sample, hereinafter referred to as the first feature difference sequence, by comparing the reconstructed feature sequence with the multi-scale feature sequence. It should be noted that the initial detection network and the updated detection network have the same network structure but different network parameters; therefore, the process by which the electronic device obtains the first feature difference sequence of a positive sample is the same as S104-S106 above.

S1012. Determine the training loss corresponding to the positive sample according to the first feature difference sequence and the normal loss function, where the normal loss function pulls the reconstructed feature sequence of the positive sample closer to the positive sample.

In the embodiment of the present disclosure, when the electronic device obtains the first feature difference sequence, it may use a normal loss function to calculate the training loss corresponding to the positive sample.

In some embodiments, the normal loss function may be a regression loss function, for example the smooth L1 loss or the mean squared error (MSE) loss, selected according to actual conditions; this is not limited in the embodiments of the present disclosure. Exemplarily, the electronic device can use the smooth L1 loss as the normal loss function to calculate the training loss of the positive sample, as shown in formula (3):

$$L_{nor} = \mathrm{smooth}_{L1}(d), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases} \qquad (3)$$

In formula (3), d represents the first feature difference or the first feature difference sequence, and $L_{nor}$ represents the training loss corresponding to the positive sample.
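A minimal sketch of the normal loss follows. It assumes the smooth L1 transition point beta = 1 and a mean over all positions and channels of the feature difference; the disclosure does not state the reduction, so both choices are assumptions, and `normal_loss` is a hypothetical name.

```python
from typing import List


def smooth_l1(x: float, beta: float = 1.0) -> float:
    """Element-wise smooth L1: quadratic for |x| < beta, linear beyond."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta


def normal_loss(diff: List[List[float]]) -> float:
    """L_nor: mean smooth L1 over every d(u, i) of one positive sample (assumed reduction)."""
    values = [smooth_l1(d) for token in diff for d in token]
    return sum(values) / len(values)
```

Small differences are penalised quadratically and large ones only linearly, so a few badly reconstructed positions do not dominate the loss on a positive sample.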

In some embodiments, as shown in FIG. 9, for a positive sample the training loss is obtained from the first feature difference sequence of that positive sample.

It can be understood that the electronic device uses the normal loss function to measure the gap between the original features and the reconstructed features, thereby evaluating how well the initial detection network reconstructs the positive sample; this allows the initial detection network to be trained with positive samples.

S1013. Based on the training loss, train and adjust the initial detection network until the obtained final loss is less than the preset loss threshold, and obtain the detection network.

In the embodiment of the present disclosure, when the electronic device obtains the training loss corresponding to the positive sample, it can adjust the network parameters of the initial detection network based on the training loss corresponding to the positive sample, and complete the training process of using the positive sample in the current round. The electronic device can continue to use the positive samples in the positive sample set to iteratively train the initial detection network until the final loss is less than the preset loss threshold to obtain the detection network.
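The "train until the final loss is below the preset loss threshold" loop can be illustrated with a deliberately tiny stand-in. This is a toy sketch, not the disclosed training procedure: the reconstruction network is replaced by a hypothetical single gain w with f_hat = w × f, the normal loss is the mean smooth L1 of d = f − f_hat, and the `max_epochs` guard is an added safety assumption.

```python
from typing import List, Tuple


def smooth_l1(r: float, beta: float = 1.0) -> float:
    """Smooth L1 of a single residual r."""
    a = abs(r)
    return 0.5 * a * a / beta if a < beta else a - 0.5 * beta


def smooth_l1_grad(r: float, beta: float = 1.0) -> float:
    """Derivative of smooth_l1 with respect to r."""
    if abs(r) < beta:
        return r / beta
    return 1.0 if r > 0 else -1.0


def train_until_threshold(features: List[float], loss_threshold: float,
                          lr: float = 0.5, max_epochs: int = 1000) -> Tuple[float, float, int]:
    """Toy loop on positive samples: stop as soon as the loss is below loss_threshold.

    Hypothetical model: one gain w with f_hat = w * f, so d = f - w * f.
    Returns (w, final_loss, epoch_at_stop).
    """
    w = 0.0
    for epoch in range(max_epochs):
        residuals = [f - w * f for f in features]
        loss = sum(smooth_l1(r) for r in residuals) / len(features)
        if loss < loss_threshold:
            return w, loss, epoch
        # dL/dw = mean(smooth_l1'(d) * dd/dw), where dd/dw = -f
        grad = sum(smooth_l1_grad(r) * (-f) for r, f in zip(residuals, features)) / len(features)
        w -= lr * grad  # one gradient-descent step on the toy parameter
    raise RuntimeError("loss did not fall below the threshold within max_epochs")
```

In a real detection network the single gradient step would be a backpropagation step through the reconstruction network (and, where trained, the feature extraction network); only the stopping rule is meant to mirror the text.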

In the embodiments of the present disclosure, the preset loss threshold may be set according to actual needs, which is not limited in the embodiments of the present disclosure.

In some embodiments, the initial detection network can consist of a pre-trained feature extraction network and an initial reconstruction network. In this case, the electronic device trains the initial reconstruction network according to the obtained first feature difference sequence and the normal loss function, and obtains the reconstruction network once the final loss is less than the preset loss threshold; the detection network then consists of the pre-trained feature extraction network and the obtained reconstruction network.

In other embodiments, the initial detection network can consist of an initial feature extraction network and an initial reconstruction network. The electronic device can calculate the training loss corresponding to the initial feature extraction network; at the same time, it obtains the first feature difference sequence from the obtained multi-scale features and reconstructed features, and determines the training loss corresponding to the initial reconstruction network according to the first feature difference sequence and the normal loss function corresponding to the initial reconstruction network. The electronic device adjusts the initial feature extraction network with its training loss and trains and adjusts the initial reconstruction network with its training loss, until the final loss of the initial feature extraction network is less than its corresponding preset loss threshold and the final loss of the initial reconstruction network is less than its corresponding preset loss threshold. The feature extraction network and the reconstruction network are thus obtained, and together they form the detection network. In some embodiments, the electronic device may also train the initial feature extraction network and the initial reconstruction network separately, in the same manner as described above.

In some embodiments, after the above S101, S201-S203 may also be executed; FIG. 10 is an optional flowchart of the anomaly detection method provided by the embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 10 .

S201. Use a detection network to perform anomaly detection on each image to be detected, and obtain an anomaly score at each pixel position of each image to be detected.

In the embodiment of the present disclosure, for each image to be detected, the electronic device uses the detection network to perform feature processing on the image at different scales to obtain a multi-scale feature sequence, performs reconstruction according to the multi-scale feature sequence and the preset query word sequence to obtain the reconstructed feature sequence, determines the feature difference at each pixel position of the image from the reconstructed feature sequence and the multi-scale feature sequence, and determines the abnormality score at each pixel position of the image from the feature difference.

S202. Draw an abnormality detection image corresponding to each image to be detected based on the abnormality score.

S203. Obtain a normal image set and an abnormal image set among the plurality of images to be detected according to the abnormality detection image.

In the embodiment of the present disclosure, for each image to be detected, the electronic device draws the corresponding abnormality detection image according to the abnormality score at each pixel position and the preset scoring threshold, and determines from that abnormality detection image whether the image is an abnormal image or a normal image. In this way, the electronic device divides the detected images into a normal image set containing at least one normal image and an abnormal image set containing at least one abnormal image.

In some embodiments of the present disclosure, when the abnormality detection image of an image to be detected indicates an abnormal image, the electronic device takes that image as an abnormal image, and after traversing the plurality of images to be detected obtains an abnormal image set containing at least one abnormal image; when the abnormality detection image of an image to be detected indicates a normal image, the electronic device takes that image as a normal image, and after traversing the plurality of images to be detected obtains a normal image set containing at least one normal image. In this way, the abnormal image set and the normal image set are obtained.
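The partition into the two sets can be sketched as below. The disclosure does not state how an abnormality detection image is judged to represent an abnormal image; this sketch assumes, hypothetically, that abnormal pixels were drawn with the value 255 and that a single such pixel makes the whole image abnormal. The function names are illustrative.

```python
from typing import Dict, List, Tuple


def is_abnormal(detection_image: List[List[int]], abnormal_value: int = 255) -> bool:
    """Assumed rule: any pixel drawn with the abnormal value marks the image abnormal."""
    return any(p == abnormal_value for row in detection_image for p in row)


def split_images(detections: Dict[str, List[List[int]]]) -> Tuple[List[str], List[str]]:
    """Traverse all detected images and return (normal_image_ids, abnormal_image_ids)."""
    normal: List[str] = []
    abnormal: List[str] = []
    for image_id, detection_image in detections.items():
        (abnormal if is_abnormal(detection_image) else normal).append(image_id)
    return normal, abnormal
```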

In some embodiments of the present disclosure, the electronic device may stop detecting images once the number of detected images reaches a preset number, and divide the detected images into a normal image set and an abnormal image set according to their abnormality detection images. In this way, a sufficient number of normal and abnormal images can be obtained, which helps to subsequently obtain an updated detection network with higher detection accuracy. In some embodiments of the present disclosure, the electronic device may instead stop detecting images when the number of abnormal images in the abnormal image set reaches a preset number, or when the number of normal images in the normal image set reaches a preset number, and then divide the detected images into a normal image set and an abnormal image set according to their abnormality detection images; this likewise yields a sufficient number of abnormal and normal images and helps to obtain an updated detection network with higher detection accuracy.

In some embodiments of the present disclosure, the electronic device may also stop detecting images when, in the abnormal image set, the number of abnormal images whose first maximum abnormality score falls within a first preset value range of the preset abnormality threshold (hereinafter referred to as abnormal suspicious images) reaches a preset number, or when, in the normal image set, the number of normal images whose second maximum abnormality score falls within a second preset value range of the preset abnormality threshold (hereinafter referred to as normal suspicious images) reaches a preset number. Here, the first maximum abnormality score of an abnormal image is the largest of the abnormality scores of all pixels in that abnormal image, and the second maximum abnormality score of a normal image is the largest of the abnormality scores of all pixels in that normal image. After stopping, the electronic device divides the detected images into a normal image set and an abnormal image set according to their abnormality detection images. In this way, a sufficient number of normal suspicious images or abnormal suspicious images can be obtained, which helps to obtain an updated detection network with higher detection accuracy. It should be noted that the first and second preset value ranges of the preset abnormality threshold are ranges of values close to the preset abnormality threshold and can be set according to actual needs.

In some embodiments of the present disclosure, the first maximum abnormality score of an abnormal image is usually lower than the preset abnormality threshold, and the second maximum abnormality score of a normal image is usually higher than the preset abnormality threshold. For example, when the preset abnormality threshold is 5, the first preset value range may be [4.5, 5) and the second preset value range may be (5, 5.5].

In some embodiments of the present disclosure, the electronic device may also stop detecting images when the number of abnormal suspicious images in the abnormal image set reaches a preset number and the number of normal suspicious images in the normal image set reaches a preset number, and then divide the detected images into a normal image set and an abnormal image set according to their abnormality detection images. In this way, a sufficient number of both normal suspicious images and abnormal suspicious images can be obtained, which helps to obtain an updated detection network with higher detection accuracy.
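The "or" and "and" stopping conditions on suspicious images can be sketched together. Assumptions, all hypothetical: each detection result arrives as a record (image_id, predicted_abnormal, max_pixel_score), with the predicted label already taken from the abnormality detection image; both value ranges have width 0.5 as in the threshold-5 example; and `require_both` selects between the two variants.

```python
from typing import Iterable, List, Tuple

# (image_id, predicted_abnormal, maximum abnormality score over all pixels)
Record = Tuple[str, bool, float]


def detect_until_enough(records: Iterable[Record], threshold: float, target: int,
                        width: float = 0.5, require_both: bool = True
                        ) -> Tuple[List[Record], List[Record]]:
    """Consume detection results in order; stop once enough suspicious images exist.

    Abnormal suspicious: abnormal image, max score in [threshold - width, threshold).
    Normal suspicious:   normal image,   max score in (threshold, threshold + width].
    require_both=True stops when both counts reach target, False when either does.
    """
    normal: List[Record] = []
    abnormal: List[Record] = []
    n_abnormal_suspicious = 0
    n_normal_suspicious = 0
    for record in records:
        _, predicted_abnormal, max_score = record
        if predicted_abnormal:
            abnormal.append(record)
            if threshold - width <= max_score < threshold:
                n_abnormal_suspicious += 1
        else:
            normal.append(record)
            if threshold < max_score <= threshold + width:
                n_normal_suspicious += 1
        hits = (n_abnormal_suspicious >= target, n_normal_suspicious >= target)
        done = all(hits) if require_both else any(hits)
        if done:
            break  # stop detecting the remaining images
    return normal, abnormal
```

Passing a generator as `records` means images after the stopping point are never run through the detection network at all.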

In some embodiments, obtaining the second sample set based on the detected normal image set and abnormal image set in S102 above may be implemented through S1021-S1022.

S1021. Determine images to be verified from the detected normal image set and abnormal image set.

S1022. Perform verification on the image to be verified to obtain a second sample set.

In the embodiment of the present disclosure, the electronic device determines the images to be verified from the normal images and abnormal images contained in the obtained normal image set and abnormal image set, verifies them, and obtains, according to the verification results, a second sample set containing positive samples and negative samples.

In the embodiment of the present disclosure, the electronic device may automatically verify the image to be verified, or may verify the image to be verified according to the received verification operation of the user, which is not limited in the embodiment of the present disclosure.

In some embodiments, the electronic device may use all normal images and abnormal images included in the normal image set and the abnormal image set as images to be verified for verification.

In some embodiments, the electronic device may instead select some of the normal images and some of the abnormal images from the normal image set and the abnormal image set as the images to be verified. Exemplarily, S1021 above can be implemented through S301-S304:

S301. From each abnormal image in the abnormal image set, determine the first maximum abnormal score among the abnormal scores at each pixel position.

When the electronic device obtains the abnormal image set, for each abnormal image in the abnormal image set, it can determine a maximum abnormal score from the abnormal scores of all pixels in the abnormal image as the first maximum abnormal score.

S302. For each normal image in the normal image set, determine the second maximum abnormality score among the abnormality scores at all pixel positions.

When the electronic device obtains the normal image set, for each normal image in the normal image set, it can determine a maximum abnormality score as the second maximum abnormality score from the abnormality scores of all pixels in the normal image.

S303. When the first maximum abnormality score belongs to the first preset value range of the preset abnormality threshold, determine that the abnormal image corresponding to the first maximum abnormality score belongs to the image to be verified.

For an abnormal image, once the electronic device has selected the first maximum abnormality score, it determines whether that score falls within the first preset value range of the preset abnormality threshold; if it does, the abnormal image is used as an image to be verified.

In some embodiments of the present disclosure, the first preset value range of the preset abnormality threshold is a range of values close to the preset abnormality threshold and can be set according to actual needs; this is not limited in the embodiments of the present disclosure. In some embodiments of the present disclosure, the first maximum abnormality score of an abnormal image is usually lower than the preset abnormality threshold. For example, when the preset abnormality threshold is 5, the first preset value range may be [4.5, 5); if the first maximum abnormality score determined from the abnormality scores of all pixels of an abnormal image B is 4.5, that score falls within the first preset value range [4.5, 5), and abnormal image B can be used as an image to be verified.

S304. When the second maximum abnormality score belongs to the second preset value range of the preset abnormality threshold, determine that the normal image corresponding to the second maximum abnormality score belongs to the image to be verified.

When the electronic device obtains the normal image set, for each normal image it takes the maximum of the abnormality scores of all pixels in that image as the second maximum abnormality score, and determines whether this score falls within the second preset value range of the preset abnormality threshold; if it does, the normal image is used as an image to be verified.

In some embodiments of the present disclosure, the second preset value range of the preset abnormality threshold is also a range of values close to the preset abnormality threshold and can be set according to actual needs. In some embodiments of the present disclosure, the second maximum abnormality score of a normal image is usually higher than the preset abnormality threshold. For example, when the preset abnormality threshold is 5, the second preset value range may be (5, 5.5]; if the second maximum abnormality score determined from the abnormality scores of all pixels of a normal image C is 5.1, that score falls within the second preset value range (5, 5.5], and normal image C can be used as an image to be verified.
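Steps S301-S304 can be sketched as a single selection function. The half-open ranges [T − w, T) and (T, T + w] follow the example with threshold 5 and width 0.5; storing each image as a 2-D list of per-pixel scores keyed by a hypothetical image id is an assumption, as are the function names.

```python
from typing import Dict, List

ScoreMap = List[List[float]]


def max_score(scores: ScoreMap) -> float:
    """Largest per-pixel abnormality score in one image."""
    return max(s for row in scores for s in row)


def select_images_to_verify(abnormal_set: Dict[str, ScoreMap],
                            normal_set: Dict[str, ScoreMap],
                            threshold: float, width: float = 0.5) -> List[str]:
    """Return ids of abnormal and normal images whose maximum score lies near the threshold."""
    selected: List[str] = []
    for image_id, scores in abnormal_set.items():
        first_max = max_score(scores)                      # S301
        if threshold - width <= first_max < threshold:     # S303: [T - w, T)
            selected.append(image_id)
    for image_id, scores in normal_set.items():
        second_max = max_score(scores)                     # S302
        if threshold < second_max <= threshold + width:    # S304: (T, T + w]
            selected.append(image_id)
    return selected
```

With the examples above, abnormal image B (maximum 4.5) and normal image C (maximum 5.1) are both selected, while images whose maximum scores are far from the threshold are not.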

In some embodiments, the above S1022 may be implemented through S401-S404; FIG. 11 is an optional schematic flow chart of the anomaly detection method provided by an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 11 .

S401. Perform verification on at least one abnormal image and at least one normal image respectively to obtain respective verification results; the images to be verified include: at least one abnormal image and at least one normal image.

In the embodiment of the present disclosure, the electronic device can verify each image to be verified from the abnormal image set and the normal image set to determine whether the abnormality detection result of that image is correct, obtaining a verification result that indicates whether the detection result of the image is correct or incorrect.

S402. Use each abnormal image whose verification result indicates a correct detection as a negative sample, and each normal image whose verification result indicates a correct detection as a positive sample.

In the embodiment of the present disclosure, for an abnormal image whose verification result is correct, the electronic device sets annotation information marking the abnormal image as a negative sample, thereby obtaining a negative sample; for a normal image whose verification result is correct, the electronic device sets annotation information marking the normal image as a positive sample, thereby obtaining a positive sample. These samples are used for the subsequent update training of the detection network.

S403. Correctly re-annotate the abnormal images and the normal images whose verification results indicate an incorrect detection, to obtain positive samples and negative samples.

In the embodiment of the present disclosure, for each abnormal image whose verification result is incorrect, the electronic device determines whether the image is actually an abnormal image or a normal image and annotates it with correct annotation information according to that determination; likewise, for each normal image whose verification result is incorrect, the electronic device determines whether the image is actually an abnormal image or a normal image and annotates it accordingly. After all such abnormal and normal images have been correctly annotated, the electronic device obtains positive samples and negative samples according to the corrected annotation information.

S404. Determine a set of negative samples and positive samples as a second sample set.

In the embodiment of the present disclosure, the electronic device may use all the obtained negative samples and positive samples as the second sample set for updating and training the detection network.
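Steps S401-S404 can be sketched as follows. The record type and its field names are hypothetical: each image to be verified carries the set it was placed in, the verification result, and, only when the detection was wrong, the corrected annotation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VerifiedImage:
    image_id: str
    predicted_abnormal: bool                    # set chosen by the detection network
    detection_correct: bool                     # verification result (S401)
    corrected_abnormal: Optional[bool] = None   # re-annotation if detection was wrong (S403)


def build_second_sample_set(images: List[VerifiedImage]) -> Tuple[List[str], List[str]]:
    """Return (positive_sample_ids, negative_sample_ids); together they form the second sample set."""
    positives: List[str] = []
    negatives: List[str] = []
    for img in images:
        if img.detection_correct:
            abnormal = img.predicted_abnormal   # S402: keep the detected label
        else:
            if img.corrected_abnormal is None:
                raise ValueError(f"{img.image_id} needs a corrected annotation")
            abnormal = img.corrected_abnormal   # S403: use the corrected label
        (negatives if abnormal else positives).append(img.image_id)
    return positives, negatives                 # S404
```

Wrongly detected images are not discarded: after re-annotation they enter the incremental set like any other sample.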

In some embodiments, the above S103 may be implemented through S1031-S1033; FIG. 12 is an optional schematic flow chart of the anomaly detection method provided by an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 12 .

S1031. When the positive samples in the second sample set are used to train the detection network, use the detection network to obtain the first feature difference sequence of each positive sample, and determine the first training loss of the positive sample according to the first feature difference sequence and the normal loss function; the normal loss function pulls the reconstructed feature sequence of the positive sample closer to the positive sample.

In the embodiment of the present disclosure, when the electronic device uses the positive samples in the second sample set to train the detection network, it performs feature processing and reconstruction on the currently acquired positive sample by the method of FIG. 6 above, computes the feature difference with formula (2) above, and so on, to obtain the first feature difference or first feature difference sequence of the positive sample. It then uses the normal loss function to calculate the first training loss of the positive sample, which is used to adjust the network parameters of the detection network. It should be noted that the process of S1031 is the same as S1011-S1012 above.

S1032. When the negative samples in the second sample set are used to train the detection network, use the detection network to obtain the second feature difference sequence of each negative sample, and determine the second training loss of the negative sample according to the second feature difference sequence, the preset real value of the negative sample, and the abnormal loss function.

In the embodiment of the present disclosure, when the electronic device uses the negative samples in the second sample set to train the detection network, it obtains each negative sample from the second sample set together with the preset real value of that negative sample. Exemplarily, the preset real value of a negative sample may be annotation information in the form of a pixel-level label or an image-level label. A pixel-level label annotates each pixel of the negative sample as a normal pixel or an abnormal pixel; an image-level label annotates the entire image and marks whether an abnormality exists in it. Exemplarily, the image-level label of an image containing some abnormal pixels may be a label indicating that the entire image is abnormal.

In the embodiment of the present disclosure, the electronic device performs feature processing and reconstruction on the negative sample by the method of FIG. 6 above and computes the feature difference, obtaining the second feature difference or second feature difference sequence of the negative sample. The electronic device then uses the abnormal loss function to calculate the discrepancy between the second feature difference (or second feature difference sequence) and the preset real value of the negative sample, obtaining the second training loss of the negative sample, which is used to adjust the network parameters of the detection network.

S1033. Train and adjust the detection network based on the first training loss and the second training loss respectively, until the obtained final loss is less than the preset loss threshold, to obtain the updated detection network. The abnormal loss function pushes the reconstructed feature sequence corresponding to the abnormal part of a negative sample away from the abnormal part, and pulls the reconstructed feature sequence corresponding to the normal part of the negative sample closer to the normal part.

In the embodiment of the present disclosure, the electronic device adjusts the network parameters of the detection network based on the first training loss of a positive sample or the second training loss of a negative sample, completing the current round of training with that positive or negative sample. Afterwards, the electronic device continues to use the positive and negative samples in the incremental sample set to iteratively train the detection network until a preset training condition is met, for example until the final loss is less than the preset loss threshold, thereby obtaining the updated detection network.

It should be noted that, in the embodiment of the present disclosure, as described for S1013 above, the detection network may consist of a pre-trained feature extraction network and a reconstruction network, in which case the electronic device may train only the reconstruction network, obtaining an updated detection network composed of the pre-trained feature extraction network and the updated reconstruction network. Alternatively, the detection network may consist of a feature extraction network and a reconstruction network that the electronic device trains simultaneously or separately, obtaining an updated detection network composed of the updated feature extraction network and the updated reconstruction network. The process of training the detection network with the second sample set is the same as S1013 above; the only differences are the samples used and the resulting training losses.

In some embodiments, for different types of preset real values of negative samples, such as pixel-level labels or image-level labels, the electronic device may use different abnormal loss functions to calculate the second training loss of the negative samples. Whatever the type of preset real value, the abnormal loss function must satisfy the same overall property: the reconstructed feature sequence corresponding to the abnormal part of a negative sample is pushed away from the abnormal part, and the reconstructed feature sequence corresponding to the normal part of the negative sample is pulled closer to the normal part.

In some embodiments, the preset real value of the negative sample includes a pixel-level real value, that is, a pixel-level label, and the abnormal loss function includes a pixel-level loss function. Training and adjusting the detection network based on the second feature difference sequence of the negative sample, the preset real value of the negative sample, and the abnormal loss function, until an updated detection network whose final loss is less than the preset loss threshold is obtained, may include:

1) Scalarizing the second feature difference sequence of the negative sample along the channel dimension to obtain a feature difference measure at each pixel position. Here, when the true value of the negative sample is a preset pixel-level true value, the electronic device converts the feature difference vector at each pixel position of the second feature difference sequence into a scalar along the channel dimension, so that each pixel position corresponds to exactly one scalar feature difference measure;
2) Training and adjusting the detection network based on the feature difference measure, the pixel-level true value, and the pixel-level loss function, until an updated detection network whose final loss is less than the preset loss threshold is obtained.
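As an illustration of step 1), the following minimal Python sketch collapses an H×W×C feature difference map into one measure per pixel position. It assumes the scalarization is a plain, equally weighted average over the channels; the function name `scalarize_channels` and the nested-list layout are hypothetical choices for the example, not part of the disclosure.

```python
from typing import List

# Hypothetical layout: diff[h][w] is the C-dimensional feature difference vector.
FeatureDiff = List[List[List[float]]]


def scalarize_channels(diff: FeatureDiff) -> List[List[float]]:
    """Average each pixel's feature difference vector over its C channels,
    giving one scalar feature difference measure per pixel (H x W)."""
    return [[sum(vec) / len(vec) for vec in row] for row in diff]


# Usage: a 1 x 2 map with C = 2 channels.
print(scalarize_channels([[[1.0, 3.0], [0.0, 0.5]]]))  # [[2.0, 0.25]]
```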

In some embodiments, the pixel-level loss function has the "pull-in, push-away" property described above. Its design goal may be as follows: for normal pixels in the negative sample, reduce the corresponding feature difference measures, pulling the reconstructed features of the normal pixels closer to the original features (for example, the multi-scale features); at the same time, for abnormal pixels in the negative sample, increase the feature difference measures at the corresponding positions, pushing the reconstructed features of the abnormal pixels away from the original features.

In some embodiments, the pixel-level loss function includes a normal pixel loss part and an abnormal pixel loss part. For pixels whose pixel-level true value in the negative sample indicates a normal pixel, the electronic device computes a weighted average of their feature difference measures based on the normal pixel loss part to obtain the normal pixel loss. For pixels whose pixel-level true value indicates an abnormal pixel, the electronic device computes a weighted average of their feature difference measures based on the abnormal pixel loss part to obtain the abnormal pixel loss. The normal pixel loss is positively correlated with the feature difference measures of the normal pixels, and the abnormal pixel loss is negatively correlated with the feature difference measures of the abnormal pixels. Minimizing the loss therefore pulls the reconstructed features of normal pixels toward the original features and pushes the reconstructed features of abnormal pixels away from them.
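The disclosure does not give the exact form of the two loss parts. The sketch below is one hypothetical choice that satisfies the stated correlations: the normal part is a weighted mean of the measures, and the abnormal part is a weighted mean of a hinge term max(0, margin - measure), which falls as the measure grows. The hinge, the `margin` value, and summing the two parts are assumptions.

```python
from typing import List, Optional


def pixel_level_loss(
    measures: List[float],
    labels: List[int],
    weights: Optional[List[float]] = None,
    margin: float = 1.0,
) -> float:
    """Hypothetical "pull-in, push-away" loss over flattened pixel positions.

    labels[u] == 0 marks a normal pixel, 1 an abnormal pixel.
    Normal part: weighted mean of the measures (grows with the measure).
    Abnormal part: weighted mean of max(0, margin - measure) (shrinks as the
    measure grows). The hinge and the margin are illustrative assumptions.
    """
    if weights is None:
        weights = [1.0] * len(measures)
    normal_num = normal_den = 0.0
    abnormal_num = abnormal_den = 0.0
    for m, y, w in zip(measures, labels, weights):
        if y == 0:
            normal_num += w * m
            normal_den += w
        else:
            abnormal_num += w * max(0.0, margin - m)
            abnormal_den += w
    normal_loss = normal_num / normal_den if normal_den > 0 else 0.0
    abnormal_loss = abnormal_num / abnormal_den if abnormal_den > 0 else 0.0
    return normal_loss + abnormal_loss
```

For example, `pixel_level_loss([0.25, 0.75, 0.5, 1.5], [0, 0, 1, 1])` gives 0.5 from the normal pixels plus 0.25 from the abnormal pixels.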

It can be understood that, through the pixel-level loss function, the electronic device evaluates how well the detection network reconstructs negative samples with pixel-level labels, so that the network's reconstructed features for normal pixels are close to the original features and those for abnormal pixels are far from them. The detection network can thus be trained with negative samples carrying pixel-level labels, which improves the flexibility and accuracy of training with different types of samples, and in turn the flexibility and accuracy of anomaly detection based on the detection network.

In some embodiments of the present disclosure, when scalarizing the feature difference vector at each pixel position, the electronic device averages the components for the different feature channels with equal weight. In practice, however, different feature channels may contribute differently to anomalies. Therefore, the electronic device may also use a classification network to analyze the feature difference vector across the feature channels and derive weights, and incorporate these weights into the calculation of the pixel-level loss function to further improve accuracy. In this case, the process of training and adjusting the detection network based on the second feature difference sequence, the preset true value of the negative sample, and the abnormal loss function, until an updated detection network whose final loss is less than the preset loss threshold is obtained, may include:

3) Classifying each pixel position of the second feature difference sequence with the classification network to obtain the sample classification probability at each pixel position;
4) Obtaining the current classification loss based on the sample classification probability at each pixel position, the pixel-level true value, and the classification loss function;
5) Obtaining the first current loss based on the feature difference measure, the pixel-level true value, and the pixel-level loss function;
6) Training and adjusting the detection network and the classification network based on the current classification loss and the first current loss, until an updated classification network and an updated detection network whose final loss is less than the preset loss threshold are obtained.
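Steps 4) and 6) leave the classification loss function and the way the two losses are combined open. A minimal sketch, assuming binary cross-entropy for the classification loss and a weighted sum with a hypothetical factor `alpha` for the final loss:

```python
import math
from typing import List


def classification_loss(probs: List[float], labels: List[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between per-pixel abnormal probabilities p(u)
    and pixel-level labels (1 = abnormal pixel, 0 = normal pixel)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def final_loss(current_cls_loss: float, first_current_loss: float, alpha: float = 1.0) -> float:
    """Step 6): joint objective used to adjust both the detection network and
    the classification network (hypothetical weighted sum)."""
    return first_current_loss + alpha * current_cls_loss
```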

In some embodiments, for the H×W×C second feature difference d(u,i), where u indexes pixel positions and i indexes feature channels, the feature difference vector at each pixel position u can be written as a C-dimensional vector d(u). The electronic device can classify each d(u) through formula (4) to obtain the probability that d(u) belongs to an anomaly or defect, which serves as the sample classification probability at pixel position u:

$$p(u) = \mathcal{C}\big(d(u)\big) \qquad (4)$$

In formula (4), p(u) is the sample classification probability at pixel position u. In this way, for each C-dimensional feature difference vector, the classification network jointly considers the feature differences of all C feature channels when predicting the class, yielding the sample classification probability at each pixel position.

In some embodiments, the classification network may be a feed-forward network (FFN) or another network model capable of classification, selected according to actual conditions; the embodiments of the present disclosure do not limit this.
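For illustration only, a two-layer feed-forward classifier that maps one C-dimensional feature difference vector d(u) to an abnormal probability p(u), as in formula (4), might look like the sketch below. The hidden ReLU layer, the sigmoid output, and all parameter names are assumptions; in practice the weights would be learned during step 6).

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ffn_classify(
    d_u: List[float],
    w1: List[List[float]],
    b1: List[float],
    w2: List[float],
    b2: float,
) -> float:
    """p(u) = sigmoid(w2 . ReLU(W1 d(u) + b1) + b2) for one C-dim vector d(u)."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, d_u)) + b) for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return _sigmoid(logit)
```

Applying `ffn_classify` to the vector d(u) at every pixel position produces the H×W map of sample classification probabilities.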

It can be understood that introducing a classification network to classify the second feature difference sequence allows the per-channel components of each feature difference vector to be combined into a sample classification probability at each pixel position, and these probabilities can in turn weight the calculation of the first current loss. This further improves the accuracy of training the detection network with the first current loss, and hence the accuracy of anomaly detection with the updated detection network.

In some embodiments, the preset true value of the negative sample is an image-level true value, that is, an image-level label, and the abnormal loss function is an image-level loss function. In this case, the process of training and adjusting the detection network based on the second feature difference sequence, the preset true value of the negative sample, and the abnormal loss function, until an updated detection network whose final loss is less than the preset loss threshold is obtained, may include:

7) Scalarizing the second feature difference sequence along the channel dimension to obtain the feature difference measure at each pixel position;
8) Determining, from the feature difference measures at all pixel positions, the K largest feature difference measures, where K is an integer greater than 1;
9) Averaging the K largest feature difference measures to obtain the feature average difference measure;
10) Training and adjusting the detection network based on the feature average difference measure, the image-level true value, and the image-level loss function, until an updated detection network whose final loss is less than the preset loss threshold is obtained.
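Steps 8) and 9) amount to a top-K mean over the per-pixel measures. A minimal sketch with a hypothetical function name, using the standard-library `heapq.nlargest`:

```python
import heapq
from typing import List


def topk_average_measure(measure_map: List[List[float]], k: int) -> float:
    """Steps 8) and 9): mean of the K largest per-pixel feature difference
    measures in an H x W map (K must be greater than 1)."""
    if k <= 1:
        raise ValueError("K must be an integer greater than 1")
    flat = [m for row in measure_map for m in row]
    top = heapq.nlargest(k, flat)  # returns all values if k > H * W
    return sum(top) / len(top)
```

Averaging the top K measures, rather than taking only the single maximum, makes the image-level statistic less sensitive to one noisy pixel.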

In some embodiments, the image-level loss function includes a normal image loss part and an abnormal image loss part. When the image-level true value of a negative sample indicates a normal image, the electronic device takes the feature average difference measure, based on the normal image loss part, as the normal image loss. When the image-level true value indicates an abnormal image, the electronic device obtains the abnormal image loss based on the abnormal image loss part and the feature average difference measure. The normal image loss is positively correlated with the feature average difference measure of the normal image, and the abnormal image loss is negatively correlated with the feature average difference measure of the abnormal image. The electronic device determines the second current loss from the normal image loss or the abnormal image loss, and trains and adjusts the detection network based on the second current loss and the preset loss threshold until an updated detection network whose final loss is less than the preset loss threshold is obtained.
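A minimal sketch of an image-level loss with the stated correlations. It takes the normal image loss to be the feature average difference measure itself, as described above, and uses, as a hypothetical choice, a hinge with a `margin` parameter for abnormal images:

```python
def image_level_loss(avg_measure: float, is_abnormal: bool, margin: float = 1.0) -> float:
    """Hypothetical image-level "pull-in, push-away" loss.

    Normal image: the loss equals the feature average difference measure,
    so minimizing it pulls reconstructed features toward the originals.
    Abnormal image: max(0, margin - measure), so the loss falls as the
    measure grows (the hinge and the margin are assumptions).
    """
    if is_abnormal:
        return max(0.0, margin - avg_measure)
    return avg_measure
```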

In some embodiments, the image-level loss function has the "pull-in, push-away" property described above. Its design goal may be: reduce the feature average difference measure of normal images, pulling their reconstructed features closer to the original features; at the same time, increase the feature average difference measure of abnormal images, pushing their reconstructed features away from the original features.

It can be understood that, through the image-level loss function, the electronic device evaluates how well the detection network reconstructs negative samples with image-level labels, so that the network's reconstructed features for normal images are close to the original features and those for abnormal images are far from them. The detection network can thus also be trained with negative samples carrying image-level labels, which improves the flexibility and accuracy of training the updated detection network with different types of samples, and in turn the flexibility and accuracy of anomaly detection based on the updated detection network.

In some embodiments, when the detection network or the updated detection network is trained without a classification network, the electronic device may calculate the anomaly score at each pixel position as follows: average the feature difference at each pixel position over the channels to obtain an intermediate anomaly score at that position, and use the intermediate anomaly score as the anomaly score.

In some embodiments, the electronic device may average the feature difference at each pixel position of the image to be detected or the target image over the channels to obtain the intermediate anomaly score at each pixel position, and use it as the anomaly score, as shown in formula (5):

$$s(u) = \frac{1}{C}\sum_{i=1}^{C} d'(u,i) \qquad (5)$$

In formula (5), d'(u,i) is the feature difference at pixel position u in feature channel i, C is the number of feature channels of each feature difference in the feature difference sequence, and s(u) is the intermediate anomaly score obtained by averaging the feature differences at pixel position u over the channels, that is, the anomaly score at each pixel position of the image to be detected or the target image output by a detection network that does not include the classification network.

In some embodiments, when the detection network is trained with the classification network, the electronic device may calculate the anomaly score at each pixel position as follows: average the feature difference at each pixel position over the channels to obtain the intermediate anomaly score at that position; classify the feature difference at each pixel position with the classification network to obtain the classification probability; and multiply the classification probability by the intermediate anomaly score to obtain the anomaly score at that pixel position.

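In some embodiments, the electronic device uses the classification probability as a weight on the intermediate anomaly score to obtain the anomaly score at each pixel position. Exemplarily, as shown in formula (6), where S(u) denotes the anomaly score at pixel position u, p(u) the classification probability, and s(u) the intermediate anomaly score:

$$S(u) = p(u)\, s(u) \qquad (6)$$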
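The two scoring variants can be sketched together in Python. The nested-list layout, the function name, and the optional `probs` argument (the per-pixel classification probabilities, supplied only when a classification network was trained) are assumptions for illustration.

```python
from typing import List, Optional

Map2D = List[List[float]]


def anomaly_score_map(diff: List[List[List[float]]], probs: Optional[Map2D] = None) -> Map2D:
    """Per-pixel anomaly scores from an H x W x C feature difference map d'(u, i).

    Without a classifier: the score is the channel mean s(u).
    With a classifier: the score is p(u) * s(u), where probs[h][w] holds p(u).
    """
    scores: Map2D = []
    for h, row in enumerate(diff):
        out_row = []
        for w, vec in enumerate(row):
            s = sum(vec) / len(vec)  # intermediate anomaly score s(u)
            if probs is not None:
                s *= probs[h][w]  # weight by the classification probability
            out_row.append(s)
        scores.append(out_row)
    return scores
```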

In the embodiments of the present disclosure, after a trained detection network is obtained from positive samples, incremental samples containing both negative and positive samples are used to continue training it until an updated detection network is obtained. The detection network is therefore compatible with two learning modes, positive-sample learning and incremental abnormal-sample learning, which improves its compatibility and flexibility.

FIG. 13 is a schematic flowchart of updating the detection network according to an embodiment of the present disclosure. The technical solution of the embodiments of the present disclosure is described below through a detailed example with reference to FIG. 13.

In the cold start phase, the electronic device repeatedly trains the initial detection network on the preset positive sample set until a trained detection network is obtained. In the deployment incremental training phase, the electronic device takes the images to be detected that are generated on the production line as the production line data set and inputs them into the detection network trained in the cold start phase. It collects the anomaly detection image output by the detection network for each image to be detected as the detection result, and determines from that result whether the corresponding image is a normal image or an abnormal image, thereby dividing the detected images into a normal image set, used as normal samples, and an abnormal image set, used as abnormal samples. Both the normal samples and the abnormal samples are then verified, yielding verified normal samples (positive samples) and abnormal samples (negative samples) with label information, which together form the incremental data set (incremental sample set). The incremental data set is used to continue training the detection network obtained in the cold start phase until an updated detection network is obtained. The updated detection network replaces the one obtained in the cold start phase and is used to continue detecting the images generated on the production line.
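The deployment incremental training phase can be outlined as a single update round. In this sketch, `detect`, `verify`, and `train` are hypothetical callables standing in for the detection network's normal/abnormal decision, the verification step (for example, manual review, which also re-annotates wrong predictions), and the update training; none of them is an API defined by the disclosure.

```python
from typing import Any, Callable, List, Sequence, Tuple

# (image, label) with label 0 = positive (normal) sample, 1 = negative (abnormal) sample.
Sample = Tuple[Any, int]


def deployment_update_round(
    network: Any,
    production_images: Sequence[Any],
    detect: Callable[[Any, Any], bool],
    verify: Callable[[Any, bool], bool],
    train: Callable[[Any, List[Sample]], Any],
) -> Tuple[Any, List[Sample]]:
    """One round of the deployment incremental training phase (hypothetical helpers).

    detect(network, image) -> True if the network judges the image abnormal.
    verify(image, predicted_abnormal) -> the confirmed label (True = abnormal);
        a correct prediction is kept, a wrong one is re-annotated.
    train(network, samples) -> the updated detection network.
    """
    # Split the production line data set into normal and abnormal image sets.
    normal_set: List[Any] = []
    abnormal_set: List[Any] = []
    for image in production_images:
        (abnormal_set if detect(network, image) else normal_set).append(image)

    # Verified (and, where needed, corrected) labels form the incremental sample set.
    incremental: List[Sample] = [(img, int(verify(img, False))) for img in normal_set]
    incremental += [(img, int(verify(img, True))) for img in abnormal_set]

    # Continue training the cold-start network; the result replaces it in deployment.
    return train(network, incremental), incremental
```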

The anomaly detection method provided by the embodiments of the present disclosure can be applied to scenarios such as artificial intelligence (AI) training platforms, anomaly detection, and online automatic training and improvement of anomaly detection.

The present disclosure also provides an abnormality detection device. FIG. 14 is a schematic structural diagram of the abnormality detection device provided by an embodiment of the present disclosure. As shown in FIG. 14, the abnormality detection device 500 includes: a first training part 501, configured to train an initial detection network using a first sample set to obtain a detection network, the first sample set being a positive sample set; an acquisition part 502, configured to obtain, while the detection network performs anomaly detection on a plurality of images to be detected, a second sample set based on the detected normal image set and abnormal image set, the second sample set being an incremental sample set containing positive samples and negative samples, wherein a negative sample is an abnormal image, that is, an image in which an anomaly exists; and a second training part 503, configured to update and train the detection network using the second sample set to obtain an updated detection network.

In some embodiments of the present disclosure, the first training part 501 is further configured to: detect the positive samples in the first sample set with the initial detection network to obtain a first feature difference sequence corresponding to the positive samples; determine the training loss corresponding to the positive samples according to the first feature difference sequence and a normal loss function; and train and adjust the initial detection network based on the training loss until the final loss obtained is less than a preset loss threshold, thereby obtaining the detection network. The normal loss function expresses that the reconstructed feature sequence corresponding to a positive sample is pulled closer to the positive sample.

In some embodiments of the present disclosure, the acquisition part 502 is further configured to determine an image to be verified from the detected normal image set and the abnormal image set; and verify the image to be verified to obtain The second sample set.

In some embodiments of the present disclosure, the images to be verified include at least one abnormal image and at least one normal image. The acquisition part 502 is further configured to: verify the at least one abnormal image and the at least one normal image separately to obtain respective verification results; take each abnormal image whose verification result indicates it is correct as a negative sample, and each normal image whose verification result indicates it is correct as a positive sample; correctly annotate each abnormal image and each normal image whose verification result indicates it is wrong, to obtain further positive samples and negative samples; and determine the set of the negative samples and the positive samples as the second sample set.

In some embodiments of the present disclosure, the abnormality detection device 500 further includes a detection part. The detection part is configured to, after the initial detection network has been trained with the first sample set to obtain the detection network, perform anomaly detection on each image to be detected with the detection network to obtain an abnormality score at each pixel position of each image to be detected, and draw the abnormality detection image corresponding to each image to be detected based on the abnormality scores. The acquisition part 502 is further configured to obtain the normal image set and the abnormal image set among the plurality of images to be detected according to the abnormality detection images.

In some embodiments of the present disclosure, the acquisition part 502 is further configured to: when the abnormality detection image corresponding to an image to be detected indicates that the image is an abnormal image, take that image as an abnormal image, and after traversing the plurality of images to be detected, obtain the abnormal image set containing at least one abnormal image; and when the abnormality detection image corresponding to an image to be detected indicates that the image is a normal image, take that image as a normal image, and after traversing the plurality of images to be detected, obtain the normal image set containing at least one normal image.

In some embodiments of the present disclosure, the acquisition part 502 is further configured to: determine, for each abnormal image in the abnormal image set, the first maximum abnormality score, that is, the largest of the abnormality scores at its pixel positions; determine, for each normal image in the normal image set, the second maximum abnormality score, that is, the largest of the abnormality scores at its pixel positions; when the first maximum abnormality score falls within a first preset value range of a preset abnormality threshold, determine that the abnormal image corresponding to the first maximum abnormality score belongs to the images to be verified; and when the second maximum abnormality score falls within a second preset value range of the preset abnormality threshold, determine that the normal image corresponding to the second maximum abnormality score belongs to the images to be verified.
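One plausible reading of this selection rule is sketched below: each image's maximum per-pixel abnormality score is checked against the preset value range for its set. The closed-interval ranges, the dictionary layout keyed by image ID, and the function name are assumptions; the disclosure does not fix how the ranges relate to the threshold.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # hypothetical closed interval [low, high]
ScoreMap = List[List[float]]  # per-pixel abnormality scores of one image


def select_for_verification(
    abnormal_maps: Dict[str, ScoreMap],
    normal_maps: Dict[str, ScoreMap],
    first_range: Range,
    second_range: Range,
) -> List[str]:
    """Return the IDs of images whose maximum abnormality score falls in the
    range for their set (first range: abnormal set; second range: normal set)."""

    def max_score(score_map: ScoreMap) -> float:
        return max(max(row) for row in score_map)

    def in_range(x: float, r: Range) -> bool:
        return r[0] <= x <= r[1]

    selected = [k for k, m in abnormal_maps.items() if in_range(max_score(m), first_range)]
    selected += [k for k, m in normal_maps.items() if in_range(max_score(m), second_range)]
    return selected
```

With a hypothetical threshold of 0.5, ranges such as [0.5, 0.6] and [0.4, 0.5] would send only borderline images, whose predicted labels are least certain, to verification.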

In some embodiments of the present disclosure, the acquiring part 502 is further configured to determine all images in the normal image set and the abnormal image set as the images to be verified.

In some embodiments of the present disclosure, the second training part 503 is further configured to: when training the detection network with the positive samples in the second sample set, obtain the first feature difference sequence corresponding to the positive samples with the detection network, and determine the first training loss corresponding to the positive samples according to the first feature difference sequence and the normal loss function; when training the detection network with the negative samples in the second sample set, obtain the second feature difference sequence corresponding to the negative samples with the detection network, and determine the second training loss corresponding to the negative samples according to the second feature difference sequence, the preset true value of the negative samples, and the abnormal loss function; and train and adjust the detection network based on the first training loss and the second training loss respectively, until the final loss obtained is less than the preset loss threshold, thereby obtaining the updated detection network. The normal loss function expresses that the reconstructed feature sequence corresponding to a positive sample is pulled closer to the positive sample; the abnormal loss function expresses that the reconstructed feature sequence corresponding to the abnormal part of a negative sample is pushed away from that abnormal part, and the reconstructed feature sequence corresponding to the normal part of the negative sample is pulled closer to that normal part.

In some embodiments of the present disclosure, the updated detection network includes an updated feature extraction network and an updated reconstruction network. The detection part is further configured to, after the detection network has been updated and trained with the second sample set to obtain the updated detection network: perform feature processing at different scales on a target image with the updated feature extraction network to obtain a multi-scale feature sequence; reconstruct the multi-scale feature sequence together with a preset query word sequence with the updated reconstruction network to obtain a reconstructed feature sequence; determine the feature difference at each pixel position of the target image according to the reconstructed feature sequence and the multi-scale feature sequence; and determine the abnormality score at each pixel position based on the feature difference, and draw the abnormality detection image corresponding to the target image based on the abnormality scores.

In the embodiments of the present disclosure and other embodiments, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a unit, and it may be modular or non-modular.

An embodiment of the present disclosure also provides an electronic device. FIG. 15 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. As shown in FIG. 15, the electronic device 2 includes a memory 21 and a processor 22, which are connected through a communication bus 23. The memory 21 is configured to store executable instructions (an executable computer program), and the processor 22 is configured to execute the executable instructions stored in the memory 21 to implement the method provided by the embodiments of the present disclosure, for example, the anomaly detection method provided by the embodiments of the present disclosure.

An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by the processor 22, implements the method provided by the embodiments of the present disclosure, for example, the anomaly detection method provided by the embodiments of the present disclosure.

In some embodiments of the present disclosure, the storage medium may be a tangible device capable of holding and storing instructions used by an instruction execution device, and may be a volatile or a non-volatile storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in a groove with instructions stored thereon, and any suitable combination of the above. As used herein, a computer-readable storage medium is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses through a fiber optic cable), or transmitted electrical signals.

In some embodiments of the present disclosure, executable instructions may take the form of programs, software, software modules, scripts, or code written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

As an example, executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines, or portions of code).

As an example, executable instructions may be deployed to execute on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.

In summary, the initial detection network is trained with a positive sample set to obtain the detection network. While the detection network performs anomaly detection on multiple images to be detected, an incremental sample set containing positive samples and negative samples is obtained based on the detected normal image set and abnormal image set, where a negative sample is an abnormal image in which an anomaly exists. The detection network is then updated and trained with this incremental sample set to obtain an updated detection network. Because the incremental samples come from the images being detected, the updated detection network is better suited to the detection scenario of the actual production line, and anomaly detection with it is more accurate. At the same time, because the incremental samples include both positive and negative samples, the updated detection network is trained on both, making the detection network compatible with different situations such as positive samples and negative samples, which improves the versatility and flexibility of anomaly detection.

The above descriptions are merely examples of the present disclosure, and are not intended to limit the protection scope of the present disclosure. Any modifications, equivalent replacements and improvements made within the spirit and scope of the present disclosure are included in the protection scope of the present disclosure.

Industrial Applicability

The embodiments of the present disclosure disclose an anomaly detection method and device, an electronic device, a computer-readable storage medium, a computer program, and a computer program product. The method includes: training an initial detection network using a first sample set to obtain a detection network, the first sample set being a positive sample set; while the detection network performs anomaly detection on a plurality of images to be detected, obtaining a second sample set based on the detected normal image set and abnormal image set, the second sample set being an incremental sample set containing positive samples and negative samples, wherein a negative sample is an abnormal image in which an anomaly exists; and updating and training the detection network using the second sample set to obtain an updated detection network. The present disclosure can improve the accuracy and flexibility of anomaly detection.

Claims

1. An anomaly detection method, comprising:

Using the first sample set to train the initial detection network to obtain the detection network; the first sample set is a positive sample set;

While the detection network performs anomaly detection on a plurality of images to be detected, obtaining a second sample set based on the detected normal image set and abnormal image set, the second sample set being an incremental sample set containing positive samples and negative samples, wherein a negative sample is an abnormal image in which an anomaly exists;

Using the second sample set to update and train the detection network to obtain an updated detection network.

2. The method according to claim 1, wherein using the first sample set to train the initial detection network to obtain the detection network comprises:

Using the initial detection network to detect positive samples in the first sample set to obtain a first feature difference sequence corresponding to the positive samples;

determining a training loss corresponding to the positive sample according to the first feature difference sequence and a normal loss function;

Based on the training loss, the initial detection network is trained and adjusted until the obtained final loss is less than a preset loss threshold, and the detection network is obtained;

Wherein the normal loss function characterizes that the reconstructed feature sequence corresponding to the positive sample is pulled closer to the positive sample.

3. The method according to claim 1 or 2, wherein obtaining the second sample set based on the detected normal image set and abnormal image set comprises:

Determining an image to be verified from the detected normal image set and the abnormal image set;

Verifying the image to be verified to obtain the second sample set.
4. The method according to claim 3, wherein the images to be verified comprise at least one abnormal image and at least one normal image, and verifying the images to be verified to obtain the second sample set comprises:

verifying the at least one abnormal image and the at least one normal image respectively to obtain respective verification results;

taking each abnormal image whose verification result indicates that the detection is correct as a negative sample, and taking each normal image whose verification result indicates that the detection is correct as a positive sample;

correctly annotating each abnormal image and each normal image whose verification result indicates that the detection is wrong, to obtain further positive samples and negative samples; and

determining the set of the negative samples and the positive samples as the second sample set.
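The verification of claim 4 can be sketched with a hypothetical `ground_truth` callable standing in for whatever verifier is used (for example, a human reviewer). Correct detections keep their predicted label, wrong ones are re-annotated, and every verified image ends up in the second sample set with its true label. The function name, the `NORMAL`/`ABNORMAL` constants and the returned statistics are assumptions for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

NORMAL, ABNORMAL = 0, 1  # hypothetical labels: positive sample / negative sample


def build_second_sample_set(
    abnormal_images: List[Hashable],
    normal_images: List[Hashable],
    ground_truth: Callable[[Hashable], int],
) -> Tuple[List[Tuple[Hashable, int]], Dict[str, int]]:
    """Verify every detection and return (second sample set, verification stats)."""
    predictions = [(img, ABNORMAL) for img in abnormal_images]
    predictions += [(img, NORMAL) for img in normal_images]
    samples: List[Tuple[Hashable, int]] = []
    stats = {"confirmed": 0, "corrected": 0}
    for img, predicted in predictions:
        actual = ground_truth(img)  # hypothetical verifier, e.g. a human reviewer
        if actual == predicted:
            # Detection verified as correct: the image keeps its predicted label.
            stats["confirmed"] += 1
        else:
            # Detection verified as wrong: the image is re-annotated with the correct label.
            stats["corrected"] += 1
        samples.append((img, actual))
    return samples, stats
```

Corrected images are exactly the cases the current network gets wrong, so they are likely to be among the most informative additions to the incremental sample set.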
5. The method according to any one of claims 1-4, wherein after training the initial detection network using the first sample set to obtain the detection network, the method further comprises:

performing anomaly detection on each image to be detected using the detection network, to obtain an anomaly score at each pixel position of each image to be detected;

drawing, based on the anomaly scores, an anomaly detection image corresponding to each image to be detected; and

obtaining, according to the anomaly detection images, the normal image set and the abnormal image set among the plurality of images to be detected.
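Claim 5 leaves open how the anomaly detection image is drawn from the per-pixel scores. One plausible rendering, assumed here, maps each score linearly onto an 8-bit grayscale intensity over a fixed score range `[lo, hi]`; `draw_anomaly_image` and its defaults are hypothetical.

```python
from typing import List


def draw_anomaly_image(
    scores: List[List[float]], lo: float = 0.0, hi: float = 1.0
) -> List[List[int]]:
    """Map per-pixel anomaly scores to 8-bit intensities (hypothetical rendering)."""
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    image: List[List[int]] = []
    for row in scores:
        out_row: List[int] = []
        for s in row:
            t = min(max((s - lo) / span, 0.0), 1.0)  # clamp the scaled score to [0, 1]
            out_row.append(round(t * 255))
        image.append(out_row)
    return image
```

A fixed range, rather than per-image min-max normalization, keeps intensities comparable between images, so a mostly normal image is not stretched to look as bright as a genuinely abnormal one.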
6. The method according to claim 5, wherein obtaining the normal image set and the abnormal image set among the plurality of images to be detected according to the anomaly detection images comprises:

in a case where the anomaly detection image corresponding to an image to be detected indicates that the image is abnormal, taking that image as an abnormal image, and obtaining, after traversing the plurality of images to be detected, the abnormal image set containing at least one abnormal image; and

in a case where the anomaly detection image corresponding to an image to be detected indicates that the image is normal, taking that image as a normal image, and obtaining, after traversing the plurality of images to be detected, the normal image set containing at least one normal image.
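Claim 6 only says that the anomaly detection image "indicates" whether an image is abnormal. A common decision rule, assumed here rather than taken from the disclosure, treats an image as abnormal when its largest per-pixel anomaly score exceeds a preset threshold. The sketch traverses all images once and builds both sets; `split_by_detection` and the score-map layout are hypothetical.

```python
from typing import Dict, List, Tuple

ScoreMap = List[List[float]]  # per-pixel anomaly scores of one image


def split_by_detection(
    score_maps: Dict[str, ScoreMap], threshold: float
) -> Tuple[List[str], List[str]]:
    """Return (normal image set, abnormal image set) under an assumed max-score rule."""
    normal_set: List[str] = []
    abnormal_set: List[str] = []
    for name, scores in score_maps.items():  # traverse every image to be detected
        peak = max(max(row) for row in scores)
        if peak > threshold:
            abnormal_set.append(name)
        else:
            normal_set.append(name)
    return normal_set, abnormal_set
```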
7. The method according to claim 3, wherein determining the images to be verified from the detected normal image set and abnormal image set comprises:

determining, for each abnormal image in the abnormal image set, a first maximum anomaly score, that is, the largest of the anomaly scores at its pixel positions;

determining, for each normal image in the normal image set, a second maximum anomaly score, that is, the largest of the anomaly scores at its pixel positions;

in a case where the first maximum anomaly score falls within a first preset value range of a preset anomaly threshold, determining that the abnormal image corresponding to the first maximum anomaly score belongs to the images to be verified; and

in a case where the second maximum anomaly score falls within a second preset value range of the preset anomaly threshold, determining that the normal image corresponding to the second maximum anomaly score belongs to the images to be verified.
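For claim 7, the sketch below assumes that the first preset value range lies just above the anomaly threshold and the second just below it, so that only borderline detections are sent for verification. The margins `upper_margin` and `lower_margin`, like the function names, are hypothetical; the claim does not state the ranges.

```python
from typing import Dict, List

ScoreMap = List[List[float]]  # per-pixel anomaly scores of one image


def max_score(scores: ScoreMap) -> float:
    # Largest anomaly score over all pixel positions of the image.
    return max(max(row) for row in scores)


def select_for_verification(
    abnormal_set: Dict[str, ScoreMap],
    normal_set: Dict[str, ScoreMap],
    threshold: float,
    upper_margin: float,
    lower_margin: float,
) -> List[str]:
    to_verify: List[str] = []
    for name, scores in abnormal_set.items():
        first_max = max_score(scores)
        # Assumed first range: (threshold, threshold + upper_margin].
        if threshold < first_max <= threshold + upper_margin:
            to_verify.append(name)
    for name, scores in normal_set.items():
        second_max = max_score(scores)
        # Assumed second range: [threshold - lower_margin, threshold].
        if threshold - lower_margin <= second_max <= threshold:
            to_verify.append(name)
    return to_verify
```

Images classified far from the threshold are skipped, which keeps the verification workload small; verifying every image instead is the alternative covered by claim 8.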
8. The method according to claim 3, wherein determining the images to be verified from the detected normal image set and abnormal image set comprises:

determining all images in the normal image set and the abnormal image set as the images to be verified.
9. The method according to any one of claims 1-8, wherein performing update training on the detection network using the second sample set to obtain the updated detection network comprises:

in a case where the detection network is trained with a positive sample in the second sample set, obtaining a first feature difference sequence corresponding to the positive sample using the detection network, and determining a first training loss corresponding to the positive sample according to the first feature difference sequence and a normal loss function;

in a case where the detection network is trained with a negative sample in the second sample set, obtaining a second feature difference sequence corresponding to the negative sample using the detection network, and determining a second training loss corresponding to the negative sample according to the second feature difference sequence, a preset actual value of the negative sample, and an abnormal loss function; and

training and adjusting the detection network based on the first training loss and the second training loss respectively, until the final loss obtained is less than a preset loss threshold, to obtain the updated detection network;

wherein the normal loss function pulls the reconstructed feature sequence corresponding to the positive sample closer to the positive sample; and the abnormal loss function pushes the reconstructed feature sequence corresponding to the abnormal part of the negative sample away from the abnormal part, while pulling the reconstructed feature sequence corresponding to the normal part of the negative sample closer to the normal part.
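Claim 9 describes the abnormal loss only by its effect: push the reconstruction away on abnormal pixels and pull it closer on normal pixels. A hinge (margin) formulation is one standard way to get that behaviour and is assumed here; `abnormal_loss`, the `margin` parameter, and the use of a 0/1 pixel mask as the preset actual value of the negative sample are hypothetical. `diff[i]` is taken to be a non-negative per-pixel feature difference, such as an L2 distance.

```python
from typing import Sequence


def normal_loss(diff: Sequence[float]) -> float:
    # Assumed form: mean feature difference; minimising it pulls reconstructions closer.
    return sum(diff) / len(diff)


def abnormal_loss(diff: Sequence[float], mask: Sequence[int], margin: float = 1.0) -> float:
    """Assumed hinge form of the abnormal loss for one negative sample.

    diff[i] >= 0 is the feature difference at pixel i; mask[i] == 1 marks a pixel of
    the abnormal part (a hypothetical encoding of the preset actual value).
    """
    if len(diff) != len(mask):
        raise ValueError("diff and mask must have the same length")
    total = 0.0
    for d, m in zip(diff, mask):
        if m:
            # Push: penalise abnormal pixels whose reconstruction is closer than `margin`.
            total += max(0.0, margin - d)
        else:
            # Pull: normal pixels of the negative sample should be reconstructed closely.
            total += d
    return total / len(diff)
```

The margin caps how far abnormal pixels are pushed: once their difference exceeds `margin` they contribute nothing, which stops the loss from growing without bound.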
10. The method according to any one of claims 1-9, wherein the updated detection network comprises an updated feature extraction network and an updated reconstruction network;

after performing update training on the detection network using the second sample set to obtain the updated detection network, the method further comprises:

performing feature processing at different scales on a target image using the updated feature extraction network, to obtain a multi-scale feature sequence;

reconstructing the multi-scale feature sequence and a preset query word sequence using the updated reconstruction network, to obtain a reconstructed feature sequence;

determining a feature difference at each pixel position of the target image according to the reconstructed feature sequence and the multi-scale feature sequence; and

determining an anomaly score at each pixel position based on the feature difference, and drawing an anomaly detection image corresponding to the target image based on the anomaly scores.
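The scoring stage of claim 10 can be sketched once the multi-scale feature sequence and its reconstruction are available. The sketch assumes feature maps stored as nested lists indexed `[row][col][channel]`, takes the L2 distance as the feature difference, upsamples each scale to the target size by nearest-neighbour sampling and averages over scales. The feature extraction and query-based reconstruction networks themselves are not modelled, and all names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [row][col][channel]
ScoreMap = List[List[float]]


def l2_difference(feat: FeatureMap, recon: FeatureMap) -> ScoreMap:
    # Feature difference at each spatial position: L2 distance across channels.
    return [[math.dist(f, r) for f, r in zip(frow, rrow)] for frow, rrow in zip(feat, recon)]


def upsample_nearest(m: ScoreMap, h: int, w: int) -> ScoreMap:
    # Nearest-neighbour resize of a coarse map to the target image size h x w.
    mh, mw = len(m), len(m[0])
    return [[m[i * mh // h][j * mw // w] for j in range(w)] for i in range(h)]


def anomaly_scores(
    features: List[FeatureMap], reconstructions: List[FeatureMap], h: int, w: int
) -> ScoreMap:
    """Average the per-scale feature differences into one h x w anomaly score map."""
    per_scale = [
        upsample_nearest(l2_difference(f, r), h, w)
        for f, r in zip(features, reconstructions)
    ]
    return [
        [sum(s[i][j] for s in per_scale) / len(per_scale) for j in range(w)]
        for i in range(h)
    ]
```

Averaging treats all scales equally; a weighted sum or a different interpolation would be equally consistent with the claim, which does not specify how the scales are fused.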
11. An anomaly detection apparatus, comprising:

a first training part, configured to train an initial detection network using a first sample set to obtain a detection network, wherein the first sample set is a positive sample set;

an acquisition part, configured to obtain, while the detection network performs anomaly detection on a plurality of images to be detected, a second sample set based on a detected normal image set and a detected abnormal image set, wherein the second sample set is an incremental sample set containing positive samples and negative samples, and a negative sample is an abnormal image, that is, an image in which an anomaly exists; and

a second training part, configured to perform update training on the detection network using the second sample set to obtain an updated detection network.
12. The apparatus according to claim 11, wherein the first training part is further configured to: detect the positive samples in the first sample set using the initial detection network to obtain a first feature difference sequence corresponding to each positive sample; determine a training loss corresponding to the positive sample according to the first feature difference sequence and a normal loss function; and train and adjust the initial detection network based on the training loss until the final loss obtained is less than a preset loss threshold, to obtain the detection network; wherein the normal loss function pulls the reconstructed feature sequence corresponding to the positive sample closer to the positive sample.
13. The apparatus according to claim 11 or 12, wherein the acquisition part is further configured to determine images to be verified from the detected normal image set and abnormal image set, and to verify the images to be verified to obtain the second sample set.
14. The apparatus according to claim 13, wherein the images to be verified comprise at least one abnormal image and at least one normal image; and the acquisition part is further configured to: verify the at least one abnormal image and the at least one normal image respectively to obtain respective verification results; take each abnormal image whose verification result indicates that the detection is correct as a negative sample, and each normal image whose verification result indicates that the detection is correct as a positive sample; correctly annotate each abnormal image and each normal image whose verification result indicates that the detection is wrong, to obtain further positive samples and negative samples; and determine the set of the negative samples and the positive samples as the second sample set.
15. The apparatus according to any one of claims 11-14, further comprising a detection part, wherein the detection part is configured to, after the initial detection network has been trained using the first sample set to obtain the detection network, perform anomaly detection on each image to be detected using the detection network to obtain an anomaly score at each pixel position of each image to be detected, and draw, based on the anomaly scores, an anomaly detection image corresponding to each image to be detected; and the acquisition part is further configured to obtain, according to the anomaly detection images, the normal image set and the abnormal image set among the plurality of images to be detected.
16. The apparatus according to claim 15, wherein the acquisition part is further configured to: in a case where the anomaly detection image corresponding to an image to be detected indicates that the image is abnormal, take that image as an abnormal image, and obtain, after traversing the plurality of images to be detected, the abnormal image set containing at least one abnormal image; and in a case where the anomaly detection image corresponding to an image to be detected indicates that the image is normal, take that image as a normal image, and obtain, after traversing the plurality of images to be detected, the normal image set containing at least one normal image.
17. The apparatus according to claim 13, wherein the acquisition part is further configured to: determine, for each abnormal image in the abnormal image set, a first maximum anomaly score, that is, the largest of the anomaly scores at its pixel positions; determine, for each normal image in the normal image set, a second maximum anomaly score, that is, the largest of the anomaly scores at its pixel positions; in a case where the first maximum anomaly score falls within a first preset value range of a preset anomaly threshold, determine that the abnormal image corresponding to the first maximum anomaly score belongs to the images to be verified; and in a case where the second maximum anomaly score falls within a second preset value range of the preset anomaly threshold, determine that the normal image corresponding to the second maximum anomaly score belongs to the images to be verified.
18. The apparatus according to claim 13, wherein the acquisition part is further configured to determine all images in the normal image set and the abnormal image set as the images to be verified.
19. The apparatus according to any one of claims 11-18, wherein the second training part is further configured to: in a case where the detection network is trained with a positive sample in the second sample set, obtain a first feature difference sequence corresponding to the positive sample using the detection network, and determine a first training loss corresponding to the positive sample according to the first feature difference sequence and a normal loss function; in a case where the detection network is trained with a negative sample in the second sample set, obtain a second feature difference sequence corresponding to the negative sample using the detection network, and determine a second training loss corresponding to the negative sample according to the second feature difference sequence, a preset actual value of the negative sample, and an abnormal loss function; and train and adjust the detection network based on the first training loss and the second training loss until the final loss obtained is less than a preset loss threshold, to obtain the updated detection network; wherein the normal loss function pulls the reconstructed feature sequence corresponding to the positive sample closer to the positive sample, and the abnormal loss function pushes the reconstructed feature sequence corresponding to the abnormal part of the negative sample away from the abnormal part while pulling the reconstructed feature sequence corresponding to the normal part of the negative sample closer to the normal part.
20. The apparatus according to any one of claims 11-19, wherein the updated detection network comprises an updated feature extraction network and an updated reconstruction network; and the detection part is further configured to, after the detection network has been update-trained using the second sample set to obtain the updated detection network: perform feature processing at different scales on a target image using the updated feature extraction network to obtain a multi-scale feature sequence; reconstruct the multi-scale feature sequence and a preset query word sequence using the updated reconstruction network to obtain a reconstructed feature sequence; determine a feature difference at each pixel position of the target image according to the reconstructed feature sequence and the multi-scale feature sequence; and determine an anomaly score at each pixel position based on the feature difference, and draw an anomaly detection image corresponding to the target image based on the anomaly scores.
21. An electronic device, comprising:

a memory, configured to store executable instructions; and

a processor, configured to implement the method according to any one of claims 1 to 10 when executing the executable instructions stored in the memory.
22. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 10.
23. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the code to implement the steps of the anomaly detection method according to any one of claims 1 to 10.
24. A computer program product, comprising computer program instructions that cause a computer to execute the steps of the anomaly detection method according to any one of claims 1 to 10.