WO2023087636A1

WO2023087636A1 - Anomaly detection method and apparatus, and electronic device, storage medium and computer program product

Info

Publication number: WO2023087636A1
Application number: PCT/CN2022/092205
Authority: WO
Inventors: 余家伟; 汪翔; 郑烨; 李韡; 吴立威; 赵瑞
Original assignee: 上海商汤智能科技有限公司
Priority date: 2021-11-16
Filing date: 2022-05-11
Publication date: 2023-05-25
Also published as: CN114049332A

Abstract

The embodiments of the present disclosure provide an anomaly detection method and apparatus, and an electronic device, a storage medium and a computer program product. The method comprises: performing feature extraction on an original image to obtain a first feature map; mapping the first feature map into a second feature map; obtaining an anomaly heat map of the original image according to the second feature map; and obtaining an anomaly detection result of the original image according to the anomaly heat map.

Description

Abnormality detection method and device, electronic device, storage medium and computer program product

Cross References to Related Applications

This patent application claims priority to Chinese patent application No. 202111359576.3, filed on November 16, 2021 by Shanghai SenseTime Intelligent Technology Co., Ltd. and entitled "Anomaly detection method and device, electronic equipment and storage medium". The entirety of that application is incorporated into this disclosure by reference.

Technical Field

The present disclosure relates to, but is not limited to, the technical field of computer vision, and in particular relates to an anomaly detection method and device, electronic equipment, a storage medium and a computer program product.

Background

Anomaly detection and localization in the field of computer vision aims to identify and locate abnormal areas in images, and is widely used in industrial defect detection, medical image detection, and security detection. Current mainstream anomaly detection methods localize anomalies using deep learning. However, abnormal samples are scarce, and collecting and labeling a large amount of abnormal data is very difficult in practice, which hinders training a model with high detection accuracy. To solve this problem, the industry has proposed unsupervised anomaly detection and localization, which detects and locates abnormal areas in an image by modeling normal samples without using any labeling information.

Summary

Embodiments of the present disclosure provide an anomaly detection method and device, electronic equipment, a storage medium, and a computer program product, which are beneficial to improving the accuracy of image anomaly detection.

In one aspect, an embodiment of the present disclosure provides an anomaly detection method, including: performing feature extraction on an original image to obtain a first feature map; mapping the first feature map to a second feature map; obtaining an anomaly heat map of the original image according to the second feature map; and obtaining an anomaly detection result of the original image according to the anomaly heat map.

In another aspect, an embodiment of the present disclosure provides an anomaly detection device, including a feature extraction part and a processing part, wherein the feature extraction part is configured to perform feature extraction on an original image to obtain a first feature map, and is also configured to map the first feature map to a second feature map; the processing part is configured to obtain an anomaly heat map of the original image according to the second feature map, and is also configured to obtain an anomaly detection result of the original image according to the anomaly heat map.

In yet another aspect, an embodiment of the present disclosure provides an electronic device, including an input device and an output device, and further including a processor and a computer storage medium; the processor is adapted to implement one or more instructions; and the computer storage medium stores one or more instructions adapted to be loaded by the processor to execute the method as described above.

In yet another aspect, an embodiment of the present disclosure provides a computer storage medium, where one or more instructions are stored in the computer storage medium, and the one or more instructions are suitable for being loaded by a processor and executing the method as described above.

In yet another aspect, an embodiment of the present disclosure provides a computer program product. The computer program product includes a non-transitory computer storage medium storing a computer program. The computer program is read by a computer and executes the method as described above.

Implementing the embodiments of the present disclosure has the following beneficial effects:

It can be seen that in the embodiment of the present disclosure, the first feature map is obtained by extracting the features of the original image; the first feature map is mapped to the second feature map; the anomaly heat map of the original image is obtained according to the second feature map; and the anomaly detection result of the original image is obtained according to the anomaly heat map. The first feature map helps capture the relationship between the local and the global of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the anomaly heat map obtained by post-processing the second feature map, which builds on both the local-global relationship and the feature space information, can more accurately represent the abnormal distribution (or abnormal region) in the original image, which is conducive to improving the accuracy of image anomaly detection.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or the prior art, the following briefly introduces the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings based on these drawings without creative work.

The accompanying drawings here are incorporated into the description and constitute a part of the present description. These drawings show embodiments consistent with the present disclosure, and are used together with the description to explain the technical solution of the present disclosure.

FIG. 1 is an architecture diagram of an application environment provided by an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a deep learning architecture provided by an embodiment of the present disclosure;

FIG. 3 is a schematic flowchart of an abnormality detection method provided by an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure;

FIG. 6 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 7 is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure;

FIG. 8 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 9 is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure;

FIG. 10 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 11 is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure;

FIG. 12 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 13 is a schematic diagram of a reversible conversion provided by an embodiment of the present disclosure;

FIG. 14 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 15 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 16 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 17 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 18 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 19 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 20 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure;

FIG. 21 is a diagram of defect prediction results obtained by detecting defects in images through the anomaly detection method provided by an embodiment of the present disclosure;

FIG. 22 is a schematic structural diagram of an abnormality detection device provided by an embodiment of the present disclosure;

FIG. 23 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

The following will clearly and completely describe the technical solutions in the embodiments of the present disclosure with reference to the accompanying drawings in the embodiments of the present disclosure. Apparently, the described embodiments are part of the embodiments of the present disclosure, not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present disclosure.

The terms "first", "second", "third" and "fourth" in the description, the claims and the drawings of the present disclosure are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "include" and "have", as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or parts is not limited to the steps or parts listed, but may also include steps or parts that are not listed, or other steps or parts inherent in the process, method, product, or device.

Reference herein to an "embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. The occurrences of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.

The unsupervised anomaly detection and localization algorithm aims to detect anomalies and locate abnormal areas by directly modeling normal samples. The training data contains no label information other than the normal samples themselves. The current technical solutions in this field mainly include image reconstruction, feature distance measurement and feature probability density estimation.

Image reconstruction algorithms can achieve pixel-level image segmentation, but their modeling accuracy for high-frequency information is limited. Feature distance measurement depends on maintaining a feature dictionary of normal samples and requires a nearest-neighbor search at test time, which is inefficient in both time and space. Feature probability density estimation mainly uses a normalizing flow model to perform maximum likelihood estimation on normal sample features; at test time, features falling in the long-tail region of the probability distribution model are judged to be abnormal. This approach has a complete theoretical basis and universal applicability.

However, at present, the flow models in the above-mentioned related algorithms are all based on fully connected networks, so they only support probability density estimation of one-dimensional feature vectors. For example, some related algorithms apply global average pooling to the feature map to obtain a single feature vector, while others model the feature map block by block and use positional encoding to distinguish the blocks. The former cannot segment accurately because feature space information is lost; the latter must traverse all small blocks to complete anomaly detection and localization for the whole image, and because it models locally rather than globally, its anomaly detection accuracy is limited.

The embodiment of the present disclosure provides an anomaly detection method, which can be implemented based on the application environment shown in FIG. 1. As shown in FIG. 1, the application environment includes an image acquisition device 101, user equipment 102 and an electronic device 103, which are connected through a wired or wireless network. The image acquisition device 101 can be an image acquisition device used in fields such as industry and medical treatment, for example auxiliary medical equipment, a depth camera, or a camera used for high-speed rail line inspection. The image acquisition device 101 can be configured to acquire images in a specific scene and send the collected original image to the electronic device 103, and the electronic device 103 performs anomaly detection on the original image. For example, the electronic device 103 extracts multi-level features of the original image, applies reversible transformations to the multi-level features to map them into probability density estimates, post-processes the probability density estimates, and locates the abnormal regions in the original image based on the post-processed probability density estimates.

In some embodiments, a neural network model is deployed in the electronic device 103, and some or all of the steps in the anomaly detection method can be executed by the neural network model. The user equipment 102 can be configured to provide the electronic device 103 with a positive sample image set; the electronic device 103 can train the neural network with the positive sample image set and deploy the trained neural network model locally or on other devices. It should be noted that the device that trains the neural network model and the device that invokes it can be the same device or different devices. For example, the electronic device 103 can be a device in a server cluster; after that device has trained the neural network model, the model can be deployed on other devices in the cluster, which invoke it to detect anomalies in images. Since the electronic device 103 better captures the relationship between local information and global information of the original image and retains feature space information when performing anomaly detection, it is beneficial to improve the accuracy of anomaly detection.

Exemplarily, the electronic device 103 can be an independent physical server, an embedded device or an artificial intelligence device, or a server cluster or a distributed system. It can also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, and big data and artificial intelligence platforms, which is not limited in this disclosure.

The deep learning architecture provided by the embodiments of the present disclosure will be briefly described below in conjunction with related drawings.

Please refer to FIG. 2, which is a schematic diagram of a deep learning architecture provided by an embodiment of the present disclosure. As shown in FIG. 2, the deep learning architecture may at least include a feature extractor 202 and a neural network.

In some embodiments, the neural network can be a two-dimensional normalized flow (FastFlow) 204, which can also be called a two-dimensional convolution flow. It includes at least two reversible transformation blocks f, and the convolution kernel sizes in adjacent reversible transformation blocks are different. The feature extractor 202 is configured to perform feature extraction on the input image 201 to obtain a feature map 203 with a width of W, a height of H, and C channels; the two-dimensional normalized flow 204 is configured to use the at least two reversible transformation blocks f to process the feature map 203 output by the feature extractor 202, so as to obtain a probability density estimate 205 of the features of the input image 201.

Exemplarily, the at least two reversible transformation blocks f may alternate 3*3 convolutions and 1*1 convolutions (referred to as the method of alternating large and small convolutions), so as to preserve the feature space information of the original image. An affine coupling layer is used in each reversible transformation block f to map the input features. The probability density estimate 205 output by the two-dimensional normalized flow 204 can be post-processed to obtain the abnormal heat map 206. The abnormality scores in the abnormal heat map 206 are compared with a preset value: a region of the image whose abnormality score approaches the preset value is determined to be a normal region, and a region whose abnormality score is far from the preset value is determined to be an abnormal region.
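To make the role of the affine coupling layer more concrete, the following is a minimal NumPy sketch of one reversible block, not the disclosed implementation. It assumes the channels are split into two halves, that a single convolution (3*3 or 1*1) predicts a log-scale and a shift for the second half from the first half, and that the log-scale is bounded with tanh; the names conv2d_same and AffineCoupling2D are hypothetical.

```python
import numpy as np

def conv2d_same(x, weight):
    """Naive zero-padded 2D convolution that keeps the spatial size.
    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = weight.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j).
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], xp[:, i:i + h, j:j + w])
    return out

class AffineCoupling2D:
    """Hypothetical reversible block: the first half of the channels conditions
    an elementwise affine map applied to the second half."""

    def __init__(self, channels, kernel_size, rng):
        assert channels % 2 == 0 and kernel_size % 2 == 1
        self.half = channels // 2
        # One convolution outputs both the log-scale and the shift (assumption).
        self.weight = 0.1 * rng.standard_normal(
            (2 * self.half, self.half, kernel_size, kernel_size))

    def _scale_shift(self, xa):
        out = conv2d_same(xa, self.weight)
        return np.tanh(out[:self.half]), out[self.half:]

    def forward(self, x):
        xa, xb = x[:self.half], x[self.half:]
        log_s, t = self._scale_shift(xa)
        yb = xb * np.exp(log_s) + t
        # Log-determinant of the Jacobian is the sum of the log-scales.
        return np.concatenate([xa, yb], axis=0), log_s.sum()

    def inverse(self, y):
        ya, yb = y[:self.half], y[self.half:]
        log_s, t = self._scale_shift(ya)
        return np.concatenate([ya, (yb - t) * np.exp(-log_s)], axis=0)
```

Because the first half of the channels passes through unchanged, the inverse can recompute the same scale and shift, so the block is reversible whatever the kernel size; the output also keeps the C*H*W layout of the input, which is how spatial information is preserved.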

It can be seen that in the deep learning architecture provided by the embodiment of the present disclosure, the feature extractor 202 can better capture the relationship between the local information and the global information of the input image 201, and the two-dimensional normalized flow 204 alternates large and small convolutions, which is conducive to retaining the feature space information of the input image 201. A more accurate probability density estimate 205 can therefore be obtained, which in turn is conducive to improving the accuracy of image anomaly detection.
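The post-processing from probability density estimate to abnormal heat map can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed procedure: the latent output is scored per location under a standard normal prior, the score is defined as one minus the resulting likelihood so that the preset value for normal regions is 0, upsampling is nearest-neighbour on square maps, and the tolerance parameter is hypothetical.

```python
import numpy as np

def anomaly_heat_map(z, image_size):
    """z: flow output of shape (C, H, W) with H == W.
    Returns an (image_size, image_size) map; larger values are more anomalous."""
    # Channel-averaged log-density under a standard normal prior (up to a constant).
    log_prob = -0.5 * np.mean(z ** 2, axis=0)
    score = 1.0 - np.exp(log_prob)  # 0 for a perfectly "normal" location
    factor = image_size // z.shape[1]
    # Nearest-neighbour upsampling back to the image resolution (assumption).
    return np.repeat(np.repeat(score, factor, axis=0), factor, axis=1)

def abnormal_mask(heat_map, preset=0.0, tolerance=0.5):
    """Marks pixels whose score is far from the preset value as abnormal."""
    return np.abs(heat_map - preset) > tolerance
```

For instance, a latent map that is zero everywhere except at one location yields a heat map that is zero everywhere except in the image block corresponding to that location, and only that block is marked abnormal.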

The anomaly detection method provided by the embodiments of the present disclosure will be described in detail below in conjunction with related drawings.

Please refer to FIG. 3. FIG. 3 is a schematic flowchart of an anomaly detection method provided by an embodiment of the present disclosure. The method can be implemented based on the application environment shown in FIG. 1 and applied to electronic equipment, as shown in FIG. 3, including S301 to S304:

S301. Perform feature extraction on the original image to obtain a first feature map.

In some embodiments, the original image refers to an image collected in an actual scene such as medical care or industry. It should be understood that, for feature extraction in anomaly detection, the related art usually uses a sliding window to model the feature map block by block (patch) and distinguishes the blocks with position coding. Block-by-block modeling requires a large number of image blocks for model training and testing, so its computational complexity is relatively high. Because it models local areas, it also easily loses the global information of the image, making it difficult to fully use the correlation between global information and local information in subsequent inference.

In the embodiment of the present disclosure, a residual network (ResNet) or a vision transformer (ViT) with a distillation token is used as the feature extractor. For example, the residual network can be a resnet18 network, a wide_resnet50_2 network, etc., and the vision transformer can be a data-efficient image transformer (DeiT), CaiT (Going deeper with Image Transformers), etc. The feature extractor can also be another pre-trained task network, such as a classification network, target detection network, segmentation network, reconstruction network, image repair network or super-resolution network, which makes the architecture highly scalable. The feature extractor can also be a combination of multiple vision transformers.

It should be noted that the feature extractor can be pre-trained, and the datasets it uses are not limited to common vision task datasets and public datasets, and can also be some private datasets.

Exemplarily, since the vision transformer has a global receptive field, it has a stronger ability to capture the relationship between the local and the global. When a vision transformer is used as the feature extractor, the first feature map can be the feature map output by one hidden layer of the vision transformer, that is, a single-level feature map.

Exemplarily, the ability of the residual network to capture the local-global relationship is lower than that of the vision transformer. When the residual network is used as the feature extractor, in order to better capture the local-global relationship of the original image, the first feature map can be the feature maps output by multiple residual blocks in the residual network, that is, a multi-level feature map. For example, the feature maps output by the first, second and third residual blocks of the residual network can be selected as the input of the subsequent neural network model, that is, the input of the two-dimensional normalized flow model.

S302. Map the first feature map to a second feature map.

In some embodiments, mapping the first feature map to the second feature map means performing reversible transformation processing on the first feature map to obtain the probability density estimate of the features of the original image; that is, each feature in the second feature map represents the probability density estimate of the feature at the corresponding location in the original image. For example, the reversible transformation processing can be completed by a two-dimensional normalized flow model: the features in the first feature map undergo at least two reversible transformation processes to obtain the corresponding hidden variables, which can be used as the probability density estimates of the corresponding features.

Please refer to FIG. 4 . FIG. 4 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. Based on FIG. 3 , S302 can be implemented through S401 , which will be described in conjunction with the steps shown in FIG. 4 .

S401. Map the first feature map to the second feature map through at least two reversible transformation processes.

In the embodiment of the present disclosure, with continued reference to FIG. 2, the first feature map is input into the first reversible transformation block f1 of the two-dimensional normalized flow model 204 for the first reversible transformation process, whose output is the probability density estimate obtained by the first reversible transformation process. That estimate is input into the second reversible transformation block f2 of the two-dimensional normalized flow model 204 for the second reversible transformation process, whose output is the probability density estimate obtained by the second reversible transformation process, and so on: the probability density estimate obtained by the r-th reversible transformation process is input into the (r+1)-th reversible transformation block f(r+1) for the (r+1)-th reversible transformation process, whose output is the probability density estimate obtained by the (r+1)-th reversible transformation process. After at least two reversible transformation processes, the second feature map is obtained. For example, the feature map output by the second reversible transformation process, or by any later reversible transformation process, can be used as the second feature map, depending on the number of reversible transformation blocks f in the two-dimensional normalized flow model 204. It should be understood that normalizing flows (NF) are mainly used to learn transformations between data distributions; their special property is that the transformation is bidirectional, so the flow model can be used in both directions. Accordingly, the mapping of the first feature map in the embodiment of the present disclosure should also be bidirectionally reversible.
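Assuming that the first feature map is X1 and the probability density estimate is Z, the mapping of X1 to Z should satisfy formula (1) and formula (2):

$$X_1 \xrightarrow{f_1} H_1 \xrightarrow{f_2} H_2 \xrightarrow{f_3} \cdots \xrightarrow{f_K} Z \tag{1}$$

and

$$X_1 \xleftarrow{f_1^{-1}} H_1 \xleftarrow{f_2^{-1}} H_2 \xleftarrow{f_3^{-1}} \cdots \xleftarrow{f_K^{-1}} Z \tag{2}$$

where H denotes the reversibly transformed feature map and K denotes the number of reversible transformation blocks f.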
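The bidirectional mapping between X1 and Z through K reversible blocks can be illustrated with a short NumPy sketch. It is only a sketch under assumptions: each block here is a hypothetical per-channel affine map standing in for a real reversible transformation block f, and the function names are illustrative.

```python
import numpy as np

def make_block(rng, channels):
    """Hypothetical reversible block: per-channel map h -> h * exp(log_s) + t."""
    log_s = 0.1 * rng.standard_normal((channels, 1, 1))
    t = rng.standard_normal((channels, 1, 1))

    def forward(h):
        # Every spatial location of channel c is scaled by exp(log_s[c]).
        log_det = log_s.sum() * h.shape[1] * h.shape[2]
        return h * np.exp(log_s) + t, log_det

    def inverse(h):
        return (h - t) * np.exp(-log_s)

    return forward, inverse

def flow_forward(x1, blocks):
    """Applies f_1, ..., f_K in order: X1 -> H_1 -> ... -> Z."""
    h, total_log_det = x1, 0.0
    for forward, _ in blocks:
        h, log_det = forward(h)
        total_log_det += log_det
    return h, total_log_det

def flow_inverse(z, blocks):
    """Applies the inverses in reverse order: Z -> ... -> H_1 -> X1."""
    h = z
    for _, inverse in reversed(blocks):
        h = inverse(h)
    return h
```

Running flow_forward and then flow_inverse on the result recovers X1 up to floating-point error, which is the property the bidirectional requirement expresses; the accumulated log-determinant is what a likelihood-based training objective would typically use.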

Exemplarily, among the at least two reversible transformation processes, the size of the two-dimensional convolution kernel used in a first reversible transformation process differs from the size of the two-dimensional convolution kernel used in a second reversible transformation process, where the first reversible transformation process immediately precedes the second reversible transformation process. As shown in FIG. 2, the two-dimensional convolution kernels used by the at least two reversible transformation blocks f may alternate between 3*3 convolutions and 1*1 convolutions. It should be noted that the alternation is intended to emphasize the alternating appearance of large and small convolution kernels, and is not limited to alternating 3*3 and 1*1 convolutions.
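The constraint that adjacent reversible transformation processes use different kernel sizes can be captured by a small helper. This is a hedged sketch: the function name kernel_schedule is hypothetical, and starting the sequence with the large kernel is an assumption that the text does not fix.

```python
def kernel_schedule(num_blocks, large=3, small=1):
    """Returns one kernel size per reversible block, alternating large and small."""
    if num_blocks < 2:
        raise ValueError("at least two reversible transformation blocks are required")
    if large == small:
        raise ValueError("adjacent blocks must use different kernel sizes")
    # Starting with the large kernel is an assumption made for this sketch.
    return [large if i % 2 == 0 else small for i in range(num_blocks)]
```

Other pairs of sizes, such as 5 and 3, satisfy the same constraint, consistent with the remark that the alternation is not limited to 3*3 and 1*1.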

In this embodiment, the first feature map extracted by the feature extractor is mapped through the two-dimensional normalized flow model, and large and small convolution kernels alternate within the model. This is conducive to retaining the feature space information of the original image, which in turn is conducive to subsequently obtaining a more accurate first probability density estimate.

Exemplarily, when the first feature map is a single-level feature map, the second feature map is a feature map obtained by mapping the single-level feature map. For example, please refer to FIG. 5, which is a schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure. As shown in FIG. 5, for an original image of 3*256*256, it is assumed that a first feature map of 256*64*64 is extracted; the first feature map is mapped by at least two reversible transformation blocks f in the two-dimensional normalized flow model to obtain a second feature map of 256*64*64. In this embodiment, when the first feature map is a single-level feature map, since the vision transformer itself has excellent global capture capabilities, using its output as the input of the two-dimensional normalized flow model makes full use of the local-global relationship between the features of the original image, which is beneficial to improving the accuracy of the first probability density estimate.

Please refer to FIG. 6, which is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 3, when the first feature map includes multi-level feature maps, S302 can be implemented through S601, which will be described in conjunction with the steps shown in FIG. 6.

S601. For the multi-level feature maps, obtain multiple second feature maps through at least two reversible transformation processes. The multiple second feature maps correspond to the multi-level feature maps one-to-one, and the features in the multiple second feature maps represent the probability density estimates of the features at the corresponding locations in the original image.

For example, please refer to FIG. 7, which is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure. As shown in FIG. 7, for an original image of 3*256*256, it is assumed that first feature maps of three levels are extracted, with scales of 256*64*64, 512*32*32 and 1024*16*16 respectively. The first feature maps of the three levels are each mapped by at least two reversible transformation blocks f of a two-dimensional normalized flow model, resulting in multiple second feature maps of the corresponding scales. In this embodiment, when the first feature map is a multi-level feature map, multiple two-dimensional normalized flow models perform reversible transformation processing on the feature maps of each level in parallel, which is beneficial to improving the inference speed.

Please refer to FIG. 8, which is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 3, when the first feature map includes multi-level feature maps, S302 can also be implemented through S801 to S803, which will be described in conjunction with the steps shown in FIG. 8.

S801. Perform scale normalization on the multi-level feature maps to obtain a plurality of first feature maps to be stitched, and the plurality of first feature maps to be stitched are in one-to-one correspondence with the multi-level feature maps;

S802. Stitch a plurality of first feature maps to be spliced into a third feature map;

S803. For the third feature map, obtain a second feature map through at least two reversible transformation processes.

For example, please refer to FIG. 9, which is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure. As shown in FIG. 9, an original image of 3*256*256 is processed by four residual blocks to generate first feature maps of three levels, with scales of 512*32*32, 1024*16*16 and 2048*8*8 respectively. A reference scale can be determined from the scales of the three first feature maps. For example, with 16*16 as the reference scale, the first feature map of 1024*16*16 is kept unchanged and is used directly as a first feature map to be stitched, the first feature map of 512*32*32 is down-sampled to obtain a first feature map to be stitched of 512*16*16, and the first feature map of 2048*8*8 is up-sampled to obtain a first feature map to be stitched of 2048*16*16. The three first feature maps to be stitched are then stitched into a third feature map of (2048+1024+512)*16*16. The third feature map is mapped by at least two reversible transformation blocks f in the two-dimensional normalized flow model to obtain a second feature map of the corresponding scale. In this embodiment, the multi-level feature maps are scale-normalized and stitched before being processed by the two-dimensional normalized flow model, which reduces the number of two-dimensional normalized flow models and the inference overhead brought by deploying multiple two-dimensional normalized flow models.
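The scale normalization and stitching in S801 and S802 can be sketched in NumPy as follows. The resampling method is not specified in the text, so average pooling for down-sampling and nearest-neighbour repetition for up-sampling are assumptions, as are the function names and the restriction to square maps whose sizes differ by integer factors.

```python
import numpy as np

def avg_pool(x, factor):
    """Down-samples (C, H, W) by an integer factor using average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def upsample_nearest(x, factor):
    """Up-samples (C, H, W) by an integer factor using nearest-neighbour repetition."""
    return np.repeat(np.repeat(x, factor, axis=1), factor, axis=2)

def resize_to(x, size):
    """Brings a square feature map to the reference spatial size."""
    h = x.shape[1]
    if h == size:
        return x
    if h > size:
        return avg_pool(x, h // size)
    return upsample_nearest(x, size // h)

def normalize_and_concat(feature_maps, reference_size):
    """Scale-normalizes every level, then stitches them along the channel axis."""
    return np.concatenate([resize_to(f, reference_size) for f in feature_maps], axis=0)
```

With the three levels of the example and a reference scale of 16*16, the stitched result has 2048+1024+512 = 3584 channels at 16*16, which is then the single input of one two-dimensional flow model.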

Please refer to FIG. 10. FIG. 10 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 3, when the first feature map includes multi-level feature maps, S302 can also be implemented through S1001 to S1004, which will be described in conjunction with the steps shown in FIG. 10.

S1001. For multi-level feature maps, through at least two reversible conversion processes, multiple fourth feature maps are obtained, and the multiple fourth feature maps are in one-to-one correspondence with the multi-level feature maps;

S1002. Perform scale normalization on multiple fourth feature maps to obtain multiple second feature maps to be stitched, and the multiple second feature maps to be stitched are in one-to-one correspondence with the multiple fourth feature maps;

S1003. Stitch the plurality of second feature maps to be stitched into a fifth feature map;

S1004. For the fifth feature map, obtain a second feature map through at least two reversible transformation processes.

For example, please refer to FIG. 11. FIG. 11 is another schematic diagram of obtaining an abnormal heat map provided by an embodiment of the present disclosure. As shown in FIG. 11, an original image of 3*256*256 is processed by four residual blocks to generate first feature maps at three levels, with scales of 512*32*32, 1024*16*16 and 2048*8*8 respectively. The first feature maps of the three levels are each mapped by at least two reversible transformation blocks f of a two-dimensional normalized flow model to obtain three fourth feature maps of the corresponding scales. A reference scale is determined from the scales of the three fourth feature maps. For example, with 16*16 as the reference scale, the fourth feature map of 1024*16*16 remains unchanged, the fourth feature map of 512*32*32 is down-sampled to obtain a second feature map to be stitched of 2048*16*16, and the fourth feature map of 2048*8*8 is up-sampled to obtain a second feature map to be stitched of 512*16*16. The fourth feature map of 1024*16*16 is also used as a second feature map to be stitched, and the three second feature maps to be stitched are then stitched into a fifth feature map of (2048+1024+512)*16*16. The fifth feature map is mapped by at least two reversible transformation blocks f in the two-dimensional normalized flow model to obtain the second feature map of the corresponding scale.

In this embodiment, when the first feature map includes multi-level feature maps, multiple two-dimensional normalized flow models perform reversible transformation processing on the feature maps of each level in parallel, which helps improve inference speed. In addition, scale-normalizing and stitching the multi-scale feature maps within the two-dimensional normalized flow model helps make full use of multi-level features for probability density estimation. After the multi-scale feature maps are normalized and stitched, the two-dimensional normalized flow model can further process the stitched feature map. The model configuration is therefore flexible: it supports not only multiple inputs with multiple outputs, but also multiple inputs with a single output.

It should be noted that since the "flow" of the two-dimensional normalized flow model from input to output is reversible, the downsampling and upsampling of the target second feature map should also be reversible.
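The scale normalization and stitching described for FIG. 9 and FIG. 11 can be illustrated with a minimal sketch. The disclosure only requires the down-sampling and up-sampling to be reversible; the space-to-depth rearrangement used here (`squeeze` and `unsqueeze`, both hypothetical names) is one assumed way to satisfy that requirement, chosen because it reproduces the channel counts in the example (512*32*32 becomes 2048*16*16, and 2048*8*8 becomes 512*16*16). Feature maps are plain nested lists, and the sizes are scaled down (channels divided by 128, spatial size by 4) to keep the example small.

```python
def squeeze(x):
    """Reversible 2x down-sampling: (C, H, W) -> (4C, H/2, W/2)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    out = []
    for c in range(C):
        for dy in range(2):
            for dx in range(2):
                # Each 2x2 spatial offset becomes its own output channel.
                out.append([[x[c][2 * i + dy][2 * j + dx] for j in range(W // 2)]
                            for i in range(H // 2)])
    return out


def unsqueeze(x):
    """Reversible 2x up-sampling: (4C, H, W) -> (C, 2H, 2W); inverse of squeeze."""
    C4, H, W = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * (2 * W) for _ in range(2 * H)] for _ in range(C4 // 4)]
    for k in range(C4):
        c, dy, dx = k // 4, (k % 4) // 2, k % 2
        for i in range(H):
            for j in range(W):
                out[c][2 * i + dy][2 * j + dx] = x[k][i][j]
    return out


def shape(x):
    return (len(x), len(x[0]), len(x[0][0]))


def make(C, H, W):
    """Build a (C, H, W) map with distinct values so round trips can be checked."""
    return [[[float(c * H * W + i * W + j) for j in range(W)] for i in range(H)]
            for c in range(C)]


# Scaled-down stand-ins for the 512*32*32, 1024*16*16 and 2048*8*8 levels.
level1, level2, level3 = make(4, 8, 8), make(8, 4, 4), make(16, 2, 2)

# Normalize every level to the reference scale, then stitch along channels.
to_stitch = [squeeze(level1), level2, unsqueeze(level3)]
stitched = [channel for fm in to_stitch for channel in fm]
print(shape(stitched))  # (16 + 8 + 4) channels at the reference scale
```

Because no values are discarded or interpolated, each rearrangement can be undone exactly, which is what keeps the overall flow invertible.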

Please refer to FIG. 12. FIG. 12 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. As shown in FIG. 12, any one of the at least two reversible transformation processes in S401 can be implemented through S1201 to S1203, which will be described in conjunction with the steps shown in FIG. 12.

S1201. For the target feature map to be subjected to reversible transformation processing, split the target feature map into a first sub-target feature map and a second sub-target feature map; the number of channels of the first sub-target feature map and the second sub-target feature map are equal;

S1202. For the first sub-target feature map and the second sub-target feature map, obtain a first feature map to be connected through at least one affine coupling operation;

S1203. Connect the first feature map to be connected with the second feature map to be connected to obtain a reversibly transformed feature map;

Wherein, the second feature map to be connected for the first affine coupling operation in the at least one affine coupling operation is the first sub-target feature map or the second sub-target feature map; the second feature map to be connected for a non-first affine coupling operation in the at least one affine coupling operation is the feature map obtained by the affine coupling operation preceding that non-first affine coupling operation.

For example, please refer to FIG. 13. FIG. 13 is a schematic diagram of a reversible transformation provided by an embodiment of the present disclosure. FIG. 13 shows the processing flow in each reversible transformation block f. The target feature map y 1301 input to the affine coupling layer (for example, the target feature map y 1301 can be the first feature map, the third feature map or the fifth feature map) first undergoes zero initialization through the scale-and-bias layer (Actnorm) with data-dependent initialization, then undergoes channel permutation (Channel Permute), and is then split into the first sub-target feature map y_a 1302 and the second sub-target feature map y_b 1303, each with C/2 channels, expressed as formula (3):

$$y_a,\; y_b = \mathrm{Split}(y) \tag{3}$$

The inputs y_a and y_b are processed by at least one affine coupling operation, and the output of the last affine coupling operation is used as the first feature map to be connected. When only one affine coupling operation is used, the first feature map to be connected that it outputs is connected (Concat) with y_a or y_b to obtain the reversibly transformed feature map 1304, which completes one reversible transformation process. For example, when the affine coupling operation performs two-dimensional convolution processing on y_a, the first feature map to be connected is connected with y_a, and the second feature map to be connected is y_a; when the affine coupling operation performs two-dimensional convolution processing on y_b, the first feature map to be connected is connected with y_b, and the second feature map to be connected is y_b. When more than one affine coupling operation is used, the first feature map to be connected, output by the last affine coupling operation, is connected with the feature map output by the affine coupling operation preceding it to obtain the reversibly transformed feature map 1304, which completes one reversible transformation process; in this case, the second feature map to be connected is the feature map output by that preceding affine coupling operation.

Exemplarily, connecting the first feature map to be connected with the second feature map to be connected may be splicing feature maps based on channels.

Please refer to FIG. 14. FIG. 14 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. As shown in FIG. 14, any one of the at least one affine coupling operation in S1202 can be implemented through S1401 to S1404, which will be described in conjunction with the steps shown in FIG. 14.

S1401. When any affine coupling operation is the first affine coupling operation, perform two-dimensional convolution processing on the first sub-target feature map or the second sub-target feature map to obtain the first scaling coefficient and the first translation coefficient;

S1402. Using the first scaling coefficient and the first translation coefficient, linearly combine the second sub-target feature map or the first sub-target feature map that has not been subjected to two-dimensional convolution processing with the input of the two-dimensional convolution processing in the first affine coupling operation to obtain the output of the first affine coupling operation;

S1403. When any affine coupling operation is a non-first affine coupling operation, perform two-dimensional convolution processing on the output of the affine coupling operation preceding the non-first affine coupling operation to obtain the second scaling coefficient and the second translation coefficient;

S1404. Using the second scaling coefficient and the second translation coefficient, linearly combine the output of the preceding affine coupling operation with the input of the two-dimensional convolution processing in the preceding affine coupling operation to obtain the output of the non-first affine coupling operation.

For example, suppose the two subnetworks s(·) and b(·) perform two-dimensional convolution on y_a to obtain the first scaling coefficient s1 and the first translation coefficient b(y_a), that is, b1 in the figure. An exponential operation exp with the natural base e is applied to s1 to obtain s(y_a). y_b is the feature map that has not undergone two-dimensional convolution processing. Using s(y_a) and b(y_a), y_a and y_b are linearly combined, expressed as formula (4):

$$y'_b = s(y_a) \odot y_b + b(y_a) \tag{4}$$

Among them, y′_b represents the output of the first affine coupling operation. When only one affine coupling operation is used, y′_b serves as the first feature map to be connected, and connecting it with y_a gives the reversibly transformed feature map, expressed as formula (5):

$$y' = \mathrm{concat}(y'_b,\; y'_a) \tag{5}$$

where y′ represents the reversibly transformed feature map, y′_a represents the identity map of y_a, and concat(·) represents the concatenation.
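The single affine coupling operation of formulas (3) to (5) can be sketched as follows. This is a minimal illustration, not the disclosed model: `log_scale_net` and `shift_net` are hypothetical element-wise stand-ins for the two-dimensional convolution subnetworks s(·) and b(·), feature maps are plain nested lists of shape (C, H, W), and the example values are arbitrary. The sketch also shows why the operation is reversible: y_a passes through unchanged, so the same coefficients can be recomputed to undo the transformation of y_b.

```python
import math


def emap(fn, *maps):
    """Apply fn element-wise over equally shaped (C, H, W) nested lists."""
    return [[[fn(*vals) for vals in zip(*rows)] for rows in zip(*chans)]
            for chans in zip(*maps)]


def log_scale_net(y_a):
    # Hypothetical stand-in for s(.) before the exp; the disclosure uses 2D convolutions.
    return emap(lambda v: 0.5 * math.tanh(v), y_a)


def shift_net(y_a):
    # Hypothetical stand-in for b(.); the disclosure uses 2D convolutions.
    return emap(lambda v: 0.1 * v, y_a)


def coupling_forward(y):
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]                       # formula (3): split channels
    log_s, b = log_scale_net(y_a), shift_net(y_a)
    y_b_new = emap(lambda yb, ls, bb: math.exp(ls) * yb + bb, y_b, log_s, b)  # formula (4)
    # Log-determinant of the Jacobian: the sum of the log scaling coefficients.
    log_det = sum(v for ch in log_s for row in ch for v in row)
    return y_b_new + y_a, log_det                       # formula (5): concat(y'_b, y'_a)


def coupling_inverse(y_out):
    half = len(y_out) // 2
    y_b_new, y_a = y_out[:half], y_out[half:]
    log_s, b = log_scale_net(y_a), shift_net(y_a)       # recomputed from the untouched y_a
    y_b = emap(lambda ybn, ls, bb: (ybn - bb) * math.exp(-ls), y_b_new, log_s, b)
    return y_a + y_b


# A 4-channel, 2x2 example input.
y = [[[0.5, -1.0], [2.0, 0.0]], [[1.0, 1.5], [-0.5, 3.0]],
     [[0.2, 0.4], [0.6, 0.8]], [[-1.0, -2.0], [1.0, 2.0]]]
y_out, log_det = coupling_forward(y)
y_back = coupling_inverse(y_out)
```

Replacing the stand-in functions with convolutional subnetworks changes how the coefficients are computed but not the structure of the forward and inverse passes.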

Exemplarily, when any affine coupling operation is the m-th affine coupling operation, for example the second affine coupling operation, the two subnetworks s(·) and b(·) perform two-dimensional convolution on the output y′_b of the first affine coupling operation to obtain the second scaling coefficient s2 and the second translation coefficient b2; an exponential operation exp with the natural base e is applied to s2 to obtain s(y′_b); and s(y′_b) and b2 are used to linearly combine y′_b with y_a, the input of the two-dimensional convolution processing in the first affine coupling operation, to obtain the output of the second affine coupling operation. Similarly, any further affine coupling operations follow the description of the second affine coupling operation.

In this embodiment, existing normalized flows usually use a fully connected network in the reversible transformation process, so the features need to be flattened from two dimensions to one dimension, which destroys the spatial position relationship of the input feature map to a certain extent. In the embodiment of the present disclosure, two-dimensional convolution layers are used in the two subnetworks s(·) and b(·) to perform two-dimensional convolution, which helps retain the spatial position information of the features of the original image. In addition, because the embodiment of the present disclosure adopts two-dimensional convolution layers in the two-dimensional normalized flow model, the model supports tensors as input and can estimate the probability density of a tensor end-to-end.

S303. Obtain an abnormal heat map of the original image according to the second feature map.

Exemplarily, in the case where the first feature map is a single-level feature map, corresponding to the embodiment shown in FIG. 5, the sum of squares of the probability density estimates is calculated in the channel dimension for every position of the 64*64 second feature map, and the mean of the sum of squares at each position gives a 1*64*64 abnormal heat map. The abnormal heat map is then scaled to obtain an abnormal heat map of the target scale, such as a 1*256*256 abnormal heat map, where the scaling can be upsampling, for example linear interpolation. The features in the 1*256*256 abnormal heat map represent the abnormality scores of the corresponding positions in the original image.

Please continue to refer to FIG. 6. FIG. 6 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 3, when the first feature map includes multi-level feature maps, corresponding to the embodiment shown in FIG. 7, S303 can be implemented through S602 to S603, which will continue to be described in conjunction with the steps shown in FIG. 6.
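The heat map computation for a single second feature map can be sketched as below. The sketch assumes a (C, H, W) nested-list layout and uses nearest-neighbour repetition for the scaling step purely to keep the code short; the text above names linear interpolation as one option, so `upsample_nearest` is a swapped-in simplification, and both function names are hypothetical.

```python
def channel_mean_of_squares(z):
    """(C, H, W) -> (H, W): per-position mean over channels of the squared values."""
    C, H, W = len(z), len(z[0]), len(z[0][0])
    return [[sum(z[c][i][j] ** 2 for c in range(C)) / C for j in range(W)]
            for i in range(H)]


def upsample_nearest(m, factor):
    """Scale an (H, W) map to (H * factor, W * factor) by repeating each value."""
    return [[m[i // factor][j // factor] for j in range(len(m[0]) * factor)]
            for i in range(len(m) * factor)]


# A 2-channel, 2x2 stand-in for the second feature map.
z = [[[0.0, 1.0], [2.0, 0.0]],
     [[0.0, 1.0], [0.0, 2.0]]]
heat = channel_mean_of_squares(z)      # 1*2*2 abnormal heat map
heat_full = upsample_nearest(heat, 2)  # scaled to the target scale (here 4*4)
```

Dividing by the channel count makes scores comparable between feature maps with different numbers of channels.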

S602. For each second feature map in the plurality of second feature maps, in the channel dimension, calculate the sum of the squares of the probability density estimates corresponding to the positions in each second feature map;

S603. Obtain an abnormal heat map according to the sum of squares.

Please refer to FIG. 15 , which is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 6 , S603 can be implemented through S1501 to S1502 , which will be described in conjunction with the steps shown in FIG. 15 .

S1501. Obtain the feature map to be normalized corresponding to each second feature map based on the mean value of the sum of squares;

S1502. Perform scale normalization on the feature map to be normalized, and fuse the scale-normalized feature maps to obtain an abnormal heat map.

For example, for every position of the 64*64 second feature map, the sum of squares of the probability density estimates is calculated, and the mean of the sum of squares at each position gives a 1*64*64 abnormal heat map; for every position of the 32*32 second feature map, the sum of squares of the probability density estimates is calculated, and the mean of the sum of squares at each position gives a 1*32*32 abnormal heat map; for every position of the 16*16 second feature map, the sum of squares of the probability density estimates is calculated, and the mean of the sum of squares at each position gives a 1*16*16 abnormal heat map.

Among them, the 1*64*64, 1*32*32 and 1*16*16 abnormal heat maps are the feature maps to be normalized. The feature maps to be normalized are scaled to the target scale to obtain the scale-normalized feature maps, which are then fused to obtain the final abnormal heat map. The scaling can be upsampling, specifically linear interpolation, and the target scale can be the scale of the original image (256*256) or the scale of one of the feature maps to be normalized, which is not limited here.

In this embodiment, the multi-scale second feature map output by each two-dimensional normalized flow model is scaled and fused, which is conducive to fully utilizing multi-level features to estimate the probability density.
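The scale normalization and fusion of S1501 to S1502 can be sketched as follows. Two points are assumptions rather than statements of the disclosure: the fusion is taken to be a position-wise average, since the text does not specify the fusion operator, and nearest-neighbour repetition replaces linear interpolation for brevity. The function names are hypothetical, and the 4*4, 2*2 and 1*1 maps are small stand-ins for the 1*64*64, 1*32*32 and 1*16*16 maps.

```python
def upsample_nearest(m, size):
    """Resize a square (H, H) map to (size, size) by nearest-neighbour repetition."""
    factor = size // len(m)
    return [[m[i // factor][j // factor] for j in range(size)] for i in range(size)]


def fuse_mean(maps, size):
    """Scale every per-level map to (size, size), then average position by position."""
    scaled = [upsample_nearest(m, size) for m in maps]
    return [[sum(s[i][j] for s in scaled) / len(scaled) for j in range(size)]
            for i in range(size)]


level_a = [[3.0] * 4 for _ in range(4)]   # stand-in for the 1*64*64 map
level_b = [[0.0, 3.0], [6.0, 9.0]]        # stand-in for the 1*32*32 map
level_c = [[6.0]]                         # stand-in for the 1*16*16 map
fused = fuse_mean([level_a, level_b, level_c], size=4)
```

Other fusion choices, such as a position-wise sum or product, fit the same structure by changing only the reduction inside `fuse_mean`.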

Please refer to FIG. 16. FIG. 16 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 8, in the case where the first feature map includes multi-level feature maps, corresponding to the embodiment illustrated in FIG. 9 or FIG. 11, S303 may be implemented through S1601 to S1603, which will be described in conjunction with the steps shown in FIG. 16.

S1601. In the channel dimension, calculate the sum of the squares of the probability density estimates corresponding to the positions in the second feature map;

S1602. Obtain the feature map to be scaled based on the mean value of the sum of squares;

S1603. Scale the to-be-scaled feature map to obtain the abnormal heat map.

As shown in FIG. 9 or FIG. 11, after at least two reversible transformation processes of the two-dimensional normalized flow model, a second feature map of (2048+1024+512)*16*16 is obtained. For every position of the 16*16 second feature map, the sum of squares of the probability density estimates is calculated, and the mean of the sum of squares at each position gives a 1*16*16 abnormal heat map (that is, the feature map to be scaled). The feature map to be scaled is scaled to obtain an abnormal heat map of the target scale, such as a 1*256*256 abnormal heat map, and the features in the 1*256*256 abnormal heat map represent the abnormality scores of the corresponding positions in the original image. The scaling may be upsampling, for example linear interpolation.

It should be understood that the operation of calculating the mean of the sum of squares in the embodiment of the present disclosure is beneficial to eliminate the influence of the number of channels on the calculation of the first probability density estimate.

S304. Obtain an abnormality detection result of the original image according to the abnormality heat map.

Please refer to FIG. 17 . FIG. 17 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 3 , S304 can be implemented through S1701 to S1702 , which will be described in conjunction with the steps shown in FIG. 17 .

S1701. Obtain the difference between the abnormality score and a preset value, the preset value being determined based on the distribution parameters of normal images; that is, compare the abnormality score in the abnormal heat map with the preset value to obtain the difference between the two;

S1702. Determine the abnormal region in the original image according to the difference between the abnormality score and the preset value, and obtain an abnormality detection result.

Among them, the preset value is determined based on the distribution parameters of normal images. For example, the preset value can be 0, the center of the normal distribution of normal images. By calculating the difference between the abnormality score and 0, an area of the original image whose abnormality score is close to 0 is determined as a normal area, and an area whose abnormality score is much greater than 0 or much less than 0 is determined as an abnormal area.

It can be seen that the embodiment of the present disclosure performs feature extraction on the original image to obtain the first feature map; maps the first feature map to the second feature map; obtains the abnormal heat map of the original image according to the second feature map; and obtains the anomaly detection result of the original image according to the abnormal heat map. The first feature map helps capture the relationship between the local and global content of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the abnormal heat map obtained by post-processing the second feature map, which builds on the local-global relationship and the feature space information of the original image, can represent the abnormal area in the original image more accurately, which helps improve the accuracy of image anomaly detection.
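The comparison in S1701 to S1702 can be sketched as a simple thresholding step. The disclosure does not state how far from the preset value a score must be to count as abnormal, so the tolerance `tol` below is a hypothetical parameter, as are the function name and the example scores.

```python
def abnormal_mask(heat, preset=0.0, tol=1.0):
    """Mark positions whose score differs from the preset value by more than tol.

    `tol` is an assumed tolerance; the disclosure only distinguishes scores
    close to the preset value from scores far above or below it.
    """
    return [[abs(score - preset) > tol for score in row] for row in heat]


heat = [[0.1, 2.5],
        [-3.0, 0.9]]
mask = abnormal_mask(heat)
has_anomaly = any(any(row) for row in mask)  # image-level decision
```

In practice the tolerance would be chosen from scores observed on normal images, for example on a validation set.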

Please refer to FIG. 18. FIG. 18 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure, which can also be implemented based on the application environment shown in FIG. 1, as shown in FIG. 18, including S1801 to S1804:

S1801. Perform feature extraction on the original image to obtain a first feature map;

S1802. Map the first feature map to the second feature map through at least two reversible transformation processes;

Among them, the size of the two-dimensional convolution kernel used in a first reversible transformation process of the at least two reversible transformation processes is different from the size of the two-dimensional convolution kernel used in a second reversible transformation process, and the first reversible transformation process is the reversible transformation process immediately preceding the second reversible transformation process.

S1803. Obtain an abnormal heat map of the original image according to the second feature map;

S1804. Obtain the abnormality detection result of the original image according to the abnormal heat map.

Wherein, specific implementation manners of S1801 to S1804 have been described in the embodiment shown in FIG. 3 .

Please refer to FIG. 19. FIG. 19 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. As shown in FIG. 19, before performing feature extraction on the original image to obtain the first feature map, the method also includes steps S1901 to S1904, which will be described in conjunction with the steps shown in FIG. 19.

S1901. Perform feature extraction on the positive sample image to obtain a sixth feature map;

S1902. Map the sixth feature map to a seventh feature map through a neural network, and the features in the seventh feature map represent probability density estimates of features at corresponding positions in the positive sample image;

S1903. Determine the target loss according to the probability density estimation in the seventh feature map;

S1904. Adjust the parameters of the neural network based on the target loss to obtain a neural network model.

Among them, the positive sample image can be an image in an anomaly detection or localization data set such as MVTec AD, BTAD (BeanTech) or CIFAR-10, and the feature extractor shown in FIG. 2 can be used to perform feature extraction on the positive sample image. It should be understood that the feature extractor can be a residual network or a vision transformer, and the sixth feature map can be a single-level feature map or a multi-level feature map.

Wherein, the neural network refers to a two-dimensional normalized flow including at least two reversible transformation blocks f. The sixth feature map is input into the two-dimensional normalized flow and undergoes at least two reversible transformation processes to obtain the seventh feature map; that is, the at least two reversible transformation blocks f map the feature x in the sixth feature map X2 to a normal distribution Q with a mean of 0 and a variance of 1 to obtain the probability density estimate q of the features of the positive sample image. The purpose is to make the model learn the distribution of positive sample images (i.e. normal images).

Please refer to FIG. 20. FIG. 20 is a schematic flowchart of another anomaly detection method provided by an embodiment of the present disclosure. On the basis of FIG. 19, as shown in FIG. 20, S1903 can be implemented through S2001 to S2002, which will be described in conjunction with the steps shown in FIG. 20.

S2001. Determine the log likelihood estimation of the features in the sixth feature map according to the probability density estimation in the seventh feature map;

S2002. Take the negative log-likelihood estimate of the features in the sixth feature map as the target loss.

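It should be understood that if the feature x in the sixth feature map X2 follows the distribution p_X2, and the probability density estimate q in the seventh feature map follows the known distribution p_Q, mapping x to q is expressed as formula (6):

$$f(x) = q \tag{6}$$

According to the change-of-variables rule, there is formula (7):

$$p_{X2}(x) = p_Q(q)\left|\det\frac{\partial q}{\partial x}\right| \tag{7}$$

Taking the logarithm of both sides gives formula (8):

$$\log p_{X2}(x) = \log p_Q(q) + \log\left|\det\frac{\partial q}{\partial x}\right| \tag{8}$$

where x ∼ p_X2(x), q ∼ p_Q(q), $\det\frac{\partial q}{\partial x}$ represents the Jacobian determinant of the two-dimensional normalized flow when x is mapped to q, log p_X2(x) represents the log-likelihood estimate of x, and log p_Q(q) represents the log-likelihood estimate of q. Maximizing the log-likelihood of x is equivalent to minimizing its negative, so the negative log-likelihood of x is used as the target loss L(x). Because Q is a normal distribution with a mean of 0 and a variance of 1, log p_Q(q) equals $-\frac{1}{2}\|q\|_2^2$ plus a constant, and the target loss is expressed as formula (9):

$$L(x) = -\log p_{X2}(x) = \frac{1}{2}\|q\|_2^2 - \log\left|\det\frac{\partial q}{\partial x}\right| + \mathrm{const} \tag{9}$$

where $\|q\|_2^2$ represents the square of the L2 norm of q, and const is the normalization term of the normal distribution, which does not depend on the parameter θ. The parameter θ of the two-dimensional normalized flow is adjusted according to L(x), and the positive sample images are iterated over continuously to minimize the negative log-likelihood estimate of x, which yields the trained neural network model, that is, the two-dimensional normalized flow model.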

Among them, for the mapping of the sixth feature map X2 by the two-dimensional normalized flow, that is, the process of performing reversible transformation processing on the sixth feature map X2, reference can be made to the description of the reversible transformation processing of the first feature map in the embodiment shown in FIG. 3.

Referring to FIG. 21, FIG. 21 is a diagram of defect prediction results obtained by detecting defects in images with the anomaly detection method provided by the embodiment of the present disclosure. As shown in FIG. 21, FIG. 21 shows defect prediction results for images of four items, where each item includes three original images. Using the anomaly detection method provided by the embodiment of the present disclosure, the original image 2101 is first input and the first feature map is obtained through feature extraction; the first feature map is then mapped to the second feature map; the abnormal heat map of the original image 2101 is obtained according to the second feature map; finally, the defect prediction result is obtained according to the abnormal heat map, that is, the defect prediction result diagram 2102 corresponding to the original image 2101 is obtained. The defect prediction result diagram 2102 shows the abnormal region localization result corresponding to the defective part of the original image 2101.

It can be seen that the embodiment of the present disclosure performs feature extraction on the original image to obtain the first feature map; maps the first feature map to the second feature map; obtains the abnormal heat map of the original image according to the second feature map; and obtains the anomaly detection result of the original image according to the abnormal heat map. The first feature map helps capture the relationship between the local and global content of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the abnormal heat map obtained by post-processing the second feature map, which builds on the local-global relationship and the feature space information of the original image, can represent the abnormal area in the original image more accurately, which helps improve the accuracy of image anomaly detection.
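The negative log-likelihood target loss described above can be checked numerically with a small sketch. It assumes, as stated for the distribution Q, a normal distribution with a mean of 0 and a variance of 1, so the loss is the half squared L2 norm of q, plus the normalization constant, minus the log-determinant of the Jacobian. The function `nll_loss` and the one-dimensional linear flow q = a·x used as a check are illustrative assumptions, not part of the disclosure.

```python
import math


def nll_loss(q, log_det):
    """Negative log-likelihood of x under a flow that maps x to q ~ N(0, I).

    q is a flat list of mapped features; log_det is log|det(dq/dx)|.
    """
    d = len(q)
    half_sq_norm = 0.5 * sum(v * v for v in q)
    const = 0.5 * d * math.log(2.0 * math.pi)  # normalization term of N(0, I)
    return half_sq_norm + const - log_det


# Check with a 1-D linear flow q = a * x, for which x ~ N(0, 1 / a**2) exactly.
a, x = 2.0, 0.5
loss = nll_loss([a * x], math.log(a))

# Closed-form negative log-density of N(0, sigma^2) at x, with sigma^2 = 1 / a**2.
sigma2 = 1.0 / a ** 2
closed_form = 0.5 * math.log(2.0 * math.pi * sigma2) + x ** 2 / (2.0 * sigma2)
```

The two values agree, which illustrates that minimizing this loss over the flow parameters is the same as maximizing the likelihood of the normal (positive sample) features.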

Based on the description of the above method embodiments, the embodiment of the present disclosure also provides an anomaly detection device 2200. Please refer to FIG. 22, which is a schematic structural diagram of the anomaly detection device 2200 provided by the embodiment of the present disclosure. As shown in FIG. 22, the device includes a feature extraction part 2201 and a processing part 2202; wherein the feature extraction part 2201 is configured to perform feature extraction on the original image to obtain a first feature map; the feature extraction part 2201 is also configured to map the first feature map to the second feature map; the processing part 2202 is configured to obtain the abnormal heat map of the original image according to the second feature map; the processing part 2202 is also configured to obtain the anomaly detection result of the original image according to the abnormal heat map.

It can be seen that, in the device shown in FIG. 22, the first feature map is obtained by performing feature extraction on the original image; the first feature map is mapped to the second feature map; the abnormal heat map of the original image is obtained according to the second feature map; and the anomaly detection result of the original image is obtained according to the abnormal heat map. The first feature map helps capture the relationship between the local and global content of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the abnormal heat map obtained by post-processing the second feature map, which builds on the local-global relationship and the feature space information of the original image, can represent the abnormal area in the original image more accurately, which helps improve the accuracy of image anomaly detection.

In some implementations, in terms of mapping the first feature map to the second feature map, the feature extraction part 2201 is further configured to: map the first feature map to the second feature map through at least two reversible transformation processes; wherein the size of the two-dimensional convolution kernel used in a first reversible transformation process of the at least two reversible transformation processes is different from the size of the two-dimensional convolution kernel used in a second reversible transformation process, and the first reversible transformation process is the reversible transformation process immediately preceding the second reversible transformation process.

In some implementations, when the first feature map includes multi-level feature maps, in terms of mapping the first feature map to the second feature map, the feature extraction part 2201 is further configured to: for multi-level feature maps Scale normalization is performed to obtain a plurality of first feature maps to be stitched, and the plurality of first feature maps to be stitched are in one-to-one correspondence with multi-level feature maps; the plurality of first feature maps to be stitched is stitched into a third feature map; For the third feature map, the second feature map is obtained through at least two reversible transformation processes.

In some implementations, when the first feature map includes multi-level feature maps, in terms of mapping the first feature map to the second feature map, the feature extraction part 2201 is further configured to: for the multi-level feature maps, obtain multiple fourth feature maps through at least two reversible transformation processes, the multiple fourth feature maps being in one-to-one correspondence with the multi-level feature maps; perform scale normalization on the multiple fourth feature maps to obtain multiple second feature maps to be stitched, the multiple second feature maps to be stitched being in one-to-one correspondence with the multiple fourth feature maps; stitch the multiple second feature maps to be stitched into a fifth feature map; and, for the fifth feature map, obtain the second feature map through at least two reversible transformation processes.

In some implementations, the features in the second feature map represent the probability density estimates of the features at corresponding positions in the original image; in terms of obtaining the abnormal heat map of the original image according to the second feature map, the processing part 2202 is further configured to: In the channel dimension, the sum of the squares of the probability density estimates corresponding to the positions in the second feature map is calculated; based on the mean value of the sum of squares, the feature map to be scaled is obtained; the feature map to be scaled is scaled to obtain an abnormal heat map.

In some implementations, when the first feature map includes multi-level feature maps, in terms of mapping the first feature map to the second feature map, the feature extraction part 2201 is further configured to: for the multi-level feature map , through at least two reversible transformation processes, a plurality of second feature maps are obtained, and the plurality of second feature maps correspond to the multi-level feature maps one by one, and the features in the plurality of second feature maps represent the features of the corresponding positions in the original image Probability density estimation of ; in terms of obtaining the abnormal heat map of the original image according to the second feature map, the processing part 2202 is also configured to: for each second feature map in a plurality of second feature maps, in the channel dimension, for Calculate the sum of squares of the probability density estimates corresponding to the positions in each second feature map; and obtain an abnormal heat map according to the sum of squares.

In some implementations, in terms of obtaining the abnormal heat map according to the sum of squares, the processing part 2202 is further configured to: obtain, based on the mean value of the sum of squares, a feature map to be normalized corresponding to each second feature map; perform scale normalization on the feature maps to be normalized; and fuse the scale-normalized feature maps to obtain the abnormal heat map.
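The multi-level variant can be sketched in the same style. The example below assumes that "scale normalization" means resizing every per-level map to one common spatial size and that "fusion" is an element-wise average; both are assumptions made for illustration, and the helper names are hypothetical.

```python
def mean_of_squares(z):
    """Per-position mean over channels of squared values; z is [C][H][W]."""
    c, h, w = len(z), len(z[0]), len(z[0][0])
    return [[sum(z[k][i][j] ** 2 for k in range(c)) / c for j in range(w)]
            for i in range(h)]


def resize_nearest(m, out_h, out_w):
    """Resize a 2D map [H][W] to out_h x out_w by nearest-neighbour lookup."""
    h, w = len(m), len(m[0])
    return [[m[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse_levels(levels, out_h, out_w):
    """Bring each level to a common size, then average (assumed fusion rule)."""
    maps = [resize_nearest(mean_of_squares(z), out_h, out_w) for z in levels]
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(out_w)]
            for i in range(out_h)]


level_a = [[[1.0, 2.0], [0.0, 1.0]]]  # 1 channel, 2x2
level_b = [[[2.0]], [[0.0]]]          # 2 channels, 1x1
fused = fuse_levels([level_a, level_b], 2, 2)
```

Other fusion rules, such as an element-wise sum or product, would fit the same structure by changing only the last line of `fuse_levels`.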

In some implementations, in terms of performing any one of the at least two reversible transformation processes, the feature extraction part 2201 is further configured to: for a target feature map to be subjected to the reversible transformation process, split the target feature map into a first sub-target feature map and a second sub-target feature map, the first sub-target feature map and the second sub-target feature map having an equal number of channels; for the first sub-target feature map and the second sub-target feature map, obtain a first feature map to be connected through at least one affine coupling operation; and connect the first feature map to be connected with a second feature map to be connected to obtain the reversibly transformed feature map; wherein the second feature map to be connected of the first affine coupling operation among the at least one affine coupling operation is the first sub-target feature map or the second sub-target feature map, and the second feature map to be connected of a non-first affine coupling operation among the at least one affine coupling operation is the feature map obtained by the affine coupling operation immediately preceding that non-first affine coupling operation.

In some implementations, in terms of performing any one of the at least one affine coupling operation, the feature extraction part 2201 is further configured to: when the affine coupling operation is the first affine coupling operation, perform two-dimensional convolution processing on the first sub-target feature map or the second sub-target feature map to obtain a first scaling coefficient and a first translation coefficient, and, using the first scaling coefficient and the first translation coefficient, linearly combine the second sub-target feature map or the first sub-target feature map that has not been subjected to the two-dimensional convolution processing with the input of the two-dimensional convolution processing in the first affine coupling operation to obtain the output of the first affine coupling operation; when the affine coupling operation is a non-first affine coupling operation, perform two-dimensional convolution processing on the output of the affine coupling operation immediately preceding the non-first affine coupling operation to obtain a second scaling coefficient and a second translation coefficient, and, using the second scaling coefficient and the second translation coefficient, linearly combine the output of the preceding affine coupling operation with the input of the two-dimensional convolution processing in the preceding affine coupling operation to obtain the output of the non-first affine coupling operation.
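One reading of the split, coupling and connection steps can be sketched as follows. The sketch works on a flat list of channel values rather than a full feature map, replaces the two-dimensional convolution with a hypothetical stand-in function `subnet`, and parameterizes the scaling coefficient as an exponential so that it is always invertible; all three are assumptions made to keep the example short and are not taken from the disclosure.

```python
import math


def subnet(x):
    """Hypothetical stand-in for the 2D convolution: returns (log-scale, shift)."""
    log_s = [math.tanh(0.5 * v) for v in x]
    t = [0.1 * v + 1.0 for v in x]
    return log_s, t


def coupling_forward(x):
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]          # split into two equal halves
    # First affine coupling: coefficients computed from x1 transform x2.
    log_s1, t1 = subnet(x1)
    y2 = [b * math.exp(s) + t for b, s, t in zip(x2, log_s1, t1)]
    # Second affine coupling: coefficients computed from y2 transform x1.
    log_s2, t2 = subnet(y2)
    y1 = [a * math.exp(s) + t for a, s, t in zip(x1, log_s2, t2)]
    # Log-determinant of the Jacobian of the whole step.
    log_det = sum(log_s1) + sum(log_s2)
    return y1 + y2, log_det              # connect the two halves again


def coupling_inverse(y):
    half = len(y) // 2
    y1, y2 = y[:half], y[half:]
    log_s2, t2 = subnet(y2)
    x1 = [(a - t) * math.exp(-s) for a, s, t in zip(y1, log_s2, t2)]
    log_s1, t1 = subnet(x1)
    x2 = [(b - t) * math.exp(-s) for b, s, t in zip(y2, log_s1, t1)]
    return x1 + x2


x = [0.5, -1.0, 2.0, 0.25]
y, log_det = coupling_forward(x)
x_rec = coupling_inverse(y)
```

Because each half is only scaled and shifted by coefficients computed from the other half, the inverse can recompute the same coefficients and undo the step exactly, which is what makes the transformation reversible.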

In some implementations, when the first feature map is a single-level feature map, the second feature map is a feature map obtained by mapping the single-level feature map, and the features in the second feature map represent probability density estimates of the features at corresponding positions in the original image; in terms of obtaining the abnormal heat map of the original image according to the second feature map, the processing part 2202 is further configured to: in the channel dimension, calculate the sum of squares of the probability density estimates at corresponding positions in the second feature map; obtain a feature map to be scaled based on the mean value of the sum of squares; and scale the feature map to be scaled to obtain the abnormal heat map.

In some implementations, the features in the abnormal heat map represent the abnormality scores of corresponding positions in the original image. In terms of obtaining the abnormality detection result of the original image according to the abnormal heat map, the processing part 2202 is further configured to: obtain the difference between the abnormality score and a preset value, the preset value being determined based on distribution parameters of normal images; and determine the abnormal region in the original image according to the difference between the abnormality score and the preset value, to obtain the abnormality detection result.
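The comparison against the preset value can be sketched as below. The disclosure only states that the preset value is determined from distribution parameters of normal images; the specific rule used here (mean plus `k` standard deviations of scores observed on normal images) and the function names are assumptions for illustration.

```python
import statistics


def preset_value(normal_scores, k=3.0):
    """Assumed rule: mean + k * standard deviation of scores from normal images."""
    mu = statistics.fmean(normal_scores)
    sigma = statistics.pstdev(normal_scores)
    return mu + k * sigma


def abnormal_mask(heat_map, preset):
    """Mark positions whose score exceeds the preset value (positive difference)."""
    return [[(score - preset) > 0 for score in row] for row in heat_map]


normal_scores = [1.0, 2.0, 3.0, 2.0]
preset = preset_value(normal_scores, k=2.0)
heat = [[1.0, 2.5], [9.0, 3.0]]
mask = abnormal_mask(heat, preset)
```

Here only the position with score 9.0 lies above the preset value, so it is the only one marked as belonging to the abnormal region.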

In some embodiments, the feature extraction performed on the first feature map to obtain the second feature map is carried out by a neural network model, and the processing part 2202 is further configured to: perform feature extraction on a positive sample image to obtain a sixth feature map; map the sixth feature map to a seventh feature map through a neural network, the features in the seventh feature map representing probability density estimates of the features at corresponding positions in the positive sample image; determine a target loss according to the probability density estimates in the seventh feature map; and adjust the parameters of the neural network based on the target loss to obtain the neural network model.

In some implementations, in terms of determining the target loss according to the probability density estimates in the seventh feature map, the processing part 2202 is further configured to: determine a log-likelihood estimate of the features in the sixth feature map according to the probability density estimates in the seventh feature map; and use the negative log-likelihood estimate of the features in the sixth feature map as the target loss.
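For a reversible mapping, a common way to obtain such a log-likelihood is the change-of-variables formula, log p(x) = log p(z) + log|det J|. The sketch below assumes a standard normal distribution over the mapped features `z` and a log-determinant `log_det` supplied by the reversible transformations; both the prior and the function name are assumptions, since the passage above does not fix them.

```python
import math


def negative_log_likelihood(z, log_det):
    """NLL of one feature vector under an assumed standard normal prior on z."""
    d = len(z)
    # log N(z; 0, I) = -0.5 * ||z||^2 - 0.5 * d * log(2 * pi)
    log_pz = -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2.0 * math.pi)
    # Change of variables: log p(x) = log p(z) + log|det J|
    return -(log_pz + log_det)


loss_near = negative_log_likelihood([0.0, 0.0], 0.0)
loss_far = negative_log_likelihood([3.0, -3.0], 0.0)
```

Minimizing this loss on positive (normal) samples pushes their mapped features toward the high-density region of the assumed prior, so features far from it later receive low likelihood.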

According to an embodiment of the present disclosure, the parts of the abnormality detection device 2200 shown in the figure may be divided into multiple functionally smaller parts; this achieves the same operations without affecting the realization of the technical effects of the embodiments of the present disclosure. The above parts are divided based on logical functions. In practical applications, the function of one part may be realized by multiple parts, or the functions of multiple parts may be realized by one part. In other embodiments of the present disclosure, the anomaly detection device 2200 may also include other parts; in practical applications, these functions may also be implemented with the assistance of other parts, or through the cooperation of multiple parts.

According to another embodiment of the present disclosure, the anomaly detection device shown in FIG. 22 may be constructed, and the anomaly detection method of the embodiments of the present disclosure implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 3 or FIG. 18 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU, Central Processing Unit), a random access storage medium (RAM, Random Access Memory) and a read-only storage medium (ROM, Read-Only Memory). The computer program may be recorded in, for example, a computer-readable recording medium, loaded into the above-mentioned computing device through the computer-readable recording medium, and executed therein.

Based on the descriptions of the foregoing method embodiments and device embodiments, an embodiment of the present disclosure further provides an electronic device 2300. Referring to FIG. 23, the electronic device includes at least a processor 2310, an input device 2320, an output device 2330 and a computer storage medium 2340. The processor 2310, the input device 2320, the output device 2330 and the computer storage medium 2340 in the electronic device 2300 may be connected through a bus or by other means.

The computer storage medium 2340 may be stored in the memory of the electronic device 2300. The computer storage medium 2340 is used to store a computer program, the computer program includes program instructions, and the processor 2310 is used to execute the program instructions stored in the computer storage medium 2340. The processor 2310 is the computing core and control core of the electronic device; it is suitable for implementing one or more instructions, and is specifically suitable for loading and executing one or more instructions to realize the corresponding method procedures or functions.

In some embodiments, the processor 2310 of the electronic device 2300 provided by the embodiments of the present disclosure may be configured to perform the following anomaly detection process: performing feature extraction on the original image to obtain a first feature map; mapping the first feature map to a second feature map; obtaining an abnormal heat map of the original image according to the second feature map; and obtaining an abnormality detection result of the original image according to the abnormal heat map.

It can be seen that, in the electronic device 2300 shown in FIG. 23, the first feature map is obtained by performing feature extraction on the original image; the first feature map is mapped to the second feature map; the abnormal heat map of the original image is obtained according to the second feature map; and the abnormality detection result of the original image is obtained according to the abnormal heat map. The first feature map helps to capture the relationship between the local and the global content of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the abnormal heat map obtained by post-processing the second feature map, which builds on both the local-global relationship and the feature space information of the original image, can represent the abnormal region in the original image more accurately, which is conducive to improving the accuracy of image anomaly detection.

In some embodiments, the mapping of the first feature map to the second feature map performed by the processor 2310 includes: mapping the first feature map to the second feature map through at least two reversible conversion processes; wherein, among the at least two reversible conversion processes, the size of the two-dimensional convolution kernel used in a first reversible conversion process is different from the size of the two-dimensional convolution kernel used in a second reversible conversion process, and the first reversible conversion process is the reversible conversion process immediately preceding the second reversible conversion process.
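One simple way to satisfy the constraint that neighbouring reversible conversion processes use different kernel sizes is to alternate between two sizes. The sketch below does this; the sizes 3 and 1 and the function name `kernel_size_schedule` are illustrative assumptions, not values stated in this passage.

```python
def kernel_size_schedule(num_steps, first=3, second=1):
    """Alternate two distinct kernel sizes over consecutive conversion steps."""
    if num_steps < 2:
        raise ValueError("at least two reversible conversion processes are required")
    if first == second:
        raise ValueError("the two kernel sizes must differ")
    return [first if i % 2 == 0 else second for i in range(num_steps)]


schedule = kernel_size_schedule(4)
```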

In some embodiments, when the first feature map includes multi-level feature maps, the mapping of the first feature map to the second feature map performed by the processor 2310 includes: performing scale normalization on the multi-level feature maps to obtain a plurality of first feature maps to be spliced, the plurality of first feature maps to be spliced being in one-to-one correspondence with the multi-level feature maps; splicing the plurality of first feature maps to be spliced into a third feature map; and, for the third feature map, obtaining the second feature map through at least two reversible conversion processes.
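The scale normalization and splicing steps can be sketched as follows, assuming that scale normalization means resizing every level to one common spatial size and that splicing means concatenation along the channel dimension. The nearest-neighbour resize and the helper names are assumptions for illustration.

```python
def resize_nearest(fmap, out_h, out_w):
    """Resize a feature map [C][H][W] to [C][out_h][out_w] (nearest neighbour)."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[ch[i * h // out_h][j * w // out_w] for j in range(out_w)]
         for i in range(out_h)]
        for ch in fmap
    ]


def splice_levels(levels, out_h, out_w):
    """Normalize every level to one size, then concatenate along channels."""
    spliced = []
    for fmap in levels:
        spliced.extend(resize_nearest(fmap, out_h, out_w))
    return spliced


level1 = [[[1, 2], [3, 4]]]  # 1 channel, 2x2
level2 = [[[5]], [[6]]]      # 2 channels, 1x1
third = splice_levels([level1, level2], 2, 2)
```

The spliced result has as many channels as all levels combined (three here), which is the input the subsequent reversible conversion processes would operate on.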

In some embodiments, when the first feature map includes multi-level feature maps, the mapping of the first feature map to the second feature map performed by the processor 2310 includes: for the multi-level feature maps, obtaining a plurality of fourth feature maps through at least two reversible conversion processes, the plurality of fourth feature maps being in one-to-one correspondence with the multi-level feature maps; performing scale normalization on the plurality of fourth feature maps to obtain a plurality of second feature maps to be spliced, the plurality of second feature maps to be spliced being in one-to-one correspondence with the plurality of fourth feature maps; splicing the plurality of second feature maps to be spliced into a fifth feature map; and, for the fifth feature map, obtaining the second feature map through at least two reversible conversion processes.

In some embodiments, the features in the second feature map represent probability density estimates of the features at corresponding positions in the original image; the obtaining of the abnormal heat map of the original image according to the second feature map performed by the processor 2310 includes: in the channel dimension, calculating the sum of squares of the probability density estimates at corresponding positions in the second feature map; obtaining a feature map to be scaled based on the mean value of the sum of squares; and scaling the feature map to be scaled to obtain the abnormal heat map.

In some embodiments, when the first feature map includes multi-level feature maps, the mapping of the first feature map to the second feature map performed by the processor 2310 includes: for the multi-level feature maps, obtaining a plurality of second feature maps through at least two reversible conversion processes, the plurality of second feature maps being in one-to-one correspondence with the multi-level feature maps, and the features in the plurality of second feature maps representing probability density estimates of the features at corresponding positions in the original image; the obtaining of the abnormal heat map of the original image according to the second feature map performed by the processor 2310 includes: for each second feature map in the plurality of second feature maps, calculating, in the channel dimension, the sum of squares of the probability density estimates at corresponding positions in that second feature map; and obtaining the abnormal heat map according to the sum of squares.

In some embodiments, the obtaining of the abnormal heat map according to the sum of squares performed by the processor 2310 includes: obtaining, based on the mean value of the sum of squares, a feature map to be normalized corresponding to each second feature map; performing scale normalization on the feature maps to be normalized; and fusing the scale-normalized feature maps to obtain the abnormal heat map.

In some embodiments, the performing of any one of the at least two reversible transformation processes by the processor 2310 includes: for a target feature map to be subjected to the reversible transformation process, splitting the target feature map into a first sub-target feature map and a second sub-target feature map, the first sub-target feature map and the second sub-target feature map having an equal number of channels; for the first sub-target feature map and the second sub-target feature map, obtaining a first feature map to be connected through at least one affine coupling operation; and connecting the first feature map to be connected with a second feature map to be connected to obtain the reversibly transformed feature map; wherein the second feature map to be connected of the first affine coupling operation among the at least one affine coupling operation is the first sub-target feature map or the second sub-target feature map, and the second feature map to be connected of a non-first affine coupling operation among the at least one affine coupling operation is the feature map obtained by the affine coupling operation immediately preceding that non-first affine coupling operation.

In some embodiments, the performing of any one of the at least one affine coupling operation by the processor 2310 includes: when the affine coupling operation is the first affine coupling operation, performing two-dimensional convolution processing on the first sub-target feature map or the second sub-target feature map to obtain a first scaling coefficient and a first translation coefficient, and, using the first scaling coefficient and the first translation coefficient, linearly combining the second sub-target feature map or the first sub-target feature map that has not been subjected to the two-dimensional convolution processing with the input of the two-dimensional convolution processing in the first affine coupling operation to obtain the output of the first affine coupling operation; when the affine coupling operation is a non-first affine coupling operation, performing two-dimensional convolution processing on the output of the affine coupling operation immediately preceding the non-first affine coupling operation to obtain a second scaling coefficient and a second translation coefficient, and, using the second scaling coefficient and the second translation coefficient, linearly combining the output of the preceding affine coupling operation with the input of the two-dimensional convolution processing in the preceding affine coupling operation to obtain the output of the non-first affine coupling operation.

In some embodiments, when the first feature map is a single-level feature map, the second feature map is a feature map obtained by mapping the single-level feature map, and the features in the second feature map represent probability density estimates of the features at corresponding positions in the original image; the obtaining of the abnormal heat map of the original image according to the second feature map performed by the processor 2310 includes: in the channel dimension, calculating the sum of squares of the probability density estimates at corresponding positions in the second feature map; obtaining a feature map to be scaled based on the mean value of the sum of squares; and scaling the feature map to be scaled to obtain the abnormal heat map.

In some embodiments, the features in the abnormal heat map represent the abnormality scores of corresponding positions in the original image, and the obtaining of the abnormality detection result of the original image according to the abnormal heat map performed by the processor 2310 includes: obtaining the difference between the abnormality score and a preset value, the preset value being determined based on distribution parameters of normal images; and determining the abnormal region in the original image according to the difference between the abnormality score and the preset value, to obtain the abnormality detection result.

In some embodiments, the feature extraction performed on the first feature map to obtain the second feature map is carried out by a neural network model, and the training of the neural network model performed by the processor 2310 includes: performing feature extraction on a positive sample image to obtain a sixth feature map; mapping the sixth feature map to a seventh feature map through a neural network, the features in the seventh feature map representing probability density estimates of the features at corresponding positions in the positive sample image; determining a target loss according to the probability density estimates in the seventh feature map; and adjusting the parameters of the neural network based on the target loss to obtain the neural network model.

In some embodiments, the determining of the target loss according to the probability density estimates in the seventh feature map performed by the processor 2310 includes: determining a log-likelihood estimate of the features in the sixth feature map according to the probability density estimates in the seventh feature map; and using the negative log-likelihood estimate of the features in the sixth feature map as the target loss.

Exemplarily, the electronic device 2300 may include, but is not limited to, a processor 2310, an input device 2320, an output device 2330, a computer storage medium 2340, a memory 2350, a power supply 2360 and an application client part 2370. The input device 2320 may be a keyboard 2321, a touch screen 2322, a radio frequency receiver 2323, etc., and the output device 2330 may be a speaker 2331, a display 2332, a radio frequency transmitter 2333, etc. Those skilled in the art can understand that the schematic diagram is only an example of the electronic device 2300 and does not constitute a limitation on the electronic device 2300.

It should be noted that, since the processor 2310 of the electronic device 2300 implements the steps of the above-mentioned anomaly detection method when executing the computer program, all embodiments of the above-mentioned anomaly detection method are applicable to the electronic device 2300 and can achieve the same or similar beneficial effects.

An embodiment of the present disclosure also provides a computer storage medium (Memory). The computer storage medium may be a volatile storage medium or a non-volatile storage medium, and is a memory device in the electronic device 2300 for storing programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the terminal, and may of course also include an extended storage medium supported by the terminal. The computer storage medium provides storage space, and the storage space stores the operating system of the terminal. In addition, one or more instructions suitable for being loaded and executed by the processor 2310 are also stored in the storage space, and these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed random access memory (Random Access Memory, RAM) or a non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory; it may also be at least one computer-readable storage medium located away from the aforementioned processor 2310. In one embodiment, the processor 2310 can load and execute one or more instructions stored in the computer storage medium, so as to realize the corresponding steps of the above-mentioned anomaly detection method.

Exemplarily, the computer program on the computer storage medium includes computer program code, and the computer program code may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.

It should be noted that, since the computer program of the computer storage medium is executed by the processor to implement the steps in the above-mentioned anomaly detection method, all embodiments of the above-mentioned anomaly detection method are applicable to the computer storage medium, and can achieve the same or similar beneficial effects.

An embodiment of the present disclosure further provides a computer program product, wherein the above computer program product includes a computer program, and the above computer program is operable to cause a computer to execute the steps in the above anomaly detection method. The computer program product may be a software installation package.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present disclosure is not limited by the described sequence of actions, because, according to the present disclosure, certain steps may be performed in other orders or simultaneously. In addition, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

In the foregoing embodiments, the descriptions of each embodiment have their own emphases, and for parts not described in detail in a certain embodiment, reference may be made to relevant descriptions of other embodiments.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of the parts is only a division by logical function, and there may be other division methods in actual implementation; for example, multiple parts or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection of devices or parts may be in electrical or other forms.

The parts described as separate components may or may not be physically separated, and the components displayed as "parts" may or may not be physical parts; that is, they may be located in one place, or may be distributed over multiple network parts. Some or all of them can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional "parts" in the embodiments of the present disclosure may be integrated into one processing part, each part may exist separately physically, or two or more parts may be integrated into one part. The above-mentioned integrated parts can be implemented in the form of hardware, or in the form of software program modules.

If the integrated parts are implemented in the form of software program modules and sold or used as independent products, they may be stored in a computer-readable memory. Based on such an understanding, the essence of the technical solution of the present disclosure, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions to make a computer device (which may be a personal computer, a server, a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.

The embodiments of the present disclosure have been introduced in detail above, and the principles and implementations of the present disclosure have been explained herein using specific embodiments. The descriptions of the above embodiments are only intended to help understand the methods and core ideas of the present disclosure; at the same time, those skilled in the art may, based on the idea of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Industrial Applicability

In the embodiments of the present disclosure, the first feature map is obtained by performing feature extraction on the original image; the first feature map is mapped to the second feature map; the abnormal heat map of the original image is obtained according to the second feature map; and the abnormality detection result of the original image is obtained according to the abnormal heat map. The first feature map helps to capture the relationship between the local and the global content of the original image, and the second feature map obtained by mapping the first feature map retains the feature space information of the original image. Therefore, the abnormal heat map obtained by post-processing the second feature map, which builds on both the local-global relationship and the feature space information of the original image, can represent the abnormal distribution (or abnormal region) in the original image more accurately, which is conducive to improving the accuracy of image anomaly detection.

Claims

An anomaly detection method, the method being performed by an electronic device and comprising:

performing feature extraction on the original image to obtain a first feature map;

mapping the first feature map to a second feature map;

obtaining an abnormal heat map of the original image according to the second feature map;

obtaining an abnormality detection result of the original image according to the abnormal heat map.
The method according to claim 1, wherein said mapping the first feature map to a second feature map comprises:

mapping the first feature map to the second feature map through at least two reversible transformation processes;

wherein, among the at least two reversible conversion processes, the size of the two-dimensional convolution kernel used in a first reversible conversion process is different from the size of the two-dimensional convolution kernel used in a second reversible conversion process, and the first reversible conversion process is the reversible conversion process immediately preceding the second reversible conversion process.
The method according to claim 1 or 2, wherein, in the case where the first feature map includes a multi-level feature map, mapping the first feature map to a second feature map comprises:

performing scale normalization on the multi-level feature maps to obtain a plurality of first feature maps to be stitched, wherein the plurality of first feature maps to be stitched are in one-to-one correspondence with the multi-level feature maps;

splicing the plurality of first feature maps to be spliced into a third feature map;

for the third feature map, obtaining the second feature map through at least two reversible transformation processes.
The method according to claim 1 or 2, wherein, in the case where the first feature map includes a multi-level feature map, mapping the first feature map to a second feature map comprises:

for the multi-level feature map, obtaining a plurality of fourth feature maps through at least two reversible conversion processes, wherein the plurality of fourth feature maps are in one-to-one correspondence with the multi-level feature map;

performing scale normalization on the plurality of fourth feature maps to obtain a plurality of second feature maps to be stitched, wherein the plurality of second feature maps to be stitched correspond one-to-one to the plurality of fourth feature maps;

splicing the plurality of second feature maps to be spliced into a fifth feature map;

for the fifth feature map, obtaining the second feature map through at least two reversible transformation processes.
The method according to claim 3 or 4, wherein the features in the second feature map represent probability density estimates of features at corresponding positions in the original image;

the obtaining of the abnormal heat map of the original image according to the second feature map comprises:

in the channel dimension, calculating the sum of squares of the probability density estimates at corresponding positions in the second feature map;

obtaining a feature map to be scaled based on the mean value of the sum of squares;

scaling the feature map to be scaled to obtain the abnormal heat map.
The method according to claim 1 or 2, wherein, in the case where the first feature map includes a multi-level feature map, mapping the first feature map to a second feature map comprises:

for the multi-level feature map, obtaining a plurality of second feature maps through at least two reversible conversion processes, wherein the plurality of second feature maps are in one-to-one correspondence with the multi-level feature map, and the features in the plurality of second feature maps represent probability density estimates of the features at corresponding positions in the original image;

the obtaining of the abnormal heat map of the original image according to the second feature map comprises:

for each second feature map in the plurality of second feature maps, calculating, in the channel dimension, the sum of squares of the probability density estimates at corresponding positions in that second feature map;

obtaining the abnormal heat map according to the sum of squares.
The method according to claim 6, wherein obtaining the anomaly heat map according to the sum of squares comprises:

obtaining, based on the mean value of the sum of squares, a feature map to be normalized corresponding to each second feature map;

performing scale normalization on the feature maps to be normalized, and fusing the scale-normalized feature maps to obtain the anomaly heat map.
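The normalization and fusion steps above can be illustrated with a short sketch. The claim does not say how the maps are resized or fused, so two assumptions are made here and marked in the code: scale normalization is nearest-neighbour resizing to a common size, and fusion is an element-wise average. The function names are hypothetical.

```python
def resize_nearest(score_map, out_h, out_w):
    """Assumed scale normalization: nearest-neighbour resize of an [H][W] map."""
    h, w = len(score_map), len(score_map[0])
    return [[score_map[i * h // out_h][j * w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def fuse_levels(level_maps, out_h, out_w):
    """Resize each per-level map to (out_h, out_w), then average them.

    Averaging is an assumption; the claim only requires that the
    scale-normalized maps be fused.
    """
    resized = [resize_nearest(m, out_h, out_w) for m in level_maps]
    n = len(resized)
    return [[sum(r[i][j] for r in resized) / n for j in range(out_w)]
            for i in range(out_h)]


# Two feature maps to be normalized, from a coarse and a fine level.
coarse = [[4.0]]
fine = [[0.0, 2.0], [2.0, 0.0]]
heat_map = fuse_levels([coarse, fine], 2, 2)
```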
The method according to any one of claims 2 to 4 or 6, wherein any one of the at least two reversible transformation processes comprises:

splitting a target feature map to be reversibly transformed into a first sub-target feature map and a second sub-target feature map, wherein the first sub-target feature map and the second sub-target feature map have an equal number of channels;

obtaining a first feature map to be connected by applying at least one affine coupling operation to the first sub-target feature map and the second sub-target feature map;

connecting the first feature map to be connected with a second feature map to be connected to obtain a reversibly transformed feature map;

wherein the second feature map to be connected for the first affine coupling operation of the at least one affine coupling operation is the first sub-target feature map or the second sub-target feature map; and the second feature map to be connected for a non-first affine coupling operation of the at least one affine coupling operation is the feature map obtained by the affine coupling operation immediately preceding that non-first affine coupling operation.
The method according to claim 8, wherein any one of the at least one affine coupling operation comprises:

in the case that the affine coupling operation is the first affine coupling operation, performing two-dimensional convolution processing on the first sub-target feature map or the second sub-target feature map to obtain a first scaling coefficient and a first translation coefficient;

linearly combining, using the first scaling coefficient and the first translation coefficient, whichever of the second sub-target feature map or the first sub-target feature map has not been subjected to the two-dimensional convolution processing with the input of the two-dimensional convolution processing in the first affine coupling operation, to obtain the output of the first affine coupling operation;

in the case that the affine coupling operation is a non-first affine coupling operation, performing two-dimensional convolution processing on the output of the affine coupling operation immediately preceding the non-first affine coupling operation to obtain a second scaling coefficient and a second translation coefficient;

linearly combining, using the second scaling coefficient and the second translation coefficient, the output of the preceding affine coupling operation with the input of the two-dimensional convolution processing in the preceding affine coupling operation, to obtain the output of the non-first affine coupling operation.
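The chaining of affine coupling operations described in the two claims above can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the two-dimensional convolution is reduced to a 1x1 kernel, the convolution output is treated as a log-scale so that the scaling coefficient exp(log_s) is always positive, and all function and parameter names are hypothetical.

```python
import math


def split_channels(x):
    """Split a [C][H][W] map into two halves with equal channel counts (C even)."""
    half = len(x) // 2
    return x[:half], x[half:]


def conv1x1(x, weight, bias):
    """2D convolution with a 1x1 kernel (assumption): mixes channels per position."""
    h, w = len(x[0]), len(x[0][0])
    return [[[bias[o] + sum(weight[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)] for i in range(h)] for o in range(len(weight))]


def affine_coupling(conv_in, other, w_s, b_s, w_t, b_t):
    """conv_in drives the convolution; other is scaled and shifted elementwise."""
    log_s = conv1x1(conv_in, w_s, b_s)   # assumed log of the scaling coefficient
    t = conv1x1(conv_in, w_t, b_t)       # translation coefficient
    return [[[other[c][i][j] * math.exp(log_s[c][i][j]) + t[c][i][j]
              for j in range(len(other[0][0]))]
             for i in range(len(other[0]))]
            for c in range(len(other))]


def reversible_transform(x, params):
    """Apply len(params) coupling operations, then connect the two halves."""
    a, b = split_channels(x)
    for p in params:
        # The new output feeds the next convolution; the previous convolution
        # input becomes the map that the next operation scales and shifts.
        a, b = affine_coupling(a, b, *p), a
    return a + b  # concatenation along the channel dimension


# One parameter set: zero log-scale, translation t = 2 * conv_in + 1.
p = ([[0.0]], [0.0], [[2.0]], [1.0])
x = [[[1.0, 2.0]], [[3.0, 4.0]]]
y = reversible_transform(x, [p, p])
```

Because each operation only scales and shifts one half by values computed from the other half, every step can be undone by subtracting the translation and dividing by the scale, which is what makes the overall mapping reversible.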
The method according to claim 1 or 2, wherein, in the case where the first feature map is a single-level feature map, the second feature map is obtained by mapping the single-level feature map, and the features in the second feature map represent probability density estimates of the features at corresponding positions in the original image;

obtaining the anomaly heat map of the original image according to the second feature map comprises:

summing, along the channel dimension, the squares of the probability density estimates at each position in the second feature map;

obtaining a feature map to be scaled based on the mean value of the sum of squares;

scaling the feature map to be scaled to obtain the anomaly heat map.
The method according to any one of claims 1 to 10, wherein the features in the anomaly heat map represent anomaly scores of corresponding positions in the original image, and obtaining the anomaly detection result of the original image according to the anomaly heat map comprises:

acquiring the difference between the anomaly score and a preset value, wherein the preset value is determined based on distribution parameters of normal images;

determining an abnormal region in the original image according to the difference between the anomaly score and the preset value, to obtain the anomaly detection result.
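A small sketch of the comparison against a preset value is given below. The claim does not specify which distribution parameters are used, so the sketch assumes the preset value is the mean plus k standard deviations of scores collected from normal images, and that a position is abnormal when its score exceeds the preset value. The function names and the default k = 3 are hypothetical.

```python
import statistics


def preset_from_normal_scores(normal_scores, k=3.0):
    """Assumed rule: preset value = mean + k * std of scores on normal images."""
    mu = statistics.mean(normal_scores)
    sigma = statistics.pstdev(normal_scores)
    return mu + k * sigma


def abnormal_mask(heat_map, preset):
    """Mark a position as abnormal when (score - preset) is positive."""
    return [[(score - preset) > 0 for score in row] for row in heat_map]


normal_scores = [1.0, 1.0, 3.0, 3.0]      # scores observed on normal images
preset = preset_from_normal_scores(normal_scores)
heat_map = [[0.5, 6.0], [5.0, 7.5]]
mask = abnormal_mask(heat_map, preset)    # True marks the abnormal region
```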
The method according to any one of claims 1 to 11, wherein the mapping of the first feature map to the second feature map is performed through a neural network model, and the neural network model is obtained by training through the following steps:

performing feature extraction on a positive sample image to obtain a sixth feature map;

mapping the sixth feature map to a seventh feature map through a neural network, wherein the features in the seventh feature map represent probability density estimates of the features at corresponding positions in the positive sample image;

determining a target loss based on the probability density estimates in the seventh feature map;

adjusting parameters of the neural network based on the target loss to obtain the neural network model.
The method according to claim 12, wherein determining the target loss based on the probability density estimates in the seventh feature map comprises:

determining log-likelihood estimates of the features in the sixth feature map based on the probability density estimates in the seventh feature map;

using the negative of the log-likelihood estimates of the features in the sixth feature map as the target loss.
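The loss described above can be sketched for a single feature vector. The claim does not state the base distribution or how the log-likelihood is computed, so this sketch assumes the usual normalizing-flow setting: a standard normal base distribution on the mapped features, plus the log-determinant of the Jacobian of the mapping. The function name and arguments are hypothetical.

```python
import math


def negative_log_likelihood(z, log_det_jacobian):
    """Negative log-likelihood of an input feature under an assumed flow model.

    z: mapped features (a flat list of floats) for one input feature.
    log_det_jacobian: log |det J| of the mapping, supplied by the caller.
    Assumes a standard normal base distribution on z.
    """
    d = len(z)
    log_pz = -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2 * math.pi)
    return -(log_pz + log_det_jacobian)


loss_near_origin = negative_log_likelihood([0.0, 0.0], 0.0)
loss_far = negative_log_likelihood([3.0, 3.0], 0.0)
```

Under these assumptions, features mapped far from the origin receive a larger loss, so minimizing the loss on positive samples pulls their mapped features toward the high-density region of the base distribution.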
An anomaly detection apparatus, comprising a feature extraction part and a processing part, wherein:

the feature extraction part is configured to perform feature extraction on an original image to obtain a first feature map;

the feature extraction part is further configured to map the first feature map to a second feature map;

the processing part is configured to obtain an anomaly heat map of the original image according to the second feature map;

the processing part is further configured to obtain an anomaly detection result of the original image according to the anomaly heat map.
An electronic device, comprising an input device and an output device, and further comprising a processor and a computer storage medium, wherein:

the processor is configured to implement one or more instructions; and

the computer storage medium stores one or more instructions, and the one or more instructions are loaded by the processor to execute the method according to any one of claims 1 to 13.
A computer storage medium storing one or more instructions, wherein the one or more instructions are loaded by a processor to execute the method according to any one of claims 1 to 13.
A computer program product, comprising a non-transitory computer storage medium storing a computer program, wherein the computer program, when read and executed by a computer, implements the method according to any one of claims 1 to 13.