CN113205512B - Image anomaly detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113205512B
CN113205512B
Authority
CN
China
Prior art keywords
feature
image
region
processed
abnormality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110580349.7A
Other languages
Chinese (zh)
Other versions
CN113205512A (en)
Inventor
王慎执
沈宇军
吴立威
崔磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110580349.7A priority Critical patent/CN113205512B/en
Publication of CN113205512A publication Critical patent/CN113205512A/en
Application granted granted Critical
Publication of CN113205512B publication Critical patent/CN113205512B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping

Abstract

Disclosed are an image anomaly detection method, apparatus, device, and computer-readable storage medium, the method comprising: extracting features of a first region in the image to be processed to obtain first features; extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed; performing anomaly detection on the first feature and the second feature to obtain an anomaly score of the first region; and obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first area. Because the association between the local information and the global information of the image to be processed is considered, the abnormal condition existing in the image to be processed can be detected more comprehensively, and the abnormal detection capability and effect are improved; and determining the position of the abnormality in the image to be processed according to the abnormality score of the first area, thereby realizing abnormality positioning.

Description

Image anomaly detection method, device, equipment and computer readable storage medium
Technical Field
The present disclosure relates to computer vision technology, and in particular, to a method, apparatus, device, and computer readable storage medium for detecting image anomalies.
Background
Image anomaly detection and anomaly localization are of great significance in industrial production and medical inspection. Existing anomaly detection and localization methods focus only on mining the global information or the local information of an image in isolation, making it difficult to comprehensively detect anomalies such as dislocation, swapping, and missing parts in the image, and easily producing missed detections.
Disclosure of Invention
The embodiment of the disclosure provides an image anomaly detection scheme.
According to an aspect of the present disclosure, there is provided an image anomaly detection method including: extracting features of a first region in the image to be processed to obtain first features; extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed; performing anomaly detection on the first feature and the second feature to obtain an anomaly score of the first region; and obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first area.
In combination with any one of the embodiments provided in the present disclosure, the performing anomaly detection on the first feature and the second feature to obtain the first region anomaly score includes: performing consistency detection and/or distortion detection on the first feature and the second feature; obtaining an abnormality score of the first region according to a consistency detection result and/or a distortion detection result, wherein the consistency detection result is obtained by carrying out consistency detection on the first feature and the second feature; the distortion detection result is obtained by performing distortion detection on the first feature and the second feature.
In combination with any one of the embodiments provided in the present disclosure, the detecting the consistency of the first feature and the second feature includes: and determining a consistency score of the first feature and the second feature according to the vector distance between the first feature and the second feature.
In combination with any one of the embodiments provided in the present disclosure, the performing distortion detection on the first feature and the second feature includes: obtaining an input feature according to a second feature and a random feature, wherein the random feature is obtained with equal probability according to the first area or a generation area, and the generation area is obtained by randomly adding disturbance pixels in the first area; determining a probability that the input feature includes the first feature; and obtaining a confidence score of distortion in the first region according to the probability.
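The distortion-detection input construction described above can be sketched as follows. This is a hedged illustration only: the perturbation routine, feature dimensions, and the stand-in classifier are assumptions, and in the disclosure the probability would come from a trained distortion detection network rather than the fixed dot-product surrogate used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_generated_region(patch, num_pixels=20):
    """Create a 'generated region' by randomly adding disturbance pixels."""
    out = patch.copy()
    h, w = out.shape
    ys = rng.integers(0, h, num_pixels)
    xs = rng.integers(0, w, num_pixels)
    out[ys, xs] = rng.random(num_pixels)  # random disturbance pixel values
    return out

patch = rng.random((8, 8))          # first region
second_feature = rng.random(16)     # feature of the surrounding (second) region

# With equal probability take the real first region or the generated region.
use_real = rng.random() < 0.5
random_region = patch if use_real else make_generated_region(patch)
random_feature = random_region.flatten()[:16]  # stand-in for a learned feature

# Input to the (hypothetical) distortion-detection head: both features combined.
input_feature = np.concatenate([second_feature, random_feature])

# Stand-in classifier: probability that the input contains the genuine first
# feature. A trained network would output this; here a sigmoid of a dot product.
logit = float(input_feature[:16] @ input_feature[16:])
prob_genuine = 1.0 / (1.0 + np.exp(-logit))

# Higher distortion confidence when the head believes the patch is perturbed.
distortion_score = 1.0 - prob_genuine
```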
In combination with any one of the embodiments provided in the present disclosure, the obtaining the anomaly score according to the consistency detection result and/or the distortion detection result includes: and carrying out weighted summation on the consistency scores of the first feature and the second feature and the confidence score with distortion in the first region to obtain an anomaly score of the first region.
In combination with any one of the embodiments provided in the present disclosure, the image to be processed includes at least two first areas obtained by block sampling; obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first region, wherein the abnormality detection result comprises the following steps: obtaining an abnormality score for each of the at least two first regions; and obtaining an abnormality degree thermodynamic diagram of each pixel in the image to be processed according to the abnormality scores of the at least two first areas.
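The per-pixel abnormality-degree thermodynamic diagram (heat map) can be assembled from the per-block anomaly scores; a minimal sketch assuming non-overlapping first regions (the `scores_to_heatmap` helper and the sizes are illustrative, not from the patent):

```python
import numpy as np

def scores_to_heatmap(scores, patch_size):
    """Broadcast per-block anomaly scores to a per-pixel heat map.

    scores: 2-D array with one anomaly score per grid cell (first region).
    """
    heatmap = np.repeat(np.repeat(scores, patch_size, axis=0),
                        patch_size, axis=1)
    return heatmap

# Four first regions arranged in a 2x2 grid; each covers a 4x4 pixel block.
scores = np.array([[0.1, 0.2],
                   [0.9, 0.3]])
heatmap = scores_to_heatmap(scores, patch_size=4)
```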
In combination with any one of the embodiments provided in the present disclosure, the feature extraction of the first region in the image to be processed to obtain a first feature includes: extracting features of the image corresponding to the first region through a first feature extraction network to obtain the first features; the method further comprises the steps of: performing knowledge distillation on the trained deep neural network on the universal image set to obtain an intermediate feature extraction network; and carrying out knowledge distillation on the intermediate feature extraction network on the target category of the target data set corresponding to the target scene to obtain the first feature extraction network.
In combination with any one of the embodiments provided in the present disclosure, the feature extraction of the second region in the image to be processed to obtain a second feature includes: performing feature extraction on the image corresponding to the second region through a second feature extraction network to obtain the second feature; the method further comprises: co-training the second feature extraction network and the distortion detection network, the network loss during training comprising: a first loss indicating a difference between the first feature and the second feature; and a second loss indicating a difference between a classification result output by the distortion detection network and a true value, wherein the classification result indicates the probability that an input feature belongs to a first region sample or a generated region sample, the generated region sample is obtained by randomly adding disturbance pixels to the first region sample, the first feature of the first region sample or the first feature of the generated region sample is input to the distortion detection network with equal probability, and the true value indicates whether the input feature belongs to the first region sample or the generated region sample.
In combination with any one of the embodiments provided in the present disclosure, the feature extraction of the second region in the image to be processed to obtain a second feature includes: performing a convolution operation on the second region using a plurality of convolution layers; wherein each of the plurality of convolution layers corresponds to a mask; each convolution layer performs a convolution operation on an input mask operation result, the mask operation result being obtained by performing a mask operation on the input features of the convolution layer using the mask corresponding to that convolution layer, and the mask corresponding to the next convolution layer being updated according to the result of convolving the current mask with the current convolution layer; the input feature of the first of the plurality of convolution layers is the image corresponding to the second region, which is obtained by performing a mask operation on the image to be processed using an initial mask, in which the pixel value corresponding to the first region is a first pixel value and the pixel value corresponding to the second region is a second pixel value.
According to an aspect of the present disclosure, there is provided an image anomaly detection apparatus including: the first acquisition unit is used for extracting features of a first region in the image to be processed to obtain first features; the second acquisition unit is used for extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed; the third acquisition unit is used for carrying out anomaly detection on the first feature and the second feature to obtain an anomaly score of the first region; and the abnormality detection unit is used for obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first area.
In combination with any one of the embodiments provided in the present disclosure, the third obtaining unit is specifically configured to: performing consistency detection and/or distortion detection on the first feature and the second feature; obtaining an abnormality score of the first region according to a consistency detection result and/or a distortion detection result, wherein the consistency detection result is obtained by carrying out consistency detection on the first feature and the second feature; the distortion detection result is obtained by performing distortion detection on the first feature and the second feature.
In combination with any one of the embodiments provided in the present disclosure, the third obtaining unit is specifically configured to: and determining a consistency score of the first feature and the second feature according to the vector distance between the first feature and the second feature.
In combination with any one of the embodiments provided in the present disclosure, the third obtaining unit is specifically configured to: obtaining an input feature according to a second feature and a random feature, wherein the random feature is obtained with equal probability according to the first area or a generation area, and the generation area is obtained by randomly adding disturbance pixels in the first area; determining a probability that the input feature includes the first feature; and obtaining a confidence score of distortion in the first region according to the probability.
In combination with any one of the embodiments provided in the present disclosure, when the third obtaining unit is configured to obtain the abnormal score according to the consistency detection result and/or the distortion detection result, the third obtaining unit is specifically configured to: and carrying out weighted summation on the consistency scores of the first feature and the second feature and the confidence score with distortion in the first region to obtain an anomaly score of the first region.
In combination with any one of the embodiments provided in the present disclosure, the image to be processed includes at least two first areas obtained by block sampling; the abnormality detection unit is specifically configured to: obtaining an abnormality score for each of the at least two first regions; and obtaining an abnormality degree thermodynamic diagram of each pixel in the image to be processed according to the abnormality scores of the at least two first areas.
In combination with any one of the embodiments provided in the present disclosure, the first obtaining unit is specifically configured to: extracting features of the image corresponding to the first region through a first feature extraction network to obtain the first features; the apparatus further comprises a first training unit for: performing knowledge distillation on the trained deep neural network on the universal image set to obtain an intermediate feature extraction network; and carrying out knowledge distillation on the intermediate feature extraction network on the target category of the target data set corresponding to the target scene to obtain the first feature extraction network.
In combination with any one of the embodiments provided in the present disclosure, the second obtaining unit is specifically configured to: perform feature extraction on the image corresponding to the second region through a second feature extraction network to obtain the second feature; the apparatus further comprises a second training unit for: co-training the second feature extraction network and the distortion detection network, the network loss during training comprising: a first loss indicating a difference between the first feature and the second feature; and a second loss indicating a difference between a classification result output by the distortion detection network and a true value, wherein the classification result indicates the probability that an input feature belongs to a first region sample or a generated region sample, the generated region sample is obtained by randomly adding disturbance pixels to the first region sample, the first feature of the first region sample or the first feature of the generated region sample is input to the distortion detection network with equal probability, and the true value indicates whether the input feature belongs to the first region sample or the generated region sample.
In combination with any one of the embodiments provided in the present disclosure, the second obtaining unit is specifically configured to: perform a convolution operation on the second region using a plurality of convolution layers; wherein each of the plurality of convolution layers corresponds to a mask; each convolution layer performs a convolution operation on an input mask operation result, the mask operation result being obtained by performing a mask operation on the input features of the convolution layer using the mask corresponding to that convolution layer, and the mask corresponding to the next convolution layer being updated according to the result of convolving the current mask with the current convolution layer; the input feature of the first of the plurality of convolution layers is the image corresponding to the second region, which is obtained by performing a mask operation on the image to be processed using an initial mask, in which the pixel value corresponding to the first region is a first pixel value and the pixel value corresponding to the second region is a second pixel value.
According to an aspect of the present disclosure, there is provided an electronic device comprising a processor and a memory for storing computer instructions executable on the processor, the processor being configured to implement the image anomaly detection method according to any embodiment of the present disclosure when executing the computer instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image abnormality detection method according to any of the embodiments of the present disclosure.
According to an aspect of the present disclosure, there is provided a computer program product including a computer program which, when executed by a processor, implements the image anomaly detection method according to any one of the embodiments of the present disclosure.
In the embodiment of the disclosure, feature extraction is performed on a first region in an image to be processed to obtain a first feature; simultaneously extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed; then, carrying out anomaly detection on the first feature and the second feature to obtain an anomaly score of the first region; and obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first area. Because the embodiment of the invention considers the association between the local information and the global information of the image to be processed, the abnormal condition existing in the image to be processed can be detected more comprehensively, and the abnormality detection capability and effect are improved; and determining the position of the abnormality in the image to be processed according to the abnormality score of the first area, thereby realizing abnormality positioning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.
FIG. 1A is a flow chart of an image anomaly detection method shown in at least one embodiment of the present disclosure;
FIG. 1B is a schematic diagram of a first region and a second region in an image anomaly detection method according to at least one embodiment of the present disclosure;
FIG. 2A is a schematic diagram of an image anomaly detection method shown in at least one embodiment of the present disclosure;
FIG. 2B is a schematic diagram of another image anomaly detection method shown in at least one embodiment of the present disclosure;
FIG. 2C is a schematic diagram of yet another image anomaly detection method illustrated by at least one embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a training method for an image anomaly detection network in accordance with at least one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of detection results of an image anomaly detection method shown in at least one embodiment of the present disclosure;
FIG. 5 is a comparison of detection effects of an image anomaly detection method and a related method according to at least one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an image anomaly detection device shown in at least one embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device shown in at least one embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the above systems, and the like.
A computer system/server may be described in the general context of computer-system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
Fig. 1A is a flowchart of an image anomaly detection method illustrated in at least one embodiment of the present disclosure. As shown in fig. 1A, the method includes steps 101 to 104.
In step 101, feature extraction is performed on a first region in an image to be processed, so as to obtain a first feature.
Wherein the first region is a portion of the image to be processed. For example, the image to be processed may be sampled in a gridding manner, so as to obtain a series of image blocks, and each image block in the series of image blocks may be used as the first area in turn. Since the first feature is a feature of a partial image, the first feature may be referred to as a local feature.
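The gridded block sampling described above can be sketched as follows; the `grid_patches` helper and the 4-pixel patch size are illustrative assumptions, not fixed by the patent:

```python
import numpy as np

def grid_patches(image, patch_size):
    """Sample an image into non-overlapping grid blocks; each block can serve
    in turn as the first region."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches

image = np.arange(16 * 16).reshape(16, 16)
patches = grid_patches(image, patch_size=4)  # 4x4 grid -> 16 candidate regions
```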
In some implementations, a first region image (patch) corresponding to the first region may be cropped from the image to be processed; and extracting the characteristics of the first area image by using a characteristic extraction network to obtain the first characteristics. The feature extraction network herein may be referred to as a first feature extraction network, or a local feature extraction network, in order to distinguish from subsequently utilized feature extraction networks.
In step 102, feature extraction is performed on the second region in the image to be processed, so as to obtain a second feature.
Wherein the second region is disposed around the first region in the image to be processed. In the case where the first region is one of a plurality of image blocks, for example image block No. 6 in Fig. 1B, the second region disposed around it consists of image blocks Nos. 1, 2, 3, 5, 7, 9, 10, and 11, shown in the dashed box in Fig. 1B. Note that when the first region is an image block located at the edge of the image to be processed, image blocks with pixel values of 0 may be padded outside the edge so that a second region surrounding the first region can still be formed.
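The neighbourhood construction above, including the zero-padding for edge blocks, can be sketched as follows (the `surrounding_blocks` helper, the grid coordinates, and the block size are illustrative assumptions):

```python
import numpy as np

def surrounding_blocks(image, row, col, patch_size):
    """Return the 3x3 block neighbourhood around grid cell (row, col), with
    the centre (first region) zeroed out and out-of-image blocks zero-filled."""
    h, w = image.shape
    out = np.zeros((3 * patch_size, 3 * patch_size))
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the first region itself
            r, c = row + dr, col + dc
            if 0 <= r < h // patch_size and 0 <= c < w // patch_size:
                block = image[r * patch_size:(r + 1) * patch_size,
                              c * patch_size:(c + 1) * patch_size]
                out[(dr + 1) * patch_size:(dr + 2) * patch_size,
                    (dc + 1) * patch_size:(dc + 2) * patch_size] = block
    return out

image = np.ones((12, 12))
# First region in the top-left corner: neighbours above/left are zero-padded.
ctx = surrounding_blocks(image, row=0, col=0, patch_size=4)
```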
In some implementations, the second region is a region in the image to be processed that is outside the first region, i.e., the second region is a remainder of the image that corresponds to the outside of the first region. Taking the image block No. 6 in fig. 1B as the first area, the second area in this case is an area formed by all the image blocks except the image block No. 6 in the image.
Since the second region includes a larger range than the first region, the first feature of the first region may be referred to as a local feature, and the second feature of the second region may be referred to as a global feature.
In some implementations, after the first region is cropped from the image to be processed, a remaining portion of the image, that is, an image corresponding to the second region, may be obtained as the second region image; and extracting the characteristics of the second region image by using a characteristic extraction network to obtain the second characteristics. In order to distinguish from a feature extraction network that performs feature extraction on a first region image, the feature extraction network herein may be referred to as a second feature extraction network, or a global feature extraction network.
In step 103, abnormality detection is performed on the first feature and the second feature, so as to obtain an abnormality score of the first region.
In an embodiment of the disclosure, the first feature and the second feature may be input into a plurality of anomaly detection heads, which detect the first feature and the second feature from different angles, so as to obtain an anomaly score indicating the confidence that an anomaly exists in the first region.
In step 104, according to the anomaly score of the first region, an anomaly detection result of the image to be processed is obtained.
When the abnormality score of the first region is higher than a set threshold, it may be determined that an abnormality exists in the first region, so that the location at which an abnormality exists in the image to be processed can be determined.
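Combining the detection-head scores and thresholding them can be sketched as follows; the weights, threshold value, and function names are illustrative assumptions (the patent leaves them unspecified):

```python
def anomaly_score(consistency_score, distortion_score, w_c=0.5, w_d=0.5):
    """Weighted sum of the consistency score and the distortion-confidence
    score for one first region (weights are assumed, not from the patent)."""
    return w_c * consistency_score + w_d * distortion_score

def detect(scores, threshold=0.5):
    """Mark each first region as abnormal when its score exceeds the threshold."""
    return [s > threshold for s in scores]

scores = [anomaly_score(0.1, 0.1),   # normal region
          anomaly_score(0.9, 0.7),   # both heads flag this region
          anomaly_score(0.4, 0.4)]   # borderline, below threshold
flags = detect(scores, threshold=0.5)
```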
In the embodiment of the disclosure, feature extraction is performed on a first region in an image to be processed to obtain a first feature; features are simultaneously extracted from a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region; then, anomaly detection is performed on the first feature and the second feature to obtain an anomaly score of the first region; and an abnormality detection result of the image to be processed is obtained according to the abnormality score of the first region. Because the embodiment of the disclosure considers the association between the local information and the global information of the image to be processed, abnormal conditions existing in the image to be processed can be detected more comprehensively, improving the abnormality detection capability and effect; and the position of the abnormality in the image to be processed can be determined from the abnormality score of the first region, thereby realizing abnormality localization.
In some implementations, the first feature extraction network may be a lightweight network.
In some implementations, the first feature extraction network may be pre-trained using knowledge distillation.
First, knowledge distillation is performed on a deep neural network, such as ResNet-18, trained on a generic image set, such as ImageNet, to obtain an intermediate feature extraction network;
and then, carrying out knowledge distillation on the intermediate feature extraction network on the target category of the target data set corresponding to the target scene to obtain the first feature extraction network, so that the first feature extraction network can adapt to the target scene.
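The two distillation stages above both rely on a feature-matching objective; a minimal sketch, assuming a simple MSE feature-distillation loss (the patent does not fix the exact loss or feature dimensions):

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat):
    """Mean-squared error between student and teacher features -- a common
    feature-distillation objective, used here as a simplifying assumption."""
    diff = student_feat - teacher_feat
    return float(np.mean(diff * diff))

# Stage 1 would match a lightweight student to e.g. a ResNet-18 teacher on a
# generic image set; stage 2 repeats this on the target category only.
teacher = np.array([1.0, 2.0, 3.0])
student = np.array([1.0, 2.0, 2.0])
loss = distillation_loss(student, teacher)
```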
In some implementations, the second feature extraction network may be a deep neural network.
In order to prevent local features from interfering with global features, embodiments of the present disclosure propose performing feature extraction on the second region image using a second feature extraction network, in which the second region is convolved using a plurality of convolution layers included in the second feature extraction network.
Wherein each of the plurality of convolution layers corresponds to a mask; each convolution layer performs a convolution operation on an input mask operation result, which is obtained by performing a mask operation on the input features of that convolution layer using its corresponding mask; the mask corresponding to the next convolution layer is updated according to the result of convolving the current mask with the current convolution layer;
The input feature of the first convolution in the plurality of convolution layers is an image corresponding to the second area, the image corresponding to the second area is obtained by performing mask operation on the image to be processed through an initial mask, a pixel value corresponding to the first area in the initial mask is a first pixel value, and a pixel value corresponding to the second area is a second pixel value.
Specifically, the following operations are performed for any one of the convolution layers (the i-th convolution layer, where i is a positive integer and 1 ≤ i ≤ N, N being the number of convolution layers): performing a mask operation on the features input to the i-th convolution layer using the i-th mask to obtain a mask result; performing a convolution operation on the mask result using the i-th convolution layer, and performing an activation operation on the convolution result, for example using the activation function ReLU or Leaky ReLU (which gives a non-zero slope to negative values), to obtain the output of the i-th convolution layer; performing a convolution operation on the i-th mask using the i-th convolution layer, performing a binarization operation on the result, and using the binarized result as the (i+1)-th mask. The 1st mask, i.e., the initial mask, takes the first pixel value at pixels of the first region and the second pixel value at pixels of the second region; typically, the first pixel value is 0 and the second pixel value is 1.
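One position of such a masked convolution can be sketched in NumPy as follows. This is a simplified single-window illustration under stated assumptions: the scale factor and binarized mask update follow the description above (partial-convolution style), while the helper name, window shapes, and the max-based mask update are assumptions:

```python
import numpy as np

def masked_conv_step(X, M, W, b):
    """One masked-convolution window: convolve only unmasked inputs and
    rescale by the fraction of the window the mask keeps."""
    masked = X * M                    # Hadamard product with the binary mask
    kept = M.sum()
    if kept == 0:
        return 0.0, 0                 # fully masked window: output and mask 0
    scale = M.size / kept             # scale factor compensating masked entries
    out = float(np.sum(W * masked) * scale + b)
    new_mask = 1                      # window saw valid pixels -> updated mask 1
    return out, new_mask

X = np.ones((3, 3))                   # input features in a 3x3 window
M = np.array([[0, 1, 1],
              [0, 1, 1],
              [0, 1, 1]])             # left column (first region) masked out
W = np.full((3, 3), 1.0 / 9.0)        # averaging kernel
out, new_mask = masked_conv_step(X, M, W, b=0.0)
```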
The convolution operation performed at each location can be expressed by equation (1):

x' = W^T (X ⊙ M) · (sum(1) / sum(M)) + b, if sum(M) > 0; x' = 0, otherwise    (1)

where ⊙ denotes the Hadamard product, X denotes the input feature, M denotes the binary mask of the current layer, W denotes the convolution weights, and b denotes the bias. As can be seen from the formula, at each position the mask operation is performed on the input feature X to obtain the feature corresponding to that position in the second region, and the convolution operation is performed on this feature; sum(1)/sum(M) is a scale factor that adjusts for the number of values not masked out by the mask M. After the mask M is subjected to the same convolution operation, the updated mask is obtained through a binarization operation.
For each pooling layer, the feature map is updated with normal pooling, and the mask M is updated to be the result obtained by the same pooling operation and then binarization operation. By the method, the global features can be extracted through the second feature extraction network without being interfered by the local features.
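The masked convolution and mask update described above can be sketched as follows. This is a minimal NumPy illustration assuming single-channel features, stride 1, no padding, and an illustrative function name; the disclosure's network is not limited to this form:

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias):
    """One masked convolution layer as in Eq. (1) (stride 1, no padding).

    x:      (H, W) input feature map
    mask:   (H, W) binary mask (1 = valid second-region pixel, 0 = masked)
    weight: (k, k) convolution kernel
    bias:   scalar bias
    Returns (out, new_mask), where new_mask is the binarized result of
    applying the same convolution support to the mask.
    """
    k = weight.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1))
    new_mask = np.zeros_like(out)
    ones = float(k * k)  # sum(1): numerator of the scale factor
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            m = mask[i:i + k, j:j + k]
            s = m.sum()
            if s > 0:
                patch = x[i:i + k, j:j + k] * m          # Hadamard product X ⊙ M
                out[i, j] = (weight * patch).sum() * (ones / s) + bias
                new_mask[i, j] = 1.0                     # binarization: any valid input -> 1
            # else: output stays 0 and the position remains masked
    return out, new_mask
```

With an all-ones input, the scale factor sum(1)/sum(M) keeps the output unchanged even when some input pixels are masked, which is what lets the network extract second-region features without interference from the masked first region.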
In order to mine the correlation between the local features and the global features of the image to be processed, the present disclosure proposes an anomaly detection network: a consistency detection network. The consistency detection network is configured to determine the vector distance between the local feature and the global feature.
In one example, the vector distance between the local feature and the global feature may be calculated using the mean squared error (MSE) between the two, resulting in the consistency detection result. The MSE loss can be expressed by equation (2):

MSE(Z_l, Z_g) = (1/n) · Σ_{i=1}^{n} (Z_{l,i} - Z_{g,i})^2    (2)

where Z_l is the first feature, Z_g is the second feature, and n is the dimension of the first feature and the second feature.
In a normal image, the first feature and the second feature should be consistent and continuous, whereas in an image containing an anomaly the opposite is true. Thus, the consistency detection result, e.g. the MSE loss, may be used as a scoring function characterizing the presence of a global-local inconsistency in the first image, denoted S_IAD; this score may also be referred to as the consistency score.
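The consistency score of equation (2) can be sketched as follows, under the assumption that both features are plain vectors of the same dimension (the function name is illustrative):

```python
import numpy as np

def consistency_score(z_l, z_g):
    """S_IAD: mean squared error between the first (local) feature z_l
    and the second (global) feature z_g, as in Eq. (2)."""
    z_l = np.asarray(z_l, dtype=float)
    z_g = np.asarray(z_g, dtype=float)
    return float(((z_l - z_g) ** 2).mean())  # mean over the n feature dimensions
```

Identical features give a score of 0; the score grows as the two features diverge, i.e. as global-local consistency worsens.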
Fig. 2A is a schematic diagram of an image anomaly detection method according to at least one embodiment of the present disclosure. As shown in Fig. 2A, anomaly detection is performed on the image to be processed by an image anomaly detection network 210 to obtain an anomaly detection result. Specifically, the first region image 201 is input to a first feature extraction network 211 in the image anomaly detection network 210 to extract the first features of the first region image 201; at the same time, the second region image 202 is input to a second feature extraction network 212 to extract the second features of the second region image 202. The first feature and the second feature are then input together to the consistency detection network 213 to obtain a consistency detection result, from which an anomaly score 220 of the first region may be obtained. As described above, the consistency detection result characterizes the vector distance between the first feature and the second feature: the greater the distance, the worse the consistency between the two features, and thus the higher the anomaly score; conversely, the smaller the distance, the better the consistency, and the lower the anomaly score.
In some implementations, the image to be processed includes a plurality of first regions, as shown at 231, and an anomaly degree thermodynamic diagram 233 of the image to be processed may be obtained from the anomaly score of each first region. For example, inverse distance weighted (IDW) interpolation may be used to obtain an anomaly score for each pixel in the image to be processed, and each pixel is then displayed in the color corresponding to the anomaly-score interval to which it belongs, yielding an anomaly degree thermodynamic diagram of the whole image to be processed. From this diagram, the position of the anomaly in the image to be processed can be determined, realizing pixel-level localization of image anomalies.
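As a sketch, the per-pixel anomaly scores can be interpolated from the per-region scores by inverse distance weighting. The function name, region-center parameterization, and power/epsilon values below are illustrative assumptions, not the disclosure's exact scheme:

```python
import numpy as np

def idw_heatmap(centers, scores, shape, power=2.0, eps=1e-8):
    """Inverse distance weighted interpolation of region anomaly scores.

    centers: (N, 2) array of (row, col) centers of the first regions
    scores:  (N,) anomaly score of each first region
    shape:   (H, W) of the image to be processed
    Returns an (H, W) array of per-pixel anomaly scores.
    """
    H, W = shape
    rows, cols = np.mgrid[0:H, 0:W]
    heat = np.zeros((H, W))
    weights = np.zeros((H, W))
    for (r, c), s in zip(centers, scores):
        d = np.sqrt((rows - r) ** 2 + (cols - c) ** 2)
        w = 1.0 / (d ** power + eps)  # closer regions contribute more
        heat += w * s
        weights += w
    return heat / weights
```

Mapping each interpolated value to a color interval then produces the anomaly degree thermodynamic diagram.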
In the embodiment of the disclosure, the consistency of the global features and the local features is mined by utilizing the consistency detection network, so that the abnormality detection capability of the image abnormality detection network can be improved.
To further mine the correlation between the local and global features of an image, the present disclosure proposes another anomaly detection network: a distortion detection network. The distortion detection network may be a trainable classifier for detecting whether distortion exists in the image, for example whether warped patterns are present, so that it can detect small flaws in the first region image.
Fig. 2B is a schematic diagram of another image anomaly detection method shown in at least one embodiment of the present disclosure. This differs from the method shown in fig. 2A in that the first feature and the second feature are input to the distortion detection network 214 simultaneously to obtain a distortion detection result, and the abnormality score 220 of the first region is obtained from the distortion detection result.
In one example, the anomaly score 220 may be obtained by the following method.
First, an input feature is derived from the second feature and the random feature. Wherein the random feature is obtained with equal probability from the first region or a generated region obtained by randomly adding disturbance pixels in the first region.
Next, by inputting the input feature to the distortion detection network 214, a probability that the input feature contains the first feature of the first region may be obtained.
The distortion detection network 214 may include at least one fully connected layer, wherein a last fully connected layer outputs a probability that the input feature contains a first feature of the first region.
Finally, a confidence score that distortion exists in the first region can be obtained based on this probability. The lower the probability value, the greater the possibility that distortion exists in the first region image, and the higher the anomaly score; conversely, the higher the probability value, the less likely it is that distortion is present in the first region image, and the lower the anomaly score.
Specifically, the second feature Z_g and a random feature Z are input to the distortion detection network, which outputs the probability p that the random feature Z is the first feature Z_l rather than a feature of the generation region, p = C(Z, Z_g).
The confidence score for distortion in the first region obtained from this probability can be denoted S_DAD and expressed by equation (3):

s_DAD = 1 - C(Z_l, Z_g)    (3)

where C(Z_l, Z_g) represents the probability that the input features include the first feature Z_l.
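A minimal stand-in for the distortion detection network and equation (3) is sketched below. The single fully connected layer with a sigmoid output, and all function names, are illustrative assumptions; the disclosure only requires at least one fully connected layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distortion_detector(z, z_g, w, b):
    """One-layer stand-in for the distortion detection network.

    z:    candidate local feature (from the first region or the generated region)
    z_g:  global feature of the second region
    w, b: weights and bias of a single fully connected layer (hypothetical)
    Returns p = C(z, z_g), the probability that z is the true first-region feature.
    """
    x = np.concatenate([z, z_g])  # input feature built from z and z_g
    return sigmoid(float(w @ x + b))

def distortion_confidence(p):
    """Eq. (3): s_DAD = 1 - C(Z_l, Z_g)."""
    return 1.0 - p
```

With an untrained (all-zero) layer the classifier is maximally uncertain, p = 0.5, and the distortion confidence is likewise 0.5.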
In the embodiment of the disclosure, the abnormality detection capability of the image abnormality detection network can be improved by utilizing the distortion detection network to mine fine information distortion in the global features and the local features.
Fig. 2C is a schematic diagram of yet another image anomaly detection method illustrated by at least one embodiment of the present disclosure. The difference from the method shown in fig. 2A is that the first feature and the second feature are input to a consistency detection network 213 to obtain a consistency detection result, and the first feature and the second feature are input to a distortion detection network 214 to obtain a distortion detection result, and the consistency detection result and the distortion detection result are used to obtain the anomaly score 220 of the first area together.
In one example, the consistency score S_IAD and the distortion confidence score S_DAD may be combined by weighted summation to obtain the anomaly score s of the first region, as shown in equation (4):

s = λ_s · s_IAD + (1 - λ_s) · s_DAD    (4)

where λ_s is a weight coefficient that balances the consistency score against the confidence score that distortion exists in the first region; its value may be determined according to actual needs, for example λ_s = 0.8.
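Equation (4) reduces to a one-line combination; the default weight below follows the example value λ_s = 0.8 given in the text:

```python
def anomaly_score(s_iad, s_dad, lam_s=0.8):
    """Eq. (4): s = λ_s · s_IAD + (1 - λ_s) · s_DAD.

    lam_s (λ_s) balances the consistency score s_iad against the
    distortion confidence s_dad; 0.8 is the example value from the text.
    """
    return lam_s * s_iad + (1.0 - lam_s) * s_dad
```
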
In the embodiment of the disclosure, the correlation of the global feature and the local feature is commonly mined by utilizing the consistency detection network and the distortion detection network, so that the abnormal condition existing in the image to be processed can be more comprehensively detected, and the abnormality detection capability and effect are improved.
The embodiments of the present disclosure provide a training method for an unsupervised image anomaly detection network. In the training stage, only normal samples are needed. This avoids the problem of fully supervised or semi-supervised training methods, in which abnormal image samples are difficult to collect and all possible abnormal situations cannot be exhausted, so that the trained network is unsuitable for the vast majority of anomaly detection scenarios.
Fig. 3 is a schematic diagram of a training method of an image anomaly detection network according to at least one embodiment of the present disclosure, where the anomaly detection network 310 includes a first feature extraction network 311, a second feature extraction network 312, a consistency detection network 313, and a distortion detection network 314. In this training method, a fixed-size image block is randomly cropped from each training image (a normal image) and used as the image 301 corresponding to the first region, while the remaining image region is used as the image 302 corresponding to the second region. The image 301 corresponding to the first region is input to the first feature extraction network 311 to obtain the first feature, and the image 302 corresponding to the second region is input to the second feature extraction network 312 to obtain the second feature. The first and second features are input to the consistency detection network 313 and, at the same time, to the distortion detection network 314. A first loss l_IAD, indicating the difference between the first feature and the second feature, and a second loss l_DAD, indicating the difference between the classification result output by the distortion detection network and the true value, are combined into the total training loss, so that the second feature extraction network and the distortion detection network are trained jointly. The classification result output by the distortion detection network indicates whether distortion exists in the first region image 301, and the true value is the pre-labeled presence or absence of distortion in the first region image.
In one example, the second loss may be obtained by:
the generation region is obtained by adding disturbance pixels to the first region. The first feature of the first region and the first feature of the generation region are input to a distortion detection network with equal probability, that is, the first feature of the first region and the second feature of the second region are input to the distortion detection network at the same time, or the first feature of the generation region and the second feature of the second region are input to the distortion detection network at the same time, so that the distortion detection network judges that the input feature is from the first region or from the generation region.
During training, the classifier in the distortion detection network may be supervised, for example, by the cross-entropy loss shown in equation (5):

l_DAD = -(y · log(p) + (1 - y) · log(1 - p))    (5)

where y is the target output of the classifier, i.e. 0 for a feature belonging to the first region and 1 for a feature of the generation region, and p represents the probability that the input features contain the first feature.
The total training loss is shown in equation (6):

l = l_IAD + λ_t · l_DAD    (6)

where λ_t is a loss weight that balances the different losses.
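Equations (5) and (6) can be sketched directly; the default λ_t below is an assumption, since the text does not give its value:

```python
import math

def distortion_loss(y, p, eps=1e-12):
    """Eq. (5): binary cross-entropy for the distortion classifier.

    y: target (0 = first-region feature, 1 = generated-region feature)
    p: classifier output probability
    eps clips p away from 0 and 1 for numerical stability.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def total_loss(l_iad, l_dad, lam_t=1.0):
    """Eq. (6): l = l_IAD + λ_t · l_DAD (λ_t = 1.0 is an assumed default)."""
    return l_iad + lam_t * l_dad
```
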
Essentially, the consistency loss (the first loss) guides the global feature network to infer the local distribution, while the distortion loss (the second loss) guides the global feature extraction network to learn more subtle differences, improving its discriminative ability. Meanwhile, the purpose of training the distortion detection network is to find the subtle differences between a normal first region image and an abnormal region image.
It is noted that the parameters of the first feature extraction network may be fixed during training of the second feature extraction network and the distortion detection network.
The image anomaly detection method provided by the embodiments of the present disclosure can be used for flaw detection on industrial production lines, damaged-product detection, assisted medical image detection, and the like. Taking flaw detection on an industrial production line as an example, detecting production-line images with the image anomaly detection method makes it possible to determine whether an image is abnormal, i.e. whether a flaw exists on the production line, and further to determine the position of the flaw, improving quality-inspection accuracy and efficiency while saving labor.
Fig. 4 is a schematic diagram of detection results of an image anomaly detection method according to at least one embodiment of the present disclosure.
Wherein the first row shows an abnormal image sample, the second row shows the true value of the abnormality, and the third row shows an abnormality degree thermodynamic diagram obtained according to the image abnormality detection method proposed in the embodiment of the present disclosure. As can be seen from fig. 4, according to the embodiment of the present disclosure, an abnormality in an image can be detected well and abnormality location can be performed.
Fig. 5 shows a comparison of the detection results of an image anomaly detection method according to an embodiment of the present disclosure with those of a related method. The first column is a normal image, the second column is an abnormal image, the third column is the ground truth of the anomaly, the fourth column is the anomaly detection result of the related method, and the fifth column is the anomaly degree thermodynamic diagram obtained by the image anomaly detection method according to the embodiment of the present disclosure. As can be seen from Fig. 5, the present disclosure can detect and localize anomalies in an image more accurately than the related method.
Fig. 6 shows a schematic diagram of an image abnormality detection apparatus shown in an embodiment of the present disclosure. As shown in fig. 6, the apparatus includes: a first obtaining unit 601, configured to perform feature extraction on a first region in an image to be processed, so as to obtain a first feature; a second obtaining unit 602, configured to perform feature extraction on a second region in the image to be processed to obtain a second feature, where the second region is disposed around the first region in the image to be processed; a third obtaining unit 603, configured to perform anomaly detection on the first feature and the second feature, to obtain an anomaly score of the first area; and an anomaly detection unit 604, configured to obtain an anomaly detection result of the image to be processed according to the anomaly score of the first area.
In some implementations, the third obtaining unit is specifically configured to: performing consistency detection and/or distortion detection on the first feature and the second feature; obtaining an abnormality score of the first region according to a consistency detection result and/or a distortion detection result, wherein the consistency detection result is obtained by carrying out consistency detection on the first feature and the second feature; the distortion detection result is obtained by performing distortion detection on the first feature and the second feature.
In some implementations, the third obtaining unit is specifically configured to, when configured to perform consistency detection on the first feature and the second feature: and determining a consistency score of the first feature and the second feature according to the vector distance between the first feature and the second feature.
In some implementations, the third obtaining unit is specifically configured to, when configured to perform distortion detection on the first feature and the second feature: obtaining an input feature according to a second feature and a random feature, wherein the random feature is obtained with equal probability according to the first area or a generation area, and the generation area is obtained by randomly adding disturbance pixels in the first area; determining a probability that the input feature includes the first feature; and obtaining a confidence score of distortion in the first region according to the probability.
In some implementations, the third obtaining unit is specifically configured to, when configured to obtain the anomaly score according to the consistency detection result and/or the distortion detection result: and carrying out weighted summation on the consistency scores of the first feature and the second feature and the confidence score with distortion in the first region to obtain an anomaly score of the first region.
In some implementations, the image to be processed includes at least two first regions obtained by block sampling; the abnormality detection unit is specifically configured to: obtaining an abnormality score for each of the at least two first regions; and obtaining an abnormality degree thermodynamic diagram of each pixel in the image to be processed according to the abnormality scores of the at least two first areas.
In some implementations, the first obtaining unit is specifically configured to: extracting features of the image corresponding to the first region through a first feature extraction network to obtain the first features; the apparatus further comprises a first training unit for: performing knowledge distillation on the trained deep neural network on the universal image set to obtain an intermediate feature extraction network; and carrying out knowledge distillation on the intermediate feature extraction network on the target category of the target data set corresponding to the target scene to obtain the first feature extraction network.
In some implementations, the second obtaining unit is specifically configured to: perform feature extraction on the image corresponding to the second region through a second feature extraction network to obtain the second feature. The apparatus further comprises a second training unit configured to co-train the second feature extraction network and the distortion detection network, where the training loss comprises: a first loss indicating the difference between the first feature and the second feature; and a second loss indicating the difference between a classification result output by the distortion detection network and a true value, where the classification result indicates the probability that the input feature belongs to a first region sample or a generated region sample, the generated region sample is obtained by randomly adding disturbance pixels in the first region sample, the first feature of the first region sample and the first feature of the generated region sample are input to the distortion detection network with equal probability, and the true value indicates whether the input feature belongs to the first region sample or the generated region sample.
In some implementations, the second obtaining unit is specifically configured to: performing convolution operation on the second region by using a plurality of convolution layers; wherein each of the plurality of convolutional layers corresponds to a mask; each convolution layer carries out convolution operation on an input mask operation result, the mask operation result is obtained by carrying out mask operation on the input features of the convolution layers by using masks corresponding to the convolution layers, and the masks corresponding to the convolution layers update the convolution results of the masks according to the convolution layers; the input feature of the first convolution in the plurality of convolution layers is an image corresponding to the second area, the image corresponding to the second area is obtained by performing mask operation on the image to be processed through an initial mask, a pixel value corresponding to the first area in the initial mask is a first pixel value, and a pixel value corresponding to the second area is a second pixel value.
Fig. 7 shows an electronic device provided by at least one embodiment of the present disclosure. The device includes a processor and a memory for storing computer instructions executable on the processor, the processor being configured to implement the image anomaly detection method according to any implementation of the present disclosure when executing the computer instructions.
At least one embodiment of the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image anomaly detection method according to any one of the implementations of the present disclosure.
At least one embodiment of the present disclosure also provides a computer program product, including a computer program, which when executed by a processor implements the image anomaly detection method according to any one of the implementations of the present disclosure.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for data processing apparatus embodiments, the description is relatively simple, as it is substantially similar to method embodiments, with reference to the description of method embodiments in part.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features of specific embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. On the other hand, the various features described in the individual embodiments may also be implemented separately in the various embodiments or in any suitable subcombination. Furthermore, although features may be acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings are not necessarily required to be in the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing description of the preferred embodiment(s) is (are) merely intended to illustrate the embodiment(s) of the present invention, and it is not intended to limit the embodiment(s) of the present invention to the particular embodiment(s) described.

Claims (11)

1. An image anomaly detection method, the method comprising:
extracting features of a first region in the image to be processed to obtain first features;
extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed;
performing consistency detection and/or distortion detection on the first feature and the second feature; obtaining an abnormality score of the first region according to a consistency detection result and/or a distortion detection result, wherein the consistency detection result is obtained by performing consistency detection on the first feature and the second feature; the distortion detection result is obtained by performing distortion detection on the first feature and the second feature;
and obtaining an abnormality detection result of the image to be processed according to the abnormality score of the first area.
2. The method of claim 1, wherein the consistency detection of the first feature and the second feature comprises:
and determining a consistency score of the first feature and the second feature according to the vector distance between the first feature and the second feature.
3. The method of claim 1, wherein said distortion detection of said first and second features comprises:
obtaining an input feature according to a second feature and a random feature, wherein the random feature is obtained with equal probability according to the first area or a generation area, and the generation area is obtained by randomly adding disturbance pixels in the first area;
determining a probability that the input feature includes the first feature;
and obtaining a confidence score of distortion in the first region according to the probability.
4. The method according to claim 1, wherein the obtaining the anomaly score according to the consistency detection result and/or the distortion detection result comprises:
and carrying out weighted summation on the consistency scores of the first feature and the second feature and the confidence score with distortion in the first region to obtain an anomaly score of the first region.
5. The method according to claim 1, wherein the image to be processed comprises at least two first regions obtained by block sampling;
the obtaining an anomaly detection result of the image to be processed according to the anomaly score of the first region comprises:
obtaining an anomaly score for each of the at least two first regions;
and obtaining a per-pixel anomaly heatmap of the image to be processed according to the anomaly scores of the at least two first regions.
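Assembling the per-pixel heatmap from block-sampled regions (claim 5) might look like the sketch below. Averaging scores where sampled blocks overlap is one reasonable reading, not a rule mandated by the claim.

```python
import numpy as np

def anomaly_heatmap(image_shape, regions, scores):
    """Build a per-pixel anomaly heatmap from block-sampled first regions
    (claim 5). `regions` are (top, left, height, width) boxes; pixels
    covered by several regions receive the mean of their scores (an
    editorial assumption), and uncovered pixels stay at zero."""
    heat = np.zeros(image_shape, dtype=float)
    count = np.zeros(image_shape, dtype=float)
    for (top, left, h, w), score in zip(regions, scores):
        heat[top:top + h, left:left + w] += score
        count[top:top + h, left:left + w] += 1.0
    # Average overlapping contributions; leave uncovered pixels at zero.
    return np.divide(heat, count, out=np.zeros_like(heat), where=count > 0)
```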
6. The method according to claim 1, wherein the feature extraction of the first region in the image to be processed to obtain the first feature includes:
extracting features of the image corresponding to the first region through a first feature extraction network to obtain the first features;
the method further comprises the steps of:
distilling a deep neural network pre-trained on a general-purpose image set to obtain an intermediate feature extraction network;
and distilling the intermediate feature extraction network on the target category of a target data set corresponding to the target scene, to obtain the first feature extraction network.
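Both distillation stages of claim 6 can share the same objective; the mean-squared feature-matching loss below is an assumption, since the patent does not fix the distillation loss.

```python
import numpy as np

def distillation_loss(teacher_feature, student_feature):
    """Feature-distillation objective usable in both stages of claim 6
    (first on a general-purpose image set, then on the target category of
    the target data set). The MSE form is an editorial assumption."""
    t = np.asarray(teacher_feature, dtype=float)
    s = np.asarray(student_feature, dtype=float)
    return float(np.mean((t - s) ** 2))
```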
7. The method according to any one of claims 1 to 6, wherein the feature extraction of the second region in the image to be processed to obtain the second feature includes:
performing feature extraction on the image corresponding to the second region through a second feature extraction network to obtain the second feature;
The method further comprises: jointly training the second feature extraction network and a distortion detection network, wherein the training loss comprises:
a first penalty for indicating a difference between the first feature and the second feature;
and a second loss for indicating a difference between a classification result output by the distortion detection network and a corresponding true value, wherein the classification result indicates the probability that an input feature belongs to a first region sample or a generated region sample, the generated region sample is obtained by randomly adding perturbation pixels to the first region sample, the first feature of the first region sample or the first feature of the generated region sample is input to the distortion detection network with equal probability, and the true value indicates whether the input feature belongs to the first region sample or the generated region sample.
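The two-term training loss of claim 7 might be sketched as follows. The concrete forms (L2 feature matching for the first loss, binary cross-entropy for the second) are assumptions; the claim only fixes what each loss measures.

```python
import numpy as np

def co_training_loss(first_feature, second_feature, p_genuine, is_genuine):
    """Sketch of the joint loss in claim 7. `first_loss` measures the
    difference between the first and second features (L2 assumed);
    `second_loss` measures the difference between the distortion
    classifier's output `p_genuine` and the true label `is_genuine`
    (1 = first region sample, 0 = generated region sample; BCE assumed)."""
    first_loss = float(np.sum((np.asarray(first_feature) - np.asarray(second_feature)) ** 2))
    eps = 1e-12  # numerical guard for the logarithms
    second_loss = -(is_genuine * np.log(p_genuine + eps)
                    + (1 - is_genuine) * np.log(1 - p_genuine + eps))
    return first_loss + second_loss
```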
8. The method according to any one of claims 1 to 6, wherein the feature extraction of the second region in the image to be processed to obtain the second feature includes:
performing convolution operation on the second region by using a plurality of convolution layers;
wherein each of the plurality of convolution layers corresponds to a mask; each convolution layer performs a convolution operation on an input mask operation result, the mask operation result being obtained by applying the mask corresponding to that convolution layer to the input features of the convolution layer, and the mask corresponding to each convolution layer is updated according to the convolution result of that mask by the convolution layer;
the input feature of the first convolution layer in the plurality of convolution layers is the image corresponding to the second region, the image corresponding to the second region is obtained by applying an initial mask to the image to be processed, and in the initial mask the pixel value corresponding to the first region is a first pixel value and the pixel value corresponding to the second region is a second pixel value.
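One layer of the masked convolution in claim 8, read in the style of a partial convolution (an assumption on the editor's part), might look like this: pixels where the mask is zero (the first region) are excluded before convolving, and the mask itself is updated from its own convolution window.

```python
import numpy as np

def masked_conv_layer(features, mask, kernel):
    """One masked convolution layer (claim 8, partial-convolution style,
    which is an editorial assumption). The mask operation zeroes features
    where mask == 0; the updated mask marks a position valid once any
    valid pixel falls under the kernel window."""
    kh, kw = kernel.shape
    h, w = features.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    new_mask = np.zeros_like(out)
    masked = features * mask  # mask operation on the input features
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(masked[i:i + kh, j:j + kw] * kernel)
            new_mask[i, j] = float(mask[i:i + kh, j:j + kw].any())
    return out, new_mask
```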
9. An image anomaly detection apparatus, characterized by comprising:
the first acquisition unit is used for extracting features of a first region in the image to be processed to obtain first features;
the second acquisition unit is used for extracting features of a second region in the image to be processed to obtain second features, wherein the second region is arranged around the first region in the image to be processed;
a third obtaining unit, configured to perform consistency detection and/or distortion detection on the first feature and the second feature, and obtain an anomaly score of the first region according to a consistency detection result and/or a distortion detection result, wherein the consistency detection result is obtained by performing consistency detection on the first feature and the second feature, and the distortion detection result is obtained by performing distortion detection on the first feature and the second feature;
and an anomaly detection unit, configured to obtain an anomaly detection result of the image to be processed according to the anomaly score of the first region.
10. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the method of any one of claims 1 to 8 when executing the computer instructions.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any one of claims 1 to 8.
CN202110580349.7A 2021-05-26 2021-05-26 Image anomaly detection method, device, equipment and computer readable storage medium Active CN113205512B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110580349.7A CN113205512B (en) 2021-05-26 2021-05-26 Image anomaly detection method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113205512A CN113205512A (en) 2021-08-03
CN113205512B true CN113205512B (en) 2023-10-24

Family

ID=77023336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110580349.7A Active CN113205512B (en) 2021-05-26 2021-05-26 Image anomaly detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113205512B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114782888B (en) * 2022-04-01 2023-06-23 中国铁路兰州局集团有限公司 Method and system for detecting abnormity of track throat area
CN116385375B (en) * 2023-03-17 2023-10-20 银河航天(北京)网络技术有限公司 Forest defect area detection method and device based on remote sensing image and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN109670532A (en) * 2018-11-23 2019-04-23 腾讯科技(深圳)有限公司 Abnormality recognition method, the apparatus and system of organism organ-tissue image
CN109740617A (en) * 2019-01-08 2019-05-10 国信优易数据有限公司 A kind of image detecting method and device
CN110298235A (en) * 2019-05-17 2019-10-01 中国科学院西安光学精密机械研究所 Hyperspectral abnormity detection method and system based on manifold constraint autoencoder network
CN110619620A (en) * 2018-06-04 2019-12-27 杭州海康威视数字技术股份有限公司 Method, device and system for positioning abnormity causing surface defects and electronic equipment
CN110889338A (en) * 2019-11-08 2020-03-17 中国铁道科学研究院集团有限公司基础设施检测研究所 Unsupervised railway track bed foreign matter detection and sample construction method and unsupervised railway track bed foreign matter detection and sample construction device
CN111310613A (en) * 2020-01-22 2020-06-19 腾讯科技(深圳)有限公司 Image detection method and device and computer readable storage medium
CN111986103A (en) * 2020-07-20 2020-11-24 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and computer storage medium
CN112001902A (en) * 2020-08-19 2020-11-27 上海商汤智能科技有限公司 Defect detection method and related device, equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US10878569B2 (en) * 2018-03-28 2020-12-29 International Business Machines Corporation Systems and methods for automatic detection of an indication of abnormality in an anatomical image
JP6757378B2 (en) * 2018-08-28 2020-09-16 株式会社モルフォ Image identification device, image identification method and image identification program


Non-Patent Citations (2)

Title
Anomaly detection using local regions in road images acquired from a hand-held camera; Takuya Kamitani et al.; 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE); full text *
Yarn-dyed fabric defect detection based on multi-resolution global and local saliency; Han Shixing et al.; Basic Sciences Journal of Textile Universities; full text *

Also Published As

Publication number Publication date
CN113205512A (en) 2021-08-03

Similar Documents

Publication Publication Date Title
CN110232350B (en) Real-time water surface multi-moving-object detection and tracking method based on online learning
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN113205512B (en) Image anomaly detection method, device, equipment and computer readable storage medium
US20220174089A1 (en) Automatic identification and classification of adversarial attacks
CN110120064B (en) Depth-related target tracking algorithm based on mutual reinforcement and multi-attention mechanism learning
CN109977895B (en) Wild animal video target detection method based on multi-feature map fusion
CN113034548A (en) Multi-target tracking method and system suitable for embedded terminal
CN110909591B (en) Self-adaptive non-maximum suppression processing method for pedestrian image detection by using coding vector
CN111325141B (en) Interactive relationship identification method, device, equipment and storage medium
CN112800944B (en) Crowd behavior detection method and device, electronic equipment and storage medium
CN110502659B (en) Image feature extraction and network training method, device and equipment
CN111767847A (en) Pedestrian multi-target tracking method integrating target detection and association
US20200410709A1 (en) Location determination apparatus, location determination method and computer program
CN112883850A (en) Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN115147418A (en) Compression training method and device for defect detection model
CN114529583A (en) Power equipment tracking method and tracking system based on residual regression network
CN111126504A (en) Multi-source incomplete information fusion image target classification method
CN112800934B (en) Behavior recognition method and device for multi-class engineering vehicle
CN116503399B (en) Insulator pollution flashover detection method based on YOLO-AFPS
Liu et al. Container-code recognition system based on computer vision and deep neural networks
JP2008040684A (en) Learning method of signal identification device
CN112966762B (en) Wild animal detection method and device, storage medium and electronic equipment
CN113095257A (en) Abnormal behavior detection method, device, equipment and storage medium
CN114283082A (en) Infrared small target detection method based on attention mechanism
CN110853010B (en) High-speed railway cable detection method based on FWA and SM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant