CN116645567A

CN116645567A - Unsupervised anomaly detection method based on pixel single-point structure and multi-element pairing logic

Info

Publication number: CN116645567A
Application number: CN202310570510.1A
Authority: CN
Inventors: 沈卫明; 刘照阁; 徐晓豪; 曹云康
Original assignee: Huazhong University of Science and Technology
Current assignee: Huazhong University of Science and Technology
Priority date: 2023-05-19
Filing date: 2023-05-19
Publication date: 2023-08-25

Abstract

The invention belongs to the technical field of industrial image defect detection and discloses an unsupervised anomaly detection method based on pixel single-point and multi-element pairing. The method comprises the following steps: S1, constructing a structural feature extraction branch network and a logic feature extraction branch network; S2, inputting the training images into the structural feature extraction branch network and the logic feature extraction branch network respectively to form a structural feature memory bank and a logic feature memory bank; S3, obtaining the test pixel features and test logic features of the image to be tested, and from them a structural anomaly score map and a logical anomaly score map of the image to be tested; and S4, fusing the structural anomaly score map and the logical anomaly score map to obtain a total anomaly score map of the image to be tested, and determining the position of the defect according to the total anomaly score map. The invention addresses the detection of both surface-structure anomalies and positional-logic anomalies in product images.

Description

Unsupervised anomaly detection method based on pixel single-point structure and multi-element pairing logic

Technical Field

The invention belongs to the technical field related to industrial image defect detection, and particularly relates to an unsupervised anomaly detection method based on pixel single-point and multi-element pairing.

Background

In actual industrial production and manufacturing, unforeseen conditions such as machine faults, damage during transport, and operator error often occur and lead to unqualified products. Besides surface quality defects, products may also exhibit defects such as incorrect placement, missing packaging, and abnormal plate production. To increase the efficiency of production and delivery, quality inspection has shifted from traditional manual inspection to vision-based automated inspection, of which anomaly detection is a representative task. Throughout the process from production and packaging to delivery, a product may exhibit not only structural anomalies in its surface quality but also logical anomalies, such as incorrect placement or a mismatch between packaging and product: for example, a missing quantity or wrong assortment of packaged screws, inconsistent juice fill levels in glass bottles, or misplaced product labels. Because anomalies are complex and hard to predict, detecting only structural anomalies from surface quality is insufficient. Existing structural anomaly detection methods are relatively mature but cope poorly with logical anomalies. Detecting anomalous samples that involve logical relations is therefore the more complex problem of jointly detecting structural and logical anomalies: both the surface structural anomalies of the product and the potential positional-logic anomalies in the sample image must be detected.

Therefore, a model for the joint detection of structural and logical anomalies can make effective use of both the image structure information and the logical relationships between image pixel pairs, further improving anomaly detection performance; this is important for quality inspection in industrial production.

Disclosure of Invention

Aiming at the defects or improvement demands of the prior art, the invention provides an unsupervised anomaly detection method based on pixel single-point and multi-element pairing, which solves the problem of anomaly in surface structure and position logic in a product image.

In order to achieve the above object, according to the present invention, there is provided an unsupervised anomaly detection method based on pixel single-point and multi-element pairing, the method comprising the steps of:

S1, constructing a structural feature extraction branch network for extracting pixel features of an image and a logic feature extraction branch network for extracting logic features;

S2, inputting the training images into the structural feature extraction branch network and the logic feature extraction branch network respectively to extract the pixel features and logic features of each image, wherein the pixel features of all the images form a structural feature memory bank and the logic features of all the images form a logic feature memory bank;

S3, inputting the image to be tested into the structural feature extraction branch network and the logic feature extraction branch network respectively to obtain the test pixel features and test logic features of the image to be tested; calculating the maximum distance score between the test pixel features and the pixel features in the structural feature memory bank to obtain the anomaly scores of the image to be tested and thus its structural anomaly score map; and calculating the global consistency between the test logic features and the logic features in the logic feature memory bank to obtain the anomaly scores of the image to be tested and thus its logical anomaly score map;

S4, fusing the structural anomaly score map and the logical anomaly score map to obtain a total anomaly score map of the image to be tested, and comparing the anomaly score at each position in the total anomaly score map with a preset threshold, wherein positions whose score is larger than the preset threshold are positions where defects are located and the remaining positions are not, thereby determining the positions of the defects in the image to be tested.
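Step S4 can be illustrated with a minimal Python sketch. It assumes additive fusion, which is how the detailed description combines the two maps, and represents each score map as a nested list. The function name `fuse_and_threshold`, the example values, and the threshold of 0.75 are hypothetical and not taken from the patent.

```python
def fuse_and_threshold(s_con, s_log, threshold):
    """Add two equally sized score maps element-wise and list positions above threshold."""
    same_rows = len(s_con) == len(s_log)
    if not same_rows or any(len(a) != len(b) for a, b in zip(s_con, s_log)):
        raise ValueError("score maps must have the same shape")
    # Assumed fusion rule: element-wise sum of the structural and logical maps.
    s_map = [[a + b for a, b in zip(row_c, row_l)]
             for row_c, row_l in zip(s_con, s_log)]
    # Positions whose fused score exceeds the preset threshold are flagged as defects.
    defects = [(r, c)
               for r, row in enumerate(s_map)
               for c, value in enumerate(row)
               if value > threshold]
    return s_map, defects


# Toy 2x2 score maps (illustrative values only).
s_con = [[0.25, 0.5], [1.0, 0.125]]
s_log = [[0.0, 0.5], [0.5, 0.125]]
s_map, defects = fuse_and_threshold(s_con, s_log, threshold=0.75)
```

In this toy case the positions (0, 1) and (1, 0) exceed the threshold. How the threshold is chosen in practice is not specified here and would depend on the application.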

Further preferably, in step S1, the structural feature extraction branch network and the logic feature extraction branch network each employ a Wide ResNet-50 network pre-trained on the ImageNet dataset.

Further preferably, in step S2, the logic feature extraction branch network forms a logic feature memory bank according to the following steps:

S21, extracting one or more groups of pixel feature pairs in each feature layer from the logic feature extraction branch network;

S22, connecting the pixel feature pairs in each feature layer according to a preset number of pixel pairs to form multi-element pixel pairs, thereby obtaining the multi-element pixel pairs in all the feature layers;

S23, aligning the multi-element pixel pairs of the different feature layers, all the aligned multi-element pixel pairs forming the logic feature memory bank.

Further preferably, in step S23, the alignment employs a multi-scale feature fusion method.

Further preferably, in step S3, the anomaly score in the structural anomaly score map is calculated according to the following relation:

where the computed quantity is the structural anomaly score, m^{test,*} is the test pixel feature, and m^* is the training pixel structural feature with the maximum similarity to m^{test,*}.

Further preferably, in step S3, the anomaly score in the logical anomaly score map is calculated according to the following relation:

where the relation gives the logical anomaly score in terms of the test logic feature and the training pixel logic feature with the maximum similarity to it.

Further preferably, in step S4, the total anomaly score map of the image to be tested is calculated according to the following relation:

S_map = S_con + S_log

where S_map is the total anomaly score map, S_con is the structural anomaly score, and S_log is the logical anomaly score.

Further preferably, in step S2, after the structural feature memory bank and the logic feature memory bank are formed, data processing is further required to be performed on the elements in each memory bank, so as to reject unqualified data.

Further preferably, the data processing employs a greedy-strategy subsampling method.

In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:

1. The method simultaneously considers the information of single image pixels and the implicit logical information between image pixel pairs. It thus addresses practical problems in automated assembly-line inspection of industrial products in which structural and logical defects must be considered together, such as whether the number and combination of workpieces meet requirements and whether the type or position of a product label meets specifications;

2. The method realizes unsupervised joint detection of structural and logical anomalies/defects: using only normal samples in the training stage, it can jointly detect local structural anomalies and global logical anomalies in the sample under test in the test stage.

Drawings

FIG. 1 is a schematic diagram of a training phase constructed in accordance with a preferred embodiment of the present invention;

FIG. 2 is a schematic diagram of a multi-pixel pair construction constructed in accordance with a preferred embodiment of the invention;

FIG. 3 is a schematic diagram of a test phase constructed in accordance with a preferred embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

To address the shortcoming that existing anomaly detection methods can detect only structural anomalies of the workpiece under inspection and not higher-level logical anomalies, an unsupervised anomaly detection method based on the logical relationships between pixel pairs is provided. It uses both single-pixel information and paired-pixel relationship information to model the latent logical or geometric relationships in anomaly-free samples; the sample under test is then compared against this model by distance calculation to identify its structural and logical anomalies.

Furthermore, the model network structure of the unsupervised anomaly detection method based on pixel-pair logical relationship information mainly comprises a structural feature extraction branch network and a logic feature extraction branch network. The structural feature extraction branch network is a network model pre-trained on a large natural image dataset, and the features it extracts are highly discriminative. The invention further applies coreset subsampling to these features, which removes redundant information and reduces inference time. Pixels at different positions in an image are logically related: for example, the packed quantity of a product corresponds to the compartments of its box (e.g., whether each compartment holds a pin), and a product label corresponds to the product itself (e.g., the juice label and the juice color). The logic feature extraction branch network extracts logic features from the logical relationships among multiple pixel features in the image. The extracted features capture long-distance binary and multi-element logical relationships between image pixels, and modeling these relationship features makes it possible to identify logical anomalies in the image (such as a cable terminal in the wrong position, or an icon that does not match the juice type). Together, the two branch networks enable the model to jointly detect structural and logical anomalies in the sample under test. In summary, the invention considers both structural and logical anomalies: the two branches fit and learn the feature distribution of normal data, the structural feature extraction network being better at discriminating structural anomalies and the logic feature extraction network better at discriminating logical anomalies.

When the structural feature extraction branch network extracts image features, the pre-trained model is used directly. Before the logic feature extraction branch network extracts features, however, the pixels are independent of one another, i.e., different pixels have no fixed relationship, so connections between pixel features must be established explicitly. The method therefore combines pixel feature blocks at long and short distances and at multiple scales; the concatenated pixel pairs are fed to the network model for relational feature extraction so as to model the positional logic in the image, and the extracted logic features are stored in a feature memory bank.

After the two branches have extracted their features, the structural feature extraction branch is used to detect structural anomalies and the logic feature extraction branch to detect logical anomalies. To obtain the final joint detection result, the anomaly score maps output by the two branch networks are added together to give the final anomaly score map.

Specifically:

As shown in FIG. 1, the method includes two feature extraction branch networks: the structural feature extraction branch network N_con and the logic feature extraction branch network N_log. N_con is used to detect structural anomalies, and N_log is used to detect logical anomalies.

The invention includes a training phase and a testing phase. The training phase, shown in FIG. 1, comprises:

(1) Sub-module training stage for structural anomalies:

With normal industrial product images I_n as input, the structural feature extraction branch network N_con extracts the pixel-block structural feature information of the normal data, and the resulting normal pixel features F_con are aggregated into a memory bank M_con. A greedy-strategy subsampling method is then applied: l feature subsets are set so that the subsets represent most of the data features as well as possible; each time, a point in a feature subset is selected and the point farthest from it in the subset is found, sampled, and stored, i.e., the locally farthest point is taken as the optimal choice. This removes redundant features and finally yields the structural feature memory bank M_con used to detect structural anomalies.
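The greedy subsampling described above resembles farthest-point (k-center greedy) selection. The following Python sketch shows that standard technique under the assumption of Euclidean distance. The function name `greedy_coreset`, the starting index, and the toy data are illustrative and not taken from the patent, and the partition into l feature subsets is not modeled.

```python
import math


def greedy_coreset(features, num_samples, start_index=0):
    """Pick num_samples features by repeatedly taking the point farthest
    from everything selected so far (k-center greedy)."""
    if not 0 < num_samples <= len(features):
        raise ValueError("num_samples must be between 1 and len(features)")
    selected = [start_index]
    # Distance from every feature to its closest already-selected feature.
    min_dist = [math.dist(f, features[start_index]) for f in features]
    while len(selected) < num_samples:
        # Greedy step: the farthest remaining point is the next sample.
        next_index = max(range(len(features)), key=lambda i: min_dist[i])
        selected.append(next_index)
        for i, f in enumerate(features):
            d = math.dist(f, features[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return [features[i] for i in selected]


# Toy memory bank with two near-duplicate features (illustrative values only).
bank = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (10.0, 0.0)]
reduced = greedy_coreset(bank, num_samples=3)
```

Here the near-duplicates (0.1, 0.0) and (5.1, 0.0) are dropped, which is the kind of redundancy removal the memory bank reduction aims at.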

(2) Sub-module training phase for logic anomalies:

With normal industrial product images I_n as input, the structural feature extraction branch network N_con extracts the pixel features F_con of the normal data in different feature layers, as follows:

F_con = N_con(I_n) (1)

Further, 1) the logic feature extraction branch network N_log is used to extract one or more groups of pixel feature pairs at different positions in each feature layer; 2) these are connected (Cat) to form a multi-element pixel pair F_{Cat·K}, as shown in formula (2):

F_{Cat·K} = (F_{log·1}, F_{log·2}, ..., F_{log·k}) (2)

The multi-element pixel pair F_{Cat·K} consists of several groups of pixel pairs, and K denotes the number of pixel pairs. FIG. 2 illustrates 4 groups of pixel pairs. The logic feature pixel pairs F_{log·1} and F_{log·2} lie near the top and the bottom; their specific definition is shown in formula (3).

The logic feature pixel pairs F_{log·3} and F_{log·4} lie in the middle and near the ends (top, bottom); their specific definition is shown in formula (4).

The paired features may also be nearby local-structure pixel features that are adjacent left-right or up-down. For example, F_{top-center} denotes the pixel feature at the top-center position, and F_{bottom-right} denotes the pixel feature at the bottom-right position.

Let θ be the smallest angle between the line joining the two pixel feature blocks of a pixel pair and the horizontal or vertical direction, with θ ≤ 45°; within this angular range, pixel feature blocks can be selected and paired at random. Normal pixel blocks at different long and short distances are combined and connected (Cat) to form a multi-element pixel pair. A multi-element pixel pair may contain three or more pixel blocks; for ease of presentation and to simplify notation, K denotes the total number of pairings in the multi-element pixel pair, which is used to model the pixel relationships in the image. Specifically, a pair whose distance exceeds half the distance of the farthest pixel pair is a long-distance pair, and otherwise it is a short-distance pair; this yields the multi-scale logic features F_log at different scales.
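The pairing rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: positions are (row, col) grid coordinates, the "farthest pixel pair" is taken to be the farthest among the selected pairs, and Cat is modeled as plain list concatenation. The names `pair_angle_deg` and `build_multi_pixel_pair` and the toy feature values are hypothetical.

```python
import math


def pair_angle_deg(p, q):
    """Smallest angle, in degrees, between segment pq and the horizontal or vertical axis."""
    dy, dx = abs(q[0] - p[0]), abs(q[1] - p[1])
    angle_to_horizontal = math.degrees(math.atan2(dy, dx))
    return min(angle_to_horizontal, 90.0 - angle_to_horizontal)


def build_multi_pixel_pair(feature_map, pairs):
    """Label each pair as long or short distance and concatenate all pair features.

    feature_map maps (row, col) -> feature vector (list of floats).
    """
    distances = [math.dist(p, q) for p, q in pairs]
    # Assumption: "long" means more than half the distance of the farthest selected pair.
    half_max = max(distances) / 2.0
    labels = ["long" if d > half_max else "short" for d in distances]
    concatenated = []
    for p, q in pairs:
        concatenated.extend(feature_map[p] + feature_map[q])  # Cat of the two blocks
    return labels, concatenated


# Toy 3x3 feature map whose feature at (r, c) is simply [r, c].
feature_map = {(r, c): [float(r), float(c)] for r in range(3) for c in range(3)}
pairs = [((0, 1), (2, 0)),   # top-center with bottom-left
         ((1, 1), (1, 2))]   # two horizontally adjacent blocks
labels, multi_pair = build_multi_pixel_pair(feature_map, pairs)
angles = [pair_angle_deg(p, q) for p, q in pairs]
```

Note that when θ is defined as the smaller of the angles to the horizontal and to the vertical, it is at most 45° by construction, so every candidate pair in this sketch satisfies the angular condition.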

3) A multi-scale feature fusion (MFF) module aligns the logic features formed from the different feature layers by up-sampling and down-sampling and then fuses them, as shown in formula (5):

F_{Cat·K} = MFF(F_{log·1}, F_{log·2}, ..., F_{log·k}) (5)
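One possible reading of the alignment-then-fusion step is sketched below in Python. The internal design of the MFF module is not specified here, so this sketch assumes nearest-neighbor up-sampling to the largest spatial size followed by channel-wise concatenation; both choices, and the names `upsample_nearest` and `fuse_multiscale`, are assumptions for illustration.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Resize a feature map indexed as fmap[h][w] -> channel list by nearest neighbor."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [[fmap[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def fuse_multiscale(fmaps):
    """Align all maps to the largest resolution, then concatenate channels per position."""
    out_h = max(len(f) for f in fmaps)
    out_w = max(len(f[0]) for f in fmaps)
    aligned = [upsample_nearest(f, out_h, out_w) for f in fmaps]
    # sum(..., []) joins the channel lists of all aligned maps at one position.
    return [[sum((a[r][c] for a in aligned), []) for c in range(out_w)]
            for r in range(out_h)]


# A fine 2x2 map with 1 channel and a coarse 1x1 map with 2 channels (toy values).
fine = [[[1.0], [2.0]], [[3.0], [4.0]]]
coarse = [[[9.0, 8.0]]]
fused = fuse_multiscale([fine, coarse])
```

After fusion every position carries three channels: one from the fine layer and two copied from the coarse layer.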

After the multi-element feature pixel pair F_{Cat·K} is formed, it is sent to the memory bank M_log, which aggregates the logic features of normal images. The greedy-strategy subsampling method described above for structural anomaly detection is then used to remove redundant logic features, finally yielding the logic feature memory bank M_log used to detect logical anomalies.

The structural feature memory bank M_con and the logic feature memory bank M_log together form the feature memory bank M_bank, defined as follows:

M_bank = M_con ∪ M_log (6)

The test stage is shown in FIG. 3 and is described as follows:

The test image I_test is input to the structural feature extraction branch network N_con and the logic feature extraction branch network N_log, which yield the corresponding structural features F_con and logic features F_log respectively. In the training stage, the structural feature extraction branch network stores the pixel-level structural features of normal images in the structural feature memory bank M_con. At test time, the anomaly score of the test image is estimated from the maximum distance score between the pixel features of the test image and the normal pixel features F_con in M_con, and the structural anomaly score map S_con of the test image is obtained by computing the anomaly score of each pixel. In the training stage, the logic feature extraction branch network links pixel pairs of normal images and stores their logic features in the logic feature memory bank M_log. At test time, the anomaly score of the test image is estimated from the global consistency between the pixel-pair features F_log of the test image and the normal pixel pairs in M_log, and the logical anomaly score map S_log of the test image is obtained by computing the consistency of each pixel pair. Finally, the structural anomaly score map S_con and the logical anomaly score map S_log are fused, and the resulting anomaly score S_map serves as the anomaly score map of the test image.

Further, in the training stage of the structural feature extraction branch network, the process of acquiring the structural features is shown in the upper branch network of FIG. 3. First, each normal sample is decomposed into a set of pixel-level features. For a normal training sample x_i ∈ X_N, the c^*-dimensional feature at the j-th layer of the pre-trained network φ at position h ∈ {1, ..., h^*} and w ∈ {1, ..., w^*} is called a feature block m, i.e., a pixel feature block.

Over all normal training samples x_i ∈ X_N, the structural feature memory bank M_con is simply defined as the collection of the pixel feature blocks of all samples.
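As a minimal Python sketch of how such a memory bank could be assembled, assume each normal image has already been turned into a feature map indexed as [h][w] -> feature vector. The function name `build_memory_bank` and the toy feature values are hypothetical, and the pre-trained network itself is not modeled.

```python
def build_memory_bank(feature_maps):
    """Collect every pixel feature block of every normal image into one list."""
    bank = []
    for fmap in feature_maps:          # one feature map per normal training image
        for row in fmap:               # positions h = 1, ..., h*
            for block in row:          # positions w = 1, ..., w*
                bank.append(tuple(block))
    return bank


# Two toy "images", each a 1x2 feature map with 2-dimensional feature blocks.
image_a = [[[0.0, 1.0], [2.0, 3.0]]]
image_b = [[[4.0, 5.0], [6.0, 7.0]]]
memory_bank = build_memory_bank([image_a, image_b])
```

The bank grows with the number of images times the number of positions per image, which is why a subsampling step is usually applied afterwards.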

To ensure inference speed and improve detection efficiency, a coreset subsampling mechanism is used to reduce the feature memory bank M. Conceptually, the purpose of coreset selection is to find a subset M_c ⊂ M such that the solution of a problem over M_c is close to the solution over M.

For convenience of description, M_con and M_log below denote the feature banks obtained after coreset subsampling.

The structural anomaly score map of the test image x^test is estimated from the maximum distance score between the test feature block m^{test,*} and its nearest neighbor m^* in M_con.

Here m^test is a test feature block, and m is a feature block found in the structural feature bank.
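The nearest-neighbor scoring just described can be sketched in Python as follows. The distance metric is not stated in the text, so Euclidean distance is assumed here; the names `nearest_neighbor_distance` and `structural_scores` and the toy values are hypothetical.

```python
import math


def nearest_neighbor_distance(feature, memory_bank):
    """Distance from one test feature block to its nearest neighbor in the bank."""
    return min(math.dist(feature, m) for m in memory_bank)


def structural_scores(test_features, memory_bank):
    """Per-block anomaly scores and the image-level score.

    The image-level score is the largest nearest-neighbor distance,
    i.e. the distance between m^{test,*} and m^*.
    """
    per_block = [nearest_neighbor_distance(f, memory_bank) for f in test_features]
    return per_block, max(per_block)


# Toy memory bank and test feature blocks (illustrative values only).
m_con = [(0.0, 0.0), (1.0, 0.0)]
test_blocks = [(0.0, 0.0), (1.0, 3.0)]
block_scores, image_score = structural_scores(test_blocks, m_con)
```

Arranging the per-block scores back onto the spatial grid of the feature map would give a score map; that reshaping step is omitted here.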

In the logic anomaly training stage, the logic feature acquisition process is shown in the lower logic feature extraction branch network of FIG. 3.

Each normal sample is decomposed into a set of pixel-level features. The pre-trained network extracts the normal pixel features of the product image at different positions, which are paired, for example, as F_{top-left} with F_{middle-down}, F_{top-center} with F_{bottom-left}, F_{top-right} with F_{bottom-center}, and F_{middle-up} with F_{bottom-right}. Provided that the angle θ between a pixel feature pair and the horizontal or vertical direction is smaller than θ_min, k groups of pixel pairs are formed at random. The logic feature of each pixel pair, obtained by the merging connection (Cat), is denoted m_{Cat·k}; this yields the logic features m_{Cat·1}, ..., m_{Cat·k} of long- and short-distance pixel dependencies at different scales. The logic feature after multi-scale feature fusion (MFF) is denoted m_{Cat·K} and defined as follows:

m_{Cat·K} = MFF(m_{Cat·1}, m_{Cat·2}, ..., m_{Cat·k}) (12)

It is then stored in the logic feature memory bank M_log. Here x_{i·pair} denotes a pair of feature pixels in the same network layer, φ_{j·multi} denotes several different j-th layers of the pre-trained network, and MFF(φ_{j·multi}(x_{i·pair})) denotes the logic feature pixel pairs after multi-scale feature fusion.

Over all normal training samples x_i ∈ X_N, the logic feature memory bank M_log is simply defined as the collection of the logic features of all samples.

With the logic feature memory bank M_log in place, the logical anomaly score map of the test image is calculated from the maximum distance score between the pixel-pair logic feature formed by pairing the test features m^test in the feature block set and its nearest-neighbor pixel logic feature m_Cat in M_log.

The final anomaly score map is then obtained by fusing the two score maps:

S_map = S_con + S_log

It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. An unsupervised anomaly detection method based on pixel single-point and multi-element pairing is characterized by comprising the following steps:

2. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing of claim 1, wherein in step S1, the structural feature extraction branch network and the logic feature extraction branch network are both Wide ResNet-50 networks pre-trained on the ImageNet dataset.

3. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing according to claim 1 or 2, wherein in step S2, the logic feature extraction branch network forms a logic feature memory bank according to the following steps:

4. A method of unsupervised anomaly detection based on pixel single-point and multi-component pairing as claimed in claim 3, wherein in step S23, the alignment uses a multi-scale feature fusion method.

5. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing according to claim 1 or 2, wherein in step S3, the anomaly score in the structural anomaly score map is calculated according to the following relation:

where the computed quantity is the structural anomaly score, m^{test,*} is the structural feature of the test pixel, and m^* is the training pixel structural feature with the maximum similarity to m^{test,*}.

6. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing according to claim 1 or 2, wherein in step S3, the anomaly score in the logical anomaly score map is calculated according to the following relation:

where the relation gives the logical anomaly score in terms of the test pixel logic feature and the training pixel logic feature with the maximum similarity to it.

7. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing according to claim 1 or 2, wherein in step S4, the total anomaly score map of the image to be tested is calculated according to the following relation:

8. The method for unsupervised anomaly detection based on pixel single-point and multi-component pairing according to claim 1 or 2, wherein in step S2, after the structural feature memory bank and the logic feature memory bank are formed, data processing is further required to be performed on the elements in each memory bank, so as to reject unqualified data.

9. An unsupervised anomaly detection method based on pixel single-point and multi-component pairing as in claim 7, wherein the data processing employs a subsampling method of greedy strategy.